Recipe for comparing NER model and manual annotation

I have a saved dataset from ner.manual, and I trained a spaCy NER model based on it. Now I would like to compare the saved annotations with the model's predictions on the same text, and then modify the saved annotations by resolving the differences between the two (to hopefully improve the quality of the saved dataset). The closest built-in recipe I could find is ner.eval-ab, but that compares two models, whereas here I want to compare a single model with a saved annotated dataset. If I need to write my own recipe for this, do you have any tips to get started? Thanks!

Hi! This is definitely a good idea and workflow :+1: There's currently no exact recipe for this in Prodigy, but you could definitely write your own. One workaround that's a clever hack and should work: the review recipe lets you view multiple annotations created on the same input data in one UI and create one final decision on the correct answer. So this is pretty much exactly what you want to do.

So you could run your trained model over your raw data, save its predictions to a new Prodigy dataset and then run review with your annotations + your auto-generated model dataset. You can either generate the JSON programmatically based on the model outputs, or just run ner.correct with your trained model and accept everything (if your dataset isn't too large, this might actually be easiest and it ensures the format is correct and ready to use with review).
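For the programmatic route, a minimal sketch could look something like this (the model path and dataset names are just placeholders, so adjust them to your setup): it runs the trained pipeline over the texts from your manual dataset and saves the predictions in the same span format that ner.manual produces, so review can match them up by input hash.

import spacy
from prodigy import set_hashes
from prodigy.components.db import connect

nlp = spacy.load("./trained_ner_model")        # path to your trained pipeline
db = connect()                                 # connects using your prodigy.json settings

examples = []
for eg in db.get_dataset("manual_dataset"):    # reuse the manually annotated texts
    doc = nlp(eg["text"])
    spans = [
        {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
        for ent in doc.ents
    ]
    pred = {"text": eg["text"], "spans": spans, "answer": "accept"}
    examples.append(set_hashes(pred))          # add input/task hashes so review can group versions

db.add_dataset("pred_dataset")
db.add_examples(examples, datasets=["pred_dataset"])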

When you run review, both versions of the same example will then be merged and you'll be able to compare them, and create a final corrected version at the top. By default, Prodigy will auto-highlight the version that occurs most frequently in the annotations, which won't really matter in your case, since you'll always have two. If there's a tie, the first version (first dataset you specified on the CLI) will be used, so you can use the order of datasets to decide which annotations are pre-highlighted in the UI (model predictions or your original data).

Thanks for the detailed information! I followed your advice and saved the model predictions into a separate Prodigy dataset (I tried both a Python script and ner.correct accepting everything, and both led to the same output), and then ran

prodigy review reviewed_dataset manual_dataset,pred_dataset ...

but it's still not showing me exactly what I was looking for. First I ran the above command with

--view-id ner

and it was showing the annotated sample from one input dataset at a time, with the same sample shown consecutively: sample 1 from manual_dataset, sample 1 from pred_dataset, sample 2 from manual_dataset, sample 2 from pred_dataset, etc.

Then I tried with

--view-id ner_manual

and it was showing the annotations from both input datasets for a given sample on the same page. This is what I want, but it's not highlighting the annotations that conflict between the two datasets, which is what I am mainly looking for. The page has 3 text boxes arranged vertically: the top one looks like the normal ner recipe output and I am guessing it's the one picked by the review recipe, while the two text boxes below are a bit greyed out with smaller fonts and appear to be the annotations from the individual input datasets for that sample.

I hope I described this clearly. I would like to check whether this is the expected behaviour of the review recipe, or whether I misused it somehow. Thanks again!

Ah yes, that's the expected output of review: it will show you all available versions and then one editable example at the top. The different versions are shown separately and greyed out, so you can compare them. What exactly were you looking for in terms of visualisation? It can sometimes be difficult to visualise conflicting annotations in the same example, because they may overlap.

If you want to see both annotations together in one example with potential overlaps, one alternative approach could be to use the new spans_manual UI introduced in the upcoming version (currently available as a nightly prerelease): ✨ Prodigy nightly: spaCy v3 support, UI for overlapping spans & more

In that case, you could create a single example with different labels for each input source (model, data), e.g. ORG_MODEL and ORG_DATA etc. Those spans would then all be highlighted in the same UI, and you could use different colours for them in your theme so you can tell them apart. And you could then use regular labels like ORG to add your own annotations, if they differ. At the end, you'd just need a postprocess that normalises all the final labels so they're all ORG, e.g. via a before_db callback. This could potentially be a nice built-in workflow as well – I'll try this out!
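To make that a bit more concrete, here's a rough sketch of what such a custom recipe could look like (the recipe name, dataset names, label set and colours are just assumptions for illustration):

import copy
import spacy
import prodigy
from prodigy.components.db import connect
from prodigy.components.preprocess import add_tokens

@prodigy.recipe("ner.compare-spans")
def ner_compare_spans(dataset: str, manual_set: str, pred_set: str):
    db = connect()
    merged_by_text = {}
    # Merge both sources into one example per text, suffixing the labels so
    # spans from the model and from the manual data can be told apart.
    for source, suffix in ((manual_set, "_DATA"), (pred_set, "_MODEL")):
        for eg in db.get_dataset(source):
            merged = merged_by_text.setdefault(eg["text"], {"text": eg["text"], "spans": []})
            for span in eg.get("spans", []):
                span = copy.deepcopy(span)
                span["label"] = span["label"] + suffix
                merged["spans"].append(span)

    # The manual UIs need a "tokens" property; a blank pipeline is enough for tokenization.
    nlp = spacy.blank("en")
    stream = add_tokens(nlp, list(merged_by_text.values()))

    def before_db(examples):
        # Normalise the final labels back to their base form before saving.
        for eg in examples:
            for span in eg.get("spans", []):
                span["label"] = span["label"].replace("_DATA", "").replace("_MODEL", "")
        return examples

    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "spans_manual",
        "before_db": before_db,
        "config": {
            # Example label set: the base label for your own annotations plus
            # the suffixed versions for the two sources.
            "labels": ["ORG", "ORG_DATA", "ORG_MODEL"],
            "custom_theme": {"labels": {"ORG_DATA": "#c5bdf4", "ORG_MODEL": "#ffd882"}},
        },
    }

You'd then run it like any other custom recipe, e.g. prodigy ner.compare-spans final_dataset manual_dataset pred_dataset -F recipe.py, with -F pointing to the file containing the recipe.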

What exactly were you looking for in terms of visualisation? It can sometimes be difficult to visualise conflicting annotations in the same example, because they may overlap.

I would like the visualization to highlight the diff between the two sets of annotations (while greying out the spans where both agree), and I agree it's not obvious what to render when there is overlap between the two sets of annotations being merged.
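For what it's worth, the kind of span-level diff I have in mind is roughly this (just a sketch, assuming spans are dicts with start, end and label as in the Prodigy data):

def span_diff(manual_spans, pred_spans):
    # Spans that are identical in both sources could be greyed out;
    # only the remaining ones would need to be highlighted as conflicts.
    def key(span):
        return (span["start"], span["end"], span["label"])

    matched = {key(s) for s in manual_spans} & {key(s) for s in pred_spans}
    only_manual = [s for s in manual_spans if key(s) not in matched]
    only_model = [s for s in pred_spans if key(s) not in matched]
    return only_manual, only_model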

The spans_manual UI does seem to meet my current need, though. I will definitely give it a try. Thanks for the detailed information on how to use it!