Invalid data for component 'ner' after ner.eval-ab

Hi,

Firstly, amazing product. As a developer with no ML experience, Prodigy and spaCy have really softened the learning curve and produced some amazing results.

The problem:

I have a dataset with ~2000 manual annotations, which could be used to train a model just fine. I then tried out the ner.eval-ab recipe to check the difference in my model before and after some extra annotations.

The problem is that I used my manual (gold) annotations dataset to store the ner.eval-ab results (approx. 20 of them). Now if I run the train command using that dataset, I get the following error:

Command:
train ner i_data_v3 ./assets/models/i_model_v3 --output ./assets/models/i_model_v3_3 -n 15 --eval-split 0.25 --dropout 0.3

:heavy_check_mark: Loaded model './assets/models/i_model_v3'
✘ Invalid data for component 'ner'

text field required

{'id': 5, 'input': {'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed'}, 'A': {'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed', 'spans': [{'start': 0, 'end': 7, 'label': 'AMOUNT'}, {'start': 13, 'end': 42, 'label': 'PRODUCT'}, {'start': 72, 'end': 95, 'label': 'PREP'}], 'id': 'A'}, 'B': {'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed', 'spans': [{'start': 0, 'end': 7, 'label': 'AMOUNT'}, {'start': 8, 'end': 70, 'label': 'PRODUCT'}, {'start': 72, 'end': 95, 'label': 'PREP'}], 'id': 'B'}, 'mapping': {'A': 'accept', 'B': 'reject'}, 'options': [{'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed', 'spans': [{'start': 0, 'end': 7, 'label': 'AMOUNT'}, {'start': 13, 'end': 42, 'label': 'PRODUCT'}, {'start': 72, 'end': 95, 'label': 'PREP'}], 'id': 'A'}, {'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed', 'spans': [{'start': 0, 'end': 7, 'label': 'AMOUNT'}, {'start': 8, 'end': 70, 'label': 'PRODUCT'}, {'start': 72, 'end': 95, 'label': 'PREP'}], 'id': 'B'}], '_input_hash': -381308612, '_task_hash': 1743163633, '_session_id': 'i_data_v3-roy', '_view_id': 'choice', 'accept': ['B'], 'answer': 'accept'}

I reviewed the JSON above and all text properties appear valid.

Questions:

  1. I assume it's best practice to store ner.eval-ab results in a different dataset (I wasn't thinking when I tried it out)?

  2. Can/should ner.eval-ab results be used to train a model? Or is it best used just for manually comparing 2 models?

  3. The only process I can see to solve this is to run db-out to get a .jsonl file, then remove the ner.eval-ab results manually, then run db-in. Is there a better process to resolve this?

  4. Is the above expected? If so, can I suggest maybe adding a note to the https://prodi.gy/docs/recipes#ner-eval-ab doc?

Prodigy install file/version: prodigy-1.9.9-cp36.cp37.cp38-cp36m.cp37m.cp38-win_amd64.whl

Thanks,
Roy

Hi and thanks! :smiley:

Yes, that's likely the problem here. In general, we'd always recommend using a different dataset for different experiments – it's easy to merge data later on, but splitting datasets up again is always trickier.

In this case, your dataset ended up with mixed data in different formats, so the train recipe complains because your evaluation results don't specify an explicit text and explicit annotations to train from – just the A/B comparison in the choice format and your decision, a selected option. So it doesn't know what any of it "means" and what your intention is when you're training from it.
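
To make that concrete, here's a rough sketch (just illustrating with the example from your error message, not an official spec) of the difference between what the train recipe expects per example and what ner.eval-ab stores:

# A standard manual/gold annotation: text and spans sit at the top level,
# which is what `train ner` needs.
train_style = {
    "text": "1 whole rich fruity dried chili ...",
    "spans": [{"start": 0, "end": 7, "label": "AMOUNT"}],
    "answer": "accept",
}

# A ner.eval-ab result: a choice-format A/B comparison. There's no
# top-level "text", which is why train fails with "text field required".
eval_ab_style = {
    "input": {"text": "1 whole rich fruity dried chili ..."},
    "A": {"text": "...", "spans": [], "id": "A"},
    "B": {"text": "...", "spans": [], "id": "B"},
    "accept": ["B"],        # the option that was selected
    "_view_id": "choice",
    "answer": "accept",
}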

The ner.eval-ab recipe is intended to compare the output of two models, yes – what you do with the result is up to you. Most of the time you probably just want to use it to get a quick sense of which model is doing better, get some quick numbers and verify that you're on the right track.

If you do think the suggestions you accepted here are good and would be useful as training data, you could definitely convert them. 'accept': ['B'] in your data indicates which entry you selected and the corresponding key (B in this case) holds the text and annotations. So you could write a script that extracts this and adds it to your data.
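
For instance, a rough sketch of what that script could look like (the new dataset name is just a placeholder, and it assumes the choice-format structure shown in your example above):

from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("i_data_v3")

converted = []
for eg in examples:
    # only look at the A/B comparisons you answered with "accept"
    if eg.get("_view_id") != "choice" or eg.get("answer") != "accept":
        continue
    selected_id = eg["accept"][0]   # e.g. "B"
    selected = eg[selected_id]      # this entry holds the text and spans
    converted.append({
        "text": selected["text"],
        "spans": selected["spans"],
        "answer": "accept",
    })

db.add_dataset("i_data_v3_converted")   # placeholder name for the new set
db.add_examples(converted, datasets=["i_data_v3_converted"])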

There's nothing that really speaks against doing it – it's just not a workflow that's intended out-of-the-box, because A/B annotations aren't necessarily assumed to be "correct". You're just giving feedback on which one you think is better.

That's one option, yes. The other one would be to just connect to your database in Python, load the dataset and filter out all examples that have '_view_id': 'choice' or contain a key 'mapping' (which is unique to the A/B comparison data). Then you add the result to a new set.

from prodigy.components.db import connect

db = connect()                           # connect to the Prodigy database
examples = db.get_dataset("i_data_v3")   # load all examples in the dataset
# keep only the regular annotations – the A/B results all have a "mapping" key
filtered = [eg for eg in examples if "mapping" not in eg]
db.add_dataset("i_data_v3_new")          # create the new dataset
db.add_examples(filtered, datasets=["i_data_v3_new"])

Sure! Maybe just a note that the data you create with the recipe is stored in the database for reference and so you can reproduce your experiments?

Thanks for the speedy response :slight_smile: . All your answers make perfect sense.

Maybe a slight adjustment to the Saves sections of the recipe doc?

Both ner.eval-ab and ner.manual say Saves: annotations to the database. A slight adjustment to that text for ner.eval-ab might be more accurate? An example of the JSON output might help visualise it a little as well, although it might be overkill if this isn't a common problem.

Personally, I would also find a note on which out-of-the-box recipes the output is compatible with extremely useful. As you have stated, ner.eval-ab isn't directly compatible with train ner.

It would also help in other cases I have found difficult to understand, such as when/how the binary data from ner.teach should be used. For example, a simple note stating that the output of ner.teach is most likely meant to be used with the --binary flag when training a new model:

prodigy train ner dataset model_name --output new_model_name --binary

(Note: I'm still not 100% sure if that is correct though? Should binary/teach data be used with the --binary flag, or maybe with --ner-missing as well? I have had mixed results updating existing models with binary teach data that represents 1 label out of the N labels a model knows about.)