Hi,
Firstly, amazing product. As a developer with no ML experience, Prodigy and spaCy have really softened the learning curve and produced some amazing results.
The problem:
I have a dataset with ~2000 manual annotations, which could be used to train a model just fine. I then tried out the ner.eval-ab recipe to check for differences in my model before and after some extra annotations.
The problem is that I used my manual (gold) annotations dataset to store the ner.eval-ab results (approx. 20 of them). Now if I run the train command using that dataset, I get the following error:
Command:
train ner i_data_v3 ./assets/models/i_model_v3 --output ./assets/models/i_model_v3_3 -n 15 --eval-split 0.25 --dropout 0.3
Loaded model './assets/models/i_model_v3'
✘ Invalid data for component 'ner'
text field required
{'id': 5, 'input': {'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed'}, 'A': {'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed', 'spans': [{'start': 0, 'end': 7, 'label': 'AMOUNT'}, {'start': 13, 'end': 42, 'label': 'PRODUCT'}, {'start': 72, 'end': 95, 'label': 'PREP'}], 'id': 'A'}, 'B': {'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed', 'spans': [{'start': 0, 'end': 7, 'label': 'AMOUNT'}, {'start': 8, 'end': 70, 'label': 'PRODUCT'}, {'start': 72, 'end': 95, 'label': 'PREP'}], 'id': 'B'}, 'mapping': {'A': 'accept', 'B': 'reject'}, 'options': [{'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed', 'spans': [{'start': 0, 'end': 7, 'label': 'AMOUNT'}, {'start': 13, 'end': 42, 'label': 'PRODUCT'}, {'start': 72, 'end': 95, 'label': 'PREP'}], 'id': 'A'}, {'text': '1 whole rich fruity dried chili like ancho, mulatto, negro, or pasilla, stems and seeds removed', 'spans': [{'start': 0, 'end': 7, 'label': 'AMOUNT'}, {'start': 8, 'end': 70, 'label': 'PRODUCT'}, {'start': 72, 'end': 95, 'label': 'PREP'}], 'id': 'B'}], '_input_hash': -381308612, '_task_hash': 1743163633, '_session_id': 'i_data_v3-roy', '_view_id': 'choice', 'accept': ['B'], 'answer': 'accept'}
I reviewed the JSON above and all text properties appear valid.
Questions:
- I assume it's best practice to store ner.eval-ab results in a different dataset (I wasn't thinking when I tried it out)?
- Can/should ner.eval-ab results be used to train a model? Or is it best used just for manually comparing 2 models?
- The only process I can see to solve this is to run db-out to get a .jsonl file, then remove the ner.eval-ab results manually, then run db-in. Is there a better process to resolve this?
- Is the above expected? If so, can I suggest maybe adding a note to the https://prodi.gy/docs/recipes#ner-eval-ab doc?
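To make the cleanup in the third question concrete, here is a minimal sketch of how the exported .jsonl file could be filtered instead of edited by hand. It relies on an assumption drawn only from the record shown above: the ner.eval-ab results have '_view_id' set to 'choice' and no top-level 'text' key, while the manual annotations do have one. The helper names is_eval_ab_record and filter_jsonl and the file names are hypothetical and not part of Prodigy.

```python
import json
from pathlib import Path


def is_eval_ab_record(record):
    """Heuristic based on the record in the error message above (an assumption):
    ner.eval-ab results use the 'choice' view and lack a top-level 'text' key."""
    return record.get("_view_id") == "choice" or "text" not in record


def filter_jsonl(src, dst):
    """Copy src to dst, skipping records that look like ner.eval-ab results.

    Returns a (kept, dropped) tuple of record counts.
    """
    kept = dropped = 0
    with Path(src).open(encoding="utf8") as fin, \
            Path(dst).open("w", encoding="utf8") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # ignore blank lines in the export
            record = json.loads(line)
            if is_eval_ab_record(record):
                dropped += 1
            else:
                fout.write(json.dumps(record) + "\n")
                kept += 1
    return kept, dropped


# Hypothetical usage, after exporting the dataset with db-out:
# kept, dropped = filter_jsonl("i_data_v3.jsonl", "i_data_v3_clean.jsonl")
```

The dropped count should come out at roughly 20 if the assumption holds. Loading the cleaned file with db-in into a new dataset name, rather than back into the original, would leave the existing data untouched in case the heuristic removes too much.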
Prodigy install file/version: prodigy-1.9.9-cp36.cp37.cp38-cp36m.cp37m.cp38-win_amd64.whl
Thanks,
Roy