I just got a personal license and am trying to annotate text data for NER training with multiple custom entities. For now I just want to annotate data and care less about model training.
I followed the tutorial and did the following:
(1) created dataset:
prodigy dataset my_data "My data for Annotation"
(2) I have a JSONL file, tmp.jsonl, with a few records in JSON format
(3) I tried ner.teach, but I am not able to provide a custom label. Also, ner.teach opens the web interface for only one entity
prodigy ner.teach my_data en_core_web_sm tmp.jsonl
(4) I tried mark, but this doesn’t load any data
prodigy mark my_data tmp.jsonl --view-id ner_manual
All I want is to annotate text with a set of custom entities.
You’re probably looking for the ner.manual recipe, which streams in your text, tokenizes it and asks you to label one or more spans of text. The --label argument lets you pass in one or more labels to annotate. For example:
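A minimal sketch of such a command, reusing the dataset, model and file names from the question. The labels PERSON, ORG and PRODUCT are only placeholders, so swap in your own custom entity types:

```shell
# Hypothetical labels for illustration; replace them with your own entity types.
LABELS="PERSON,ORG,PRODUCT"

# Arguments: dataset to save to, spaCy model used for tokenization, input file.
prodigy ner.manual my_data en_core_web_sm tmp.jsonl --label "$LABELS"
```

All labels passed via --label should then be selectable in the same annotation session.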
Some background on the other recipes: Because ner.teach only asks you for binary feedback, it also only asks you for one entity at a time. You can still annotate multiple labels and multiple entities in the same text – just in separate steps. The mark recipe on the other hand streams in whatever you give it, without transforming your data. So if you set --view-id ner_manual, it also expects the data to be tokenized and in the correct format for the manual interface (see your PRODIGY_README.html for details if you’re interested).
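To illustrate what "tokenized and in the correct format" means, here is a rough sketch of one pre-tokenized record. The file name tokenized.jsonl and the example sentence are made up, and the token fields reflect my understanding of the manual interface's format, so please treat them as an assumption and check them against your PRODIGY_README.html:

```shell
# One pre-tokenized task per line; "start" and "end" are character offsets into "text".
cat > tokenized.jsonl <<'EOF'
{"text": "Apple hired Tim", "tokens": [{"text": "Apple", "start": 0, "end": 5, "id": 0}, {"text": "hired", "start": 6, "end": 11, "id": 1}, {"text": "Tim", "start": 12, "end": 15, "id": 2}]}
EOF

# Quick sanity check that the record parses as JSON (assumes python3 is installed).
python3 -m json.tool tokenized.jsonl > /dev/null && echo "valid JSON"
```

A file with only a "text" field per line has no tokens, which would explain why mark with --view-id ner_manual shows nothing useful for it.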
No, what you’re doing is correct. The my_data dataset is where the answers will be saved when you submit them in the app.
In the “Progress” section of the sidebar, it says that the total number of annotations in your dataset is 5. Did you already annotate something with that dataset previously? And what happens if you create a completely new dataset and try again?
If that doesn’t work, what’s in your temp.jsonl? Can you show an example? And can you run the command with PRODIGY_LOGGING=basic and see if there’s anything suspicious in the logs? For example:
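A sketch of how that could look, assuming the ner.manual recipe and the file name tmp.jsonl from the original question (the labels are placeholders again):

```shell
# Turn on basic logging for Prodigy commands started from this shell session.
export PRODIGY_LOGGING=basic

# Same recipe call as before; log messages are printed to the terminal.
prodigy ner.manual my_data en_core_web_sm tmp.jsonl --label PERSON,ORG,PRODUCT
```

The log output should show which dataset is being used and how the stream is loaded, which makes it easier to spot where things go wrong.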
Yes, I previously annotated my_data. I then dropped the dataset and re-created it with the same name, but it seems that dropping a dataset doesn’t delete the table in SQLite.
When I created a dataset with a different name, it worked.