I have 3000 examples across three categories. I annotated the samples in a separate notebook with spaCy's PhraseMatcher and Matcher. My labels are only about 70–80% correct, so I want to fix them in Prodigy first, and once the examples are corrected I want to use ner.teach. But my question is:
How can I inspect and fix every single instance in Prodigy?
My dataset is a JSONL file in the following form (I've changed the actual texts):
{"text": "cancer type b. lorem ipsum.. ", "spans": [{"start": 14, "end": 48, "tokens_start": 3, "token_end": 6, "label": "DISORDER"}, {"start": 90, "end": 124, "tokens_start": 16, "token_end": 19, "label": "NEG_DISORDER"}, {"start": 170, "end": 189, "tokens_start": 31, "token_end": 33, "label": "DISORDER"}]}
{"text": "sinus rhythm. since the previous measurement - no cancer is seen ", "spans": []}
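For context, here is a small stdlib-only sanity check I'd run over lines like the ones above before loading them into Prodigy. It's a sketch under two assumptions: the file is newline-delimited JSON, and spans should use Prodigy's `token_start`/`token_end` key names (the first sample line above has `tokens_start`, which is worth double-checking). It also flags character offsets that don't fit the text:

```python
# Sanity-check Prodigy-style JSONL tasks: span key names and char offsets.
import json

EXPECTED_KEYS = {"start", "end", "token_start", "token_end", "label"}

def check_line(line):
    """Return a list of problems found in one JSONL task line."""
    problems = []
    task = json.loads(line)
    text = task["text"]
    for span in task.get("spans", []):
        unexpected = set(span) - EXPECTED_KEYS
        if unexpected:
            problems.append(f"unexpected span keys: {sorted(unexpected)}")
        start, end = span["start"], span["end"]
        if not (0 <= start < end <= len(text)):
            problems.append(
                f"offsets ({start}, {end}) don't fit text of length {len(text)}"
            )
    return problems

# Demo on an inline line with both problems (misnamed key, bad offsets):
sample = (
    '{"text": "sinus rhythm.", "spans": [{"start": 0, "end": 50, '
    '"tokens_start": 0, "token_end": 1, "label": "DISORDER"}]}'
)
for problem in check_line(sample):
    print(problem)
```

To check a whole file, loop over its lines and print `check_line` results with the line number.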
When I'm using:
prodigy ner.manual my_dataset diseasemodel patientrecors.jsonl --label "DISORDER,NEG_DISORDER,UN_DISORDER"
the app says "No tasks available."
I could use prodigy ner.make-gold, but I'm not sure it's the best approach if I want to make sure every sample gets labelled. Can Prodigy keep track of which examples are fully annotated?
Another question: my model was initially trained with a single label ("DISORDER"). Will the two new labels be added to the model automatically when I use make-gold or manual? (I'm using a blank spaCy NER model.)
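To make the label question concrete: independent of whatever the Prodigy recipes do, a blank spaCy NER component starts with no labels at all, and new ones have to be added explicitly. A minimal sketch, assuming the current spaCy v3 API (under v2 the pipe is created with nlp.create_pipe("ner") and added with nlp.add_pipe(ner)):

```python
# A blank NER pipe knows no labels until they are added explicitly.
# Whether Prodigy does this for you is exactly what I'm asking about.
import spacy

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
print(ner.labels)  # empty tuple: ()

for label in ("DISORDER", "NEG_DISORDER", "UN_DISORDER"):
    ner.add_label(label)
print(sorted(ner.labels))
```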