Hi everyone! We are using the ner.teach recipe to generate binary accept/reject data from our texts. Is there any way to train on this data using spaCy? We currently train our models in Databricks using spaCy, so if there is a way to import that data and leverage our existing pipelines, that would be great!
If not, is there a way to run the prodigy train command directly in a notebook or Python script?
You can use the data-to-spacy command to convert the annotations to spaCy's training format, and set --ner-missing to treat all unannotated tokens as missing values (so you can represent partial annotations). However, for the binary annotations collected with the active learning recipes, this means you lose the information from the rejected examples. I've explained the difference between the two update mechanisms in more detail here:
spaCy does let you represent "negative" labels with an exclamation mark, though. For instance, !B-PERSON marks a token that's not the beginning of a PERSON entity. So you could experiment with that to incorporate some of the rejected answers if your results end up looking worse.
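
To make the idea concrete, here is a rough sketch of how one binary answer could be turned into a partial tag sequence. It is plain Python and does not call Prodigy or spaCy. It assumes each annotation is a JSON line with `text`, `spans` (character offsets plus `label`) and `answer`, as exported by `prodigy db-out`, and it uses naive whitespace tokenization, which will not match spaCy's tokenizer in general. The helper names `whitespace_offsets` and `binary_task_to_tags` are made up for this example. Marking only the first token of a rejected span with `!B-LABEL` is one possible reading of the suggestion above, and whether your spaCy version accepts the `!` prefix in its training data is something to check before relying on it.

```python
import json


def whitespace_offsets(text):
    """Return (start, end) character offsets for whitespace-separated tokens."""
    offsets = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        offsets.append((start, end))
        pos = end
    return offsets


def binary_task_to_tags(line):
    """Convert one binary accept/reject JSON line into a partial BILUO tag list.

    "-" stands for a missing value (nothing is known about that token).
    Accepted spans become B-/I-/L-/U- tags. For rejected spans, only the
    first token is marked as "!B-LABEL" (an assumption for this sketch).
    """
    task = json.loads(line)
    offsets = whitespace_offsets(task["text"])
    tags = ["-"] * len(offsets)
    for span in task.get("spans", []):
        label = span["label"]
        # Token indices that fall completely inside the annotated span.
        covered = [
            i for i, (start, end) in enumerate(offsets)
            if start >= span["start"] and end <= span["end"]
        ]
        if not covered:
            continue
        if task.get("answer") == "accept":
            if len(covered) == 1:
                tags[covered[0]] = f"U-{label}"
            else:
                tags[covered[0]] = f"B-{label}"
                for i in covered[1:-1]:
                    tags[i] = f"I-{label}"
                tags[covered[-1]] = f"L-{label}"
        elif task.get("answer") == "reject":
            tags[covered[0]] = f"!B-{label}"
    return tags


accepted = '{"text": "Apple hired Tim Cook", "spans": [{"start": 12, "end": 20, "label": "PERSON"}], "answer": "accept"}'
rejected = '{"text": "Apple hired Tim Cook", "spans": [{"start": 0, "end": 5, "label": "PERSON"}], "answer": "reject"}'
print(binary_task_to_tags(accepted))
print(binary_task_to_tags(rejected))
```

With --ner-missing alone, the rejected example would contribute nothing: every token stays a missing value. The negative tag is what would carry the "not a PERSON here" signal into training.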