Hello,
I've just acquired Prodigy with the plan to improve data annotation for an existing chatbot NLU app (NER as well as intent / text classification). Specifically, I started out with procedurally generated gold (complete) training data (~30 intents, ~10 entity types, some overlapping with the labels of the pre-trained spaCy model (ORG, PERSON, LOC ...), ~10000 examples) and used it to train/fine-tune a spaCy NER model. Since then I have been using a custom annotation loop where I take the lowest-confidence user logs and laboriously correct the full annotations in a sub-par home-made interface. I was hoping to use Prodigy and its accept/reject approach to speed this up dramatically.
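To make the loop concrete, here is a rough sketch of that kind of lowest-confidence selection. It assumes each log entry already carries a model confidence score; the function name and the "text" / "confidence" fields are hypothetical and not part of Prodigy or spaCy.

```python
from typing import Dict, List


def select_lowest_confidence(logs: List[Dict], k: int = 100) -> List[Dict]:
    """Return the k log entries the model was least confident about."""
    # Sort ascending by the (assumed) "confidence" field and keep the first k.
    return sorted(logs, key=lambda entry: entry["confidence"])[:k]


# Hypothetical user logs with a model confidence attached to each prediction.
logs = [
    {"text": "book a table for two", "confidence": 0.97},
    {"text": "is acme open tmrw", "confidence": 0.41},
    {"text": "cancel it", "confidence": 0.63},
]

# These entries would then go to manual correction.
to_annotate = select_lowest_confidence(logs, k=2)
```

Each selected entry then gets its full annotation corrected by hand, which is the slow part I would like to replace with accept/reject decisions.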
The more I look into the forum and documentation, the more it seems that training a model with both gold and binary training data isn't properly supported. Using ner.batch-train, it seems I have to specify that the data is either one or the other, using the --no-missing flag. And there doesn't seem to be any easy way to use spaCy with binary training data.
Is there any way to use both types of data effectively, or is this anywhere on the Prodigy roadmap?