Hello,
I have an existing NER model trained using AllenNLP and I'd like to use it in the ner.correct recipe. I've seen previous posts here on integrating custom models by wrapping them in a spaCy pipeline. What wasn't clear to me is how Prodigy knows to use it for NER: is it by naming the component "ner", or by virtue of the model being last in the pipeline? Should the model call return a spacy.tokens.Doc object with ents attached to it, or can it just output a list of dictionaries for Prodigy to use in the labeling task?
Hi! You don't have to integrate your AllenNLP model into spaCy to use it with Prodigy – you can also just run it directly in a custom Prodigy recipe and use it to pre-set the "spans" in the data. Here's an example that shows pretty much exactly what you're looking for: Named Entity Recognition · Prodigy. The only part you have to plug in is the part that runs your model over a text and outputs the detected entities. See here for the full JSON format – that's what your stream needs to send out if you want to annotate with the ner_manual interface. But everything else is up to you.
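To make the "pre-set the spans" part more concrete, here is a minimal sketch in plain Python. It assumes your model can be wrapped in a function that returns (start_char, end_char, label) tuples; the names add_model_spans and dummy_predict are hypothetical and only stand in for your AllenNLP predictor. For ner_manual the tasks typically also need "tokens" (and token indices on the spans), which this sketch leaves out.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Assumed model output: one (start_char, end_char, label) tuple per entity.
Entity = Tuple[int, int, str]


def add_model_spans(
    stream: Iterable[Dict],
    predict: Callable[[str], List[Entity]],
) -> Iterator[Dict]:
    """Yield a copy of each task with "spans" built from the model's output."""
    for eg in stream:
        task = dict(eg)  # shallow copy so the incoming example is not mutated
        task["spans"] = [
            {"start": start, "end": end, "label": label}
            for start, end, label in predict(task["text"])
        ]
        yield task


def dummy_predict(text: str) -> List[Entity]:
    """Hypothetical stand-in for the real model: tags one hard-coded name."""
    name = "Berlin"
    start = text.find(name)
    return [(start, start + len(name), "GPE")] if start >= 0 else []


tasks = list(add_model_spans([{"text": "I moved to Berlin."}], dummy_predict))
print(tasks[0]["spans"])
```

In a real recipe, the generator returned by add_model_spans would be what you pass on as the "stream".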
Wrapping other models as spaCy components can often be useful because it gives you a single unified API for your NLP pipeline. So you can write your code to expect the Doc object data structure as the "single source of truth". But as I said, it's certainly not a must for Prodigy. For completeness, to clarify the other questions:
Prodigy will source the entities it suggests from the doc.ents – it's agnostic to how they got there. Typically, the entity recognizer would be the component in the pipeline setting those annotations, but they could also come from a rule-based component (e.g. spaCy's EntityRuler) or something entirely custom.
That's really helpful, thank you! This is definitely easier than a custom pipeline component. A custom Prodigy recipe makes sense for this case. I think I'd also like to be able to use the ner.teach recipe to update this AllenNLP model I have. Could you point me to the API I should expose in my custom model in order to make use of this? Thanks again.
On an unrelated note, the two ExplosionAI products I've interacted with so far, spaCy and Prodigy, are fantastic!
Here are some examples of custom Prodigy recipes with custom models in the loop: Named Entity Recognition · Prodigy. There are essentially two parts your recipe needs to return: a stream of examples to annotate (a generator, so the suggestions can change as the model is updated) and an update callback that receives the answers and updates the model. The specifics here depend on your model implementation: you want a model that is sensitive enough to learn from small updates in small batches, but not so sensitive that a single decision throws it off. This may take some experimentation.
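As a rough illustration of those two parts, here is a sketch in plain Python that runs without Prodigy installed. ToyModel, make_stream and custom_teach are hypothetical names, and the "model" is just a table of per-phrase scores, so treat it as a picture of the control flow and not as how an AllenNLP model would actually be updated. In a real recipe the function would be registered with the @prodigy.recipe decorator and return the same kind of dictionary.

```python
from typing import Dict, Iterable, Iterator, List


class ToyModel:
    """Hypothetical stand-in for the custom model (not a real API)."""

    def __init__(self) -> None:
        # One score per known phrase; a real model would hold weights instead.
        self.scores: Dict[str, float] = {"Berlin": 0.5, "Paris": 0.5}

    def predict(self, text: str) -> List[Dict]:
        """Return one scored span dict per known phrase found in the text."""
        spans = []
        for phrase, score in self.scores.items():
            start = text.find(phrase)
            if start >= 0:
                spans.append({"start": start, "end": start + len(phrase),
                              "label": "GPE", "score": score})
        return spans

    def update(self, answers: Iterable[Dict]) -> None:
        """Nudge scores in small steps: up on accept, down on reject."""
        for eg in answers:
            if eg.get("answer") not in ("accept", "reject"):
                continue  # skip ignored examples
            step = 0.125 if eg["answer"] == "accept" else -0.125
            for span in eg.get("spans", []):
                phrase = eg["text"][span["start"]:span["end"]]
                if phrase in self.scores:
                    new_score = self.scores[phrase] + step
                    self.scores[phrase] = max(0.0, min(1.0, new_score))


def make_stream(model: ToyModel, texts: Iterable[str],
                threshold: float = 0.3) -> Iterator[Dict]:
    """Lazily yield one task per suggestion, scored with the current model."""
    for text in texts:
        for span in model.predict(text):
            if span["score"] >= threshold:
                yield {"text": text, "spans": [span]}


def custom_teach(model: ToyModel, texts: Iterable[str], dataset: str) -> Dict:
    """Return the components a recipe with a model in the loop hands back."""
    return {
        "dataset": dataset,
        "view_id": "ner",                     # binary accept/reject interface
        "stream": make_stream(model, texts),  # generator, evaluated lazily
        "update": model.update,               # called with batches of answers
    }


# Simulate two rejected suggestions and see how the rest of the stream reacts.
model = ToyModel()
texts = ["I moved to Berlin.", "Berlin is big.", "Paris is nice.", "Back in Berlin."]
components = custom_teach(model, texts, "demo_dataset")
stream = components["stream"]
first = next(stream)
components["update"]([dict(first, answer="reject")])
second = next(stream)
components["update"]([dict(second, answer="reject")])
remaining = list(stream)
print([task["text"] for task in remaining])
```

Because the stream is a generator, each suggestion is scored with whatever state the model is in at that moment. In this toy setup it takes two rejections before "Berlin" drops below the threshold, which is the kind of trade-off between responsiveness and stability you would tune for a real model.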
Also, here are some related recent threads on similar topics: