How to load a Hugging Face model's annotations in Prodigy


I have an HF model that can annotate the named entities I am interested in. I want to run rel.manual on top of those annotations to 1) correct some of the NER label annotations, and 2) annotate the relationships between those entities.

What is the best way to do this?


Hi @ale,

Have you seen the Prodigy-HF plugin? In particular, the hf.ner.correct recipe lets you load an HF model and correct its predictions.
Once you have a curated NER dataset, you could proceed with the REL annotations using rel.manual with the NER-preannotated dataset as input.

Hi @magdaaniol,

I had browsed the Prodigy-HF plugin extensions but couldn't find one that fits my use case. I was hoping to correct the NER labels using rel.manual and the --span-label option. The model predictions are mostly good, so I wanted to load them directly into the RE task and, if I find one that is incorrect, just correct it on the spot. I have thousands of examples to annotate for relationships, so it would make a big difference to do the RE task starting with the model's NER annotations instead of having to review all NER labels first. Is there a way to avoid having to go through all of the examples with hf.ner.correct for this? Is there a way to load the model annotations directly into the rel.manual interface?


Hi @ale,

I see your point. In that case, you could try loading your HF model in a spaCy pipeline and passing that pipeline as the spaCy model argument of rel.manual, together with the --add-ents option.

The spacy-huggingface-pipelines package provides spaCy components for pre-trained HF models for token (and text) annotations. Hopefully you can easily wrap your model with the help of the hf_token_pipe component.
Once you've saved your pipeline to disk (using nlp.to_disk("./path_to_my_pipeline"), for example), you can just specify the path to this folder as the spaCy model in the rel.manual command.
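
To make the wrapping step concrete, here is a minimal sketch of how it could look. It assumes spacy and spacy-huggingface-pipelines are installed; the model name "dslim/bert-base-NER" and the helper function names are placeholders of mine, not something from your setup, and the "annotate" setting is my understanding of the component's config (it should already default to "ents"), so please check it against the package README.

```python
from pathlib import Path


def hf_token_pipe_config(model_name: str) -> dict:
    """Config for the hf_token_pipe component (hypothetical helper)."""
    return {
        "model": model_name,
        # Assumption: "ents" stores the predictions in doc.ents,
        # which is where rel.manual --add-ents reads them from.
        "annotate": "ents",
    }


def build_and_save_pipeline(model_name: str, output_dir: str) -> Path:
    """Wrap an HF token classification model in a spaCy pipeline and save it."""
    # Imported here so the config helper above can be used without spaCy.
    import spacy

    # Installing spacy-huggingface-pipelines registers the "hf_token_pipe" factory.
    nlp = spacy.blank("en")
    nlp.add_pipe("hf_token_pipe", config=hf_token_pipe_config(model_name))

    # Quick sanity check: the model's predictions should show up in doc.ents.
    doc = nlp("My name is Sarah and I live in London.")
    print([(ent.text, ent.label_) for ent in doc.ents])

    path = Path(output_dir)
    nlp.to_disk(path)
    return path


if __name__ == "__main__":
    # Placeholder model name; replace it with your own HF model.
    build_and_save_pipeline("dslim/bert-base-NER", "./path_to_my_pipeline")
```

If the sanity check prints the entities you expect, the saved folder should be usable as the spaCy model for rel.manual.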

You will get this warning:

⚠ Adding entities typically requires a named entity recognizer, but no
'ner' component was found in the pipeline of model 'en_pipeline'.

but the spans will be added to rel.manual anyway, since they are available in the ents attribute of the doc (as long as you don't modify annotate_spans_key when initializing hf_token_pipe).
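
For completeness, here is a rough sketch of launching the recipe from Python with the saved pipeline. The dataset name, source file and label names are made-up placeholders; the flags used are --label, --span-label and --add-ents. Running the equivalent command directly in your terminal works just as well.

```python
import subprocess
import sys


def rel_manual_command(dataset, pipeline_path, source, relation_labels, span_labels):
    """Build the argument list for a rel.manual session (hypothetical helper)."""
    return [
        sys.executable, "-m", "prodigy", "rel.manual",
        dataset,        # Prodigy dataset to save the annotations to
        pipeline_path,  # folder written by nlp.to_disk(...)
        source,         # input file with the texts to annotate
        "--label", ",".join(relation_labels),
        "--span-label", ",".join(span_labels),
        "--add-ents",   # pre-highlight the entities found in doc.ents
    ]


if __name__ == "__main__":
    cmd = rel_manual_command(
        "my_rel_dataset",          # placeholder dataset name
        "./path_to_my_pipeline",
        "./examples.jsonl",        # placeholder source file
        ["WORKS_FOR", "LOCATED_IN"],  # placeholder relation labels
        ["PER", "ORG", "LOC"],        # placeholder entity labels
    )
    subprocess.run(cmd, check=True)
```

Listing your entity labels under --span-label is what lets you fix a wrong model prediction on the spot while you annotate the relations.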