Hi Andrey,
You can definitely achieve similar results with a custom `ner.correct`-style recipe and the new `spacy-llm` library. `spacy-llm` lets you integrate an LLM (hosted or local) as a spaCy component: it takes care of prompt generation and parsing, and stores the LLM's annotations on the `Doc` object just like any other spaCy pipeline component.
To use an LLM with spaCy, you'll need to start by creating a configuration file that tells `spacy-llm` how to construct a prompt for your task. Please see the `spacy-llm` docs for details, but for NER it could look like this:
```ini
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v2"
labels = ["PERSON", "ORGANISATION", "LOCATION"]

[components.llm.backend]
@llm_backends = "spacy.Dolly_HF.v1"
# For better performance, use databricks/dolly-v2-12b instead
model = "databricks/dolly-v2-3b"
```
Then, from your custom recipe, you can assemble the `nlp` pipeline like so:
```python
from spacy_llm.util import assemble

# Assemble a spaCy pipeline from the config
nlp = assemble("config.cfg")

# Use this pipeline as you would normally
doc = nlp("Sarah works for Microsoft in Seattle.")
print(doc.ents)  # e.g. (Sarah, Microsoft, Seattle) - the exact output depends on the LLM
```
Once you have processed your examples with the LLM-powered pipeline, you can feed them into the `ner_manual` interface so annotators can correct the LLM's annotations, just like it's done in `ner.openai.correct`.
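For example, a custom recipe wrapping this could look something like the sketch below. It's a minimal sketch rather than a drop-in solution: it assumes your input is a JSONL file with a `text` field per line, and the recipe name `ner.llm.correct` and the `config.cfg` path are placeholders you'd adapt to your setup:

```python
import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens
from spacy_llm.util import assemble


@prodigy.recipe(
    "ner.llm.correct",
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Path to a JSONL file with a 'text' field", "positional", None, str),
)
def ner_llm_correct(dataset: str, source: str):
    # Assemble the LLM-powered pipeline from the config shown above
    nlp = assemble("config.cfg")
    labels = ["PERSON", "ORGANISATION", "LOCATION"]

    def make_tasks(stream):
        for eg in stream:
            doc = nlp(eg["text"])
            # Store the LLM's entity predictions in Prodigy's span format
            eg["spans"] = [
                {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                for ent in doc.ents
            ]
            yield eg

    stream = JSONL(source)
    # add_tokens adds token data and aligns the spans to token boundaries,
    # which the ner_manual interface needs to render editable highlights
    stream = add_tokens(nlp, make_tasks(stream))

    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "ner_manual",
        "config": {"labels": labels},
    }
```

You'd save this as e.g. `recipe.py` and start the server with `prodigy ner.llm.correct your_dataset ./examples.jsonl -F recipe.py`.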
In the very near future we're going to release built-in `spacy-llm` recipes in Prodigy, but for now you can achieve the same results with just a little custom scripting, thanks to `spacy-llm`.
Let us know how it goes and if you need any assistance!