Training a new model, using OpenAI API

Hi,

I plan to train a new model on top of the existing en_core_web_lg model for 13-odd entity types. I'll do that one by one.

However, I was wondering whether there is any out-of-the-box support for using the OpenAI API to predict entities while annotating the data. This could save us a lot of time.

Thanks.

Hi @tushar,

Sure! Prodigy comes with a battery of recipes that leverage LLMs of your choice to pre-annotate the dataset for you.

Since you're working with NER, please check the ner.llm.correct and ner.llm.fetch recipes. The first recipe preannotates the data as you annotate in batches, while the other one preannotates and saves the preannotated dataset so that you can correct it without needing to make calls to the LLM API while you annotate.

These recipes use spacy-llm under the hood, which means that you can specify the LLM API, customize the prompt (although it comes with a built-in NER-tuned prompt), leverage prompting techniques such as chain-of-thought, seed the prompt with examples, and more.
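
To make the spacy-llm part more concrete, here is a minimal sketch in Python that writes a config file of the kind these recipes read. Please treat it as an illustration rather than the definitive setup: the labels are hypothetical placeholders, the file name ner_llm.cfg is made up, and the registry names "spacy.NER.v3" and "spacy.GPT-4.v2" are assumptions that depend on your installed spacy-llm version, so check them against the docs.

```python
import json
from pathlib import Path

# Hypothetical label set; replace with your own entity types.
LABELS = ["PRODUCT", "ORG", "DATE"]


def build_llm_config(labels, task="spacy.NER.v3", model="spacy.GPT-4.v2"):
    """Return config text for a pipeline with a single LLM-backed NER component.

    The default task and model registry names are assumptions; verify
    which versions your installed spacy-llm actually provides.
    """
    lines = [
        "[nlp]",
        'lang = "en"',
        'pipeline = ["llm"]',
        "",
        "[components]",
        "",
        "[components.llm]",
        'factory = "llm"',
        "",
        "[components.llm.task]",
        f'@llm_tasks = "{task}"',
        # Labels are written as a JSON list of strings.
        f"labels = {json.dumps(labels)}",
        "",
        "[components.llm.model]",
        f'@llm_models = "{model}"',
    ]
    return "\n".join(lines) + "\n"


config_text = build_llm_config(LABELS)
# Hypothetical output file name.
Path("ner_llm.cfg").write_text(config_text, encoding="utf-8")
print(config_text)
```

With OPENAI_API_KEY set in your environment, you would then pass this file to the recipes, roughly as `prodigy ner.llm.correct <dataset> ner_llm.cfg <source>` or `prodigy ner.llm.fetch ner_llm.cfg <source> <output>`. Please double-check the exact argument order in the recipe docs for your Prodigy version.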

Please see our docs on annotating with LLMs to see what's possible, and let us know if you have any specific questions as you go :slight_smile: