I plan to train a new model on top of the existing en_core_web_lg model for 13-odd entity types. I'll do that one by one.
However, I was wondering if there is any out-of-the-box support for the OpenAI API that could help predict the entities while annotating the data. This might save us a lot of time.
Sure! Prodigy comes with a battery of recipes that leverage LLMs of your choice to pre-annotate the dataset for you.
Since you're working with NER, please check the ner.llm.correct and ner.llm.fetch recipes. The first recipe pre-annotates the data in batches as you annotate, while the second pre-annotates the entire dataset up front and saves it, so you can correct it later without making calls to the LLM API while you annotate.
These recipes use spacy-llm under the hood, which means that you can specify the LLM API, customize the prompt (although it comes with a built-in NER-tuned prompt), leverage prompting techniques such as chain-of-thought, seed the prompt with examples, and more.
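For reference, here's a minimal sketch of what the spacy-llm config passed to these recipes might look like. The labels, registry versions, and model choice below are placeholders — check the spacy-llm docs for the registry names that match your installed version:

```ini
# Sketch of a spacy-llm config for LLM-assisted NER pre-annotation.
# Labels and the model registry name are assumptions; adjust to your setup.
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["PERSON", "ORG", "PRODUCT"]

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v2"
```

You'd then run something like `prodigy ner.llm.correct my_ner_dataset config.cfg data.jsonl` (dataset and file names here are placeholders), after setting your `OPENAI_API_KEY` environment variable so spacy-llm can authenticate with the OpenAI API.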
Please see our docs on annotating with LLMs to see what's possible, and let us know if you have any specific questions as you go.