Introducing recipes to bootstrap annotation via OpenAI GPT-3

I am planning to use spans.correct, but I thought GPT-3 would be better, so I was looking for a way to implement it. I'll try spaCy first, and if that doesn't work I'll look at other options.
Thank you for the prompt response :slight_smile:

Figured I'd mention it here for folks: last week Explosion released spacy-llm, which makes it easy to integrate large language models into a spaCy pipeline. This should also make it much easier to re-use spaCy pipelines in Prodigy.
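For a flavor of what that looks like, here's a minimal sketch along the lines of the spacy-llm README. The registered names ("spacy.NER.v2", "spacy.GPT-3-5.v1") and the labels are assumptions that may differ between versions (early releases configured a "backend" block instead of "model"), and you'll need OPENAI_API_KEY set in your environment:

# Minimal spacy-llm sketch; registry names are assumptions, check your version's docs.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "llm",
    config={
        "task": {"@llm_tasks": "spacy.NER.v2", "labels": "PERSON,ORG,LOCATION"},
        "model": {"@llm_models": "spacy.GPT-3-5.v1"},  # OpenAI gpt-3.5-turbo
    },
)
doc = nlp("Matthew and Ines founded Explosion in Berlin.")
print([(ent.text, ent.label_) for ent in doc.ents])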

Feel free to check it out here:


Just wanted to share some Prodigy x LLM annotation side projects I did!

  • The first blog post uses an LLM-assisted textcat annotation interface for argument mining! Here, I explored how we can use language models to augment the annotation process on tasks that require some nuance and chains of reasoning. I tried different prompting "styles", such as standard zero-shot and chain-of-thought reasoning (both styles are sketched right after this list).

  • The second blog post attempts to ingest a large annotation guideline (a PDF document) and incorporate it into a prompt. Aside from Prodigy, I also used langchain. Here, I discussed how I fit a very long document into a smaller token limit (the general idea is also sketched below). In the future, it would be interesting to explore how well annotation guidelines actually "capture" the phenomena themselves.
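To make the two prompting "styles" from the first post concrete, here's a hypothetical pair of textcat prompts. The stance labels and wording are mine for illustration, not taken from the blog post:

# Illustrative only: the same classification question asked zero-shot
# vs. with chain-of-thought reasoning.
ZERO_SHOT = """\
Does the following comment SUPPORT or ATTACK the claim "{claim}"?
Comment: {text}
Answer with exactly one word: SUPPORT or ATTACK."""

CHAIN_OF_THOUGHT = """\
Does the following comment SUPPORT or ATTACK the claim "{claim}"?
Comment: {text}
First, reason step by step about the stance the comment takes toward
the claim. Then, on a final line starting with "Answer:", give exactly
one word: SUPPORT or ATTACK."""

prompt = CHAIN_OF_THOUGHT.format(
    claim="School uniforms should be mandatory.",
    text="Uniforms make mornings easier for parents and kids alike.",
)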
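And for the second post's token-limit problem, the general shape of the trick is a map-reduce style compression. Here summarize is a stand-in for whatever LLM call you use (langchain ships helpers for this pattern), so treat this as the idea rather than the post's exact code:

from typing import Callable, List

def chunk(text: str, max_chars: int = 4000) -> List[str]:
    """Split a long guideline into pieces that fit the model's context."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def compress(document: str, summarize: Callable[[str], str]) -> str:
    """Summarize each chunk, then summarize the joined summaries."""
    partials = [summarize(piece) for piece in chunk(document)]  # map step
    return summarize("\n".join(partials))                       # reduce step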

Hope these blog posts inspire you to try out some LLM-enhanced annotation workflows!


Update May 19: We've recently released the v1.12 alpha, which brings LLM components, including the OpenAI recipes, into Prodigy. Let us know if you have any feedback!
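To try the recipes, the invocation looks roughly like this (the recipe and flag names here follow the prodigy-openai-recipes project, so double-check them against the preview docs):

python -m prodigy ner.openai.correct my_ner_data ./examples.jsonl --label DISH,INGREDIENT,EQUIPMENT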

There is also a preview docs site:

We're excited to see what you can build with them :rocket:

I'm unable to reference gpt-4 in openai_textcat.py and openai_ner.py. The latest model I can call is the legacy text-davinci-003. Is there a way to reference gpt-4 or gpt-3.5, or one of the models available through GitHub - explosion/spacy-llm: 🦙 Integrating LLMs into structured NLP pipelines?

from pathlib import Path
from typing import List, Optional

# DEFAULT_PROMPT_PATH is a constant defined earlier in the recipe module.
def textcat_openai_correct(
    dataset: str,
    filepath: Path,
    labels: List[str],
    lang: str = "en",
    model: str = "gpt-4",  # setting this to "gpt-4" fails; only text-davinci-003 works
    batch_size: int = 10,
    segment: bool = False,
    prompt_path: Path = DEFAULT_PROMPT_PATH,
    examples_path: Optional[Path] = None,
    max_examples: int = 2,
    exclusive_classes: bool = False,
    verbose: bool = False,
):
    ...
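My guess (unverified) is that these recipes call OpenAI's older /v1/completions endpoint, which serves text-davinci-003 but not the chat models; gpt-3.5-turbo and gpt-4 are only available through /v1/chat/completions. The difference, sketched with plain requests calls:

# Sketch of the two OpenAI endpoints (model names current as of mid-2023).
import os
import requests

headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

# Completions endpoint: accepts text-davinci-003, rejects chat-only models.
completion = requests.post(
    "https://api.openai.com/v1/completions",
    headers=headers,
    json={"model": "text-davinci-003", "prompt": "Say hi.", "max_tokens": 5},
)

# Chat completions endpoint: required for gpt-3.5-turbo and gpt-4.
chat = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json={"model": "gpt-3.5-turbo",
          "messages": [{"role": "user", "content": "Say hi."}]},
)
print(completion.status_code, chat.status_code)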

I'm also looking for a way to combine LLM NER predictions with the default entities from spaCy's ner.correct in the same recipe -- basically one task that takes an utterance and predicts the default spaCy entities alongside additional custom labels I have defined, which would be predicted by gpt-3.5 or gpt-4. The annotator would then review and correct.

@cheyanneb have you seen this blogpost?

It uses the spacy-llm project in Prodigy to help generate these kinds of review interfaces.
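If it helps as a starting point while you read it, here's a rough sketch of one way to merge the two sources of entities (this is an assumption on my part, not necessarily the blogpost's exact approach): run a pretrained pipeline for the default labels and an llm component for your custom ones, then merge the spans before sending tasks to the annotator. The PRODUCT_CODE label is a hypothetical stand-in for your custom labels.

# Rough sketch: default entities from en_core_web_sm plus custom labels
# from an llm component, merged into one set of spans for review.
import spacy
from spacy.util import filter_spans

nlp_core = spacy.load("en_core_web_sm")   # PERSON, ORG, GPE, ...
nlp_llm = spacy.blank("en")
nlp_llm.add_pipe("llm", config={
    "task": {"@llm_tasks": "spacy.NER.v2", "labels": "PRODUCT_CODE"},  # hypothetical
    "model": {"@llm_models": "spacy.GPT-3-5.v1"},
})

def merged_ents(text: str):
    doc = nlp_core(text)
    llm_doc = nlp_llm(text)
    extra = [doc.char_span(e.start_char, e.end_char, label=e.label_)
             for e in llm_doc.ents]
    # filter_spans keeps the longest span when predictions overlap
    return filter_spans(list(doc.ents) + [s for s in extra if s is not None])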

At the moment, and also for the long term, I would recommend using spacy-llm. The next version of Prodigy (v1.13, which shouldn't take too long) will support spacy-llm directly as an alternative to these OpenAI recipes. The spacy-llm project has a bunch of benefits, like support for more backends as well as a proper caching mechanism. It's also a project that's easier to update when, as we saw recently, OpenAI deprecates some of their main endpoints.
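As an example of the caching mechanism: spacy-llm lets you add a cache block to the component config so that re-running annotation doesn't re-query the API for documents it has already seen (the parameter names follow the spacy-llm docs; verify against your installed version):

# Sketch: spacy-llm's on-disk cache for LLM responses.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("llm", config={
    "task": {"@llm_tasks": "spacy.NER.v2", "labels": "DISH,INGREDIENT"},
    "model": {"@llm_models": "spacy.GPT-3-5.v1"},
    "cache": {
        "@llm_misc": "spacy.BatchCache.v1",
        "path": "local-cache",         # directory for cached responses
        "batch_size": 64,
        "max_batches_in_mem": 4,
    },
})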

Let me know if the blogpost does not help in the meantime!