I am planning to use spans.correct, but I thought GPT-3 might work better, so I was looking for a way to implement it. I'll try spaCy first, and if that doesn't work I'll look at other options.
Thank you for the prompt response
Figured I'd mention it here to folks: last week Explosion released spacy-llm, which makes it easy to integrate large language models into a spaCy pipeline. This should also make it much easier to reuse spaCy pipelines in Prodigy.
Feel free to check it out here:
Just wanted to share some Prodigy x LLM annotation side projects I did!
- The first blog post uses an LLM-assisted textcat annotation interface for argument mining! Here, I explored how we can use language models to augment the annotation process on tasks that require some nuance and chains of reasoning. I tried different prompting "styles", such as standard zero-shot and chain-of-thought reasoning.
- The second blog post attempts to ingest a large annotation guideline (a PDF document) and incorporate it into a prompt. Aside from Prodigy, I also used LangChain. Here, I discussed how I was able to fit a very long document into a smaller token limit. In the future, it would be interesting to explore how well annotation guidelines actually "capture" the phenomena they describe.
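The actual prompts live in the first blog post; as a rough, hypothetical illustration of the two prompting styles mentioned above (the template text here is mine, not the post's):

```python
# Hypothetical templates contrasting zero-shot vs. chain-of-thought prompting
# for a binary textcat task; the blog post's real prompts differ.
ZERO_SHOT = (
    "Does the following text contain an argument?\n"
    "Answer ACCEPT or REJECT.\n\n"
    "Text: {text}"
)

CHAIN_OF_THOUGHT = (
    "Does the following text contain an argument?\n"
    "First reason step by step about the claim and its premises,\n"
    "then answer ACCEPT or REJECT on the final line.\n\n"
    "Text: {text}"
)


def build_prompt(template: str, text: str) -> str:
    """Fill a prompt template with the text to annotate."""
    return template.format(text=text)
```

The chain-of-thought variant only changes the instructions, so the two styles can be swapped per task without touching the rest of the annotation loop.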
Hope these blog posts inspire you to try out some LLM-enhanced annotation workflows!
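The second post's trick of fitting a long guideline into a limited prompt can be sketched, very roughly, as plain chunking. This is not the post's code: a real implementation would count tokens with a proper tokenizer (e.g. tiktoken), while this sketch approximates one whitespace-separated word per token.

```python
def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    One word here approximates one token; swap in a real tokenizer
    for accurate budgeting against a model's context window.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

Each chunk can then be sent in its own prompt, or summarized and re-combined, depending on how much of the guideline a single request needs.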
Update May 19: We've recently released the v1.12 alpha, which brings LLM components, such as the OpenAI recipes, into Prodigy. Let us know if you have any feedback!
There is also a preview docs site:
We're excited to see what you can build with them!
I'm unable to reference gpt-4 in openai_textcat.py and openai_ner.py. The latest model I can call is the legacy text-davinci-003. Can I reference gpt-4 or gpt-3.5? Or something we can call here: GitHub - explosion/spacy-llm: Integrating LLMs into structured NLP pipelines?
from pathlib import Path
from typing import List, Optional

def textcat_openai_correct(
    dataset: str,
    filepath: Path,
    labels: List[str],
    lang: str = "en",
    model: str = "gpt-4",
    batch_size: int = 10,
    segment: bool = False,
    prompt_path: Path = DEFAULT_PROMPT_PATH,
    examples_path: Optional[Path] = None,
    max_examples: int = 2,
    exclusive_classes: bool = False,
    verbose: bool = False,
):
I'm also looking for a way to combine LLM NER predictions with the default spaCy ner.correct entities in the same recipe: basically one task that takes an utterance and predicts the list of default spaCy entities, along with additional custom labels I have defined that are predicted by gpt-3.5 or gpt-4. The annotator would then review and correct.
@cheyanneb, have you seen this blog post?
It uses the spacy-llm project in Prodigy to help generate these kinds of review interfaces.
At the moment, and also for the long term, I would recommend using spacy-llm. The next version of Prodigy (v1.13, which shouldn't take too long) will support spacy-llm directly as an alternative to these OpenAI recipes. The spacy-llm project has a number of benefits, such as support for more backends and a proper caching mechanism. It's also a project that's easier to update when OpenAI decides, as they did recently, to deprecate some of their main endpoints.
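For a sense of what that looks like, a spacy-llm pipeline is typically driven by a config file. A minimal sketch of an LLM-backed NER component might look roughly like this; the registry and component names follow spacy-llm's early releases and may differ in your version, and the labels are placeholders:

```ini
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v1"
labels = PERSON,ORG,PRODUCT

[components.llm.backend]
@llm_backends = "spacy.REST.v1"
api = "OpenAI"
config = {"model": "gpt-3.5-turbo"}
```

Because the model lives in the config, swapping gpt-3.5-turbo for another backend shouldn't require changing recipe code.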
Let me know if the blog post doesn't help in the meantime!