ner.openai.correct timeout issue

Hello, I am very new to Prodigy. I am running into issues with the ner.openai.correct recipe. It works for a while, sometimes getting up to 200 approved items, and then annoyingly times out with an error. Any thoughts or suggestions?

```
Task exception was never retrieved
future: <Task finished name='Task-86' coro=<RequestResponseCycle.run_asgi() done, defined at /home/paul/anaconda3/envs/prodigy-dev-2/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py:402> exception=ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=20)"))>
Traceback (most recent call last):
```

Hi Paul!

Just from a glance, that looks like a timeout from OpenAI. I've seen these happen before, and it's usually not something we can control. OpenAI can get big bursts of traffic, which means the latency on their side can go all over the place ... even to the extent that it causes a timeout once in a while for an average user. Our code already has a retry mechanism in place, but that's not always enough.

Just to check, is there a specific moment where this issue pops up consistently? Or is it somewhat random?

An alternative for now might be to use ner.openai.fetch instead. This recipe allows you to fetch examples upfront, and you can always resume a previous run via the --resume flag. These results can then be used in a ner.manual recipe without having to worry about the OpenAI connection.
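Roughly like this (a minimal sketch; the file names, dataset name, and labels are placeholders for your own):

```
# Fetch the OpenAI suggestions upfront and save them to disk;
# add --resume to continue a previous, partially completed run
python -m prodigy ner.openai.fetch examples.jsonl ./fetched.jsonl --labels PERSON,ORG

# Review the fetched suggestions offline; no OpenAI connection needed
python -m prodigy ner.manual my_dataset blank:en ./fetched.jsonl --label PERSON,ORG
```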

Would this remedy the situation for now? I could dive into our code and see if we might be able to increase the wait time in our retry mechanism, but I fear that this would only address a small part of the problem.

Thanks, it seems to be random, time-wise. I had it fail after 20 records, then 200. I will give fetch a go. It would be good if the code could pause/retry, as this is the best LLM available at the moment. I am on fibre, so I assume it is on OpenAI's end, as they still struggle with performance. Any plans to support locally installed open-source LLMs? Paul

We already have a mechanism in there that does that :sweat_smile: If I recall correctly, the base behavior is to retry up to 10 times in total per run.

One tip: when you use the fetch recipe, make sure you use the --resume flag. That way, if the connection fails, you'll still be able to pick up where you left off.
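Since --resume makes repeated runs safe, you could even wrap the fetch in a retry loop of your own. A minimal sketch, assuming a Unix shell and placeholder file names and labels:

```
# Retry the fetch up to 5 times; --resume means each attempt
# continues from the progress saved by the previous one
for attempt in 1 2 3 4 5; do
  python -m prodigy ner.openai.fetch examples.jsonl ./fetched.jsonl \
    --labels PERSON,ORG --resume && break
  echo "OpenAI timed out, waiting before attempt $((attempt + 1))..." >&2
  sleep $((attempt * 30))
done
```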

Yes! Version 1.13 of Prodigy will support spacy-llm, which will be able to connect to all sorts of backends for all sorts of tasks. We'll likely be able to replace all the OpenAI recipes with recipes that allow you to pick your own provider, as well as offer more predefined tasks out of the box.
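To give a rough idea of the direction (a sketch only, based on spacy-llm's documented config format; the exact Prodigy integration may look different), spacy-llm declares the task and the model backend in a config file, so switching from OpenAI to a locally installed open-source model is mostly a matter of swapping one section:

```
[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v2"
labels = ["PERSON", "ORG"]

# A local open-source backend; swap this section for an OpenAI
# model (e.g. "spacy.GPT-3-5.v1") to use the API instead
[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"
```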
