I’m trying to use the NYT API to train (actually, just test out) NER for person names. I managed to configure my API key, but for some reason I keep getting HTTPError exceptions, either while the system is loading or while it is running.
I’ve tried several combos for the command. One that manages to start the labeling process:
prodigy ner.teach dr_mr_names en_core_web_lg 'Brussels' --api nyt --label PERSON
But then it fails after a while with an error:
requests.exceptions.HTTPError: 429 Client Error: for url: https://api.nytimes.com/svc/search/v2/articlesearch.json?q=Brussels&api-key=xxx&page=18
If I try a command like this:
prodigy ner.teach dr_mr_names en_core_web_lg --api nyt --label PERSON --patterns person_patterns.jsonl
it fails immediately (the web server won’t start):
requests.exceptions.HTTPError: 429 Client Error: for url: https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=xxx&page=1
What is wrong?
Hi! The first command is definitely correct – it needs some query to look for, otherwise the request isn’t valid. I just searched for the 429 error code and it seems like it’s a “Too Many Requests” error: “The user has sent too many requests in a given amount of time (“rate limiting”).”
This section on the NYT developers page shows how to check your rate limits. It’s possible that the error was caused by the loader making too many requests, either in total or within a single second. This can happen, for example, if there aren’t enough entity suggestions for PERSON in the results for the query “Brussels”, so Prodigy keeps requesting the next page.
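If you end up calling a rate-limited API from your own script, one common way to deal with 429 responses is to wait and retry with increasing delays. Here is a minimal sketch of that idea. It is not what Prodigy’s loader does, fetch_with_backoff and its parameters are hypothetical names, and whether the NYT API actually sends a Retry-After header is an assumption:

```python
import time


def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until the response is not a 429, waiting longer each time.

    fetch is any zero-argument callable returning an object with
    .status_code and .headers (for example a requests.Response).
    """
    for attempt in range(max_retries + 1):
        response = fetch()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # Assumption: the server may send a numeric Retry-After header.
        # If it does not, fall back to exponential backoff: 1s, 2s, 4s, ...
        retry_after = response.headers.get("Retry-After")
        if retry_after and retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt
        sleep(delay)
    raise RuntimeError(f"Still rate limited after {max_retries} retries")
```

With the requests library, this could be called as fetch_with_backoff(lambda: requests.get(url, params=params)). Note that backing off only helps with per-second or per-minute limits. If you have used up a daily quota, retrying won’t get you any further.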
Btw, a quick note on the live APIs: They’re mostly intended for testing purposes, e.g. to quickly stream in some live examples and see what happens. We tried to pick APIs that are publicly available and provide a free trial – but because they’re third-party APIs, they often come with their own restrictions on rate limits and on whether you can use the content commercially. So once you’re getting more serious about annotating, I’d recommend sourcing your data upfront, saving it to JSONL or a different format supported by Prodigy and then reading it in from a file.
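As a rough sketch of the “save it to JSONL” step: Prodigy’s JSONL input expects one JSON object per line with a "text" key, so converting a list of already downloaded texts could look something like this. The write_jsonl helper and the articles.jsonl file name are just made up for illustration:

```python
import json


def write_jsonl(texts, path):
    """Write one {"text": ...} object per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            text = text.strip()
            if not text:
                continue  # skip empty entries, there is nothing to annotate
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Placeholder texts; in practice these come from your downloaded dataset.
    articles = [
        "First article text goes here.",
        "Second article text goes here.",
    ]
    write_jsonl(articles, "articles.jsonl")
```

You should then be able to pass the file in place of the query and leave out --api, e.g. prodigy ner.teach dr_mr_names en_core_web_lg articles.jsonl --label PERSON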
Ok, thanks. So the second command fails because it needs a query text?
Yes, I was trying the live api to quickly get some data in the system without too much hassle, but I can definitely download a dataset and push it through the system as offline text.