I have a database (and an API) that can extract sentences containing specific words (e.g. http://localhost:8080/word/10 gives me 10 sentences containing “word”).
I can add or adapt any endpoint I need.
I would like to do active learning (using ner.teach) with my database, and I saw in your documentation that you support this for some sources (reddit, twitter) via the --api flag.
But I could not find any documentation on how to adapt my API and configure Prodigy to use my own service.
Hi! The most elegant way to make your own loader available is to create a small Python package and expose its functions via the prodigy_loaders or prodigy_apis entry points. You can find more info on this in the “Entry points” section in your PRODIGY_README.html. This will let you write --api my_api and Prodigy will find the loader automatically.
That said, if you’re just getting started, that might be a little overkill and it’s probably easier to start with a custom recipe or loader script. You can find more info and examples of this in the loaders section of the documentation.
A loader is usually a Python generator that loads the data (duh) and yields dictionaries in Prodigy’s JSON format. For example, {"text": "Some text"}. Because it’s just a regular Python function, it’s pretty flexible – you can make one API request, several API requests, keep state (like, the page number used for the request) and so on. Here’s a dummy example:
import requests


def your_custom_loader():
    # Fetch the records from your API and parse the JSON response
    data = requests.get("http://my-api.com/endpoint").json()
    for record in data:
        # Let's assume your API returns the text as "raw_text"
        yield {"text": record["raw_text"]}
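To make the "keep state" part more concrete, here is a rough sketch adapted to the endpoint from your question. It assumes that GET /<word>/<n> returns a JSON list of plain strings, which may not match your service. The names fetch_sentences and sentence_loader are made up for this example, and it uses the standard library's urllib only so that it runs without extra dependencies (requests would work just as well). The "meta" field is optional and only records which word produced the sentence.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the question; adjust it to your own service.
API_ROOT = "http://localhost:8080"


def fetch_sentences(word, n):
    """Call GET /<word>/<n>.

    Assumption: the response body is a JSON list of strings.
    """
    with urlopen(f"{API_ROOT}/{quote(word)}/{n}") as response:
        return json.loads(response.read().decode("utf-8"))


def sentence_loader(words, n=10, fetch=fetch_sentences):
    """Yield one task dict per sentence, skipping sentences already seen."""
    seen = set()  # state kept across API requests
    for word in words:
        for sentence in fetch(word, n):
            if sentence in seen:
                continue
            seen.add(sentence)
            # "meta" is optional; here it stores the word used for the query
            yield {"text": sentence, "meta": {"query": word}}
```

Calling sentence_loader(["word", "other"]) gives you a generator that you can use as the stream in a custom recipe. The fetch argument is only there so you can swap in a fake function and try the loader without the API running.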