Connecting to my own database using --api


I have a database (and an API) that can return sentences containing specific words (e.g. http://localhost:8080/word/10 will give me 10 sentences containing “word”).

I can add or adapt any endpoint I need.

I would like to do active learning (using ner.teach) on data from my database, and I saw in your documentation that you support this for other sources (reddit, twitter) with the --api flag.

However, I could not find documentation on how to adapt my API and configure Prodigy to use my own service.


prodigy ner.teach [dataset] en_core_web_sm "word" --api my_api --label ORG

Am I missing something?


Hi! The most elegant way to make your own loader available is to create a small Python package and expose its functions via the prodigy_loaders or prodigy_apis entry points. You can find more info on this in the “Entry points” section in your PRODIGY_README.html. This will let you write --api my_api and Prodigy will find the loader automatically.

That said, if you’re just getting started, that might be a little overkill and it’s probably easier to start with a custom recipe or loader script. You can find more info and examples of this in the loaders section here.

A loader is usually a Python generator that loads the data (duh) and yields dictionaries in Prodigy’s JSON format. For example, {"text": "Some text"}. Because it’s just a regular Python function, it’s pretty flexible – you can make one API request, several API requests, keep state (like, the page number used for the request) and so on. Here’s a dummy example:

import requests

def your_custom_loader():
    # Request the data from your API (the URL is left blank in this dummy example)
    data = requests.get("").json()
    for record in data:
        # Let's assume your API returns the text as "raw_text"
        yield {"text": record["raw_text"]}
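To make the "several requests, keep state" idea a bit more concrete, here is a rough sketch of how that could look for an endpoint like the one in your question. It's only an illustration under a few assumptions: the URL pattern http://localhost:8080/word/10 is taken from your post, the response is assumed to be a JSON list of objects with a "raw_text" key, and the names fetch_sentences, sentence_loader and dump_jsonl are made up for this example and are not part of Prodigy.

```python
import json
import sys
import urllib.request


def fetch_sentences(word, count, base_url="http://localhost:8080"):
    # Assumption: GET {base_url}/{word}/{count} returns a JSON list of
    # objects that each have a "raw_text" key.
    url = f"{base_url}/{word}/{count}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def sentence_loader(word, batch_size=10, max_batches=5, fetch=fetch_sentences):
    # Keep state across requests: remember texts we've already yielded
    # so repeated sentences aren't sent out for annotation twice.
    seen = set()
    for _ in range(max_batches):
        records = fetch(word, batch_size)
        if not records:
            # The API has nothing (more) to offer, so stop asking.
            break
        for record in records:
            text = record["raw_text"]
            if text in seen:
                continue
            seen.add(text)
            # "meta" is optional and only records where the text came from.
            yield {"text": text, "meta": {"query": word}}


def dump_jsonl(tasks, out=sys.stdout):
    # Write one JSON object per line (newline-delimited JSON).
    for task in tasks:
        out.write(json.dumps(task) + "\n")
```

The fetch argument is only there so the request function can be swapped out, for example for a function that returns canned data while you're testing. In a custom recipe, you'd use sentence_loader("word") as the stream. As far as I know, a script that calls dump_jsonl(sentence_loader("word")) can also be piped forward to a recipe that reads its source from standard input, but it's worth double-checking the exact command against the loaders section of the docs.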

Thanks a lot Ines, I’ll try that!
