Run through python script.


(Akshita Sood) #1

How can I run Prodigy commands from a Python script instead of from the command line?

(Ines Montani) #2

Yes, that’s possible – check out the documentation of the prodigy.serve function in your PRODIGY_README.html. The function is currently very simple, because we weren’t sure how useful it’d be to people – so you currently have to pass in all recipe arguments in order as positional arguments (or None if you don’t want to set them). For example:

import prodigy

prodigy.serve('ner.teach', 'dataset', 'en_core_web_sm', 'data.jsonl', 
              None, None, ['PERSON', 'ORG'], None, None)

Alternatively, you can also call Prodigy in a subprocess, or use a library like fabric3 (the Python 3-compatible fork of Fabric) to build more complex command pipelines. The solution you choose really depends on what you’re trying to do, and what workflow you prefer.
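For the subprocess route, here is a minimal sketch. It assumes Prodigy is installed in the same environment as the script and can be started with `python -m prodigy`. The helper names `build_prodigy_command` and `run_prodigy` are made up for this example and are not part of Prodigy.

```python
import subprocess
import sys


def build_prodigy_command(recipe, *args, **options):
    """Build the argument list for `python -m prodigy <recipe> ...`.

    Positional recipe arguments are passed through in order; keyword
    arguments become command-line flags (label="A,B" -> --label A,B).
    """
    cmd = [sys.executable, "-m", "prodigy", recipe]
    cmd.extend(str(arg) for arg in args)
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # boolean switches take no value
            cmd.append(flag)
        elif value is not None and value is not False:
            cmd.extend([flag, str(value)])
    return cmd


def run_prodigy(recipe, *args, **options):
    """Run the command and block until the Prodigy process exits."""
    cmd = build_prodigy_command(recipe, *args, **options)
    return subprocess.run(cmd, check=True)


# Example (starts the server, so it blocks until the process is stopped):
# run_prodigy("ner.teach", "dataset", "en_core_web_sm", "data.jsonl",
#             label="PERSON,ORG")
```

Keeping the command builder separate from the call that launches the process makes it easy to print or log the exact command before running it.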

(Akshita Sood) #4

How can I use db-in from a Python script?

(Ines Montani) #5

@akshitasood63 I moved your question here because it fits better in this topic than the one on catastrophic forgetting.

You can check out the source of the db-in command, or see the PRODIGY_README.html for the API docs of the database methods. This will let you interact with the database from within a Python script. Here’s a simple example:

from prodigy import set_hashes
from prodigy.components.db import connect

db = connect()  # this uses the DB settings in your prodigy.json

# load your examples – from a file or however else you want to
# just make sure they're in Prodigy's JSONL format. You can also use
# one of the built-in loaders like JSONL or CSV (see API docs)
examples = [{'text': 'Hello world', 'answer': 'accept'}, 
            {'text': 'Another example', 'answer': 'reject'}]

# hash the examples to make sure they all have a unique task hash
# and input hash – this is used by Prodigy to distinguish between
# annotations on the same input data
examples = [set_hashes(eg) for eg in examples]

# add examples to the dataset
db.add_examples(examples, datasets=['your_dataset_name'])

Note that the add_examples method expects the dataset to already exist in the database. If you want to add examples to a new set, you’ll need to create it first:
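A minimal sketch of that step. It assumes the database object supports `in` checks for dataset names and has an `add_dataset` method; please verify both against the database API docs in your PRODIGY_README.html. The wrapper name `add_to_dataset` is hypothetical.

```python
def add_to_dataset(db, name, examples):
    """Create the dataset if it doesn't exist yet, then add the examples."""
    # Assumption: `name in db` checks whether the dataset exists
    if name not in db:
        # Assumption: add_dataset creates an empty dataset with this name
        db.add_dataset(name)
    db.add_examples(examples, datasets=[name])


# Usage, with `db` and the hashed `examples` from the example above:
# add_to_dataset(db, 'your_dataset_name', examples)
```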


(Akshita Sood) #6

Great. Thanks for explaining.
And what about db-out, if I want to save the annotations from the ner.teach recipe?

(Ines Montani) #7

Yes, that’s no problem either (see the source of the db-out recipe or the database methods in the README). The get_dataset method takes the name of a dataset, and returns a list of examples, which you can then save to a file – for example, JSON or JSONL (newline-delimited JSON, Prodigy’s preferred format):

examples = db.get_dataset('your_dataset_name')

examples will be a list of dictionaries, with each dictionary describing one annotation example.
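To save that list as JSONL, something like the following should work. It only uses the standard library; `write_jsonl` is a hypothetical helper name, not a Prodigy function.

```python
import json


def write_jsonl(path, examples):
    """Write one JSON object per line and return the number of lines written."""
    with open(path, "w", encoding="utf8") as f:
        for eg in examples:
            # ensure_ascii=False keeps non-ASCII text readable in the file
            f.write(json.dumps(eg, ensure_ascii=False) + "\n")
    return len(examples)


# Usage, with `examples` as returned by db.get_dataset above:
# write_jsonl('annotations.jsonl', examples)
```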

(Akshita Sood) #8

Really helpful. Thanks a lot.

(Akshita Sood) #9

A self-written recipe does not work with this function.
May I know the reason for that?

(Ines Montani) #10

If you’re using a custom recipe and you want to load it by its name (e.g. custom-recipe), it needs to be registered globally first, so Prodigy knows which function to call. This is usually taken care of by the @prodigy.recipe decorator. So if you register your custom recipe first, calling prodigy.serve should work. For example:

import prodigy

@prodigy.recipe('custom-recipe')
def custom_recipe(dataset):  # etc.
    return {'dataset': dataset}  # etc.

prodigy.serve('custom-recipe', 'your_dataset_name')

The @prodigy.recipe decorator will register the recipe custom-recipe, so prodigy.serve can find it. Of course, you could also keep all your custom recipes in a separate module and import them from there – as long as you do that before calling prodigy.serve.

(Akshita Sood) #11

@ines After I'm done with all the annotations and have saved them using Ctrl+S, how do I stop the serve function without explicitly interrupting the script with Ctrl+C?

I just want the rest of my script to be executed after the annotations are complete.


(Ines Montani) #12

By “the rest of the script”, you mean other code placed after prodigy.serve? You could probably catch KeyboardInterrupt, then execute your other logic and then terminate the process manually. Or you could just write your own logic that serves the app and includes hooks for starting and stopping – if you look at the, you’ll see that it’s really pretty straightforward and doesn’t need a lot of code.
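A rough sketch of the first option. It assumes that pressing Ctrl+C while the server is running raises KeyboardInterrupt in your script, which is worth testing in your setup. `serve_then_continue` is a hypothetical helper, and both arguments are plain callables you supply.

```python
def serve_then_continue(serve, after):
    """Run a blocking `serve` callable, then run `after`.

    If the server is stopped with Ctrl+C, the KeyboardInterrupt is caught
    so that the follow-up logic still runs instead of the script exiting.
    """
    try:
        serve()
    except KeyboardInterrupt:
        # Assumption: stopping the server surfaces as KeyboardInterrupt here
        pass
    return after()


# Usage (hypothetical follow-up function `export_annotations`):
# import prodigy
# serve_then_continue(
#     lambda: prodigy.serve('custom-recipe', 'your_dataset_name'),
#     export_annotations,
# )
```

You would still press Ctrl+C once to stop the server, but the script carries on with your own code afterwards instead of terminating.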