Recipe proposing a list of custom-chosen sentences for manual annotation

I recently bought a Prodigy license to annotate a dataset. I realised that I could speed up the process if I could choose which sentences to present using my own algorithm (somewhat close to regex, but not quite). Is there an easy way to plug in such logic?

I also want to use the manual annotation style, where I select the text to annotate myself.

I understand that recipes are the way to go, but I’m not sure how to write one.

Sure – that should be pretty easy! All you need to do is wrap the incoming stream of examples (a generator that yields dictionaries) in a filter function that uses your algorithm to check whether the example should be presented for annotation or not.
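The filtering idea described above can be sketched without Prodigy at all, since the stream is just a generator of dictionaries. The snippet below is a minimal illustration, not part of Prodigy: the selection rule `should_annotate`, the regex and the sample texts are all hypothetical stand-ins for your own algorithm and data.

```python
import re

# Hypothetical selection rule: keep texts that mention an animal and contain
# at least five whitespace-separated words. Replace this with your own algorithm.
ANIMAL_PATTERN = re.compile(r"\b(cat|dog|rabbit)s?\b", re.IGNORECASE)


def should_annotate(text):
    return bool(ANIMAL_PATTERN.search(text)) and len(text.split()) >= 5


def filter_stream(stream, predicate=should_annotate):
    """Lazily yield only the examples whose text passes the predicate."""
    for eg in stream:
        if predicate(eg.get("text", "")):
            yield eg


# Made-up examples in the same shape as the stream: dicts with a "text" key.
examples = [
    {"text": "My cat sleeps on the sofa all day."},
    {"text": "Stock prices fell sharply on Monday."},
    {"text": "Dogs bark."},
    {"text": "Two rabbits were hiding under the old shed."},
]

selected = list(filter_stream(iter(examples)))
print([eg["text"] for eg in selected])
```

Because the filter is itself a generator, examples are checked one at a time as they are requested, so the full dataset never has to be loaded into memory first.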

Here’s a simple, standalone example of how a custom recipe using the ner_manual interface could look. You can also find more details on the recipe components and decorator in the custom recipes workflow, in your PRODIGY_README.html or in the source of prodigy.recipes.ner.

import prodigy
from prodigy.components.preprocess import split_tokens
from prodigy.components.loaders import JSONL
import spacy

def filter_stream(stream):
    # iterate over the stream of examples and filter it using your
    # custom algorithm, and yield the example if it should be shown
    for eg in stream:
        if YOUR_CUSTOM_LOGIC(eg['text']):
            yield eg

@prodigy.recipe('custom-ner.manual')
def custom_ner_manual(dataset, spacy_model, data_path):
    nlp = spacy.load(spacy_model)  # load the model for tokenization
    stream = JSONL(data_path)  # or any other way of loading in your data
    stream = filter_stream(stream)  # filter stream using your logic above
    # use the model to tokenize your stream's texts and add a "tokens"
    # key to each example, to allow token-based manual selection
    stream = split_tokens(nlp, stream)
    labels = ['CAT', 'DOG', 'RABBIT']  # your label set – load this in however you like

    return {
        'view_id': 'ner_manual',     # use the manual NER interface
        'dataset': dataset,          # save annotations in this dataset
        'stream': stream,            # load examples from this stream
        'config': {'labels': labels}  # make your label set available to the front-end
    }

You can then call your custom recipe like this:

prodigy custom-ner.manual my_dataset en_core_web_sm data.jsonl -F recipe.py
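The command above assumes a data.jsonl file with one JSON object per line, each carrying a "text" key (the recipe reads eg['text']). As a rough sketch, such a file could be produced and checked with the standard library alone; the helper names `write_jsonl` and `read_jsonl` and the sample sentences are made up for illustration and are not Prodigy functions.

```python
import json
import os
import tempfile


def write_jsonl(path, texts):
    # One JSON object per line, each with a "text" key.
    with open(path, "w", encoding="utf8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}) + "\n")


def read_jsonl(path):
    # Generator that yields one dictionary per non-empty line.
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Write to a temporary directory so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
write_jsonl(path, ["My cat sleeps all day.", "The dog chased a rabbit."])

records = list(read_jsonl(path))
print(records)
```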

Since you’re only looking to replace the stream of examples, an even simpler solution is to wrap the built-in ner.manual recipe. Recipes are simple Python functions that return a dictionary of components, so you can also call them from Python with the respective arguments and return the result (the components) from your custom recipe. The source argument is usually a string (e.g. a file path), but it can also be a generator. This means that you can overwrite a recipe’s source with an already initialised stream.

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.recipes.ner import manual as ner_manual  # import recipe function

@prodigy.recipe('custom-ner.manual')
def custom_ner_manual(dataset, spacy_model, label, data_path):
    stream = JSONL(data_path)  # load your data however you like
    stream = filter_stream(stream)  # filter it using filter_stream() from the first example
    # call the ner_manual function with the same arguments as the CLI version
    # this will return a dictionary of components
    components = ner_manual(dataset, spacy_model, source=stream, label=label)
    return components  # return the components from the custom recipe

Thank you for your answer.

Incidentally, the server starts on 0.0.0.0:8080. Where can I configure the IP address and port? Thanks.

You can change both of these in your prodigy.json; see this page or the PRODIGY_README.html you can download with the library for more details.
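As a rough sketch, the relevant part of prodigy.json could be generated like this. The key names "host" and "port" are given here from memory of the Prodigy configuration, and the values are arbitrary examples, so please check them against your PRODIGY_README.html. The helper `build_server_settings` is hypothetical, and the output should be merged into any existing prodigy.json rather than replacing it.

```python
import json


def build_server_settings(host="127.0.0.1", port=8081):
    # Assumed setting names for the web server address; the values are examples.
    return {"host": host, "port": port}


settings = build_server_settings()
config_text = json.dumps(settings, indent=2)

# Copy the printed settings into your prodigy.json.
print(config_text)
```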

Thank you.