Sure – that should be pretty easy! All you need to do is wrap the incoming stream of examples (a generator that yields dictionaries) in a filter function that uses your algorithm to check whether the example should be presented for annotation or not.
Here’s a simple, standalone example of how a custom recipe using the ner_manual interface could look. You can also find more details on the recipe components and decorator in the custom recipes workflow, in your PRODIGY_README.html or in the source of prodigy.recipes.ner.
import prodigy
from prodigy.components.preprocess import split_tokens
from prodigy.components.loaders import JSONL
import spacy

def filter_stream(stream):
    # iterate over the stream of examples and filter it using your
    # custom algorithm, and yield the example if it should be shown
    for eg in stream:
        if YOUR_CUSTOM_LOGIC(eg['text']):
            yield eg

@prodigy.recipe('custom-ner.manual')
def custom_ner_manual(dataset, spacy_model, data_path):
    nlp = spacy.load(spacy_model)   # load the model for tokenization
    stream = JSONL(data_path)       # or any other way of loading in your data
    stream = filter_stream(stream)  # filter stream using your logic above
    # use the model to tokenize your stream's texts and add a "tokens"
    # key to each example, to allow token-based manual selection
    stream = split_tokens(nlp, stream)
    labels = ['CAT', 'DOG', 'RABBIT']  # your label set – load this in however you like
    return {
        'view_id': 'ner_manual',      # use the manual NER interface
        'dataset': dataset,           # save annotations in this dataset
        'stream': stream,             # load examples from this stream
        'config': {'labels': labels}  # make your label set available to the front-end
    }
You can then call your custom recipe like this:
prodigy custom-ner.manual my_dataset en_core_web_sm data.jsonl -F recipe.py
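To make the YOUR_CUSTOM_LOGIC placeholder a bit more concrete, here's a minimal sketch of what the filter could look like. The should_annotate function, its keyword list and the length threshold are all made up for illustration, so swap in whatever your own algorithm does. The snippet doesn't need Prodigy and can be run on a plain list of dictionaries to check the behaviour first.

```python
def should_annotate(text, keywords=("cat", "dog", "rabbit"), min_length=10):
    # hypothetical stand-in for YOUR_CUSTOM_LOGIC: keep texts that are
    # long enough and mention at least one of the keywords
    lowered = text.lower()
    return len(text) >= min_length and any(kw in lowered for kw in keywords)


def filter_stream(stream):
    # lazily yield only the examples that pass the check, so this
    # also works with large or endless generators
    for eg in stream:
        if should_annotate(eg["text"]):
            yield eg


# quick check on a few hand-written examples
examples = [
    {"text": "My cat sleeps all day."},
    {"text": "Stock prices fell sharply."},
    {"text": "Dog"},
]
filtered = list(filter_stream(examples))
print(filtered)
```

Because filter_stream is a generator, nothing is evaluated until examples are actually requested, which keeps the recipe responsive even if your logic is expensive.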
Since you’re only looking to replace the stream of examples, an even simpler solution is to wrap the built-in ner.manual recipe. Recipes are simple Python functions that return a dictionary of components, so you can call them in Python with the respective arguments and return the result (the components) from your custom recipe. The source argument is usually a string (e.g. a file path), but it can also be a generator. This means that you can overwrite a recipe’s source with an already initialised stream.
import prodigy
from prodigy.components.loaders import JSONL
from prodigy.recipes.ner import manual as ner_manual  # import recipe function

@prodigy.recipe('custom-ner.manual')
def custom_ner_manual(dataset, spacy_model, label, data_path):
    stream = JSONL(data_path)       # load your data however you like
    stream = filter_stream(stream)  # filter it using filter_stream from the first example
    # call the ner_manual function with the same arguments as the CLI version
    # this will return a dictionary of components
    components = ner_manual(dataset, spacy_model, source=stream, label=label)
    return components  # return the components from the custom recipe
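Before starting the annotation server, it can be useful to dry-run your filter over the file and see how many examples it would let through. Here's a rough sketch using only the standard library. read_jsonl and dry_run are hypothetical helpers (not part of Prodigy), and the lambda is just a placeholder for your own logic.

```python
import json
import os
import tempfile


def read_jsonl(path):
    # plain-Python stand-in for a JSONL loader: one JSON object per line
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def dry_run(stream, keep):
    # count how many examples the filter would keep out of the total
    total = kept = 0
    for eg in stream:
        total += 1
        if keep(eg["text"]):
            kept += 1
    return kept, total


# demo on a throwaway file; point read_jsonl at your real data instead
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf8"
) as tmp:
    tmp.write('{"text": "A rabbit in the garden"}\n')
    tmp.write('{"text": "Nothing here"}\n')
    demo_path = tmp.name

kept, total = dry_run(read_jsonl(demo_path), lambda text: "rabbit" in text.lower())
os.remove(demo_path)
print(f"{kept} of {total} examples would be shown")
```

If the ratio is very low, annotators may end up waiting while the stream skips through long runs of rejected examples, so it's worth checking this up front.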