Creating a custom recipe to integrate bespoke model

daniyalSelani · November 11, 2019, 2:10pm

I want to use ner.match, but with my custom NER model.
My model takes in a text and outputs the span of each recognized term and the label associated with it.
I want the ner.match recipe to show the highlighted term along with its label from my model.
How do I wrap the ner.match recipe to achieve this? And if this cannot be done by simply wrapping the ner.match recipe, how do I create a custom recipe to achieve the same result?
Thank you!

ines · November 11, 2019, 4:31pm

Hi! I think you might find it easier to write your own recipe, since it'll make it easier to see what's going on, and the logic itself isn't that complicated.

Here's a simplified version of the ner.match recipe with some comments that explain what's going on:

github.com/explosion/prodigy-recipes/blob/master/ner/ner_match.py

# coding: utf8
from __future__ import unicode_literals

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.models.matcher import PatternMatcher
from prodigy.components.db import connect
from prodigy.util import split_string
import spacy


# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe('ner.match',
    dataset=("The dataset to use", "positional", None, str),
    spacy_model=("The base model", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    patterns=("Optional match patterns", "option", "p", str),
    exclude=("Names of datasets to exclude", "option", "e", split_string),

(The file preview is truncated here.)

The recipe above uses the pattern matcher and spaCy to add the pattern matches to your stream. Instead, you can write a function that takes a text and returns the start and end character offsets and the label of each span. For each span, you can then yield a dictionary with the "text" and "spans". Here's an example:

def get_stream(stream):
    for eg in stream:
        spans_from_model = get_spans_from_your_model(eg["text"])
        for start_char, end_char, label in spans_from_model:
            # Let's assume your function returns a tuple of the start and end
            # offset and the label. For each span, we now create a new task
            # and send it out
            spans = [{"start": start_char, "end": end_char, "label": label}]
            yield {"text": eg["text"], "spans": spans}
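
Note that get_spans_from_your_model in the snippet above is only a placeholder for your own code. As a minimal sketch, assuming a toy regex-based "model" (the TOY_PATTERNS table and its labels are made up for illustration and are not part of Prodigy), the whole thing can be run on its own to see what the tasks look like:

```python
import re

# Hypothetical stand-in for a real model: label -> regex (made up for illustration)
TOY_PATTERNS = {
    "FRUIT": re.compile(r"\b(?:apple|banana)s?\b", re.IGNORECASE),
    "COLOR": re.compile(r"\b(?:red|green)\b", re.IGNORECASE),
}


def get_spans_from_your_model(text):
    """Return (start_char, end_char, label) tuples, sorted by start offset."""
    spans = []
    for label, pattern in TOY_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def get_stream(stream):
    # Same logic as above: one task per span found by the model
    for eg in stream:
        for start_char, end_char, label in get_spans_from_your_model(eg["text"]):
            spans = [{"start": start_char, "end": end_char, "label": label}]
            yield {"text": eg["text"], "spans": spans}


examples = [{"text": "I ate a green apple."}]
tasks = list(get_stream(examples))
```

Here the single input text produces two tasks, one for each span, which matches the "one example per span" behaviour described above.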

In your recipe, you can then load your data (however you want) and create your stream:

stream = JSONL(source)
stream = get_stream(stream)

So the most basic version of your recipe could look like this (plus the get_stream function of course):

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe('custom.ner.match',
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
)
def custom_ner_match(dataset, source):
    stream = JSONL(source)
    stream = get_stream(stream)

    return {
        'view_id': 'ner',       # Annotation interface to use
        'dataset': dataset,     # Name of dataset to save annotations
        'stream': stream,       # Incoming stream of examples
    }
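
If you'd like to check your input file before starting the server, here is a rough sketch that reads a JSONL file with the standard library only. It assumes one JSON object per line with a "text" key, which is what the stream code above relies on. The read_jsonl helper is a hypothetical stand-in for the JSONL loader, not Prodigy's own implementation:

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Hypothetical stand-in for the JSONL loader: yield one dict per non-empty line."""
    with open(path, encoding="utf8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            eg = json.loads(line)
            if "text" not in eg:
                raise ValueError(f"Line {line_number} has no 'text' key")
            yield eg


# Write a tiny sample file so the sketch runs on its own
sample = [{"text": "First example."}, {"text": "Second example."}]
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf8"
) as tmp:
    for eg in sample:
        tmp.write(json.dumps(eg) + "\n")

loaded = list(read_jsonl(tmp.name))
os.remove(tmp.name)
```

Once the file looks right, a recipe saved in a file such as recipe.py (the file name is just an example) is typically started by pointing Prodigy at it with the -F option, e.g. prodigy custom.ner.match your_dataset your_data.jsonl -F recipe.py.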

daniyalSelani · November 12, 2019, 10:40am

Thank you so much!
Worked like a charm.
Follow-up question:
Is it possible to create a custom recipe that combines the functionality of ner.match and ner.manual?

ines · November 12, 2019, 11:36am

Yay, glad to hear it worked!

Sure! You should only have to change a few small things:

- use the ner_manual view ID instead of just ner
- add a "config": {"labels": [...]} to the components returned by your recipe, which defines the full label scheme you can select from
- make sure each incoming example is tokenized and has a "tokens" property (to allow quick highlighting that "snaps" to token boundaries)
- only send out one example per text (instead of one example per span), because you probably want to see all matches at once, right?

For tokenization, Prodigy has a built-in add_tokens helper. You can also see an example of it in the prodigy-recipes repo. The function takes a spaCy nlp object for tokenization and the stream, and will add a "tokens" property to each example.

import spacy
from prodigy.components.preprocess import add_tokens

# At the end of your recipe (spacy_model is the name of the spaCy model
# to load, e.g. passed in as an extra recipe argument)
nlp = spacy.load(spacy_model)
stream = add_tokens(nlp, stream)

One thing that's important to note here: the tokenization used here should match the tokenization of your custom model and allow the entities to be valid token spans. So if your model uses a custom tokenizer, you might want to use that instead and create the "tokens" property yourself – you can find the format in the "Annotation task formats" section in your PRODIGY_README.html.
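
If you do end up creating the "tokens" property yourself, a minimal sketch could look like the one below. The key names ("text", "start", "end" and "id" on tokens, plus "token_start" and "token_end" on spans) are assumed from the documented task format, so double-check them against the README section mentioned above. toy_tokenize is a hypothetical tokenizer that you'd replace with your model's own:

```python
import re


def add_custom_tokens(stream, tokenize):
    """Add "tokens" to each example and token indices to spans that align.

    tokenize(text) must return a list of (start_char, end_char) pairs.
    """
    for eg in stream:
        text = eg["text"]
        eg["tokens"] = [
            {"text": text[start:end], "start": start, "end": end, "id": i}
            for i, (start, end) in enumerate(tokenize(text))
        ]
        starts = {tok["start"]: tok["id"] for tok in eg["tokens"]}
        ends = {tok["end"]: tok["id"] for tok in eg["tokens"]}
        aligned = []
        for span in eg.get("spans", []):
            if span["start"] in starts and span["end"] in ends:
                # Assumption: token_end is the index of the last token in the span
                aligned.append(
                    dict(
                        span,
                        token_start=starts[span["start"]],
                        token_end=ends[span["end"]],
                    )
                )
            # Spans that don't line up with token boundaries are dropped
        eg["spans"] = aligned
        yield eg


def toy_tokenize(text):
    # Hypothetical tokenizer: runs of word characters or single punctuation marks
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


examples = [
    {
        "text": "New York is big.",
        "spans": [
            {"start": 0, "end": 8, "label": "GPE"},  # aligns with "New York"
            {"start": 1, "end": 3, "label": "BAD"},  # starts mid-token
        ],
    }
]
result = list(add_custom_tokens(examples, toy_tokenize))
```

Dropping misaligned spans silently is just one option. While you're debugging the tokenization, you may prefer to log or raise instead, so you notice when the model and the tokenizer disagree.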

To only send out one example instead of one example per span, your get_stream could be simplified like this:

def get_stream(stream):
    for eg in stream:
        spans_from_model = get_spans_from_your_model(eg["text"])
        eg["spans"] = [{"start": start_char, "end": end_char, "label": label}
                       for start_char, end_char, label in spans_from_model]
        yield eg

In the return statement of your recipe, you can now change the view_id and add the labels:

return {
    "view_id": "ner_manual", 
    "dataset": dataset,
    "stream": stream,
    "config": {
        "labels": ["SOME_LABEL", "FOO", "BAR"]
    }
}

You should now see all entities in the text highlighted and editable, with the list of labels as selectable options on top.
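
Since the label list in the config is written by hand, it can drift out of sync with what your model actually predicts. This thread doesn't say how the interface treats a span whose label isn't in the list, so as a hedged sketch, here is a small check you could wrap around the stream to surface a mismatch early. check_labels is a hypothetical helper, not a Prodigy function:

```python
def check_labels(stream, labels):
    """Pass examples through unchanged, raising if a span uses an unknown label."""
    allowed = set(labels)
    for eg in stream:
        unknown = {span["label"] for span in eg.get("spans", [])} - allowed
        if unknown:
            raise ValueError(f"Labels not in label scheme: {sorted(unknown)}")
        yield eg


LABELS = ["SOME_LABEL", "FOO", "BAR"]

# An example whose span label is part of the scheme passes through untouched
good = [{"text": "a b", "spans": [{"start": 0, "end": 1, "label": "FOO"}]}]
checked = list(check_labels(good, LABELS))

# An example with a label outside the scheme raises an error
bad = [{"text": "a b", "spans": [{"start": 0, "end": 1, "label": "BAZ"}]}]
try:
    list(check_labels(bad, LABELS))
    error_message = None
except ValueError as err:
    error_message = str(err)
```

In a recipe, this would be applied as stream = check_labels(stream, labels) before returning the components, using the same list that goes into "config".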
