Wrapping a built-in recipe in a custom recipe


Instead of creating a recipe from scratch, I re-read the docs and saw that I could just modify an existing recipe, nice! :slight_smile: However, I got the following error:

TypeError: teach() got an unexpected keyword argument 'source'

My use case is that I want to add a PhraseMatcher and a custom stream. I followed the "Modifying existing recipes" section of the docs, but maybe I did something wrong… My recipe looks like this:

    import spacy
    import prodigy
    from pymongo import MongoClient
    from prodigy.components.preprocess import split_sentences
    from prodigy.recipes.ner import teach
    # EntityMatcher and news_stream are my own helpers (not shown)

    @prodigy.recipe("ner.teach-custom",  # recipe name is a placeholder
            dataset=("Dataset to save annotations to", "positional", None, str),
            mongo_uri=("MongoDB URL", "positional"),
            spacy_model=("spacy model name or path", "option", "sm", str),
            label=("Label to annotate", "option", "l", str),
            ents_path=("path to entity text file", "option", "p", str),
            sources=("Comma-separated list of news sources", "option", "s", str))
    def teach(dataset, mongo_uri, spacy_model="en", label="ORG", sources="", ents_path=""):
        """Annotate texts to train a NER model"""
        nlp = spacy.load(spacy_model)
        if ents_path:
            ent_matcher = EntityMatcher(nlp, ents_path=ents_path)
        # model = EntityRecognizer(nlp, label=label)
        db = MongoClient(mongo_uri)["data"]
        sources = sources.split(",") if sources else []
        stream = split_sentences(nlp, news_stream(db, s=sources))
        components = teach(dataset=dataset, spacy_model=nlp, source=stream, label=label)
        return components

Yes, this should definitely work!

Could it be that your custom recipe function – which is also called teach() – is shadowing the built-in teach() function you’re importing from prodigy.recipes.ner?
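A minimal, self-contained sketch of the shadowing problem and one common fix (importing the built-in under an alias). So that it runs without Prodigy installed, a stand-in module object plays the role of prodigy.recipes.ner; the names ner, _builtin_teach, shadowed_call and ner_teach are hypothetical. In a real recipe, the equivalent fix would be `from prodigy.recipes.ner import teach as ner_teach`.

```python
import types

# Hypothetical stand-in for prodigy.recipes.ner, so this runs without Prodigy.
ner = types.ModuleType("ner")


def _builtin_teach(dataset, spacy_model, source=None, label=None):
    """Pretend built-in recipe: returns a components-like dict."""
    return {"dataset": dataset, "spacy_model": spacy_model,
            "stream": list(source or []), "label": label}


ner.teach = _builtin_teach


def shadowed_call():
    """Reproduce the bug: the custom recipe rebinds the imported name."""
    teach = ner.teach  # like `from prodigy.recipes.ner import teach`

    def teach(dataset, spacy_model="en", label="ORG"):  # rebinds `teach`
        # `teach` now refers to this very function, which has no `source` param.
        return teach(dataset=dataset, spacy_model=spacy_model, source=[], label=label)

    try:
        teach("my_dataset")
    except TypeError as err:
        return str(err)


# The fix: keep the built-in reachable under a different name.
ner_teach = ner.teach  # like `from prodigy.recipes.ner import teach as ner_teach`


def teach(dataset, spacy_model="en", label="ORG"):
    """Custom recipe; it can keep the name `teach` because it calls `ner_teach`."""
    stream = [{"text": "Google hired a new CTO."}]
    return ner_teach(dataset=dataset, spacy_model=spacy_model, source=stream, label=label)


print(shadowed_call())  # message mentions the unexpected keyword argument 'source'
print(teach("my_dataset")["label"])  # ORG
```

Renaming the custom function would work just as well; the alias has the advantage that the recipe's own name and the built-in's name can stay the same without colliding.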

Oh, I'm so stupid ^^

However, it seems that I cannot pass a loaded spaCy model instance to the spacy_model argument. That could be helpful, though I don't know if it's a common use case.

Actually, I don't really know whether the best practice is to modify existing recipes as much as possible or to just write your own. At least for NLP tasks, it seems handy to reuse yours :), especially since otherwise one has to dig through the Prodigy source code to find the recipes and get some inspiration :slight_smile:

Ha, no worries – I’m sure every Python dev can relate :wink:

The ner.teach recipe currently expects the spacy_model to be a model name or path (basically, anything that's loadable via spacy.load). So if you want to pass in a custom model, you'd have to save it out and load it back in. (In your case, you probably also need to modify your model's __init__.py to make sure your model includes your custom component. I agree that this is not perfectly convenient at the moment.)
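The save-and-reload workaround can be sketched as a small helper. This is a minimal sketch, not part of Prodigy: as_model_path is a hypothetical name, and the only spaCy API it relies on is Language.to_disk(path). It passes names and paths through unchanged and serializes an already-loaded pipeline into a directory that spacy.load() can read. The _FakePipeline class is a hypothetical stand-in so the example runs without spaCy.

```python
import os
import tempfile
from pathlib import Path


def as_model_path(spacy_model, tmp_dir):
    """Return something spacy.load() accepts: a model name or path string.

    Strings and path-like objects are passed through unchanged. Anything else
    is assumed to be a loaded pipeline and is saved into tmp_dir via to_disk().
    """
    if isinstance(spacy_model, (str, os.PathLike)):
        return str(spacy_model)
    path = Path(tmp_dir) / "model"
    spacy_model.to_disk(path)  # spaCy's Language.to_disk(path)
    return str(path)


class _FakePipeline:
    """Hypothetical stand-in exposing to_disk(), for demonstration only."""

    def to_disk(self, path):
        Path(path).mkdir(parents=True)
        (Path(path) / "meta.json").write_text("{}")


with tempfile.TemporaryDirectory() as tmp:
    saved = as_model_path(_FakePipeline(), tmp)
    saved_ok = os.path.isdir(saved) and saved == str(Path(tmp) / "model")

print(as_model_path("en", "unused"), saved_ok)  # en True
```

In a recipe, the call would sit inside the temporary-directory block (e.g. spacy_model=as_model_path(nlp, tmp)) so the directory still exists when the built-in recipe loads it. A custom component is only restored on load if spaCy knows how to construct it, which is the __init__.py point above.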

In general, we do imagine that Prodigy users may want to transition to custom recipes in the long run – but of course, this always depends on your use case. For the v1.0 release, we also plan to publish the built-in recipes, plus some other examples for inspiration, as a prodigy-recipes repo. We’ve also been working on tidying the recipes up and adding more components like stream filter functions that you can mix, match and reuse. So the built-in recipes will hopefully be more concise and easier to adapt.
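To illustrate what small, reusable stream filters can look like, here is a sketch assuming a stream is an iterable of dicts with a "text" key, like the one the recipe above builds. It is not Prodigy's own filter API: filter_by_source, dedupe_texts and the "meta"/"source" keys are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence

Task = Dict[str, Any]


def filter_by_source(stream: Iterable[Task], sources: Sequence[str]) -> Iterator[Task]:
    """Yield only tasks whose meta["source"] is in sources (all if sources is empty)."""
    wanted = set(sources)
    for task in stream:
        source = task.get("meta", {}).get("source")
        if not wanted or source in wanted:
            yield task


def dedupe_texts(stream: Iterable[Task]) -> Iterator[Task]:
    """Skip tasks whose text has already been seen."""
    seen = set()
    for task in stream:
        text = task["text"]
        if text not in seen:
            seen.add(text)
            yield task


raw = [
    {"text": "Google hired a new CTO.", "meta": {"source": "reuters"}},
    {"text": "Google hired a new CTO.", "meta": {"source": "reuters"}},
    {"text": "Facebook Inc. opened an office.", "meta": {"source": "bbc"}},
]
kept = [t["text"] for t in dedupe_texts(filter_by_source(raw, ["reuters"]))]
print(kept)  # ['Google hired a new CTO.']
```

Because each filter is a generator that takes and returns a stream, they can be chained in any order and the stream stays lazy, which matters when the underlying source is large or unbounded.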

Btw, not sure what your custom EntityMatcher does, but we’ve also been working on a pattern matcher model and an ner.match recipe, which will let you load in a match patterns file (phrase patterns or token patterns) to bootstrap new entity types. There’ll also be a --patterns argument in ner.teach, similar to the --seeds in textcat.teach.

I think my EntityMatcher does exactly what you're adding :). It's just a spaCy extension that wraps a PhraseMatcher: it takes a list of entity names (such as "Google", "Facebook Inc.", … for organization names) and sets their label to "ORG", so that even if the default NER doesn't find them, they are proposed as entities to be labeled. Kind of seeds for the NER, as you say.
So :+1: for the --patterns option!!!
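Turning an entity-name list like the one described above into a patterns file could look roughly like this. It is a minimal sketch that assumes the entity text file holds one name per line, and assumes the JSONL layout later documented for Prodigy patterns files: one {"label": ..., "pattern": ...} object per line, where pattern is a string for a phrase pattern or a list of token-attribute dicts for a token pattern. names_to_patterns and to_jsonl are hypothetical helpers.

```python
import json
from typing import Dict, Iterable, List


def names_to_patterns(names: Iterable[str], label: str = "ORG",
                      token_patterns: bool = False) -> List[Dict]:
    """Turn entity names into match patterns for one label.

    Phrase patterns keep the exact string; token patterns lowercase each
    whitespace-separated piece. Whitespace splitting only approximates spaCy's
    tokenizer (e.g. around punctuation), so phrase patterns are the safer default.
    """
    patterns = []
    for name in names:
        name = name.strip()
        if not name:
            continue  # skip blank lines
        if token_patterns:
            pattern = [{"lower": piece.lower()} for piece in name.split()]
        else:
            pattern = name
        patterns.append({"label": label, "pattern": pattern})
    return patterns


def to_jsonl(records: Iterable[Dict]) -> str:
    """Serialize records as JSON Lines (one JSON object per line)."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


names = ["Google", "Facebook Inc.", ""]
print(to_jsonl(names_to_patterns(names)), end="")
# {"label": "ORG", "pattern": "Google"}
# {"label": "ORG", "pattern": "Facebook Inc."}
```

Writing the returned string to a .jsonl file (opened with encoding="utf8") gives a file in the assumed patterns layout.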