Yes, to get over the cold start problem, you'll first have to give the model some examples of the entity to learn from. The ner.teach recipe supports passing in a JSONL file containing match patterns (like the patterns used by spaCy's Matcher). Prodigy will then start showing you matches of those patterns in your texts. As you annotate those examples, the model is updated and will eventually start suggesting examples as well, based on the updated weights. We actually just recorded a video tutorial that shows this workflow for training a new entity type.
There's also more information in this thread and this comment.
In this example, we use the terms.teach recipe to bootstrap a terminology list from word vectors and then convert it to a patterns file using terms.to-patterns. But you could also generate the list of patterns manually – see the PRODIGY_README.html for an example of what's possible.
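
If you go the manual route, a small script is usually enough. Here's a minimal sketch, assuming a made-up DRINK label and a hand-written term list. The helper names (terms_to_patterns, to_jsonl) are hypothetical and not part of Prodigy. Each output line is a JSON object with a "label" and a token-based "pattern", in the style of spaCy's Matcher. Note that splitting on whitespace is a simplification: terms with hyphens or punctuation may be tokenized differently by spaCy, so check those patterns by hand.

```python
import json


def terms_to_patterns(terms, label):
    """Turn plain-text terms into Matcher-style token patterns.

    Each term is lowercased and split on whitespace (an assumption:
    spaCy's tokenizer may split some terms differently), and every
    token becomes a case-insensitive {"lower": ...} match.
    """
    patterns = []
    for term in terms:
        tokens = [{"lower": tok} for tok in term.lower().split()]
        if tokens:  # skip empty or whitespace-only terms
            patterns.append({"label": label, "pattern": tokens})
    return patterns


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    # Hypothetical seed terms and label, purely for illustration.
    seed_terms = ["espresso", "green tea", "Cold Brew"]
    print(to_jsonl(terms_to_patterns(seed_terms, "DRINK")), end="")
```

Save the output to a file and pass it to ner.teach via the --patterns argument, using the same label for --label so the matches show up under the entity type you're training.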