ner.print-stream for patterns?

Is there a utility to show all the spans highlighted by a set of patterns in a corpus? Like ner.print-stream except for a set of patterns instead of a model?

(I think the answer is “no” and it’s fairly easy to write this, but I wanted to double-check.)

Not currently, no – but this is a good idea, and adding a --patterns argument to ner.print-stream should definitely be no problem. Will put this on my list for the next release!

In the meantime, I think the easiest way to write this yourself would be to take inspiration from ner.match (i.e. use the PatternMatcher) and plug that matcher into a copy of ner.print-stream in place of the model.

Here’s my standalone recipe to do this.

import spacy
from prodigy.components import printers
from prodigy.components.loaders import get_stream
from prodigy.core import recipe, recipe_args
from prodigy.models.matcher import PatternMatcher
from prodigy.util import log


@recipe('ner.print-pattern-stream',
        spacy_model=recipe_args['spacy_model'],
        patterns=('Path to match patterns file', 'positional'),
        source=recipe_args['source'],
        api=recipe_args['api'],
        loader=recipe_args['loader'])
def print_pattern_stream(spacy_model, patterns, source=None, api=None, loader=None):
    """
    Pretty print spans matched in a stream.
    """
    log("RECIPE: Starting recipe ner.print-pattern-stream", locals())
    # Build a pattern matcher on top of the spaCy pipeline and load the patterns file
    model = PatternMatcher(spacy.load(spacy_model)).from_disk(patterns)
    # Load the input examples, keyed on their text
    stream = get_stream(source, api, loader, rehash=True, input_key='text')
    # The matcher yields (score, example) tuples; the score is not needed for printing
    printers.pretty_print_ner(eg for _, eg in model(stream))

Pretty easy. Nice.

No wait, that only highlights the first match of every document. Have to look at this some more…
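
One possible direction, offered as a sketch rather than a confirmed fix: if the matcher emits a separate task for each match, with a single entry in `spans`, then the tasks belonging to the same text could be merged before printing. Both that behaviour and the `_input_hash` key used for grouping are assumptions here, not something established in this thread. The helper name `combine_matches` is hypothetical, and the function is plain Python so it can be tried without Prodigy installed.

```python
import copy
from collections import OrderedDict


def combine_matches(scored_examples):
    """Merge per-match tasks for the same input into one task per text.

    Assumes each item is a (score, example) tuple and that each example is a
    dict with a "text" key and a "spans" list of {"start", "end", "label"} dicts.
    """
    merged = OrderedDict()
    seen = {}
    for _score, eg in scored_examples:
        # Group on the input hash if present, otherwise on the raw text
        key = eg.get("_input_hash", eg["text"])
        if key not in merged:
            task = copy.deepcopy(eg)
            task["spans"] = []
            merged[key] = task
            seen[key] = set()
        for span in eg.get("spans", []):
            ident = (span["start"], span["end"], span.get("label"))
            # Skip spans already collected for this text
            if ident not in seen[key]:
                seen[key].add(ident)
                merged[key]["spans"].append(dict(span))
    for task in merged.values():
        # Keep spans in document order so they are highlighted left to right
        task["spans"].sort(key=lambda s: (s["start"], s["end"]))
        yield task
```

In the recipe above, this would replace the generator passed to the printer, i.e. `printers.pretty_print_ner(combine_matches(model(stream)))`. Note that this version buffers the whole stream before yielding anything, which is fine for printing a small corpus but not for very large inputs.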