Yes, but it’s not yet supported out of the box. (The main problem at the moment is that the seed terms logic only returns the tasks, but not the actual match.) However, we’ve been rewriting the textcat.teach recipe to use the PatternMatcher and make it consistent with ner.teach. You’ll then be able to set the --patterns argument with a match patterns file, which should also give you a lot more flexibility than just string matches via seed terms. The matched spans are then highlighted in the same style as named entities.
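A match patterns file is a JSONL file with one pattern per line. Each entry can either use a token-based pattern (with the same attributes as spaCy’s Matcher patterns) or a plain string for exact phrase matches. For example (the INSULT label and the terms below are just placeholders):

{"label": "INSULT", "pattern": [{"lower": "you"}, {"lower": "suck"}]}
{"label": "INSULT", "pattern": "total loser"}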
Here’s an updated version of textcat.teach that you could try:
@recipe('textcat.teach',
        dataset=recipe_args['dataset'],
        spacy_model=recipe_args['spacy_model'],
        source=recipe_args['source'],
        label=recipe_args['label'],
        api=recipe_args['api'],
        loader=recipe_args['loader'],
        patterns=recipe_args['patterns'],
        long_text=("Long text", "flag", "L", bool),
        exclude=recipe_args['exclude'])
def teach(dataset, spacy_model, source=None, label='', api=None, patterns=None,
          loader=None, long_text=False, exclude=None):
    """
    Collect the best possible training data for a text classification model
    with the model in the loop. Based on your annotations, Prodigy will decide
    which questions to ask next.
    """
    log('RECIPE: Starting recipe textcat.teach', locals())
    DB = connect()
    nlp = spacy.load(spacy_model, disable=['ner', 'parser'])
    log('RECIPE: Creating TextClassifier with model {}'.format(spacy_model))
    model = TextClassifier(nlp, label.split(','), long_text=long_text)
    stream = get_stream(source, api, loader, rehash=True, dedup=True,
                        input_key='text')
    if patterns is None:
        predict = model
        update = model.update
    else:
        matcher = PatternMatcher(model.nlp, prior_correct=5., prior_incorrect=5.)
        matcher = matcher.from_disk(patterns)
        log("RECIPE: Created PatternMatcher and loaded in patterns", patterns)
        # Combine the textcat model with the PatternMatcher to annotate both
        # match results and predictions, and update both models.
        predict, update = combine_models(model, matcher)
    # Rank the stream. Note this is continuous, as model() is a generator.
    # As we call model.update(), the ranking of examples changes.
    stream = prefer_uncertain(predict(stream))
    return {
        'view_id': 'classification',
        'dataset': dataset,
        'stream': stream,
        'exclude': exclude,
        'update': update,
        'config': {'lang': nlp.lang, 'labels': model.labels}
    }
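Once you’ve updated the recipe, you should be able to run it just like before, only with the additional --patterns argument, e.g. something along the lines of prodigy textcat.teach your_dataset en_core_web_sm your_data.jsonl --label INSULT --patterns /path/to/patterns.jsonl (the dataset, model and file names here are only placeholders).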
Also make sure to import the PatternMatcher:
from ..models.matcher import PatternMatcher
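If you’d rather keep this as a standalone recipe file that you load with the -F flag (instead of editing the built-in recipe), the relative imports won’t resolve. In that case, absolute imports along these lines should work (the exact module paths are assumptions based on the internal package structure and may differ slightly in your version):

import spacy
# recipe decorator and shared argument annotations (assumed locations)
from prodigy.core import recipe, recipe_args
from prodigy.components.db import connect
from prodigy.components.loaders import get_stream
from prodigy.components.sorters import prefer_uncertain
from prodigy.models.textcat import TextClassifier
from prodigy.models.matcher import PatternMatcher
from prodigy.util import log, combine_models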
Here are some related threads that you might find helpful as well: