How to easily convert some patterns into terms for classification

Anz2 · December 13, 2019, 8:40pm

Hi, I am trying to teach some complex terms for classification. I prepared some phrases that can appear in a given class (not just single words!), for example "I can't download". I then converted that into terms, which gave me the resulting spaCy-style pattern {"label":"SOME_CLASS","pattern":[{"lower":"I can't download"}]}. I have used pattern matching many times in spaCy, and from that experience I assume such a pattern will never match anything unless I create a custom tokenizer that turns the whole phrase into a single token.
I need at least something like this: {"label":"SOME_CLASS","pattern":[{"lower":"I"}, {"lower": "can't"}, {"lower":"download"}]}. Is this supported and I just didn't find it?

ines · December 16, 2019, 11:37am

At the moment, the terms.to-patterns recipe doesn't tokenize (although it will in the next version). But creating those patterns shouldn't be very difficult – all you have to do is tokenize the text:

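import spacy

phrases = ["I can't download"]
nlp = spacy.blank("en")  # blank pipeline: tokenizer only, no trained model needed
patterns = []
for doc in nlp.pipe(phrases):
    # One {"lower": ...} entry per token, so the pattern follows spaCy's tokenization
    pattern = [{"lower": token.lower_} for token in doc]
    patterns.append({"label": "SOME_CLASS", "pattern": pattern})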

Edit: This has now shipped in v1.9. You can set a --spacy-model argument on terms.to-patterns that's either the name of a model or blank:en etc. (to just use a blank language tokenizer).
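
If you build the patterns yourself, they still need to end up in a patterns file with one JSON object per line. Here is a minimal sketch of that step using only the standard library. The hard-coded entry assumes spaCy's English tokenizer splits "can't" into "ca" and "n't" (and that token.lower_ turns "I" into "i"), so the tokens differ slightly from the ones guessed in the question. The helper name to_jsonl and the file name patterns.jsonl are placeholders for illustration, not part of Prodigy.

```python
import json

# Example of what the tokenizing loop would produce for "I can't download".
# Assumption: the English tokenizer splits "can't" into "ca" + "n't".
patterns = [
    {
        "label": "SOME_CLASS",
        "pattern": [
            {"lower": "i"},
            {"lower": "ca"},
            {"lower": "n't"},
            {"lower": "download"},
        ],
    }
]


def to_jsonl(entries):
    """Serialize each pattern entry as one JSON object per line."""
    return "".join(json.dumps(entry) + "\n" for entry in entries)


jsonl = to_jsonl(patterns)

if __name__ == "__main__":
    # "patterns.jsonl" is a hypothetical output path.
    with open("patterns.jsonl", "w", encoding="utf8") as f:
        f.write(jsonl)
```

It's worth printing the tokens for a few of your own phrases first, since contractions and punctuation often split differently than you'd expect.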
