Seeds not recognized by textcat.teach

ines · May 3, 2018, 4:48pm

Ah, sorry about that! If you're using the latest version of Prodigy, the --seeds argument has been replaced with a --patterns argument – see here for the available recipe arguments, and here for an example of the patterns. You'll also find more info about this in your PRODIGY_README.html. We should probably add a note about this to our text classification video!

The change makes textcat.teach consistent with ner.teach, and gives you more flexibility for the seed terms. Instead of just terms, you can now also specify token patterns, similar to the patterns for spaCy's rule-based Matcher. For example:

{"label": "POSITIVE", "pattern": [{"lower": "able"}]}

This will match all tokens whose lowercase form equals "able". You can also write more complex rules that take into account part-of-speech tags or dependency labels – which sounds like it might help a lot for your use case.
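
To make the matching behaviour a bit more concrete, here's a rough pure-Python sketch of what a single-token "lower" pattern does. It's only an illustration: it splits on whitespace and strips some punctuation instead of using spaCy's tokenizer, and `matches_lower` is a hypothetical helper, not part of spaCy or Prodigy. The main point is that the comparison is made against whole tokens, so "able" won't match inside "capable".

```python
def matches_lower(text, pattern):
    """Return the tokens that match a single-token {"lower": ...} pattern.

    Toy stand-in for the real matcher: whitespace tokenization only,
    which is an assumption made to keep the example self-contained.
    """
    target = pattern[0]["lower"]
    # Crude tokenization: split on whitespace, strip surrounding punctuation.
    tokens = [tok.strip(".,!?;:") for tok in text.split()]
    # A token matches if its lowercase form equals the target string.
    return [tok for tok in tokens if tok.lower() == target]


pattern = [{"lower": "able"}]
print(matches_lower("Able to help, and very capable.", pattern))
```

Only the token "Able" is returned here, regardless of its casing in the text.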

You can then save the patterns as a .jsonl file and load them in via the --patterns argument:

python -m prodigy textcat.teach testing en_core_web_sm sentences.txt --loader TXT --label POSITIVE --exclude testing --patterns patterns.jsonl
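
If you already have a list of seed terms from the old --seeds workflow, converting them could look something like the sketch below. It assumes each seed is a plain string and writes one "lower" pattern per term. The function names `seeds_to_patterns` and `write_jsonl` and the example terms are just placeholders. Multi-word terms are split on whitespace into one token dict per word, which only approximates how spaCy would tokenize them.

```python
import json


def seeds_to_patterns(terms, label):
    """Build one pattern dict per seed term (hypothetical helper)."""
    patterns = []
    for term in terms:
        words = term.strip().split()
        if not words:
            continue  # skip empty or whitespace-only terms
        patterns.append({
            "label": label,
            # One token dict per word, matched on the lowercase form.
            "pattern": [{"lower": word.lower()} for word in words],
        })
    return patterns


def write_jsonl(patterns, path):
    """Write one JSON object per line, as a .jsonl file expects."""
    with open(path, "w", encoding="utf8") as f:
        for entry in patterns:
            f.write(json.dumps(entry) + "\n")


if __name__ == "__main__":
    seeds = ["able", "great", "well done"]  # example terms, not from your data
    write_jsonl(seeds_to_patterns(seeds, "POSITIVE"), "patterns.jsonl")
```

The resulting patterns.jsonl can then be passed to --patterns as in the command above.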

To experiment with different match patterns and how to capture different types of phrases, you can also try out our demo.
