Ah, sorry about that! If you're using the latest version of Prodigy, the --seeds argument has been replaced with a --patterns argument. See here for the available recipe arguments, and here for an example of the patterns. You'll also find more info about this in your PRODIGY_README.html. We should probably add a note about this to our text classification video!
The change makes textcat.teach consistent with ner.teach, and gives you more flexibility for the seed terms. Instead of just terms, you can now also specify token patterns, similar to the patterns for spaCy's rule-based Matcher. For example:
{"label": "POSITIVE", "pattern": [{"lower": "able"}]}
This will match all tokens whose lowercase form equals "able". You can also write more complex rules that take into account part-of-speech tags or dependency labels, which sounds like it might help a lot for your use case.
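To make the matching behaviour a bit more concrete, here's a rough sketch in plain Python. It is not how Prodigy or spaCy implement matching: match_lower is a made-up helper, and it splits on whitespace instead of using a real tokenizer. It only illustrates that a "lower" pattern compares the lowercase form of whole tokens.

```python
def match_lower(pattern, text):
    """Return the start index of every token window that matches the pattern.

    Simplified stand-in for illustration only: tokens come from a plain
    whitespace split, and only the "lower" attribute is supported.
    """
    tokens = text.split()
    size = len(pattern)
    hits = []
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        # Each token's lowercase form must equal the pattern's "lower" value.
        if all(tok.lower() == spec["lower"] for tok, spec in zip(window, pattern)):
            hits.append(start)
    return hits


entry = {"label": "POSITIVE", "pattern": [{"lower": "able"}]}
print(match_lower(entry["pattern"], "She was Able to finish but unable to rest"))
# [2]
```

So "Able" matches regardless of casing, while "unable" doesn't, because the comparison is made against the whole token and not a substring.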
You can then save the patterns as a .jsonl file and load them in via the --patterns argument:
python -m prodigy textcat.teach testing en_core_web_sm sentences.txt --loader TXT --label POSITIVE --exclude testing --patterns patterns.jsonl
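If you already have a list of seed terms, one way to turn them into a patterns file is a small script like the one below. This is just a sketch: write_patterns is a hypothetical helper (not part of Prodigy), the terms are made-up examples, and it assumes that splitting multi-word terms on whitespace is good enough for your data.

```python
import json
from pathlib import Path


def write_patterns(path, label, terms):
    """Write one JSON object per line, each with a label and a token pattern."""
    lines = []
    for term in terms:
        # One {"lower": ...} dict per whitespace-separated word in the term.
        pattern = [{"lower": word} for word in term.lower().split()]
        lines.append(json.dumps({"label": label, "pattern": pattern}))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Example seed terms (assumed, replace with your own).
write_patterns("patterns.jsonl", "POSITIVE", ["able", "capable", "Well done"])
```

Each line of the resulting file has the same shape as the example pattern above, so it can be passed to the recipe via --patterns patterns.jsonl.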
To experiment with different match patterns and how to capture different types of phrases, you can also try out our demo: