Seeds not recognized by textcat.teach

Ah, sorry about that! If you're using the latest version of Prodigy, the --seeds argument has been replaced with a --patterns argument – see here for the available recipe arguments, and here for an example of the patterns. You'll also find more info about this in your PRODIGY_README.html. We should probably add a note about this to our text classification video! :+1:

The change makes textcat.teach consistent with ner.teach, and gives you more flexibility for the seed terms. Instead of just terms, you can now also specify token patterns, similar to the patterns for spaCy's rule-based Matcher. For example:

{"label": "POSITIVE", "pattern": [{"lower": "able"}]}

This will match all tokens whose lowercase form equals "able". You can also write more complex rules that take into account part-of-speech tags or dependency labels – which sounds like it might help a lot for your use case.

You can then save the patterns as a .jsonl file and load them in via the --patterns argument:

python -m prodigy textcat.teach testing en_core_web_sm sentences.txt --loader TXT --label POSITIVE --exclude testing --patterns patterns.jsonl
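If you already have a list of seed terms, here is a rough sketch of how you could convert them into a patterns file like the patterns.jsonl used above. It only relies on Python's standard library. The seed terms and the helper names build_patterns and write_jsonl are made up for illustration and are not part of Prodigy:

```python
import json
from pathlib import Path

# Hypothetical seed terms, replace them with your own.
seed_terms = ["able", "great", "love"]


def build_patterns(label, terms):
    """Create one single-token pattern per term, matched on the lowercase form."""
    return [{"label": label, "pattern": [{"lower": term.lower()}]} for term in terms]


def write_jsonl(path, records):
    """Write the records as JSONL: one JSON object per line."""
    with Path(path).open("w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


patterns = build_patterns("POSITIVE", seed_terms)
write_jsonl("patterns.jsonl", patterns)
```

Each dictionary in a pattern describes one token, so a multi-word phrase would need one dictionary per token rather than a single "lower" entry containing spaces.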

To experiment with different match patterns and how to capture different types of phrases, you can also try out our demo:
