Older versions of Prodigy let you use the seeds dataset directly in the textcat recipe, but newer versions standardise on a patterns file, which gives you more flexibility. The patterns file is expected to be a JSONL file on disk, and you can create it from your seeds dataset with the terms.to-patterns recipe. There's also a section in the video description that explains what has changed in newer versions of Prodigy (since the video is already a couple of years old):
Since this video was recorded, the textcat.teach command has changed in one detail: instead of a --seeds argument, you can now pass in --patterns, which lets you describe single words but also more complex combinations of tokens based on their attributes. To convert a seed dataset to patterns, you can use the terms.to-patterns recipe. For more details, see here: Seeds not recognized by textcat.teach
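
To give a rough idea of what ends up in a patterns file, here is a minimal sketch that writes one by hand from a list of seed terms. It is not the implementation of terms.to-patterns: the helper names, the example terms, the INSULT label and the patterns.jsonl filename are all made up for illustration, and splitting on whitespace is a simplification of real tokenization. Each line of the file is a JSON object with a "label" and a "pattern", where a token-based pattern is a list with one dict of token attributes per token.

```python
import json
from pathlib import Path


def seeds_to_patterns(terms, label):
    """Build one case-insensitive, token-based pattern per seed term.

    Hypothetical helper for illustration only. Splitting on whitespace
    is an assumption and will not always match real tokenization.
    """
    patterns = []
    for term in terms:
        # One attribute dict per word, matched on the lowercase form
        tokens = [{"lower": word.lower()} for word in term.split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns


def write_jsonl(path, records):
    """Write one JSON object per line (the JSONL format)."""
    with Path(path).open("w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Example seed terms and label, invented for this sketch
seed_terms = ["insult", "hate speech"]
patterns = seeds_to_patterns(seed_terms, "INSULT")
write_jsonl("patterns.jsonl", patterns)
```

A file like this is what you would then pass to textcat.teach via --patterns. Multi-word terms become a sequence of token dicts, which is the "more complex combinations of tokens" part that plain seed terms couldn't express.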