Thanks for the analysis – this is consistent with what I suspected above: since the pattern matches also receive a score, they're filtered out if they're not considered "relevant" enough. That behaviour makes sense if there's a lot of incoming data – but not so much if you're starting from scratch. So that's definitely something we want to optimise and provide more settings for.
No, this thread describes a solution for an old version of Prodigy that didn't yet support the full highlighting for textcat recipes and only highlighted the terms for ner. As I mention in my comment here, this update was shipped in v1.4.0.
If you just want to find matches in your data to pre-train the model, I would suggest repurposing the ner.match recipe, which does exactly that: it takes patterns, finds the matches and asks you for feedback.
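For example, a call could look something like this – the dataset, model and file names are placeholders, and the patterns file uses the usual JSONL match patterns format:

prodigy ner.match neg_matches en_core_web_sm your_data.jsonl --patterns neg_patterns.jsonl

An entry in neg_patterns.jsonl would then look like {"label": "NEG", "pattern": [{"lower": "terrible"}]}.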
Sorry if my description was unclear. I meant editing the data you collect afterwards to add a "label", so you can use the data in textcat.batch-train. For example, once you're done with ner.match, you can export the data:
prodigy db-out your_match_dataset > data.jsonl
Then run a quick search and replace to add "label": "NEG" to each entry in the JSONL, and create a new dataset for the converted annotations.
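If you'd rather script the conversion than edit the file by hand, a minimal Python sketch could look like this (the file names are placeholders):

import json

# Add a "label" to every annotation exported from ner.match, so the
# data can be used to train a text classifier.
with open("data.jsonl", encoding="utf8") as f_in, \
        open("data_converted.jsonl", "w", encoding="utf8") as f_out:
    for line in f_in:
        task = json.loads(line)
        task["label"] = "NEG"  # the category label for textcat
        f_out.write(json.dumps(task) + "\n")

You can then pre-train your text classification model from that: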
prodigy dataset textcat_match_dataset "Converted dataset with added labels"
prodigy db-in textcat_match_dataset data_converted.jsonl
prodigy textcat.batch-train textcat_match_dataset ... # etc
Once you have a model that's learned a bit more about your "NEG" label, you can load it into textcat.teach and start improving the model, without the immediate need to use patterns for bootstrapping.
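For instance, assuming you've saved the pre-trained model from textcat.batch-train to a directory like /tmp/textcat-model (the paths and dataset name here are placeholders):

prodigy textcat.teach textcat_final /tmp/textcat-model your_data.jsonl --label NEG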