Bootstrapping using rule-based matching - handling conflicting patterns within single text

honnibal · November 1, 2019, 4:12pm

This question has come up before, so we've been thinking about how to add some extra options to the built-in recipe to control this. However, one of the ideas behind Prodigy is that everyone wants slightly different behaviours, and the easiest way to get what you want is to put the pieces together yourself into a custom recipe.

You can find a discussion of how to filter the stream to prevent duplicate texts in this thread: "textcat.teach presents same annotation task if text snippet contains multiple patterns". I think if you add the stream filter Ines suggests there, you should no longer be asked the redundant questions.

One thing to keep in mind: since this is a multilabel problem, the recipe still needs to be able to ask you about the same text with different labels. So make sure you key the filter by both the text and the label, not by the text alone.
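
To make the keying concrete, here's a rough sketch of such a filter in plain Python. It isn't the code from the linked thread. The function name `filter_by_text_and_label` is made up for illustration, and it assumes each task in the stream is a dict with a `"text"` key and a `"label"` key, so adjust it if your tasks look different.

```python
def filter_by_text_and_label(stream):
    """Yield each (text, label) combination only once.

    Hypothetical helper: assumes every task is a dict with a "text"
    key and a "label" key.
    """
    seen = set()
    for eg in stream:
        # Key on both fields, so the same text can still be asked
        # about again with a different label.
        key = (eg["text"], eg.get("label"))
        if key in seen:
            # Same text and label already queued, e.g. because a
            # second pattern matched the same text. Skip it.
            continue
        seen.add(key)
        yield eg
```

In a custom recipe you'd wrap the stream in this generator before returning it. Note that the `seen` set grows with the number of distinct combinations, so for a very large stream you might prefer to store hashes of the key instead of the full text.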
