prodigy-recipes repo – feedback appreciated!

claudio84destri · November 17, 2018, 9:00am

Hi,

I am trying to write a custom active learning recipe based on textcat.teach, with the following changes:

- adding the --memorize option, which is present in the mark recipe, to avoid duplicates during the annotation process. Basically, a cache that keeps track of the texts already asked within a single batch and removes them
- setting a high probability score threshold to remove most of the negative samples when using prefer_high_scores(algorithm='probability')

The reason is that when I use textcat.teach for active learning on a very imbalanced dataset, I tend to use quite a lot of patterns to reduce the number of negative samples, which ends up producing a lot of duplicated questions during active learning. Some patterns even match multiple times within the same text.
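As a rough illustration of the caching idea (this is plain Python, not Prodigy's actual --memorize code), a stream filter could remember a hash of every text it has already passed on and skip repeats. The function name `filter_seen_texts` and the choice of hashing the raw "text" field are assumptions made for this sketch; a real recipe would more likely reuse whatever hashes the tasks already carry.

```python
import hashlib


def filter_seen_texts(stream, seen=None):
    """Yield only tasks whose text has not been yielded before.

    `stream` is any iterable of dicts with a "text" key (assumed task format).
    `seen` is an optional set of hashes, so the cache can be shared or pre-filled.
    """
    seen = set() if seen is None else seen
    for task in stream:
        # Hash the text so the cache stays small even for long documents.
        key = hashlib.sha1(task["text"].encode("utf8")).hexdigest()
        if key in seen:
            continue  # already asked, skip the duplicate question
        seen.add(key)
        yield task


# Example: the same text matched by two patterns appears twice in the stream.
stream = [
    {"text": "first text"},
    {"text": "second text"},
    {"text": "first text"},
]
unique_tasks = list(filter_seen_texts(stream))
```

Because the filter is a generator, it can wrap the stream lazily before the tasks are sent out for annotation.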

Moreover, once the CNN is well trained, I hope that using prefer_high_scores() will help, but I would like to set a different score threshold depending on how accurate the CNN currently is.
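The adjustable threshold could be sketched as a small filter that sits in front of the sorter. This assumes the model yields (score, task) tuples, and it reads the threshold through a callable so the value can be changed while the stream is running. The names `filter_by_score` and `state` are hypothetical, and this is only one possible way to do it, not something taken from Prodigy itself.

```python
def filter_by_score(scored_stream, get_threshold):
    """Drop (score, task) pairs whose score is below the current threshold.

    `get_threshold` is called for every pair, so the threshold can be
    raised or lowered as the model improves.
    """
    for score, task in scored_stream:
        if score >= get_threshold():
            yield score, task


# Hypothetical mutable state holding the current threshold.
state = {"threshold": 0.5}

scored = [
    (0.9, {"text": "likely positive"}),
    (0.2, {"text": "likely negative"}),
    (0.6, {"text": "borderline"}),
]
kept = list(filter_by_score(scored, lambda: state["threshold"]))
```

The output is still a stream of (score, task) tuples, so in principle it could be handed on to a sorter that expects that format.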

My question for you is: where can I find the --memorize function so that I can include it in a custom recipe?

Thanks very much in advance.
Kind regards,

claudio nespoli
