Are 'Reject' examples included in textcat_multilabel train/train-curve?

joebuckle · November 7, 2022, 1:28am

Good day,

We made the mistake of marking invalid examples as rejected instead of ignored. Are rejected examples included in training? If they are, is there a way to exclude them, or to update the datasets in the DB so that the rejects become ignores? We are on Prodigy v1.11.8.

Thanks,
Joe

koaning · November 7, 2022, 2:30pm

"It depends"

What recipes have you used? If you're using a binary annotation interface then the rejected examples could indicate the absence of a class for a classification task, in which case I don't think they will be ignored.

joebuckle · November 7, 2022, 3:52pm

We created a custom recipe that allows us to choose multiple categories for an audio file and, at the same time, edit the transcript of that audio file. So we are using "choice" and "text_input" as blocks. Does that count as a binary annotation interface?

joebuckle · November 7, 2022, 4:17pm

Is there a way for us to change the "reject" examples in the DB to "ignore"? And does this mean that rejected examples can affect the training of the model?

koaning · November 8, 2022, 1:25pm

I suppose the cleanest way to do this is to do it programmatically.

You can use the db-out command to save the data on disk temporarily. Something like:

python -m prodigy db-out dataset-name dataset-old.jsonl

Then you can use a Python script that accepts the old dataset-old.jsonl file as input and changes it. You can also use a Jupyter notebook if you'd prefer that. Once you've made the changes you can save it as dataset-new.jsonl and this new dataset can be loaded in via db-in.

python -m prodigy db-in new-dataset-name dataset-new.jsonl

Would this suffice? I think it gives you the most freedom. If you want to keep the original dataset name, you can also drop the dataset via prodigy drop once it is saved on disk, but I recommend caution. You don't want to accidentally throw away all your annotations.
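
For the Python step in between, here is a minimal sketch of what such a script could look like. It assumes each line of the exported file is a JSON object with an "answer" key set to "accept", "reject" or "ignore", and that every reject should become an ignore. The function name rejects_to_ignores is just an illustrative choice, not part of Prodigy.

```python
import json


def rejects_to_ignores(in_path, out_path):
    """Copy a JSONL export, relabelling every "reject" answer as "ignore".

    Returns the number of examples that were changed.
    """
    changed = 0
    with open(in_path, encoding="utf8") as fin, \
            open(out_path, "w", encoding="utf8") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                # Skip blank lines rather than failing on them.
                continue
            example = json.loads(line)
            # Assumption: the annotation decision lives under "answer".
            if example.get("answer") == "reject":
                example["answer"] = "ignore"
                changed += 1
            # All other keys are written back unchanged.
            fout.write(json.dumps(example) + "\n")
    return changed
```

Calling rejects_to_ignores("dataset-old.jsonl", "dataset-new.jsonl") would then produce the file to load back in with db-in. If only some of the rejects were really invalid examples, you would need an extra condition in the if statement to pick those out.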

joebuckle · November 19, 2022, 9:31am

Thank you for your suggestions.
