Are 'Reject' examples included in textcat_multilabel train/train-curve?

Good day,

We made the mistake of rejecting invalid examples instead of ignoring them. Are rejected examples included in training? If they are, is there a way to exclude them, or to update the datasets in the DB so that the rejects become ignores? We are on Prodigy v1.11.8.

Thanks,
Joe

"It depends"

What recipes have you used? If you're using a binary annotation interface, then the rejected examples could indicate the absence of a class in a classification task, in which case I don't think they will be ignored.

We created a custom recipe that allows us to choose multiple categories for an audio file and, at the same time, edit the transcript of the audio file. So we are using "choice" and "text_input" as blocks. Are those binary annotation interfaces?

Is there a way for us to change the "reject" examples in the DB to "ignore"? And does this mean that rejected examples can affect the training of the model?

I suppose the cleanest way to do this is to do it programmatically.

You can use the db-out command to save the data on disk temporarily. Something like:

python -m prodigy db-out dataset-name dataset-old.jsonl
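Before changing anything, it may help to check how many examples in the export were actually rejected. This is a minimal sketch, assuming each line of the db-out file is one annotated task whose "answer" key holds "accept", "reject" or "ignore"; the function name count_answers is hypothetical and not part of Prodigy.

```python
import json
from collections import Counter


def count_answers(lines):
    """Count the "answer" values in an iterable of JSONL lines (e.g. an open file)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at the end of the file
        task = json.loads(line)
        # Tasks without an "answer" key are counted separately so they stand out
        counts[task.get("answer", "<missing>")] += 1
    return counts
```

Usage: open dataset-old.jsonl with `open("dataset-old.jsonl", encoding="utf8")` and pass the file object to count_answers to see the accept/reject/ignore split.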

Then you can use a Python script that takes dataset-old.jsonl as input and changes it. You can also use a Jupyter notebook if you prefer. Once you've made the changes, save the result as dataset-new.jsonl; this new dataset can then be loaded via db-in.
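The conversion step could look roughly like the sketch below. It assumes, as above, that each line of dataset-old.jsonl is one task with an "answer" key, and it only relabels "reject" as "ignore", leaving every other field untouched. The names reject_to_ignore and convert_file are hypothetical helpers for illustration, not Prodigy functions.

```python
import json


def reject_to_ignore(lines):
    """Yield JSONL lines with every "reject" answer relabelled as "ignore"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        if task.get("answer") == "reject":
            task["answer"] = "ignore"
        # ensure_ascii=False keeps non-ASCII transcript text readable in the output
        yield json.dumps(task, ensure_ascii=False)


def convert_file(in_path, out_path):
    """Read an exported dataset from in_path and write the relabelled copy to out_path."""
    with open(in_path, encoding="utf8") as f_in, open(out_path, "w", encoding="utf8") as f_out:
        for line in reject_to_ignore(f_in):
            f_out.write(line + "\n")
```

Calling `convert_file("dataset-old.jsonl", "dataset-new.jsonl")` produces the file to import with db-in. Writing to a new file rather than overwriting the export keeps the original annotations around in case something goes wrong.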

python -m prodigy db-in new-dataset-name dataset-new.jsonl

Would this suffice? I think it gives you the most freedom. If you want to keep the original dataset name, you can also drop the dataset via prodigy drop after it has been saved on disk, but I recommend caution: you don't want to accidentally throw away all your annotations.


Thank you for your suggestions. :slight_smile: