textcat -- what is the meaning of accept/reject and the None category

beka · December 26, 2018, 8:42am

Hi there,

I find myself really confused by the accept/reject logic of Prodigy, and would love some help to figure it out.

I’m trying to use textcat.teach for annotating a single topic of messages (cancellation requests). There can be other topics, but I’m focusing on one topic for annotation right now.

Using patterns, I bootstrapped the annotation task, and Prodigy now shows me texts with either “cancellation” or “None” as the category, which I need to accept or reject. What exactly does rejecting the “None” category mean? AFAIK each category is a neuron in the output layer of the model. Does “None” behave the same way?

I also tried loading previous data as a dataset to train on, using None/cancellation as the labels with answer=accept for every sample, and the model couldn’t learn anything. Changing all samples to label=cancellation with answer=accept or answer=reject “fixed” the issue, but I don’t entirely understand why.
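A minimal sketch of how a binary accept/reject decision on a single label can be read as a training target, assuming the usual task fields `text`, `label` and `answer`. The helper `binary_to_target` is hypothetical and only illustrates the idea; it is not Prodigy's internal training code. With one label and both answers present, that label receives both positive (1.0) and negative (0.0) targets; an all-accept dataset only ever supplies positives for each label it mentions.

```python
from typing import Dict, List

def binary_to_target(task: Dict) -> Dict[str, float]:
    """Hypothetical: map one accept/reject decision on one label to a target."""
    if task["answer"] == "accept":
        return {task["label"]: 1.0}  # the text IS this category
    if task["answer"] == "reject":
        return {task["label"]: 0.0}  # the text is NOT this category
    return {}  # "ignore" (or anything else) contributes no target

tasks: List[Dict] = [
    {"text": "Please cancel my subscription", "label": "CANCELLATION", "answer": "accept"},
    {"text": "Where is my invoice?", "label": "CANCELLATION", "answer": "reject"},
]
for t in tasks:
    print(t["text"], binary_to_target(t))
```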

My last question: what does accepting “None” mean? Since I focus on one topic per annotation session, accepting “None” means to me that the text isn’t the category I’m annotating, though it may well belong to another category. In this case, should I merge the datasets, or can I let Prodigy read all the datasets and it will figure this out automagically?
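The thread doesn't say how Prodigy combines separate single-label datasets, so this is only a sketch of one way to think about the merge, assuming each dataset holds binary tasks with `text`, `label` and `answer` fields. The function `merge_binary_datasets` and the `BILLING` label are hypothetical. Each text ends up with a dict containing only the labels it was actually annotated for.

```python
from collections import defaultdict
from typing import Dict, List

def merge_binary_datasets(*datasets: List[Dict]) -> Dict[str, Dict[str, float]]:
    """Hypothetical: combine per-topic binary datasets into one record per text."""
    merged: Dict[str, Dict[str, float]] = defaultdict(dict)
    for dataset in datasets:
        for task in dataset:
            if task["answer"] == "accept":
                merged[task["text"]][task["label"]] = 1.0
            elif task["answer"] == "reject":
                merged[task["text"]][task["label"]] = 0.0
            # "ignore" answers are skipped; unannotated labels stay absent
    return dict(merged)

cancellation = [
    {"text": "Please cancel my plan", "label": "CANCELLATION", "answer": "accept"},
    {"text": "My invoice is wrong", "label": "CANCELLATION", "answer": "reject"},
]
billing = [
    {"text": "My invoice is wrong", "label": "BILLING", "answer": "accept"},
]
print(merge_binary_datasets(cancellation, billing))
```

Leaving unannotated labels out, rather than defaulting them to 0.0, keeps a reject on CANCELLATION ("not a cancellation") from being read as "not any other topic either".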

Thanks a lot,
Beka

ines · December 27, 2018, 11:46am

Hi! Your workflow definitely sounds good. In general, Prodigy should only ask you about the label you explicitly specified as --label and/or the labels present in your patterns. So None as a category looks suspicious and it's not something that's hard-coded into Prodigy. Giving accept/reject feedback on texts plus label is all you need.

Is it possible that your patterns ended up with a null, e.g. None value for "label"? This could explain how the string "None" became a label suggestion. (I think internally, Prodigy might do a string conversion to ensure that the label is a string – but str(None) == 'None', which is kinda unfortunate. We should probably at least show a warning if an NER or text classification label is "None" – because I'd say in 99% of the cases, this is likely not what the user wants.)
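The explanation above hinges on a plain Python fact: `str(None) == 'None'`. The sketch below assumes a JSONL patterns file where each line has `"label"` and `"pattern"` keys. It shows how a null label turns into the string "None" and adds the kind of warning suggested in the reply. `labels_from_patterns` is a hypothetical helper, not part of Prodigy, and whether Prodigy converts labels this way internally is only a guess in the thread.

```python
import json
import warnings
from typing import List

# Two pattern lines as they might appear in a JSONL patterns file.
# The second one has no label set, so JSON gives us null -> Python None.
PATTERNS_JSONL = """\
{"label": "CANCELLATION", "pattern": [{"lower": "cancel"}]}
{"label": null, "pattern": [{"lower": "unsubscribe"}]}
"""

def labels_from_patterns(jsonl: str) -> List[str]:
    """Hypothetical: collect labels as strings, warning about null/'None' labels."""
    labels = []
    for lineno, line in enumerate(jsonl.splitlines(), start=1):
        if not line.strip():
            continue
        label = json.loads(line).get("label")
        # str(None) == "None", so a missing label silently looks like a real one
        as_text = str(label)
        if label is None or as_text == "None":
            warnings.warn(f"pattern on line {lineno} has label {label!r}")
        labels.append(as_text)
    return labels

print(labels_from_patterns(PATTERNS_JSONL))
```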

beka · December 31, 2018, 12:53pm

Thanks for your reply!

I will re-check my flow. I think I may have first created some patterns without a category and then changed it, but maybe I did this in an unclean way and somehow the null/None category stayed there.

beka · January 7, 2019, 7:48am

For future reference of people getting to this thread: My mistake was running ‘prodigy terms.to-patterns’ without specifying a label for them.

ines · January 7, 2019, 10:06am

Thanks for the update and sorry about that! I’m not 100% sure if there’s a reason --label isn’t a required argument on terms.to-patterns – but it definitely should be to prevent this kind of problem.
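Making the label required can be sketched with a hypothetical converter. `terms_to_patterns` here is not Prodigy's recipe, and the exact pattern shape the real `terms.to-patterns` recipe writes may differ. Because `label` has no default, calling it without one raises a `TypeError` immediately, and an empty or "None" label is rejected explicitly instead of being written out.

```python
import json
from typing import Iterable, List

def terms_to_patterns(terms: Iterable[str], label: str) -> List[str]:
    """Hypothetical converter: one token-pattern JSONL line per seed term."""
    if not label or label == "None":
        raise ValueError("a non-empty label is required")
    lines = []
    for term in terms:
        # One {"lower": ...} token dict per whitespace-separated word
        tokens = [{"lower": word.lower()} for word in term.split()]
        lines.append(json.dumps({"label": label, "pattern": tokens}))
    return lines

for line in terms_to_patterns(["cancel", "close my account"], "CANCELLATION"):
    print(line)
```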
