I find myself really confused by the accept/reject logic of Prodigy, and would love some help to figure it out.
I’m trying to use textcat.teach for annotating a single topic of messages (cancellation requests). There can be other topics, but I’m focusing on one topic for annotation right now.
I bootstrapped the annotation task using patterns, and Prodigy shows me texts with either “cancellation” or “None” as the category, which I need to accept or reject. What exactly does rejecting the “None” category mean? AFAIK each category is a neuron in the output layer of the model. Does “None” behave the same way?
I also tried to load previous data as a dataset to train on, using the None/cancellation categories as labels with answer=accept for all the samples, and the model couldn’t learn anything. Changing the samples so that they all have label=cancellation, with answer=accept or answer=reject, “fixed” the issue, but I don’t entirely get why.
The last question I have here is: what does accepting “None” mean? If I focus on one topic per annotation session, accepting “None” means to me that the text is not in the category I’m annotating, but it may well belong to another category. In this case, should I merge the datasets, or can I let Prodigy read all the datasets and figure this out automagically?
Hi! Your workflow definitely sounds good. In general, Prodigy should only ask you about the label you explicitly specified as --label and/or the labels present in your patterns. So None as a category looks suspicious and it's not something that's hard-coded into Prodigy. Giving accept/reject feedback on texts plus label is all you need.
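For data that was labelled earlier, this usually comes down to one label per example, with the answer carrying the yes/no decision. Here's a minimal sketch of that conversion, assuming each record is a dict with "text", "label" and "answer" keys and that every old record was stored with answer=accept. The helper name `to_binary_tasks` and the sample texts are made up for illustration and are not part of Prodigy:

```python
def to_binary_tasks(examples, target_label="cancellation"):
    """Turn 'label or None, always accepted' records into
    accept/reject decisions about a single target label."""
    tasks = []
    for eg in examples:
        # Anything that is not the target label counts as a rejection of it.
        is_target = eg.get("label") == target_label
        tasks.append({
            "text": eg["text"],
            "label": target_label,
            "answer": "accept" if is_target else "reject",
        })
    return tasks


# Hypothetical sample records in the "everything is accept" style.
old_examples = [
    {"text": "Please cancel my subscription.", "label": "cancellation", "answer": "accept"},
    {"text": "Where is my invoice?", "label": None, "answer": "accept"},
]

binary_tasks = to_binary_tasks(old_examples)
for task in binary_tasks:
    print(task)
```

After the conversion, the model sees both positive and negative evidence for the same label, rather than only accepted examples spread over two labels.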
Is it possible that your patterns ended up with a null value for "label" (None in Python)? This could explain how the string "None" became a label suggestion. (I think internally, Prodigy might do a string conversion to ensure that the label is a string, but str(None) == 'None', which is kinda unfortunate. We should probably at least show a warning if an NER or text classification label is "None", because I'd say in 99% of the cases, this is likely not what the user wants.)
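If you want to check this quickly, here's a rough sketch that scans pattern entries for a missing, null or literal "None" label. It assumes the patterns are JSONL with one object per line and a "label" key; `find_bad_labels` and the sample lines are hypothetical and only there for illustration:

```python
import json


def find_bad_labels(jsonl_lines):
    """Return 1-based line numbers of pattern entries whose label is
    missing, null, empty, or the literal string 'None'."""
    bad = []
    for line_no, line in enumerate(jsonl_lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        label = entry.get("label")  # None if the key is absent or null
        if label is None or str(label).strip() in ("", "None"):
            bad.append(line_no)
    return bad


# Hypothetical pattern entries; in practice you would read them from your file.
sample_lines = [
    '{"label": "cancellation", "pattern": [{"lower": "cancel"}]}',
    '{"label": null, "pattern": [{"lower": "refund"}]}',
    '{"pattern": [{"lower": "unsubscribe"}]}',
]

bad_lines = find_bad_labels(sample_lines)
print(bad_lines)  # [2, 3]
```

To run it on a real file, you could pass the open file object instead of the list, since iterating over a file also yields one line at a time.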
I will re-check my flow. I think I may have first created some patterns without a category and then changed it, but maybe I did this in an unclean fashion and somehow the null/None category stayed there.
Thanks for the update and sorry about that! I’m not 100% sure if there’s a reason --label isn’t a required argument on terms.to-patterns – but it definitely should be to prevent this kind of problem.