textcat -- what is the meaning of accept/reject and the None category

#1

Hi there,

I find myself really confused by the accept/reject logic of Prodigy, and would love some help to figure it out.

I’m trying to use textcat.teach for annotating a single topic of messages (cancellation requests). There can be other topics, but I’m focusing on one topic for annotation right now.

Using patterns, I bootstrapped the annotation task, and Prodigy now shows me texts with either “cancellation” or “None” as the category, which I need to accept or reject. What exactly does rejecting the “None” category mean? AFAIK each category is a neuron in the output layer of the model. Does “None” behave the same way?

I also tried loading previous data as a dataset to train on, using None/cancellation as the labels with answer=accept for all samples, and the model couldn’t learn anything. Changing all samples to label=cancellation with answer=accept/reject “fixed” the issue, but I don’t entirely understand why.
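To make that fix concrete, here is a minimal sketch that converts previously labeled data into binary examples for a single label. It assumes each old record is a dict with a "text" field and a "category" field holding either "cancellation" or "None". The field name "category" and the function name `to_binary_examples` are hypothetical. The output uses the "text", "label" and "answer" keys that binary textcat annotations use.

```python
import json
from typing import Dict, Iterable, List

TARGET_LABEL = "cancellation"  # the single label being annotated


def to_binary_examples(
    records: Iterable[Dict[str, str]], label: str = TARGET_LABEL
) -> List[Dict[str, str]]:
    """Turn multi-category records into accept/reject examples for one label.

    Records whose (hypothetical) "category" field equals `label` become
    answer="accept"; everything else, including "None", becomes
    answer="reject" for that same label.
    """
    examples = []
    for record in records:
        answer = "accept" if record["category"] == label else "reject"
        examples.append({"text": record["text"], "label": label, "answer": answer})
    return examples


if __name__ == "__main__":
    old_data = [
        {"text": "Please cancel my subscription", "category": "cancellation"},
        {"text": "Where is my invoice?", "category": "None"},
    ]
    # One JSON object per line, i.e. JSONL
    for example in to_binary_examples(old_data):
        print(json.dumps(example))
```

The resulting JSONL could then be imported as a dataset (for example with `prodigy db-in`). Every example carries the same label, so that label gets both positive and negative evidence. With only accepted answers, each label sees no negative examples at all, which is one plausible reason the first attempt did not learn.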

My last question is: what does accepting “None” mean? Since I focus on one topic per annotation, accepting “None” means to me that the text is not the category I’m annotating, but it may well belong to another category. In this case, should I merge the datasets, or can I let Prodigy read all the datasets and figure this out automagically?

Thanks a lot,
Beka


(Ines Montani) #2

Hi! Your workflow definitely sounds good. In general, Prodigy should only ask you about the label you explicitly specified as --label and/or the labels present in your patterns. So None as a category looks suspicious and it’s not something that’s hard-coded into Prodigy. Giving accept/reject feedback on texts plus label is all you need.

Is it possible that your patterns ended up with a null value (i.e. None) for "label"? This could explain how the string "None" became a label suggestion. (I think internally, Prodigy might convert the label to a string to make sure it is one, but str(None) == 'None', which is kinda unfortunate. We should probably at least show a warning if an NER or text classification label is "None", because I’d say in 99% of cases, this is not what the user wants.)
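The string-conversion pitfall is easy to reproduce in plain Python, and a small check along the lines of the suggested warning could catch it before annotation starts. This is a sketch, not part of Prodigy: the function name `find_unlabeled_patterns` is hypothetical, and it assumes a JSONL patterns file where each line has "label" and "pattern" keys.

```python
import json
from typing import List


def find_unlabeled_patterns(jsonl_text: str) -> List[int]:
    """Return 1-based line numbers of patterns with a missing or 'None' label.

    Flags a missing "label" key, a JSON null (None in Python), an empty
    string, and the literal string "None" that str(None) would produce.
    """
    bad_lines = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("label") in (None, "", "None"):
            bad_lines.append(line_no)
    return bad_lines


if __name__ == "__main__":
    # str(None) silently becomes a plausible-looking label
    print(str(None) == "None")  # True
    patterns = "\n".join([
        '{"label": "cancellation", "pattern": [{"lower": "cancel"}]}',
        '{"label": null, "pattern": [{"lower": "unsubscribe"}]}',
    ])
    print(find_unlabeled_patterns(patterns))  # [2]
```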


#3

Thanks for your reply!

I will re-check my flow. I think I may have first created some patterns without a category and then changed it, but maybe I did this in an unclean way and the null/None category somehow stayed there.


#4

For future reference, for people who find this thread: my mistake was running ‘prodigy terms.to-patterns’ without specifying a label for the patterns.
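If a patterns file was already generated without a label, one way to recover without regenerating it is to fill in the intended label. A minimal sketch, assuming the same JSONL patterns format with "label" and "pattern" keys; `assign_missing_label` is a hypothetical helper, not a Prodigy command, and re-running terms.to-patterns with a label remains the cleaner fix.

```python
import json
from typing import List


def assign_missing_label(jsonl_text: str, label: str) -> str:
    """Set `label` on every pattern whose label is missing, null, empty or "None".

    Patterns that already carry a real label are left unchanged.
    Blank lines are dropped from the output.
    """
    fixed: List[str] = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # drop blank lines
        entry = json.loads(line)
        if entry.get("label") in (None, "", "None"):
            entry["label"] = label
        fixed.append(json.dumps(entry))
    return "\n".join(fixed)


if __name__ == "__main__":
    broken = '{"label": null, "pattern": [{"lower": "cancel"}]}'
    print(assign_missing_label(broken, "cancellation"))
```

Overwriting every unlabeled pattern with one label only makes sense if all those terms belong to that single category, as in this thread.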


(Ines Montani) #5

Thanks for the update and sorry about that! I’m not 100% sure if there’s a reason --label isn’t a required argument on terms.to-patterns, but it definitely should be to prevent this kind of problem.