Problem with annotation

Hello,
Could you please help me with my annotation problem? I annotated 3 classes with "prodigy textcat.manual id_dataset filename.txt --loader TXT --label Positive", which produces records like:
{"text":"...my text ....","_input_hash":173569426,"_task_hash":-2060165333,"label":"Positive","_session_id":null,"_view_id":"classification","answer":"accept"}
I also added Negative and Neutral examples with the same command. When I started to train, I got wrong results: with textcat.batch-train, accuracy is 0 with the -E flag and 1 without it. I understand that something is wrong with my annotation, but I do not know what. I loaded plain text.

I also tried with --label Positive,Negative,Neutral. Then the structure looks like:
{"text":"...text....","_input_hash":-1984147309,"_task_hash":297950317,"options":[{"id":"POSITIVE","text":"POSITIVE"},{"id":"NEGATIVE","text":"NEGATIVE"},{"id":"Neutral","text":"Neutral"}],"_session_id":null,"_view_id":"choice","accept":["NEGATIVE"],"answer":"accept"}
But this also gave an accuracy of 1. Could you please help me with it?
Thank you.

Hi! It looks like you're training from binary examples, but all your annotations have "answer": "accept"? Is that correct? If you're training from binary annotations, you typically want examples where the label applies (e.g. "label": "POSITIVE", "answer": "accept") and examples where the label doesn't apply (e.g. "label": "POSITIVE", "answer": "reject").

That's likely the problem here: the model seems to have learned to always predict that the binary decision is "accept", and since that's always the correct answer in your data, you end up with 100% accuracy.
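For illustration, a binary dataset would typically contain records along these lines (the texts here are placeholders, and the hash and session fields are left out):

{"text": "a clearly positive example ...", "label": "Positive", "answer": "accept"}
{"text": "an example where Positive does not apply ...", "label": "Positive", "answer": "reject"}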

(Btw, just a note: if you want the data to be consistent, make sure you use consistent capitalisation in the labels. Labels are case-sensitive, so you may end up with incompatible data if some examples are annotated Negative and others NEGATIVE.)
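If you do end up with mixed casing, one option is to normalise the labels in a db-out export before training. Here's a minimal sketch, assuming the export lives in a file called annotations.jsonl and that the mapping below matches your labels (both the filename and the mapping are just examples):

import json

# Map each lowercased label to one canonical spelling (example mapping, adjust to your labels)
CANONICAL = {"positive": "Positive", "negative": "Negative", "neutral": "Neutral"}

def fix(label):
    return CANONICAL.get(label.lower(), label)

with open("annotations.jsonl", encoding="utf8") as f_in, \
     open("annotations_fixed.jsonl", "w", encoding="utf8") as f_out:
    for line in f_in:
        eg = json.loads(line)
        if "label" in eg:                    # binary-style records
            eg["label"] = fix(eg["label"])
        if "accept" in eg:                   # choice-style records
            eg["accept"] = [fix(label) for label in eg["accept"]]
        for opt in eg.get("options", []):
            opt["id"] = fix(opt["id"])
            opt["text"] = fix(opt["text"])
        f_out.write(json.dumps(eg) + "\n")

The fixed file could then be imported into a fresh dataset with db-in.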

Thank you for your answer. I tried both (binary and selection from several options).
Interestingly, when I used a dataset with the flags --label Positive,Negative,Neutral -E but only annotated 2 classes (Positive + Negative), it worked fine. At least, I think it worked with 2 classes, the -E flag, and multiclass annotation.

But when I added a third class, Neutral, to the same dataset, I got strange output.

So it seems to be OK for 2 classes, but not for 3, even though I added it in the same way as the 2 before:
{"text":"....text.... .","_input_hash":1435220674,"_task_hash":1052675403,"options":[{"id":"Positive","text":"Positive"},{"id":"Negative","text":"Negative"},{"id":"Neutral","text":"Neutral"}],"_session_id":null,"_view_id":"choice","accept":["Neutral"],"answer":"accept"}
{"text":"-.....text.....","_input_hash":-1153493732,"_task_hash":-1670976934,"options":[{"id":"Positive","text":"Positive"},{"id":"Negative","text":"Negative"},{"id":"Neutral","text":"Neutral"}],"_session_id":null,"_view_id":"choice","accept":["Positive"],"answer":"accept"}
Classes were annotated with the command "prodigy textcat.manual datesetname neutral.json --label Positive,Negative,Neutral -E".
I am not sure how to add the first class correctly.

How exactly are you adding the other class? Are you re-annotating all your data from scratch, or are you just adding more annotations to the same dataset?

If you're adding to the same dataset, that could explain what's going on, because you'd essentially end up with some annotations with 2 labels and some annotations with 3 labels, which can lead to inconsistent results.
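One way to confirm this is to export the dataset with db-out and check whether all records share the same option set. A minimal sketch, assuming the export is saved as dataset.jsonl (the filename is hypothetical):

import json
from collections import Counter

option_sets = Counter()

with open("dataset.jsonl", encoding="utf8") as f:
    for line in f:
        eg = json.loads(line)
        # Collect the set of label options each example was annotated with
        ids = tuple(sorted(opt["id"] for opt in eg.get("options", [])))
        option_sets[ids] += 1

for ids, count in option_sets.items():
    print(count, "examples with options:", list(ids))
# More than one line of output means the dataset mixes different label sets.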

Thank you for your answer.
I added more to the same dataset.
But even when I started with Positive/Negative, I always wrote --label Positive,Negative,Neutral. At least, when I do db-out, I can see that the text from the first part also lists all 3 classes in its options, but the accept values are only Negative or Positive, e.g.:
[{"id":"Positive","text":"Positive"},{"id":"Negative","text":"Negative"},{"id":"Neutral","text":"Neutral"}],"_session_id":null,"_view_id":"choice","accept":["Negative"],"answer":"accept"}

I will try to annotate all three classes at once. Thank you.

It works after re-annotating as you suggested. Thank you for your help!