False Results from Trained Models

ner
spacy
bug

(Zain Muhammad) #1

I annotated my documents using ner.manual and added an extra label named POS, but after training, the model produces a lot of false tags, and strange ones too. Many words in the documents are labeled as WORK_OF_ART by the trained model.
Please help. I assume this is caused by the extra label I added.


(Matthew Honnibal) #2

How many examples did you annotate? And did you annotate for all entity types, or only your new label?
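
One way to check both numbers is to count the answers and span labels in an export of the dataset. The following is a minimal sketch, assuming the annotations were exported to JSONL (for example with `prodigy db-out`) and that each record uses Prodigy's usual `text`, `spans` and `answer` keys. The function name `summarize_annotations` and the sample records are made up for illustration.

```python
import json
from collections import Counter


def summarize_annotations(lines):
    """Count answers and entity labels in Prodigy-style JSONL lines."""
    answers = Counter()
    labels = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        answer = task.get("answer", "missing")
        answers[answer] += 1
        if answer != "accept":
            continue  # only count spans from accepted examples
        for span in task.get("spans", []):
            labels[span["label"]] += 1
    return answers, labels


# Hypothetical sample records standing in for an exported dataset.
sample = [
    '{"text": "Jane is the CEO of Acme", "spans": [{"start": 0, "end": 4, "label": "PERSON"}, {"start": 12, "end": 15, "label": "POSITION"}], "answer": "accept"}',
    '{"text": "See you on Monday", "spans": [{"start": 11, "end": 17, "label": "DATE"}], "answer": "accept"}',
    '{"text": "the of and", "spans": [], "answer": "reject"}',
]

answers, labels = summarize_annotations(sample)
print(dict(answers))
print(labels.most_common())
```

For a real export, an open file handle can be passed instead of `sample`, since the function only iterates over lines. A label with very few spans, or a large share of non-accepted answers, would be worth knowing about before tuning anything else.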


(Zain Muhammad) #3

Almost 1400 examples, and I annotated all entity types including the new label, so there were 19 labels in total.


(Matthew Honnibal) #4

What’s the accuracy look like during ner.batch-train? You might just need to change the settings slightly, e.g. change the batch size to something smaller, increase the dropout, etc.
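
As a rough sketch of what changing those settings could look like, the snippet below builds a `ner.batch-train` command with a smaller batch size and a higher dropout. The flag names (`--batch-size`, `--dropout`, `--n-iter`, `--eval-split`) are the ones I recall from the Prodigy v1.x recipe, so it is worth confirming them with `prodigy ner.batch-train --help`. The helper `batch_train_command` is hypothetical, and the values are placeholders to experiment with, not recommendations.

```python
import sys


def batch_train_command(dataset, model, batch_size=4, dropout=0.3,
                        n_iter=10, eval_split=0.2):
    """Build the argument list for a Prodigy ner.batch-train run."""
    return [
        sys.executable, "-m", "prodigy", "ner.batch-train", dataset, model,
        "--batch-size", str(batch_size),  # smaller batches
        "--dropout", str(dropout),        # more regularization
        "--n-iter", str(n_iter),          # number of training iterations
        "--eval-split", str(eval_split),  # fraction held out for evaluation
    ]


cmd = batch_train_command("my_dataset", "en_core_web_sm")
# Print the command without the interpreter path, for copy-pasting.
print("python " + " ".join(cmd[1:]))

# To actually run the training (requires Prodigy to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Holding out an evaluation split makes the reported accuracy easier to compare between runs with different settings.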


(Zain Muhammad) #5

Sir, I don't think it's about batch size: just for testing purposes, I trained on a new dataset with only 15 annotations, and the problem still persists, with most stop words labeled as WORK_OF_ART. Here is the command I am running with the added label:

python -m prodigy ner.manual my_dataset en_core_web_sm C:\Users\Inv.prodigy\non_po.jsonl --label POSITION,CARDINAL,DATE,EVENT,FAC,GPE,LANGUAGE,LAW,LOC,MONEY,NORP,ORDINAL,ORG,PERCENT,PERSON,PRODUCT,QUANTITY,TIME,WORK_OF_ART

Here you can see that POSITION is my newly added label.
I still think the problem is caused by adding this new label.


(Zain Muhammad) #6

Please respond


(Matthew Honnibal) #7

I think we’ve found a bug that might explain this behaviour. I’m hoping we can get you a mitigation that you can use with your current installation; but failing that, we’ll definitely have a fix for the next release.


(Zain Muhammad) #8

Oh, I will be waiting then.


(Zain Muhammad) #9

I am still facing this issue. Please suggest a solution.


(Ines Montani) #10

Which version of Prodigy are you using? Are you on the latest version, v1.7.1?


(Zain Muhammad) #11

yes!


(Ines Montani) #12

Oh and which version of spaCy are you running?


(Zain Muhammad) #13

[Screenshot; image not preserved]


(Matthew Honnibal) #14

Could you provide the command you’re using for batch-train, with the arguments?

We’ve made some improvements in spaCy v2.1 to how the model learns when a new label is added. I would still expect it to work in v2.0 of spaCy too though, so I’m not sure what’s going on. It may be a question of getting the right training hyper-parameters. It’s also possible that your annotations are difficult to learn from, for instance if your definition of an entity is very different from the original model’s.


(Zain Muhammad) #15


(Matthew Honnibal) #16

Try adding the --no-missing argument. You might also try passing en_vectors_web_lg to train from a blank model, instead of training from en_core_web_sm.
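
To make the effect of --no-missing concrete: without the flag, tokens you did not highlight are treated as unknown (missing values) rather than as definitely not entities; with it, every unannotated token is assumed to be outside an entity. Since the data was fully annotated with ner.manual, the second reading is the accurate one. The sketch below illustrates the difference on BILUO-style tags, using `-` for missing and `O` for outside as spaCy v2 does. It is only an illustration of the idea, not Prodigy's or spaCy's actual implementation; the function name `spans_to_tags` and the hand-written token offsets are hypothetical.

```python
def spans_to_tags(tokens, spans, no_missing):
    """Turn character-offset spans into BILUO tags, one per token.

    tokens: list of (start, end) character offsets, one per token.
    spans:  list of dicts with "start", "end" and "label" keys.
    """
    # Unannotated tokens are "O" (outside) or "-" (missing/unknown).
    fill = "O" if no_missing else "-"
    tags = [fill] * len(tokens)
    for span in spans:
        # Indices of the tokens fully covered by this span.
        idx = [i for i, (start, end) in enumerate(tokens)
               if start >= span["start"] and end <= span["end"]]
        if not idx:
            continue
        label = span["label"]
        if len(idx) == 1:
            tags[idx[0]] = "U-" + label
        else:
            tags[idx[0]] = "B-" + label
            for i in idx[1:-1]:
                tags[i] = "I-" + label
            tags[idx[-1]] = "L-" + label
    return tags


# "Alice works at Acme Corp", split on whitespace.
tokens = [(0, 5), (6, 11), (12, 14), (15, 19), (20, 24)]
spans = [
    {"start": 0, "end": 5, "label": "PERSON"},
    {"start": 15, "end": 24, "label": "ORG"},
]

with_missing = spans_to_tags(tokens, spans, no_missing=False)
without_missing = spans_to_tags(tokens, spans, no_missing=True)
print(with_missing)
print(without_missing)
```

In the first case the model gets no signal about "works" and "at", so it is free to keep or invent predictions for them. In the second case it is told explicitly that they are not entities.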

Btw, it would be helpful if you could paste snippets as text, instead of as screenshots. You can use three backtick characters (```) to delimit literal text.


(Zain Muhammad) #17

Thanks a lot, I will try this and let you know.