Understanding and improving POS training.

pos
usage

(Abhishek Dwaraki) #1

Hi @ines,
I am working on NER and POS tagging in parallel. Last night I did some annotating and training on the POS tags PROPN and VERB, since I noticed they were off in some cases. I am trying to understand the accuracy results for POS training and why they are so low in my case. I did close to 1000 annotations for my dataset, and here are the results:

[Abhishek:~] [NM-NLP] $ prodigy pos.batch-train net_pos_tags en_core_web_sm --output-model /tmp/models/net_labels_ner --eval-split 0.2

Loaded model en_core_web_sm
Using 20% of accept/reject examples (82) for evaluation
Using 100% of remaining examples (329) for training
Dropout: 0.2  Batch size: 4  Iterations: 10


BEFORE     0.618
Correct    42
Incorrect  26
Unknown    1063


#          LOSS       RIGHT      WRONG      ACCURACY
01         14.438     3          68         0.042
02         15.755     4          67         0.056
03         15.684     3          68         0.042
04         15.335     4          67         0.056
05         15.803     4          67         0.056
06         15.770     4          67         0.056
07         15.586     4          65         0.058
08         15.514     4          67         0.056
09         11.585     0          62         0.000
10         11.006     0          61         0.000

Correct    4
Incorrect  65
Baseline   0.618
Accuracy   0.058

I would like to understand this better, since my NER training results were pretty decent and improved with more training. In the POS case, accuracy drops well below the baseline once training starts. Is it because I am training on coarse POS tags? Prodigy is clearly learning in the loop, since the scores for the POS tags keep changing and going up as I annotate and accept more examples, but these results say something else.

Thank you in advance.


(Matthew Honnibal) #2

I’m not 100% sure what the issue is here, but are you sure you did close to 1000 annotations? It looks like it’s only learning from 329, and with only the two coarse-grained labels to learn from, that could be the problem.
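
As a rough sanity check on the figures in the log above, here is a minimal sketch in plain Python. It is not Prodigy's own code: the `accuracy` helper is a hypothetical name, and it assumes the reported accuracy is simply right / (right + wrong), which matches the printed values. The counts are copied from the log.

```python
def accuracy(right, wrong):
    """Fraction of scored examples that were right (assumed formula)."""
    total = right + wrong
    return right / total if total else 0.0

# Counts copied from the training log above.
baseline = accuracy(42, 26)   # "BEFORE" line
final = accuracy(4, 65)       # best epoch (07) and final summary

eval_examples = 82            # "Using 20% of accept/reject examples (82)"
train_examples = 329          # "Using 100% of remaining examples (329)"
total_examples = eval_examples + train_examples

print("baseline accuracy:", round(baseline, 3))
print("final accuracy:   ", round(final, 3))
print("examples used:    ", total_examples)
print("eval fraction:    ", round(eval_examples / total_examples, 3))
```

The two accuracies reproduce the 0.618 and 0.058 in the log, and the split adds up to 411 accept/reject examples rather than roughly 1000, which is the gap worth checking in the dataset.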