POS-tags messed up after ner.batch-train

spaCy: 2.0.11
Prodigy: 1.4.2

Hi,

I started a new Dutch model:

  • init-model with freqs, pruned vectors, and clusters
  • trained the model for the tagger and parser (not yet for NER!)

If I use this model, it gives good results:
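
(For context, the doc in the snippets below comes from loading the model on a test sentence, roughly like this; the path is a placeholder:)

import spacy

nlp = spacy.load('/home/prodigy/nl_model')  # placeholder path to the base model
doc = nlp('In de aanpak van de wachttijden in de ggz')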

>>> nlp.pipe_names
['tagger', 'parser']

>>> for x in doc:
...     print('%8s %12s %30s %30s' % (x.pos_, x.dep_, x.tag_, x.text))
... 
     ADP         case                        VZ|init                             In
     DET          det              LID|bep|stan|rest                             de
    NOUN          obl     N|soort|ev|basis|zijd|stan                         aanpak
     ADP         case                        VZ|init                            van
     DET          det              LID|bep|stan|rest                             de
    NOUN         nmod               N|soort|mv|basis                    wachttijden
     ADP         case                        VZ|init                             in
     DET          det              LID|bep|stan|rest                             de
    NOUN         nmod     N|soort|ev|basis|zijd|stan                            ggz

So far, so good. After this, I wanted to train the NER pipe:

prodigy ner.batch-train NER_TOTAL_001 nl_md --output /home/prodigy/trained_20180416/  --n-iter 4 --eval-split 0.2 --label "PER,ORG,NORP,ORG_C,PER_C,GPE,LOC"


#          LOSS       RIGHT      WRONG      ENTS       SKIP       ACCURACY  
01         30.583     1923       769        2848       0          0.714                                                         
02         21.804     2205       487        3069       0          0.819                                                         
03         19.325     2246       446        3047       0          0.834                                                         
04         19.151     2267       425        3076       0          0.842 

If I now print the POS tags, every token comes out as ADJ
(the NER itself is working fine now):

>>> nlp.pipe_names
['sbd', 'tagger', 'parser', 'ner']


>>> for x in doc:
...     print('%8s %12s %30s %30s' % (x.pos_, x.dep_, x.tag_, x.text))

 ADJ         case    ADJ|prenom|basis|met-e|stan                             In
 ADJ          det    ADJ|prenom|basis|met-e|stan                             de
 ADJ          obl    ADJ|prenom|basis|met-e|stan                         aanpak
 ADJ         case    ADJ|prenom|basis|met-e|stan                            van
 ADJ          det    ADJ|prenom|basis|met-e|stan                             de
 ADJ         nmod    ADJ|prenom|basis|met-e|stan                    wachttijden
 ADJ         case    ADJ|prenom|basis|met-e|stan                             in
 ADJ          det    ADJ|prenom|basis|met-e|stan                             de
 ADJ         nmod    ADJ|prenom|basis|met-e|stan                            ggz
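
A quick sanity check, to rule out the sbd pipe that ner.batch-train added: remove it and run the tagger again. A minimal sketch (nlp.remove_pipe is spaCy 2.0's API for dropping a component; the path is a placeholder):

import spacy

nlp = spacy.load('/home/prodigy/trained_20180416')
if 'sbd' in nlp.pipe_names:
    nlp.remove_pipe('sbd')  # drop the sentence boundary detector
doc = nlp('In de aanpak van de wachttijden in de ggz')
print([(x.text, x.pos_) for x in doc])  # are the tags still all ADJ?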

I found a difference in the cfg files in vocab/parser and vocab/tagger. I don't know if this is significant? This text was added after ner.batch-train:

  "deprecation_fixes":{
    "vectors_name":"nl_model.vectors"
  },
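
(The cfg files are plain JSON, so the comparison is easy to reproduce; a sketch that scans a model directory for them, with a placeholder path:)

import json
from pathlib import Path

model_dir = Path('/home/prodigy/trained_20180416')  # placeholder path
for cfg_path in sorted(model_dir.glob('**/cfg')):
    try:
        cfg = json.loads(cfg_path.read_text())
    except ValueError:
        continue  # skip non-JSON files
    if 'deprecation_fixes' in cfg:
        print(cfg_path, cfg['deprecation_fixes'])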

My questions:

  • What can I do to keep the tagger returning the right POS tags (the pos_ and tag_ fields)?
  • During ner.batch-train the sbd pipe was added. Can I add it to the model at an earlier stage (see the sketch below)? Does this influence the tagger/parser?
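
What I have in mind for adding the sbd pipe up front is roughly this (a sketch, assuming spaCy 2.0's 'sbd'/'sentencizer' factory; the paths are placeholders):

import spacy

nlp = spacy.load('/home/prodigy/nl_model')  # placeholder path
sbd = nlp.create_pipe('sbd')  # sentence boundary detector
nlp.add_pipe(sbd, first=True)  # put it before the tagger and parser
nlp.to_disk('/home/prodigy/nl_model_sbd')  # placeholder output path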

Thanks,

Rob

After trying a few things, I came to the following outcome:

  • keeping n-iter lower than 4 and increasing eval-split from 0.2 to 0.4 gives good results for the pos_ and tag_ fields, and also for NER (see the command below)
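
For example, the same command as above with the adjusted settings (n-iter 3 as an example of "lower than 4"):

prodigy ner.batch-train NER_TOTAL_001 nl_md --output /home/prodigy/trained_20180416/ --n-iter 3 --eval-split 0.4 --label "PER,ORG,NORP,ORG_C,PER_C,GPE,LOC"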

What I don't understand yet:

  • How does training the NER influence the POS tags? (see the check below)
  • What triggers ner.batch-train to add the sbd pipe?
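
One way to check whether the tagger weights themselves were overwritten is to compare the serialized tagger before and after training. A minimal sketch, assuming the spaCy 2 layout where each pipe directory holds a binary model file (paths are placeholders):

from pathlib import Path

before = Path('/home/prodigy/nl_model/tagger/model').read_bytes()  # base model
after = Path('/home/prodigy/trained_20180416/tagger/model').read_bytes()  # Prodigy output
print('tagger weights identical:', before == after)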