spaCy: 2.0.11
Prodigy: 1.4.2
Hi,
I started a new Dutch model:
- init-model with word frequencies, pruned vectors and clusters
- trained the tagger and parser (not yet the NER!)
When I use this model, it gives good results:
>>> nlp.pipe_names
['tagger', 'parser']
>>> for x in doc:
...     print('%8s %12s %30s %30s' % (x.pos_, x.dep_, x.tag_, x.text))
...
ADP case VZ|init In
DET det LID|bep|stan|rest de
NOUN obl N|soort|ev|basis|zijd|stan aanpak
ADP case VZ|init van
DET det LID|bep|stan|rest de
NOUN nmod N|soort|mv|basis wachttijden
ADP case VZ|init in
DET det LID|bep|stan|rest de
NOUN nmod N|soort|ev|basis|zijd|stan ggz
So far, so good. After this, I wanted to train the NER pipe:
prodigy ner.batch-train NER_TOTAL_001 nl_md --output /home/prodigy/trained_20180416/ --n-iter 4 --eval-split 0.2 --label "PER,ORG,NORP,ORG_C,PER_C,GPE,LOC"
# LOSS RIGHT WRONG ENTS SKIP ACCURACY
01 30.583 1923 769 2848 0 0.714
02 21.804 2205 487 3069 0 0.819
03 19.325 2246 446 3047 0 0.834
04 19.151 2267 425 3076 0 0.842
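Judging only from the numbers in this table (not from Prodigy's documentation), the ACCURACY column appears to be RIGHT / (RIGHT + WRONG), and RIGHT + WRONG is 2692 in every epoch. A quick check of that inferred relationship in Python, with the rows copied from the output above:

```python
# Epoch rows copied from the ner.batch-train output:
# (epoch, loss, right, wrong, ents, skip, reported_accuracy)
ROWS = [
    (1, 30.583, 1923, 769, 2848, 0, 0.714),
    (2, 21.804, 2205, 487, 3069, 0, 0.819),
    (3, 19.325, 2246, 446, 3047, 0, 0.834),
    (4, 19.151, 2267, 425, 3076, 0, 0.842),
]


def implied_accuracy(right, wrong):
    """Assumption inferred from the table: ACCURACY = RIGHT / (RIGHT + WRONG)."""
    return right / (right + wrong)


for epoch, loss, right, wrong, ents, skip, reported in ROWS:
    computed = implied_accuracy(right, wrong)
    # The reported value is rounded to three decimals, so compare after rounding.
    print(f"{epoch:02d} reported={reported:.3f} computed={computed:.3f} total={right + wrong}")
```

Each computed value matches the reported one to three decimals, so the NER accuracy itself looks consistent with the loss going down.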
If I now look at the POS tags, every token gets ADJ as pos_ and ADJ|prenom|basis|met-e|stan as tag_ (the NER itself is working fine now):
>>> nlp.pipe_names
['sbd', 'tagger', 'parser', 'ner']
>>> for x in doc:
...     print('%8s %12s %30s %30s' % (x.pos_, x.dep_, x.tag_, x.text))
...
ADJ case ADJ|prenom|basis|met-e|stan In
ADJ det ADJ|prenom|basis|met-e|stan de
ADJ obl ADJ|prenom|basis|met-e|stan aanpak
ADJ case ADJ|prenom|basis|met-e|stan van
ADJ det ADJ|prenom|basis|met-e|stan de
ADJ nmod ADJ|prenom|basis|met-e|stan wachttijden
ADJ case ADJ|prenom|basis|met-e|stan in
ADJ det ADJ|prenom|basis|met-e|stan de
ADJ nmod ADJ|prenom|basis|met-e|stan ggz
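To spot this kind of regression automatically after each training step, a small check on the tag distribution can help. This is a minimal sketch under my own assumptions: the function names and the min_tokens threshold are hypothetical, and in a live session the rows would come from something like [(t.pos_, t.dep_, t.tag_, t.text) for t in doc]. Here the rows are transcribed from the two outputs above (the dependency labels are identical in both runs).

```python
from collections import Counter

# Rows transcribed from the first run (tagger and parser only).
BEFORE = [
    ("ADP", "case", "VZ|init", "In"),
    ("DET", "det", "LID|bep|stan|rest", "de"),
    ("NOUN", "obl", "N|soort|ev|basis|zijd|stan", "aanpak"),
    ("ADP", "case", "VZ|init", "van"),
    ("DET", "det", "LID|bep|stan|rest", "de"),
    ("NOUN", "nmod", "N|soort|mv|basis", "wachttijden"),
    ("ADP", "case", "VZ|init", "in"),
    ("DET", "det", "LID|bep|stan|rest", "de"),
    ("NOUN", "nmod", "N|soort|ev|basis|zijd|stan", "ggz"),
]
# Second run (after ner.batch-train): same tokens and deps, every tag collapsed to ADJ.
AFTER = [("ADJ", dep, "ADJ|prenom|basis|met-e|stan", text) for _pos, dep, _tag, text in BEFORE]


def tag_distribution(rows):
    """Count fine-grained tags in (pos_, dep_, tag_, text) rows."""
    return Counter(tag for _pos, _dep, tag, _text in rows)


def tagger_looks_collapsed(rows, min_tokens=5):
    """Hypothetical heuristic: flag output where every token got the same tag.

    A real sentence of several tokens almost never has one tag throughout,
    so a single-entry distribution points at a broken tagger.
    min_tokens is an arbitrary threshold to avoid flagging tiny inputs.
    """
    return len(rows) >= min_tokens and len(tag_distribution(rows)) == 1


print(tagger_looks_collapsed(BEFORE))  # False: four distinct tags
print(tagger_looks_collapsed(AFTER))   # True: one tag for all nine tokens
```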
I found a difference in the cfg files in vocab/parser and vocab/tagger, and I don't know whether it means anything. This text is added after ner.batch-train:
"deprecation_fixes":{ "vectors_name":"nl_model.vectors" },
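To see exactly which settings a training run changed, the cfg files from before and after can be compared key by key. A minimal sketch, assuming the cfg files are plain JSON (as the snippet above suggests); the paths in the usage note and the placeholder key in the test are hypothetical, and this does not say anything about what deprecation_fixes actually does.

```python
import json
from pathlib import Path

ABSENT = "<absent>"  # placeholder for a key that exists in only one of the two files


def load_cfg(path):
    """Read a component's cfg file, assuming it is plain JSON."""
    return json.loads(Path(path).read_text(encoding="utf8"))


def diff_cfg(before, after):
    """Return {key: (old, new)} for top-level keys that were added, removed or changed."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old = before.get(key, ABSENT)
        new = after.get(key, ABSENT)
        if old != new:
            changes[key] = (old, new)
    return changes
```

Usage (hypothetical paths): diff_cfg(load_cfg("nl_model/tagger/cfg"), load_cfg("trained_20180416/tagger/cfg")) would list the deprecation_fixes entry as added, plus any other key whose value changed.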
My questions:
- What can I do to keep the tagger returning the right POS tags (the pos_ and tag_ attributes)?
- During ner.batch-train, an sbd (sentence boundary detection) pipe was added. Can I add it to the model at an earlier stage? Does it influence the tagger/parser?
Thanks,
Rob