ValueError: [T003] Resizing pretrained Tagger models is not currently supported.

Hello, I used pos.correct to manually correct spaCy's predictions. After completing the annotations, I tried to train the model on the annotated dataset (pos_correct_v2), but got the following error:

ValueError: [T003] Resizing pretrained Tagger models is not currently supported.

Could you please help me figure out what I am doing wrong? Below is the full command and output:

python -m prodigy train tagger pos_correct_v2 en_core_web_md

Loaded model 'en_core_web_md'
Created and merged data for 46 total examples
Traceback (most recent call last):
  File "C:\Users\Name\AppData\Local\Continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Name\AppData\Local\Continuum\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Name\AppData\Local\Continuum\anaconda3\lib\site-packages\prodigy\__main__.py", line 60, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 213, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "C:\Users\Name\AppData\Local\Continuum\anaconda3\lib\site-packages\plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "C:\Users\Name\AppData\Local\Continuum\anaconda3\lib\site-packages\plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Users\Name\AppData\Local\Continuum\anaconda3\lib\site-packages\prodigy\recipes\train.py", line 102, in train
    pipe.add_label(label)
  File "pipes.pyx", line 565, in spacy.pipeline.pipes.Tagger.add_label
ValueError: [T003] Resizing pretrained Tagger models is not currently supported.

Hi! The error message could be a little more specific, sorry about that. What it's trying to tell you is that spaCy currently doesn't support adding new labels to an existing pretrained tagger, and your training data seems to include labels that the model's tagger doesn't have.
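To see which labels are causing this, you could compare the labels in your annotations against the ones the tagger already has. The sketch below assumes each exported example stores its tags as "spans" with a "label" key; the helper name find_unknown_labels is hypothetical, and the model labels are a hand-picked illustrative subset, not what en_core_web_md actually contains.

```python
from typing import Iterable, List, Set


def find_unknown_labels(examples: Iterable[dict], model_labels: Iterable[str]) -> List[str]:
    """Hypothetical helper: labels used in the annotations that the tagger doesn't have."""
    known: Set[str] = set(model_labels)
    used: Set[str] = set()
    for eg in examples:
        # Assumption: each annotated example stores its tags as "spans" with a "label" key
        for span in eg.get("spans", []):
            used.add(span["label"])
    return sorted(used - known)


# In spaCy v2, the real labels could be read with something like:
#   import spacy
#   nlp = spacy.load("en_core_web_md")
#   model_labels = nlp.get_pipe("tagger").labels
model_labels = ["NN", "NNS", "VBD", "VBZ", "DT"]  # illustrative subset only
examples = [
    {"text": "The dog barks", "spans": [
        {"start": 0, "end": 3, "label": "DET"},
        {"start": 4, "end": 7, "label": "NOUN"},
        {"start": 8, "end": 13, "label": "VERB"},
    ]},
]
print(find_unknown_labels(examples, model_labels))  # ['DET', 'NOUN', 'VERB']
```

A non-empty result means those labels would have to be added to the pretrained tagger, which is exactly what triggers T003.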

One possible explanation is that the annotations were collected using coarse-grained tags like VERB, rather than the fine-grained tags like VBZ that are the underlying labels in the model. If that's the case, the easiest workaround would be to set the --binary flag, which uses Prodigy's annotation model that can handle coarse-grained tags.
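A quick way to check whether that's what happened is to test whether every label in the annotations belongs to the 17 coarse-grained Universal Dependencies POS tags. This is a rough heuristic sketched for illustration (looks_coarse_grained is a hypothetical name, not part of Prodigy or spaCy); note that SYM appears in both the universal and the Penn Treebank tag sets, so a label set containing only SYM is ambiguous.

```python
# The 17 coarse-grained Universal Dependencies POS tags
UNIVERSAL_POS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def looks_coarse_grained(labels):
    """Hypothetical heuristic: True if every label is a universal (coarse) POS tag."""
    labels = set(labels)
    # An empty label set tells us nothing, so don't call it coarse-grained
    return bool(labels) and labels <= UNIVERSAL_POS


print(looks_coarse_grained(["VERB", "NOUN", "DET"]))  # True
print(looks_coarse_grained(["VBZ", "NN", "DT"]))      # False
```

If this returns True for the labels in your dataset, the coarse-grained explanation above is the likely one.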

You could also use Prodigy to turn your coarse-grained POS tag annotations into fine-grained annotations by streaming in the data again and adding multiple-choice options based on the fine-grained tags in the tag map (nlp.Defaults.tag_map) – for instance, VBD, VBP, VBZ and so on for VERB. This makes sense if you want the fine-grained distinction in your data; if not, it's probably overkill.
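The first step of that could look roughly like this: invert the tag map so each coarse tag points to its fine-grained tags, then build a choice task whose options are those tags. TAG_MAP here is a simplified stand-in (in spaCy v2 the real map's values use symbol IDs such as POS: VERB rather than a plain "pos" string), and the helper names and task layout are hypothetical; the options list of {"id", "text"} dicts follows the format Prodigy's choice interface uses.

```python
from collections import defaultdict

# Simplified stand-in for nlp.Defaults.tag_map (fine-grained tag -> attributes)
TAG_MAP = {
    "VBD": {"pos": "VERB"},
    "VBP": {"pos": "VERB"},
    "VBZ": {"pos": "VERB"},
    "NN": {"pos": "NOUN"},
    "NNS": {"pos": "NOUN"},
}


def fine_tags_by_coarse(tag_map):
    """Group fine-grained tags under the coarse-grained tag they map to."""
    groups = defaultdict(list)
    for fine, attrs in tag_map.items():
        groups[attrs["pos"]].append(fine)
    return {coarse: sorted(fines) for coarse, fines in groups.items()}


def make_choice_task(text, token, coarse, groups):
    """Hypothetical helper: turn one coarse-grained annotation into a choice task."""
    return {
        "text": text,
        "meta": {"token": token, "coarse": coarse},
        "options": [{"id": tag, "text": tag} for tag in groups.get(coarse, [])],
    }


groups = fine_tags_by_coarse(TAG_MAP)
task = make_choice_task("She walked home", "walked", "VERB", groups)
print([opt["id"] for opt in task["options"]])  # ['VBD', 'VBP', 'VBZ']
```

Only tokens whose coarse tag has more than one fine-grained option would really need a question; the rest could be filled in automatically.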

Hi Ines, thank you for your quick reply!

So pos.correct should only be used with fine-grained tags, is that correct? Because when I used pos.correct in Prodigy with the coarse-grained tags (which is the default), I did not add any new tags; I only updated the existing pre-selected annotations.

I don't really need the fine-grained tags, but if I understand correctly, that's the only way to add/change the annotations suggested by the model - is that correct?

You can use both – it's just that for the coarse-grained tags, we need a bit of extra "magic", which is only available in Prodigy, to be able to update the model with the information. (For instance, if we know that something is a VERB, but we don't know if it's actually VBD, VBZ etc., we can still update the model towards the VERB.) If you add the --binary flag when you run the train command, it will use Prodigy's annotation model with the extra logic needed for coarse-grained tags.
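Prodigy's internals aren't shown in this thread, so as an illustration only, here is one standard way to learn from a coarse label: treat it as "one of these fine-grained tags" and minimize the negative log of their combined probability. The gradient then pushes all compatible tags (e.g. VBD and VBZ for VERB) up together and everything else down, without committing to a single fine-grained tag. This is a swapped-in technique, not necessarily what Prodigy does, and the function names and logit values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a dict of tag -> logit."""
    m = max(logits.values())
    exps = {tag: math.exp(z - m) for tag, z in logits.items()}
    total = sum(exps.values())
    return {tag: e / total for tag, e in exps.items()}


def coarse_loss_and_grad(logits, allowed):
    """Loss = -log P(any allowed tag), plus its gradient w.r.t. each logit."""
    probs = softmax(logits)
    p_allowed = sum(probs[t] for t in allowed)
    loss = -math.log(p_allowed)
    grad = {}
    for tag, p in probs.items():
        # d loss / d z_j = p_j - [j in allowed] * p_j / P(allowed)
        grad[tag] = p - (p / p_allowed if tag in allowed else 0.0)
    return loss, grad


# The annotation only says "VERB", so both verb tags count as correct
logits = {"VBD": 1.0, "VBZ": 0.5, "NN": 2.0}
loss, grad = coarse_loss_and_grad(logits, allowed={"VBD", "VBZ"})
print(loss, grad)
```

With plain gradient descent (subtracting the gradient), the NN logit goes down and both verb logits go up, so the model moves towards VERB without being told which verb tag is right.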

(We should probably make this clear in the docs and also solve this more elegantly in the future. Parts of this were a little tricky because we needed to preserve backwards-compatibility.)

Gotcha, now I understand. Definitely would be helpful to have this explained in the docs.

...but unfortunately, when I added the --binary flag to the train command, I still got the same error.

python -m prodigy train tagger pos_correct_v2 en_core_web_md --binary

Is there something I am doing wrong? Or something more I should be doing?

Ahhh, I think there's a second problem here that I missed earlier: the train recipe also adds all labels present in the data to the model. That makes sense for every other scenario, except the one where you have coarse-grained part-of-speech tags. What happens if you comment out those lines (lines 101-102 in prodigy/recipes/train.py)?
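For context, the failing call in the traceback is pipe.add_label(label) in the train recipe. The stub below is hypothetical and only mimics the behaviour described in this thread; it is not spaCy's or Prodigy's actual code. It shows why coarse-grained labels trigger the error and why skipping them has the same effect as commenting out the loop: labels the model already has don't need adding anyway.

```python
class PretrainedTaggerStub:
    """Hypothetical stand-in that mimics the error from the traceback: a pretrained
    tagger accepts labels it already has, but refuses to grow its label set."""

    def __init__(self, labels):
        self.labels = tuple(labels)

    def add_label(self, label):
        if label in self.labels:
            return  # already known: nothing to resize
        raise ValueError("[T003] Resizing pretrained Tagger models is not currently supported.")


def add_labels(pipe, labels, skip_unknown=False):
    """Hypothetical version of the label-adding loop in the train recipe.

    With skip_unknown=True, labels the model doesn't already have (such as coarse
    tags like VERB) are collected and returned instead of passed to add_label.
    """
    skipped = []
    for label in labels:
        if skip_unknown and label not in pipe.labels:
            skipped.append(label)
            continue
        pipe.add_label(label)
    return skipped


tagger = PretrainedTaggerStub(["NN", "VBZ", "DT"])
try:
    add_labels(tagger, ["NN", "VERB"])
except ValueError as err:
    print(err)  # [T003] Resizing pretrained Tagger models is not currently supported.
print(add_labels(tagger, ["NN", "VERB"], skip_unknown=True))  # ['VERB']
```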

Alternatively, you could also just use the previous pos.batch-train recipe. It's still included with Prodigy, and the plan is to replace it with the new train recipe. But since this one use case isn't fully covered yet, there's nothing wrong with using the old recipe :slightly_smiling_face: