Unable to set attribute 'POS' in tokenizer exception

Hi there,
I just downloaded spacy-streamlit. I have a model that I finished training with the latest version of Prodigy (prodigy-1.11.4), and it works well in spaCy. However, when I try to load and use that model with spacy-streamlit, it starts complaining about tokenization like this:

ValueError: [E1005] Unable to set attribute 'POS' in tokenizer exception for ' '. Tokenizer exceptions are only allowed to specify ORTH and NORM.

Could you please help me?

spacy==3.1.2
spacy-legacy==3.0.8
spacy-streamlit==1.0.2
es-core-news-lg @ https://github.com/explosion/spacy-models/releases/download/es_core_news_lg-3.1.0/es_core_news_lg-3.1.0-py3-none-any.whl
sense2vec==2.0.0
streamlit==0.87.0

Thanks and best regards
Angelo

Hi! It looks like this isn't really related to the Streamlit app and more an issue with the model (you'd likely see this error in any environment when running your model). It basically means that for some reason, you ended up with outdated tokenizer exceptions that are setting part-of-speech tags (which isn't allowed via the tokenizer), so spaCy complains.
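
To make the check concrete, here is a minimal sketch, assuming the exceptions are available as a plain dict that maps each string to a list of per-token attribute dicts (the shape that spaCy's `nlp.tokenizer.rules` uses). The function name `find_invalid_exceptions`, the string keys and the example data are only for illustration; in a real pipeline the keys may be integer attribute IDs, so the allowed set would need to be adjusted.

```python
from typing import Dict, List, Tuple

# Assumption: only these attributes are allowed, as stated in the E1005 message.
ALLOWED_ATTRS = {"ORTH", "NORM"}


def find_invalid_exceptions(
    rules: Dict[str, List[dict]], allowed=ALLOWED_ATTRS
) -> List[Tuple[str, List[str]]]:
    """Return (string, disallowed attribute names) for each offending rule."""
    problems = []
    for string, token_attrs in rules.items():
        # Collect every attribute key across the rule's tokens that is not allowed.
        bad = sorted(
            {str(key) for attrs in token_attrs for key in attrs if key not in allowed}
        )
        if bad:
            problems.append((string, bad))
    return problems


# Hypothetical example data, not taken from the actual model.
example_rules = {
    " ": [{"ORTH": " ", "POS": "SPACE"}],
    "don't": [{"ORTH": "do"}, {"ORTH": "n't", "NORM": "not"}],
}

for string, bad in find_invalid_exceptions(example_rules):
    print(repr(string), "sets disallowed attributes:", bad)
```

Any rule reported here would be the kind of outdated exception that the error above is complaining about.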

Could you share some more details on how the model was trained? Which base language did you use? And did you do anything custom, e.g. add custom tokenizer exceptions?

Hi there Ines,
I am trying to use the model in spaCy after creating it with these commands:

prodigy data-to-spacy /prodigy/fechas_NER_ --ner fechas_NER -l es

python -m spacy train /prodigy/fechas_NER_/config.cfg --paths.train /prodigy/fechas_NER_/train.spacy --paths.dev /prodigy/fechas_NER_/dev.spacy

However, I am not able to use this model.

But if I train with Prodigy like this:

prodigy train /prodigy/fechas_NER_ --lang es --ner fechas_NER

It is working fine.
I know that I might be doing something wrong, but I could not figure out what it is.

thanks
angelo

Thanks for the details, this is really strange :thinking: I have no idea where that tokenizer exception could be coming from. I just tried it with some test data and I can't reproduce this problem.

Just a random idea but could you try it again with a clean install (new virtual environment) of Prodigy and spaCy? Maybe your installation ended up in a weird state?
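
If it helps, here is a small sketch for comparing what is actually installed in the old and the new environment. It only uses the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_versions` is made up for this example, and the package list is an assumption based on the packages mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: the packages relevant to this thread; adjust as needed.
PACKAGES = ["spacy", "spacy-legacy", "spacy-streamlit", "prodigy", "streamlit"]


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this in both environments and comparing the output should show whether the two setups differ.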

If that still doesn't solve it, are you able to share your data? It's fine if you can only do it privately (you can email me at ines@explosion.ai). Then I can try and reproduce it with the exact data so we can maybe track down the problem.

Hi there,
I can send you the training output, but not the raw data, as this information is private.
best regards
angelo