Hey @ines, I hope it's okay if I ask a follow-up in here.
I went ahead and added the encoding="utf-8" directly in my site-packages and the recipe runs now.
I get the following warning though:
entities=ent_str[:50] + "..." if len(ent_str) > 50 else ent_str,
C:\Users\x.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\spacy\training\iob_utils.py:142: UserWarning: [W030] Some entities could not be aligned in the text " II.3. Das Ausmaß der Zinsminderung richtet si..." with entities "[(121, 130, 'JUSTIZ'), (275, 289, 'JUSTIZ'), (291,...". Use
spacy.training.offsets_to_biluo_tags(nlp.make_doc(text), entities) to check the alignment. Misaligned entities ('-') will be ignored during training.
What's weird to me is that, once again, I'm not getting this warning when running any other recipe. I used to get it during training, back when I wasn't using my custom tokenizer, but here I'm providing the model and callbacks to train-curve and I still see it.
This is the command I'm using:
python -m prodigy train-curve --ner train_ner_citation -m .\final-model-t2v\ -F .\functions.py
Do you have any pointers as to why this is happening here but not with other recipes?
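In case it helps to pin down what I mean by "misaligned", here is a minimal sketch of how I understand the check behind W030: an entity only aligns if its start offset is the start of a token and its end offset is the end of a token. Everything in it is an assumption for illustration: the regex tokenizer is only a stand-in for my custom tokenizer, the helper names are hypothetical, and the example sentence and offsets are made up (only the JUSTIZ label comes from my data).

```python
import re


def token_spans(text):
    # Stand-in tokenizer (assumption): runs of word characters, or single
    # punctuation characters. Not the tokenizer from my functions.py.
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def misaligned_entities(text, entities, spans=None):
    # An entity aligns only if it starts at a token start and ends at a
    # token end; everything else is reported as misaligned.
    spans = token_spans(text) if spans is None else spans
    starts = {start for start, _ in spans}
    ends = {end for _, end in spans}
    return [
        (start, end, label)
        for start, end, label in entities
        if start not in starts or end not in ends
    ]


# Made-up example: the second span ends in the middle of the token "2019".
text = "Das Urteil des BGH vom 12.03.2019 ist rechtskräftig."
entities = [(15, 18, "JUSTIZ"), (23, 31, "JUSTIZ")]
bad = misaligned_entities(text, entities)
print(bad)
```

With spaCy, the same helper could be fed the real token boundaries by passing spans=[(t.idx, t.idx + len(t)) for t in nlp.make_doc(text)], which should show whether the tokenizer in the loaded pipeline is the one producing the mismatch.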