For the `UnicodeDecodeError`, it usually happens when there is something unusual in your terminal encoding settings. Under the hood, Python tries to decode text according to UTF-8 rules; when a byte sequence doesn't follow those rules, it raises this error.
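For instance, here's a minimal reproduction of the error in plain Python, with no Prodigy involved (the byte values are just an illustrative example):

```python
# The byte 0xFF can never start a valid UTF-8 sequence, so decoding
# this as UTF-8 fails. (0xFF 0xFE is actually a UTF-16 byte-order mark,
# a common source of these errors when a file was saved as UTF-16.)
data = b"\xff\xfehello"

try:
    data.decode("utf-8")
except UnicodeDecodeError as e:
    # e.reason describes which UTF-8 rule was violated
    print(e.reason)  # → invalid start byte
```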
Also, you might want to check your command again. If you're training a NER model, you probably want to do something like this:
```
prodigy train --ner <NER dataset> \
--textcat-multilabel <TCM dataset> \
--eval-split 0.2 \
# your other config ...
# <OUTPUT_DIR>
```
It may be erroring out because we're inadvertently passing a non-UTF-8 file. To be sure, you can check `prodigy train --help` for more information.
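To track down which input is the culprit, a quick sketch like this can check whether a file decodes cleanly as UTF-8 before you pass it to the train command (the `is_utf8` helper and any paths you feed it are just illustrative, not part of Prodigy):

```python
from pathlib import Path

def is_utf8(path):
    """Return True if the file at `path` decodes as valid UTF-8."""
    try:
        Path(path).read_text(encoding="utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

You could loop this over your dataset exports or config files; any file that returns `False` is a likely source of the `UnicodeDecodeError`.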
and it worked.
I looked through the documentation to see if I could add pretrained token-to-vector weights (from `spacy pretrain`), but I couldn't find any guidance on how to add them to the command. Can you advise me on this? I saw that the old way was to add `init-tok2vec ./tok2vec_cd8_model289.bin` to the command, but this doesn't seem to work now.
The tok2vec initialization now goes into your spaCy configuration file, specifically under the `[initialize]` section. You can then pass that config via the `--config` parameter of the train command. The benefit is that your initialization step and other parameters all "live" in one file, so you don't need to pass a lot of options on the CLI.
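As a sketch, using the weights file from your message and assuming a config file named `config.cfg`, the relevant part of the config would look like this:

```ini
[initialize]
init_tok2vec = "./tok2vec_cd8_model289.bin"
```

Then pass it along with the rest of your command, e.g. `prodigy train --ner <NER dataset> --config config.cfg ...`, and spaCy will load those pretrained tok2vec weights during initialization.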