db-out producing json files with extra spaces between each character...


I created a new annotated dataset via the following command:

python -m prodigy ner.manual company_ned blank:en .\all_companies.txt --label ORG,LOCATION,DEPARTMENT,DIVISION,PRODUCT,BRAND,POSITION,LEGAL,OTHER

I annotated about 200 of these, then I wanted to add a couple of labels, leverage the baseline spaCy NER model, and go back over those initial annotations to add the extra tags where needed:

python -m prodigy ner.correct company_ned en_core_web_sm .\all_companies.txt --label ORG,LOCATION,DEPARTMENT,DIVISION,PRODUCT,BRAND,POSITION,LEGAL,OTHER,TYPE,SUBSIDIARY

This didn't work--the framework refused to revisit the original annotations so I could correct them. I did some reading and figured out that I'm supposed to export the annotations so far, then read them back in as an input dataset in order to revisit them. OK, so then I did:

python -m prodigy db-out company_ned > company_ned_1.jsonl

The JSONL file produced can't be pasted here, as it contains interspersed non-printable 'space' characters. However, when trying to load it into Prodigy, I get the following error:

Task exception was never retrieved
future: <Task finished coro=<RequestResponseCycle.run_asgi() done, defined at C:\Users\james\venv\science\science\lib\site-packages\uvicorn\protocols\http\h11_impl.py:383> exception=UnicodeDecodeError('utf-8', b'\xff\xfe{\x00"\x00t\x00e\x00x\x00t\x00"\x00:\x00"\x00S\x00S\x00C\x00E\x00T\x00,\x00 \x00B\x00h\x00i\x00l\x00a\x00i\x00"\x00,\x00"\x00_\x00i\x00n\x00p\x00u\x00t\x00_\x00h\x00a\x00s\x00h\x00"\x00:\x00-\x006\x006\x002\x000\x003\x003\x000\x007\x007\x00,\x00"\x00_\x00t\x00a\x00s\x00k\x00_\x00h\x00a\x00s\x00h\x00"\x00:\x00-\x001\x002\x003\x002\x007\x002\x007\x003\x006\x00,\x00"\x00t\x00o\x00k\x00e\x00n\x00s\x00"\x00:\x00[\x00{\x00"\x00t\x00e\x00x\x00t\x00"\x00:\x00"\x00S\x00S\x00C\x00E\x00T\x00"\x00,\x00"\x00s\x00t\x00a\x00r\x00t\x00"\x00:\x000\x00,\x00"\x00e\x00n\x00d\x00"\x00:\x005\x00,\x00"\x00i\x00d\x00"\x00:\x000\x00}\x00,\x00{\x00"\x00t\x00e\x00x\x00t\x00" ... ... ...

Any ideas what might have happened?


A quick update: removing '\x00' characters from the file via regex replacement (e.g. sed) makes everything work correctly...
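In case it helps anyone else, here's a Python sketch that re-encodes the file instead of stripping NULs (filenames as in the commands above). Reading with the utf-16 codec also consumes the BOM, which a plain \x00 replacement can leave behind:

```python
def reencode_utf16_to_utf8(src, dst):
    # Read the redirected file as UTF-16 (the codec strips the BOM
    # and the interleaved NUL bytes) and rewrite it as UTF-8.
    with open(src, encoding="utf-16") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)

# usage: reencode_utf16_to_utf8("company_ned_1.jsonl", "company_ned_1_utf8.jsonl")
```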

Thanks for the report – that's strange, I've never seen this error before :thinking:

The db-out command doesn't really do anything fancy; it mostly just serializes the JSON and writes it out. I wonder if the problem you're seeing has something to do with how the data is written to stdout and then redirected to a file.

What happens if you use db-out with the --out-dir argument instead of redirecting to a file? For example, the following will create a file company_ned.jsonl in the current directory:

python -m prodigy db-out company_ned --out-dir ./

You're absolutely right--it's the redirect in PowerShell.

If I just let the output print to stdout without the redirect, it's fine. But redirecting to a file introduces a \x00 after every character. This is my first month trying to switch to Windows as a dev environment after 20 years on Linux, and things like this really make me think twice...
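The pattern makes sense in hindsight: in Windows PowerShell 5.x, > is effectively Out-File, whose default encoding is UTF-16 LE ("Unicode"), so every ASCII character gets a trailing NUL byte. A minimal demonstration, nothing Prodigy-specific:

```python
text = '{"text":"SSCET, Bhilai"}'
encoded = text.encode("utf-16-le")  # what the redirect effectively wrote

# every ASCII character becomes <char>\x00 in UTF-16 LE
assert encoded[1::2] == b"\x00" * len(text)
print(encoded[:16])  # b'{\x00"\x00t\x00e\x00x\x00t\x00"\x00:\x00'
```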

Thanks for the great remote debugging!

Thanks and yeah, this is really good to know!

We should probably adjust the examples in the docs so they don't confuse Windows users. I like allowing the output to be redirected to a file because it's very "native" and simple, but the official alternative in PowerShell (apparently like this, just with UTF-8 :face_with_monocle:) is pretty unwieldy and I'm not sure I want to include it as a recommendation in the docs.