I just generated an examples.jsonl file with these contents:
{"text": "臉書"}
{"text": "阿里巴巴"}
{"text": "抖音"}
Next, I annotate them with textcat.manual:
python -m prodigy textcat.manual issue-6383 examples.jsonl --label company
When I now output these annotations via db-out, the output indeed does not appear to be UTF-8 encoded: the characters show up as \uXXXX escape sequences.
python -m prodigy db-out issue-6383
This yields:
{"text":"\u81c9\u66f8","_input_hash":2129430638,"_task_hash":1813097253,"label":"company","_view_id":"classification","answer":"accept","_timestamp":1677235133}
{"text":"\u963f\u91cc\u5df4\u5df4","_input_hash":786114873,"_task_hash":-1088016566,"label":"company","_view_id":"classification","answer":"accept","_timestamp":1677235134}
{"text":"\u6296\u97f3","_input_hash":-1163267003,"_task_hash":165057773,"label":"company","_view_id":"classification","answer":"accept","_timestamp":1677235134}
However, I can save these annotations to a file and re-use them in another recipe:
python -m prodigy db-out issue-6383 > examples2.jsonl
python -m prodigy textcat.manual issue-6383-2 examples2.jsonl --label company
Then the interface is totally able to render the characters, meaning no information got lost.
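The same round trip can be checked outside of Prodigy with Python's standard json module. This is only a minimal sketch: it assumes the escaping comes from ASCII-only JSON serialization, which is what json.dumps does by default (ensure_ascii=True). Whether Prodigy uses exactly this call internally is an assumption on my part, and the shortened record below is just an illustration based on the db-out output above.

```python
import json

# One line in the style of the db-out output, shortened for illustration.
line = '{"text":"\\u81c9\\u66f8","label":"company","answer":"accept"}'

# Parsing turns the \uXXXX escapes back into the original characters.
record = json.loads(line)
print(record["text"])

# The default serialization escapes non-ASCII characters again.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters readable in the output.
readable = json.dumps(record, ensure_ascii=False)

# Both strings decode to exactly the same data.
assert json.loads(escaped) == json.loads(readable) == record
```

Both forms are valid JSON describing the same text, so the escaped version is only harder for humans to read, not lossy.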
This behavior is normal, and it is also explained in more detail here: