Chinese pattern file for text classification

I just generated an examples.jsonl file with the following contents:

{"text": "臉書"}
{"text": "阿里巴巴"}
{"text": "抖音"}

Next, I annotate them with textcat.manual:

python -m prodigy textcat.manual issue-6383 examples.jsonl --label company

When I export these annotations with db-out, the Chinese characters are indeed written as \uXXXX escape sequences rather than as literal characters.

python -m prodigy db-out issue-6383 

This yields:

{"text":"\u81c9\u66f8","_input_hash":2129430638,"_task_hash":1813097253,"label":"company","_view_id":"classification","answer":"accept","_timestamp":1677235133}
{"text":"\u963f\u91cc\u5df4\u5df4","_input_hash":786114873,"_task_hash":-1088016566,"label":"company","_view_id":"classification","answer":"accept","_timestamp":1677235134}
{"text":"\u6296\u97f3","_input_hash":-1163267003,"_task_hash":165057773,"label":"company","_view_id":"classification","answer":"accept","_timestamp":1677235134}

However, I can save these annotations to a file and reuse them in another recipe:

python -m prodigy db-out issue-6383 > examples2.jsonl
python -m prodigy textcat.manual issue-6383-2 examples2.jsonl --label company

The interface then renders the characters correctly, which means no information was lost.
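As a quick sanity check outside of Prodigy, here is a minimal Python sketch that uses only the standard library's json module. The sample line is taken from the db-out output above, trimmed to three fields for brevity, and to_readable_jsonl is a hypothetical helper name of my own, not something Prodigy provides.

```python
import json

# One record from the db-out output above, trimmed to three fields.
# The raw string keeps the backslashes exactly as they appear in the file.
escaped_line = r'{"text":"\u81c9\u66f8","label":"company","answer":"accept"}'

# json.loads decodes the \uXXXX escapes back into the original characters.
record = json.loads(escaped_line)
print(record["text"])  # 臉書

# ensure_ascii=False writes non-ASCII characters literally instead of escaping them.
readable_line = json.dumps(record, ensure_ascii=False)
print(readable_line)


def to_readable_jsonl(lines):
    """Hypothetical helper: re-serialize JSONL lines with literal characters."""
    return [
        json.dumps(json.loads(line), ensure_ascii=False)
        for line in lines
        if line.strip()  # skip blank lines
    ]
```

Both forms parse to the same record, so the escaped output is only a difference in how the JSON is written. If you prefer a file that is readable in a text editor, you could run the exported lines through a helper like this and write the result with open(path, "w", encoding="utf-8").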

This behavior is normal, and it is also explained in more detail here: