db-out utf-8 character problem

FahriBilici · July 14, 2020, 10:41am

Hello, I am new to Prodigy. I am labeling Turkish text classification data. I exported the data and it looks like this:
{"text":"4n1k ilk a\u015fk neden neden \u00e7\u0131km\u0131yor","_input_hash":1894167324,"_task_hash":-401355380,"options":[{"id":"Pozitif","text":"Pozitif"},{"id":"Negatif","text":"Negatif"},{"id":"Notr","text":"Notr"}],"_session_id":null,"_view_id":"choice","accept":["Notr"],"config":{"choice_style":"single"},"answer":"accept"}

But my original text is: 4n1k ilk aşk neden neden çıkmıyor

How can I enable UTF-8 format?

ines · July 14, 2020, 10:51am

Hi! This is just the default behaviour of json.dumps, which is called under the hood to export your data. Escaping non-ASCII characters as \uXXXX sequences is the safest way to represent UTF-8 text and prevent encoding issues. When you load the text back in with Python or another JSON parser, the characters will look as expected again. You can also re-export the data with the characters left unescaped; you just need to be careful that you don't end up with encoding issues. Also see here for details:
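To illustrate the json.dumps behaviour described in this reply, here is a minimal sketch using only Python's standard json module. It is not Prodigy's actual export code; the variable names and the single-field record are assumptions made for the example, with the text taken from the question above.

```python
import json

# The original text from the question (assumed here as a one-field record).
original = "4n1k ilk aşk neden neden çıkmıyor"

# Default behaviour: ensure_ascii=True escapes every non-ASCII
# character as a \uXXXX sequence, so the output is pure ASCII.
escaped = json.dumps({"text": original})
print(escaped)

# Loading the escaped JSON restores the original characters.
restored = json.loads(escaped)["text"]
print(restored == original)

# ensure_ascii=False keeps the characters unescaped in the output.
readable = json.dumps({"text": original}, ensure_ascii=False)
print(readable)
```

If you write the unescaped version to a file, open the file with an explicit encoding, for example open(path, "w", encoding="utf-8"), so the result does not depend on the platform default.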

FahriBilici · July 14, 2020, 11:00am

Okay, as you said, it's solved when I read the data back in. Thanks!
