I am working with a language with macrons (āēīōūĀĒĪŌŪ), and need the output to include these characters, not just the Unicode escape strings. Is there a (simple) way to do this? I'm a beginner and don't know too much.
Hi! This is just the default way JSON is written when you call json.dumps. When you load the string back in (e.g. by calling json.loads or similar), the characters will be represented as the regular Unicode characters.
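To see that round trip in isolation, here is a minimal sketch using only Python's built-in json module. The record and its text are made up for illustration; they are not from your data.

```python
import json

# Hypothetical example record containing macrons
record = {"text": "tēnā koe"}

# By default, json.dumps escapes non-ASCII characters as \uXXXX sequences
escaped = json.dumps(record)
print(escaped)

# Loading the string back in restores the original characters
restored = json.loads(escaped)
print(restored["text"])
```

The first print shows the escaped form you are seeing in your file, and the second shows the text with its macrons again.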
Hi, thanks. This happened using Prodigy's "db out" in the terminal on a Mac. How do I fix it from there?
Yes, under the hood, that just saves the JSON. If you load the exported data back into Python, you'll see the original Unicode characters. For example:
import srsly

# read_jsonl returns a generator, so wrap it in list() to see the records
data = list(srsly.read_jsonl("/path/to/your_file.jsonl"))
print(data)
Saving data as ASCII (e.g. with \u escapes) is how JSON data is saved by default in Python; this has nothing to do with Prodigy. It's a useful default, because it prevents encoding issues.
You can decide to not store your text as ASCII by saving out the data again without the escaping. Just be careful, because if you open the file again on a different machine with a different encoding, you may not be able to see those characters.
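One way to re-save the export is sketched below. It uses the standard library's json module with ensure_ascii=False instead of srsly, so it may not match the snippet originally posted here. The function name and the file paths are hypothetical.

```python
import json


def rewrite_jsonl_unescaped(src_path, dst_path):
    """Re-save a JSONL file so non-ASCII characters are written literally."""
    # Open both files explicitly as UTF-8 so the result does not depend
    # on the machine's default encoding
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # ensure_ascii=False writes "ā" instead of "\u0101"
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")


# Example call (hypothetical paths):
# rewrite_jsonl_unescaped("/path/to/your_file.jsonl", "/path/to/your_file_utf8.jsonl")
```

Writing to a new file rather than overwriting the export keeps the original safe in case something goes wrong. When opening the new file elsewhere, make sure the editor or script reads it as UTF-8.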