JSONL files fail to open with "charmap codec can't decode byte 0x9d"


I have tried many ways to strip whitespace and non-text characters, but my JSONL files are still rejected with the "charmap codec can't decode byte 0x9d" error.

Please help. When I try with basic text, it works perfectly fine. My dataset consists of resumes written by many different people, in PDF and DOCX formats. I tried converting them both with and without UTF-8 encoding, but with no luck.

Any ideas? Thanks

Hi @kalhosni!

Thanks for your question and welcome to the Prodigy community :wave:

Does this help?

If not, can you run `prodigy stats` and share your Prodigy version and OS?
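
In the meantime, here is something that may be worth trying. The "charmap" codec in that message usually means Python is decoding the file with the Windows default encoding (cp1252) rather than UTF-8, and byte 0x9d is undefined in cp1252. It is also the last byte of a UTF-8 encoded curly closing quote (”), which is common in text extracted from DOCX and PDF files. Below is a minimal sketch, assuming your files are valid UTF-8 JSONL. The function names and file names are hypothetical and not part of Prodigy; the idea is to read with an explicit encoding and then rewrite the file as pure ASCII so that it decodes the same way under any default codec.

```python
import json


def read_jsonl_utf8(path):
    """Read a JSONL file as UTF-8, ignoring the OS default encoding."""
    records = []
    # An explicit encoding avoids the Windows fallback to cp1252 ("charmap").
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
    return records


def write_jsonl_ascii(records, path):
    """Write records as JSONL with non-ASCII characters escaped as \\uXXXX."""
    with open(path, "w", encoding="ascii") as f:
        for record in records:
            # ensure_ascii=True (the default) keeps the output pure ASCII.
            f.write(json.dumps(record, ensure_ascii=True) + "\n")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    records = read_jsonl_utf8("resumes.jsonl")
    write_jsonl_ascii(records, "resumes_ascii.jsonl")
```

The text itself is unchanged after parsing, since JSON readers turn the `\uXXXX` escapes back into the original characters. If `read_jsonl_utf8` raises a `UnicodeDecodeError` as well, the file is probably not UTF-8 to begin with, and that would be useful to know. Another option, if you would rather not rewrite the files, is to set the environment variable `PYTHONUTF8=1` (Python 3.7+) so that Python uses UTF-8 by default.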