I have tried many ways to remove spaces and non-text characters, but the JSONL files are still rejected with the "charmap codec can't decode byte 0x9d" error.
Please help. When I try it on basic text, it works perfectly fine. My dataset consists of resumes written by many different people, in PDF and DOCX formats. I converted them both with and without UTF-8 encoding, but with no luck.
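
To make the symptom concrete, here is a minimal sketch of one possible cause. It is an assumption, since the post shows neither the code nor the platform. In UTF-8, the right curly quote (U+201D), which is common in resumes exported from Word or PDF, is stored as the bytes E2 80 9D. Byte 0x9d is undefined in Windows-1252 (cp1252), so a UTF-8 file that gets read with that codec fails with a "charmap" decode error. The helper names (`write_jsonl`, `read_jsonl`, `decodes_as_cp1252`) and the file name `resumes.jsonl` are hypothetical and only illustrate the idea.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write one JSON object per line, always encoded as UTF-8."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for record in records:
            # ensure_ascii=False keeps characters such as curly quotes as-is.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back, decoding explicitly as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def decodes_as_cp1252(path):
    """Return True if the whole file can be decoded with the cp1252 codec."""
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.read()
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample record containing curly quotes (U+201C and U+201D).
sample = [{"text": "Led a team of 5 \u201cengineers\u201d"}]
path = os.path.join(tempfile.mkdtemp(), "resumes.jsonl")

write_jsonl(path, sample)
restored = read_jsonl(path)          # explicit UTF-8: round-trips cleanly
cp1252_ok = decodes_as_cp1252(path)  # cp1252: hits the undefined byte 0x9d
```

If this is the cause, stripping spaces will not help, because the offending byte belongs to a legitimate character. Passing `encoding="utf-8"` on every `open()` call is the more direct fix. Alternatively, leaving `json.dumps` at its default `ensure_ascii=True` escapes every non-ASCII character as `\uXXXX`, so the resulting file is pure ASCII and decodes under either codec.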