Oops something went wrong error

Hi everyone,

I posted this question on another thread, but since that thread has been solved, I think my question isn't visible (prodigy.json setting causes web app error - #24 by stephanps)

I'm running into the same problem, but the CSS setting doesn't work for me. I'm running a custom coref recipe and it works at first, but six texts in, it encounters the following error:

and my prodigy.json file is empty except for {} - screenshot below:

I added "card_css": {} between the two braces and it runs, but it still gives the error message. When I replaced everything with only "card_css": {}, the server didn't start at all.
I'm running Prodigy 1.11.4 on Windows 10 with Python 3.9.6 (I have since updated to Prodigy 1.11.6, but the problem is the same).

Any idea what could be done?

Regards,
Stephan

Hi @stephanps , welcome to Prodigy!

You mentioned that it works except for those six texts. Is it possible to see their JSON / dictionary structure? Usually this happens when something is wrong in the indexing / data, so the UI won't display that well.

Hi @ljvmiranda921,

Thanks for the reply. The following are the first 5 lines of my dataset. Just to correct your reply: I didn't mean it works except for the six texts; I meant it works until text 6 (actually text 5). The last text in this group is where the error comes up. However, it isn't exclusive to that text: I deleted these 5 lines from the main file and ran Prodigy again. It's fine for a few texts, then it shows the error again. I can't see anything in the input data that is the problem.

{"text": "Gr\u00e6nselukning giver ingen mening, fordi Danmark ikke er en \u00f8. Hvis du ikke stoler p\u00e5 mig, s\u00e5 lyt til \u2066@SSTbrostrom\u2069 Sundhedsstyrelsens chef der siger det samme! #corona #dkpol  https://t.co/yX26EO1gXd ", "meta": {"start_id": "1238865417248165895", "start_user": "pomaEB", "len": 5, "part": "1 of 5"}}
{"text": "\n@pomaEB @SSTbrostrom I Italien og Spanien kan man ikke flytte sig fra by til by eller mellem kommunerne. Mener du heller ikke, at det giver nogen mening? Vi v\u00e6lger at bruge forsigtigshedsprincippet. ", "meta": {"start_id": "1238865417248165895", "start_user": "pomaEB", "len": 5, "part": "2 of 5"}}
{"text": "\n@JeppeBruus @pomaEB @SSTbrostrom Husk nu p\u00e5, at forsigtighedsprincippet aldrig nogensinde m\u00e5 bruges mod borgerne, og at det kun b\u00f8r anvendes i meget kort tid. Ingen af os \u00f8nsker en totalit\u00e6r stat. Nej tak til Orwells \"1984\". ", "meta": {"start_id": "1238865417248165895", "start_user": "pomaEB", "len": 5, "part": "3 of 5"}}
{"text": "\n@LANDBOEN @pomaEB @SSTbrostrom Enig i at det er en helt ekstraordin\u00e6r situation. Forh\u00e5bentlig kort. ", "meta": {"start_id": "1238865417248165895", "start_user": "pomaEB", "len": 5, "part": "4 of 5"}}
{"text": "\n@JeppeBruus @pomaEB @SSTbrostrom En undtagelsestilstand SKAL V\u00c6RE meget kortvarig. Ellers vakler demokratiets og retsstatens fundament. Det m\u00e5 v\u00e6re selvindlysende. ", "meta": {"start_id": "1238865417248165895", "start_user": "pomaEB", "len": 5, "part": "5 of 5"}}
{"text": "-Jeg var ikke sikker p\u00e5, om du kom.\n", "meta": {"source": "/content/dagw/sektioner/opensub/opensub_233494"}}

I have also attached the full file and the custom recipe that I have for the task.

I run it via the following command:

python -m prodigy annotator_coref.manual coref_test da_core_news_trf .\DEP_annotator1_09112021.jsonl --label COREF -sl NP -W -F ./annotator_coref.py 

DEP_annotator1_09112021.jsonl (404.4 KB)

annotator_coref.jsonl (16.5 KB)

*The script is uploaded as a .jsonl file, since I can't upload .txt or .py files.

Regards,
Stephan

Hey everyone,

Just want to check in to see if anyone on the Prodigy team had a chance to look into this problem.

Regards,
Stephan

Hi @stephanps , sorry it took some time, but thanks for waiting!

I was able to replicate the problem, and I think the solution involves deleting or escaping the newlines (i.e., \n) in your JSONL file. For JSON, a newline is treated as a control character. Think of it like a keyword in Python. For example, this example sentence shouldn't work:

{"text": "-Jeg var ikke sikker p\u00e5, om du kom.\n", "meta": {"source": "/content/dagw/sektioner/opensub/opensub_233494"}}

But by escaping the newline symbol (\n) with another backslash in front of it, it should show up:

{"text": "-Jeg var ikke sikker p\u00e5, om du kom.\\n", "meta": {"source": "/content/dagw/sektioner/opensub/opensub_233494"}}

(notice the \\n)
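
For anyone following along, here is a minimal sketch of what the two variants decode to, using only Python's built-in json module. The strings are shortened versions of the example above and the variable names are just for illustration. It only shows what the parser produces, not how the Prodigy UI handles each result.

```python
import json

# Raw strings keep the backslashes exactly as they appear in the JSONL file.
single = r'{"text": "om du kom.\n"}'   # one backslash, as in the original data
double = r'{"text": "om du kom.\\n"}'  # two backslashes, as in the escaped version

text_single = json.loads(single)["text"]
text_double = json.loads(double)["text"]

# The first text ends with an actual newline character.
print(repr(text_single))
# The second text ends with a literal backslash followed by the letter "n".
print(repr(text_double))
```

So the doubled backslash keeps the two characters \ and n in the text instead of a line break. If you don't want those characters to show up in your annotations, removing the newlines may be the cleaner option.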

It's up to you if you want to preserve these newlines or remove them altogether (cf. javascript - How do I handle newlines in JSON? - Stack Overflow). For example, here's the same annotations file but with deleted newlines:

deleted_newlines.jsonl (403.4 KB)
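
If you'd rather produce such a file yourself, something along these lines should work. This is a rough sketch, not the exact script used for the attached file: the helper names `clean_text` and `strip_newlines` are made up, it assumes every line is a JSON object with a "text" key, and it replaces each line break with a space before trimming, which is one choice among several.

```python
import json


def clean_text(text: str) -> str:
    """Replace line breaks with a space and trim surrounding whitespace."""
    return text.replace("\r\n", "\n").replace("\n", " ").strip()


def strip_newlines(in_path: str, out_path: str) -> int:
    """Copy a JSONL file, cleaning the "text" field of every task.

    Returns the number of tasks whose text was changed.
    """
    changed = 0
    with open(in_path, encoding="utf8") as src, \
            open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines between records
            task = json.loads(line)
            cleaned = clean_text(task["text"])
            if cleaned != task["text"]:
                changed += 1
            task["text"] = cleaned
            dst.write(json.dumps(task) + "\n")
    return changed


# Example call (file names are placeholders):
# strip_newlines("input.jsonl", "input_no_newlines.jsonl")
```

Since this changes the text itself, it is best done before annotating, so character offsets in any saved annotations still line up with the input.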

A good way to sanity-check any JSONL file is to paste it into a JSON validator. Of course, be careful about pasting data into any website, especially if it's sensitive / private. Examples of JSON control characters are: backspace (\b), tab (\t), and of course newline (\n).
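
If the data is too sensitive to paste anywhere, a local check does a similar job. The sketch below uses only the standard library. The function name `find_invalid_lines` is hypothetical, and the check for a string "text" field is an assumption about what the recipe expects rather than something Prodigy itself enforces this way.

```python
import json


def find_invalid_lines(path: str) -> list:
    """Return (line_number, message) pairs for lines that look problematic."""
    problems = []
    with open(path, encoding="utf8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                task = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((number, f"invalid JSON: {err.msg}"))
                continue
            # Assumption: each task should be an object with a string "text".
            if not isinstance(task, dict) or not isinstance(task.get("text"), str):
                problems.append((number, 'missing string "text" field'))
    return problems


# Example call (file name is a placeholder):
# for number, message in find_invalid_lines("my_data.jsonl"):
#     print(number, message)
```

An empty result only means that each line parses as JSON and has a text field. It says nothing about whether the UI can render the content.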

Let me know if it works and thanks for your patience :slight_smile: