OntoNotes 5 Prodigy Training JSON Format

I am working on NER for Arabic using the OntoNotes 5 dataset. What structure should the JSON file have so that it can be imported into a Prodigy dataset? If you can help me, I would really appreciate it.


Hi! You can see an example of the JSON format that Prodigy uses here: https://prodi.gy/docs/api-interfaces#ner_manual

The format uses character offsets for the entity spans, and you can optionally provide a list of "tokens" that the spans reference. If you need to convert token-based tags to offsets, you could use the helper functions spaCy provides: https://prodi.gy/docs/named-entity-recognition#tip-biluo-offsets
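
To make the shape concrete, here is a minimal sketch in plain Python (not the spaCy helpers linked above) that turns one tokenized sentence with BIO tags into a task dict with "text", "tokens" and "spans". The function name bio_to_prodigy_task and the example sentence are made up for illustration, and the code assumes the tokens are joined by exactly one space; if your original text has different whitespace, you'd need to compute the offsets against that text instead.

```python
import json


def bio_to_prodigy_task(words, tags):
    """Build one Prodigy-style NER task from tokens and BIO tags.

    Hypothetical helper. Assumes tokens are separated by a single space.
    """
    tokens, spans = [], []
    offset = 0
    current = None  # the entity span currently being built, if any
    for i, (word, tag) in enumerate(zip(words, tags)):
        start, end = offset, offset + len(word)
        tokens.append({"text": word, "start": start, "end": end, "id": i})
        offset = end + 1  # skip the single space after this token
        # An I- tag with the same label extends the open span.
        if tag.startswith("I-") and current is not None and current["label"] == tag[2:]:
            current["end"] = end
            current["token_end"] = i  # inclusive index of the last token
            continue
        # Any other tag closes the open span.
        if current is not None:
            spans.append(current)
            current = None
        # A B- tag (or a stray I- tag) opens a new span.
        if tag != "O":
            current = {
                "start": start,
                "end": end,
                "label": tag[2:],
                "token_start": i,
                "token_end": i,
            }
    if current is not None:
        spans.append(current)
    return {"text": " ".join(words), "tokens": tokens, "spans": spans}


words = ["Barack", "Obama", "visited", "Cairo"]
tags = ["B-PERSON", "I-PERSON", "O", "B-GPE"]
task = bio_to_prodigy_task(words, tags)

# One JSON object per line (JSONL); keep non-ASCII text such as Arabic readable.
print(json.dumps(task, ensure_ascii=False))
```

Each printed line would be one record in a .jsonl file. Python string offsets count Unicode characters rather than bytes, so the same approach should carry over to Arabic text, but it's worth spot-checking a few spans by slicing the text with their start and end values.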

I'd say that importing your data into Prodigy really only makes sense if you're planning on annotating it, either to correct the annotations or to add to them. If your goal is to train a model, you probably want to train with spaCy directly, which is more flexible and removes one layer of abstraction (because under the hood, Prodigy also just calls into spaCy).

Thank you so much for the help.