Ontonotes 5 Prodigy Training JSON Format

hmousa961 · June 7, 2021, 12:46pm

Hello,
I am working on NER for Arabic using the OntoNotes 5 dataset. What structure should the JSON file have so that I can import it into a Prodigy dataset? If you can help me, I would really appreciate it.

Thanks.

ines · June 11, 2021, 12:12am

Hi! You can see an example of the JSON format that Prodigy uses here: https://prodi.gy/docs/api-interfaces#ner_manual
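Here is a minimal sketch of building one task in that format. It assumes your sentences are already split into words and that the text can be rebuilt by joining the words with single spaces. That is an assumption: the real OntoNotes tokenization may need different whitespace handling. The helper names `make_ner_manual_task` and `write_jsonl` are hypothetical. Each span carries character offsets (`start`, `end`) plus inclusive token indices (`token_start`, `token_end`), and each line of the JSONL output is one task.

```python
import json


def make_ner_manual_task(words, entities):
    """Build a ner_manual-style task dict (hypothetical helper).

    words: list of token strings, assumed to be joined by single spaces.
    entities: list of (token_start, token_end, label), token_end inclusive.
    """
    tokens = []
    offset = 0
    for i, word in enumerate(words):
        tokens.append({"text": word, "start": offset, "end": offset + len(word), "id": i})
        offset += len(word) + 1  # +1 for the single joining space
    spans = [
        {
            "start": tokens[tok_start]["start"],  # char offset of first token
            "end": tokens[tok_end]["end"],        # char offset after last token
            "label": label,
            "token_start": tok_start,
            "token_end": tok_end,
        }
        for tok_start, tok_end, label in entities
    ]
    return {"text": " ".join(words), "tokens": tokens, "spans": spans}


def write_jsonl(tasks, path):
    """Write one JSON object per line; ensure_ascii=False keeps Arabic readable."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    task = make_ner_manual_task(
        ["Barack", "Obama", "visited", "Cairo", "."],
        [(0, 1, "PERSON"), (3, 3, "GPE")],
    )
    print(json.dumps(task["spans"]))
```

A file written this way could then be imported with Prodigy's `db-in` command, for example `prodigy db-in your_dataset ontonotes_ner.jsonl`, where the dataset and file names are placeholders.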

It uses character offsets, and you can also provide a list of "tokens" that the spans can reference. If you need to convert token-based tags to offsets, you could use the helper functions spaCy provides: https://prodi.gy/docs/named-entity-recognition#tip-biluo-offsets
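spaCy's helpers work on a `Doc` and BILUO tags (in spaCy v3, `iob_to_biluo` and `biluo_tags_to_offsets` live in `spacy.training`). If you would rather not go through spaCy, here is a dependency-free sketch of the same idea in plain Python. It assumes IOB2 tags such as `B-PERSON`, `I-PERSON`, `O` and that tokens are joined by single spaces. The function name `bio_to_spans` is hypothetical. An `I-` tag that does not continue an open entity with the same label starts a new entity.

```python
def bio_to_spans(words, tags):
    """Convert token-level IOB2 tags to character-offset spans (hypothetical helper).

    Returns (text, spans), where text is the words joined by single spaces and
    each span has start/end char offsets plus inclusive token indices.
    """
    # Character offset where each word starts in the space-joined text.
    starts = []
    offset = 0
    for word in words:
        starts.append(offset)
        offset += len(word) + 1

    entities = []   # collected [token_start, token_end, label] triples
    current = None  # entity currently being extended, if any
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")  # "B-GPE" -> ("B", "-", "GPE")
        if prefix == "I" and current is not None and current[2] == label:
            current[1] = i  # continue the open entity
            continue
        if current is not None:  # any other tag closes the open entity
            entities.append(current)
            current = None
        if prefix in ("B", "I"):  # B- (or a stray I-) opens a new entity
            current = [i, i, label]
    if current is not None:
        entities.append(current)

    spans = [
        {
            "start": starts[s],
            "end": starts[e] + len(words[e]),
            "label": label,
            "token_start": s,
            "token_end": e,
        }
        for s, e, label in entities
    ]
    return " ".join(words), spans
```

The returned spans use the same keys as the `ner_manual` example, so they can sit alongside a matching `tokens` list in each task.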

I'd say that importing your data into Prodigy really only makes sense if you're planning on annotating it, either to correct the annotations or to add to them. If your goal is to train a model, you probably want to train with spaCy directly, which is more flexible and removes one layer of abstraction (because under the hood, Prodigy also just calls into spaCy).

hmousa961 · June 15, 2021, 7:25am

Thank you so much for the help.
