Need to create a JSONL file in Python according to a certain format

Schrodinger_cat · October 2, 2019, 1:50am

Hi,
I need to create a JSONL file in Python in the format that AutoML Natural Language Entity Extraction expects. Please see it below:

{
  "annotations": [
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": number, "start_offset": number
        }
      },
      "display_name": string
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": number, "start_offset": number
        }
      },
      "display_name": string
    },
    ...
  ],
  "text_snippet": {"content": string}
}

Any clues?
Thanks in advance.

ines · October 2, 2019, 8:16am

This should hopefully be pretty straightforward – if the end offset and start offset are character offsets into the text, they're going to be the same values you have in the "spans" added by Prodigy. Not sure what the display_name is? Is that the label? And text_snippet.content is the task's "text".

Maybe something like this? Untested, and you might have to fiddle around with it a bit to get the format and values right. But it shouldn't be too hard.

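from prodigy.components.db import connect
import srsly

# Connect to the Prodigy database and load the annotated examples
db = connect()
examples = db.get_dataset("your_dataset")
converted = []
for eg in examples:
    converted.append(
        {
            # The task's raw text
            "text_snippet": {"content": eg["text"]},
            # One annotation per span; examples without spans get an empty list
            "annotations": [
                {
                    "text_extraction": {
                        "text_segment": {
                            "end_offset": span["end"],
                            "start_offset": span["start"],
                        }
                    },
                    # Assumes display_name is the entity label
                    "display_name": span["label"],
                }
                for span in eg.get("spans", [])
            ],
        }
    )
# Write one JSON object per line
srsly.write_jsonl("converted.jsonl", converted)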
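
If it helps to try the mapping without a Prodigy database, here is a minimal standard-library sketch of the same conversion. The helper names `to_automl` and `write_jsonl` and the example task are made up for illustration, and it carries over the assumptions from above: display_name is the span's label, and the offsets are character offsets. The range check is an extra sanity step, not something the target format is known to require, and it's worth confirming whether AutoML treats end_offset as exclusive the way Prodigy's span "end" is.

```python
import json


def to_automl(task):
    """Convert one Prodigy-style task dict to the layout shown in the question."""
    text = task["text"]
    annotations = []
    for span in task.get("spans", []):
        start, end = span["start"], span["end"]
        # Sanity check (an added assumption): offsets must fall inside the text
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span offsets out of range: {span}")
        annotations.append(
            {
                "text_extraction": {
                    "text_segment": {"start_offset": start, "end_offset": end}
                },
                # Assumes display_name is the entity label
                "display_name": span["label"],
            }
        )
    return {"annotations": annotations, "text_snippet": {"content": text}}


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical task in the shape Prodigy stores for NER annotations
example = {
    "text": "Apple is based in Cupertino.",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 18, "end": 27, "label": "GPE"},
    ],
}
record = to_automl(example)
```

Calling `write_jsonl("converted.jsonl", [to_automl(eg) for eg in examples])` would then write the file, with `examples` being whatever list of task dicts you loaded.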
