Need to create a JSONL file in Python according to a certain format

Hi,
I need to create a JSONL file in Python in the format that AutoML Natural Language Entity Extraction expects.
Please see it below:

{
  "annotations": [
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": number, "start_offset": number
        }
      },
      "display_name": string
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": number, "start_offset": number
        }
      },
      "display_name": string
    },
    ...
  ],
  "text_snippet": {"content": string}
}

Any clues?
Thanks in advance.

This should hopefully be pretty straightforward: if end_offset and start_offset are character offsets into the text, they're going to be the same values as the "start" and "end" you have in the "spans" added by Prodigy. I'm not sure what display_name is. Is that the label? And text_snippet.content is the task's "text".
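To make the offsets part concrete, here's a tiny sketch with a made-up task (the text, spans and the span_texts helper are all invented for illustration, not taken from your data). It just slices the text with each span's "start" and "end" to show they're plain character offsets. One thing I'm assuming and haven't checked: Prodigy's "end" is exclusive, so it's worth confirming in the AutoML docs that end_offset is meant the same way.

```python
# Hypothetical Prodigy-style task, made up for illustration only.
task = {
    "text": "Apple opened a store in Berlin.",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 24, "end": 30, "label": "GPE"},
    ],
}


def span_texts(task):
    """Return (label, covered text) pairs by slicing the text with the span offsets."""
    text = task["text"]
    # "end" is treated as exclusive here, which is how Python slicing works.
    return [
        (span["label"], text[span["start"]:span["end"]])
        for span in task.get("spans", [])
    ]


print(span_texts(task))
```

If the slices come out as the entities you highlighted, the same numbers should be usable as start_offset and end_offset.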

Maybe something like this? It's untested and you might have to fiddle around with it a bit to get the format and values right, but it shouldn't be too hard.

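from prodigy.components.db import connect
import srsly

# Connect to the Prodigy database using the settings from your prodigy.json
db = connect()
# Load all annotated examples of the dataset (replace with your dataset name)
examples = db.get_dataset("your_dataset")
converted = []
for eg in examples:
    # Only export accepted examples; remove this check if you want all of them
    if eg.get("answer", "accept") != "accept":
        continue
    converted.append(
        {
            # The task's text becomes text_snippet.content
            "text_snippet": {"content": eg["text"]},
            # One annotation per span; tasks without spans get an empty list
            "annotations": [
                {
                    "text_extraction": {
                        "text_segment": {
                            "end_offset": span["end"],
                            "start_offset": span["start"],
                        }
                    },
                    # Assuming display_name is the entity label
                    "display_name": span["label"],
                }
                for span in eg.get("spans", [])
            ],
        }
    )
# Write one JSON object per line
srsly.write_jsonl("converted.jsonl", converted)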
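
Since the script above is untested, it might be worth sanity-checking the output before uploading it. Here's a rough sketch that only uses the standard library. The find_problems helper and the sample lines are made up for illustration, and the checks are my assumptions about what a valid line looks like (integer offsets, start before end, end no larger than the text length, a non-empty display_name), not AutoML's actual validation rules.

```python
import json


def find_problems(lines):
    """Return (line_number, message) pairs for lines that look wrong."""
    problems = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        text = record.get("text_snippet", {}).get("content")
        if not isinstance(text, str):
            problems.append((i, "missing text_snippet.content"))
            continue
        for ann in record.get("annotations", []):
            seg = ann.get("text_extraction", {}).get("text_segment", {})
            start, end = seg.get("start_offset"), seg.get("end_offset")
            if not isinstance(start, int) or not isinstance(end, int):
                problems.append((i, "offsets must be integers"))
            # Assumption: end_offset is exclusive, so it may equal len(text)
            elif not 0 <= start < end <= len(text):
                problems.append((i, f"offsets {start}-{end} fall outside the text"))
            if not ann.get("display_name"):
                problems.append((i, "missing display_name"))
    return problems


def make_line(text, start, end, label):
    """Build one JSONL line in the target format (test data only)."""
    segment = {"end_offset": end, "start_offset": start}
    annotation = {"text_extraction": {"text_segment": segment}, "display_name": label}
    return json.dumps({"annotations": [annotation], "text_snippet": {"content": text}})


# Made-up sample: the first line is fine, the second has a bad end offset.
sample = [make_line("Apple", 0, 5, "ORG"), make_line("Berlin", 3, 99, "GPE")]
print(find_problems(sample))
```

To check the real file, you can pass an open file handle instead of the sample list, e.g. open("converted.jsonl", encoding="utf8"), since iterating over a file yields one line at a time.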