Empty spans and spans with no 'text' attribute

Hi @stefan.bartell!

Thanks for your question and welcome to the Prodigy community :wave:

This is a bit odd. Your annotated file has "answer": "accept" for each record, which indicates the records were accepted (and saved). But yes, each one should also include your spans as a list of dictionaries, one dictionary per span. For ner.manual, the saved annotated spans are stored under the "spans" key. See this link for what the data looks like.
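To track down the affected records, a small script can scan an exported .jsonl file and flag accepted records whose "spans" list is missing or empty, or whose spans lack the start, end and label keys seen in the db-out output further down in this thread. This is a minimal sketch, assuming one JSON object per line; the function name find_problem_records is hypothetical and not part of Prodigy.

```python
import json
from pathlib import Path


def find_problem_records(path):
    """Return (line_number, reason) pairs for accepted records with suspicious spans."""
    problems = []
    with Path(path).open(encoding="utf8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            eg = json.loads(line)
            if eg.get("answer") != "accept":
                continue  # only accepted records are expected to carry spans
            spans = eg.get("spans")
            if not spans:
                problems.append((line_no, "missing or empty 'spans'"))
                continue
            for span in spans:
                # Each span should be a dict with at least character offsets and a label
                if not isinstance(span, dict) or not {"start", "end", "label"}.issubset(span):
                    problems.append((line_no, f"malformed span: {span!r}"))
    return problems
```

Running find_problem_records("your_annotations.jsonl") (hypothetical filename) prints nothing alarming only if every accepted record actually contains spans.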

When I followed the steps below, everything worked fine:

python3 -m prodigy ner.manual identify_dosage_non_dosage_validate_data_SB2 en_core_web_lg validate_data_dosage_annotations_SB2.jsonl --label non_dosage,dosage

Then I annotated two spans and accepted the example by clicking the green "Accept" button.

Then, on the next record, I clicked the Save button at the top.

I can now go back to my terminal and shut down the server by pressing CTRL + C. When you do this, the CLI output also confirms whether your annotations were saved:

$ python3 -m prodigy ner.manual dosage_dataset en_core_web_lg data/validate_data_dosage_annotations_SB2.jsonl --label non_dosage,dosage
Using 2 label(s): non_dosage, dosage
Added dataset dosage_dataset to database SQLite.

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

^C
✔ Saved 1 annotations to database SQLite
Dataset: dosage_dataset
Session ID: 2023-01-10_14-17-06

Now, if I export that dataset with db-out:

python3 -m prodigy db-out dosage_dataset > dosage.jsonl

I get:

{
  "text": "January 9 - 241 6 - 375mg split into 3 doses. 96m deadlift/back/shoulder session. 30m cardio. 7,872 steps. 1,640 calories at 17g (7g net) carbs, 93g fat, 128g protein. 1 5g water January 10 - 241 4 - 375mg split into 3 doses. 74m arms session. 30m cardio. 8,402 steps. 1,640 calories at 26g (15g net) carbs, 127g fat, 106g protein. 1 5g water Expected a weight drop by now so I hope it's just water retention as I've been religious with everything and eating has been on point. My circadian rhythms do usually ebb and flow where I'll get a \"whoosh\" weight drop every once in awhile. I'm gonna keep on, keepin' on.",
  "_input_hash": 1201376478,
  "_task_hash": -478339982,
  "_is_binary": false,
  "tokens": [
    {
      "text": "January",
      "start": 0,
      "end": 7,
      "id": 0,
      "ws": true
    },
    .
    .
    .
    {
      "text": ".",
      "start": 612,
      "end": 613,
      "id": 163,
      "ws": false
    }
  ],
  "_view_id": "ner_manual",
  "answer": "accept",
  "_timestamp": 1673378743,
  "spans": [
    {
      "start": 12,
      "end": 44,
      "token_start": 3,
      "token_end": 11,
      "label": "dosage"
    },
    {
      "start": 192,
      "end": 224,
      "token_start": 56,
      "token_end": 64,
      "label": "dosage"
    }
  ]
}

These are the two spans that were saved.

By default, those spans don't include the raw span text. You can add it by modifying your db-out recipe.
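As an alternative to changing the recipe, here is a minimal sketch that post-processes the file written by db-out and adds a "text" key to each span by slicing the example's text with the span's character offsets (start inclusive, end exclusive). The helper names add_span_text and add_span_text_to_file are hypothetical; this is not built-in Prodigy behavior.

```python
import json


def add_span_text(example):
    """Add a 'text' key to every span, sliced from the example text via character offsets."""
    text = example["text"]
    for span in example.get("spans", []):
        span["text"] = text[span["start"]:span["end"]]
    return example


def add_span_text_to_file(in_path, out_path):
    """Read a JSONL export line by line and write a copy with span texts added."""
    with open(in_path, encoding="utf8") as f_in, open(out_path, "w", encoding="utf8") as f_out:
        for line in f_in:
            if line.strip():  # skip blank lines
                eg = add_span_text(json.loads(line))
                f_out.write(json.dumps(eg) + "\n")
```

For example, add_span_text_to_file("dosage.jsonl", "dosage_with_text.jsonl") would process the export from above. For the record shown, the two spans would get the texts "241 6 - 375mg split into 3 doses" and "241 4 - 375mg split into 3 doses".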

Can you double-check that you annotated correctly (e.g., highlighting the spans, clicking the green Accept button, and saving your annotations by clicking the Save button)?