empty spans and spans with no 'text' attribute

ryanwesslen · January 10, 2023, 7:56pm

Thanks for your question, and welcome to the Prodigy community!

This is a bit odd. Your annotated file has "answer":"accept" for each record, which indicates the records were accepted (saved). But yes, each record should also include your spans as a list of dictionaries, one dictionary per span. For ner.manual, saved span annotations are stored under the "spans" key. See this link for what the data looks like for it.
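To see which exported records are affected, here is a minimal sketch that scans JSONL lines for records marked "accept" whose "spans" key is missing or empty. The function name find_records_without_spans and the two sample records are hypothetical and only for illustration; they are not part of Prodigy.

```python
import json


def find_records_without_spans(lines):
    """Return the indices of accepted records whose "spans" is missing or empty."""
    missing = []
    for i, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        # An empty list and a missing key are both treated as "no spans".
        if record.get("answer") == "accept" and not record.get("spans"):
            missing.append(i)
    return missing


# Hypothetical two-record export: the first has one span, the second has none.
sample = [
    '{"text": "375mg split into 3 doses", "answer": "accept", '
    '"spans": [{"start": 0, "end": 5, "label": "dosage"}]}',
    '{"text": "30m cardio", "answer": "accept", "spans": []}',
]
print(find_records_without_spans(sample))
```

The same function accepts an open file handle, since iterating over a file yields one line at a time, so it could be pointed at your own exported .jsonl file.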

When I did the steps below, everything worked out fine:

python3 -m prodigy ner.manual identify_dosage_non_dosage_validate_data_SB2 en_core_web_lg validate_data_dosage_annotations_SB2.jsonl --label non_dosage,dosage

Then I annotated two spans and accepted them by clicking the green "Accept" button.

Then, on the next record, I clicked the save button at the top.

I can now go back to my terminal and shut down the server by pressing CTRL + C. When I do this, you can also confirm whether your annotation was saved in the CLI:

$ python3 -m prodigy ner.manual dosage_dataset en_core_web_lg data/validate_data_dosage_annotations_SB2.jsonl --label non_dosage,dosage
Using 2 label(s): non_dosage, dosage
Added dataset dosage_dataset to database SQLite.

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

^C
✔ Saved 1 annotations to database SQLite
Dataset: dosage_dataset
Session ID: 2023-01-10_14-17-06

Now if I export that dataset with db-out:

python3 -m prodigy db-out dosage_dataset > dosage.jsonl

I get:

{
  "text": "January 9 - 241 6 - 375mg split into 3 doses. 96m deadlift/back/shoulder session. 30m cardio. 7,872 steps. 1,640 calories at 17g (7g net) carbs, 93g fat, 128g protein. 1 5g water January 10 - 241 4 - 375mg split into 3 doses. 74m arms session. 30m cardio. 8,402 steps. 1,640 calories at 26g (15g net) carbs, 127g fat, 106g protein. 1 5g water Expected a weight drop by now so I hope it's just water retention as I've been religious with everything and eating has been on point. My circadian rhythms do usually ebb and flow where I'll get a \"whoosh\" weight drop every once in awhile. I'm gonna keep on, keepin' on.",
  "_input_hash": 1201376478,
  "_task_hash": -478339982,
  "_is_binary": false,
  "tokens": [
    {
      "text": "January",
      "start": 0,
      "end": 7,
      "id": 0,
      "ws": true
    },
    .
    .
    .
    {
      "text": ".",
      "start": 612,
      "end": 613,
      "id": 163,
      "ws": false
    }
  ],
  "_view_id": "ner_manual",
  "answer": "accept",
  "_timestamp": 1673378743,
  "spans": [
    {
      "start": 12,
      "end": 44,
      "token_start": 3,
      "token_end": 11,
      "label": "dosage"
    },
    {
      "start": 192,
      "end": 224,
      "token_start": 56,
      "token_end": 64,
      "label": "dosage"
    }
  ]
}

These are the two spans that were saved.

Note that, by default, those spans do not include the raw span text. You can add it by modifying your db-out recipe.
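One way to get the span text without touching the recipe is to post-process the exported records. This is a minimal sketch, not Prodigy's own method: it assumes each span's "start" and "end" are character offsets into the record's "text", which matches the export above. The helper name add_span_text is hypothetical, and the sample record is an abbreviated copy of the first span from the output above.

```python
import json


def add_span_text(record):
    """Add a "text" key to each span by slicing the record's text.

    Modifies the record in place and also returns it.
    """
    text = record["text"]
    for span in record.get("spans", []):
        span["text"] = text[span["start"]:span["end"]]
    return record


# Abbreviated record: only the first sentence and the first span are kept.
record = {
    "text": "January 9 - 241 6 - 375mg split into 3 doses.",
    "spans": [
        {"start": 12, "end": 44, "token_start": 3, "token_end": 11, "label": "dosage"}
    ],
}
print(json.dumps(add_span_text(record)["spans"], indent=2))
```

For a full export, you could apply the same function to each line of the .jsonl file after parsing it with json.loads, then write the records back out with json.dumps.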

Can you double-check that you annotated correctly (i.e., highlighting the spans, clicking the green Accept button, and saving your annotations by clicking the save button)?
