Hi @stefan.bartell!
Thanks for your question, and welcome to the Prodigy community!
This is a bit odd. Your annotated file has "answer":"accept" tags for each record, indicating they were accepted (saved), but yes, they should also include your spans as a list of dictionaries, one dictionary per span. For ner.manual, the saved annotations are stored under the "spans" key. See this link for what the data looks like.
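To spot records with this problem in an export, a short script can scan the JSONL file and list accepted records whose "spans" list is missing or empty. This is a minimal sketch, assuming the file came from db-out and has one JSON object per line; find_accepted_without_spans is a hypothetical helper, not part of Prodigy.

```python
import json
from typing import Iterable, List


def find_accepted_without_spans(lines: Iterable[str]) -> List[int]:
    """Return the _input_hash of every accepted record that has no spans.

    `lines` is any iterable of JSONL strings, e.g. an open file handle for
    a db-out export (assumed: one JSON object per line). Blank lines are skipped.
    """
    missing = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        example = json.loads(line)
        # An accepted ner.manual record is expected to carry its highlighted
        # entities in the "spans" list; flag it if that list is absent or empty.
        if example.get("answer") == "accept" and not example.get("spans"):
            missing.append(example.get("_input_hash"))
    return missing
```

For example, pass an open file handle for your exported JSONL to find_accepted_without_spans; an empty list means every accepted record has at least one span.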
When I followed the steps below, everything worked fine:
python3 -m prodigy ner.manual identify_dosage_non_dosage_validate_data_SB2 en_core_web_lg validate_data_dosage_annotations_SB2.jsonl --label non_dosage,dosage
Then I annotated two spans and accepted the record by clicking the green "Accept" button.
Then, on the next record, I clicked the save button at the top.
I can then go back to my terminal and shut down the server by pressing CTRL + C. When you do this, the CLI also confirms whether your annotations were saved:
$ python3 -m prodigy ner.manual dosage_dataset en_core_web_lg data/validate_data_dosage_annotations_SB2.jsonl --label non_dosage,dosage
Using 2 label(s): non_dosage, dosage
Added dataset dosage_dataset to database SQLite.
✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!
^C
✔ Saved 1 annotations to database SQLite
Dataset: dosage_dataset
Session ID: 2023-01-10_14-17-06
Now, if I export that dataset with db-out:
python3 -m prodigy db-out dosage_dataset > dosage.jsonl
I get:
{
"text": "January 9 - 241 6 - 375mg split into 3 doses. 96m deadlift/back/shoulder session. 30m cardio. 7,872 steps. 1,640 calories at 17g (7g net) carbs, 93g fat, 128g protein. 1 5g water January 10 - 241 4 - 375mg split into 3 doses. 74m arms session. 30m cardio. 8,402 steps. 1,640 calories at 26g (15g net) carbs, 127g fat, 106g protein. 1 5g water Expected a weight drop by now so I hope it's just water retention as I've been religious with everything and eating has been on point. My circadian rhythms do usually ebb and flow where I'll get a \"whoosh\" weight drop every once in awhile. I'm gonna keep on, keepin' on.",
"_input_hash": 1201376478,
"_task_hash": -478339982,
"_is_binary": false,
"tokens": [
{
"text": "January",
"start": 0,
"end": 7,
"id": 0,
"ws": true
},
.
.
.
{
"text": ".",
"start": 612,
"end": 613,
"id": 163,
"ws": false
}
],
"_view_id": "ner_manual",
"answer": "accept",
"_timestamp": 1673378743,
"spans": [
{
"start": 12,
"end": 44,
"token_start": 3,
"token_end": 11,
"label": "dosage"
},
{
"start": 192,
"end": 224,
"token_start": 56,
"token_end": 64,
"label": "dosage"
}
]
}
These are the two spans that were saved.
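If you want to sanity-check exported records like the one above, a small script can compare each span's character offsets with the offsets of the tokens that its token_start and token_end point to. This is a minimal sketch: check_span_offsets is a hypothetical helper, and it assumes that token_end is inclusive (the index of the last token in the span) and that every record carries its "tokens" list.

```python
from typing import Any, Dict, List


def check_span_offsets(example: Dict[str, Any]) -> List[str]:
    """Report spans whose character offsets disagree with their token indices.

    Assumption: token_end is inclusive, i.e. the index of the last token
    covered by the span, and the record includes a "tokens" list.
    """
    tokens = example.get("tokens", [])
    problems = []
    for i, span in enumerate(example.get("spans", [])):
        first = tokens[span["token_start"]]
        last = tokens[span["token_end"]]
        # The span should start where its first token starts and end
        # where its last token ends.
        if first["start"] != span["start"] or last["end"] != span["end"]:
            problems.append(
                f"span {i} ({span['label']}): chars {span['start']}-{span['end']} "
                f"but tokens cover {first['start']}-{last['end']}"
            )
    return problems
```

An empty list means every span lines up with token boundaries.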
Note that, by default, those spans don't include the raw span text. You can add it by modifying your db-out recipe.
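If you'd rather not touch the recipe, a different approach is to post-process the db-out export: since each span's "start" and "end" are character offsets into the record's "text", slicing the text recovers the span text. This is a minimal sketch; add_span_text and add_span_text_jsonl are hypothetical helpers, and they run on the exported file rather than modifying the recipe itself.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def add_span_text(example: Dict[str, Any]) -> Dict[str, Any]:
    """Copy each span's surface text into a "text" key (modifies in place)."""
    text = example.get("text", "")
    for span in example.get("spans", []):
        # "start" is inclusive and "end" exclusive, so a plain slice
        # yields exactly the highlighted characters.
        span["text"] = text[span["start"]:span["end"]]
    return example


def add_span_text_jsonl(lines: Iterable[str]) -> Iterator[str]:
    """Yield JSONL lines with span text added; blank lines are skipped."""
    for line in lines:
        if line.strip():
            yield json.dumps(add_span_text(json.loads(line)), ensure_ascii=False)
```

Applied to the record above, the first span would get "text": "241 6 - 375mg split into 3 doses". To process a whole export, iterate over the lines of dosage.jsonl and write each yielded string, followed by a newline, to a new file (for example a hypothetical dosage_with_text.jsonl).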
Can you double-check that you annotated correctly (i.e., highlighting the spans, clicking the green Accept button, and saving your annotations with the Save button)?