Error storing spans with ner.mark

I’m getting an error when I shut down the server after running ner.mark. It looks like the server saved the raw annotations, but not the spans that go with them.

$ prodigy ner.mark names_ner en_core_web_lg names_ner_reject.jsonl --label PERSON
Storing raw annotations in dataset 'names_ner_raw' and spans in 'names_ner'

  ✨  Starting the web server at http://localhost:8080 ...
  Open the app in your browser and start annotating!

^C
Saved 23 annotations to database SQLite
Dataset: names_ner_raw
Session ID: 2017-12-29_17-40-23

Traceback (most recent call last):
  File "/home/wff/miniconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/wff/miniconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/wff/miniconda3/lib/python3.6/site-packages/prodigy/__main__.py", line 244, in <module>
    server(controller, controller.config)
  File "/home/wff/miniconda3/lib/python3.6/site-packages/prodigy/app.py", line 40, in server
    controller.save()
  File "cython_src/prodigy/core.pyx", line 119, in prodigy.core.Controller.save
  File "/home/wff/miniconda3/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 179, in on_exit
    end = eg['spans'][accept[-1]]['end']
IndexError: list index out of range

This is with Prodigy 1.1.0.

Thanks a lot! This definitely looks like something went wrong while converting the “raw” annotations to the “spans” format. Maybe the data contains an edge case we didn’t consider…

If you have a second, could you do us a favour and run the following script and see which example it fails on? And if your data is not too sensitive, could you post the example it prints (or a modified version of it, if the data is sensitive)?

import copy

from prodigy.components.db import connect

# Connect to the configured database
db = connect()

raw = db.get_dataset('names_ner_raw')
for eg in raw:
    # Only check examples with a non-empty 'accept' list and spans
    if eg.get('accept') and eg.get('spans'):
        eg = copy.deepcopy(eg)
        accept = eg.pop('accept')
        # This lookup mirrors the line in the recipe that raises the error
        try:
            end = eg['spans'][accept[-1]]['end']
        except IndexError:
            print(eg, accept)

Thanks for the quick response, this is the output I’m seeing:

{'text': '6,000 Units ****** * *******, RN', 'spans': [{'start': 0, 'end': 5, 'text': '6,000'}, {'start': 6, 'end': 11, 'text': 'Units'}, {'start': 12, 'end': 18, 'text': '******'}, {'start': 19, 'end': 20, 'text': '*'}, {'start': 21, 'end': 28, 'text': '*******'}, {'start': 28, 'end': 29, 'text': ','}, {'start': 30, 'end': 32, 'text': 'RN'}], 'meta': {'score': 0.0092569148}, '_input_hash': 1353030142, '_task_hash': 1743419235, 'answer': 'accept', 'label': 'PERSON'} [2, 3, 4, 5, 6, 7]
{'text': '******* * ********, RN', 'spans': [{'start': 0, 'end': 7, 'text': '*******'}, {'start': 8, 'end': 9, 'text': '*'}, {'start': 10, 'end': 18, 'text': '********'}, {'start': 18, 'end': 19, 'text': ','}, {'start': 20, 'end': 22, 'text': 'RN'}], 'meta': {'score': 0.3698282278}, '_input_hash': -1728385495, '_task_hash': 1696672218, 'answer': 'accept', 'label': 'PERSON'} [0, 1, 2, 3, 4, 5]

@erikwiffin Thanks a lot for sharing the output – will investigate this!
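
One thing that stands out at first glance: in both examples the last value in accept equals the number of spans (7 and 5), so it points one past the end of the list. Below is a minimal sketch for listing every such case, assuming the accept values are meant to be indices into spans. The helper name out_of_range_accepts is made up for illustration and is not part of Prodigy, and the record is rebuilt by hand from the first example above.

```python
def out_of_range_accepts(examples):
    """Return (text, bad_indices, n_spans) for every example whose
    'accept' list contains an index outside its 'spans' list."""
    problems = []
    for eg in examples:
        spans = eg.get('spans') or []
        accept = eg.get('accept') or []
        # Assumption: each accept value is a 0-based index into spans
        bad = [i for i in accept if not 0 <= i < len(spans)]
        if bad:
            problems.append((eg.get('text'), bad, len(spans)))
    return problems


# Hand-built record shaped like the first failing example (7 spans)
example = {
    'text': '6,000 Units ****** * *******, RN',
    'spans': [
        {'start': 0, 'end': 5, 'text': '6,000'},
        {'start': 6, 'end': 11, 'text': 'Units'},
        {'start': 12, 'end': 18, 'text': '******'},
        {'start': 19, 'end': 20, 'text': '*'},
        {'start': 21, 'end': 28, 'text': '*******'},
        {'start': 28, 'end': 29, 'text': ','},
        {'start': 30, 'end': 32, 'text': 'RN'},
    ],
    'accept': [2, 3, 4, 5, 6, 7],
}

print(out_of_range_accepts([example]))
```

To run this over the real data, the list returned by db.get_dataset('names_ner_raw') could be passed in instead of the hand-built record.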

Btw, good news: We’ll likely kill the ner.mark recipe and boundaries interface in favour of a better and more flexible recipe and interface that lets the annotator select tokens by highlighting them, add multiple spans per task and also pick the label. Check out this thread for more details and a little interactive demo of how this will work.

I saw that - looking forward to it!
