Hi all,
Before I describe the exception I am getting, I want to give a little bit of context.
I managed to collect some automated annotation about NE (diseases in my case) on a bunch of texts and I want to use Prodigy to collect feedback on these annotations. For the moment I do not want to do active learning within Prodigy, I plan to do this a little bit later.
So, I am aware I can load my annotated text both using a custom recipe or by generating a JSONL file. To try it out, I first made a script to generate a JSONL file with the tasks.
An example will be the following:
{ "text": "alecensa as monotherapy is indicated for the first-line treatment of adult patients with anaplastic lymphoma kinase (alk)-positive advanced non-small cell lung cancer (nsclc).alecensa as monotherapy is indicated for the treatment of adult patients with alk‑positive advanced nsclc previously treated with crizotinib.", "meta": { "first_sentence": "", "source": "type", "indication_id": "4", "annotation_id": "483924" }, "spans": [ { "end": 173, "source": "type", "text": "nsclc", "rank": 0, "label": "INDICATION", "start": 168, "score": 0.5 } ] }
I tried to use ner.manual
instead of ner.teach
cause I did not want active learning at the moment.
So I run prodigy ner.manual condition_terms en_core_web_md custom.jsonl --label INDICATION
The web server starts but when I hit it, it fails with the following stacktrace:
16:49:48 - Exception when serving /get_questions
Traceback (most recent call last):
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/waitress/channel.py”, line 338, in service
task.service()
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/waitress/task.py”, line 169, in service
self.execute()
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/waitress/task.py”, line 399, in execute
app_iter = self.channel.server.application(env, start_response)
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/hug/api.py”, line 424, in api_auto_instantiate
return module.hug_wsgi(*args, **kwargs)
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/falcon/api.py”, line 244, in call
responder(req, resp, **params)
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/hug/interface.py”, line 734, in call
raise exception
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/hug/interface.py”, line 709, in call
self.render_content(self.call_function(input_parameters), request, response, **kwargs)
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/hug/interface.py”, line 649, in call_function
return self.interface(**parameters)
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/hug/interface.py”, line 100, in call
return __hug_internal_self._function(*args, **kwargs)
File “/home/ubuntu/prodigy/venv/lib/python3.5/site-packages/prodigy/app.py”, line 84, in get_questions
tasks = controller.get_questions()
File “cython_src/prodigy/core.pyx”, line 87, in prodigy.core.Controller.get_questions
File “cython_src/prodigy/core.pyx”, line 71, in iter_tasks
File “cython_src/prodigy/components/preprocess.pyx”, line 132, in add_tokens
ValueError: Mismatched tokenization. Can’t resolve span to token index 173. This can happen if your data contains pre-set spans. Make sure that the spans match spaCy’s tokenization or add a ‘tokens’ property to your task.{‘text’: ‘nsclc’, ‘end’: 173, ‘label’: ‘INDICATION’, ‘start’: 168, ‘score’: 0.5, ‘rank’: 0, ‘token_start’: 28, ‘source’: ‘type’}
The start
and end
keys have the right values, so I am confused on why it is failing. If I use the ner.teach
recipe it doesn’t complain when loading the tasks.
I am probably doing something wrong, so it would be great it you can shade some lights here.
I also though of creating my own recipe based on ner.teach
by changing prefer_uncertain(predict(stream))
into stream
. It would be great to have your opinion on this.
Many thanks for your great work.