As part of the workflow for my project, I've created an annotated dataset that I want to review using spans.manual. Each record contains the text, the span's start and end character offsets, its start and end token indices, and the label. The dataset has been saved in JSONL format using srsly.write_jsonl.
When I pass the dataset to Prodigy, the first two docs are displayed with the highlighted span and the corresponding label, but from the third doc onwards I get the following error:
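A minimal sketch of one record in the task format Prodigy's span interfaces read: `text`, a `tokens` list, and `spans` with `start`/`end`/`token_start`/`token_end`/`label`. The whitespace tokenizer, the example text and the `EXAMPLE` label are hypothetical. Prodigy normally tokenizes with spaCy, so real token boundaries can differ. The token indices are derived from the character offsets, so `token_end` is the inclusive index of the span's last token.

```python
import json
import re


def whitespace_tokens(text):
    """Hypothetical tokenizer: split on whitespace and record character offsets.

    Prodigy normally tokenizes with spaCy, so real token boundaries may differ.
    """
    tokens = []
    for i, match in enumerate(re.finditer(r"\S+", text)):
        end = match.end()
        tokens.append({
            "text": match.group(),
            "start": match.start(),
            "end": end,
            "id": i,
            # True if the token is followed by whitespace
            "ws": end < len(text) and text[end].isspace(),
        })
    return tokens


def char_span_to_tokens(tokens, start, end):
    """Return (token_start, token_end) for a character span; token_end is inclusive."""
    covered = [t["id"] for t in tokens if t["start"] < end and t["end"] > start]
    if not covered:
        raise ValueError(f"no tokens overlap characters {start}-{end}")
    return covered[0], covered[-1]


text = "Prodigy highlights this span"
tokens = whitespace_tokens(text)
start, end = 19, 28  # character offsets of "this span"
token_start, token_end = char_span_to_tokens(tokens, start, end)
task = {
    "text": text,
    "tokens": tokens,
    "spans": [{"start": start, "end": end, "token_start": token_start,
               "token_end": token_end, "label": "EXAMPLE"}],
}

# One JSONL line per task
line = json.dumps(task)
print(line)
```

Here the span ends at the last token, so `token_end` is 3, which equals `len(tokens) - 1`. `srsly.write_jsonl(path, tasks)` writes one such line per task. The standard `json` module is used above only to keep the sketch dependency-free.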
Oops, something went wrong
You might have come across a bug in Prodigy's web app – sorry about that. We'd love to fix this, so feel free to open an issue on the Prodigy Support Forum and include the steps that led to this message.
TypeError: can't access property "push", a[r] is undefined
Any ideas why this is happening? At first I thought it might be because I'm not passing the tokens in the JSONL file, but it's strange that it works for the first two docs and then throws the error.
Hello @BenGriffithsPEP,
thank you for your message.
To reproduce the error, I need some more information. Could you please share the command you used to start Prodigy, as well as part of the .jsonl file?
Hello @BenGriffithsPEP,
thank you for your answer.
I was able to reproduce the error. The problem is that your third task has a span with token_end equal to 25. However, Prodigy's tokenization splits that text into 25 tokens, and because token indices start at 0, the maximum token index is 24. Changing that value to 24 should fix the error.
To avoid running into more of these errors, I'd recommend checking the rest of your data for similar cases, i.e., tasks where the span is at the end of the text and its token_end is higher than the maximum token index.
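One way to do that check is sketched below. This is a sketch under assumptions: the function name and the `count_tokens` parameter are hypothetical. `count_tokens(text)` should return the number of tokens Prodigy will produce for a text. The sketch uses a task's own `tokens` list when one is present. The word counter in the demo is only a stand-in for the real tokenizer.

```python
import json


def find_out_of_range_spans(lines, count_tokens):
    """Yield (line_no, span) for spans whose token indices don't fit the text.

    count_tokens(text) must return how many tokens the annotation tool will
    produce for that text; a task's own "tokens" list is used when present.
    """
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        task = json.loads(line)
        if "tokens" in task:
            n_tokens = len(task["tokens"])
        else:
            n_tokens = count_tokens(task["text"])
        for span in task.get("spans", []):
            token_start = span["token_start"]
            token_end = span["token_end"]  # inclusive index of the span's last token
            if not 0 <= token_start <= token_end < n_tokens:
                yield line_no, span


# Stand-in tokenizer for this demo only: counts whitespace-separated words.
def count_words(text):
    return len(text.split())


lines = [
    json.dumps({"text": "one two three",
                "spans": [{"start": 4, "end": 13, "token_start": 1,
                           "token_end": 2, "label": "X"}]}),
    json.dumps({"text": "one two three",
                "spans": [{"start": 4, "end": 13, "token_start": 1,
                           "token_end": 3, "label": "X"}]}),
]
problems = list(find_out_of_range_spans(lines, count_words))
for line_no, span in problems:
    print(f"line {line_no}: token_end {span['token_end']} is out of range")
```

The demo flags only the second line, whose `token_end` of 3 points past the last index (2) of a three-token text. The real file can be passed as `open(path, encoding="utf8")`. To match Prodigy's token boundaries, the counter should use the same spaCy pipeline given to the recipe. For example, assuming a blank English pipeline, use `nlp = spacy.blank("en")` with `lambda text: len(nlp.make_doc(text))`. Note that this check only catches indices that are out of range, not indices that are in range but misaligned with the character offsets.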
I hope this helps you. If not or if you have more questions, please feel free to ask.