Random crash of NER UI while annotating

I've had Prodigy for less than a day and I'm very happy with what this tool has to offer, so thank you for creating it. Now to my issue:
I was using the ner.correct recipe on a dataset of about 1100 lines. Suddenly my Prodigy web app crashed with the following error:

TypeError: Cannot read property 'start' of undefined

in t
in Jss(t)
in div
in t
in Jss(t)
in Unknown
in t
in Jss(t)
in div
in div
in t
in Jss(t)
in Connect(Jss(t))
in main
in div
in Shortcuts
in t
in n
in Jss(n)
in Connect(Jss(n))
in t
in t
in Connect(t)
in t
in t

Hi! Did this happen while annotating the data, or when a new example was loaded? The error indicates that the app has come across an unexpected data format, probably in the spans or tokens :thinking: What does the data you're loading in look like? Do you have any pre-defined "spans" or "tokens" in there?

How does the data you're loading in look?

Well, it's just a simple one-line sentence containing a couple of words and numbers.

Did this happen while annotating the data, or when a new example was loaded?

While annotating.

Do you have any pre-defined "spans" or "tokens" in there?

Negative.

Thanks and hmmm, this is very mysterious, I've never seen anything like this come up before :thinking: I doubt it'd be related to an individual example and I'm pretty sure it must be caused by something added in v1.10, otherwise a similar problem would have come up at least once before.

It's unlikely that it's related to the tokens added to the data, because then more things would be broken and the rendering would fail immediately. There's only one relevant part in that particular UI that changed and I don't quite understand how that could cause a problem... but I'll add a safeguard around it and if you never see the problem again with the next version v1.10.1, then that was it.

Btw, you don't have --highlight-chars enabled, do you?

Thank you for your reply.

Btw, you don't have --highlight-chars enabled, do you?

I do not, didn't even know about it :stuck_out_tongue:

Okay, solved it – turns out it had nothing to do with the interface at all and came down to a very subtle bug in the sentence segmentation, introduced in v1.10.

You can read more about the background here if you're interested.

Edit: Just released v1.10.1, which should fix the underlying problem :tada:

Hello @ines, I have been using this tool for a few months now and am quite pleased with it. Thanks a lot for creating it and the whole spaCy ecosystem.

I am facing the same issue, where the app crashes with the error "TypeError: Cannot read property 'start' of undefined". I tried it with v1.10.5 and v1.10.7 of Prodigy. I also tried setting the 'unsegmented' flag, but that didn't seem to work either.

To describe the workflow in more detail: I am using a custom NER recipe to label spans and have an sklearn model to aid the annotator. To serve the model's predictions to Prodigy, I am setting the 'spans' attribute.
When I start my recipe, I see that the span has been predicted correctly (the character offsets look correct), but in the rendered text the span is a single character in a completely different place. I cannot update the span in any way, and when I try to create a different span, Prodigy crashes with the above error.
Another thing to note is that when the classifier predicts no span, I am able to annotate as usual, which points to setting the 'spans' attribute as the problem.

Code to set the spans:
eg["spans"] = [{"start": s[1], "end": s[2], "label": s[0]} for s in spans]

Hi! Can you share some more details on the recipe that you're running (I assume it's a custom recipe)? And when you're adding tokens to the incoming examples, are you doing this before or after the spans are set? (If it's not happening after the spans are set, the spans won't include token information like token_start and token_end, which can definitely be a problem).

From the error, it sounds like there's a mismatch in the annotated spans and the actual text and tokens, which is also indicated by the span being in the "wrong place". So another thing to double-check is that the span start and end values are correct and describe the right character offsets (not token IDs!). You can use spaCy's Doc.char_span to double-check this, e.g. nlp(text).char_span(start, end).
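To make that check concrete, here's a minimal spaCy-free sketch of the idea. It assumes plain whitespace tokenization, which is not how Prodigy or spaCy actually tokenize, and the helper names tokenize and check_span are made up for illustration. The point is only that start and end must be character offsets that fall on token boundaries.

```python
import re


def tokenize(text):
    """Illustrative whitespace tokenizer (an assumption, not Prodigy's tokenizer)."""
    return [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]


def check_span(text, span, tokens):
    """Return the covered text if the span's character offsets line up
    with token boundaries, otherwise None."""
    starts = {t["start"] for t in tokens}
    ends = {t["end"] for t in tokens}
    if span["start"] < span["end"] and span["start"] in starts and span["end"] in ends:
        return text[span["start"]:span["end"]]
    return None


text = "Invoice 4711 was paid by Acme Corp"
tokens = tokenize(text)

# Character offsets covering "Acme Corp".
good = {"start": 25, "end": 34, "label": "ORG"}
# Token indices (5 and 7) mistakenly used as character offsets.
bad = {"start": 5, "end": 7, "label": "ORG"}

print(check_span(text, good, tokens))
print(check_span(text, bad, tokens))
```

With spaCy, nlp(text).char_span(start, end) plays the same role: by default it returns None when the offsets don't map onto token boundaries.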

I was creating the spans after the tokens were created. Creating them before solved the issue.
Thanks a lot!
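
For anyone landing here later, the ordering issue can be sketched roughly like this. add_tokens_sketch is a hypothetical stand-in for whatever step adds tokens in your recipe (it is not Prodigy's implementation), and it assumes whitespace tokenization. It only shows why spans that are set after the token step end up without token_start and token_end.

```python
import re


def add_tokens_sketch(eg):
    """Hypothetical token-adding step: tokenizes on whitespace and fills in
    token_start/token_end on any spans that already exist on the example."""
    tokens = [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\S+", eg["text"]))
    ]
    by_start = {t["start"]: t["id"] for t in tokens}
    by_end = {t["end"]: t["id"] for t in tokens}
    for span in eg.get("spans", []):
        # Raises KeyError if the character offsets don't sit on token boundaries.
        span["token_start"] = by_start[span["start"]]
        span["token_end"] = by_end[span["end"]]
    eg["tokens"] = tokens
    return eg


def make_spans(predicted):
    """Build span dicts from (label, start, end) tuples."""
    return [{"start": s[1], "end": s[2], "label": s[0]} for s in predicted]


text = "Invoice 4711 was paid by Acme Corp"
predicted = [("ORG", 25, 34)]  # made-up model output for illustration

# Working order: set the spans first, then add tokens.
ok = {"text": text, "spans": make_spans(predicted)}
ok = add_tokens_sketch(ok)

# Problematic order: add tokens first, then set the spans.
broken = add_tokens_sketch({"text": text})
broken["spans"] = make_spans(predicted)

print(ok["spans"][0])
print(broken["spans"][0])
```

In the second case the span only carries start, end and label, which matches the situation described above where the spans lack token information.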