Possible Bug in Web Application

I'm updating a custom label in a NER model and wanted to review and approve previous annotations. The database is a mix of ner and ner_manual annotations. I decided to try out the review recipe and ran the following command:

python -m prodigy review ps_skills pslic_gold --label PSLIC --view-id ner_manual

One of the results was sort of correct, and I thought I could fix it using the manual NER process, so I tried to manually highlight the proper label, and the following error occurred.

The Accept/Reject buttons continue to work, but the information about what you're accepting/rejecting is no longer displayed, and the only way to reset the view is to shut down the server and restart it.

TypeError: Cannot read property 'start' of undefined

in t
in Jss(t)
in t
in Jss(t)
in div
in t
in Jss(t)
in Unknown
in t
in Jss(t)
in div
in div
in t
in Jss(t)
in Connect(Jss(t))
in main
in div
in Shortcuts
in t
in n
in Jss(n)
in Connect(Jss(n))
in t
in t
in Connect(t)
in t
in t

Thanks for the report, and sorry about that! It's pretty mysterious, but it looks very similar to the one reported here:

Can you reproduce the problem with the same example, or is it non-deterministic? There's really only one place in the code that changed that could cause this, and I'm pretty sure I already fixed it and it'll go out with the v1.10.1 release (likely this week) :slightly_smiling_face:

I can reproduce the error. Just to be sure, I also tried loading up several other datasets using the same process. For datasets with custom labels, I get the same error when attempting to highlight text.

I do have a dataset that is not NER-labelled (accept/reject for a textcat), and highlighting text there does not crash the viewer. The highlighting is different, though; it doesn't look the same as ner_manual highlighting.

Okay, that's strange, maybe there's something else happening then :thinking:

Do you have a JSON example you can share that reproduces the problem so I can try it out?

I'm also getting this error running ner.correct on a model with a new custom NER label when trying to highlight. Binary buttons still work fine.

Here's an example with an issue:

What's the easiest way to get a JSON example out to investigate further?

I've also noticed some other weirdness with the highlighting, posting a GIF below. If this is a separate issue I'm happy to make another post on here, but I thought they might be related.

It looks like it's highlighting different spans than the ones I select, then shifting the spans around after they're selected. In the example, I'm trying to select the span "mood lift". When it creates a span over "and body" instead, I try to remove that, and it removes the annotation over "soreness" instead.

For more context, I only have this issue with ner.correct and not other similar views. Both ner.silver-to-gold and ner.manual work without issue for me.

Thanks for the reports! This is strange... my initial thought was that the problem here was related to a change in the UI, but now I'm starting to think that it could be related to add_tokens and that it somehow produces incomplete data for existing spans (which would explain why it occurs in ner.correct).

If you can find that example in your dataset, could you just copy-paste that line of JSON? Alternatively, if you have JavaScript enabled (e.g. by putting anything in the "javascript" setting of your prodigy.json, like "javascript": "console.log('JS enabled')"), you'll be able to type window.prodigy.content in your browser's developer console to view the current task JSON.
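For example, you could dump everything with the database API. This is just a rough sketch, using the ps_skills dataset name from your original command; on the command line, prodigy db-out ps_skills > ps_skills.jsonl does the same thing:

import json
from prodigy.components.db import connect

db = connect()  # uses the database settings from your prodigy.json

# Print one JSON object per line, so the offending example can be
# copy-pasted straight into a report
for task in db.get_dataset("ps_skills"):
    print(json.dumps(task))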

Edit: Okay, nevermind, I think I can reproduce it!

Edit 2: Alright, I think I got it. This was very interesting, because it turned out to have nothing to do with the actual interface at all :sweat_smile: There must have been some change in the add_tokens logic that causes the tokens to (sometimes?) receive incorrect "id" values, e.g. the first token will have "id": 2. Still investigating that. But the good news is that this also means there's a temporary workaround: you can just fix the IDs, as in the sketch below.
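For example, a rough sketch of what fixing the IDs could look like (fix_token_ids here is a hypothetical helper, and it assumes the usual task format where spans reference tokens via "token_start" and "token_end"):

def fix_token_ids(task):
    """Shift token IDs so the first token is 0 again."""
    tokens = task.get("tokens", [])
    if not tokens:
        return task
    offset = tokens[0]["id"]  # e.g. 2, if the indices got shifted
    if offset:
        for token in tokens:
            token["id"] -= offset
        # Spans reference tokens by ID, so they need the same shift
        for span in task.get("spans", []):
            span["token_start"] -= offset
            span["token_end"] -= offset
    return task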

Edit 3: Wow, this was subtle! The true culprit: sentence segmentation :fire: It all makes sense now (because I was really scratching my head about the UI, and there was just nothing there that could have explained the problem). Some background on what happens: we refactored the tokenization logic to support character-based spans and tokens. As part of that, we made the token's "id" map to spaCy's Token.i (the token's index in the Doc, which makes a lot of sense). However, when sentences are segmented, that index still counts from the start of the doc, so a sentence's first token could easily have index 10 or whatever. This caused the tokens to be out of sync.
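Here's a minimal sketch of the mismatch, using spaCy v3's sentencizer (the exact pipeline used internally may differ):

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("First sentence here. Second sentence follows.")
second = list(doc.sents)[1]

# Token.i counts from the start of the parent Doc, not the sentence,
# so the second sentence's tokens don't start at 0:
print([token.i for token in second])  # [4, 5, 6, 7]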

Anyway, this is pretty good news, because the easiest workaround is to just set --unsegmented (or pre-segment your text yourself if you want sentence segmentation).
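If you'd rather pre-segment, a rough sketch (raw.jsonl and segmented.jsonl are just placeholder file names, and the sentencizer stands in for whatever segmentation you prefer):

import json
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Split each incoming text into one task per sentence up front,
# so the recipe can then be run with --unsegmented
with open("raw.jsonl") as f_in, open("segmented.jsonl", "w") as f_out:
    for line in f_in:
        task = json.loads(line)
        for sent in nlp(task["text"]).sents:
            f_out.write(json.dumps({"text": sent.text}) + "\n")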

We'll definitely get an updated release ready this week – should probably be able to do it all today, but don't want to overpromise.

@ines you rock :slight_smile: thanks for the quick deep dive and temporary workaround with --unsegmented

Okay, just released v1.10.1, which should fix the underlying problem! :tada:

The same issue seems to be happening in 1.11 alpha:

Version 1.11.0a4
Platform macOS-10.15.7-x86_64-i386-64bit


Can you share an example of the underlying JSON data? I'll try to reproduce this. There was a related fix we shipped in the nightly that should have been included in v1.11.0a4 already, but I'll double-check that :+1:

Hi,

I was also getting this error recently, with pre-labelled data that was generated before loading into Prodigy. The error appears when the original text spans for the labels don't align with the loaded "text" they were generated from, i.e. in my example, there were span labels included for text that was missing/cut off from the loaded text.
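If you have similar pre-labelled data, a quick sanity check along these lines might help (find_misaligned_spans is a hypothetical helper, assuming tasks with a "text" field and character-offset "spans"):

def find_misaligned_spans(task):
    """Return spans whose offsets don't line up with the loaded text."""
    text = task["text"]
    return [
        span
        for span in task.get("spans", [])
        if span["end"] > len(text)
        or ("text" in span and text[span["start"]:span["end"]] != span["text"])
    ]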

This might help someone else who is still getting this error.
