Missing and glitched text in the UI

I am running a relations task on Prodigy version 1.11. This is the command I'm using:

prodigy custom-dep app_v1 ./app_v1.jsonl -F ./preprocess.py

I generate the input file with the tokens and spans myself, and we've verified that the spans match the text and are correct, but Prodigy still shows us a garbled view of the text. Any ideas?
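Example 1
UI: (screenshot not preserved)
Original data: (screenshot not preserved)

Example 2
UI: (screenshot not preserved)
Original data: (screenshot not preserved)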

Hi @ocelot43, welcome to Prodigy!

For your input data, make sure each token has a unique id. Duplicate ids can break the UI rendering, especially when the tokens and spans do not line up. In your first example, the word "paragraphs" is split into two tokens ("paragraph" and "s"), yet both tokens share the same id. Give them unique ids and adjust the spans accordingly.
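Both problems can be caught before starting the server. Below is a minimal Python sketch, assuming the tasks follow Prodigy's JSONL format: a "text" string, a "tokens" list (each token with "text", "start", "end", "id") and a "spans" list (each span with "start", "end", "token_start", "token_end"). The helper names find_token_problems, renumber_tokens and check_jsonl, and the sample task, are hypothetical and not part of Prodigy. The first helper reports duplicate ids and spans that don't match their tokens; the second renumbers tokens 0..n-1 and recomputes each span's token_start/token_end from its character offsets.

```python
import json
from collections import Counter


def find_token_problems(eg):
    """Return a list of problems found in one Prodigy-style task dict (hypothetical helper)."""
    problems = []
    text = eg["text"]
    tokens = eg.get("tokens", [])
    # 1. Token ids must be unique within a task.
    for tok_id, n in Counter(tok["id"] for tok in tokens).items():
        if n > 1:
            problems.append(f"token id {tok_id} used {n} times")
    # 2. Each token's text must match the slice of "text" given by its offsets.
    for tok in tokens:
        actual = text[tok["start"]:tok["end"]]
        if actual != tok["text"]:
            problems.append(f"token {tok['id']}: {tok['text']!r} != {actual!r}")
    # 3. Each span's character offsets must match the tokens it points to.
    #    With duplicate ids this lookup is ambiguous, which is the bug itself.
    by_id = {tok["id"]: tok for tok in tokens}
    for span in eg.get("spans", []):
        first = by_id.get(span["token_start"])
        last = by_id.get(span["token_end"])
        if first is None or last is None:
            problems.append(f"span {span['start']}-{span['end']} refers to a missing token id")
        elif first["start"] != span["start"] or last["end"] != span["end"]:
            problems.append(
                f"span {span['start']}-{span['end']} does not match "
                f"tokens {span['token_start']}-{span['token_end']}"
            )
    return problems


def renumber_tokens(eg):
    """Give tokens ids 0..n-1 and re-derive span token indices from character offsets.

    Assumes tokens are in text order; modifies the task in place and returns it.
    """
    for i, tok in enumerate(eg["tokens"]):
        tok["id"] = i
    starts = {tok["start"]: tok["id"] for tok in eg["tokens"]}
    ends = {tok["end"]: tok["id"] for tok in eg["tokens"]}
    for span in eg.get("spans", []):
        if span["start"] not in starts or span["end"] not in ends:
            raise ValueError(f"span {span['start']}-{span['end']} is not on token boundaries")
        span["token_start"] = starts[span["start"]]
        span["token_end"] = ends[span["end"]]
    return eg


def check_jsonl(path):
    """Print the problems for every task in a JSONL file (one JSON object per line)."""
    with open(path, encoding="utf8") as f:
        for line_no, line in enumerate(f, start=1):
            if line.strip():
                for problem in find_token_problems(json.loads(line)):
                    print(f"line {line_no}: {problem}")


# Hypothetical task mirroring the "paragraphs" case: two tokens share id 1.
example = {
    "text": "two paragraphs",
    "tokens": [
        {"text": "two", "start": 0, "end": 3, "id": 0},
        {"text": "paragraph", "start": 4, "end": 13, "id": 1},
        {"text": "s", "start": 13, "end": 14, "id": 1},
    ],
    "spans": [{"start": 4, "end": 14, "token_start": 1, "token_end": 1, "label": "X"}],
}
```

On the sample task, find_token_problems reports the duplicate id and a span that no longer matches its tokens; after renumber_tokens(example) the same check returns an empty list. Deriving token_start/token_end from character offsets, rather than patching ids by hand, keeps the spans consistent with whatever tokenization produced the tokens.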

This thread might also be helpful: Tokenization causes glitched text

Thanks for the reply. I found that there are duplicated ids in my input data, but I think the real problem is the nested entities in the labels, especially in the first example. Is there an easy way to find them, or maybe some hints in the UI?

Hi @ocelot43,

If you have nested or overlapping entities, the ner_manual UI won't work: it's designed for named entities, which cannot overlap. To render and annotate overlapping spans, you can use spans_manual (Docs) instead.
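To find which tasks contain nested or overlapping entities before loading them, a short script over the JSONL file is enough. This is a minimal Python sketch, assuming each task's "spans" is a list of dicts with character offsets "start" and "end" (end exclusive); the helper names find_overlapping_spans and tasks_with_overlaps are hypothetical, not Prodigy functions.

```python
import json
from itertools import combinations


def find_overlapping_spans(eg):
    """Return pairs of spans in one task whose character ranges overlap.

    Nested spans count as overlapping; spans that merely touch (one ends
    where the next starts) do not.
    """
    spans = sorted(eg.get("spans", []), key=lambda s: (s["start"], s["end"]))
    pairs = []
    for a, b in combinations(spans, 2):
        # Half-open ranges [start, end) overlap if each starts before the other ends.
        if a["start"] < b["end"] and b["start"] < a["end"]:
            pairs.append((a, b))
    return pairs


def tasks_with_overlaps(path):
    """Yield (line_number, overlapping_pairs) for each task in a JSONL file that has overlaps."""
    with open(path, encoding="utf8") as f:
        for line_no, line in enumerate(f, start=1):
            if line.strip():
                pairs = find_overlapping_spans(json.loads(line))
                if pairs:
                    yield line_no, pairs
```

Checking every pair is quadratic in the number of spans per task, which is fine for typical annotation tasks with a handful of entities each. Tasks this flags are the ones that need an overlapping-spans interface such as spans_manual rather than ner_manual.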
