Can't select entity span in manual interface

andrewx24 · June 5, 2019, 7:43pm

Hello,

I’d like to highlight a custom entity called Location. I am having trouble using ner.manual to annotate certain sentences like “it should be44 W 5th St”. When I try to highlight ‘44 W 5th St’, it also picks up the ‘be’. As a result, I end up skipping this annotation because I don’t want the model to learn ‘be44 W 5th St’.

Is there anything in the works to address this issue?

Thanks!

ines · June 5, 2019, 8:00pm

The likely reason is that the tokenizer you’re using doesn’t split the text at the boundary your entity needs: "be44" remains one token, so no token "44" exists, and no entity span can start there.

The manual NER interface uses pre-tokenized text, to make it easier to highlight things (the selection can “snap” to the token boundaries), and also to make issues like this more obvious, and allow you to adjust the tokenization if needed. If you were to train a model with spaCy using annotations that don’t map to valid tokens, the model won’t be able to learn anything meaningful from them, because it’ll never actually produce those tokens.
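To make the token-boundary point concrete, here is a minimal sketch in plain Python. It uses whitespace splitting as a stand-in for the real tokenizer (an assumption for illustration only, since spaCy's tokenizer is more involved), and the helper names are hypothetical. It checks whether a character span starts and ends on token boundaries:

```python
import re

def whitespace_tokens(text):
    """Return (start, end) character offsets of whitespace-separated tokens.
    This is only a stand-in for a real tokenizer."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def aligns_with_tokens(span_start, span_end, tokens):
    """True if the span starts at a token start and ends at a token end."""
    starts = {start for start, _ in tokens}
    ends = {end for _, end in tokens}
    return span_start in starts and span_end in ends

text = "it should be44 W 5th St"
tokens = whitespace_tokens(text)

wanted = (text.index("44"), len(text))     # "44 W 5th St"
snapped = (text.index("be44"), len(text))  # "be44 W 5th St"

print(aligns_with_tokens(*wanted, tokens))   # False: no token starts at "44"
print(aligns_with_tokens(*snapped, tokens))  # True: this is what the selection snaps to
```

The span you want fails the check because "be44" is a single token, which is why the interface can only offer the longer selection.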

If you are using spaCy, one solution would be to adjust the tokenization rules. For example, you might want to consider adding a rule that always splits numbers following letters, if that’s common in your data. Or if this is just a single stray example, you can also just skip it.
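As a rough sketch of what such a rule could look like, the snippet below adds one extra infix pattern to a blank English pipeline so the tokenizer splits wherever an ASCII letter is directly followed by a digit. The pattern itself is an assumption about your data, not a recommended default, and it may split strings you'd rather keep together (product codes, for example), so it's worth checking it against a sample of your texts first.

```python
import spacy
from spacy.util import compile_infix_regex

nlp = spacy.blank("en")

# Assumed extra rule: a zero-width split point between a letter and a digit,
# so "be44" becomes "be" + "44". Adjust the character classes to your data.
letter_then_digit = r"(?<=[a-zA-Z])(?=[0-9])"

infixes = list(nlp.Defaults.infixes) + [letter_then_digit]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

doc = nlp("it should be44 W 5th St")
tokens = [token.text for token in doc]
print(tokens)
```

If you go this route, the same modified tokenizer needs to be used both for annotation and for training, otherwise the annotated spans and the model's tokens will disagree again.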

If you’re not using spaCy, you can also always provide your own tokenization via the "tokens" key in the data – see the “Annotation task formats” section in your PRODIGY_README.html for details.
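For illustration, here is a small sketch of building a pre-tokenized task in plain Python. The regular expression is a hypothetical tokenizer made up for this example, and the token fields shown ("text", "start", "end", "id") are the ones I'd expect a task to need; please check the README section mentioned above for the exact format your version requires.

```python
import json
import re

# Hypothetical tokenizer for this example: digits with optional trailing
# letters (e.g. "5th"), runs of letters, or any other single non-space character.
TOKEN_RE = re.compile(r"\d+[A-Za-z]*|[A-Za-z]+|[^\sA-Za-z\d]")

def make_task(text):
    """Build a task dict with character offsets for each token."""
    tokens = [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(TOKEN_RE.finditer(text))
    ]
    return {"text": text, "tokens": tokens}

task = make_task("it should be44 W 5th St")
print(json.dumps(task))  # one line of a JSONL input file
```

With this tokenization "be" and "44" are separate tokens, so a span starting at "44" becomes selectable.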
