Text does not exist in spans after NER labelling

ryanwesslen · February 2, 2023, 2:06pm

This is by design: by default, annotated spans do not include the raw text.

You mention that it seems "random" that you sometimes see the span text and other times you don't. Could you simply be seeing the text of the tokens, not of the spans?

Let me illustrate. For example, you can open the ner_manual interface and see what the annotated spans are intended to look like:

Produces:

{
  "text": "First look at the new MacBook Pro",
  "spans": [
    {"start": 22, "end": 33, "label": "PRODUCT", "token_start": 5, "token_end": 6}
  ],
  "tokens": [
    {"text": "First", "start": 0, "end": 5, "id": 0},
    {"text": "look", "start": 6, "end": 10, "id": 1},
    {"text": "at", "start": 11, "end": 13, "id": 2},
    {"text": "the", "start": 14, "end": 17, "id": 3},
    {"text": "new", "start": 18, "end": 21, "id": 4},
    {"text": "MacBook", "start": 22, "end": 29, "id": 5},
    {"text": "Pro", "start": 30, "end": 33, "id": 6}
  ]
}

Notice that the tokens include the text, but the spans, which are the actual annotations, do not.

Please confirm that this is consistent with what you're seeing.

The reason is that for training, spaCy only needs the start and end character offsets of each span, not the text itself.

If you do need to add the text, you can add it with something like this:
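
A minimal sketch, assuming each exported example is a dict with a top-level "text" and spans that carry character offsets in "start" and "end", as in the JSON above. The helper name add_span_text is hypothetical and not part of Prodigy or spaCy:

```python
def add_span_text(example):
    """Return a copy of the example with a "text" key on every span."""
    text = example["text"]
    spans = [
        # Slice the raw text with the span's character offsets.
        {**span, "text": text[span["start"]:span["end"]]}
        for span in example.get("spans", [])
    ]
    return {**example, "spans": spans}


# The example from above, without the tokens for brevity.
example = {
    "text": "First look at the new MacBook Pro",
    "spans": [
        {"start": 22, "end": 33, "label": "PRODUCT", "token_start": 5, "token_end": 6}
    ],
}

result = add_span_text(example)
print(result["spans"][0]["text"])  # MacBook Pro
```

The function returns a new dict and leaves the original example untouched, so you could run it over every record of an exported JSONL file before writing the records back out.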
