Text does not exist in spans after NER labelling

For some reason, some of my labelled data is missing 'text' within spans entirely. I saw a similar post, but there the 'text' item existed as an empty string, whereas in my case 'text' does not seem to exist at all.

This seems to be entirely random and has only occurred for a few entities. I only noticed it after evaluating my trained model: the entities with the lowest F1 score seem to have no 'text' within the 'span'. I presume the model relies on 'text' existing alongside the character span?

Here is an example of output showing the issue:
I live alone, and I need help!
[{'start': 0, 'end': 12, 'token_start': 0, 'token_end': 2, 'label': 'LIVING_ALONE'}]

In some other cases the output has come out correctly, so I'm confused about what could be causing this, as I am not doing anything differently in any of these cases. Is this just a bug?

Hi @bev.manz!

By default, annotated spans do not include the raw text by design.

You mention that it seems "random" that sometimes you see the span text and other times you don't. Could you simply be seeing the text of the tokens, not the text of the spans?

Let me explain with an example. You can view the ner_manual interface and see what the intended annotated spans should look like:


{
  "text": "First look at the new MacBook Pro",
  "spans": [
    {"start": 22, "end": 33, "label": "PRODUCT", "token_start": 5, "token_end": 6}
  ],
  "tokens": [
    {"text": "First", "start": 0, "end": 5, "id": 0},
    {"text": "look", "start": 6, "end": 10, "id": 1},
    {"text": "at", "start": 11, "end": 13, "id": 2},
    {"text": "the", "start": 14, "end": 17, "id": 3},
    {"text": "new", "start": 18, "end": 21, "id": 4},
    {"text": "MacBook", "start": 22, "end": 29, "id": 5},
    {"text": "Pro", "start": 30, "end": 33, "id": 6}
  ]
}

Notice that the tokens have the text, but not the spans, which are the actual annotations.

Please confirm that this is consistent with what you're seeing.

The reason is that for training, spaCy only needs the start and end character offsets of each span, not the actual text itself.

If you do need to add the text, you can add it with something like this:
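
A minimal sketch of that idea, assuming each example is a plain dict with a top-level "text" and a list of "spans" carrying "start" and "end" character offsets, as in the outputs above. The helper name add_span_text is hypothetical and not part of Prodigy or spaCy; it slices the example text with each span's offsets and stores the result under a new "text" key.

```python
import copy


def add_span_text(example):
    """Return a copy of the example with a "text" key added to every span.

    Hypothetical helper: assumes "start"/"end" are character offsets
    into example["text"].
    """
    example = copy.deepcopy(example)  # leave the caller's data untouched
    for span in example.get("spans", []):
        span["text"] = example["text"][span["start"]:span["end"]]
    return example


example = {
    "text": "I live alone, and I need help!",
    "spans": [
        {"start": 0, "end": 12, "token_start": 0, "token_end": 2, "label": "LIVING_ALONE"}
    ],
}

with_text = add_span_text(example)
print(with_text["spans"][0]["text"])  # I live alone
```

Because the helper works on a copy, the stored annotations stay unchanged, which seems the safer choice if the same data is later used for training.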