Hi @bev.manz!
By design, annotated spans don't include their raw text.
You mention that it seems "random" that you sometimes see the text for spans
and other times you don't. Could it be that you're seeing the "text" field on the tokens,
not on the spans?
Let me illustrate with an example. If you annotate a sentence in the ner_manual
interface, the saved task looks something like this:
{
"text": "First look at the new MacBook Pro",
"spans": [
{"start": 22, "end": 33, "label": "PRODUCT", "token_start": 5, "token_end": 6}
],
"tokens": [
{"text": "First", "start": 0, "end": 5, "id": 0},
{"text": "look", "start": 6, "end": 10, "id": 1},
{"text": "at", "start": 11, "end": 13, "id": 2},
{"text": "the", "start": 14, "end": 17, "id": 3},
{"text": "new", "start": 18, "end": 21, "id": 4},
{"text": "MacBook", "start": 22, "end": 29, "id": 5},
{"text": "Pro", "start": 30, "end": 33, "id": 6}
]
}
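To see how the two parts relate, here's a small plain-Python sketch that rebuilds a span's text from the tokens it points to. It assumes that "token_end" is inclusive (which matches the example: the span ends at token 6, "Pro") and that each token's position in the list equals its "id". The helper name span_text_from_tokens is just for illustration.

```python
# The annotated task from above, as a Python dict
task = {
    "text": "First look at the new MacBook Pro",
    "spans": [
        {"start": 22, "end": 33, "label": "PRODUCT", "token_start": 5, "token_end": 6}
    ],
    "tokens": [
        {"text": "First", "start": 0, "end": 5, "id": 0},
        {"text": "look", "start": 6, "end": 10, "id": 1},
        {"text": "at", "start": 11, "end": 13, "id": 2},
        {"text": "the", "start": 14, "end": 17, "id": 3},
        {"text": "new", "start": 18, "end": 21, "id": 4},
        {"text": "MacBook", "start": 22, "end": 29, "id": 5},
        {"text": "Pro", "start": 30, "end": 33, "id": 6},
    ],
}


def span_text_from_tokens(task, span):
    """Hypothetical helper: rebuild a span's text from the tokens it covers.

    Assumes token_end is inclusive and that token ids match list positions.
    """
    tokens = task["tokens"][span["token_start"] : span["token_end"] + 1]
    # Slice the original text from the first token's start to the last
    # token's end, so the whitespace between tokens is preserved.
    return task["text"][tokens[0]["start"] : tokens[-1]["end"]]


for span in task["spans"]:
    print(span["label"], span_text_from_tokens(task, span))
# PRODUCT MacBook Pro
```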
Notice that each token has a "text" field, but the spans (which are the actual
annotations) don't.
Please confirm that this is consistent with what you're seeing.
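The reason is that for training, spaCy only needs each span's "start" and "end"
character offsets (plus its "label"), not the text itself: the span text can always
be recovered by slicing the example's "text" with those offsets.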
If you do need the span text in your data, you can add it yourself by slicing the example's "text" with each span's offsets.
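Here's a minimal sketch of that approach, assuming your annotations are in a JSONL file with one task per line (for example, a file you exported with prodigy db-out). The function names and file paths are hypothetical; the only assumption about the data is the "text"/"spans"/"start"/"end" structure shown above.

```python
import json


def add_span_text(task):
    """Return a copy of `task` where every span has a "text" key.

    The text is taken by slicing the example's "text" with the span's
    character offsets ("start" is inclusive, "end" is exclusive).
    """
    task = dict(task)  # shallow copy so the input task isn't modified
    if "spans" in task:
        task["spans"] = [
            {**span, "text": task["text"][span["start"] : span["end"]]}
            for span in task["spans"]
        ]
    return task


def add_span_text_jsonl(in_path, out_path):
    """Hypothetical helper: read tasks from a JSONL file, add span text,
    and write the updated tasks to a new JSONL file."""
    with open(in_path, encoding="utf8") as f_in, open(out_path, "w", encoding="utf8") as f_out:
        for line in f_in:
            if line.strip():  # skip blank lines
                f_out.write(json.dumps(add_span_text(json.loads(line))) + "\n")
```

For example, add_span_text_jsonl("annotations.jsonl", "annotations_with_text.jsonl") would write a copy of the file where each span also carries its text (e.g. "MacBook Pro" for the span above). Writing to a new file keeps the original export untouched.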