I have been trying to highlight certain spans in the classification view, using the textcat.teach recipe.
Unfortunately, I wasn't even able to reproduce the example from the docs listed under classification:
{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [
        {
            "start": 0,
            "end": 5,
            "label": "ORG"
        }
    ]
}
When running textcat.teach on JSONL data, the spans are being ignored. I have read the docs cover to cover but didn't find any additional clues. What am I missing?
I think the problem here is that the textcat.teach recipe and annotation model reset the "spans", because they also use them internally if you're running the long-text classification mode (to highlight the sentences you're classifying). Allowing pre-set spans here could therefore cause conflicts. In other words, the classification interface itself supports rendering spans, but the recipe has claimed them.
One simple workaround could be to store the spans as "_spans" and then overwrite them after you run the model, sorter etc. on the stream. Like this:
def overwrite_spans(stream):
    for eg in stream:
        eg["spans"] = eg["_spans"]
        yield eg

# at the very end of the recipe
stream = overwrite_spans(stream)
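
To make the round trip concrete, here's a minimal, self-contained sketch that runs without Prodigy. It assumes your input still has the spans under "spans", so it first moves them to "_spans". The names stash_spans and reset_spans are hypothetical, and reset_spans is only a stand-in for whatever the recipe's model and sorter do to the stream, not the actual implementation.

```python
def stash_spans(stream):
    # Move pre-set spans out of the way before the recipe touches the stream
    for eg in stream:
        eg["_spans"] = eg.pop("spans", [])
        yield eg

def reset_spans(stream):
    # Hypothetical stand-in for the model/sorter step, which resets "spans"
    for eg in stream:
        eg["spans"] = []
        yield eg

def overwrite_spans(stream):
    # Restore the stashed spans; fall back to no spans if none were stashed
    for eg in stream:
        eg["spans"] = eg.get("_spans", [])
        yield eg

examples = [
    {
        "text": "Apple updates its analytics service with new metrics",
        "spans": [{"start": 0, "end": 5, "label": "ORG"}],
    }
]

stream = stash_spans(examples)
stream = reset_spans(stream)      # in a real recipe: model, sorter etc.
stream = overwrite_spans(stream)  # at the very end of the recipe
result = list(stream)
```

If your JSONL file already stores the spans under "_spans", you can skip the stash_spans step and only add overwrite_spans at the end.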