Textcat using span overlapping view

darrylestrada97 · March 15, 2023, 4:48pm

Hi! I am trying to use the overlapping-spans feature of spancat inside textcat by adding the "spans" key to my dataset, but all I get is the regular NER view, which doesn't allow overlapping spans. For example, my data looks like this:

{"text":"Biomaterials and medical devices are broadly used in the diagnosis, treatment, repair, replacement or enhancing functions of human tissues or organs. Although the living conditions of human beings have been steadily improved in most parts of the world. ","label":"ID: 27047681","spans":[{ "start": 0, "end": 12, "label": "ORG" },{ "start": 0, "end": 12, "label": "ORG_2" }]}

and I am getting a view like this one.

Is there any way I can change this behavior and use the spancat viewer instead, so that I can have overlapping mentions?
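For anyone reproducing this, a quick check that the input really contains overlapping spans can rule out a data problem before looking at the interface. The snippet below is only a minimal sketch: overlapping_pairs is a hypothetical helper (not part of Prodigy), and the record is a shortened copy of the example above.

```python
import json
from itertools import combinations

# One task in the same shape as the example record above (text shortened).
line = (
    '{"text": "Biomaterials and medical devices are broadly used.", '
    '"label": "ID: 27047681", '
    '"spans": [{"start": 0, "end": 12, "label": "ORG"}, '
    '{"start": 0, "end": 12, "label": "ORG_2"}]}'
)


def overlapping_pairs(task):
    """Return the label pairs of spans whose character ranges overlap."""
    pairs = []
    for a, b in combinations(task.get("spans", []), 2):
        # Half-open ranges [start, end) overlap when each one
        # starts before the other one ends.
        if a["start"] < b["end"] and b["start"] < a["end"]:
            pairs.append((a["label"], b["label"]))
    return pairs


task = json.loads(line)
print(overlapping_pairs(task))
```

Any non-empty result means the task needs an interface that can render overlapping spans.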

Thanks.

ryanwesslen · March 15, 2023, 6:14pm

Hi @darrylestrada97!

Thanks for your question!

Are you looking for this?

I wrote a custom recipe that does this:

https://gist.github.com/wesslen/31c44ca0f83242c512772dcfe15a81fc

textcat-manual-spans.py

from typing import Iterable, List, Optional, Union

import prodigy
from prodigy.components.loaders import get_stream
from prodigy.components.preprocess import add_tokens
from prodigy.util import split_string

import spacy
from spacy.language import Language

This file has been truncated.

python -m prodigy textcat.manual.spans issue-6434 blank:en overlapping.jsonl -F textcat-manual-spans.py

It's a bit of a hack, but essentially you need to pass the stream through add_tokens(), which adds tokens to each task in the stream (see line 42 of the gist) and switches the interface to spans_manual. If you remove this line, it'll fall back to the ner_manual interface, i.e., non-overlapping spans.

I didn't get a chance to dig deeply into the UIs. But my hypothesis on why this works is that the spans_manual interface needs tokens; hence, you need to tokenize the stream. To do this, you need to add a spaCy tokenizer (notice that textcat.manual does not require a model, as it doesn't tokenize by default).
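To make the hypothesis concrete, here is a rough sketch of what "adding tokens" to a task could look like. It is not the gist's code: add_whitespace_tokens is a hypothetical helper, whitespace splitting stands in for the spaCy tokenizer, and the field names (tokens, id, ws, token_start, token_end) are an assumption based on my understanding of Prodigy's task format. In the actual recipe, the add_tokens preprocessor imported in the gist does this step.

```python
def add_whitespace_tokens(task):
    """Attach token offsets to a task and align its spans to token indices.

    Assumption: whitespace splitting replaces the spaCy tokenizer, and the
    token/span field names mirror Prodigy's task format as I understand it.
    """
    text = task["text"]
    tokens = []
    pos = 0
    for i, word in enumerate(text.split()):
        start = text.index(word, pos)  # character offset of this word
        end = start + len(word)
        tokens.append({
            "text": word,
            "start": start,
            "end": end,
            "id": i,
            # True when the token is followed by whitespace.
            "ws": end < len(text) and text[end].isspace(),
        })
        pos = end

    by_start = {t["start"]: t["id"] for t in tokens}
    by_end = {t["end"]: t["id"] for t in tokens}
    spans = []
    for span in task.get("spans", []):
        # In this sketch, spans that do not line up with token
        # boundaries are dropped rather than repaired.
        if span["start"] in by_start and span["end"] in by_end:
            spans.append({
                **span,
                "token_start": by_start[span["start"]],
                "token_end": by_end[span["end"]],  # inclusive token index
            })
    return {**task, "tokens": tokens, "spans": spans}


example = {
    "text": "Biomaterials and medical devices",
    "spans": [
        {"start": 0, "end": 12, "label": "ORG"},
        {"start": 0, "end": 12, "label": "ORG_2"},
    ],
}
result = add_whitespace_tokens(example)
print(result["tokens"][0])
print(result["spans"])
```

Both overlapping spans end up pointing at the same token index, which is the kind of token-level information a token-based span interface can work with.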

I'll raise this point to the Prodigy front-end leads to see if this is intended behavior and see if we need to make any changes.

Does this workaround solve your problem for now? I'll post back if we make changes to this in the future.

darrylestrada97 · March 16, 2023, 3:25pm

It looks like what I wanted, but the --label flag is not working. How could I add the labels?

magdaaniol · March 16, 2023, 5:16pm

Hey @darrylestrada97,

Great job on modifying Ryan's snippet to add the labels! It looks to me like your version does the job. Let me know if you're still missing anything.
