Textcat using span overlapping view

Hi! I am trying to use the overlapping-span feature of spancat inside textcat by adding the "spans" key to my dataset, but all I get is the regular NER view, which doesn't allow overlapping spans. For example, my data looks like this:

{"text":"Biomaterials and medical devices are broadly used in the diagnosis, treatment, repair, replacement or enhancing functions of human tissues or organs. Although the living conditions of human beings have been steadily improved in most parts of the world. ","label":"ID: 27047681","spans":[{ "start": 0, "end": 12, "label": "ORG" },{ "start": 0, "end": 12, "label": "ORG_2" }]}

And I am getting a view like this one.

Is there any way I can change this behavior and use the spancat viewer instead, so that I can have overlapping mentions?


hi @darrylestrada97!

Thanks for your question!

Are you looking for this?

I wrote a custom recipe that does this:

python -m prodigy textcat.manual.spans issue-6434 blank:en overlapping.jsonl -F textcat-manual-spans.py

It's a bit of a hack, but essentially you need to pass the stream through get_tokens(), which adds tokens to each task in the stream (see line 42 of the gist). With tokens present, the recipe switches to the spans_manual interface. If you remove this line, it'll opt for the ner_manual interface, i.e., non-overlapping spans.

I didn't get a chance to dig deeply into the UIs, but my hypothesis for why this works is that the spans_manual interface may need tokens, so you have to tokenize the stream. To do that, you need to add a spaCy tokenizer, which is why the command above passes blank:en (notice that textcat.manual does not require a model, since it doesn't tokenize by default).
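To make the tokenization step concrete, here is a minimal sketch of what "adding tokens to the stream" amounts to. It is not the code from the gist: add_simple_tokens is a hypothetical helper, and the regex tokenizer is only a stand-in for a spaCy tokenizer so the example runs without extra dependencies. The token fields ("text", "start", "end", "id", "ws") and the span fields "token_start" and "token_end" are assumed to match the task format the span interfaces work with.

```python
import re


def add_simple_tokens(stream):
    """Hypothetical stand-in for the tokenization step: attach a "tokens"
    list to every task and align character-based spans to token indices."""
    for task in stream:
        text = task["text"]
        tokens = []
        # Regex stand-in for a spaCy tokenizer: words and single punctuation marks.
        for i, match in enumerate(re.finditer(r"\w+|[^\w\s]", text)):
            end = match.end()
            tokens.append({
                "text": match.group(),
                "start": match.start(),
                "end": end,
                "id": i,
                # True if the token is followed by whitespace in the original text.
                "ws": end < len(text) and text[end].isspace(),
            })
        start_to_id = {tok["start"]: tok["id"] for tok in tokens}
        end_to_id = {tok["end"]: tok["id"] for tok in tokens}
        spans = []
        for span in task.get("spans", []):
            # Overlapping spans are kept as separate entries; nothing is merged.
            spans.append({
                **span,
                "token_start": start_to_id.get(span["start"]),
                "token_end": end_to_id.get(span["end"]),
            })
        yield {**task, "tokens": tokens, "spans": spans}


example = {
    "text": "Biomaterials and medical devices are broadly used.",
    "spans": [
        {"start": 0, "end": 12, "label": "ORG"},
        {"start": 0, "end": 12, "label": "ORG_2"},
    ],
}
tokenized = list(add_simple_tokens([example]))
```

Both spans in the example cover the same first token, which is the overlapping case from the question. In a real recipe you would use the spaCy tokenizer from the loaded pipeline rather than a regex.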

I'll raise this point to the Prodigy front-end leads to see if this is intended behavior and see if we need to make any changes.

Does this workaround solve your problem for now? I'll post back if we make changes on this in the future.

It looks like what I wanted, but the --label flag is not working. How could I add the labels?

Hey @darrylestrada97,

Great job on modifying Ryan's snippet to add the labels! It looks to me like your version does the job. Let me know if you're still missing anything.
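For anyone else landing on this thread, here is a rough sketch of how a label argument could be wired into a custom recipe. It is not the modified snippet referred to above: parse_labels and build_components are hypothetical names, and the sketch assumes the recipe returns a dictionary of components whose "config" entry carries the span labels under a "labels" key, with "spans_manual" as the view.

```python
def parse_labels(label_arg):
    """Split a comma-separated label string such as "ORG,ORG_2" into a list."""
    if not label_arg:
        return []
    return [part.strip() for part in label_arg.split(",") if part.strip()]


def build_components(dataset, stream, label_arg):
    """Hypothetical helper: assemble the dictionary a custom recipe returns."""
    return {
        "dataset": dataset,          # dataset to save annotations to
        "stream": stream,            # tokenized tasks
        "view_id": "spans_manual",   # assumption: the overlapping-span interface
        "config": {"labels": parse_labels(label_arg)},
    }


components = build_components("issue-6434", [], "ORG, ORG_2")
```

The recipe function itself would still need to declare the label argument so that the command line accepts it.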