Hi! I am trying to use the overlapping-spans feature of spancat inside textcat by adding the spans key to my dataset, but all I get is the regular ner view, which doesn't allow overlapping spans. For example, my data looks like this:
{"text":"Biomaterials and medical devices are broadly used in the diagnosis, treatment, repair, replacement or enhancing functions of human tissues or organs. Although the living conditions of human beings have been steadily improved in most parts of the world. ","label":"ID: 27047681","spans":[{ "start": 0, "end": 12, "label": "ORG" },{ "start": 0, "end": 12, "label": "ORG_2" }]}
It's a bit of a hack, but essentially you need to pass the stream through get_tokens(), which adds tokens to each example in the stream (see line 42 of the gist). Having tokens present switches the interface to spans_manual. If you remove this line, it will fall back to the ner_manual interface, i.e., non-overlapping spans.
I didn't get a chance to dig deeply into the UIs, but my hypothesis on why this works is that the spans_manual interface needs tokens, so you have to tokenize the stream. To do that, you need to add a spaCy tokenizer to the recipe (note that textcat.manual does not require a model, as it doesn't tokenize by default).
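To make the "add tokens to the stream" step concrete, here is a minimal standalone sketch of what a tokenized task could look like. It is not the code from the gist and it does not call Prodigy or spaCy: `simple_tokens` and `add_tokens_sketch` are hypothetical helpers, the regex tokenizer is only a stand-in for a spaCy tokenizer, and the `tokens`, `token_start`, and `token_end` field names reflect my understanding of the task format, so please check them against the docs.

```python
import re
from typing import Dict, Iterable, Iterator, List


def simple_tokens(text: str) -> List[Dict]:
    """Stand-in tokenizer (assumption): words and single punctuation marks."""
    tokens = []
    for i, match in enumerate(re.finditer(r"\w+|[^\w\s]", text)):
        end = match.end()
        tokens.append({
            "text": match.group(),
            "start": match.start(),
            "end": end,
            "id": i,
            # True when the token is followed by whitespace in the text
            "ws": end < len(text) and text[end].isspace(),
        })
    return tokens


def add_tokens_sketch(stream: Iterable[Dict]) -> Iterator[Dict]:
    """Hypothetical helper: attach tokens and token-aligned span indices."""
    for task in stream:
        task = dict(task)  # shallow copy so the input task is not mutated
        tokens = simple_tokens(task["text"])
        task["tokens"] = tokens
        by_start = {t["start"]: t["id"] for t in tokens}
        by_end = {t["end"]: t["id"] for t in tokens}
        if "spans" in task:
            aligned = []
            for span in task["spans"]:
                span = dict(span)
                if span["start"] not in by_start or span["end"] not in by_end:
                    raise ValueError(f"Span does not align with tokens: {span}")
                span["token_start"] = by_start[span["start"]]
                span["token_end"] = by_end[span["end"]]  # index of the last token
                aligned.append(span)
            task["spans"] = aligned
        yield task


example = {
    "text": "Biomaterials and medical devices.",
    "label": "ID: 27047681",
    "spans": [
        {"start": 0, "end": 12, "label": "ORG"},
        {"start": 0, "end": 12, "label": "ORG_2"},
    ],
}
tokenized = next(add_tokens_sketch([example]))
```

Both overlapping spans end up pointing at the same token, which is the kind of input an overlapping-span view would need. In the actual recipe, the tokenization would come from the spaCy tokenizer you load rather than from a regex.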
I'll raise this point to the Prodigy front-end leads to see if this is intended behavior and see if we need to make any changes.
Does this workaround solve your problem for now? I'll post back here if we make any changes to this in the future.