Classifying Entities Using Textcat


I'd like to use textcat to work on a problem very similar to the one this Hugging Face model addresses. I'd like to provide a sentence and get a prediction for the assertion status (present, negated, etc.) of each entity in the sentence, where entities will commonly have different statuses (e.g. "The brain MRI did not show a neurologic disorder", where the brain MRI is present but the neurologic disorder is negated). I can derive the sentences containing entities, and I could likewise add a token around the entity in question, as the authors of the linked model do, if the effect would be similar. Is there a way to use Prodigy to structure entity-specific assertion detection as a textcat problem?

Thank you

The textcat pipeline in spaCy takes the full text as input and predicts labels for the text as a whole, not per entity, so I worry that textcat won't work here.

That said, a two-step approach is also possible. You could train a named entity model for the medical entities and add a negspacy component to the pipeline. This component won't be perfect, but it may be able to detect whether an entity is negated:

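import spacy
from negspacy.negation import Negex  # importing registers the "negex" factory

# Load a pretrained pipeline that already includes an NER component
nlp = spacy.load("en_core_web_sm")
# Only check negation for entities with these labels
nlp.add_pipe("negex", config={"ent_types": ["PERSON", "ORG"]})

doc = nlp("She does not like Steve Jobs but really enjoys Samsung products.")
for e in doc.ents:
    # e._.negex is True when the entity is flagged as negated
    print(e.text, e._.negex)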

# Steve Jobs True
# Samsung False

More details on the project can be found here.

Might this help?


Thanks, Vincent. Negex is a nice place to start, but it has some known issues with false positives and complex sentences. There are also other kinds of attribution in clinical data that are relevant:

  • family history

others son(s): epilepsy ; problem b.2010 lincoln; problem healthy

mother - alive,copd/emphysema, heart dz throat: clear.

  • hypotheticals

call your doctor at once if you have: blurred vision, tunnel vision, eye pain, or seeing halos around lights; shortness of breath (even with mild exertion), swelling, rapid weight gain; severe depression, changes in personality, unusual thoughts or behavior; new or unusual pain in an arm or leg or in your back; bloody or tarry stools, coughing up blood or vomit that looks like coffee grounds; seizure (convulsions); or low potassium (confusion ...

Other examples include allergies, possibilities, etc. I already have the named entity model, but was hoping to use a classification model for the tricky kinds of attribution that negex misses. It seems that the textcat solution available to me within Prodigy, in the absence of tokens to denote an entity, is to replace the text of the named entity with a fixed term. So

she has not been diagnosed with a chronic + salpingo- lung disease, no bronchial asthma, no bronchitis, no copd, no oophorectomy xi emphysema

becomes

she has not been diagnosed with a chronic + salpingo- lung disease, no bronchial asthma, no bronchitis, no copd, no oophorectomy xi named_entity

This does work, although perhaps not as well as using a dependency parser or other approaches for this task. It just has the benefit of being more legible to me at the start.
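For anyone who wants to try the same thing, here is a minimal sketch of the replacement step in plain Python. It assumes the entity spans are already available as character offsets from an existing NER model. The labels TEST and PROBLEM, the helper names, and the "[entity]" marker string are all hypothetical and only there for illustration; the output is one dict per entity with a "text" key and a "meta" key, which is a shape an annotation stream could consume.

```python
import json

PLACEHOLDER = "named_entity"  # the fixed term used in the post above
MARKER = "[entity]"           # hypothetical marker token (an assumption)


def mask_entity(text, start, end):
    """Replace the characters text[start:end] with the fixed placeholder."""
    return text[:start] + PLACEHOLDER + text[end:]


def mark_entity(text, start, end):
    """Alternative: keep the entity text but surround it with marker tokens."""
    return f"{text[:start]}{MARKER} {text[start:end]} {MARKER}{text[end:]}"


def entity_tasks(text, spans, transform=mask_entity):
    """Yield one example per entity so each entity is classified on its own."""
    for span in spans:
        start, end = span["start"], span["end"]
        yield {
            "text": transform(text, start, end),
            # Keep the original entity around so the annotator can see it.
            "meta": {"entity": text[start:end], "ner_label": span["label"]},
        }


def find_span(text, phrase, label):
    """Helper for this demo only: build a span dict from a known phrase."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


sentence = "The brain MRI did not show a neurologic disorder"
spans = [
    find_span(sentence, "brain MRI", "TEST"),               # hypothetical label
    find_span(sentence, "neurologic disorder", "PROBLEM"),  # hypothetical label
]

tasks = list(entity_tasks(sentence, spans))
marked = list(entity_tasks(sentence, spans, transform=mark_entity))

for task in tasks:
    print(json.dumps(task))  # one JSON line per entity
```

Each sentence is emitted once per entity, so the two entities in the example can receive different labels even though they share a sentence. Whether masking or marking works better is something I would expect to have to test on real data.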

Have you seen this blogpost?

Specifically, this part, which talks about negation in the context of health benefits of products, might offer some inspiration.

The tricky thing with any advice here is that "it depends". The solution you end up with will likely fit your problem in a unique way, and the only way to find the best one is to try out a few reasonable approaches and keep iterating.