Classifying Entities Using Textcat

clinical_nlp · January 23, 2023, 8:42pm

Hi,

I'd like to use textcat to work on a problem very similar to the one this Hugging Face model addresses. I'd like to provide a sentence and get a prediction for the assertion status (present, negated, etc.) of each entity in the sentence, where entities in the same sentence will commonly have different statuses (e.g. "The brain MRI did not show a neurologic disorder", where the brain MRI is present but the neurologic disorder is negated). I can derive the sentences containing entities, and I could likewise add a token around the entity in question, as the authors of the linked model do, if the effect would be similar. Is there a way to use Prodigy to structure entity-specific assertion detection as a textcat problem?
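To make the marker idea concrete, here is a minimal sketch that assumes the character offsets of each entity are already known. The `[entity]` marker string and the `mark_entity` helper are illustrative names of my own, not something taken from the linked model or from Prodigy.

```python
def mark_entity(text, start, end, marker="[entity]"):
    """Return text with text[start:end] wrapped in marker tokens.

    The marker string is an assumption for illustration only.
    """
    if not 0 <= start < end <= len(text):
        raise ValueError("span is outside the text")
    return f"{text[:start]}{marker} {text[start:end]} {marker}{text[end:]}"


sentence = "The brain MRI did not show a neurologic disorder"

# Hypothetical entity spans; in practice these would come from the NER model.
spans = []
for ent in ["brain MRI", "neurologic disorder"]:
    start = sentence.index(ent)
    spans.append((start, start + len(ent)))

# One classifier input per entity, each marking a different span.
marked = [mark_entity(sentence, s, e) for s, e in spans]
for line in marked:
    print(line)
```

Each entity gets its own copy of the sentence, so a single sentence with two entities yields two separate examples that can carry different labels.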

Thank you

koaning · January 25, 2023, 10:29am

The textcat component in spaCy takes the full text as input and predicts categories for the text as a whole, not for each entity in it, so I worry that textcat won't work here.

That said, a two-step approach is possible too. You could train a named entity model for the medical entities and add a negspacy component to the pipeline. This component won't be perfect, but it may be able to detect negation on an entity.

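import spacy
from negspacy.negation import Negex  # importing registers the "negex" factory

nlp = spacy.load("en_core_web_sm")
# Only entities with these labels get a negation flag.
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

doc = nlp("She does not like Steve Jobs but really enjoys Samsung products.")
for e in doc.ents:
    # e._.negex is True when the entity is detected as negated.
    print(e.text, e._.negex)

# Steve Jobs True
# Samsung False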

More details on the project can be found here.

Might this help?

clinical_nlp · January 26, 2023, 3:54pm

Thanks, Vincent. Negex is a nice place to start, but it has some known issues with false positives and complex sentences. There are also other kinds of attribution in clinical data that are relevant:

family history

others son(s): epilepsy ; problem b.2010 lincoln; problem healthy

mother - alive,copd/emphysema, heart dz throat: clear.

hypotheticals

call your doctor at once if you have: blurred vision, tunnel vision, eye pain, or seeing halos around lights; shortness of breath (even with mild exertion), swelling, rapid weight gain; severe depression, changes in personality, unusual thoughts or behavior; new or unusual pain in an arm or leg or in your back; bloody or tarry stools, coughing up blood or vomit that looks like coffee grounds; seizure (convulsions); or low potassium (confusion ...

There are also allergies, possibilities, and so on. I already have the named entity model, but was hoping to use a classification model for the tricky kinds of attribution that negex misses. It seems that the textcat solution available to me within Prodigy, in the absence of tokens to denote an entity, is to replace the text of the named entity with a fixed term. So

she has not been diagnosed with a chronic + salpingo- lung disease, no bronchial asthma, no bronchitis, no copd, no oophorectomy xi emphysema

becomes

she has not been diagnosed with a chronic + salpingo- lung disease, no bronchial asthma, no bronchitis, no copd, no oophorectomy xi named_entity

This does work, although perhaps not as well as using a dependency parser or other approaches for this task. It just has the benefit of being more legible to me at the start.
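Roughly, the substitution can be sketched as below. This is only an illustration under my own assumptions: `mask_entity` and `make_tasks` are hypothetical helper names, the entity offsets are looked up by string search here rather than taken from the NER model, and the output is one JSON object per line with a "text" key, which is the JSONL input format Prodigy reads. The "meta" field just keeps the original entity visible while annotating.

```python
import json

PLACEHOLDER = "named_entity"  # the fixed term that replaces the entity


def mask_entity(text, start, end, placeholder=PLACEHOLDER):
    """Replace text[start:end] with the fixed placeholder term."""
    return text[:start] + placeholder + text[end:]


def make_tasks(text, spans):
    """Build one classification example per entity span."""
    tasks = []
    for start, end in spans:
        tasks.append({
            "text": mask_entity(text, start, end),
            "meta": {"entity": text[start:end], "start": start, "end": end},
        })
    return tasks


text = "no bronchitis, no copd"

# Hypothetical spans; in practice these would come from the NER model.
spans = []
for ent in ["bronchitis", "copd"]:
    start = text.index(ent)
    spans.append((start, start + len(ent)))

tasks = make_tasks(text, spans)
for task in tasks:
    print(json.dumps(task))  # one JSON object per line (JSONL)
```

Only one entity is masked per example, so the other entities in the sentence stay readable as context for the classifier.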

koaning · February 1, 2023, 9:47am

Have you seen this blogpost?

Specifically, this part which talks about negation in the context of health benefits of products. It might offer some inspiration.

The tricky thing with any advice here is that "it depends". You're likely going to find a solution that is going to fit your problem in a unique way and the only way to find the best solution is to try out a few reasonable approaches and to keep iterating.
