Learning the polarity of a specific term (with spaCy)

Andrey · January 21, 2019, 5:45pm

I am wondering whether there is a simple way to learn the polarity of a word. By polarity I mean the context in which certain keywords appear. For example, in medical notes a drug may be actually prescribed or only mentioned (but not prescribed). The question is how to discriminate between the drugs that were actually prescribed and those that were only mentioned.

For example:

['Aspirin has been prescribed to a patient'] -> {[('key_word': 'aspirin', 'prescribed': TRUE)]}

['If symptoms continue, the patient should consider taking Omeprazol'] -> {[('key_word': 'omeprazol', 'prescribed': FALSE)]}

['The plan is for him to commence 25mg of Trazodone as soon as he gets better.'] -> {[('key_word': 'trazodone', 'prescribed': FALSE)]}

['her current meds are: sertraline 200 mg and olanzapine 5 mg'] -> {[('key_word': 'sertraline', 'prescribed': TRUE), ('key_word': 'olanzapine', 'prescribed': TRUE)]}

['if she continues to be depressed, then she needs to be started on Risperidone'] -> {[('key_word': 'risperidone', 'prescribed': FALSE)]}

So, basically, I need to train a model that recognizes the intent behind certain keywords (such as meds), or that classifies them correctly. I have been working through Ines's tutorial on insult classification, but it does not quite match my case (though it is similar).

Any help will be highly appreciated.

honnibal · January 24, 2019, 1:20am

I think it might be misleading to refer to this task as ‘polarity’, as that’s usually applied to sentiment, which isn’t quite what you’re doing here. I think “intent” isn’t the best terminology either, because that’s usually applied to parsing commands.

I think the meanings you’re trying to classify here are pretty subtle, and so the classifier is probably going to learn keywords. You might find that you’re better off doing a rule-based approach, with the rules referring to the dependency parse. This at least lets you control what the system is outputting a bit better.
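To make the rule-based idea concrete, here is a minimal sketch in plain Python. It is a deliberately simpler stand-in: instead of rules over the dependency parse, it matches surface cue phrases anywhere in the sentence. The drug lexicon `DRUGS`, the cue list `HYPOTHETICAL_CUES`, and the function `classify_drugs` are all hypothetical names invented for illustration, and the cues were picked only to cover the five examples in the question. Real rules would likely need the parse to decide which cue governs which drug.

```python
import re

# Hypothetical drug lexicon, covering only the examples from the question.
DRUGS = {"aspirin", "omeprazol", "trazodone", "sertraline", "olanzapine", "risperidone"}

# Hypothetical cue phrases that suggest the drug is conditional or planned,
# not currently prescribed. These are assumptions, not a validated list.
HYPOTHETICAL_CUES = [
    r"\bif\b",
    r"\bshould consider\b",
    r"\bthe plan is\b",
    r"\bneeds to be\b",
]


def classify_drugs(sentence):
    """Label every known drug in the sentence as prescribed or not.

    Sentence-level rule: if any hypothetical cue appears, all drugs in the
    sentence are marked as not prescribed. This is cruder than a rule that
    follows the dependency parse from the cue to a specific drug.
    """
    lowered = sentence.lower()
    hypothetical = any(re.search(cue, lowered) for cue in HYPOTHETICAL_CUES)
    tokens = re.findall(r"[a-z]+", lowered)
    return [
        {"key_word": token, "prescribed": not hypothetical}
        for token in tokens
        if token in DRUGS
    ]


if __name__ == "__main__":
    for text in [
        "Aspirin has been prescribed to a patient",
        "If symptoms continue, the patient should consider taking Omeprazol",
        "her current meds are: sertraline 200 mg and olanzapine 5 mg",
    ]:
        print(text, "->", classify_drugs(text))
```

Because the rule works on the whole sentence, a note that mixes a current prescription with a conditional one would be mislabelled, which is exactly the kind of case where parse-based rules give more control.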

If you do want a classifier to do the task, probably the best way to structure the model that I can think of is to use a sequence classification model with the categories PRESCRIBED_DRUG and NON_PRESCRIBED_DRUG. spaCy’s default NER model might not be great at this task, as you probably want to use a BiLSTM model instead of spaCy’s CNN.
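As a rough illustration of what the annotations for such a model could look like, the sketch below turns a sentence plus per-drug labels into character-offset spans tagged PRESCRIBED_DRUG or NON_PRESCRIBED_DRUG. The helper `to_span_annotations` is a hypothetical name, and the `(text, {"entities": [(start, end, label)]})` layout is assumed here as a common span-annotation convention; check it against the format your training tool actually expects. It also assumes each drug name occurs once per sentence.

```python
def to_span_annotations(text, drug_labels):
    """Convert {drug: prescribed?} labels into character-offset spans.

    Assumption: each drug name occurs at most once in the text, so the
    first case-insensitive match is the mention being labelled.
    """
    lowered = text.lower()
    entities = []
    for drug, prescribed in drug_labels.items():
        start = lowered.find(drug.lower())
        if start == -1:
            continue  # drug not found in this sentence, skip it
        label = "PRESCRIBED_DRUG" if prescribed else "NON_PRESCRIBED_DRUG"
        entities.append((start, start + len(drug), label))
    return (text, {"entities": sorted(entities)})


if __name__ == "__main__":
    example = to_span_annotations(
        "her current meds are: sertraline 200 mg and olanzapine 5 mg",
        {"sertraline": True, "olanzapine": True},
    )
    print(example)
```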

Whether you use a rule-based approach or try to train a model for whether the drugs are prescribed or not, you should definitely have a good evaluation set annotated, with at least 1000 examples. This way you can compare different approaches, and refine your rules. If you’re doing a rule-based approach, try not to overfit your rules to your annotated data too much. It can help to have a separate training and test set, where you’re allowed to look at the training set, but not at the test set.
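A minimal sketch of that evaluation setup, in plain Python: one helper splits the annotated examples into a part you may inspect and a held-out test part, and another scores predicted "prescribed" mentions against the gold annotations. The names `split_train_test` and `precision_recall_f1`, the 20% test fraction, and the `(example_id, drug)` pair representation are all assumptions made for illustration.

```python
import random


def split_train_test(examples, test_fraction=0.2, seed=0):
    """Shuffle once with a fixed seed and hold out a test portion.

    Look at the training part while writing rules; only score on the test part.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def precision_recall_f1(gold, predicted):
    """Score predicted 'prescribed' mentions against gold ones.

    Both arguments are sets of (example_id, drug) pairs marked as prescribed.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, "aspirin"), (3, "sertraline"), (3, "olanzapine")}
    predicted = {(0, "aspirin"), (3, "sertraline"), (1, "omeprazol")}
    print(precision_recall_f1(gold, predicted))
```

Scoring every approach with the same function on the same held-out set is what makes rule-based and trained systems directly comparable.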
