(Madhu Jahagirdar) #1

My text classification works really well. However, when we analyzed the false positives, we found that the model does not take negation into account. For example, “no follow up required” is still ranked highly, whereas because of the word “no” it should not have been classified. Most of the false positives are due to negation issues. Is there any way we can handle negation?

(Matthew Honnibal) #2

I would expect the model to be able to handle negation, although it might take more examples to learn.

You might try running your classifier over a lot of text, and then searching for positives with “no” or “not” in them. Then mark whether each of those examples is a false positive. This should give you a quick way to label more examples that exhibit the problem, so that you can add them to your training dataset and hopefully resolve the issue.

Basically this is just another active learning step — except you’re making a more custom intervention into how the examples are selected.
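
To make the selection step concrete, here is a minimal sketch of that filter. It assumes your model can be wrapped as a function that takes a text and returns a score; `negated_positives`, `toy_predict`, the cue list, and the 0.5 threshold are all made up for illustration and are not part of any library.

```python
import re

# Hypothetical cue list: "no" and "not" come from the suggestion above,
# the rest are assumptions you may want to change for your data.
NEGATION_RE = re.compile(r"\b(no|not|never|without)\b|n't\b", re.IGNORECASE)


def negated_positives(texts, predict, threshold=0.5):
    """Return (score, text) pairs that score as positive and contain a negation cue."""
    candidates = []
    for text in texts:
        score = predict(text)
        if score >= threshold and NEGATION_RE.search(text):
            candidates.append((score, text))
    # Highest-scoring candidates first, since those are the most costly mistakes.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates


def toy_predict(text):
    # Stand-in for a real classifier: a keyword scorer that ignores negation.
    return 0.9 if "follow up" in text.lower() else 0.1


texts = ["No follow up required", "Follow up in two weeks", "Nothing to report"]
candidates = negated_positives(texts, toy_predict)
```

Each candidate still needs a human decision on whether it really is a false positive; the filter only narrows down what you look at.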

Another thing you can try is pre-processing the text so that negated items are retokenized into one word. For instance, you would replace “not good” with “not_good”, or “no follow up required” with “follow up no_required”. The dependency parse can be useful for this. Probably the easiest way to implement this is to make a new Doc object with the new wording.
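
As a rough sketch of the pre-processing idea, the snippet below merges a negation word with the word that directly follows it. This is a simplification: it uses plain string handling rather than the dependency parse, so it covers the “not good” case but would turn “no follow up required” into “no_follow up required” instead of attaching the negation to “required”. The names `NEGATORS` and `merge_negations` are hypothetical.

```python
import re

# Assumed set of negation words; adjust for your data.
NEGATORS = {"no", "not", "never"}


def merge_negations(text):
    """Return a word list in which each negator is joined to the following word."""
    # Split into runs of word characters and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    merged = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        has_next_word = i + 1 < len(tokens) and tokens[i + 1][0].isalnum()
        if token.lower() in NEGATORS and has_next_word:
            # Join the lowercased negator and the next word with an underscore.
            merged.append(f"{token.lower()}_{tokens[i + 1]}")
            i += 2
        else:
            merged.append(token)
            i += 1
    return merged


words = merge_negations("The service was not good.")
```

If you are working with spaCy, a word list like this could then be passed to the `Doc` constructor, for example `Doc(nlp.vocab, words=words)`, to get the new Doc object. To reproduce the “follow up no_required” example you would need to look up the head of the negation in the parse rather than rely on adjacency.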