Reclassifying text fragments with custom NER

Hi,

I have a use case where my own model predicts text fragments that carry an NER-like meaning, e.g.:

I enjoyed, surprisingly, doing this a lot.

The fragment enjoy this will be marked as an entity called “enjoyment phrase”.

But it is very rule-based and runs into lots of problems with contextual negations and similar cases. For example, it will catch a false positive:

I never enjoy this.

I’d like to use Prodigy to go through my labeled data and correct it to account for contextual cues like negations.

How do I do this? Can I load the pre-trained data and reclassify it somehow, so that Prodigy can discern my corrections from the original classifications and then propagate the corrections over the rest of the set? Or do I use text classification and try to learn a classifier outputting a “True positive” label? In the second approach, how do I pass the selected fragments to the model in the data?

Best,
Piotr

Well, this isn't really Named Entity Recognition (NER). The parts you're trying to recognise aren't contiguous phrases, and actually the syntactic structure can be pretty complicated. You might find it easier to use the dependency parse for writing these rules.

Here's how the sentence you gave would be parsed: displaCy Dependency Visualizer · Explosion. You can play with the API for this here: Linguistic Features · spaCy Usage Documentation.
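For instance, a negation rule over the dependency parse can be sketched like this. The parse below is hand-encoded to keep the example self-contained; in practice you'd run `nlp(text)` and read each token's `.dep_` and `.head` instead.

```python
# Sketch: detect a negated trigger verb using the dependency parse.
# The parse dicts are hand-encoded stand-ins for what spaCy's parser
# produces; with spaCy you'd read token.dep_ and token.head directly.

def is_negated(tokens, trigger_index):
    """True if any token attaches to the trigger with the 'neg' relation."""
    return any(t["dep"] == "neg" and t["head"] == trigger_index for t in tokens)

# "I never enjoy this." -- roughly the parse spaCy produces
parse = [
    {"text": "I",     "dep": "nsubj", "head": 2},
    {"text": "never", "dep": "neg",   "head": 2},
    {"text": "enjoy", "dep": "ROOT",  "head": 2},
    {"text": "this",  "dep": "dobj",  "head": 2},
    {"text": ".",     "dep": "punct", "head": 2},
]

print(is_negated(parse, 2))  # trigger "enjoy" is at index 2 -> True
```

The point is that the negation cue attaches directly to the trigger verb in the parse, so the rule stays simple even when extra words sit between them in the surface string.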

I'm worried that your classification scheme probably isn't very well defined. Like, what exactly will be an "enjoyment phrase"? Consider the following examples:

  • I enjoy that
  • That is enjoyed by me
  • I enjoy doing that
  • Doing that is enjoyed by me
  • That is enjoyable
  • It is enjoyable to do that
  • Doing that makes me happy
  • I'm happy when I'm doing that
  • I was happy, because I did that.
  • I did that. I became happy.
  • etc.

If you don't have a linguistically precise definition of what counts and what doesn't, you won't be able to annotate accurately --- let alone replicate those annotations in a machine learning model.

I would suggest collecting a set of trigger words (fun, enjoy, etc). If the words are ambiguous (i.e. some word is sometimes a trigger, sometimes not) you can use the NER model to predict a context-specific label. You'd collect annotations for that task.
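As a starting point, the trigger-word pass can be as simple as scanning for lexicon matches and emitting candidate spans. The trigger list and the ENJOYMENT label below are placeholders for your own scheme; the `{"text": ..., "spans": [...]}` shape follows Prodigy's task format, so the output could be fed to an annotation recipe for accept/reject decisions.

```python
# Sketch: generate candidate annotation tasks from a trigger-word lexicon.
# TRIGGERS and the ENJOYMENT label are hypothetical placeholders.

TRIGGERS = {"enjoy", "enjoyed", "enjoyable", "fun"}

def make_task(text, label="ENJOYMENT"):
    """Mark every trigger-word occurrence as a candidate span."""
    spans = []
    offset = 0
    for word in text.split():
        start = text.index(word, offset)
        end = start + len(word)
        offset = end
        if word.lower().strip(".,!?") in TRIGGERS:
            spans.append({"start": start, "end": end, "label": label})
    return {"text": text, "spans": spans}

task = make_task("I never enjoy this.")
print(task["spans"])  # [{'start': 8, 'end': 13, 'label': 'ENJOYMENT'}]
```

For genuinely ambiguous triggers, you'd then collect accept/reject annotations on these candidates and train a model to make the context-specific decision.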

I would advise against building negation into the definition of the trigger word. You should annotate that separately. Otherwise, the model that has to learn the trigger words will have a very difficult task: it's trying to learn two pieces of information jointly, even though the negation word might be arbitrarily far from the trigger word.
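Concretely, the suggestion is to keep the two decisions as separate annotations rather than folding negation into the span label. The label names here are hypothetical, but the dicts follow the same `"spans"` shape as above:

```python
# Two ways to annotate "I never enjoy this." The first folds negation into
# the span label, forcing one model to learn both facts jointly; the second
# keeps trigger and negation as separate annotations. Labels are hypothetical.

joint = {
    "text": "I never enjoy this.",
    "spans": [{"start": 8, "end": 13, "label": "NEGATED_ENJOYMENT"}],
}

separate = {
    "text": "I never enjoy this.",
    "spans": [
        {"start": 8, "end": 13, "label": "ENJOYMENT"},  # trigger span
        {"start": 2, "end": 7,  "label": "NEGATION"},   # negation cue, its own annotation
    ],
}
```

With the second scheme, the trigger model only has to learn the trigger vocabulary, and negation can be handled by a separate rule or model (e.g. the dependency-based check).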

If this is an important or commercially valuable project, I would suggest trying to find some annotators with linguistic experience (e.g. at least a few undergraduate courses in syntax) to help you make sure your annotation scheme makes sense.