Using a text classifier instead of NER

First of all, your documentation is great :boom:

I would love to see you elaborate on this section, though: how to train a classifier instead of NER when there are multiple spans in the text you want to classify. I realize the examples can vary quite a bit, but I'd love to see you do a video on how to approach such a task. I'd assume this is a common task for a lot of people as well (for information extraction, etc.).

Here is a dumb example with two spans I'd like to extract using a classifier approach (assume it couldn't be done with NER, even though NER would probably make more sense for this example):

For some reason I want to extract this span and for non-obvious reason I don't want to extract this span and for even less obvious reason I'd also like to extract this span

Thanks for the suggestion! I agree that a video on this would be a great idea, and I'll start thinking about it :thinking:. I've been weighing up different ideas for videos, and this is a great one.

There are two ways to implement the text-classification-instead-of-NER strategy. One is to structure your downstream application so that it doesn't require the specific highlighted span, and a label for the whole sentence or paragraph is enough. Sometimes this is viable, sometimes it isn't.

The other way is to chain together text classification and some sort of span identification strategy. You can put the text classifier either first or second. If it comes first, the text classification label indicates whether the sentence contains any instances of the entity type in question. This can make life much easier for the downstream NER model, as it never has to deal with confusing cases in sentences that have nothing to do with what you're trying to recognise.
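A minimal sketch of the classifier-first chain, written in plain Python so as not to assume any particular library's API. `has_entity` stands in for a trained sentence classifier and `find_spans` for the NER model; both, like the naive regex sentence splitter, are hypothetical placeholders you would swap for real components.

```python
import re
from typing import Callable, Iterator, List, Tuple

# (start_char, end_char, label), with offsets into the full text
Span = Tuple[int, int, str]


def split_sentences(text: str) -> Iterator[Tuple[int, str]]:
    """Naive regex splitter (hypothetical stand-in for a real segmenter).

    Yields (offset, sentence) pairs so spans can be mapped back to the text.
    """
    for match in re.finditer(r"[^.!?]+[.!?]*", text):
        chunk = match.group()
        stripped = chunk.lstrip()
        if not stripped.strip():
            continue  # skip whitespace-only fragments
        yield match.start() + len(chunk) - len(stripped), stripped.rstrip()


def classifier_first(
    text: str,
    has_entity: Callable[[str], bool],
    find_spans: Callable[[str], List[Span]],
) -> List[Span]:
    """Run the span finder only on sentences the classifier accepts."""
    results: List[Span] = []
    for offset, sentence in split_sentences(text):
        if not has_entity(sentence):
            continue  # the NER step never sees rejected sentences
        for start, end, label in find_spans(sentence):
            results.append((offset + start, offset + end, label))
    return results


# Toy stand-ins, purely for illustration
def toy_has_entity(sentence: str) -> bool:
    return "invoice" in sentence.lower()


def toy_find_spans(sentence: str) -> List[Span]:
    return [(m.start(), m.end(), "AMOUNT") for m in re.finditer(r"\d+", sentence)]


text = "The weather is nice. Invoice total is 420 dollars. Call me at 555."
spans = classifier_first(text, toy_has_entity, toy_find_spans)
print([(text[s:e], label) for s, e, label in spans])  # [('420', 'AMOUNT')]
```

The number in the third sentence is never passed to the span finder, because the classifier rejected that sentence first.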

If the classifier comes second, you first run a more generic span identification process, for instance tagging candidate spans with a single label. You then use text classification to assign the more specific labels you're trying to recover.
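The reverse order can be sketched the same way. Here `find_candidates` stands in for a generic single-label span detector and `classify_span` for a classifier that scores each candidate against the specific labels; both, along with the 0.5 `threshold`, are hypothetical placeholders, and the toy versions below just use a regex and word lists.

```python
import re
from typing import Callable, Dict, List, Tuple


def span_first(
    text: str,
    find_candidates: Callable[[str], List[Tuple[int, int]]],
    classify_span: Callable[[str, int, int], Dict[str, float]],
    threshold: float = 0.5,
) -> List[Tuple[int, int, str]]:
    """Find generic candidate spans, then give each one a specific label."""
    results = []
    for start, end in find_candidates(text):
        scores = classify_span(text, start, end)
        label, score = max(scores.items(), key=lambda item: item[1])
        if score >= threshold:  # drop candidates no label is confident about
            results.append((start, end, label))
    return results


# Toy stand-ins: capitalised words as candidates, word lists as the "classifier"
def toy_candidates(text: str) -> List[Tuple[int, int]]:
    return [(m.start(), m.end()) for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]


def toy_classify_span(text: str, start: int, end: int) -> Dict[str, float]:
    word = text[start:end]
    return {
        "PERSON": 0.9 if word in {"Alice", "Bob"} else 0.1,
        "CITY": 0.9 if word in {"Berlin", "Paris"} else 0.1,
    }


text = "Alice flew to Berlin on Monday."
labelled = span_first(text, toy_candidates, toy_classify_span)
print([(text[s:e], label) for s, e, label in labelled])
# [('Alice', 'PERSON'), ('Berlin', 'CITY')]
```

"Monday" is found as a candidate but dropped, since no specific label scores above the threshold.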

The text classification approach works best if you usually have only one candidate span per sentence (or per some other easily segmentable unit of text). If you have multiple candidate spans, as in your example, it's a bit trickier.
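One common way to handle several candidates in one sentence (a swapped-in technique, not something described in this thread) is to turn each candidate into its own classification example by wrapping it in marker strings, so the classifier still reads the whole sentence but knows which span it is judging. The `[E]`/`[/E]` markers and the helper names are assumptions for illustration; with a transformer you would typically register such markers as special tokens. The labels follow the example sentence from the question: extract the first and third "this span", not the second.

```python
import re
from typing import List, Tuple


def mark_candidate(text: str, start: int, end: int,
                   open_tok: str = "[E]", close_tok: str = "[/E]") -> str:
    """Wrap one candidate span in (hypothetical) marker strings."""
    return f"{text[:start]}{open_tok} {text[start:end]} {close_tok}{text[end:]}"


def candidate_examples(text: str, candidates: List[Tuple[int, int]]) -> List[str]:
    """One marked copy of the text per candidate span."""
    return [mark_candidate(text, start, end) for start, end in candidates]


text = ("For some reason I want to extract this span and for non-obvious reason "
        "I don't want to extract this span and for even less obvious reason "
        "I'd also like to extract this span")

# Candidates: every occurrence of "this span"; in practice a span finder supplies these
candidates = [(m.start(), m.end()) for m in re.finditer(r"this span", text)]
examples = candidate_examples(text, candidates)
labels = [1, 0, 1]  # 1 = extract, per the example in the question

for example, label in zip(examples, labels):
    print(label, example)
```

A text classifier trained on these pairs then decides, per candidate, whether to extract it. Because every example contains the full sentence, the classifier can in principle use context beyond the candidate's own words.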

Sorry I can't give more specific advice: it's inherently pretty heuristic-driven, based on experiments and the characteristics of your problem.


I'd definitely like to see a video on that, especially for the case with multiple candidates in the same sentence. I think your suggestion of chaining together a classifier and NER would work for my case. But I wonder if features from transformer models could be exploited in some way as well, to capture broader context when necessary.