Using a text classifier instead of NER

First of all, your documentation is great :boom:

I would love to see you elaborate on this section though, i.e. how to train a classifier instead of NER when there are multiple spans in the text that you want to classify. I realize the examples can vary quite a bit, but I'd love to see you do a video on how to approach such a task. I'd assume this is a very common task for a lot of people as well (for information extraction etc.).

Here is a dumb example of two spans I'd like to extract using a classifier approach (assume it couldn't be done with NER, although NER would probably make more sense in this example):

For some reason I want to extract this span and for non-obvious reason I don't want to extract this span and for even less obvious reason I'd also like to extract this span

Thanks for the suggestion! I agree that a video on this would be a great idea. I've been weighing up different ideas for videos, and I'll start thinking about this one :thinking:.

There are two ways to do the text-classification-as-NER strategy. One is to structure your downstream application so that you don't require the specific highlighted span. Sometimes this is viable, sometimes it isn't.

The other way is to chain together text classification and some sort of span identification strategy. You can put the text classifier either first or second here. If the classifier goes first, its label indicates whether the sentence contains any instances of the named entity in question. This can make life much easier for the downstream NER model, as it doesn't have to worry about confusing instances that have nothing to do with what you're trying to recognise.
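Here's a minimal plain-Python sketch of the classifier-first ordering. `sentence_score` and `find_spans` are hypothetical stand-ins for a trained text classifier and a trained NER model (for example spaCy's `textcat` and `ner` components); in the example they're toy rules (`toy_score`, `toy_spans`) so the code runs on its own. The part that matters is the control flow: the span finder never sees sentences the classifier rejects, so the `$5` in the unrelated sentence isn't extracted.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def extract_with_gate(
    sentences: List[str],
    sentence_score: Callable[[str], float],
    find_spans: Callable[[str], List[Span]],
    threshold: float = 0.5,
) -> List[Tuple[str, List[Span]]]:
    """Run the span finder only on sentences the classifier accepts."""
    results = []
    for sent in sentences:
        if sentence_score(sent) >= threshold:
            results.append((sent, find_spans(sent)))
        else:
            # Rejected sentences can't contribute false-positive spans.
            results.append((sent, []))
    return results


# Hypothetical stand-ins: a real pipeline would call trained models here.
def toy_score(sent: str) -> float:
    # Pretend the classifier only fires on sentences about prices.
    return 1.0 if "price" in sent.lower() else 0.0


def toy_spans(sent: str) -> List[Span]:
    # Treat every whitespace token starting with "$" as a MONEY span.
    spans = []
    start = 0
    for token in sent.split():
        begin = sent.index(token, start)
        start = begin + len(token)
        word = token.rstrip(".,;:!?")
        if word.startswith("$"):
            spans.append((begin, begin + len(word), "MONEY"))
    return spans


sentences = ["The price is $20.", "I paid $5 for parking."]
print(extract_with_gate(sentences, toy_score, toy_spans))
```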

Alternatively, you can put the classifier second: run a more generic span identification process first, for instance by tagging candidate spans with a single generic label, and then use text classification to provide the more specific labels you're trying to recover.
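The span-first ordering could look roughly like the sketch below. `find_candidates` stands in for a single-label span detector and `classify_context` for a text classifier that sees each candidate together with its surrounding text; both are toy rules here (`toy_candidates`, `toy_classifier`), and the `window` of 30 characters is an arbitrary assumption you'd tune.

```python
import re
from typing import Callable, List, Tuple

Candidate = Tuple[int, int]  # (start_char, end_char), one generic label


def label_candidates(
    text: str,
    find_candidates: Callable[[str], List[Candidate]],
    classify_context: Callable[[str, str, str], str],
    window: int = 30,
) -> List[Tuple[str, str]]:
    """Find generic spans first, then give each one a specific label."""
    labelled = []
    for start, end in find_candidates(text):
        span = text[start:end]
        left = text[max(0, start - window):start]
        right = text[end:end + window]
        labelled.append((span, classify_context(left, span, right)))
    return labelled


# Hypothetical stand-ins for the two trained models.
def toy_candidates(text: str) -> List[Candidate]:
    # Generic detector: every capitalised word is a candidate.
    return [m.span() for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]


def toy_classifier(left: str, span: str, right: str) -> str:
    # Crude context rule: a preposition right before the span means PLACE.
    words = left.split()
    return "PLACE" if words and words[-1] in {"to", "in", "from"} else "PERSON"


print(label_candidates("Jordan flew to Jordan last week.", toy_candidates, toy_classifier))
```

The same surface string gets two different labels, which is exactly what the second stage is for: the detector only has to find the boundaries.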

The text classification approach works best if you usually only have one candidate span per sentence, or per other easily segmentable unit of text. If you have multiple candidate spans as in your example, it's a bit trickier.
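One heuristic way to make multiple candidates manageable is to segment the text first so that each unit holds at most one candidate, then classify every unit on its own. In this sketch, applied to the example sentence from the question, splitting on "and" (`toy_segment`) is a hypothetical stand-in for a real clause or sentence segmenter, and a toy negation rule (`toy_classify`) stands in for a trained classifier.

```python
import re
from typing import Callable, List, Tuple


def classify_units(
    text: str,
    segment: Callable[[str], List[str]],
    classify: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Split text into units with at most one candidate each, then classify each."""
    return [(unit, classify(unit)) for unit in segment(text)]


def toy_segment(text: str) -> List[str]:
    # Stand-in segmenter: split on the whole word "and".
    return [part.strip() for part in re.split(r"\band\b", text) if part.strip()]


def toy_classify(unit: str) -> str:
    # Stand-in classifier: a negated clause is not extracted.
    return "SKIP" if "don't" in unit else "EXTRACT"


example = (
    "For some reason I want to extract this span and for non-obvious reason "
    "I don't want to extract this span and for even less obvious reason "
    "I'd also like to extract this span"
)
for unit, label in classify_units(example, toy_segment, toy_classify):
    print(label, "|", unit)
```

This only helps when the segmentation reliably isolates candidates; if two candidates share a unit, you're back to the harder case.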

Sorry I can't give more specific advice: it's inherently pretty heuristic-driven, based on experiments and the characteristics of your problem.


I'd definitely like to see a video on that, especially the case with multiple candidates in the same sentence. I think your suggestion of chaining together a classifier and NER would work for my case. But I wonder if features from the transformer models could also be exploited in some way to capture broader context when necessary.

Hi @honnibal ,

Thanks for the explanation above; it helped clear up many things for me.

One thing I haven't understood yet: to make my case concrete, let's say I show you a description like this:

I want to perform single-label classification on the text above, which tells me whether the product described in the sentence (i.e. a dress) is suitable to be worn in summer, in winter, or across any season, or whether the sentence has no mention of the season at all.

Generally, words like warm, autumn, winter, wool, thick etc. are associated with winter dresses, and words like cool, breezy, flouncy etc. are associated with summer dresses. Most of the sentences explicitly contain the word "summer" or "winter" in addition to these keywords. But quite a few of them only have those descriptive keywords and just the generic word "season".

So I wanted to ask: is text classification a suitable approach here, and is it capable of picking up such nuances? From what I understood of your post above, text classification only works well when explicit keywords are present, and only one per sentence, right? Or am I misinterpreting it?

Can you give me some clarity on this?

Thanks & Regards,
Vinayak.

From the example you posted above, it definitely looks like text classification is a good approach for the task. The text classifier will be able to take all words in the text into account, so if certain keywords commonly occur in summer texts, it's something that your model will be able to learn and generalise from.
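To illustrate how a classifier can learn from descriptive keywords even when "summer" or "winter" never appears, here's a toy multinomial Naive Bayes classifier in plain Python. It's a swapped-in technique for illustration, not what spaCy's `textcat` does internally, and the eight training descriptions are made up. A real project would train a proper text classifier on many annotated examples; in spaCy v3 a four-way, one-label-per-text task like this maps to the exclusive `textcat` component rather than `textcat_multilabel`.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.label_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayes":
        for text, label in examples:
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training descriptions, purely for illustration.
TRAIN = [
    ("A light, breezy cotton dress that stays cool on hot days.", "SUMMER"),
    ("A flouncy sundress, perfect for summer picnics.", "SUMMER"),
    ("A thick wool dress to keep you warm in winter.", "WINTER"),
    ("Warm knit dress, ideal for cold autumn evenings.", "WINTER"),
    ("A versatile dress you can wear in any season, all year round.", "ANY_SEASON"),
    ("Layer it or wear it alone: this dress works year round.", "ANY_SEASON"),
    ("A midi dress with a round neckline and side pockets.", "NO_SEASON"),
    ("Slim fit dress with a zip at the back.", "NO_SEASON"),
]

model = NaiveBayes().fit(TRAIN)
# Neither query mentions "summer" or "winter" explicitly.
print(model.predict("A breezy, flouncy dress for the season."))
print(model.predict("A thick, warm dress for the season."))
```

Even this crude model picks up the keyword associations; a neural text classifier trained on your real data can additionally use word order and context.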

I think what Matt is referring to in his comment is more related to using a text classifier to decide on fine-grained categories for spans mentioned in the text. This is easier if you're expecting the text to be about one concept only, which you can then apply the text category to.

For instance, let's say you have a sentence like "The Yeezy Foam RNNR is the perfect shoe for sultry city nights". What you're looking to extract here is the product (Yeezy Foam RNNR), the type (shoe) and the season (summer). You could now go and create training data that labels "Yeezy Foam RNNR" as SUMMER_SHOE, but that'd likely be a very inefficient approach. There's nothing about the product name that makes it inherently a summer shoe, and you might come across it in different contexts that have nothing to do with the season at all. The model is likely going to struggle to learn that distinction on top of the span boundaries. So a better approach would be to predict the season over the whole text (just like what you're doing) and then use an entity recognizer, or just a simple database or product catalogue lookup, to detect the product.
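A rough sketch of that split: one document-level prediction for the season, plus a plain catalogue lookup for the product. The catalogue entries (including the made-up "Wool Trench") and `toy_season` are hypothetical; in practice you'd plug in your trained classifier, and probably something more robust than substring matching, such as spaCy's `PhraseMatcher`.

```python
from typing import Callable, Dict, List


def extract_products(
    text: str,
    catalogue: Dict[str, str],
    predict_season: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Label the whole text once, then attach that label to every catalogue match."""
    season = predict_season(text)  # one document-level prediction
    lowered = text.lower()
    records = []
    for name, product_type in catalogue.items():
        if name.lower() in lowered:
            records.append({"product": name, "type": product_type, "season": season})
    return records


# Hypothetical catalogue and stand-in season classifier.
CATALOGUE = {"Yeezy Foam RNNR": "shoe", "Wool Trench": "coat"}


def toy_season(text: str) -> str:
    return "SUMMER" if "sultry" in text.lower() else "NO_SEASON"


text = "The Yeezy Foam RNNR is the perfect shoe for sultry city nights"
print(extract_products(text, CATALOGUE, toy_season))
```

Every matched product inherits the same document-level season, which is only safe when the text is about a single product.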

This works well if you're dealing with product reviews etc. and you know that the text is about one span and/or product. Where it gets more difficult is if you have a sentence that talks about multiple products with different attributes (e.g. a summer shoe vs. a winter shoe). Here, you couldn't just predict a label over the whole text and apply that to all products.


Thanks Ines!

This was helpful!