Text classification with window

usage
textcat
(Damiano) #1

Hello,
I have to classify my sentences, but in my case the order of the sentences inside the document matters.
So I thought I could train on each sentence by passing the text of the previous sentence, the current one and the next one. So basically, the text of three sentences would be used to classify each sentence. Could that work?

(Matthew Honnibal) #2

The current text classifier architecture won’t really be sensitive to sentence order. There are a range of architectures that could work, depending on the nature of the context sensitivities. Coming up with a model that captures this sort of ordering effect is a bit out of scope for Prodigy support though. It’s more of a general machine learning question.

You can see an example of a model with sentence convolutions here: https://github.com/explosion/thinc/blob/master/examples/imdb_cnn.py . You could also try a PyTorch support group for information about experimenting with hierarchical text classification models in PyTorch. PyTorch is easier to work with for architecture experimentation for most people, because there’s a big community around it. Personally I like using Thinc, but that’s mostly because I wrote it, so I know everything about it. You can wrap a PyTorch model for use in spaCy very easily.

(Damiano) #3

Hi @honnibal
I thought about passing more context for each sentence because the sentences are short, so maybe spaCy can’t really work out the class from a single sentence.
Maybe I can try classifying each sentence on its own first and check what happens.

(Matthew Honnibal) #4

In that case maybe adding the context isn’t such a bad idea. Give it a try?
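
For anyone who wants to try the idea from post #1, here is a minimal sketch of building the three-sentence windows in plain Python. It assumes you already have each document split into an ordered list of sentences with one label per sentence. The function name `make_context_windows` and the `sep` parameter are made up for illustration and are not part of spaCy or Prodigy.

```python
from typing import List, Tuple


def make_context_windows(
    sentences: List[str], labels: List[str], sep: str = " "
) -> List[Tuple[str, str]]:
    """Pair each sentence's label with the text of the previous,
    current and next sentence (hypothetical helper, not a library API)."""
    if len(sentences) != len(labels):
        raise ValueError("Need exactly one label per sentence")
    examples = []
    for i, label in enumerate(labels):
        # Clamp the window at the document boundaries, so the first and
        # last sentences only get one neighbour.
        start = max(0, i - 1)
        end = min(len(sentences), i + 2)
        window_text = sep.join(sentences[start:end])
        examples.append((window_text, label))
    return examples


# Toy document: three short sentences with made-up labels.
doc_sentences = ["Hi there.", "My order is late.", "Please help."]
doc_labels = ["GREETING", "COMPLAINT", "REQUEST"]

for text, label in make_context_windows(doc_sentences, doc_labels):
    print(label, "|", text)
```

Each `(text, label)` pair can then be used as a training example in place of the single sentence. One thing to keep in mind: the model sees the three sentences as one text and doesn't know which of them the label belongs to, so it's worth comparing against the plain single-sentence baseline.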

(Damiano) #5

Ok! Thanks