Text classification with window

damiano · May 11, 2019, 3:10pm

Hello,
I have to classify my sentences, but in my case their order inside the document matters.
So I thought of training on each sentence together with its context: the text of the previous sentence, the current one and the next one. So basically the text of three sentences would be used to classify each sentence. Could it work?

honnibal · May 11, 2019, 4:56pm

The current text classifier architecture won’t really be sensitive to that. There are a range of architectures that could work, depending on the nature of the context sensitivities. Coming up with a model that captures this sort of ordering effect is a bit out-of-scope of Prodigy support though — it’s more of a general machine learning question.

You can see an example of a model with sentence convolutions here: https://github.com/explosion/thinc/blob/master/examples/imdb_cnn.py . You could also try a PyTorch support group for information about experimenting with hierarchical text classification models in PyTorch. PyTorch is easier to work with for architecture experimentation for most people, because there’s a big community around it. Personally I like using Thinc, but that’s mostly because I wrote it, so I know everything about it. You can wrap a PyTorch model for use in spaCy very easily.

damiano · May 11, 2019, 5:47pm

Hi @honnibal
I thought about passing more context for each sentence because those sentences are short, so maybe spaCy cannot really work out the class from a single sentence.
Maybe I can try classifying each sentence on its own first and check what happens.

honnibal · May 11, 2019, 10:22pm

In that case maybe adding the context isn’t such a bad idea. Give it a try?
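
For anyone trying this, here is a minimal sketch of the windowing step in plain Python. It assumes the document has already been split into sentences and that there is one label per sentence; the function name `window_texts`, the `size` and `sep` parameters, and the example sentences and labels are all made up for illustration, not part of spaCy or Prodigy.

```python
from typing import List


def window_texts(sentences: List[str], size: int = 1, sep: str = " ") -> List[str]:
    """Join each sentence with up to `size` neighbours on each side.

    Hypothetical helper: sentences at the start or end of the document
    simply get a shorter window.
    """
    windows = []
    for i in range(len(sentences)):
        start = max(0, i - size)
        end = min(len(sentences), i + size + 1)
        windows.append(sep.join(sentences[start:end]))
    return windows


# Made-up example document, one label per sentence.
doc_sentences = ["Dear team.", "The server is down.", "Please restart it.", "Thanks."]
labels = ["GREETING", "PROBLEM", "REQUEST", "CLOSING"]

# Each training example pairs the windowed text with the label
# of the sentence in the middle of the window.
examples = list(zip(window_texts(doc_sentences), labels))
for text, label in examples:
    print(label, "|", text)
```

One thing to keep in mind: once the three sentences are joined, the model has no way of knowing which one is the target, so it may be worth comparing this against the plain single-sentence baseline first.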

damiano · May 12, 2019, 7:48pm

Ok! Thanks
