Text Classifier model architecture

agh92 · November 18, 2018, 10:49am

Hello Guys,

I’ve been using spaCy for a couple of months, and I am trying to dive a little deeper into spaCy v2 and Thinc. I am currently doing document classification with the TextCategorizer provided by spaCy, and I am trying to understand the model behind the classifier. Could someone walk me through the steps that build the word embeddings?

From the post, other threads, and the code I have been reading, I understand that the model gathers the features of each doc using doc.to_array in the FeatureExtracter, and that each token's features are converted into a single vector. However, this is just the embedding step. Can someone shed more light on what is going on in cnn_model?
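To make the question concrete, here is my current mental model of the embed step as a minimal pure-Python sketch. It assumes each token arrives as a tuple of integer attribute IDs (like the columns doc.to_array returns) and that each attribute is embedded with the hashing trick and the results are concatenated. The helper names, seeds and table sizes are hypothetical, and this is not spaCy's or Thinc's actual HashEmbed code.

```python
# Hypothetical sketch of the "embed" step: integer token features -> dense vectors.
# Assumption: one integer ID per attribute per token, as doc.to_array would give;
# the hashing-trick tables below are illustrative, not spaCy's HashEmbed.
import random
import zlib

def hash_row(feature_id, seed, n_rows):
    """Map an arbitrary integer ID to a row index of an embedding table."""
    key = f"{seed}:{feature_id}".encode("utf8")
    return zlib.crc32(key) % n_rows

def make_table(n_rows, width, seed=0):
    """Randomly initialised embedding table (n_rows x width)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_rows)]

def embed_tokens(token_features, tables):
    """Look up each attribute ID in its own table and concatenate the rows,
    giving one vector per token."""
    vectors = []
    for feats in token_features:
        vec = []
        for attr_index, (feature_id, table) in enumerate(zip(feats, tables)):
            row = hash_row(feature_id, attr_index, len(table))
            vec.extend(table[row])
        vectors.append(vec)
    return vectors

# Hypothetical IDs for a 3-token doc with 2 attributes per token.
doc_features = [(101, 7), (202, 8), (303, 7)]
tables = [make_table(n_rows=50, width=4, seed=s) for s in range(2)]
token_vectors = embed_tokens(doc_features, tables)
print(len(token_vectors), len(token_vectors[0]))  # prints: 3 8
```

Tokens that share an attribute ID (the second attribute, 7, for tokens 0 and 2) get identical slices of their vectors, which is how I read the embedding being shared across words with the same feature.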

I am also trying to map the names here and in the code to the concepts explained in the post "Embed, encode, attend, predict". As far as I can understand, in this case the IDs that go into the embed step come from doc.to_array. I can't identify the encode step, where the sentence matrix is built. I see the ParametricAttention layer, but the attend step is also unclear to me.
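Here is how I currently picture the encode, attend and predict steps, again as a hypothetical sketch rather than the real Thinc layers. My assumptions: encode concatenates each token vector with its left and right neighbours and projects it back to the original width (my guess at what the window CNN does, with ReLU as a simplification); attend scores each token against one learned vector and takes a softmax-weighted sum; predict is a linear layer plus softmax. All weights are random placeholders.

```python
# Hypothetical sketch of encode -> attend -> predict over per-token vectors.
# Not Thinc code: layer choices are simplifications, weights are random.
import math
import random

def encode_windows(vectors, weights):
    """Concatenate [left, self, right] for each token (zero-padded at the edges),
    then project back to the original width with a linear map + ReLU."""
    width = len(vectors[0])
    pad = [0.0] * width
    padded = [pad] + vectors + [pad]
    encoded = []
    for i in range(1, len(padded) - 1):
        window = padded[i - 1] + padded[i] + padded[i + 1]
        out = [max(0.0, sum(w * x for w, x in zip(row, window))) for row in weights]
        encoded.append(out)
    return encoded  # the "sentence matrix": one row per token

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(encoded, query):
    """Score each token against a learned query vector, softmax the scores,
    and return the attention-weighted sum (one vector for the whole doc)."""
    attn = softmax([sum(q * x for q, x in zip(query, vec)) for vec in encoded])
    width = len(encoded[0])
    doc_vec = [sum(a * vec[j] for a, vec in zip(attn, encoded)) for j in range(width)]
    return doc_vec, attn

def predict(doc_vec, class_weights):
    """Linear layer + softmax over the labels."""
    return softmax([sum(w * x for w, x in zip(row, doc_vec)) for row in class_weights])

rng = random.Random(0)
width, n_labels = 8, 3
token_vectors = [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(5)]
enc_weights = [[rng.uniform(-0.3, 0.3) for _ in range(3 * width)] for _ in range(width)]
query = [rng.uniform(-1, 1) for _ in range(width)]
class_weights = [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(n_labels)]

encoded = encode_windows(token_vectors, enc_weights)
doc_vec, attn = attend(encoded, query)
probs = predict(doc_vec, class_weights)
print(len(encoded), len(doc_vec), len(probs))  # prints: 5 8 3
```

If this picture is right, the sentence matrix from the blog post would correspond to encoded here, and the attend step is what reduces it to a single document vector before prediction. Is that roughly what ParametricAttention does?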

Thanks in advance!
