Text Classifier model architecture

Hello Guys,

I’ve been using spaCy for a couple of months and I am trying to dive a little deeper into spaCy v2 and Thinc. I am currently doing document classification with the text classifier (TextCategorizer) provided by spaCy and I am trying to understand the model behind the classifier. I was wondering if someone could walk me through the steps that build the word embeddings.

From the post, other ones, and the code that I have been reading, I gather that the model collects the features of each doc using doc.to_array in the FeatureExtracter and that these all get converted to a single vector. However, this is just the embedding step. Can someone shed more light on what is going on in the cnn_model?
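To make the question concrete, here is a rough sketch of how I currently picture the feature extraction, written in plain Python rather than with spaCy itself. Everything in it is an assumption for illustration: the attribute functions, the simplified shape rule, the crc32 hash, and the name extract_features are hypothetical stand-ins, not spaCy's or Thinc's actual implementation. The only point is that each token becomes one row of integer IDs, one column per lexical attribute, which is how I understand the output of doc.to_array.

```python
import zlib

# Hypothetical stand-ins for lexical attributes. In spaCy these would be
# attribute IDs passed to doc.to_array; here they are plain functions.
def norm(tok):
    return tok.lower()

def prefix(tok):
    return tok[:1]

def suffix(tok):
    return tok[-3:]

def shape(tok):
    # Simplified word shape: letters become x/X, digits become d.
    out = []
    for ch in tok:
        if ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)

ATTRS = [norm, prefix, suffix, shape]

def to_id(text):
    # Stable integer ID for a string. crc32 is only illustrative;
    # it is not the hash function spaCy uses.
    return zlib.crc32(text.encode("utf8"))

def extract_features(tokens):
    # One row per token, one column per attribute.
    return [[to_id(attr(tok)) for attr in ATTRS] for tok in tokens]

tokens = "I like spaCy 2".split()
features = extract_features(tokens)
for tok, row in zip(tokens, features):
    print(tok, row)
```

If this picture is right, the result is a table of IDs with one row per token, and the embedding layers would then look up (or hash into) a vector table using each column.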

I am also trying to map the names here and in the code to the concepts explained in the post Embed, encode, attend, predict. As far as I can understand, in this case the ID which goes into the embed step is represented by ‘doc.to_array’. I am not able to recognize the encode step, where the sentence matrix is built. I see the ParametricAttention layer, but the attend step is also unclear to me.
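For reference, this is a toy version of the four steps as I understand them from the Embed, encode, attend, predict post, again in plain Python. It is a sketch under my own assumptions, not the Thinc code: the sizes, the random weights, the window-of-three encoder, and the function names are all made up for illustration. In particular, I am assuming that the attend step scores each row of the sentence matrix against a learned vector, applies a softmax, and sums the weighted rows into one document vector.

```python
import math
import random

rng = random.Random(0)
WIDTH, N_ROWS, N_CLASSES = 4, 50, 2  # hypothetical sizes

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

table = rand_matrix(N_ROWS, WIDTH)        # embedding table
W_enc = rand_matrix(WIDTH, 3 * WIDTH)     # encoder weights for a 3-token window
query = [rng.uniform(-0.1, 0.1) for _ in range(WIDTH)]  # attention vector
W_out = rand_matrix(N_CLASSES, WIDTH)     # output layer weights

def embed(ids):
    # Embed: integer IDs -> one vector per token (ID modulo table size).
    return [table[i % N_ROWS] for i in ids]

def encode(vectors):
    # Encode: mix each token with its neighbours, so every row of the
    # resulting sentence matrix depends on context. Uses ReLU plus a
    # residual connection.
    pad = [0.0] * WIDTH
    padded = [pad] + vectors + [pad]
    out = []
    for i, vec in enumerate(vectors):
        window = padded[i] + padded[i + 1] + padded[i + 2]
        hidden = [max(0.0, h) for h in matvec(W_enc, window)]
        out.append([h + x for h, x in zip(hidden, vec)])
    return out

def attend(matrix):
    # Attend: softmax over per-token scores, then a weighted sum of rows.
    scores = [sum(a * b for a, b in zip(row, query)) for row in matrix]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    doc_vec = [sum(w * row[j] for w, row in zip(weights, matrix))
               for j in range(WIDTH)]
    return doc_vec, weights

def predict(doc_vec):
    # Predict: affine layer followed by a logistic, one score per class.
    return [1.0 / (1.0 + math.exp(-z)) for z in matvec(W_out, doc_vec)]

token_ids = [11, 42, 7, 42, 1234]
sentence_matrix = encode(embed(token_ids))
doc_vector, weights = attend(sentence_matrix)
scores = predict(doc_vector)
print(weights)
print(scores)
```

What I cannot work out is which layers in the cnn_model play the roles of encode and attend here, and whether ParametricAttention together with the pooling that follows corresponds to my attend function.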

Thanks in advance!