Hello Guys,
I’ve been using spaCy for a couple of months and I am trying to dive a little deeper into spaCy v2 and Thinc. I am currently doing document classification with the text classifier provided by spaCy and I am trying to understand the model behind it. I was wondering if someone could walk me through the steps that build the word embeddings.
From the post, other ones, and the code I have been reading, I understand that the model gathers the features of each doc using doc.to_array in the FeatureExtracter, and that all of this gets converted into a single vector. However, this is just the embedding step. Can someone shed more light on what is going on in the cnn_model?
I am also trying to map the names here and in the code to the concepts explained in the post Embed, encode, attend, predict. As far as I understand, in this case the ID that goes into the embed step is represented by ‘doc.to_array’. I am not able to recognize the encode step, where the sentence matrix is built. I see the ParametricAttention layer, but the attend step is also unclear to me.
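To make the question concrete, here is how I currently picture the four steps from the post, as a toy sketch in plain Python. It is not spaCy or Thinc code: every function name, the lookup table, the neighbour averaging that stands in for the CNN, and the fixed query vector are my own assumptions for illustration, so please correct me where the real model differs.

```python
import math
import zlib

# Toy sketch of embed -> encode -> attend -> predict.
# Nothing here is spaCy or Thinc code; all names and values are hypothetical.

WIDTH = 4  # size of each token vector


def embed(token_ids, n_rows=50):
    """Embed: look up one vector per integer ID (ID modulo table size)."""
    table = [[math.sin(r * WIDTH + c + 1) for c in range(WIDTH)]
             for r in range(n_rows)]
    return [table[tid % n_rows] for tid in token_ids]


def encode(vectors):
    """Encode: make each row context-sensitive by averaging it with its
    immediate neighbours (a crude stand-in for a convolutional window)."""
    out = []
    for i in range(len(vectors)):
        window = vectors[max(0, i - 1): i + 2]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def attend(matrix, query):
    """Attend: score each row against a query vector, softmax the scores,
    and reduce the matrix to one weighted-sum vector."""
    scores = [sum(q * x for q, x in zip(query, row)) for row in matrix]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    summary = [sum(w * row[c] for w, row in zip(weights, matrix))
               for c in range(WIDTH)]
    return weights, summary


def predict(summary, class_weights):
    """Predict: one linear score per class, squashed with a sigmoid."""
    scores = [sum(w * x for w, x in zip(ws, summary)) for ws in class_weights]
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


def classify(text):
    # crc32 gives a deterministic integer ID per token (an assumption;
    # as far as I can tell, spaCy gets its IDs from doc.to_array instead).
    token_ids = [zlib.crc32(tok.encode("utf8")) for tok in text.lower().split()]
    matrix = encode(embed(token_ids))
    weights, summary = attend(matrix, query=[1.0] * WIDTH)
    probs = predict(summary, class_weights=[[0.5] * WIDTH, [-0.5] * WIDTH])
    return weights, summary, probs


weights, summary, probs = classify("spaCy makes text classification easy")
print(len(weights), len(summary), len(probs))
```

In this picture the sentence matrix is the output of encode (one row per token), and attend is what turns that matrix into a single vector. What I cannot work out is which layers in the cnn_model play these two roles.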
Thanks in advance!