Does the spancat component include context for the labelling?
Say I have the two sentences:
'Mike found a postcard worth USD 2.'
'Mike found USD 2.'
In both cases I want the string 'USD 2' as my span. However, in the first case I want it to have the span.label_ 'Value', and in the second I want the span.label_ 'Item' (in the first case, 'postcard' would be the span with the label 'Item'). Will spancat catch this difference from the context, or do I have to include words like 'worth' in the first span?
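For concreteness, here's roughly how I'd annotate the two examples (just a sketch, assuming the default "sc" spans key):

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")

doc1 = nlp("Mike found a postcard worth USD 2.")
# 'postcard' (token 3) is the Item, 'USD 2' (tokens 5-6) is the Value
doc1.spans["sc"] = [Span(doc1, 3, 4, label="Item"), Span(doc1, 5, 7, label="Value")]

doc2 = nlp("Mike found USD 2.")
# here the same string 'USD 2' (tokens 2-3) should be the Item
doc2.spans["sc"] = [Span(doc2, 2, 4, label="Item")]
```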
I'm not 100% sure if I follow what you mean there. In your example the word "worth" isn't part of the span of text that describes the value. Or am I reading it wrong?
In any case, it's very hard to guarantee what a machine learning model will or will not do. A lot depends on how much data you have and how relevant it is to the task.
That said, when the default spancat model looks at a span, it uses the first/last token features in the span as input for the classification model. So in that sense, it does not look "outside" of the span for context.
There is a caveat to this, namely when you use a BERT-style model as a featurizer. In that case the feature space will be "contextualized", which means that, theoretically, the first/last tokens will carry some information from outside the span. I don't recommend using BERT when you're just starting out though, as it really complicates training and typically makes iteration a fair bit slower.
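If you want to verify this yourself, you can inspect the resolved default config. As far as I know, the default reducer is "spacy.mean_max_reducer.v1", which pools the first, last, mean and max token vectors of each candidate span:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("spancat")

# prints the component's resolved config, including the suggester,
# the reducer and the scorer settings
print(nlp.get_pipe_config("spancat"))
```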
Hi, yes, that answers my question - thank you. I had hoped it used an attention window, e.g. n-grams including three tokens on each side of the span, just like NER includes context for its classification.
In case it's of interest, you can read more details here if you haven't already.
The idea behind spancat is that there are two techniques at play: one suggests potential spans, and another classifies them. While it's true that there's not really a "window", there is the mechanism that considers multiple candidate spans for classification. So it could be that spancat still works out, but ... as always ... "it depends", and the only way to know for sure is to try it out.
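As a sketch of those two pieces: the default ngram suggester proposes all spans up to a given length, and the sizes are configurable:

```python
import spacy

nlp = spacy.blank("en")

# propose all 1-3 token ngrams as candidate spans; the classifier then
# scores each candidate independently for each label
nlp.add_pipe(
    "spancat",
    config={"suggester": {"@misc": "spacy.ngram_suggester.v1", "sizes": [1, 2, 3]}},
)
```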
A final comment: you can choose to mix and match spancat with NER. There might be certain entities that are a better fit for the NER approach while others benefit more from spancat.
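A minimal sketch of such a mixed pipeline (the split between the two components here is hypothetical, just to show the shape):

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("ner")      # trained on doc.ents, for well-bounded entities
nlp.add_pipe("spancat")  # trained on doc.spans["sc"], for context-dependent spans
```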
Thank you so much already. I hope I'm not overstretching your patience: I read the article you shared, and it describes the stages of Embedding, Pooling and Scoring. The Pooling stage reads as follows:
"Pooling : we reduce the sequences to make the model more robust, then encode the context using a window encoder"
That's where my question/misunderstanding came from.
If that is not the context surrounding the span, would it then be feasible to take each token vector and multiply it by the token.dep before passing it to the scorer? That way 'USD 2' as dobj would get a coherent and predictably different vector from 'USD 2' as npadvmod.
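Something like this toy sketch, at the Doc level (I'm mapping token.dep_ to small integers here, since token.dep itself is an arbitrary hash value and would blow up the scale; this is just to illustrate the idea, not the actual spancat internals):

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # a model with vectors and a parser
doc = nlp("Mike found USD 2.")

# map each dependency label seen in the doc to a small integer id
dep_ids = {dep: i + 1 for i, dep in enumerate(sorted({t.dep_ for t in doc}))}

# scale every token vector by its dependency id before any pooling/scoring
features = np.stack([tok.vector * dep_ids[tok.dep_] for tok in doc])
print(features.shape)  # (n_tokens, vector_width)
```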