Quick question on the SpanCategorizer that I haven’t been able to find an answer to.
Does the SpanCategorizer include context surrounding the suggested spans when classifying or is only the span being considered? If it does take context into account - how much’ish?
Ah, that's good to know! To my knowledge, the spancat classifier depends on the suggested spans going in. But suppose that you're considering all 5-grams. Then, by definition, every window of five consecutive tokens becomes a candidate span as the window moves across the text. So even if the classifier only considers one span at a time, it is also considering the surrounding tokens, because these occur in the neighbouring windows.
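To make the moving-window idea concrete, here's a rough sketch in plain Python. It doesn't call spaCy at all, and the helper names `ngram_spans` and `windows_covering` are made up for illustration. It only shows how every 5-token window gets enumerated and how a single token ends up inside several of those windows.

```python
def ngram_spans(n_tokens, size):
    """Return (start, end) offsets for every window of `size` consecutive tokens."""
    # An empty list comes back when the text is shorter than the window.
    return [(start, start + size) for start in range(n_tokens - size + 1)]


def windows_covering(spans, token_index):
    """Return the spans that contain the token at `token_index`."""
    return [(start, end) for start, end in spans if start <= token_index < end]


tokens = "The quick brown fox jumps over the lazy dog".split()
spans = ngram_spans(len(tokens), 5)

# Every candidate 5-gram, in order.
for start, end in spans:
    print((start, end), " ".join(tokens[start:end]))

# All windows in which the token "jumps" takes part.
covering = windows_covering(spans, tokens.index("jumps"))
print(covering)
```

Each window is still scored on its own here, so this illustrates the overlap between candidate spans and nothing more. How much context actually reaches the classifier is a separate question that this sketch doesn't answer.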
If you'd like to go into more detail, it would be better to ask this question on the spaCy discussion forum. The spaCy maintainers will be able to give more detailed answers to any further questions.