Incorporating a custom position feature into NER

I have an NER task that is sensitive to position in the document. The entities I’m looking for tend to show up at the front of the document. This isn’t consistent enough to be deterministic, but it’s enough of a trend that I bet a “line number in the document” feature would help. Is there a way to incorporate this into the NER model? From the overview video it appears that the NER model isn’t explicitly position-sensitive by default. How hard would it be to add something like a custom “line number in the document” feature? Maybe as an addition to the Attend features described here.

(I suppose this is more of a spaCy/thinc question than a Prodigy one, so let me know if there’s a better place to post it. But I figure that at the moment the staff of Explosion is small enough that the question will find its way regardless. 🙂)

It’s fine to ask this here :). One of the reasons we wanted to build on our own technologies is that it’s not very satisfying to ask people to raise the issue upstream — we want to be able to just answer the question.

As far as actually computing the position feature goes, you might want to base it on the token.idx attribute, and then discretise it somehow. Bins like 0–10, 10–30, 30–60 and 60+ would probably work well, and additional bins for regions near the end of the document might also be useful. If you hate arbitrary choices like this, you can use information theory to find more principled cut points – you want cuts that maximise the information gain with respect to the entity labels. I wouldn’t bother doing that, though.
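To make that concrete, here’s a minimal sketch of just the binning step (the bin edges and names are only illustrative, and any pipeline that tokenizes will do):

import numpy
import spacy

# Bucket each token's character offset (token.idx) into coarse position bins.
# These edges are the arbitrary ones suggested above -- tune them for your data.
BIN_EDGES = [10, 30, 60]

nlp = spacy.blank('en')  # only tokenization is needed for this illustration
doc = nlp('The entities in this task tend to show up near the start of the document.')
position_bins = numpy.digitize([token.idx for token in doc], BIN_EDGES)
# position_bins[i] is a small integer ID you could feed to an embedding table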

I can think of four pretty different ways you could get the model to take advantage of the position features. Unfortunately they’re all somewhat complicated, and I’m in Sydney until early January, so I can only offer limited assistance. The four options are:

  1. Add a feature to the word representations.

  2. Add a feature to the state representations.

  3. Create a linear model, and make the scores linear(input) + neural(input). This way you can add whatever features you want to the linear model.

  4. Create a multi-task objective, so the model is trained to predict the position given the input.

Number 4 is the fancy new neural way of going about things. The idea is a bit counter-intuitive. Instead of incorporating the position information as a feature, you encourage the model to learn that the start of the document looks different from the rest of the document. The knowledge required to make that distinction gets put into the word representations, which are shared between the position prediction model and the entity prediction model. At run-time, you throw away the position prediction model, and just use the word representations.

Have a look at the init_multitask_objectives method of EntityRecognizer in https://github.com/explosion/spaCy/blob/master/spacy/pipeline.pyx . If you subclass this class, you can have it create a subclass of MultitaskObjective that tries to predict a position feature.

I’m planning to build hooks to make 3 much easier, because I think this will be useful for the syntactic parser. It’s also how I plan to address the current speed problems: make the model a linear and neural ensemble, and let people switch off the neural component if they want the parser to be faster. So I’ll defer discussion of this for now.

Options 1 and 2 will be easiest to do if you build from source at first, while you experiment. Once you’ve finished experimenting, you can work out the neatest way to subclass everything to make your changes.

The part of the code to modify for option 1 is the Tok2Vec function in spacy/_ml.py. You need to write a function along these lines:

import numpy
from spacy.attrs import IDX
from thinc.api import layerize, chain, with_flatten, FeatureExtracter
from thinc.neural import Model
from thinc.i2v import Embed

def discretize(bins):
    def discretize_fwd(positions, drop=0.):
        # Take a single array of integer positions, and return a single array
        # of uint64 feature IDs: the index of the bin each position falls into.
        discretized = numpy.digitize(positions, bins).astype('uint64')
        def discretize_bwd(d_discretized, sgd=None):
            # The binning isn't differentiable, so there's nothing to pass back.
            return None
        return discretized, discretize_bwd
    return layerize(discretize_fwd)

def EmbedPositions(width, bins):
    with Model.define_operators({'>>': chain}):
        model = (
            FeatureExtracter([IDX])
            >> with_flatten(
                discretize(bins)
                # numpy.digitize can produce len(bins) + 1 distinct IDs
                >> Embed(width, nV=len(bins) + 1)
            )
        )
    return model

The idea is to create a feature extracter that takes Doc objects and returns a list of arrays with the position vectors. This list of vectors can then be used as part of the Tok2Vec function. You could put the position vectors before or after the convolutions. I’m not sure which would work best.
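Just to illustrate the shape of the data involved (not the thinc wiring), here’s a conceptual sketch in plain numpy, where the variables stand in for the per-document lists of arrays produced by Tok2Vec and EmbedPositions:

import numpy

# Dummy stand-ins: one document of 5 tokens, with 128-dim token vectors and
# 16-dim position embeddings.
token_vectors = [numpy.zeros((5, 128), dtype='float32')]
position_vectors = [numpy.zeros((5, 16), dtype='float32')]

# Concatenating per token gives each token a view of its position in the doc.
combined = [numpy.hstack((tok, pos))
            for tok, pos in zip(token_vectors, position_vectors)]
# combined[0].shape == (5, 144)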

Strategy 2 — extending the state representation — is currently difficult, because the state representation assumes that all of the values refer to tokens.


Could you kindly elaborate on this one, please? Ideally, could you write a minimal implementation for a simple feature (such as the one described, or any other, say a term included in some dictionary)? Subclassing MultitaskObjective does not seem completely straightforward to me.

Additionally, what is the workflow to train specifically for this ‘feature’? Or should we rely on the network having a natural incentive to learn it during global supervised training?

I’ve just added an example to spaCy showing how this is done: https://github.com/explosion/spaCy/blob/master/examples/training/ner_multitask_objective.py

I made a couple of improvements to make the code simpler, so it requires the version of spaCy currently on master. We should be doing a release of spaCy this week – before then, you could try out the example by building from source.
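For reference, the core of the linked example is roughly along these lines (check the file in the repository for the exact, up-to-date code):

import spacy

def get_position_label(i, words, tags, heads, labels, ents):
    # Auxiliary label: a coarse description of where word i sits in the doc.
    if i == 0:
        return 'first-word'
    elif i < 10:
        return 'early-word'
    elif i == len(words) - 1:
        return 'last-word'
    else:
        return 'late-word'

nlp = spacy.blank('en')
ner = nlp.create_pipe('ner')
# Ask the entity recognizer to also predict the position label during training
ner.add_multitask_objective(get_position_label)
nlp.add_pipe(ner)
# ...then train as usual; the auxiliary objective is set up in nlp.begin_training()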


Thanks, will pull and have a thorough look!

How exactly does MultitaskObjective change the word representations? Does it add attributes to the lexeme for the word? By “add attributes” I mean: does it use the function passed to add_multitask_objective() to calculate that attribute, and is that how it works?

It changes the training objective used to supervise the word representations. It doesn’t change any of the features, but the features are actually quite general already — they capture most of the information you would want.

Thank you for your quick reply.
Can you please shed some light on the training objective used to supervise the word representations, or point me to some resources that will help me understand it? I am new to the field, and understanding how this works would be a great help to me.

Ah, sorry. Sure – this blog post might be helpful: http://ruder.io/multi-task-learning-nlp/


Thanks a lot 😄