Does ner.teach take into account attribute extensions?

kevinrosenberg21 · January 12, 2018, 1:06pm

Hi! Thank you for making such a great product and providing amazing support. It’s been life saving.

I’m working on adding custom attributes to my model’s tokens so that they can be used as extra features to train the NER. I was looking at spaCy’s documentation and figured out I could add them like this and/or like this.

My question is, will Prodigy take this extra attribute into account when choosing the most relevant samples and updating the model?

Thank you, and regards.

ines · January 12, 2018, 1:52pm

No, spaCy's NER model currently uses the NORM, PREFIX, SUFFIX and SHAPE attributes as its features. Custom attributes are not used, because they can contain pretty much any arbitrary information – so spaCy has no way of knowing what is relevant and what isn't, or how the custom attributes relate to the data.
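For readers unfamiliar with these four attributes, here is a rough pure-Python approximation of what they look like for a single token. This is a sketch for illustration only, not spaCy's implementation: the function names `word_shape` and `lexical_features` are hypothetical, and it assumes the usual defaults (lowercased norm, a 1-character prefix, a 3-character suffix, and a shape in which runs of the same character class are capped at four).

```python
def word_shape(text):
    """Approximate a token's shape: X/x for letters, d for digits, runs capped at 4."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and symbols are kept as-is
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:
            shape.append(shape_char)
    return "".join(shape)


def lexical_features(text):
    """Return approximations of the four attributes named above for one token."""
    return {
        "NORM": text.lower(),    # assumption: no special-case normalisation
        "PREFIX": text[:1],      # first character
        "SUFFIX": text[-3:],     # last three characters
        "SHAPE": word_shape(text),
    }


if __name__ == "__main__":
    for token in ["Rosenberg", "2018", "U.S."]:
        print(token, lexical_features(token))
```

None of these features depend on anything stored in a custom attribute, which is why setting one has no effect on the model by itself.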

You can customise the features of the model, but this will take a little more work. Our video on how spaCy's NER model works should be a good place to get started. You can also find more details on this in the neural network model architecture section of the docs.

One thing you could do pretty easily, however, is to use custom attributes to influence the selection of relevant examples. By default, ner.teach uses uncertainty sampling, which is implemented via the prefer_uncertain sorter. Sorter functions take a stream of (score, example) tuples and yield a stream of sorted annotation tasks, based on the score. So instead of using the built-in model to score the examples, you can implement your own function that takes custom attributes into account. Here's a simplified example:

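from prodigy.components.sorters import prefer_uncertain

def get_stream(stream):
    # assumes `nlp` is a loaded spaCy pipeline and `custom_score` is a
    # Doc extension you have registered that holds a numeric score
    for eg in stream:
        doc = nlp(eg['text'])  # process the example text with spaCy
        score = doc._.custom_score  # get a score from your custom attribute
        yield (score, eg)

stream = prefer_uncertain(get_stream(stream))  # sort stream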
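To make the sorter contract concrete, here is a minimal pure-Python sketch of a function that takes (score, example) tuples and yields the examples whose scores are closest to 0.5 first. It is not Prodigy's prefer_uncertain implementation: the name `prefer_uncertain_sketch`, the `batch_size` parameter and the batch-wise sorting are all assumptions made for illustration.

```python
def _most_uncertain_first(batch):
    """Yield the examples of one batch, ordered by distance of the score from 0.5."""
    for _, eg in sorted(batch, key=lambda item: abs(item[0] - 0.5)):
        yield eg


def prefer_uncertain_sketch(scored_stream, batch_size=4):
    """Consume (score, example) tuples and yield examples, most uncertain first.

    Sorting happens within fixed-size batches so the input can remain a
    generator; this batching strategy is an assumption of the sketch.
    """
    batch = []
    for score, eg in scored_stream:
        batch.append((score, eg))
        if len(batch) == batch_size:
            yield from _most_uncertain_first(batch)
            batch = []
    # flush whatever is left when the stream ends
    yield from _most_uncertain_first(batch)


if __name__ == "__main__":
    scored = [
        (0.95, {"text": "a"}),
        (0.50, {"text": "b"}),
        (0.10, {"text": "c"}),
        (0.60, {"text": "d"}),
    ]
    for eg in prefer_uncertain_sketch(scored):
        print(eg["text"])
```

The point is that the sorter only ever sees the score, so whatever logic produces that score (including a custom attribute) decides which examples are shown first.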

Of course, how useful any of this will be depends on what you're trying to do.

kevinrosenberg21 · January 12, 2018, 1:59pm

Thank you for your answer.

So, if I understood correctly: suppose I get spaCy’s NER to take this extra feature into account, run the model on the data first and pick the samples with the lowest confidence instead of using prefer_uncertain, and make sure the model’s update function also uses the new feature (whether by default or via a custom update function). Could I still use Prodigy in that case?

ines · January 12, 2018, 2:09pm

Yes, exactly – it would even work if your model wasn’t a spaCy / Thinc model but, say, a PyTorch or TensorFlow model. If you’re using Prodigy with a spaCy model, all feature extraction happens in spaCy – so if you get your spaCy model to use your custom features, Prodigy will go along with that. (As far as Prodigy is concerned, it’s simply asking spaCy for a score.)

To implement a custom solution, all you need is two functions like this:

def predict(stream):
    for eg in stream:
        score = YOUR_MODEL.predict(eg['text'])
        yield (score, eg)

def update(examples):
    loss = YOUR_MODEL.update(examples)
    return loss
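As a runnable illustration of that pair of functions, here is a sketch with a toy model standing in for YOUR_MODEL. Everything about `ToyModel` is hypothetical (the word-weight scoring, the learning rate, the loss), and the sketch assumes that annotated examples come back with an "answer" key set to "accept" or "reject".

```python
import math


class ToyModel:
    """Hypothetical stand-in for YOUR_MODEL: one learned weight per word."""

    def __init__(self, learning_rate=0.5):
        self.weights = {}
        self.learning_rate = learning_rate

    def predict(self, text):
        # sum the word weights and squash the total into the range (0, 1)
        total = sum(self.weights.get(word, 0.0) for word in text.lower().split())
        return 1.0 / (1.0 + math.exp(-total))

    def update(self, examples):
        loss = 0.0
        for eg in examples:
            # assumption: accepted examples are positives, everything else negatives
            target = 1.0 if eg.get("answer") == "accept" else 0.0
            error = target - self.predict(eg["text"])
            loss += abs(error)
            for word in set(eg["text"].lower().split()):
                self.weights[word] = self.weights.get(word, 0.0) + self.learning_rate * error
        return loss


MODEL = ToyModel()


def predict(stream):
    # attach a score to every incoming example
    for eg in stream:
        yield (MODEL.predict(eg["text"]), eg)


def update(examples):
    # learn from a batch of annotated examples and report the loss
    return MODEL.update(examples)


if __name__ == "__main__":
    print(list(predict([{"text": "Apple hired Kevin"}])))
    print(update([{"text": "Apple hired Kevin", "answer": "accept"}]))
    print(list(predict([{"text": "Apple hired Kevin"}])))
```

In a custom recipe, the scored stream would typically be passed through a sorter such as prefer_uncertain, and the update function would be called with each batch of answers as they come in.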
