Using prefer_uncertain with make-gold recipe

I have been using the make-gold recipe from GitHub. Is there a way to easily integrate the prefer_uncertain sorter into this custom recipe?

Changing this to use prefer_uncertain isn’t too hard, but it isn’t entirely trivial either. The issue is that spaCy’s nlp object doesn’t return scores for the entities by default: the NER model is transition-based, so it doesn’t actually produce scores that can be interpreted as per-entity probabilities.

See this issue for an explanation of how to use a slightly different model in spaCy to get entity probabilities: Accessing probabilities in NER

Once you have the scores, you can change your stream to produce (score, example) tuples, and then wrap the scored stream with prefer_uncertain.
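
As a rough sketch of that last step, something like the following might work. The helper names `add_scores`, `dummy_score` and `sort_by_uncertainty` are made up for illustration and are not part of the make-gold recipe, and `dummy_score` is only a placeholder: you would swap in whatever function gives you entity probabilities from the approach in the linked issue. It also assumes each task in the stream is a dict with a "text" key.

```python
def add_scores(stream, score_fn):
    """Turn a stream of task dicts into (score, example) tuples."""
    for eg in stream:
        # score_fn is assumed to return a float between 0 and 1
        yield (score_fn(eg["text"]), eg)


def dummy_score(text):
    # Placeholder only: NOT a real entity probability. Replace this
    # with scores obtained from the model, as described in the issue.
    return min(len(text) / 100.0, 1.0)


def sort_by_uncertainty(stream, score_fn):
    # Imported lazily so the helpers above can be tried without Prodigy
    from prodigy.components.sorters import prefer_uncertain

    # prefer_uncertain consumes (score, example) tuples and yields
    # the examples, favouring those with the most uncertain scores
    return prefer_uncertain(add_scores(stream, score_fn))
```

In the recipe, you would then replace the existing stream with something like `stream = sort_by_uncertainty(stream, your_score_fn)` before it gets returned.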

Thanks! I will look into this
