Changing this to use prefer_uncertain isn't too hard, but it also isn't entirely trivial. The issue is that spaCy's nlp object doesn't return scores for the entities by default, because the model doesn't actually produce any scores that can be interpreted that way (it's a transition-based model).
See the issue "Accessing probabilities in NER" for an explanation of how to use a slightly different model in spaCy to get entity probabilities.
Once you have the scores, you can change your stream to produce (score, example) tuples, and then wrap the scored stream with prefer_uncertain.
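Here is a minimal sketch of that last step. It assumes each example already carries its entity probability under eg["meta"]["score"]. That location and the helper names add_scores, span_score and uncertain_stream are hypothetical, so adapt them to wherever your probabilities actually come from. The only Prodigy API used is prefer_uncertain from prodigy.components.sorters, which consumes the stream of (score, example) tuples and yields the examples it selects for annotation.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

try:
    # Prodigy's built-in sorter; only importable when Prodigy is installed.
    from prodigy.components.sorters import prefer_uncertain
except ImportError:
    prefer_uncertain = None

Example = Dict  # a Prodigy task dict, e.g. {"text": ..., "spans": [...]}


def add_scores(
    stream: Iterable[Example], get_score: Callable[[Example], float]
) -> Iterator[Tuple[float, Example]]:
    """Hypothetical helper: turn each example into a (score, example) tuple."""
    for eg in stream:
        yield (get_score(eg), eg)


def span_score(eg: Example) -> float:
    """Hypothetical scorer: read the probability stored in eg["meta"]["score"].

    Assumption: you stored the entity probability there when building the
    stream. Examples without a stored score fall back to 0.0.
    """
    return float(eg.get("meta", {}).get("score", 0.0))


def uncertain_stream(stream: Iterable[Example]) -> Iterator[Example]:
    """Wrap the scored stream with Prodigy's prefer_uncertain sorter."""
    if prefer_uncertain is None:
        raise RuntimeError("prefer_uncertain requires Prodigy to be installed")
    return prefer_uncertain(add_scores(stream, span_score))
```

In a recipe you would then return uncertain_stream(stream) as the stream. Because add_scores is a generator, scores are computed lazily as the sorter pulls examples, so the stream is never scored all up front.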