In a scenario where I'm using ner.teach with only one label:
I know that if only part of an entity is highlighted in a task, the task needs to be rejected.
However, I keep running into cases where only one entity in the text is highlighted and the others are ignored or only come up in a later task. What should the decision be here to best support the active learning model?
Hi! The concept of ner.teach is to show you different suggestions, one entity at a time, with different confidence scores. So you'll get to focus on one single suggestion at a time, and the accept/reject feedback only applies to the given highlighted span, not the full parse. If the highlighted span is correct, you should accept it; if not, you should reject it.
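For illustration, here's a simplified sketch of what one answered ner.teach task looks like under the hood (the text, label and score below are made up): each task carries exactly one suggested span, and your answer refers to that span only.

task = {
    "text": "Apple updated its iOS software yesterday.",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],  # the single highlighted suggestion
    "meta": {"score": 0.42},  # the model's confidence for this suggestion
    "answer": "accept",  # your feedback, applied to this span only
}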
If you want to see the model's best prediction of all entities in the current text, check out the ner.correct workflow. It doesn't update a model in the loop, but it lets you make manual corrections and create gold-standard training data.
Hi @ines
I have a large corpus where most of the texts contain no entities. Currently, I don't get any true negative examples because I set all_examples=False on the PatternMatcher I'm using with ner.manual. The model is performing well, but is this good practice?
I could do this for ner.manual, but would it also be possible for ner.correct?
Focusing on the examples that contain the words and phrases you're interested in is fine for bootstrapping an initial training set IMO and getting over the "cold start problem", especially if your data is a bit imbalanced. If there's no shortage of examples with no entities, you definitely want to make sure you're getting enough examples with entities in there for your model to learn from. Once you're ready to train, you can always mix in some texts without entities (which should be really quick to annotate as well, because you can do it as a simple yes/no selection).
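In case it's useful, here's a rough, generic sketch of mixing the two sets before training (the file names and the 20% ratio are placeholders, nothing Prodigy-specific):

import json
import random

# Combine annotated examples that contain entities with a sample of annotated
# texts that contain none, so the model also learns from negative examples.
with open("with_entities.jsonl", encoding="utf8") as f:
    with_entities = [json.loads(line) for line in f]
with open("no_entities.jsonl", encoding="utf8") as f:
    no_entities = [json.loads(line) for line in f]

random.shuffle(no_entities)
mixed = with_entities + no_entities[: int(len(with_entities) * 0.2)]
random.shuffle(mixed)

with open("training_data.jsonl", "w", encoding="utf8") as f:
    for eg in mixed:
        f.write(json.dumps(eg) + "\n")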
Many thanks for your feedback!
So, is there a way to only get the highlighted examples in the ner.correct recipe as well?
There is no PatternMatcher there, so I can't use the "all_examples=False" workaround.
The ner.correct recipe will always show you all examples by default, including those with no predictions made by the model (no entities is a prediction in itself, too).
But you could write a simple filter function that only sends out examples with spans:
def filter_examples(stream):
    # Only send out examples that have at least one highlighted span
    for eg in stream:
        if eg.get("spans", []):
            yield eg
And then you can apply that at the end of your recipe, after the predictions have been added:
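As a rough sketch of what that could look like in a custom ner.correct-style recipe (the recipe name, the add_predicted_spans helper and the assumption that the source is a JSONL file of {"text": ...} records are placeholders, not Prodigy built-ins):

import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens


def filter_examples(stream):
    # Same generator as above: only send out examples with at least one span
    for eg in stream:
        if eg.get("spans", []):
            yield eg


def add_predicted_spans(nlp, stream, labels):
    # Placeholder helper: add the model's entity predictions as pre-highlighted spans
    for eg in stream:
        doc = nlp(eg["text"])
        eg["spans"] = [
            {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
            for ent in doc.ents
            if ent.label_ in labels
        ]
        yield eg


@prodigy.recipe("ner.correct.entities-only")
def ner_correct_entities_only(dataset, spacy_model, source, label):
    labels = label.split(",")
    nlp = spacy.load(spacy_model)
    stream = JSONL(source)                             # load the raw texts
    stream = add_predicted_spans(nlp, stream, labels)  # add the model's predictions
    stream = add_tokens(nlp, stream)                   # tokenize for the manual UI
    stream = filter_examples(stream)                   # drop examples with no spans
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "ner_manual",
        "config": {"labels": labels},
    }

You'd then run it like any other recipe, passing -F path/to/recipe.py so Prodigy can find the custom recipe file.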