In a scenario where I'm using ner.teach with only one label:
I know that if only part of an entity is highlighted in a task, the task needs to be rejected.
However, I keep running into cases where only one entity in the text is highlighted and the others are ignored or only come up in a later task. What should the decision be here to best support the active learning model?
Hi! The concept of ner.teach is to show you different suggestions, one entity at a time, with different confidence scores. So you'll get to focus on one single suggestion at a time, and the accept/reject feedback only applies to the given highlighted span, not the full parse. If the highlighted span is correct, you should accept it; if not, you should reject it.
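For illustration, here's a simplified sketch of what one answered ner.teach task looks like under the hood (the text, label and score below are made up): each task carries exactly one suggested span, and your answer refers to that span only.

task = {
    "text": "Apple updated its iOS software yesterday.",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],  # the single highlighted suggestion
    "meta": {"score": 0.42},  # the model's confidence for this suggestion
    "answer": "accept",  # your feedback, applied to this span only
}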
If you want to see the model's best prediction of all entities in the current text, check out the ner.correct workflow. It doesn't update a model in the loop, but it lets you make manual corrections and create gold-standard training data.
Hi @ines
I have a large corpus where most of the texts contain no entities. Currently, I don't get any true negative examples because I set all_examples=False on the PatternMatcher I'm using with ner.manual. The model is performing well, but is this good practice?
I could do this for ner.manual, but would it also be possible for ner.correct?
Focusing on the examples that contain the words and phrases you're interested in is fine for bootstrapping an initial training set IMO and getting over the "cold start problem", especially if your data is a bit imbalanced. If there's no shortage of examples with no entities, you definitely want to make sure you're getting enough examples with entities in there for your model to learn from. Once you're ready to train, you can always mix in some texts without entities (which should be really quick to annotate as well, because you can do it as a simple yes/no selection).
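In case it's useful, here's a rough, generic sketch of mixing the two sets before training (the file names and the 20% ratio are placeholders, nothing Prodigy-specific):

import json
import random

# Combine annotated examples that contain entities with a sample of annotated
# texts that contain none, so the model also learns from negative examples.
with open("with_entities.jsonl", encoding="utf8") as f:
    with_entities = [json.loads(line) for line in f]
with open("no_entities.jsonl", encoding="utf8") as f:
    no_entities = [json.loads(line) for line in f]

random.shuffle(no_entities)
mixed = with_entities + no_entities[: int(len(with_entities) * 0.2)]
random.shuffle(mixed)

with open("training_data.jsonl", "w", encoding="utf8") as f:
    for eg in mixed:
        f.write(json.dumps(eg) + "\n")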
Many thanks for your feedback!
So, is there a way to only get the highlighted examples in the ner.correct recipe as well?
There is no PatternMatcher there, so I can't use the "all_examples=False" workaround.
The ner.correct recipe will always show you all examples by default, including those with no predictions made by the model (no entities is a prediction in itself, too).
But you could write a simple filter function that only sends out examples with spans:
def filter_examples(stream):
    # Only send out examples that have at least one highlighted span
    for eg in stream:
        if eg.get("spans", []):
            yield eg
And then you can apply that at the end of your recipe, after the predictions have been added:
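As a rough sketch of what that could look like in a custom ner.correct-style recipe (the recipe name, the add_predicted_spans helper and the assumption that the source is a JSONL file of {"text": ...} records are placeholders, not Prodigy built-ins):

import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens


def filter_examples(stream):
    # Same generator as above: only send out examples with at least one span
    for eg in stream:
        if eg.get("spans", []):
            yield eg


def add_predicted_spans(nlp, stream, labels):
    # Placeholder helper: add the model's entity predictions as pre-highlighted spans
    for eg in stream:
        doc = nlp(eg["text"])
        eg["spans"] = [
            {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
            for ent in doc.ents
            if ent.label_ in labels
        ]
        yield eg


@prodigy.recipe("ner.correct.entities-only")
def ner_correct_entities_only(dataset, spacy_model, source, label):
    labels = label.split(",")
    nlp = spacy.load(spacy_model)
    stream = JSONL(source)                             # load the raw texts
    stream = add_predicted_spans(nlp, stream, labels)  # add the model's predictions
    stream = add_tokens(nlp, stream)                   # tokenize for the manual UI
    stream = filter_examples(stream)                   # drop examples with no spans
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "ner_manual",
        "config": {"labels": labels},
    }

You'd then run it like any other recipe, passing -F path/to/recipe.py so Prodigy can find the custom recipe file.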