active learning + two token entities: should I reject OR accept partially correct predictions at the beginning of learning?

andrei.volkau · November 7, 2018, 6:43pm

Intro:

I want to extract Last Name and First Name as one entity from resumes.
I am working with Resume (Curriculum Vitae) documents.
I am trying to improve the existing “PERSON” category.

Problem formulation:
At the beginning of learning, the en_core_web_sm model does not recognize Last Name and First Name together as one entity, and it labels a lot of irrelevant tokens as PERSON. Fortunately, the model sometimes recognizes the First Name alone as a PERSON (see the screenshot).

Question:
Which option is better?

Option 1:
Accept partially correct entities at the beginning of learning. That means pressing the green button for the case illustrated in the screenshot. This approach should let the model pay more attention to relevant tokens (I mean real Last Names and First Names) and less attention to irrelevant tokens like “Java”, “Visio”, “Jira” and so on.

As soon as the model starts paying more attention to tokens that belong to real Last Names and First Names, I would start rejecting partially correct predictions. That way I would try to teach the model that it should learn two-token entities.

Option 2:
Reject partially correct entities. The model will start learning two-token entities right away, but in the meantime I will also need to reject a lot of irrelevant suggestions, because the model is still trying to work out what I want it to learn. So it will suggest a lot of irrelevant entities like “Java”, “Visio”, “Jira” and so on.

Thank you in advance for choosing the best option and explaining your choice.

ines · November 8, 2018, 9:38am

We'd definitely recommend option 2 – you should always reject partial suggestions. The active learning-powered recipes will look at all possible analyses for the parse, so the correct boundaries are likely in there – it might just not be the suggestion you see first. By being very "strict" and rejecting incomplete suggestions, you tell the model to "try again" and will move it towards the correct boundaries.
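To make the "strict" policy concrete, here is a minimal sketch in plain Python. It is not Prodigy's own code: the helper names `span_for` and `strict_answer`, the example sentence, and the `gold_spans` list (which stands in for the annotator's judgement) are all assumptions for illustration. The dictionaries only mimic the general shape of an annotation task with `start`, `end` and `label` on a span and an `answer` of `accept` or `reject`.

```python
def span_for(text, phrase, label="PERSON"):
    """Build a character-offset span for the first occurrence of phrase (hypothetical helper)."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


def strict_answer(suggested, gold_spans):
    """Accept only when start, end and label all match a full correct entity."""
    key = (suggested["start"], suggested["end"], suggested["label"])
    gold = {(s["start"], s["end"], s["label"]) for s in gold_spans}
    return "accept" if key in gold else "reject"


# Made-up example sentence, not taken from the thread.
text = "John Smith has five years of experience with Java and Jira."

# What the annotator considers correct: the full two-token name.
gold_spans = [span_for(text, "John Smith")]

# Suggestions the model might show one at a time.
suggestions = [
    span_for(text, "John"),        # partially correct: first name only
    span_for(text, "John Smith"),  # exact boundaries
    span_for(text, "Java"),        # irrelevant token
]

answers = [strict_answer(s, gold_spans) for s in suggestions]
for s, a in zip(suggestions, answers):
    print(repr(text[s["start"]:s["end"]]), "->", a)
```

Under this policy the first-name-only suggestion is rejected just like the irrelevant "Java" suggestion, and only the span with exact boundaries is accepted.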

Also see this thread:

andrei.volkau · November 8, 2018, 11:06am

OK, thank you very much for the complete explanation.
