should i include the context before and after an entity i want

yishairasowsky · December 12, 2019, 10:38am

I'm training a model to identify the start date of a contract. Should I highlight the text before and after the date, so that the model learns to recognize the key context words? For example:

honnibal · December 12, 2019, 11:09am

You should only highlight the entity, not the context. The model will read the surrounding text, so you don't have to mark it. What you're telling the model when you highlight is, "Predict this span as an entity". You want to make sure your annotations are consistent, so that the model doesn't get confused.
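To make the distinction concrete, here is a minimal sketch, assuming a Prodigy-style task with character-offset "spans". The START_DATE label and the entity_span helper are hypothetical. The span covers only the date; the words around it stay as plain text, which the model still reads.

```python
import json


def entity_span(text: str, entity: str, label: str) -> dict:
    """Return a span covering only `entity` (its first occurrence), not its context.

    Hypothetical helper for illustration; raises ValueError if `entity` is absent.
    """
    start = text.index(entity)
    return {"start": start, "end": start + len(entity), "label": label}


text = "The lease shall begin on February 1, 2012, and continue until..."
# Only the date is marked. "shall begin on" is context the model sees on its own.
task = {"text": text, "spans": [entity_span(text, "February 1, 2012", "START_DATE")]}
print(json.dumps(task))
```

Marking "shall begin on February 1, 2012" instead would teach the model that the trigger words are part of the entity, which is the inconsistency the reply warns about.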

Here's a quick summary of how the model works: we first read the text and come up with a meaning representation for each token in the text (in technical terms: we apply a convolutional neural network to calculate token vectors). We then go over the words left-to-right, and decide whether to start a new entity. If an entity has begun, we decide whether to continue it, based on the current word, the first word of the current entity, and the last word of the current entity.
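The left-to-right procedure described above can be sketched as a greedy loop. This is a toy illustration, not spaCy's actual implementation: the begins and continues functions are hypothetical stand-ins for what the trained network decides, hard-coded here for simple dates. The point is that the "continue" decision looks only at the current word plus the first and last words of the open entity.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (start token index, end token index, exclusive)


def decode(
    tokens: List[str],
    begins: Callable[[str], bool],
    continues: Callable[[str, str, str], bool],
) -> List[Span]:
    """Greedy left-to-right entity decoding (toy sketch)."""
    spans: List[Span] = []
    start: Optional[int] = None  # index of the first word of the open entity
    for i, word in enumerate(tokens):
        if start is not None:
            first, last = tokens[start], tokens[i - 1]
            if continues(word, first, last):
                continue  # extend the open entity to include this word
            spans.append((start, i))  # close the entity before this word
            start = None
        if begins(word):
            start = i  # open a new entity at this word
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


MONTHS = {"January", "February", "March"}  # hypothetical toy lexicon


def begins(word: str) -> bool:
    return word in MONTHS


def continues(word: str, first: str, last: str) -> bool:
    # Toy rules: only dates that start with a month; a comma may follow the
    # day number, and a number may follow the month or a comma.
    if first not in MONTHS:
        return False
    if word == ",":
        return last.isdigit() and len(last) <= 2
    return word.isdigit() and (last in MONTHS or last == ",")


tokens = "shall begin on February 1 , 2012 , and continue".split()
print(decode(tokens, begins, continues))
```

Here the entity opens at "February" and closes before the comma that follows "2012", giving the span "February 1 , 2012".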

yishairasowsky · December 12, 2019, 11:53am

OK, great, and thanks for the reply!
Do you think ner.manual is the best recipe for us?
Again, our goal is eventually to extract the start/end dates, payment sums, lessor/lessee names, etc. from unseen documents.

yishairasowsky · December 12, 2019, 11:58am

I think that by "current word" you mean the right-most word that hasn't been classified yet.
Do you only need information about the first and last words of the current entity? What about the middle words? Aren't they important?

yishairasowsky · December 12, 2019, 12:27pm

Often an entity appears in an obvious context in some places and in a less obvious context elsewhere. For instance, the start date in a contract sometimes appears in a suggestive context, such as "...shall begin on February 1, 2012, and continue until...", but the same date may appear elsewhere in the same document in a less obvious context, such as "including any hook-up charges as of February 1, 2012". Should I highlight the dates in both contexts?

Here is an example of the start date appearing outside an obvious context.

yishairasowsky · December 12, 2019, 12:29pm

If I understand you correctly, I should not do this:

but rather this:

yishairasowsky · December 12, 2019, 4:08pm

Do you think we should use rule-based matching?

honnibal · December 17, 2019, 8:14pm

I'm sorry but we can only give quite limited amounts of project advice. We do try to point people in the right direction, but at the end of the day each project will be different.

I do think the second of the two images you posted looks more correct, and it's possible you should use rule-based matching, but ultimately it's up to you.
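If you do try rule-based matching, one minimal sketch is a regular expression keyed on a trigger phrase. This uses Python's standard re module as a stand-in for spaCy's token-based Matcher. The trigger phrases and the "Month D, YYYY" date format are assumptions that only cover the wording quoted in this thread. The sketch also shows the limitation raised above: the date in the less obvious context ("as of February 1, 2012") is not matched.

```python
import re
from typing import List, Tuple

# Hypothetical trigger phrases; real contracts will need many more.
TRIGGER = r"(?:shall\s+begin|shall\s+commence|commencing)\s+on\s+"
MONTH = (
    r"(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)"
)
DATE = MONTH + r"\s+\d{1,2},\s+\d{4}"
# Only the date is captured (group 1); the trigger is required but not returned.
START_DATE = re.compile(TRIGGER + "(" + DATE + ")")


def find_start_dates(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, date) for each date preceded by a trigger phrase."""
    return [(m.start(1), m.end(1), m.group(1)) for m in START_DATE.finditer(text)]


doc = (
    "The term shall begin on February 1, 2012, and continue until ... "
    "including any hook-up charges as of February 1, 2012."
)
for start, end, date in find_start_dates(doc):
    print(start, end, date)
```

Rules like this are precise but brittle. A common compromise is to use them to pre-highlight candidates during annotation and still train a statistical model on the corrected results.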
