`ner.correct` doesn't show the full text

Sergio.Marrero · March 9, 2021, 11:50am

Hi.

I'm working with the ner.correct recipe and it behaves unexpectedly. Instead of showing the complete text, it only shows part of it, sometimes just one or two words.

Here's an example of what I mean:

Expected:
This is just an example about what I expected to happen.

What actually happened:

First annotation:
This is just

Second annotation:
an example about what I expected

Third annotation:
to happen

Is this normal, or is it weird behaviour?

Thanks in advance.

Sergio M.

ines · March 10, 2021, 1:46am

Hi! By default, ner.correct will use the spaCy model to segment the text into sentences. You can disable this by setting the --unsegmented flag. Just make sure that the text you feed in is reasonably segmented.
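The following is a minimal sketch of what that default does to the stream and what the `--unsegmented` flag turns off. It is not Prodigy's implementation. `make_stream` and its `unsegmented` parameter are hypothetical, and the regex split is a much cruder stand-in for the sentence boundaries a spaCy model predicts. It only shows that, with segmentation on, one input record becomes several tasks, each with a shorter `text`.

```python
import re
from typing import Dict, Iterable, Iterator


def make_stream(stream: Iterable[Dict], unsegmented: bool = False) -> Iterator[Dict]:
    """Hypothetical helper: yield whole records, or one task per sentence."""
    for eg in stream:
        if unsegmented:
            # Segmentation disabled: the annotator sees the full text.
            yield eg
            continue
        # Crude stand-in for model-predicted sentence boundaries:
        # split after . ! ? followed by whitespace, or on newlines.
        for sent in re.split(r"(?<=[.!?])\s+|\n+", eg["text"]):
            if sent.strip():
                # Copy the rest of the record, replacing only the text.
                yield {**eg, "text": sent.strip()}


record = {"text": "Cocinero/Chef\n\nSalario según experiencia y formación."}
segmented = list(make_stream([record]))
full = list(make_stream([record], unsegmented=True))

for task in segmented:
    print(task["text"])
```

In this sketch the segmented stream yields two separate tasks from a single record, while the unsegmented stream passes the record through unchanged.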

Is this the actual text you're using? If so, that's definitely unexpected sentence segmentation behaviour.

Sergio.Marrero · March 10, 2021, 9:49am

Hi.

Let me show you the full sequence of commands:

prodigy ner.manual ner_positions es_core_news_lg some_data.jsonl --label POSITIONS --patterns some_patterns.jsonl

prodigy train ner ner_positions es_core_news_lg --output ./models/tmp --eval-split 0.2

prodigy ner.correct ner_positions_correct ./models/tmp some_data.jsonl --label POSITIONS --exclude ner_positions

The unexpected behaviour appears at the last step. Here's an actual example:

This is the full document stored as jsonl:

{"text": "Cocinero/Chef\n\nEmpresa de servicios precisa de cocinero con experiencia en colectividades para incorporación inmediata en nuestro centro ubicado en Vigo. Perfil ideal es el de una persona con formación profesional grado superior en hostelería y turismo, Grado Superior de Dirección de Cocina, Grado en ciencias grastronómicas o similar, con capacidad de liderazgo, bien organizada, capaz de manejar las situaciones estresantes, siendo meticuloso en en sus tareas y manteniendo el control de las manos.\nImprescindible dominar la gestión y dirección de una cocina.\nSalario según experiencia y formación", 
 "meta": {"xx": "xx",
              "xx": "xx", 
              "xx": "xx",
              "xx": "xx"}
}

(I put xx to hide some information)
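As an aside, a JSONL file holds exactly one JSON object per line. The record above is pretty-printed for readability, and its repeated "xx" keys are only redaction placeholders. Here is a minimal sketch that uses only Python's standard `json` module to write and read such a file. The file name `some_data_sketch.jsonl` and the meta keys are hypothetical. `json.dumps` escapes the newline characters inside `text`, so each record still fits on one line.

```python
import json
import os
import tempfile

# Hypothetical sample record; the meta keys stand in for the redacted "xx" fields.
record = {
    "text": "Cocinero/Chef\n\nSalario según experiencia y formación",
    "meta": {"source": "example", "id": 1},
}

path = os.path.join(tempfile.mkdtemp(), "some_data_sketch.jsonl")

# Write one JSON object per line; the "\n" inside "text" is escaped as \\n.
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read it back line by line, the way a JSONL loader would.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
loaded = [json.loads(line) for line in lines]
```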

And this is what Prodigy shows when ner.correct is called:

As you can see, instead of showing the full text, it only shows what looks like a random sentence...?

Thank you in advance.

Sergio M.

ines · March 10, 2021, 10:03am

Yes, that definitely looks like sentence segmentation. By default, ner.correct splits the text into sentences. Because you're excluding the dataset ner_positions, Prodigy may also be skipping the first sentence: an example with that input hash is already in that dataset.
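The exclusion step can be pictured roughly as follows. Each incoming task gets a hash of its input, and any task whose hash already appears in an excluded dataset is dropped. This is a hypothetical illustration only. Prodigy computes its own `_input_hash` values internally. The `input_hash` function here, a SHA-1 of the `text` field via `hashlib`, is an assumption chosen to keep the example self-contained, and `filter_excluded` is likewise a hypothetical helper.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Set


def input_hash(task: Dict) -> str:
    """Hypothetical input hash: SHA-1 of the task text (not Prodigy's scheme)."""
    return hashlib.sha1(task["text"].encode("utf-8")).hexdigest()


def filter_excluded(stream: Iterable[Dict], excluded: Set[str]) -> Iterator[Dict]:
    """Skip tasks whose input hash is already in an excluded dataset."""
    for task in stream:
        if input_hash(task) not in excluded:
            yield task


# Pretend an earlier session already saved an annotation with this text.
already_annotated = [{"text": "Cocinero/Chef"}]
excluded_hashes = {input_hash(eg) for eg in already_annotated}

incoming = [
    {"text": "Cocinero/Chef"},
    {"text": "Imprescindible dominar la gestión y dirección de una cocina."},
]
remaining = list(filter_excluded(incoming, excluded_hashes))
```

In this sketch the hash depends only on `text`, so a sentence-level task is dropped whenever an identical text was saved before, regardless of which recipe produced it.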

If you set --unsegmented when you call ner.correct, segmentation will be disabled and you'll see the full example.

Sergio.Marrero · March 10, 2021, 11:50am

You're right! Fixed!

Thank you very much!

Sergio M.
