`ner.correct` doesn't show the full text

Hi.

I am working with the ner.correct recipe and it behaves unexpectedly. Instead of showing the complete text, it only shows part of it, sometimes just one or two words.

Let me show with an example what I mean:

Expected:
This is just an example about what I expected to happen.

What actually happened:

First annotation:
This is just

Second annotation:
an example about what I expected

Third annotation:
to happen

Is this normal, or is it unexpected behaviour?

Thanks in advance.

Sergio M.

Hi! By default, ner.correct will use the spaCy model to segment the text into sentences. You can disable this by setting the --unsegmented flag. Just make sure that the text you feed in is reasonably segmented.
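To illustrate what segmentation does to a single input, here is a minimal Python sketch. It uses a naive regex splitter as a stand-in; Prodigy relies on the spaCy model's sentence boundaries, so the real splits will differ. The function name `split_into_tasks` is hypothetical and not part of Prodigy.

```python
import re

def split_into_tasks(example):
    """Toy stand-in for sentence segmentation: one input dict becomes several.

    Assumption: split after ., ! or ? followed by whitespace, and on newlines.
    Prodigy uses the spaCy model's sentence boundaries instead.
    """
    parts = re.split(r"(?<=[.!?])\s+|\n+", example["text"])
    return [{"text": p.strip()} for p in parts if p.strip()]

doc = {"text": "This is one sentence. This is another one.\nAnd a third."}
tasks = split_into_tasks(doc)
for task in tasks:
    # Each segment would be shown as its own annotation task.
    print(task["text"])
```

With --unsegmented, the equivalent would be a single task containing the whole text.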

Is this the actual text you're using? If so, that's definitely unexpected sentence segmentation behaviour.

Hi.

Let me show you the full sequence of commands:

prodigy ner.manual ner_positions es_core_news_lg some_data.jsonl --label POSITIONS --patterns some_patterns.jsonl

prodigy train ner ner_positions es_core_news_lg --output ./models/tmp --eval-split 0.2

prodigy ner.correct ner_positions_correct ./models/tmp some_data.jsonl --label POSITIONS --exclude ner_positions

The unexpected behaviour arises in the last step. Here is an actual example:

This is the full document stored as jsonl:

{"text": "Cocinero/Chef\n\nEmpresa de servicios precisa de cocinero con experiencia en colectividades para incorporación inmediata en nuestro centro ubicado en Vigo. Perfil ideal es el de una persona con formación profesional grado superior en hostelería y turismo, Grado Superior de Dirección de Cocina, Grado en ciencias grastronómicas o similar, con capacidad de liderazgo, bien organizada, capaz de manejar las situaciones estresantes, siendo meticuloso en en sus tareas y manteniendo el control de las manos.\nImprescindible dominar la gestión y dirección de una cocina.\nSalario según experiencia y formación", 
 "meta": {"xx": "xx",
              "xx": "xx", 
              "xx": "xx",
              "xx": "xx"}
}

(I replaced some values with xx to hide information.)

And this is what Prodigy shows when ner.correct is called:

image

As you can see, instead of showing the full text, it only takes a seemingly random sentence.

Thank you in advance.

Sergio M.

Yes, that definitely looks like it's the sentence segmentation. By default, ner.correct will split the text into sentences. Since you're excluding the dataset ner_positions, Prodigy may be skipping the first sentence, since an example with that input hash is already in the dataset.
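Roughly, the filtering described above can be pictured like the sketch below. This is a simplified illustration in Python, not Prodigy's actual code: the `input_hash` and `filter_seen` helpers and the use of MD5 are assumptions made for the example, and the sentences are made up.

```python
import hashlib

def input_hash(text):
    # Hypothetical helper: hash only the input text (Prodigy's real hashing differs).
    return hashlib.md5(text.encode("utf8")).hexdigest()

def filter_seen(tasks, seen_hashes):
    """Yield only the tasks whose input hash is not in the excluded set."""
    for task in tasks:
        if input_hash(task["text"]) not in seen_hashes:
            yield task

# Pretend the excluded dataset already contains the first sentence.
seen = {input_hash("First sentence.")}
segments = [{"text": "First sentence."}, {"text": "Second sentence."}]
remaining = list(filter_seen(segments, seen))
print([t["text"] for t in remaining])
```

In this toy version only the second segment is left to annotate, which matches the symptom of a document appearing to start in the middle.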

If you set --unsegmented when you call ner.correct, segmentation will be disabled and you'll see the full example.


You are right! Fixed! :)

Thank you very much!

Sergio M.
