Evaluation dataset + patterns

nikolaysm · January 12, 2024, 12:07pm

Hi there,

I'm using ner.manual with patterns to recognize the company names in the text.

prodigy ner.manual ner_company_names nl_core_news_lg ./assets/raw_text.jsonl --label ORG,PERSON --patterns ./assets/company_name_patterns.jsonl

I have a few questions:

Patterns: Can I edit patterns during the ner.manual annotation while the server is running? If I make changes to the file with patterns, should I restart the server and refresh the browser? What effect will it have on the dataset? (Currently, I'm just restarting the server and refreshing the browser.)
Evaluation dataset: The evaluation file contains a few thousand samples. Do I need to perform any annotation, run training, or should it remain as raw text?

prodigy ner.manual ner_company_names_eval nl_core_news_lg ./assets/raw_text_eval.jsonl --label ORG,PERSON

A raw text line looks like this:

{"text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam faucibus eros aliquam, laoreet magna et, tincidunt arcu.", "meta": {"source": "lipsun", "id": 4450141}}

Thanks in advance.

magdaaniol · January 16, 2024, 3:54pm

Hi @nikolaysm, and welcome to the forum!

Re. Patterns

It's true that you can't really update the patterns in the built-in NER recipe while the server is running. Restarting it with the updated patterns file and the same target dataset will apply the updated patterns only to the examples that have not been annotated and saved yet.
In other words, the new patterns won't be re-applied to the examples already saved in the dataset.
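
To make the restart behaviour concrete, here is a minimal sketch of the idea, not Prodigy's actual implementation. It assumes each example is identified by a hash of its raw text, so anything already saved is skipped and only the remaining examples reach the (updated) pattern matcher. The names `input_hash` and `remaining_examples` and the sample sentences are hypothetical.

```python
import hashlib
import json


def input_hash(text: str) -> str:
    """Hash the raw text only, so editing the patterns does not change the key."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def remaining_examples(raw_lines, saved_hashes):
    """Yield only the examples whose input has not been saved yet."""
    for line in raw_lines:
        example = json.loads(line)
        if input_hash(example["text"]) not in saved_hashes:
            yield example


# Two raw JSONL lines; the first one was annotated before the restart.
raw_lines = [
    '{"text": "Acme B.V. hired Jan Jansen."}',
    '{"text": "Globex opened an office in Utrecht."}',
]
saved_hashes = {input_hash("Acme B.V. hired Jan Jansen.")}

# Only the unsaved example is left to be pre-highlighted by the new patterns.
todo = list(remaining_examples(raw_lines, saved_hashes))
print([example["text"] for example in todo])
```

Under this model, an example saved with the old pattern suggestions stays exactly as it was annotated, which is why the restart does not change what is already in the dataset.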

One way around it is to write a custom NER workflow leveraging the new stream_reset feature and feed the new patterns interactively via a custom event.
You can see this in action in our ANN plugin, which uses custom events to modify the query over the indexed dataset. The source code of this solution is available here. It's, of course, not exactly the same problem, but maybe it can serve as inspiration.

Re. Evaluation dataset
The evaluation dataset should contain the right answers, so yes, it should be annotated.
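
As a rough illustration of why raw text is not enough: evaluation compares the model's predicted spans against the annotated ones, so without annotations there is nothing to compare against. The sketch below assumes exact-match scoring over `(start, end, label)` character spans; the function name `span_scores` and the sample spans are hypothetical, and the scores reported during training may be computed differently in detail.

```python
def span_scores(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Annotated (gold) spans for "Acme B.V. hired Jan Jansen."
gold = [(0, 9, "ORG"), (16, 26, "PERSON")]
# Hypothetical model output: the person name is mislabelled as ORG.
predicted = [(0, 9, "ORG"), (16, 26, "ORG")]

precision, recall, f1 = span_scores(gold, predicted)
print(precision, recall, f1)
```

A span only counts as correct when both its boundaries and its label match the annotation, so the mislabelled name above counts against precision and recall alike.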
