NER document Labeling

mystuff · July 18, 2019, 11:29pm

I finally fixed my issue by adding all whitespace characters, like \r and \t, not just " ". Below is the output when I ran ner.batch-train. I used the default batch size, and there are no duplicates in the data.

Correct 420
Incorrect 419
Baseline 0.000
Accuracy 0.501

How do I improve the accuracy? Is it by adding more data (the dataset currently has 150 examples)?

ines · July 19, 2019, 8:39am

Yes, adding data should definitely be the first step. 150 examples is very low, so you won’t be seeing very reliable results.

mystuff · July 19, 2019, 11:00am

I thought so, but I just wanted to confirm. Thanks for the reply, it means a lot. You gave me confidence that I am heading in the right direction.

mystuff · July 31, 2019, 9:03am

Hello @ines, I have increased the dataset from 150 to 300 examples using ner.manual.
I annotated 150 new examples and merged them with the previous 150.
python -m prodigy ner.batch-train dataset_300 en_core_web_sm --output model_300 --label ........
The accuracy only increased by about 8 percentage points (0.501 to 0.582). May I know where I am going wrong? Is there a way to debug the accuracy?

dataset_150:
Correct 420
Incorrect 419
Baseline 0.000
Accuracy 0.501

dataset_300:
Correct 831
Incorrect 597
Baseline 0.000
Accuracy 0.582
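
For anyone comparing the two runs, here is a minimal sketch of how the reported numbers relate. It assumes the accuracy is simply correct / (correct + incorrect), which is consistent with both outputs above but is not confirmed in this thread. The `accuracy` helper and the `runs` dictionary are hypothetical names used only for illustration.

```python
def accuracy(correct: int, incorrect: int) -> float:
    """Fraction of evaluated predictions that were correct.

    Assumption: this matches how the figures above were computed.
    """
    total = correct + incorrect
    if total == 0:
        raise ValueError("no evaluated predictions")
    return correct / total


# Counts copied from the two ner.batch-train outputs above.
runs = {
    "dataset_150": (420, 419),
    "dataset_300": (831, 597),
}

for name, (correct, incorrect) in runs.items():
    print(f"{name}: {accuracy(correct, incorrect):.3f}")

# Difference between the two runs, in percentage points.
gain = accuracy(*runs["dataset_300"]) - accuracy(*runs["dataset_150"])
print(f"gain: {gain * 100:.1f} percentage points")
```

Note that the two runs were evaluated on different numbers of predictions (839 versus 1428), so the comparison is only a rough one.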

ines · August 1, 2019, 8:55am

300 examples is still a very low number of examples. To really be able to trust your results, you typically want a lot more - maybe like 1000 or 2000.

If you haven't seen it yet, check out my NER flowchart for some more tips.

mystuff · August 1, 2019, 10:21am

Thanks for the detailed flowchart. It says 1000 sentences, not 1000 documents. Am I right? I have more than 4000 sentences in my dataset.
