I finally fixed my issue by handling all whitespace characters such as \r and \t, not just " ". Below is the output when I ran ner.batch-train. I used the default batch size, and there are no duplicates in the data.
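For anyone hitting the same whitespace problem, here is a minimal sketch of that kind of fix, not the exact code used. It assumes each example is a dict with a `text` string and a `spans` list carrying `start`/`end` character offsets; that layout and the helper names `trim_span` and `clean_example` are hypothetical and only for illustration.

```python
def trim_span(text, start, end):
    """Shrink the half-open range [start, end) so it neither starts nor ends on whitespace."""
    # str.isspace() is true for " ", "\t", "\r", "\n" and other Unicode whitespace,
    # so this covers more than a plain space check.
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


def clean_example(example):
    """Return a copy of the example with trimmed spans; whitespace-only spans are dropped."""
    text = example["text"]
    cleaned = []
    for span in example.get("spans", []):
        start, end = trim_span(text, span["start"], span["end"])
        if start < end:  # keep only spans that still cover some text
            cleaned.append({**span, "start": start, "end": end})
    return {**example, "spans": cleaned}
```

Running `clean_example` over every record before training would catch spans that begin or end on a tab or carriage return, which a check against `" "` alone misses.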
Hello @ines, I have increased the dataset from 150 to 300 using ner.manual.
I annotated a new set of 150 and merged it with the previous 150.
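A quick way to double-check a merge like this for duplicates is sketched below. It assumes both batches have been exported to lists of dicts with a `text` key; `find_duplicate_texts` is a hypothetical helper, and comparing on whitespace-normalized text is my own choice rather than anything Prodigy does internally.

```python
from collections import Counter


def find_duplicate_texts(examples):
    """Map each text that occurs more than once to its number of occurrences."""
    # Collapse runs of whitespace so "a  b" and "a b" count as the same text.
    counts = Counter(" ".join(eg["text"].split()) for eg in examples)
    return {text: n for text, n in counts.items() if n > 1}
```

Calling it on the concatenation of the old and new batches should return an empty dict if the merge introduced no repeated texts.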
python -m prodigy ner.batch-train dataset_300 en_core_web_sm --output model_300 --label ........
The accuracy only increased by 0.8%. May I know what I am doing wrong? Is there a way to debug the accuracy?
Thanks for the detailed flowchart. It says 1000 sentences, not 1000 documents, am I right? I have more than 4000 sentences in my dataset.