ner.batch-train output: right, wrong, and accuracy all returned as zero

I annotated data (50 rows, 2 labels) using ner.manual and exported it. I use the following command to train the model:

prodigy ner.batch-train sampledata en_core_web_sm --output custommodel --label CHEMONE,CHEMTWO --eval-split 0.2 --n-iter 6 --batch-size 4
  1. Once the training is done, the accuracy returned is zero, and even the number of incorrect entities is displayed as zero. I am not sure what exactly is happening here. Also, the output reports 7 samples used for evaluation (20%) and 598 used for training (100%).

  2. What do the numbers denote here? My understanding is that there are two labels and that I annotated 50 examples.

  3. What does ‘unknown’ mean?

  4. How can the number of entities be 115 here? Are the entities in the data automatically detected and reported?

  5. The accuracy after the 6 iterations is 0.


The ner.batch-train command supports a few different use-cases, which has introduced some usability problems. When we redo this for Prodigy v2, I think we’ll likely split this up into different functions, to avoid some of these common problems.

The specific problem here is that your command is trying to add your new entities into an existing model. It’s reading your annotated data as though you were saying, “There may be other entities here (as predicted by the existing model, e.g. PERSON, ORG, etc). But learn to predict these new entities as well.” I doubt that’s what you’re trying to learn here: instead you just want a model that predicts your two new entities, right?

If you try the following command, I think you should be able to get better accuracy:

prodigy ner.batch-train sampledata en_vectors_web_lg --output custommodel --label CHEMONE,CHEMTWO --eval-split 0.2 --n-iter 6 --batch-size 4 --no-missing

The two important changes are:

  • We don’t use a model with existing NER weights
  • We use the --no-missing flag, to tell the learning algorithm that if the model predicts an entity not in the annotations, it isn’t correct (i.e. the annotations don’t have any missing entities).
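To make the effect of the second change concrete, here is a toy sketch of the idea in plain Python. It is not Prodigy's scoring code: the function names, the "unjudged" bucket, and the example spans are all made up for illustration. It only shows how the same prediction can go from "can't be judged" to "wrong" once the annotations are treated as complete.

```python
from typing import List, Set, Tuple

# A span is (start_char, end_char, label). Purely illustrative structure.
Span = Tuple[int, int, str]


def overlaps_any(span: Span, others: Set[Span]) -> bool:
    """True if the span shares at least one character with any other span."""
    start, end, _ = span
    return any(start < o_end and o_start < end for o_start, o_end, _ in others)


def judge_predictions(
    gold: Set[Span], predicted: Set[Span], no_missing: bool
) -> Tuple[List[Span], List[Span], List[Span]]:
    """Sort predicted spans into right / wrong / unjudged buckets.

    no_missing=False: annotations may be incomplete, so a prediction that
    does not touch any annotated span cannot be judged either way.
    no_missing=True: annotations are complete, so any prediction that is
    not in them counts as wrong.
    """
    right, wrong, unjudged = [], [], []
    for span in sorted(predicted):
        if span in gold:
            right.append(span)
        elif no_missing or overlaps_any(span, gold):
            wrong.append(span)
        else:
            unjudged.append(span)
    return right, wrong, unjudged


# Hypothetical example: one annotated CHEMONE span, and a model that also
# predicts an ORG span the annotator never marked.
gold = {(0, 7, "CHEMONE")}
predicted = {(0, 7, "CHEMONE"), (20, 26, "ORG")}

print(judge_predictions(gold, predicted, no_missing=False))
print(judge_predictions(gold, predicted, no_missing=True))
```

With a pretrained model such as en_core_web_sm, predictions like the ORG span above are exactly the kind that come from the existing NER weights rather than from the new annotations.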

@honnibal @ines

I tried the suggested command with the --no-missing flag and still got the same output (all values at zero). The difference this time was that even ENTS is zero.

I just want to check whether my understanding of the process is correct. I have been trying different things for a long time, so either there is a gap in my understanding of what data I should pass to training, or I am missing something else.

  • I have sentences in a CSV file, which I load into Prodigy, annotate using 2 custom labels (50 rows of data), and save.
  • Now I need to train the model using spaCy or Prodigy. Here, I start with ner.batch-train. The data passed to the ner.batch-train command is the dataset (annotated data as JSON) saved in Prodigy.
  • This process should give me some accuracy, but it is currently giving zero.
  • If this is successful, I would then test the model and check the P/R/F scores.

Can you confirm whether what I am doing is right?

I didn’t notice that you only had 50 rows of data. One possibility is that the model simply hasn’t managed to generalise from your data yet. On the other hand, I see from the screenshot you gave that the dataset has 598 training examples. But then, there are only 7 evaluation samples? It might be that the model is learning things, but just doesn’t happen to get any of the 7 evaluation examples right.

@honnibal That was one of my initial questions too. When I have only 50 rows of data, how is it 598 training examples? Is it considering various tokens from the 50 sentences?

  • For the evaluation set, I split off part of the same training data with --eval-split 0.2. Is that not how it works?

I guess it’s possible that the sentence splitting that’s performed had a role in expanding out your 50 input rows into more examples. But it’s still probably worth checking. You could take a quick look at the data with the ner.print-dataset command, just to see what’s in it.
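As a rough illustration of how sentence splitting could inflate the count, here is a small plain-Python sketch. The regex splitter is a deliberately naive stand-in and not what spaCy or Prodigy actually use, and the two rows are invented sample text. It only shows that a handful of multi-sentence rows can turn into many more sentence-level examples.

```python
import re


def naive_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace.

    A crude stand-in for a real sentence segmenter, for illustration only.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# Hypothetical input rows, standing in for rows of the CSV file.
rows = [
    "Compound A was dissolved in ethanol. The mixture was heated. Crystals formed overnight.",
    "Compound B is insoluble in water.",
]

# Each sentence becomes its own example, so the count can exceed the row count.
examples = [sentence for row in rows for sentence in naive_sentences(row)]
print(len(rows), "rows ->", len(examples), "sentence-level examples")
```

If the original rows are long, multi-sentence passages, an expansion from 50 rows to several hundred examples is at least plausible, though that still needs to be confirmed against the actual data.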


I printed it and it gave me the whole sentences (just as they were when I loaded the data into Prodigy), with the custom tags highlighted in different colors. It does not show individual tokens.

Yes it’s expected that it will show you the sentence inputs, with the spans highlighted. But do the annotations look correct? And how many rows is it printing: 50, or 700 or so?

@honnibal The annotations look correct. I could not print the length, but it looks like it is 50 rows.

To answer your question above: Yes, the general approach is right. What’s unclear to me is why it’s saying there are so many training examples, when there should be only fifty. You might want to add a couple of print statements to the batch-train recipe (in prodigy/recipes/ within your Prodigy installation) to figure this out?
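A way to check the row count without editing the recipe is to export the dataset and count the records directly. The sketch below assumes the dataset was exported to a JSONL file, for example with prodigy db-out sampledata > sampledata.jsonl, and that each record follows the usual Prodigy layout with optional "answer" and "spans" keys. The function name and the file name are placeholders.

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Count records, answers, and span labels in an exported JSONL file.

    Assumes one JSON object per line, with optional "answer" and "spans"
    keys, where each span is a dict that may carry a "label".
    """
    n_records = 0
    answers = Counter()
    labels = Counter()
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            n_records += 1
            answers[record.get("answer", "(none)")] += 1
            # "spans" may be missing or null on records without entities
            for span in record.get("spans") or []:
                labels[span.get("label", "(none)")] += 1
    return n_records, answers, labels
```

Calling summarize_jsonl("sampledata.jsonl") returns the number of records plus per-answer and per-label counts. If the record count is 50 while training reports 598 examples, the expansion is happening inside the recipe rather than in the stored data.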

Apart from that puzzle, I think the 0 accuracy results are easily explained by the fact that you’re working with so little data, both during training and during evaluation.