ner.correct: Only 31 annotations saved to database, no matter how many are actually annotated each time

Python: 3.7
Prodigy: 1.10.6

Hi,
I followed the guide described in "sense2vec reloaded: contextually-keyed word vectors" on the Explosion blog and the video https://youtu.be/59BKHO_xBPA for NER on patent data from link.

After training a first model,
python -m prodigy train ner annotatedh01m en_vectors_web_lg --init-tok2vec ./tok2vec_cd8_model289.bin --output ./tmp_model --eval-split 0.2

Moving on to the step where we label more examples by correcting the model's predictions, I worked through 200 examples (please see the screenshot below) using ner.correct, but the output said only 31 were saved:

python -m prodigy ner.correct annotatedg06f_correct ./tmp_model g06fsents.3000.txt --loader txt --label TECH --exclude annotatedg06f

Using 1 label(s): TECH
Added dataset annotatedg06f_correct to database SQLite.
⚠ The model you're using isn't setting sentence boundaries (e.g. via the parser or sentencizer). This means that incoming examples won't be split into sentences.
✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating! ^C

✔ Saved 31 annotations to database SQLite
Dataset: annotatedg06f_correct
Session ID: 2021-02-17_16-56-43

Using the older ner.make-gold recipe had the same output:

python -m prodigy ner.make-gold annotatedh01m_correct ./tmp_model h01msents.4802.txt --loader txt --label BATT --exclude annotatedh01m

✔ Saved 31 annotations to database SQLite
Dataset: annotatedh01m_correct
Session ID: 2021-03-07_11-42-12

I have already tried setting up my venv again once, with the same results, and do not understand what the problem might be. Any help is appreciated.
Thank You.

Hi! That's strange, I haven't seen this one before. If you run prodigy stats or access the database in Python, how many examples does it report for the datasets? Are you using a custom SQLite database? And did you always hit "save" at the end of the annotation session?
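For reference, here's a minimal sketch of how to check the counts from Python, using the dataset name from your commands above and the default connection settings from your prodigy.json:

from prodigy.components.db import connect

db = connect()  # connects using the settings from your prodigy.json
examples = db.get_dataset("annotatedg06f_correct")  # all examples in the dataset
print(len(examples))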

Hi,
Thank you for your response. Yes, I did save every time. My first guess was that the same examples don't get written twice, which would explain a lower number, but across multiple sessions I went through many different examples and corrected them, and each time the output said 31 annotations were saved, so it can't be a coincidence. I re-did 100 annotations just now with the ner.correct recipe:

Checking the db directly, I have 83 annotations so far:

Next I annotate some more (100 in one session):

Then after ending the session I see 31 annotations saved:

And prodigy db-out now counts a total of 114 annotations:
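One way to get this count, assuming db-out's default JSONL output to stdout with one record per line:

python -m prodigy db-out annotatedg06f_correct | wc -l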

So it seems that each time only 31 or fewer annotations are added via the ner.correct recipe.

Please let me know if I'm making some mistake.

Thank you.

Yeah, the 31 definitely makes it very strange 🤔

Under the hood, Prodigy uses peewee to manage the database connection. The "saved annotations" message is only shown if the database reports that the data was successfully added, so it's unlikely that the database connection is broken. Prodigy also saves batches of data in the background as you annotate, and if that fails, you'll see an error. This mechanism is the same for all workflows, so I don't think the problem is recipe-specific.

One thing you could try to help get to the bottom of this: if you run Prodigy with the environment variable PRODIGY_LOGGING=basic, you'll see log statements of everything that's going on, including examples saved to the database. If you click through the examples, you should see log statements for the /give_answers endpoint and the controller receiving answers as Prodigy auto-saves in the background. What do those logs say?
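For example, reusing your ner.correct command from above:

PRODIGY_LOGGING=basic python -m prodigy ner.correct annotatedg06f_correct ./tmp_model g06fsents.3000.txt --loader txt --label TECH --exclude annotatedg06f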

Also, can you reproduce this with a new dataset? (You can just click through some examples quickly.) If it turns out that only 31 annotations are saved, are these examples from the start or the end of the dataset? Do you see any pattern here?
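For instance, something like this, where debug_test is just a throwaway dataset name:

python -m prodigy ner.correct debug_test ./tmp_model g06fsents.3000.txt --loader txt --label TECH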