After training a first model,
python -m prodigy train ner annotatedh01m en_vectors_web_lg --init-tok2vec ./tok2vec_cd8_model289.bin --output ./tmp_model --eval-split 0.2
Moving on to the step where we label more examples by correcting the model's predictions, I worked through 200 examples (please see the screenshot below) using ner.correct, but the output said that only 31 were saved:
Using 1 label(s): TECH
Added dataset annotatedg06f_correct to database SQLite.
The model you're using isn't setting sentence boundaries (e.g. via the parser or sentencizer). This means that incoming examples won't be split into sentences.
Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating! ^C
Hi! That's strange, I haven't seen this one before. If you run prodigy stats or access the database in Python, how many examples does it show you are in the datasets? Are you using any custom SQLite database? And did you always hit "save" at the end of the annotation session?
Hi,
Thank you for your response. Yes, I did save every time. I guessed that maybe duplicate examples don't get written, which would explain the lower number, but across multiple sessions I went through many different examples and corrected them, and still saw that only 31 annotations were saved, so it can't be a coincidence. I just re-did 100 annotations with the ner.correct recipe:
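If it helps to picture the deduplication idea: Prodigy assigns hashes to incoming tasks and can skip ones already answered, so re-annotating the same sentences in later sessions wouldn't add new rows. Here's a rough sketch of that filtering logic; the helper names and the hashing scheme are illustrative stand-ins, not Prodigy's actual internals:

```python
import hashlib
import json

def task_hash(task):
    # Hash the parts that make a task unique: the text plus any spans
    # shown to the annotator. (Illustrative stand-in for Prodigy's
    # _task_hash, which is computed differently under the hood.)
    key = json.dumps({"text": task["text"], "spans": task.get("spans", [])},
                     sort_keys=True)
    return hashlib.sha1(key.encode("utf8")).hexdigest()

def filter_seen(stream, seen_hashes):
    # Skip tasks whose hash is already known -- if this kind of filtering
    # applies, annotating 200 examples can yield far fewer saved rows
    # when many of them were already answered in an earlier session.
    for task in stream:
        h = task_hash(task)
        if h not in seen_hashes:
            seen_hashes.add(h)
            yield task

stream = [{"text": "Prodigy uses spaCy."},
          {"text": "Prodigy uses spaCy."},
          {"text": "Annotations are stored in SQLite."}]
unique = list(filter_seen(stream, set()))
print(len(unique))  # 2 of the 3 tasks survive; the duplicate is dropped
```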
Checking the db directly, I have 83 annotations so far:
Under the hood, Prodigy uses peewee to manage the database connection. The "saved annotations" message is only shown if the database reports that the data was successfully added, so it's unlikely that the database connection is broken. Prodigy will also save batches of data in the background as you annotate, and if that fails, you'll see an error. This mechanism is the same for all workflows, so I don't think the problem is recipe-specific.
One thing you could try to help get to the bottom of this: if you run Prodigy with the environment variable PRODIGY_LOGGING=basic, you'll see log statements of everything that's going on, including examples saved to the database. If you click through the examples, you should see log statements for the /give_answers endpoint and the controller receiving answers as Prodigy auto-saves in the background. What do those logs say?
Also, can you reproduce this with a new dataset? (You can just click through some examples quickly.) If it turns out that only 31 annotations are saved, are these examples from the start or the end of the dataset? Do you see any pattern here?
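To check for a pattern, one option is to export the dataset with prodigy db-out (which writes JSONL) and compare the saved texts against the order of your source stream. The comparison itself is plain Python; the file and variable names below are just illustrative:

```python
import json

def saved_positions(source_texts, exported_jsonl_lines):
    # For each saved annotation, find its index in the original source
    # stream, so you can see whether the saved rows cluster at the
    # start or the end of the dataset.
    saved_texts = [json.loads(line)["text"] for line in exported_jsonl_lines]
    return [source_texts.index(t) for t in saved_texts if t in source_texts]

source = ["sent one", "sent two", "sent three", "sent four"]
exported = ['{"text": "sent one"}', '{"text": "sent four"}']
print(saved_positions(source, exported))  # [0, 3]
```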