Difference number examples dataset and batch-train

ines · August 28, 2019, 2:58pm

Hi! I wrote about this in some more detail here:

I'm not sure what's in your data or how many unique examples you have – you could check that by looking at how many unique input hashes there are:

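from prodigy.components.db import connect

# Connect to the database using the settings in your prodigy.json
db = connect()
# Collect the input hashes of the examples in the dataset
input_hashes = db.get_input_hashes(["energy_patterns"])
# Number of unique inputs, regardless of how often each was annotated
print(len(set(input_hashes)))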
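
If the two numbers differ, it can help to see which inputs occur more than once. Here's a minimal sketch of that idea in plain Python. It doesn't touch the database: the `examples` list and the `summarize_input_hashes` helper are made up for illustration, and the only thing taken from Prodigy is the `_input_hash` key that annotated examples carry.

```python
from collections import Counter

# Hypothetical annotated examples; in a real dataset these would come from
# the Prodigy database, where each example carries an "_input_hash" key.
examples = [
    {"text": "Solar output rose", "_input_hash": 101, "answer": "accept"},
    {"text": "Solar output rose", "_input_hash": 101, "answer": "reject"},
    {"text": "Wind farms expanded", "_input_hash": 202, "answer": "accept"},
    {"text": "Coal use declined", "_input_hash": 303, "answer": "accept"},
]


def summarize_input_hashes(examples):
    """Return (total annotations, unique inputs, hashes seen more than once)."""
    counts = Counter(eg["_input_hash"] for eg in examples)
    repeated = {h: n for h, n in counts.items() if n > 1}
    return len(examples), len(counts), repeated


total, unique, repeated = summarize_input_hashes(examples)
print(f"{total} annotations, {unique} unique inputs, repeated: {repeated}")
```

In this toy data there are four annotations but only three unique inputs, because the same text was annotated twice.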

You could also set PRODIGY_LOGGING=basic to see if anything else is being skipped.
