Set total_examples_target from the number of docs in the dataset instead of hardcoding it

Is there a way to pull the number of documents per dataset instead of hardcoding the total_examples_target field in prodigy.json? I have multiple recipes using the same prodigy.json, and each dataset contains a different number of documents. Thanks!

hi @cheyanneb!

Would this work to get the size of your dataset?

    from prodigy.components.db import connect

    db = connect()  # uses the database settings from prodigy.json
    examples = db.get_dataset("my_dataset")  # all examples saved in the dataset
    n_dataset = len(examples)

You could then use that value to override the setting in your prodigy.json, or add it to the dict returned by a custom recipe, like:

    return {
        "dataset": dataset,          # the dataset to save annotations to
        "stream": stream,            # the stream of incoming examples
        "config": {
            "total_examples_target": n_dataset,  # add dataset count
            ...
        },
    }
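If you'd rather go the override route than write a custom recipe, here's a minimal sketch. It assumes Prodigy picks up a JSON string from the `PRODIGY_CONFIG_OVERRIDES` environment variable and merges it over prodigy.json, so please check that against the docs for your version. `build_overrides` is a hypothetical helper name, not part of Prodigy, and the count is a placeholder so the snippet runs without a database.

```python
import json
import os


def build_overrides(n_examples: int) -> str:
    """Return a JSON string that overrides total_examples_target.

    Hypothetical helper for illustration, not part of Prodigy.
    """
    return json.dumps({"total_examples_target": n_examples})


# In practice this would be len(db.get_dataset("my_dataset"));
# a fixed placeholder is used here so the snippet runs on its own.
n_dataset = 250

# Assumption: Prodigy reads this variable at startup and applies it
# on top of the values in prodigy.json.
os.environ["PRODIGY_CONFIG_OVERRIDES"] = build_overrides(n_dataset)
```

Setting the variable this way only affects the current Python process and anything it launches, so it would need to happen before Prodigy is started from that same script. That way each recipe run can carry its own target while the shared prodigy.json stays untouched.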

Let me know if this helps or if you have further questions!