No tasks available

hi @jeanphilippegoldman!

Thanks for your post and welcome to the Prodigy community :wave:

I think the problem is that you loaded your data into chall_transcripts2 with db-in and then tried to save your annotations to that same dataset. The db-in step wasn't necessary.

Said differently, first drop only the Prodigy dataset that holds your annotations, then rerun the same command:

```
prodigy drop chall_transcripts2
prodigy spans.manual chall_transcripts2 blank:en ../../../data/prodigy_jsonl/transcripts2.jsonl --label 'M','U','R','UNK',....
```

What's happening is that you're hitting Prodigy's deduplication: you're trying to annotate examples from your source file (transcripts2.jsonl) that are already in your chall_transcripts2 dataset.
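To make the dedupe behaviour concrete, here is a minimal, self-contained sketch. It does not call Prodigy: `toy_hash` and `filter_new_tasks` are hypothetical stand-ins, and the assumption is simply that an incoming example is skipped when a hash of its content is already present in the target dataset.

```python
import hashlib
import json


def toy_hash(task):
    """Hypothetical stand-in for a task hash: a digest of the text and spans."""
    payload = json.dumps(
        {"text": task.get("text"), "spans": task.get("spans", [])}, sort_keys=True
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def filter_new_tasks(source_tasks, dataset_tasks):
    """Keep only source tasks whose hash is not already in the dataset."""
    seen = {toy_hash(task) for task in dataset_tasks}
    return [task for task in source_tasks if toy_hash(task) not in seen]


source = [{"text": "first transcript"}, {"text": "second transcript"}]

# Case 1: the same examples were already imported into the dataset.
already_imported = [dict(task, answer="accept") for task in source]
print(len(filter_new_tasks(source, already_imported)))  # 0 tasks left to annotate

# Case 2: the dataset was dropped first, so nothing is filtered out.
print(len(filter_new_tasks(source, [])))  # 2
```

In the first case every incoming example matches something already stored, which is the situation that leaves nothing to annotate.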

You would likely see this in your logs

Also, why did you include --use-annotations in your prodigy spans.manual command? That's not a default argument, so it won't do anything. If you pass a .jsonl with spans, it'll automatically use those annotations.

I can understand that this is a bit confusing. I posted on this previously.

You can interpret this as:

So while the message reads Found and keeping existing "answer" in 0 examples, it's saying that it kept the original "answer" values for 0 examples, because it filled in the answers for you.

The recipe keeps a count of the examples whose "answer" it set:

    added_answers = 0
    for task in data:
        task = set_hashes(task, overwrite=rehash)
        if "answer" not in task or overwrite:
            task["answer"] = answer
            added_answers += 1
        examples.append(task)

Then printing out at the end:

    n_total = len(examples)
    msg.good(
        f"Imported {n_total} annotated examples and saved them to '{set_id}' "
        f"(session {session_id}) in database {DB.db_name}",
        f'Found and keeping existing "answer" in {n_total - added_answers} examples',
    )

So that number is n_total - added_answers, i.e. the total number of records minus those with an added answer. Since your data didn't have any answers, an answer was added to every record, so added_answers equals n_total and the count shown is 0.
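The arithmetic can be checked with a small standalone sketch. It mirrors only the counting from the recipe excerpt and leaves out the hashing and the database; `count_answers` is a hypothetical helper name, not part of Prodigy.

```python
def count_answers(data, answer="accept", overwrite=False):
    """Return (n_total, added_answers, kept) using the same counting rule."""
    examples = []
    added_answers = 0
    for task in data:
        task = dict(task)  # copy so the input is left untouched
        if "answer" not in task or overwrite:
            task["answer"] = answer
            added_answers += 1
        examples.append(task)
    n_total = len(examples)
    return n_total, added_answers, n_total - added_answers


# No record has an "answer" yet, so every record gets one added.
unanswered = [{"text": "a"}, {"text": "b"}, {"text": "c"}]
print(count_answers(unanswered))  # (3, 3, 0)

# One record already carries an "answer", so one is kept.
mixed = [{"text": "a", "answer": "reject"}, {"text": "b"}]
print(count_answers(mixed))  # (2, 1, 1)
```

The last value in each tuple is the number that ends up in the "Found and keeping" message.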

What's important is that you should see in the output one line above that the same number of annotations (1838) were still loaded into the database and automatically populated as "accept".

FYI, if you want to view the recipe, run prodigy stats, look for Location to get the path to your installation, then open recipes/commands.py and find the db-in recipe. You can do the same for the other built-in recipes.

But again, you really don't need this db-in if you're trying to correct these labels as you can simply load them from your source file directly.

One other tip: we typically wouldn't recommend having more than 5-7 labels at a time, because the cognitive load of switching between them can be challenging. However, if you need more than 7 labels, I recommend creating a labels.txt with one label name per line:

# labels.txt
M
U
R
UNK
ADJ
...

Then pass that labels.txt instead of the raw label names in your Prodigy command, like --label labels.txt. That'll reduce the risk of a fat-finger typo in your labels.
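If you want to sanity-check the file before handing it to Prodigy, a small script along these lines may help. It is only a sketch: `read_labels` is a hypothetical helper, not a Prodigy function, and it assumes one label per line with no comment lines in the file.

```python
import tempfile
from pathlib import Path


def read_labels(path):
    """Read one label per line, skipping blank lines and rejecting duplicates."""
    labels = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        label = line.strip()
        if not label:
            continue
        if label in labels:
            raise ValueError(f"duplicate label: {label!r}")
        labels.append(label)
    return labels


# Write a throwaway labels file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    labels_path = Path(tmp) / "labels.txt"
    labels_path.write_text("M\nU\nR\nUNK\n", encoding="utf-8")
    labels = read_labels(labels_path)

print(labels)  # ['M', 'U', 'R', 'UNK']
```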