Implications of "answer": "accept" being automatically added to all imported examples with db-in

Edit: I've now determined that the issues I initially reported are not down to the data, so I've removed those references. I'll ask about those particular issues on the spaCy discussion board.

Hi,
I've been trying to migrate from an earlier version of Prodigy running spaCy 2.3 to the latest Prodigy (1.11.11) working with spaCy 3.5.

In preparation for further annotation and to train a new model, I used db-out on the old version of Prodigy to export the datasets as a .jsonl file. I then used db-in to import this .jsonl file into the latest Prodigy installed within a fresh environment. I did not use the "answer": "accept" value.

However, I'm wondering if this is the right thing to do when moving data over from one version of Prodigy to another, as the recipe at Built-in Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP features an "Important Note" stating:

Because all examples in Prodigy need an "answer" value, "answer": "accept" is automatically added to all imported examples, unless specified otherwise in the data or via the --answer argument.

By using db-in without "answer": "accept", have I now added "answer": "accept" to unannotated entries somehow?

Thanks in advance,
Gruff

hi @gruff!

Thanks for your questions! Yes, please reach out to the spaCy discussion board for training-specific questions.

I still think your general questions about the "answer" keys and how to handle them are important (likely other users have had similar questions about what's their function).

The "answer" corresponds to the annotation choice users can decide for each example: "accept", "reject", "ignore".

This has a slightly different interpretation per task. For example, in binary text classification, "accept" and "reject" can mean positive and negative examples of a class. But for many of the recipes, "answer": "accept" means you want to use this example in training (or evaluation) data.
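To see how the answers are distributed in an exported file, something like the following might help. It's only a minimal sketch using the standard library: the helper name `count_answers` and the sample lines are made up for illustration, and it assumes one JSON object per line, as in Prodigy's JSONL exports.

```python
import json
from collections import Counter


def count_answers(lines):
    """Tally the "answer" value of each JSONL line.

    Lines without an "answer" key are counted under "<missing>".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        example = json.loads(line)
        counts[example.get("answer", "<missing>")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; with a real file you could pass open("my_file.jsonl")
    sample = [
        '{"text": "first", "answer": "accept"}',
        '{"text": "second", "answer": "reject"}',
        '{"text": "third"}',
    ]
    print(dict(count_answers(sample)))
```

Anything that shows up under "<missing>" has no recorded decision in the file itself.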

Thanks for your reply Ryan, appreciate it.

I understand that the "answer" corresponds to the user's annotation choice. My question concerns the fact that the documentation for db-out suggests that "answer": "accept" is applied to data exported with db-out even where users are yet to annotate the data entry. Is that the case?

If so, that seems undesirable for my use case of transferring a dataset from one version of Prodigy to another, as I do not want the data to be changed.

What is best practice for moving data from one instance of Prodigy to another?

Yes. That's because db-out was designed to export out only accepted annotations, typically if a user wanted to train in a different system. It wasn't designed to pull out unannotated data.

Depending on your version of Prodigy, you may not need to move the data at all unless you are changing your backend database (e.g., you had annotations originally stored in SQLite but now want to move to PostgreSQL). I haven't really tried this with older versions, so I would love to know if there is such a case.

If you wanted the unannotated data out, could you not use Prodigy's database components?

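from prodigy.components.db import connect
import srsly

# Connect to the database configured in your Prodigy settings
db = connect()
# Load all examples stored in the dataset as a list of dicts
examples = db.get_dataset("my_dataset")
# Write the examples to a JSONL file, one JSON object per line
srsly.write_jsonl("my_file.jsonl", examples)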
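Before re-importing a file like that with db-in, it may be worth separating the examples that already carry an "answer" from those that don't, since according to the note quoted above the latter would get "answer": "accept" by default. A rough sketch, standard library only; `split_by_answer` and `write_jsonl` are hypothetical helpers written for this post, not part of Prodigy.

```python
import json


def split_by_answer(examples):
    """Split a list of example dicts into (answered, unanswered)."""
    answered, unanswered = [], []
    for eg in examples:
        # An example counts as answered only if it has an "answer" key
        (answered if "answer" in eg else unanswered).append(eg)
    return answered, unanswered


def write_jsonl(path, examples):
    """Write example dicts to a file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for eg in examples:
            f.write(json.dumps(eg, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Hypothetical examples standing in for an exported dataset
    examples = [
        {"text": "first", "answer": "accept"},
        {"text": "second"},
        {"text": "third", "answer": "reject"},
    ]
    answered, unanswered = split_by_answer(examples)
    print(len(answered), "answered;", len(unanswered), "without an answer")
```

The two lists could then be written to separate files with `write_jsonl` and handled separately, so that nothing without a decision is imported as accepted by accident.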

That's because db-out was designed to export out only accepted annotations, typically if a user wanted to train in a different system. It wasn't designed to pull out unannotated data.

Ah, I see! To be fair to the documentation, it does say:

Export annotations in Prodigy’s JSONL format.

I think I took the name db-out to mean that this was a general method for exporting 'annotations-in-progress' (although such a concept doesn't really exist). Your use of 'accepted annotations' definitely helps clarify the situation.

Thank you for the suggestion to use Prodigy's database components. I'll have to check but I believe the old database was in PostgreSQL, whilst the new database is SQLite. I'll give it a go and report back.

Thanks again for your help!