Restarting text classification to add additional labels

My apologies if there is already an article about this.

I have a case where multiple annotators have already annotated a couple of thousand examples.
I am using my own Tensorflow model, so it is a straightforward case where annotations are logged in the DB for later consumption.

Is it possible to restart the annotation task, presenting the examples already in the DB but with the option of tagging with the additional labels?

Apologies if the answer is straightforward. If so, could you point me in the right direction?

Hi! You should be able to use the existing annotations as the input data when you restart the server – just make sure you're saving the annotations to a new dataset. Otherwise Prodigy may skip the examples that are already annotated (which typically makes sense because you don't want to be asked the same question twice).

If you're running Prodigy v1.10+, you can use the dataset:your_dataset_name as the source argument (instead of the file). In earlier versions, you can run db-out to export the annotations as JSONL, and then use that as the source. If you've already annotated labels, those will be pre-selected in the UI.

Here's an example:

prodigy textcat.manual dataset1 your_data.jsonl --label A,B,C
# Re-annotate everything with an additional label
prodigy textcat.manual dataset2 dataset:dataset1 --label A,B,C,D

Great! Thank you for the quick response.

Dirk

I am pushing it, but what would happen in the following case:

prodigy textcat.manual dataset1 your_data.jsonl --label A,B,C

# Re-annotate everything with a DIFFERENT label

prodigy textcat.manual dataset2 dataset:dataset1 --label A,B,D,E

This should work just fine. You may just end up with leftover C labels in the underlying JSON if you're annotating with multiple choice options and you're not making a change in the UI.

When you select an option, Prodigy will re-compute the accepted choices, based on what's selected in the UI. But if you're not changing anything in the re-annotation round, it's possible you end up with C in the list of selected options, unchanged. But that should be easy to filter out programmatically if you know that C isn't a valid label anymore.
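
As a rough illustration of that filtering step, here's a minimal sketch in plain Python. It assumes you've exported dataset2 to JSONL with db-out and that the selected options are stored as a list under an "accept" key on each example, so check your own export before relying on it. The names VALID_LABELS, drop_stale_labels and clean_jsonl are made up for this example and aren't part of Prodigy.

```python
import json

# Labels used in the second annotation round (assumption: C was dropped).
VALID_LABELS = {"A", "B", "D", "E"}


def drop_stale_labels(example, valid_labels=VALID_LABELS):
    """Return a copy of the example keeping only still-valid selected labels.

    Assumes the selected options live in a list under the "accept" key.
    """
    cleaned = dict(example)
    cleaned["accept"] = [
        label for label in example.get("accept", []) if label in valid_labels
    ]
    return cleaned


def clean_jsonl(lines, valid_labels=VALID_LABELS):
    """Parse JSONL lines (one JSON object per line) and filter each example."""
    return [
        drop_stale_labels(json.loads(line), valid_labels)
        for line in lines
        if line.strip()  # skip blank lines
    ]


# Two made-up records standing in for lines of an exported JSONL file.
sample_lines = [
    '{"text": "first example", "accept": ["A", "C"], "answer": "accept"}',
    '{"text": "second example", "accept": ["D"], "answer": "accept"}',
]

cleaned = clean_jsonl(sample_lines)
for example in cleaned:
    print(json.dumps(example))
```

To run this on a real export, you could pass an open file handle instead of sample_lines, since iterating over a file yields one line at a time.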