Accessing (un)finished annotations after local host internet outages

Hi team,

I tunneled my local port so annotators could tag dataset_A. Unfortunately, my local machine's internet connection went down, so I had to restart the Jupyter Notebook session.

After restarting, how can I access the annotations that were already finished for dataset_A? And how can the annotators continue with the unfinished part of dataset_A?

Will "!python -m prodigy textcat.manual dataset_A input.jsonl --label X Y Z" allows annotators to continue tagging dataset_A from where they left off?

Thanks a lot!
Taylor

Hi Taylor,

In theory, your annotators should be able to continue once you restart the server, because Prodigy comes with a hashing mechanism that keeps track of which examples have already been annotated. Prodigy checks the database for existing hashes and uses them to filter the incoming stream of examples. Effectively, that means examples with the same hash are simply skipped.
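If you want to double-check what has already been saved, you can also inspect the dataset from Python. Here's a minimal sketch using Prodigy's database helper; depending on your Prodigy version, the method for fetching examples may be called get_dataset or get_dataset_examples, so treat the exact name as an assumption and check the docs for your install:

from prodigy.components.db import connect

# Connect using the database settings from your prodigy.json (SQLite by default)
db = connect()

# Fetch everything saved for dataset_A so far
# (on newer Prodigy versions this may be db.get_dataset_examples("dataset_A"))
examples = db.get_dataset("dataset_A")
print(f"{len(examples)} annotations saved so far")

# Each saved example carries the hashes Prodigy uses to skip
# already-annotated tasks when the stream is restarted
for eg in examples[:5]:
    print(eg["_input_hash"], eg["_task_hash"], eg.get("answer"))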

It's explained in more detail here:

If you have follow-up questions related to this, feel free to ask!

Thanks for the response. I'd appreciate further guidance on how my annotators can reconnect to dataset_A.

Should I use the following command and share the link with them so they can continue from where they left off? Or should I use a different command? I'm worried that the command I'm using will overwrite the existing annotations and create a whole new dataset_A.

python -m prodigy textcat.manual dataset_A input.jsonl --label X Y Z

Thanks a lot!
Taylor

First, one comment about your line of code.

python -m prodigy textcat.manual dataset_A input.jsonl --label X Y Z

I think it should be this:

python -m prodigy textcat.manual dataset_A input.jsonl --label "X,Y,Z"

The --label parameter needs a comma-separated string to denote the labels.

Should I use the following command and share the link with them so they can continue from where they left off?

That should just work, yes. One note: are you annotating with or without overlap at the moment? You might get duplicate annotations if feed_overlap is set to true, but that can also be a good thing: it lets you check whether your annotators agree on the same examples. More details on this can be found here:
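If you do end up with overlapping annotations, a quick way to eyeball agreement is to group the saved examples by their input hash and compare what each session selected. This is only a rough sketch: it assumes your annotators connect via named sessions (so each example carries a _session_id) and that, as with textcat.manual's choice interface, the selected labels end up in the accept field:

from collections import defaultdict
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("dataset_A")  # or db.get_dataset_examples("dataset_A")

# Group the saved annotations by the input they refer to
by_input = defaultdict(list)
for eg in examples:
    by_input[eg["_input_hash"]].append(eg)

# Show inputs annotated more than once and what each session chose
for input_hash, egs in by_input.items():
    if len(egs) > 1:
        choices = {eg.get("_session_id", "default"): eg.get("accept", []) for eg in egs}
        print(input_hash, choices)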

I'm worried that the command I'm using will overwrite the existing annotations and create a whole new dataset_A.

Do you have evidence of this?
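As far as I know, re-using an existing dataset name just appends new annotations to it; nothing gets deleted unless you explicitly run prodigy drop dataset_A. If you want extra peace of mind, you could export a backup before restarting, either with prodigy db-out dataset_A or with a small script like this sketch (same caveat as above about the exact method name):

import json
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("dataset_A")  # or db.get_dataset_examples("dataset_A")

# Write a JSONL backup you can diff against later, roughly what
# "prodigy db-out dataset_A" would give you
with open("dataset_A_backup.jsonl", "w", encoding="utf8") as f:
    for eg in examples:
        f.write(json.dumps(eg) + "\n")

print(f"Backed up {len(examples)} annotations from dataset_A")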