Resume Annotation Session with Prodigy - Text Classification

BaringsDS_OGK · June 14, 2018, 3:57pm

Hello,

I am currently using my company’s license for Prodigy to annotate some data (for text classification) that I would like to analyze later with spaCy. As part of the annotation process, is there a way to resume annotation where you left off in your previous session?

(Ex. If I do 200 annotations on Day 1, I would like to resume at 201, and not have to worry about re-annotating data from the first set of 200. I noticed when using the software yesterday, I had to repeat annotating certain data points during my second annotation session.)

Please advise.

ines · June 14, 2018, 4:20pm

Hi! By default, Prodigy tries to make as few assumptions about your existing dataset as possible, but you can tell it to explicitly ignore annotations present in one or more datasets using the --exclude option. For example:

prodigy textcat.teach your_dataset en_core_web_sm data.jsonl --label XXX --exclude your_dataset

This will exclude all examples that were already annotated in the dataset your_dataset (i.e. the current one you’re also saving your annotations to). The exclude mechanism is also useful when you’re creating evaluation sets, to make sure that no training examples accidentally end up in your evaluation data (or vice versa).
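To give a rough idea of what excluding does conceptually, here is a minimal plain-Python sketch. It is not Prodigy's actual implementation: the helpers text_hash and exclude_seen are hypothetical names, and hashing only the "text" field is an assumption made for illustration.

```python
import hashlib
import json


def text_hash(example):
    """Hypothetical stand-in for an input hash: a digest of the example text."""
    return hashlib.sha1(example["text"].encode("utf-8")).hexdigest()


def exclude_seen(stream, annotated):
    """Yield only the examples whose text was not annotated before."""
    seen = {text_hash(eg) for eg in annotated}
    for eg in stream:
        if text_hash(eg) not in seen:
            yield eg


# Toy data (made up): two examples annotated on day 1, three in the source.
annotated = [
    {"text": "Great product", "label": "XXX", "answer": "accept"},
    {"text": "Terrible support", "label": "XXX", "answer": "reject"},
]
source = [
    {"text": "Great product"},
    {"text": "Terrible support"},
    {"text": "Shipping was slow"},
]

remaining = list(exclude_seen(source, annotated))
print(json.dumps(remaining))
```

The same check works for the evaluation set case: pass your training examples as annotated and your evaluation candidates as the stream, and only the non-overlapping examples come through.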

If you’re using an active learning-powered recipe like textcat.teach, you’re also training a model in the loop. So if you want to restart with the same model state, you can also pre-train the base model with the existing annotations and then use this model as the starting point. For example:

prodigy textcat.batch-train your_dataset en_core_web_sm --output /path/to/model
prodigy textcat.teach your_dataset /path/to/model --label XXX --exclude your_dataset

If you’re using a custom recipe, you can specify the dataset name(s) to exclude as the 'exclude' setting returned by your recipe. This can be a list of one or more strings:

import prodigy

@prodigy.recipe('custom-recipe')
def custom_recipe(dataset):
    # your recipe here
    return {
        'dataset': dataset,    # dataset to save annotations to
        'exclude': [dataset],  # always exclude the current set
        # other recipe config here
    }
