Image classification (choice) - Duplicated images

ines · May 16, 2019, 11:01am

In the "controller", that is, after your recipe function has been executed and has returned its components, and before Prodigy starts up the annotation server.

Yes, absolutely. The entire task dictionary will be saved in the database, and you can get all existing annotations for a given dataset in the database. Let's say your examples look like this:

{"text": "Hello world", "meta": {"id": 123}}
{"text": "Blah blah", "meta": {"id": 456}}

When you annotate them, they'll be saved to the dataset. In your recipe, you can then call db.get_dataset to load them and get the meta.id field from each example. You now have a collection of values that you can compare the incoming examples against.

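from prodigy.components.db import connect

db = connect()
# `dataset` is the name of the dataset passed to your recipe;
# fall back to an empty list if nothing is returned
examples = db.get_dataset(dataset) or []
# Get the meta.id field for each example (a set keeps lookups fast)
meta_ids = {eg["meta"]["id"] for eg in examples}

def filter_stream(stream):
    # Only send out examples whose ID hasn't been annotated yet
    for eg in stream:
        if eg["meta"]["id"] not in meta_ids:
            yield eg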
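If you want to try the filtering logic without a running Prodigy instance, here's a minimal sketch that uses a plain list as a stand-in for what db.get_dataset returns. The helper name `filter_seen` and the sample records are made up for illustration, and letting examples without a meta.id pass through is just one assumption about how you might want to handle them.

```python
def filter_seen(stream, seen_ids):
    """Yield only the examples whose meta.id is not in seen_ids."""
    for eg in stream:
        eg_id = eg.get("meta", {}).get("id")
        # Examples without an ID can't be matched, so they're passed through
        if eg_id is None or eg_id not in seen_ids:
            yield eg

# Stand-in for the result of db.get_dataset(dataset) (hypothetical data)
annotated = [{"text": "Hello world", "meta": {"id": 123}, "answer": "accept"}]
seen_ids = {eg["meta"]["id"] for eg in annotated if "id" in eg.get("meta", {})}

incoming = [
    {"text": "Hello world", "meta": {"id": 123}},
    {"text": "Blah blah", "meta": {"id": 456}},
]
remaining = list(filter_seen(incoming, seen_ids))
```

Since the function is a generator, it doesn't need to load the whole incoming stream into memory and only checks each example as it's requested.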

If you can express it in Python, you can add pretty much any conditional logic here. It's probably not very useful, but you could even send an example out if its text is longer than X characters, or if it was annotated before but rejected and its ID is Y and some other custom meta property is Z. Or you could send a certain example out only if today is Monday or Tuesday.
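To make that a bit more concrete, here's a rough sketch of what such a condition could look like in plain Python. The names `MIN_CHARS`, `should_send` and `rejected_ids` are hypothetical and not part of Prodigy, and the way the conditions are combined is just one possible reading of the examples above.

```python
from datetime import date

MIN_CHARS = 20  # hypothetical threshold for "longer than X characters"

def should_send(eg, rejected_ids, today=None):
    """Decide whether an example should be sent out for annotation."""
    today = today or date.today()
    # weekday() returns 0 for Monday and 1 for Tuesday
    if today.weekday() not in (0, 1):
        return False
    is_long = len(eg["text"]) > MIN_CHARS
    was_rejected = eg["meta"]["id"] in rejected_ids
    return is_long or was_rejected

def filter_stream(stream, rejected_ids, today=None):
    # Only yield the examples that pass the custom condition
    for eg in stream:
        if should_send(eg, rejected_ids, today=today):
            yield eg
```

The optional `today` argument is only there so the date-dependent branch can be tried out with a fixed date instead of the current one.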
