Users and data in prodi.gy

Hello,

I found you quite late, and I didn't have time to evaluate the beta version of Prodigy. I am now looking for a comparison tool with very specific requirements, and I'd like to know whether Prodigy can be configured to meet them. Please also let me know if this is the appropriate place to ask, or whether there is a sales contact.

Description:

  1. The task is about comparing pairs of data (location details) and deciding whether two locations refer to the same point of interest. For each pair, I want to show the “annotators” (some of) the details of the two locations (e.g. the address, which might be written differently in the two instances but still be the same). The annotator should decide whether the locations match, don't match, or whether they can't tell. From a UI perspective, this is pretty much like the A/B evaluation.

  2. I want to have two annotators (ann1, ann2) and one supervisor (super). Both annotators will tag the same set of pairs, and the supervisor will resolve any disagreements and be able to review the work done by the annotators.

  3. At the end of the day, I should be able to export the IDs of the locations and the decisions of the annotators.

Questions:

  1. Is there user access control in Prodigy? E.g. ann1, ann2 and super should have different accounts, so that their decisions over the data are kept separate.

  2. How is the data provided to Prodigy? Via some kind of SQL?

  3. How can I control the data (pairs) served to each user?

Thank you,
Gerasimos

Hi,

Thanks for the questions. And yes, this is the best place to ask from our perspective — it’s very useful to us if we can have the information in one place, so that other people can read the answers as well.

It should be easy to set up the task the way you want in Prodigy. You can write custom Python functions (which we term “recipes”) to organise the annotation task. The recipe function just has to return a dictionary of components. The components define the sequence of examples, which front-end view to use, and optionally a callback for when the annotated examples are received.
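To make that concrete, here's a minimal sketch of the shape a recipe takes (the recipe name, texts and label below are just placeholders):

from prodigy import recipe

@recipe('example.minimal',
  dataset=("Name of dataset to store the annotations",)
)
def minimal_recipe(dataset):
    # The stream is just an iterable of task dictionaries.
    stream = ({'text': text, 'label': 'RELEVANT'}
              for text in ['first example', 'second example'])

    def on_answers(answers):
        # Optional callback, called with each batch of answers as it
        # comes back from the web app.
        print('Received {} answers'.format(len(answers)))

    return {
        'dataset': dataset,           # where the annotations are saved
        'stream': stream,             # the sequence of examples
        'view_id': 'classification',  # which front-end view to use
        'update': on_answers          # optional callback for answers
    }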

For your use-case, you’ll probably want two recipes: one for the annotator, and one for the supervisor. The annotator recipe might look something like this:


from prodigy import recipe

@recipe('locs.annot',
  dataset=("Name of dataset to store the annotations",),
  annotator_id=("ID of annotator who will work on the task",),
  db_string=("Connection string for source data",)
)
def annotate_locations(dataset, annotator_id, db_string):
    data = get_my_data(db_string)  # Load your data however you like
    stream = make_tasks(data)
    stream = add_annotator_id(stream, annotator_id)
    return {
        'dataset': dataset,
        'stream': stream,
        'view_id': 'classification'  # Could use one of the other modes, but to keep the example simple...
    }

def make_tasks(data):
    '''Make annotation tasks for Prodigy. The specifics will depend on how your
    data is structured, and which view ID you use.'''
    for loc1, loc2 in data:
        task = {'text': "{}\n\n{}".format(loc1, loc2), 'label': 'Same place?'}
        # Any extra fields you add here (e.g. the location IDs) are stored
        # with the annotation, so you can export them later.
        yield task

def add_annotator_id(examples, annotator_id):
    '''Attach the annotator's ID to each task, so the annotations can be
    told apart once they're in the database.'''
    for eg in examples:
        eg['annotator_id'] = annotator_id
        yield eg

Once you have your recipe, you can run it from the command line, with something like prodigy locs.annot location-pairs annotator1 /path/to/my/db/settings -F my_recipe.py. This would start the annotation server, saving the annotations in the database under the dataset name you provided. You would then put up a second instance of the server on a different port (e.g. via the "port" setting in prodigy.json) for the second annotator. You could have both annotators writing to the same dataset, or give them their own datasets, whichever makes the adjudication easier.

Once the annotations are collected, you would run another server for the adjudication. The built-in A/B evaluate recipe should be perfect for this, but you can write your own recipe if you need something slightly different.
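If you do give the annotators their own datasets, your adjudication recipe could just read both datasets back out of the database and queue up only the pairs the two annotators answered differently. Here's a rough sketch (the recipe name, dataset names and the disagreement logic are just illustrative; connect() and get_dataset() are the helpers Prodigy provides for reading saved annotations):

from prodigy import recipe
from prodigy.components.db import connect

@recipe('locs.super',
  dataset=("Name of dataset to store the supervisor's decisions",),
  ann1_set=("Dataset holding the first annotator's work",),
  ann2_set=("Dataset holding the second annotator's work",)
)
def review_locations(dataset, ann1_set, ann2_set):
    db = connect()  # Uses the database settings from your prodigy.json
    ann1 = {eg['text']: eg for eg in db.get_dataset(ann1_set)}
    ann2 = {eg['text']: eg for eg in db.get_dataset(ann2_set)}

    def disagreements():
        # Only queue up the pairs where the annotators' answers differ.
        for text, eg1 in ann1.items():
            eg2 = ann2.get(text)
            if eg2 is not None and eg1['answer'] != eg2['answer']:
                yield {'text': text, 'label': eg1['label']}

    return {
        'dataset': dataset,
        'stream': disagreements(),
        'view_id': 'classification'
    }

Once the supervisor has worked through those, prodigy db-out with the supervisor's dataset name will dump everything as JSONL, including any extra fields (like the location IDs) you attached to the tasks in make_tasks.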

Thank you Matthew :slight_smile: