What’s the difference between an example and a task in Prodigy?

As far as I can make out from the docs, an annotation task is more or less the same as an example. Is the distinction simply that an example is the raw data and a task is the pairing of the data with the annotation by way of the interface?

I also see the word ‘question’ used in similar places to example and task (in the REST API, for example). Is that something similar?

It'll be easier to discuss if you have references to the docs; that way we can also confirm whether the docs are accurate. I think I can also clarify things by diving deeper into a single annotation.

An Example

So I just ran this command to get a single annotation out of a dataset.

python -m prodigy db-out go-emotions | head -1

This is what that annotation looks like. It's an annotation for a text classification recipe.

{
  "text":"A data scientist extracts information out of usually very big data sets, applying statistics and the likes.",
  "excitement":1,
  "_input_hash":123760682,
  "_task_hash":-355201304,
  "label":"excitement",
  "_view_id":"classification",
  "answer":"reject",
  "_timestamp":1651844384
}

The annotation here is "reject" because the provided text does not imply "excitement".

You'll notice that there are two hashes. Prodigy adds these to determine the "uniqueness" of an annotation, which helps the tool figure out whether something has been annotated before. There's one for the "input" and another for the "task". The annotation example is unique with regard to both. If the same input (which in this case is defined by the hash of the "text" key) is supplied with another task (determined by the hash of the "label" key), then it'll result in a new unique annotation.
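To make the two hashes a bit more tangible, here is a minimal sketch in plain Python. It is not Prodigy's actual hashing code: `stable_hash` and `add_hashes` are hypothetical helpers, the hash values will not match the ones Prodigy produces, and the "curiosity" label is made up for illustration. I'm also assuming that the task hash builds on the input hash plus the "label" key.

```python
import hashlib
import json


def stable_hash(payload: dict) -> str:
    """Hash a dict deterministically (hypothetical helper, not Prodigy's algorithm)."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def add_hashes(example: dict, input_keys=("text",), task_keys=("label",)) -> dict:
    """Return a copy of `example` with illustrative input and task hashes."""
    input_part = {k: example[k] for k in input_keys if k in example}
    task_part = {k: example[k] for k in task_keys if k in example}
    input_hash = stable_hash(input_part)
    # Assumption: the task hash covers the input hash plus the task-specific keys.
    task_hash = stable_hash({"input": input_hash, **task_part})
    return {**example, "input_hash": input_hash, "task_hash": task_hash}


text = (
    "A data scientist extracts information out of usually very big data sets, "
    "applying statistics and the likes."
)
excitement = add_hashes({"text": text, "label": "excitement"})
curiosity = add_hashes({"text": text, "label": "curiosity"})  # made-up second label

# Same text, so the input hashes match.
print(excitement["input_hash"] == curiosity["input_hash"])  # True
# Different label, so the task hashes differ.
print(excitement["task_hash"] == curiosity["task_hash"])  # False
```

Both records share an input, but each counts as its own task, which is why each would show up as a separate annotation.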

I hope this sheds some light on the difference between a "task" and an "example". In my mind, the task is part of an example, but an example will also need to have input attached.

If your impression is that my explanation here is inconsistent with the docs, please let me know! I may have glossed over a detail, or we might need to rephrase an item in the docs.

Ok so this explains the difference between a task and an example, but then where do the words question and annotation fit in?

(Context: I'm working on something that involves an abstraction for different annotation tools so was trying to compare how the different components are named among the various tools. During this process, I found what seemed like different ways of talking about similar concepts in Prodigy docs.)

The clearest way to show this is maybe the Glossary you include, which e.g. states "Annotation tasks are also often referred to as “(annotation) examples”."

There's frequent mention of 'annotation question' in that same page and elsewhere in the docs.

I guess the precise relationship between these four terms just isn't clear to me from the docs, and some of the terms seem to be used loosely or interchangeably.

I fear this is one of those moments in natural language where words are indeed used interchangeably. Personally, I prefer to think in terms of the hashes. Partly because it's more tangible that way, but also because it makes sense that the same sentence can be annotated multiple times if there are multiple tasks for it. This could be multiple labels for text classification, but perhaps also NER and spancat annotations on the same sentence.
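If it helps, here is a rough sketch of how you could check this on exported annotations: it groups records by "_input_hash" and collects the distinct "_task_hash" values for each input. The `group_tasks_by_input` helper is hypothetical, and the sample records are made up (only the field names come from the db-out output shown earlier).

```python
import json
from collections import defaultdict


def group_tasks_by_input(jsonl_lines):
    """Map each _input_hash to the set of _task_hash values seen for it."""
    tasks_per_input = defaultdict(set)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        tasks_per_input[record["_input_hash"]].add(record["_task_hash"])
    return dict(tasks_per_input)


# Made-up records: two labels on one input, one label on another.
sample = [
    '{"text": "first sentence", "label": "excitement", "_input_hash": 1, "_task_hash": 10, "answer": "reject"}',
    '{"text": "first sentence", "label": "curiosity", "_input_hash": 1, "_task_hash": 11, "answer": "accept"}',
    '{"text": "second sentence", "label": "excitement", "_input_hash": 2, "_task_hash": 20, "answer": "accept"}',
]

grouped = group_tasks_by_input(sample)
for input_hash, task_hashes in grouped.items():
    print(input_hash, len(task_hashes))
```

An input with more than one task hash is a sentence that was annotated under several tasks. The same function should work on the lines that db-out prints, for instance by passing it `sys.stdin`.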

Since you mention "an abstraction": if there is anything specific you can share about it, I might be able to help you out in more detail.

Got it.

For the abstraction, I'm leading an effort at ZenML to add annotation into the MLOps lifecycle / workflow for pipelines that we support. I wrote about some of the high-level motivations here and here.

We're starting off with open-source tools, since that's our usual approach when adding new components / abstractions. Label Studio and Rubrix are what we'll start with. But we build for things to be fully extensible, so I will pretty soon add in a Prodigy integration since it's what I use for my own projects.
