What do the accept, reject and ignore buttons do?

I was looking through the docs and I couldn't find an actual explanation of what the different buttons do.

My sense of this right now:

  • 'accept' -- add an entry into the database, with annotations, along with the 'accept' attribute
  • 'ignore' -- don't add anything into the database
  • 'reject' -- add an entry into the database, with annotations, along with the 'reject' attribute

In other words, if I'm annotating a computer vision dataset with image.manual, when would I ever want to use 'reject'? I'm thinking maybe the different buttons are meant for when you want to check pre-annotated data?

If I'm annotating everything for the first time, I'm basically just going to be using accept and ignore, right?

Thank you!

Hi @strickvl ,

You're right that when drawing bounding-boxes, ACCEPT and IGNORE should suffice. Say in an image.manual task, we want to draw bounding-boxes for all "cats":

  • Accept: "these are all the cats in this image, I have drawn bounding boxes for each of them, please save them in the database"
  • IGNORE: "this image is corrupted and not in good quality. I want to remove this from both my training and test set." It can also just mean: "I don't know the answer and I just want to move on."

In the context of computer vision, you can use REJECT in the following cases:

  • Simple binary image classification: "is this an image of a cat?" ACCEPT means "yes", REJECT means "no", and IGNORE is for corrupted / weird images.
  • Reviewing bounding-box annotations: you can either correct them or just REJECT the given annotation.
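To make this concrete, here's a rough sketch of how three binary "is this a cat?" decisions might look once exported (the field names and values below are made up, not copied from a real export): the records share the same structure, and only the "answer" field records which button was pressed:

# hypothetical exported records from a binary "is this a cat?" task
{"image": "cat_001.jpg", "label": "CAT", "answer": "accept"}
{"image": "dog_042.jpg", "label": "CAT", "answer": "reject"}
{"image": "corrupted_093.jpg", "label": "CAT", "answer": "ignore"}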

Thank you for this explanation. It's clear and useful. I wonder if some version of it might be useful in the docs?

Hello @ljvmiranda921 ,

What would be the meaning of "REJECT" for ner.manual task?

Thank you.

Hi @dave-espinosa ,

In this case, REJECT means "the text itself may be corrupted / weird and let's not use it to train our model."
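For example (a made-up record, just to illustrate the idea), a text that still contains scraping artifacts could simply be rejected so it never ends up in your training data:

# hypothetical ner.manual record rejected because the text is leftover markup
{"text": "<div class=\"promo\">Subscribe now!!</div>", "answer": "reject"}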

Hello @ljvmiranda921, and thank you very much for your quick response!

I was following the Prodigy documentation (more specifically the suggestion "When you view or export your data later, e.g. with db-out, you can then explicitly filter out those examples and deal with them." provided here), because I was looking for a robust way to separate "ACCEPTED", "REJECTED" and "IGNORED" texts for later analysis. I exported a small annotation job (ner.manual) consisting of 5 texts, in which I purposely "IGNORED" the first, "REJECTED" the second and "ACCEPTED" the last 3. BTW, I am using the jsonlines Python library to manually open and handle those files. I tried the following script to see what I was getting:

import jsonlines

# 'demo_df.jsonl' is the exported annotations file I am analyzing here:
with jsonlines.open('demo_df.jsonl') as reader:
    for obj in reader:
        print(obj.keys())

For which I obtained the following output:

dict_keys(['text', '_input_hash', '_task_hash', '_is_binary', 'tokens', '_view_id', 'answer', '_timestamp'])
dict_keys(['text', '_input_hash', '_task_hash', '_is_binary', 'tokens', '_view_id', 'answer', '_timestamp'])
dict_keys(['text', '_input_hash', '_task_hash', '_is_binary', 'tokens', '_view_id', 'spans', 'answer', '_timestamp'])
dict_keys(['text', '_input_hash', '_task_hash', '_is_binary', 'tokens', '_view_id', 'spans', 'answer', '_timestamp'])
dict_keys(['text', '_input_hash', '_task_hash', '_is_binary', 'tokens', '_view_id', 'spans', 'answer', '_timestamp'])

From that result, I have the following questions:

  1. I was hoping to see "a larger difference" between the "IGNORED" and the "REJECTED" texts. Is what I am getting here "OK / normal"?
    1.1. If the output is indeed correct, how do I differentiate an "IGNORED" text from a "REJECTED" one?
  2. If I use "Prodigy defaults", will I always have the same keys as the ones shown in my current experiment (i.e., 'text', '_input_hash', '_task_hash', '_is_binary', 'tokens', '_view_id', 'spans', 'answer', '_timestamp')?
    2.1. In which cases would those keys vary?

Thank you very much for your support!

Hi @dave-espinosa !

  1. Curious as to what "a larger difference" means. If you meant the difference in the context of ner.manual: IGNORE may mean "I'll skip this for now because I'm not sure what the answer / entities are, I just want to move on", and REJECT may mean "let's not use this sample in our dataset."
  2. These keys should be the same if you're using ner.manual (or if you're just running Prodigy by default). They may vary if you customized something in your recipe or did something different when saving into the database.

Hello @ljvmiranda921 , hope you are doing fine! Sorry about the delayed reply :grin:

Regarding your question:

Curious as to what "a larger difference" means.

First things first, the concepts are clear, sorry about the confusion :sweat_smile:.

My goal is to split a set (not sure if the nomenclature is correct) of annotated samples into "only ACCEPTED", "only REJECTED" and "only IGNORED" subsets, for later analysis. I checked the docs, and there doesn't seem to be any command to do that directly from the CLI, so I decided to do it more or less manually. Since I am just taking my first steps with Prodigy, I annotated a small batch of texts and found that each JSON line contains a dictionary. I expected to find some flag or key stating whether a sample was "ACCEPTED", "REJECTED" or "IGNORED", but I could not find any; I then expected to find some difference between the number of keys in the "REJECTED" and "IGNORED" samples, but they seem to be equal.

How do I separate those 3 sets?

Thank you very much.

Hi! Sure no worries :smiley:

I expected to find some flag or key stating whether a sample was "ACCEPTED", "REJECTED" or "IGNORED", but I could not find any

I presume you're looking for the answer key? You can filter on that, something like:

# `data` is the list of examples exported with db-out (one dict per annotation)
for example in data:
    if example["answer"] == "accept":
        ...  # do something with accepted examples
    elif example["answer"] == "reject":
        ...  # do something with rejected examples
    elif example["answer"] == "ignore":
        ...  # do something with ignored examples

You can find an example JSONL file here, and you can check that the "answer" key holds the value indicating whether that particular example was accepted or ignored. Note that you get this JSONL file whenever you run db-out.

So to recap: after annotation, export your dataset into a JSONL file using db-out, and then write a script that filters the annotator's responses by looking at the "answer" key. Lastly, can you show an example of your dictionary? The keys of rejected and ignored examples can indeed be identical; it's the value of the "answer" key that tells them apart.
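In case it helps, here's a minimal sketch of that second step (assuming the db-out export is saved as annotations.jsonl and you want one output file per answer; both file names are just placeholders):

import jsonlines

# split the exported annotations by the "answer" key (accept / reject / ignore)
buckets = {"accept": [], "reject": [], "ignore": []}
with jsonlines.open("annotations.jsonl") as reader:
    for example in reader:
        buckets[example["answer"]].append(example)

# write each group to its own file: accept.jsonl, reject.jsonl, ignore.jsonl
for answer, examples in buckets.items():
    with jsonlines.open(f"{answer}.jsonl", mode="w") as writer:
        writer.write_all(examples)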


Hello @ljvmiranda921 , hope you are doing fine!

Thank you very much, that is exactly what I was looking for!

Please consider my original query solved.

Best Regards!

Hi,

Maybe this is better suited for an independent question, but I've come back to this thread a few times now. I think some clarification on the meaning of these buttons depending on the type of task is necessary somewhere in the documentation.

For instance, when using prodigy train on the output of the annotations, I believe anything tagged with an IGNORE attribute is dropped from the training data, but then how are the REJECT examples treated? Does this change by annotation task? If not, shouldn't their definition be the same for all annotation types?

Thanks!

hi @vsocrates!

Thanks for your question and your feedback! I've made a note to update the docs or add new content on the buttons.

I think it's better to categorize the different meanings of the "answer" key by manual (no model) vs. binary (model in the loop) annotation, rather than by task (e.g., ner, spancat, etc.).

For both textcat.manual and ner.manual, "REJECT" means about the same: "The REJECT button is less relevant here, because there’s nothing to say no to – however, you can use it to reject examples that have actual problems that need fixing, like noisy preprocessing artifacts, HTML markup that wasn’t cleaned properly, texts in other languages, and so on."

You can view those in the FAQ in the docs (textcat and ner).

However, for binary labels like ner.teach, ACCEPT/REJECT mean what they sound like.

If the suggested entity is fully correct, you can hit ACCEPT. If it's entirely or partially wrong, you can hit REJECT. It's important to only accept suggestions that are fully correct.
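For reference, a binary example exported from something like ner.teach might look roughly like this (text, offsets and label are invented here): a single suggested span per task, with the annotator's decision stored in "answer" — in this case a REJECT because the suggested label is wrong:

# hypothetical binary record: the model suggested PERSON for "Berlin", the annotator rejected it
{"text": "Alice visited Berlin in May.", "spans": [{"start": 14, "end": 20, "label": "PERSON"}], "_is_binary": true, "answer": "reject"}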

That's a fair question. For our upcoming v2, we're rethinking several designs, including the teach commands. If we go that route, most of the recipes would have a more consistent ("manual") definition of the buttons. Regardless, we'll definitely take this feedback into account as we work on v2.