Reviewing a database with -A and -AS arguments

Hi,

I want to review NER annotations from a database where two annotators participated with named sessions, let's call them jane and joe. In this database, there are three types of annotated examples:

  1. Examples annotated by both jane and joe where both identified the same entities (both agree)
  2. Examples annotated by both jane and joe where they labelled entities differently (both disagree)
  3. Examples only annotated by either jane or joe

When running the review recipe I am getting the following behaviour when using the arguments -A and -AS:

  • With no arguments: I can review all three types of annotated examples mentioned above
  • With -A: From what I understand, I should be able to review only type 2 and perhaps type 3 examples. However, I am only seeing type 2 examples
  • With -AS: From what I understand, I should see examples of types 1 and 2, that is, examples annotated by both jane and joe regardless of whether they agree on the named entities. However, I am seeing examples of all three types, including the ones annotated by only one annotator (type 3)

Shouldn't -AS (--accept-single) auto-accept examples annotated by a single annotator?

How can I set up the review recipe to show examples of types 1 and 2 (annotated by both annotators) and auto-accept the ones with a single annotator (type 3)?

The reason I want to do this is to resolve the annotations where jane and joe disagree, and to check that they are annotating correctly even on the ones where they agree.

Thanks.

Hi @ale,

The -A (--auto-accept) flag automatically accepts examples with no conflict, so you should be seeing examples of types 2 and 3. I just double-checked, though, and I need to report that there's a bug that results in type 3 examples not being shown. It will be fixed in the next release.

The way it currently works, the -AS (--accept-single) flag only applies if -A is set as well, and it additionally filters out the examples annotated by a single annotator. So with -A -AS you should only be seeing examples of type 2.

When -AS is used on its own it does nothing, which is why you are seeing all the examples. The CLI help does say it "Also excludes" these examples, but I admit it might be confusing. We'll improve that for sure.
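
To make the flag combinations easier to compare, here is a minimal sketch of the intended behaviour described above (so without the type 3 bug). It is only a model, not Prodigy's actual implementation: the function name needs_review and its two parameters are hypothetical, and it assumes each example carries "_task_hash" and "spans" fields.

```python
from collections import defaultdict


def needs_review(examples, auto_accept=False, accept_single=False):
    """Return the groups of annotations that would still be shown for review.

    Hypothetical model of the flag behaviour described in this thread,
    not Prodigy's own code.
    """
    # Collect all annotated versions of the same task.
    groups = defaultdict(list)
    for eg in examples:
        groups[eg["_task_hash"]].append(eg)

    selected = []
    for versions in groups.values():
        single = len(versions) == 1
        # Sorted (start, end, label) tuples make the comparison order-independent.
        distinct = {
            tuple(sorted((s["start"], s["end"], s["label"]) for s in eg.get("spans", [])))
            for eg in versions
        }
        conflict = len(distinct) > 1
        # accept_single is only consulted when auto_accept is set.
        if auto_accept:
            if single and accept_single:
                continue  # type 3: filtered out
            if not single and not conflict:
                continue  # type 1: accepted automatically
        selected.append(versions)
    return selected


def span(start, end, label):
    return {"start": start, "end": end, "label": label}


demo = [
    {"_task_hash": 1, "spans": [span(0, 4, "PERSON")]},  # type 1, jane
    {"_task_hash": 1, "spans": [span(0, 4, "PERSON")]},  # type 1, joe
    {"_task_hash": 2, "spans": [span(0, 4, "PERSON")]},  # type 2, jane
    {"_task_hash": 2, "spans": [span(0, 4, "ORG")]},     # type 2, joe
    {"_task_hash": 3, "spans": []},                      # type 3, one annotator
]
```

In this model, accept_single=True on its own returns the same groups as no flags at all, which matches what you observed with -AS.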

How can I set up the review recipe to show examples of types 1 and 2 (annotated by both annotators) and auto-accept the ones with a single annotator (type 3)?

I'm afraid that's currently impossible, as -AS is only applied if -A is specified. One workaround would be to preprocess the dataset outside Prodigy, excluding the examples annotated by a single annotator, and then use the preprocessed dataset in review without any flags. If you db-out your dataset and store it on disk, the script could be as follows:

from collections import defaultdict

import srsly

# Read the exported annotations from file
data = srsly.read_jsonl("test.jsonl")

# Group examples by task hash
task_hash2examples = defaultdict(list)
for eg in data:
    task_hash2examples[eg["_task_hash"]].append(eg)

# Keep only the examples whose task hash has more than one annotation
new_data = [
    eg
    for examples in task_hash2examples.values()
    if len(examples) > 1
    for eg in examples
]

# Write the filtered data to a new file
srsly.write_jsonl("only_overlapping_examples.jsonl", new_data)
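
Since the goal was to accept the type 3 examples rather than drop them, a variant of the script could keep them in a separate list so they can be merged back after review. The sketch below is only one possible approach: the helper name split_by_annotator_count is hypothetical, and it assumes each exported example has a "_session_id" field identifying the named session. Counting distinct sessions instead of raw annotations also guards against the case where one annotator saved the same task twice.

```python
from collections import defaultdict


def split_by_annotator_count(examples):
    """Split examples into (overlapping, single) by number of distinct sessions.

    Hypothetical helper; assumes "_task_hash" and "_session_id" are present.
    """
    # Group all annotated versions of the same task.
    groups = defaultdict(list)
    for eg in examples:
        groups[eg["_task_hash"]].append(eg)

    overlapping, single = [], []
    for versions in groups.values():
        # Count annotators, not annotations.
        annotators = {eg.get("_session_id") for eg in versions}
        target = overlapping if len(annotators) > 1 else single
        target.extend(versions)
    return overlapping, single


sample = [
    {"_task_hash": 1, "_session_id": "jane"},
    {"_task_hash": 1, "_session_id": "joe"},
    {"_task_hash": 2, "_session_id": "jane"},
    {"_task_hash": 3, "_session_id": "joe"},
    {"_task_hash": 3, "_session_id": "joe"},  # same annotator twice
]
overlapping, single = split_by_annotator_count(sample)
```

Both lists can then be written out with srsly.write_jsonl as in the script above: the overlapping one goes into review, and the single-annotator one can be kept as already accepted.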