How to capture discrepancies between two annotators

I want to capture and count the number of samples that had differences between the annotators' annotations, specifically for an NER task. For my purposes, a discrepancy is a difference in the highlighted text. I understand there is a review recipe with an auto-accept function. This auto-accept seems to do almost exactly what I want, but rather than just skipping examples that have no differences, I also want to count the ones that do have differences. How can I do this?

One approach I thought about was to pull the database into Python and match up the tokens. That seems cumbersome, though, since there may be a lot of highlighted text.

Maybe this code:

from typing import Any, Dict, Iterator

from prodigy.components.db import Database
from prodigy.types import StreamType
from prodigy.util import TASK_HASH_ATTR


def filter_auto_accept_stream(
    stream: Iterator[Dict[str, Any]], db: Database, dataset: str
) -> StreamType:
    """Automatically add examples with no conflicts to the database and skip
    them during annotation."""
    task_hashes = db.get_task_hashes(dataset)
    for eg in stream:
        # Skip examples that are already in the dataset
        if TASK_HASH_ATTR in eg and eg[TASK_HASH_ATTR] in task_hashes:
            continue
        versions = eg["versions"]
        if len(versions) == 1:  # no conflicts, only one version
            sessions = versions[0]["sessions"]
            if len(sessions) > 1:  # multiple identical versions
                # Add example to dataset automatically
                eg["answer"] = "accept"
                db.add_examples([eg], [dataset])
                # Don't send anything out for annotation
                continue
        yield eg
(Found by exploring the package: `python -c "import prodigy; print(prodigy.__file__)"`.)
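For the counting part, the key observation in the code above is that an example with more than one entry in its `versions` list is one where the annotators disagreed. A minimal sketch of a counter built on that structure (the helper name `count_conflicts` is mine, and it assumes the `"versions"` shape shown above; it is not part of Prodigy's API):

```python
from typing import Any, Dict, Iterable, Tuple


def count_conflicts(examples: Iterable[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (n_conflicts, n_agreements) over a review-style stream.

    Assumes each example carries a "versions" list with one entry per
    distinct annotation, so len(versions) > 1 means the annotators
    produced different highlighted spans for the same input.
    """
    conflicts = 0
    agreements = 0
    for eg in examples:
        if len(eg["versions"]) > 1:
            conflicts += 1
        else:
            agreements += 1
    return conflicts, agreements
```

You could call this on the stream before (or instead of) the auto-accept filter to get the totals you're after.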

Hi @klopez !

I think you're on the right track, and using an "offline" Python script should do the trick. You can use db-out, which gives you a JSONL export of your database including the highlighted spans. In my opinion, it's more convenient to work with that export if you just want to obtain the differences.
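As a rough sketch of that offline script: group the exported records by text and compare the span sets across annotators. This assumes each db-out record has a `"text"` field and a `"spans"` list with `start`/`end`/`label` keys, which is the usual NER export shape; the function names are mine, not Prodigy's. (Prodigy also stores an `_input_hash` you could group by instead of the raw text.)

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def span_key(span: Dict[str, Any]) -> Tuple[int, int, Any]:
    # Compare annotations by character offsets and label only
    return (span["start"], span["end"], span.get("label"))


def count_discrepancies(examples: List[Dict[str, Any]]) -> int:
    """Count texts whose annotators highlighted different span sets."""
    by_text = defaultdict(list)
    for eg in examples:
        # One frozenset of spans per annotation of this text
        by_text[eg["text"]].append(frozenset(span_key(s) for s in eg.get("spans", [])))
    # A text is a discrepancy if its annotations aren't all identical
    return sum(1 for versions in by_text.values() if len(set(versions)) > 1)


# Usage, assuming an export like: prodigy db-out my_dataset > annotations.jsonl
# with open("annotations.jsonl") as f:
#     examples = [json.loads(line) for line in f]
# print(count_discrepancies(examples))
```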
