Enable annotating the same input text twice


It seems that Prodigy's default behavior is to ignore input text that has already been annotated. We have a requirement to evaluate how consistent our annotators are when given the same annotation task. Is there any way to disable this behavior of skipping already-seen text?

Thank you :slight_smile:


We actually have a full solution for sending the same examples to multiple annotators in Prodigy. The docs for setting this up are here:

It's quite simple: you just need to add "feed_overlap": true to your prodigy.json config file.
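For reference, a minimal prodigy.json with overlap enabled might look like this (other settings in your config file would of course stay as they are):

```json
{
  "feed_overlap": true
}
```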

Great! That's exactly what I was looking for :'D

Hi again. Actually, we have a requirement that not all examples should be annotated twice. Let's say we have 100 examples and 2 annotators, and we only want 20% of them to overlap. That means there are 20 examples that both annotators will see, and 40 examples that are unique to each annotator.
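As an illustration, the split described above could be sketched like this (the function name and the fixed 20% ratio are just assumptions for this sketch, not part of any Prodigy API):

```python
def split_with_overlap(examples, overlap_ratio=0.2):
    """Split examples into a shared portion plus two unique portions.

    With 100 examples and overlap_ratio=0.2, both annotators see the
    first 20 examples, and each gets 40 of the remaining 80.
    """
    n_shared = int(len(examples) * overlap_ratio)
    shared = examples[:n_shared]
    rest = examples[n_shared:]
    half = len(rest) // 2
    # Each annotator's feed: the shared examples plus their unique half
    annotator_1 = shared + rest[:half]
    annotator_2 = shared + rest[half:]
    return annotator_1, annotator_2

a1, a2 = split_with_overlap(list(range(100)))
print(len(a1), len(a2))         # 60 60
print(len(set(a1) & set(a2)))   # 20
```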

We have the task division logic already, but we have a problem: Prodigy automatically ignores text that has been annotated before. Is there a way to disable this? I tried setting auto_exclude_current to false in prodigy.json, but it does not work. (We are using the ner.manual recipe.)

Thanks for your advice :slight_smile:

Setting "auto_exclude_current": false in the config should work to not exclude examples that are already present in the current dataset, so I'm not sure what the problem could be here. It also depends on the setup and how you're running the instances/sessions for the annotators.

That said, if you already know how you want to divide the data, you could also just save the results to different datasets. This would also make it much easier to skip examples that one person has already annotated if you ever restart the server.

Setting "auto_exclude_current": false in the config should work to not exclude examples that are already present in the current dataset

This is exactly the behavior we want. We solved the issue by adding "auto_exclude_current": False directly to the return value of the ner.manual recipe:

return {
    "view_id": "ner_manual",
    "dataset": dataset,
    "stream": stream,
    "exclude": exclude,
    "before_db": remove_tokens if highlight_chars else None,
    "config": {
        "lang": nlp.lang,
        "labels": labels,
        "exclude_by": "task",
        "ner_manual_highlight_chars": highlight_chars,
        "auto_count_stream": True,
        "auto_exclude_current": False,
    },
}

@ines One more question related to the same scenario. Since we have a system that manages the sources for Prodigy instances, it's possible for an instance to end up with an empty source (if there are no tasks for one annotator).

However, Prodigy exits when the source is empty:

prodigy-annotator-1_1  | ✘ Error while validating stream: no first example
prodigy-annotator-1_1  | This likely means that your stream is empty. This can also mean all the examples
prodigy-annotator-1_1  | in your stream have been annotated in datasets included in your --exclude recipe
prodigy-annotator-1_1  | parameter.

How can we run an instance with an empty source? I would expect it to run and show the message "No tasks available".

At the moment, I think there's no easy way to disable the first task validation because the assumption is that you might as well not start the server if you already know that the stream will be empty and there won't be anything to do once the app is running.

So if I understand your scenario correctly, you still want to start the server in that case and just show "no tasks available" to the annotator? I guess one workaround could be to create a dummy task dictionary that includes something like "text": "No tasks available" and a unique _task_hash that's never filtered out. It's a bit hacky, but it should work: Prodigy will start because it has something to show, and the annotator will see the dummy task :sweat_smile:
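A minimal sketch of that workaround, assuming you can wrap the stream inside a custom recipe. The function names are hypothetical, and the hash values are arbitrary placeholders (not values Prodigy itself assigns); they just need to be integers unique to the dummy task so it's never filtered out:

```python
import hashlib

def make_dummy_task(message="No tasks available"):
    """Build a placeholder task so the first-example validation passes."""
    # Derive a stable integer from the message to use as a unique hash
    digest = hashlib.md5(message.encode("utf8")).hexdigest()
    unique = int(digest[:8], 16)
    return {
        "text": message,
        "_input_hash": unique,
        "_task_hash": unique,
    }

def stream_with_fallback(stream):
    """Yield real tasks if there are any, otherwise yield the dummy task."""
    empty = True
    for task in stream:
        empty = False
        yield task
    if empty:
        yield make_dummy_task()

# With an empty source, the annotator sees the placeholder text instead
tasks = list(stream_with_fallback(iter([])))
print(tasks[0]["text"])  # No tasks available
```

Note that the dummy task only appears when the source is exhausted without yielding anything, so a non-empty source passes through unchanged.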