Use a custom textcat manual recipe in Python with feed_overlap = False

Hi! First of all, thank you for this amazing annotating software!

I am currently annotating data for a text classifier with my classmates using ngrok, and we found out that many of the sentences we annotate overlap.

So far we have been adding our names to the URL with ?session="name", but that doesn't seem to stop us from annotating the same sentences.

After searching, I found the feed_overlap setting, which is True by default. If I understood correctly, I need to set it to False in prodigy.json. So I tried writing a custom recipe so that I could set the config there, but I got this error: ✘ Can't find recipe 'custom_textcat'. (I'm not really sure what is going on behind the scenes, though.)

What should I do to set feed_overlap? Does it also randomize the order of the sentences in the dataset?

One more question: let's say sentences.jsonl contains 5000 sentences. If we have annotated 4000 of them and exit the command (so I see ✔ Saved 4000 annotations to database SQLite) and then run the same command again, do we have to annotate some of those 4000 sentences again? Or does it skip sentences that are already annotated in the dataset?

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "custom.textcat",
    dataset=prodigy.recipe_args["dataset"],
    file_path=("Path to texts", "positional", None, str),
    label=prodigy.recipe_args["label"],
)
def custom_textcat(dataset, file_path, label):
    stream = JSONL(file_path)
    return {
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": stream,  # Incoming stream of examples
        "config": {
            'feed_overlap': False
        }
    }

prodigy.serve("custom_textcat test ../data/sentences.jsonl --label {}".format("a,b,c,d"))

Hi! It sounds like you just have a small typo in your prodigy.serve command: it expects the name of the recipe, i.e. the first argument of the @prodigy.recipe decorator, not the name of the Python function. If you use custom.textcat instead of custom_textcat, it should work as expected :slightly_smiling_face:
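To illustrate why the function name alone isn't found, here is a toy registry in plain Python. It is only a sketch of the general decorator-registration pattern, not Prodigy's actual internals; `RECIPES`, `recipe` and `serve` are hypothetical names made up for this example.

```python
# Toy recipe registry (hypothetical; NOT Prodigy's real implementation).
RECIPES = {}

def recipe(name):
    """Register the decorated function under the given string name."""
    def decorator(func):
        RECIPES[name] = func
        return func
    return decorator

@recipe("custom.textcat")
def custom_textcat(dataset, file_path):
    # The Python function name is never stored as a lookup key.
    return {"dataset": dataset, "source": file_path}

def serve(command):
    """Look up the first word of the command in the registry and call it."""
    name, *args = command.split()
    if name not in RECIPES:
        raise KeyError(f"Can't find recipe '{name}'")
    return RECIPES[name](*args)
```

In this sketch, `serve("custom.textcat test data.jsonl")` finds the recipe, while `serve("custom_textcat test data.jsonl")` raises the lookup error, because only the string passed to the decorator is registered.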

If feed_overlap is set to True, there will be overlap in the examples that are sent out: every annotator will see the same examples and you will end up with multiple annotations for the same text. If you set feed_overlap to False, everyone will see different examples.

Prodigy won't randomize the examples, though – they will be sent out in the same order. If you want every annotator to label different examples but in a random order, that's trickier. For each example you send out, you then have to check if it's already in the dataset or was sent out to a different annotator. You could do this with a custom recipe and run multiple instances of Prodigy on different ports.
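A rough sketch of that idea in plain Python, under a few assumptions: the whole file fits in memory, each line is a JSON object with a "text" key, and `seen_texts` is a shared record of what has already been annotated or sent out (in a real multi-instance setup it would have to come from the database rather than an in-memory set). The helper name `shuffled_unseen` is hypothetical and not part of Prodigy.

```python
import json
import random

def shuffled_unseen(lines, seen_texts, seed=None):
    """Yield parsed examples in random order, skipping texts already seen.

    `seen_texts` is updated in place as examples are sent out.
    """
    examples = [json.loads(line) for line in lines if line.strip()]
    random.Random(seed).shuffle(examples)
    for eg in examples:
        if eg["text"] in seen_texts:
            continue  # already annotated or sent to another annotator
        seen_texts.add(eg["text"])
        yield eg
```

A generator like this could serve as the stream in a custom recipe. Shuffling requires loading all examples up front, so it gives up the lazy streaming you get when reading the file line by line.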

By default, Prodigy will skip examples that are already annotated in the current dataset. (Under the hood, Prodigy uses hashes of the examples to decide whether to send them out again. You can read more about this here.)

So when you start the server again after saving 4000 examples, Prodigy should start again at example 4001. (It will still iterate over all examples again to check the hashes, but that should be pretty quick.)
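The skipping logic can be pictured roughly like this. It is a simplified illustration, not Prodigy's actual hashing: the helpers `text_hash` and `skip_annotated` are hypothetical, and hashing only the "text" field with SHA-1 is an assumption made for the sketch.

```python
import hashlib

def text_hash(example):
    """Hash the example's text so identical texts map to the same key."""
    return hashlib.sha1(example["text"].encode("utf-8")).hexdigest()

def skip_annotated(stream, annotated_hashes):
    """Iterate over every example, yielding only those not yet annotated."""
    for eg in stream:
        if text_hash(eg) not in annotated_hashes:
            yield eg

# Usage: five examples, the first four already in the dataset.
stream = [{"text": f"sentence {i}"} for i in range(5)]
annotated = {text_hash(eg) for eg in stream[:4]}
remaining = list(skip_annotated(stream, annotated))  # only the fifth example
```

Every example is still visited once, but a set lookup per example is cheap, which is why re-checking a few thousand sentences on restart is fast.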

Thank you! It works great now :slight_smile:
