I am new to Prodigy, and first I'd like to thank the developers: you did great work. It is easy to understand, and the documentation and support are really complete and helpful!
Here is my question: I'd like to set up an annotation task with several annotators, and I'd like to know if there is a way to present the examples in a different order to each annotator. The main idea is that we want to avoid order effects in our experiment.
Thanks in advance for your answer and the attention given to this question.
The easiest way to implement something like this would be to start a separate instance for each annotator and use a custom stream for each of them, with a different order of examples (random, or specific to the annotator). Since the only difference between the instances is the stream, you might not even need a custom recipe and can just use a custom loader. See here for an example: https://prodi.gy/docs/api-loaders#loaders-custom
If you want your loader to be more elegant, you could use a library like typer to let it take arguments on the command line, so you could do something like loader.py --annotator estelle | prodigy ... or loader.py --random --n-examples 10 | prodigy ... etc.
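As a rough illustration of such a loader, here is a minimal sketch. It assumes the source data is a JSONL file with one task per line, and it uses argparse from the standard library instead of typer to stay dependency-free. The names order_for and main and the idea of seeding the shuffle from the annotator's name are illustrative choices, not something Prodigy prescribes.

```python
import argparse
import hashlib
import json
import random


def order_for(examples, annotator):
    """Return a copy of the examples in a reproducible, annotator-specific order."""
    # Derive a stable seed from the annotator's name, so restarting the
    # loader for the same annotator produces the same order again.
    seed = int(hashlib.sha256(annotator.encode("utf8")).hexdigest(), 16)
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def main(argv=None):
    # Hypothetical CLI: loader.py SOURCE --annotator NAME
    parser = argparse.ArgumentParser()
    parser.add_argument("source", help="path to a JSONL file, one task per line")
    parser.add_argument("--annotator", required=True)
    args = parser.parse_args(argv)
    with open(args.source, encoding="utf8") as f:
        examples = [json.loads(line) for line in f if line.strip()]
    for eg in order_for(examples, args.annotator):
        # One JSON object per line on stdout, ready to be piped forward.
        print(json.dumps(eg))


# In loader.py, end the file with the usual guard:
#     if __name__ == "__main__":
#         main()
```

With the guard added, this could be run as python loader.py data.jsonl --annotator estelle | prodigy ... where data.jsonl is a placeholder file name and the remaining arguments depend on your recipe.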
(If your data is in JSON and you know jq, there's probably also a super elegant way to do the shuffling/ordering in a single line on the CLI, then pipe that forward to the recipe and set --loader json. But I'm not a jq wizard, so I couldn't give you a code example for the jq part.)
I wanted to share some feedback on the solution I implemented to customize the order of presentation for each annotator.
In the end it was quite easy, using this little trick:
I use a CSV file containing the name of each annotator and their order of files (I only work with 10 files, so it's not a heavy file to process), and I use sys.argv to read the command-line arguments.
The input command is the same as always:
prodigy sort-video estelle utils/order_files.csv -F test_recipe.py
except that the dataset name is also the annotator's name, and I replaced the source with my CSV file.
To get the appropriate stream for the annotator, my "get_stream" function first checks whether the annotator is in the CSV file. If not, I create a file order for this annotator and create a JSON file with the right order of presentation. Then I load the JSON file as my stream.
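For anyone who wants to try the same idea, here is a minimal sketch of the lookup-or-create step. It is not my exact recipe code: it assumes a CSV layout with one row per annotator (the name in the first column, followed by the file names in order), the function name order_for_annotator is made up for this example, and it skips the intermediate JSON file and just returns the ordered list.

```python
import csv
import random
from pathlib import Path


def order_for_annotator(csv_path, annotator, files):
    """Look up the annotator's file order, creating and saving one if missing."""
    path = Path(csv_path)
    if path.exists():
        with path.open(newline="", encoding="utf8") as f:
            for row in csv.reader(f):
                # Assumed layout: annotator name, then file names in order.
                if row and row[0] == annotator:
                    return row[1:]
    # Unknown annotator: draw a random order and append it to the CSV,
    # so the same order is reused the next time this annotator starts.
    order = random.sample(list(files), k=len(files))
    with path.open("a", newline="", encoding="utf8") as f:
        csv.writer(f).writerow([annotator, *order])
    return order
```

Inside a recipe, the returned list could then be turned into tasks in whatever format the recipe expects and yielded as the stream.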
In the end, each annotator has their own instance, and I dedicate one port per annotator.
(Tip: I am currently using ngrok to deploy my solution.)
I don't know if it's very clear or if the solution is ideal, but I hope it will help people who want to do the same thing!