Running a shell script from Prodigy

Hi,
I want to run a shell script "script.sh" using certain text inputs from the user. Is that possible using a Prodigy recipe?

Thanks!

Hi! What exactly does your script.sh do? Is it supposed to provide the data to annotate, or do something with the annotations submitted by a user?

There are different ways you could integrate your script, depending on what it does:

  • If it loads data, you can make it write the JSON objects to stdout and then pipe them forward to the recipe, e.g. script.sh | prodigy .... Setting the source argument to - will read from standard input, i.e. from the previous process; see the sketch after this list. You can see an example here: https://prodi.gy/docs/api-loaders#loaders-custom
  • If it does something else within the annotation process, you could use Python's subprocess to call the script in a subprocess. If needed, this also lets you capture the output.
  • Another option is to run the script as a separate process, e.g. if it submits created annotations to a third-party storage or something like that. You can also schedule it to run automatically in the background.
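
For the first option, here's a minimal sketch of the loading side, written in Python rather than bash so it's easy to adapt; the recipe, dataset name and label in the comment are just placeholders:

```python
# make_stream.py: write one JSON task per line to stdout so it can be piped
# straight into Prodigy with "-" as the source, e.g.:
#   python make_stream.py | prodigy ner.manual my_dataset blank:en - --label PERSON
import json
import sys

examples = [{"text": "First example"}, {"text": "Second example"}]
for eg in examples:
    sys.stdout.write(json.dumps(eg) + "\n")
```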

Hi Ines,
Thank you for your reply,
What I wanted to achieve is that all the commands that would normally be typed into the terminal, including the "prodigy ..." command, are run from within the script, so that once the annotator has responded to the text_input, the JSON files are loaded into Prodigy via the script.

Hi! I just want to make sure I understand the question correctly: Do you have a more specific example? So in your recipe, you're collecting free-form text input that's stored in the data, and then you want to do something else with that? Or queue it up in Prodigy again?

Sorry if I wasn't clear in my last reply. The idea is that I want to get some responses from the user using text_inputs and automatically pass the responses as arguments to a bash script. I want the bash script to run once the annotator has filled in all the text_inputs. The bash script in turn would have some "tmux" commands and a prodigy command to allocate JSONL files to the annotator.

What I want to know is whether I can use a bash script to run a basic JSON-loading prodigy command that will allocate more files to the annotator. If yes, then how?

Thank you!

Thanks for the explanation! Is it important that you run this as part of the same annotation session? Once you've collected all the text inputs, the follow-up data will likely be pretty different from the original examples you collected, so it might make sense to do that in a different session and save the result to a separate dataset?

In that case, you could have your bash script access the collected examples, e.g. by calling db-out with your dataset, which will output the JSON line by line. You can pipe that forward to your script, or save out an intermediate file containing just the raw text inputs and pass that to your bash script. Your script can then output the next examples to annotate, which you pipe forward to Prodigy:
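
For example, the middle step could be a small Python filter like the one below; the dataset names, the "user_input" key and the mark recipe settings in the comment are assumptions about your setup:

```python
# make_followups.py: read the db-out export from stdin and write new tasks
# to stdout. Your bash script could then chain everything up, e.g.:
#   prodigy db-out my_dataset | python make_followups.py | prodigy mark followup_dataset - --view-id text
import json
import sys

for line in sys.stdin:
    eg = json.loads(line)
    # Assumes the free-form response was stored under the key "user_input"
    response = eg.get("user_input")
    if response:
        sys.stdout.write(json.dumps({"text": response}) + "\n")
```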

If you want to do this as part of the same annotation session, it's a bit trickier, but it should work. You could have a custom recipe with a stream that counts the examples that were sent out for annotation. So if you have 100 examples and the counter reaches 100, you know that you've sent out all the questions and can start queueing up the follow-up questions based on the annotations already in your dataset.
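
As a rough sketch, that counting stream could look like this, where get_followup_examples is a hypothetical helper (one way to implement it is shown below):

```python
# Stream generator: send out all original questions first, then queue up the
# follow-up questions once everything has gone out
def make_stream(examples):
    sent = 0
    for eg in examples:
        sent += 1
        yield eg
    # sent == len(examples) here, i.e. all original questions were sent out,
    # so we can start queueing the follow-ups
    for followup in get_followup_examples():
        yield followup
```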

It probably makes sense to work with intermediate files on disk here, because otherwise, calling your shell script with arguments and capturing the output in Python can easily get a bit messy. So you'd export the annotations from the given dataset, call your shell script with a path to that file in a subprocess, and make it write out the follow-up questions. You can then load those back in within your stream generator and add them to the stream, so they'll be queued up next. Just make sure to re-run that logic once you've received the last batch of annotated examples: the annotator may still be working on the last batch or two in the background, and you don't want the stream to run out while you wait.
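
Putting it together, the hypothetical get_followup_examples from the sketch above could use intermediate files like this. The file names, the dataset name and the arguments script.sh expects are all assumptions:

```python
import json
import subprocess

from prodigy.components.db import connect

def get_followup_examples():
    # Export the annotations collected so far to an intermediate file
    db = connect()
    annotated = db.get_dataset("my_dataset")
    with open("annotations.jsonl", "w", encoding="utf8") as f:
        for eg in annotated:
            f.write(json.dumps(eg) + "\n")
    # Call the shell script with the file path; here it's expected to write
    # the follow-up questions it generates to followups.jsonl
    subprocess.run(["./script.sh", "annotations.jsonl", "followups.jsonl"], check=True)
    # Load the follow-up questions back in so they can be added to the stream
    with open("followups.jsonl", encoding="utf8") as f:
        for line in f:
            yield json.loads(line)
```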