Hi! Streams in Prodigy are regular Python generators – so you can set them up however you like, make them respond to outside state, read from an external source (a database, a REST API) and so on. For instance, here's a pseudocode example of loading data from something like a paginated API:
```python
def custom_stream():
    page = 0
    while True:
        # fetch the next page of examples from the API
        examples = get_new_examples(page)
        yield from examples
        page += 1
```
You could also stream in the files in a directory and, after each iteration (i.e. once all examples in a file have been sent out), check whether there's a new file to read from. I don't know where your original review data lives – but if you can retrieve it in Python, you could also do that directly in the recipe script and skip the whole export step altogether.
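Here's a minimal sketch of that idea – it assumes your tasks are stored as JSONL files in a directory and uses `srsly` (installed alongside Prodigy) to read them:

```python
import time
from pathlib import Path

import srsly

def stream_from_directory(path):
    seen = set()
    while True:
        new_files = [p for p in sorted(Path(path).glob("*.jsonl")) if p not in seen]
        for file_path in new_files:
            seen.add(file_path)
            # each line in the file is one annotation task, e.g. {"text": "..."}
            yield from srsly.read_jsonl(file_path)
        if not new_files:
            time.sleep(5)  # wait a bit before checking for new files again
```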
If you don't want to edit the recipe script (e.g. if you're using a built-in recipe), you can also write a custom loader script that writes to stdout and then pipe that forward. See here for an example.
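For instance, a loader script could look something like this (a sketch – `get_review_data` is a placeholder for however you fetch your data):

```python
# loader.py
import json
import sys

def get_review_data():
    # placeholder: swap in your database query, API call etc.
    yield {"text": "This product is great!"}
    yield {"text": "Would not recommend."}

for example in get_review_data():
    # write one JSON task per line to stdout
    print(json.dumps(example))
    sys.stdout.flush()
```

You can then pipe it forward and set the source argument to `-`, which tells Prodigy to read from standard input – e.g. `python loader.py | prodigy ner.manual your_dataset en_core_web_sm - --label PERSON` (the dataset and model names here are just placeholders).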
Prodigy will typically create two datasets: one with the name you've given when you run the recipe and one timestamped dataset per session. In a custom recipe, you can also return a `get_session_id` callback to customise how the session IDs are generated.
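Here's a minimal sketch of what that could look like – the JSONL loader, the `text` interface and the date-based session ID are just placeholder choices:

```python
from datetime import datetime

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe("custom-recipe")
def custom_recipe(dataset, source):
    def get_session_id():
        # assumption: group all annotations from one day into one session
        return datetime.now().strftime("reviews-%Y-%m-%d")

    return {
        "dataset": dataset,
        "stream": JSONL(source),
        "view_id": "text",
        "get_session_id": get_session_id,
    }
```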
You might also want to check out the named multi-user sessions (see the "Multi-user sessions" section in your PRODIGY_README.html for details). This lets you append something like `?session=johannes` to the URL in the web app and associate all annotations you collect with that session. You can also customise whether all sessions should see the same examples or whether everyone should see different questions.
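The setting that controls this is `"feed_overlap"`, which you can put in your `prodigy.json` (or in the `"config"` returned by a custom recipe). For example:

```json
{
  "feed_overlap": true
}
```

With `true`, all named sessions are asked about the same examples; with `false`, the stream is divided up so everyone sees different questions.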