Is it possible for me to control the entire active learning loop?

This is not necessarily true. Most use cases do involve loading data from a file or a single source, because that's the most common way people go about annotating their data. But in the end, the stream is just a Python generator that Prodigy keeps requesting batches of tasks from. How each batch is composed is up to you – so you could easily implement your own logic that takes previous user decisions into account, randomly adds data from different sources or uses other factors to determine what to send out for annotation. (Maybe you only want to annotate fun and light texts on Mondays and keep the difficult stuff for Wednesdays. :wink: Prodigy itself is completely indifferent to that and will just ask you to annotate whatever your stream produces.)
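
For example, a stream that mixes two sources with a bias towards one of them could be as simple as the sketch below – the two pools and the 80/20 split are just made up for illustration:

```python
import random

def mixed_stream(easy_texts, hard_texts, easy_ratio=0.8):
    """Yield Prodigy-style task dicts, drawing mostly from the 'easy' pool."""
    easy = [{"text": text} for text in easy_texts]
    hard = [{"text": text} for text in hard_texts]
    while easy or hard:
        # Mostly draw from the easy pool; fall back once one side runs out.
        use_easy = easy and (not hard or random.random() < easy_ratio)
        pool = easy if use_easy else hard
        yield pool.pop(0)
```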

In theory, you could go as far as customising Prodigy's app.py to plug in your own logic and expose it via the endpoints the web app interacts with. But I'm not sure that's really necessary here – if you start with a blank recipe and don't use any of the built-in models or active learning components, you still get all of Prodigy's scaffolding like the web app, web server, database and CLI, but you fully control what data goes in and what's done with the annotations you receive back.

The web app mostly interacts with Prodigy via two REST endpoints:

  • /get_questions – Get a batch of batch_size examples from the stream. Called on load and whenever the queue is running low.
  • /give_answers – Send a batch of annotated examples back. Called periodically when enough annotations are collected, or when the user hits "Save" manually.
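
To make that concrete: each item in the stream is just a task dict, and the annotated version comes back with the user's decision added. The exact fields depend on your recipe and interface, so the snippet below is only an illustration:

```python
# What the recipe's stream yields and /get_questions sends to the web app:
outgoing_task = {"text": "Prodigy is a scriptable annotation tool."}

# What /give_answers posts back after annotation: the same dict, plus
# whatever the interface added – most importantly the user's decision.
answered_task = {
    "text": "Prodigy is a scriptable annotation tool.",
    "answer": "accept",  # or "reject" / "ignore"
}
```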

On the recipe side, those are implemented via the following two components:

  • stream – A generator that yields examples based on any logic you need.
  • update – A function that receives a list of annotated examples and does something – e.g. updates a model, modifies the stream of examples based on the annotations, outputs stuff somewhere etc.
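
Put together, a blank recipe that wires up its own stream and update could look roughly like this. The recipe name, the hardcoded texts and the print-based update are placeholders, and the exact recipe API may differ slightly between Prodigy versions:

```python
import prodigy

@prodigy.recipe("custom-loop")
def custom_loop(dataset: str):
    def stream():
        # Any logic you like – here just a hardcoded list for illustration.
        texts = ["First example", "Second example", "Third example"]
        for text in texts:
            yield {"text": text}

    def update(answers):
        # Called with batches of annotated examples: update a model,
        # adjust the stream, log statistics, write to a file etc.
        accepted = [eg for eg in answers if eg.get("answer") == "accept"]
        print(f"Received {len(answers)} answers, {len(accepted)} accepted")

    return {
        "dataset": dataset,   # dataset the annotations are saved to
        "view_id": "text",    # annotation interface to use
        "stream": stream(),
        "update": update,
    }
```

You'd run it like any other custom recipe, e.g. `prodigy custom-loop my_dataset -F recipe.py`.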

You can also start the recipe and Prodigy server programmatically from within a Python script. So your custom app could have the user make a search, click around (whatever you need), fetch some data for annotation, start Prodigy and have the user annotate it. You could even have your generator yield a "fake" annotation task that tells the user to readjust their search after X examples (or once a certain distribution of annotation tasks is received back). If you view Prodigy as more of an abstract framework that streams data through a web application, I think there are a lot of creative solutions and use cases you can come up with :blush:
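
A sketch of that hand-off could look like the following – assuming the `custom-loop` recipe above lives in a `recipe.py` module, and keeping in mind that the exact `prodigy.serve` call signature has changed between Prodigy versions, so check the docs for the one you're running:

```python
import prodigy
import recipe  # importing the module registers the @prodigy.recipe("custom-loop") sketch above

def run_annotation_session(dataset_name: str):
    # ... custom app logic: let the user search, fetch the matching data,
    # prepare whatever the recipe's stream should load ...

    # Hand off to Prodigy and block until the server is stopped.
    prodigy.serve(f"custom-loop {dataset_name}", port=8080)

if __name__ == "__main__":
    run_annotation_session("search_results_monday")
```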