Change annotation recipe when Prodigy is already running

Hi, I’m really liking Prodigy so far. Truly great work!

What we are trying to do is have a server that shows several interfaces to our customers, one of which would be Prodigy. An important part of the workflow we are pursuing is an overview screen where our customers can see how the app is performing at a glance.

From that screen, the user should be able to select a label and be redirected to Prodigy’s interface with the corresponding view_id and its parameters already set.

What is the recommended way of doing this? I’ve been looking all around the documentation, the support forum and spaCy’s own code, but I can’t figure out a good way to do it. I’m hoping that I won’t need to kill Prodigy and restart it from scratch every time the user clicks a label.

How do you handle this in the demo app? How do you change the view_id on the fly?

Thanks

Hi and thanks! :smiley:

Most Prodigy recipes are inherently stateful, and each one comes with its own isolated, standalone server, REST API and web application. Switching between interfaces on the fly would introduce all sorts of problems and open questions: you’d need to make sure the answers are stored with the correct task, re-fetch unanswered questions that were already sent out, and so on.

All the logic that orchestrates an annotation workflow happens in Python, i.e. at the recipe level. In most cases where you want to change the "interface", what you actually want to do is change the entire task – the way the stream is composed, filters that are applied to the stream and possibly even the update callback and other settings.
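For example, here's a minimal sketch of the pieces a recipe returns – the recipe name, loader and settings below are just illustrative, not from your setup:

```python
import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe("my-recipe")
def my_recipe(dataset, source):
    stream = JSONL(source)  # generator of annotation tasks
    return {
        "dataset": dataset,           # dataset the answers are saved to
        "stream": stream,             # how the questions are composed
        "update": None,               # optional callback on new answers
        "view_id": "classification",  # the annotation interface to use
    }
```

All of those pieces are fixed when the process starts, which is why "changing the view_id" really means changing the whole task.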

The demo app is a separate app compiled for demo purposes: it skips all the parts that are problematic, doesn't send anything to a server and discards all answers.

You've probably done this already, but depending on what you're trying to build, you might want to check that it's compatible with our license terms. While it's no problem to spin up annotation tasks and have your customers or other people do the labelling, integrating Prodigy into existing applications usually isn't permitted.

If what you're looking for is a way to manage multiple annotators and projects and retrieve statistics, you might be interested in the upcoming Prodigy Scale. It comes with an autoscaling cluster to host multiple annotation feeds and takes care of starting and stopping, user management and reconciling annotations.

Thanks for your response!

> Most Prodigy recipes are inherently stateful, and each one comes with its own isolated, standalone server, REST API and web application. Switching between interfaces on the fly would introduce all sorts of problems and open questions: you’d need to make sure the answers are stored with the correct task, re-fetch unanswered questions that were already sent out, and so on.

I think I was not very clear, sorry. For the time being, our main concern is to use the textcat and ner recipes (to annotate chatbot data). If we spin up an instance with, say, the textcat recipe, how can I change from annotating category A to category B? Since the recipe is the same, I shouldn't have any problems with the data already processed, right?

> The demo app is a separate app compiled for demo purposes: it skips all the parts that are problematic, doesn’t send anything to a server and discards all answers.

Even so, are you killing the process in the background with every change? I ask because the change is instant in the demo app, whereas when I start a new Prodigy server locally, it takes several seconds to be ready.

> You’ve probably done this already, but depending on what you’re trying to build, you might want to check that it’s compatible with our license terms. While it’s no problem to spin up annotation tasks and have your customers or other people do the labelling, integrating Prodigy into existing applications usually isn’t permitted.

We checked the license, for sure! What we want is for our customers to label their data, and for our services to train and update models from the annotated data. For our customers, Prodigy is just a tool, not the product itself.

> If what you’re looking for is a way to manage multiple annotators and projects and retrieve statistics, you might be interested in the upcoming Prodigy Scale.

Actually, we are very interested in trying it out! I already applied for the beta :wink:

Thanks again

Do you want to use a recipe with a model in the loop? For NER, it's pretty easy to annotate multiple labels at the same time – but if you're annotating with a model, you do have to leave it to the model to decide which label to show you and ask you about. Otherwise, it'd be really difficult to make the active learning work effectively.
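For example, the built-in ner.teach recipe accepts a comma-separated list of labels – the dataset, model and file names below are just placeholders:

```
prodigy ner.teach my_dataset en_core_web_sm my_data.jsonl --label PERSON,ORG
```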

If you're annotating without a model (using a manual or choice recipe), you could in theory manipulate what's sent out via the stream generator. In Prodigy, streams are simple Python generators that yield dictionaries (the examples to annotate) and while they're running, they can respond to (external) state changes and conditionals. For example:

```python
def get_stream(examples, other_examples):
    # Streams are plain generators, so they can check external state
    # each time the next example is requested
    for eg in examples:
        if SOME_CONDITION:
            yield eg
        elif SOME_OTHER_CONDITION:
            yield something_else(eg)  # e.g. a modified version of the task
    for eg in other_examples:
        # fall through to a second source of examples, and so on
        yield eg
```

While the general config is defined once per session/process, setting up a more complex stream generator could be an option to dynamically decide which examples to send out (e.g. which labels, which types of data and so on).
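To make that concrete, here's a minimal sketch of one way to do it, assuming a hypothetical current_label.json file that your overview screen writes whenever the user picks a label (the file name and structure are made up for illustration):

```python
import json
from pathlib import Path

# Hypothetical shared state, written by the overview screen
STATE_FILE = Path("current_label.json")

def current_label():
    return json.loads(STATE_FILE.read_text())["label"]

def get_stream(examples):
    for eg in examples:
        # Re-read the external state for every task that's sent out,
        # so a label change takes effect without restarting the server
        task = dict(eg)
        task["label"] = current_label()
        yield task
```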

That's true – when you start the server locally, it'll load your data, compose the stream (often by processing the text with a model) and then start the server. The demo doesn't need to do any of that, because every demo user gets the same example data, so we can have a single server that keeps running and already has all of the demo data pre-loaded and pre-processed. When you switch the demo interface, it'll clear all application state and fetch the demo data for the selected view.

This wouldn't really work if you run Prodigy yourself, at least not with the current REST API. Each process has its own standalone REST API and web application that's only concerned with the current recipe.

That sounds good then – just wanted to bring it up (also in case others find the thread later, to avoid confusion)!

I understand – it makes perfect sense for the NER case as you describe it. My problem comes with the binary text categorization case. If I understand correctly, there is no clear way to have the server switch categories. I would have to kill the server and start it again with the new category as the --label parameter, right?
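In other words, kill the process and run something like this every time the category changes (dataset, model and file names are placeholders):

```
prodigy textcat.teach my_dataset en_core_web_sm my_data.jsonl --label CATEGORY_B
```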

That seems promising! I could fit the logic needed in the stream, and even call the model there to decide what to show in the UI. I'll look into that and report my findings back, as I suspect this could be useful for others too.
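Roughly what I have in mind, reusing the hypothetical current_label() helper from your sketch above (the model name is a placeholder for our own trained model):

```python
import spacy

nlp = spacy.load("my_textcat_model")  # placeholder model name

def get_stream(examples, threshold=0.5, band=0.25):
    for eg in examples:
        label = current_label()  # the hypothetical helper from above
        score = nlp(eg["text"]).cats.get(label, 0.0)
        # Only send out examples the model is uncertain about
        if abs(score - threshold) < band:
            yield {"text": eg["text"], "label": label,
                   "meta": {"score": score}}
```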