Start multiple sessions from Flask

Hi,
I'm afraid this might be a stupid question, and it's not primarily related to Prodigy.

I'm working on a quality assurance system for a text categorization model. For each review task, a separate dataset is going to be created. Each of my annotators has their own dedicated port to work on, and only one annotator is going to work on each task. So really nothing fancy here.

I have a ticket system where the annotators can pick their tasks. The ticket system will then call a Flask REST API that formats the recipe string and starts a web server via prodigy.serve(). Everything works fine. The only problem is that once prodigy.serve() is called, the application blocks, and no other sessions can be started from this Python program. So basically, how do I start multiple Prodigy sessions from within a single Python program?

Thank you.

Hi @MaxB!

Thanks for the great question!

What's the error message you're seeing? If you open your browser's developer tools, do you see a request error there? Maybe CORS related? Is there any traceback in the terminal?

It seems your goal is to use Prodigy from a Flask application. Here's a past post on why to be cautious about running Prodigy within any other web server (like Flask):

The answer can vary, but have you seen this FAQ on multiple sessions/annotators? Here's an excerpt:

One option we recommend is to divide up the annotation work so that each annotator only needs to deal with a small part of the annotation scheme. For instance, if you’re working with many labels, you would start a number of different Prodigy services, each specifying a different label, and each advertising to a different URL. Prodigy can be easily run under automation, for instance within a Kubernetes cluster, to make this approach more manageable. If you do want to have multiple annotators working on one feed, Prodigy has support for that as well via named multi-user sessions. You can create annotator-specific queues using query parameters, or use the query parameters to distinguish the work of different annotators so you can run inter-annotator consistency checks.

Are you aware of multi-user sessions? If they don't fit your goals, definitely keep searching Prodigy Support. There are 50+ threads on handling multiple sessions, and hopefully they can help further depending on your goals.
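For reference, a named session is selected by adding a `session` query parameter to the app's URL (e.g. `http://localhost:8080/?session=alice`). Here's a minimal sketch for generating one URL per annotator, assuming all annotators share a single Prodigy server on one port; `session_url` and `session_urls` are hypothetical helpers, not part of Prodigy:

```python
from typing import Dict, List
from urllib.parse import urlencode


def session_url(host: str, port: int, annotator: str) -> str:
    """URL of a named Prodigy session, e.g. http://localhost:8080/?session=alice."""
    # urlencode escapes names that contain spaces or special characters
    return f"http://{host}:{port}/?{urlencode({'session': annotator})}"


def session_urls(host: str, port: int, annotators: List[str]) -> Dict[str, str]:
    """One URL per annotator, all pointing at the same Prodigy server."""
    return {name: session_url(host, port, name) for name in annotators}
```

Each annotator then opens their own URL, which is what lets you distinguish their work, as the FAQ excerpt mentions.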

Hope this helps!


I can start one Prodigy session without problems. But as pointed out in the linked comment, once a Prodigy session is started via prodigy.serve, the process keeps running and the endpoint never responds, because its return statement is never reached.

My endpoint looks roughly like this (I'm using flask-restful):

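        ticket = get_ticket_by_id(args["ticket_id"])
        cmd_string = backend.format_prodigy_string(ticket)
        # prodigy.serve() starts the server and blocks here until it's stopped,
        # so the endpoint never reaches its return statement
        prodigy.serve(cmd_string, port=ticket['Port'])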

Is there a way to start the process "in the background"? This is my first time developing a complicated web service. I'm a career changer who has a solid grasp of NLP, but very limited knowledge of web development, so I might be missing something very obvious here.

I thought that by making each task its own dataset and giving each annotator a fixed port to work on, I could get around the more complicated solutions.

Hi @MaxB!

Thanks for the background! Your approach makes sense: running each task on its own port is the right call given the different data/tasks, but it gets challenging when you want them to run simultaneously. Running simultaneous processes is even harder without containers and/or an orchestration engine like Kubernetes, which is likely out of scope here.

Another idea is to run these multiple processes in separate terminals rather than from Python. You can do this manually (e.g., by opening different terminal windows) or use a terminal multiplexer like screen (see the comment below):

tmux is another common option, as mentioned below.

To run on different ports from the terminal, as you did with prodigy.serve(), prefix each command with the PRODIGY_PORT environment variable:

    PRODIGY_PORT=8080 prodigy ner.manual dataset1 en_core_web_sm data1.jsonl --label A,B,C
    PRODIGY_PORT=1234 prodigy ner.manual dataset2 en_core_web_sm data2.jsonl --label D,E,F
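Since each session is then just a separate OS process, the same commands can also be started from Python without blocking, because `subprocess.Popen` returns as soon as the child is launched. This is a minimal sketch, assuming Prodigy is installed in the current interpreter's environment (hence `python -m prodigy`); `launch`, `launch_all` and `COMMANDS` are hypothetical names for illustration.

```python
import os
import subprocess
import sys
from typing import Dict, List

# The two commands from the answer as argument lists, keyed by port.
# `python -m prodigy` runs Prodigy from the same environment as this interpreter.
COMMANDS: Dict[int, List[str]] = {
    8080: [sys.executable, "-m", "prodigy", "ner.manual", "dataset1",
           "en_core_web_sm", "data1.jsonl", "--label", "A,B,C"],
    1234: [sys.executable, "-m", "prodigy", "ner.manual", "dataset2",
           "en_core_web_sm", "data2.jsonl", "--label", "D,E,F"],
}


def launch(cmd: List[str], port: int, **popen_kwargs) -> subprocess.Popen:
    """Start `cmd` as its own process with PRODIGY_PORT set; don't wait for it."""
    env = dict(os.environ, PRODIGY_PORT=str(port))
    return subprocess.Popen(cmd, env=env, **popen_kwargs)


def launch_all(commands: Dict[int, List[str]]) -> Dict[int, subprocess.Popen]:
    """Launch every command on its own port and return the handles by port."""
    return {port: launch(cmd, port) for port, cmd in commands.items()}
```

Calling `launch_all(COMMANDS)` starts both servers and returns at once; calling `.terminate()` on a handle stops that session. From your endpoint, something like `launch([sys.executable, "-m", "prodigy", *shlex.split(cmd_string)], ticket['Port'])` would be the equivalent, assuming your formatted string is the recipe name plus its arguments, as it is for `prodigy.serve()`.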

Hope this helps and let me know if you have any further questions!