Prodigy crashing

I've got Prodigy hosted within our environment behind a port forward, so I access it via my IP address and port. My team never had this issue historically, but now whenever we launch a teach labelling session (prodigy textcat.teach), the sessions crash randomly, more for some users than others. We use this format to start sessions:
http://MYIP:MYPORT/?session=username

But lately I've had to advise people to just ignore the crashed session and append a number to their name, like so:
http://MYIP:MYPORT/?session=username1
http://MYIP:MYPORT/?session=username2
http://MYIP:MYPORT/?session=username3
http://MYIP:MYPORT/?session=username4

But even doing this, we get to the point where we can only label one or two items before it crashes again.

It's very odd, as we didn't have this problem previously. I'm wondering what solutions there are to this? Could it be dependent on the EC2 instance type that the session is launched from? I doubt it's the data, as we've labelled the same data sets previously and not had this issue.

Would appreciate any advice.

Hi @bev.manz!

Thanks for your question and welcome to the Prodigy community :wave:

You mentioned this problem wasn't an issue in the past. What changed?

Did you have this same architecture on local machines and only now moved to AWS (since you mentioned EC2)?

Or alternatively, is it specific to textcat.teach? Does the same problem happen with non-model-in-the-loop recipes like textcat.manual or ner.manual? As a test, take some sample data and run through a few examples with a manual recipe, hitting multiple sessions at the same time. If that works, it would suggest the problem is in running the model.

Are your documents large or very long?

Granted, you're using textcat.teach, which may not be as intensive as ner.teach, since ner.teach uses beam search.

Any possible issues with memory too?

Are you modifying the port or using the default 8080 port? Any chance there's anything else that runs on that port on your machine?
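One quick way to check whether something else is already listening on the port is a small socket probe, run before starting Prodigy. This is only a sketch and not part of Prodigy: the function name `port_is_in_use` is made up for illustration, and it assumes you run it on the same machine that hosts Prodigy.

```python
import socket


def port_is_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

For example, `port_is_in_use(8080)` returning True while Prodigy is stopped would mean another process already holds the default port.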

Have you also started using Prodigy logging?

You can prefix your Prodigy command with the environment variable PRODIGY_LOGGING=basic or PRODIGY_LOGGING=verbose to get more detailed logs. I typically recommend always running with this in development environments.
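If you launch Prodigy from a script rather than typing the command by hand, the same variable can be set in the child process environment. A minimal sketch: `env_with_logging` and `run_with_logging` are hypothetical helpers, and the dataset, model, source file and labels in the commented-out call are placeholders, not values from this thread.

```python
import os
import subprocess


def env_with_logging(level: str = "basic") -> dict:
    """Copy the current environment and add PRODIGY_LOGGING."""
    if level not in ("basic", "verbose"):
        raise ValueError("level must be 'basic' or 'verbose'")
    env = dict(os.environ)
    env["PRODIGY_LOGGING"] = level
    return env


def run_with_logging(command: list, level: str = "basic") -> int:
    """Run a command with PRODIGY_LOGGING set and return its exit code."""
    return subprocess.run(command, env=env_with_logging(level)).returncode


# Placeholder arguments (assumed for illustration only):
# run_with_logging(
#     ["prodigy", "textcat.teach", "my_dataset", "en_core_web_sm",
#      "data.jsonl", "--label", "LABEL_A,LABEL_B"],
#     level="verbose",
# )
```

Redirecting the output of that process to a file makes it easier to see what was logged just before a session froze.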

Relatedly, you can append /docs to the app URL to view the interactive API docs, so you can try to figure out whether one endpoint is crashing.

Also, though it may not be relevant: could there be any security-related issues, since you're running on http:// rather than https://?

As a best practice, you may want to avoid having users try different session names. I know you're doing this for now b/c of the crashing as a short term solution.

I mention this because if you need to analyze annotator quality in the future (e.g., track inter-annotator agreement), you'll appreciate having correct session names. In fact, a good practice is setting PRODIGY_ALLOWED_SESSIONS so that only a fixed set of session names is allowed. For instance, PRODIGY_ALLOWED_SESSIONS=alex,jo would only allow ?session=alex and ?session=jo, and any other session name would raise an error. This prevents fat-finger errors with invalid session names.
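To make the rule concrete, here is a small illustration of the behaviour described above. It is not Prodigy's own implementation; `parse_allowed_sessions` and `session_url` are hypothetical helpers you might use when generating links to hand out to annotators.

```python
def parse_allowed_sessions(raw: str) -> set:
    """Split a comma-separated value such as 'alex,jo' into a set of names."""
    return {name.strip() for name in raw.split(",") if name.strip()}


def session_url(base_url: str, session: str, allowed: set) -> str:
    """Build a ?session= URL, rejecting names outside the allowed set."""
    if session not in allowed:
        raise ValueError(f"session {session!r} is not in {sorted(allowed)}")
    return f"{base_url}/?session={session}"


allowed = parse_allowed_sessions("alex,jo")
print(session_url("http://MYIP:MYPORT", "alex", allowed))
```

With this list, a name like `alex1` would be rejected up front instead of silently creating a new session.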

This is secondary to your key problem, so you can ignore it for now. Hopefully some of the ideas above help identify the crashing problem; afterwards, this may be something good to build into your process.

Hope this provides some avenues to explore. Let us know if this works (or doesn't). Also, feel free to provide logs or other details.

Hello, thank you and thanks for the speedy response!

The only change that has occurred is that we've changed the EC2 instance type from one without a GPU to one with a GPU, basically from an m4.4xlarge to a p3.2xlarge. I'm not sure if this could be related? I've monitored the runtime through htop in the terminal, though, and can't see Prodigy using much memory or CPU (max 2GB).

We've only used textcat.teach, and when I say we've not had this issue previously, I mean it has just not been this bad: it used to crash only for shorter periods of time and not as regularly.

The texts vary in length, and there are definitely some that are quite long. However, this doesn't seem to be what is impacting the system: you can label some very long texts very quickly, and it can crash on the shortest piece of text. It also crashes as you select the label: for example, it freezes with a highlighted red or green frame around the text, depending on whether you 'accept' or 'reject' the piece of text.

I've not noticed anything memory related.

The only other detail is that I seem to be able to reproduce the crash. For example, if I have a crashed session on:
http://MYIP:MYPORT/?session=username3
then close the tab and reopen the same URL, it will let me smoothly label the same 4 items and then crash on the exact same 5th piece of text that it crashed on previously.
However, if I open a new URL, say username4, it will label smoothly for quite a while and sometimes doesn't crash even on very large pieces of text.

Logging is a good point. I will test again with logging enabled; hopefully this will highlight some issue.