I've got Prodigy hosted within our environment behind a port forward, so I access it via my IP address and port. My team never had this issue historically, but now whenever we launch a teach (prodigy textcat.teach) labelling session, it crashes randomly, more for some users than others. We use this format to start sessions: http://MYIP:MYPORT/?session=username
Lately, though, I've had to advise people to just ignore the crashed session and append a number to their name, like so: http://MYIP:MYPORT/?session=username1 http://MYIP:MYPORT/?session=username2 http://MYIP:MYPORT/?session=username3 http://MYIP:MYPORT/?session=username4
But even doing this, we get to the point where we can only label one or two items before it crashes again.
It's very odd, as we didn't have this problem previously. I'm wondering what solutions there are to this. Could it depend on the EC2 instance type that the session is launched from? I doubt it's the data, as we've labelled the same data sets previously without this issue.
Thanks for your question and welcome to the Prodigy community!
You mentioned this problem wasn't an issue in the past. What changed?
Did you have this same architecture on local machines and only now moved to AWS (since you mentioned EC2)?
Or, alternatively, is it specific to textcat.teach? Does the same problem happen with non-model-in-the-loop recipes like textcat.manual or ner.manual? Just as a test, get some sample data and run through a few examples with the manual recipes, trying to hit multiple sessions at the same time. If that works, it would suggest that something is going wrong when running the model.
Are your documents large or very long?
Granted, you're using textcat.teach, which may not be as intensive as ner.teach (which uses beam search).
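If it helps to put numbers on the document-length question, here is a rough sketch that summarizes text lengths in a JSONL source. It assumes each record stores its content under a "text" key; the function name and the file path in the usage note are just placeholders, not anything Prodigy provides.

```python
import json
import statistics


def text_length_summary(lines, text_key="text"):
    """Summarize character lengths of the text field in JSONL lines.

    Lines that are not valid JSON, or that lack a string under `text_key`,
    are counted as skipped rather than raising an error.
    """
    lengths = []
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        text = record.get(text_key) if isinstance(record, dict) else None
        if not isinstance(text, str):
            skipped += 1
            continue
        lengths.append(len(text))
    return {
        "count": len(lengths),
        "skipped": skipped,
        "min": min(lengths, default=0),
        "median": statistics.median(lengths) if lengths else 0,
        "max": max(lengths, default=0),
    }
```

You could call it with an open file, e.g. `with open("./data.jsonl", encoding="utf8") as f: print(text_length_summary(f))`. A large "max" or a non-zero "skipped" count would be worth a closer look, although it would not prove that the data is the cause.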
You can set PRODIGY_LOGGING=basic or PRODIGY_LOGGING=verbose as an environment variable in front of your Prodigy command to get more detailed logs. I typically recommend always running with this in development environments.
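As a minimal sketch of what that looks like, the helper below builds a textcat.teach command together with an environment that has PRODIGY_LOGGING set. The dataset name, model, source path and labels are placeholders I made up; substitute whatever you actually launch with.

```python
import os
import shlex


def build_launch(dataset, model, source, labels, log_level="basic"):
    """Return (cmd, env) for a textcat.teach run with Prodigy logging enabled."""
    if log_level not in ("basic", "verbose"):
        raise ValueError("log_level must be 'basic' or 'verbose'")
    # Copy the current environment and add the logging variable.
    env = dict(os.environ)
    env["PRODIGY_LOGGING"] = log_level
    cmd = [
        "prodigy", "textcat.teach",
        dataset, model, source,
        "--label", ",".join(labels),
    ]
    return cmd, env


# Placeholder arguments, for illustration only.
cmd, env = build_launch(
    "my_dataset", "en_core_web_sm", "./data.jsonl", ["LABEL_A", "LABEL_B"]
)
# Print the equivalent shell one-liner.
print(f"PRODIGY_LOGGING={env['PRODIGY_LOGGING']} {shlex.join(cmd)}")
```

Passing the pair to `subprocess.run(cmd, env=env)` would start the server with logging on. Redirecting the output to a file makes it easier to see what was logged just before a session froze.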
Relatedly, you can also append /docs to your Prodigy URL to view the interactive API docs, which may help you figure out whether one particular endpoint is crashing.
Also, though it may not be relevant, could there be any security-related issues since you're running on http://, not https://?
As a best practice, you may want to avoid having users try different session names. I know you're doing this for now as a short-term workaround for the crashing.
But I mention it because in the future, if you need to analyze annotator quality (e.g., track inter-annotator agreement), you'll appreciate having correct session names. In fact, a good practice is setting PRODIGY_ALLOWED_SESSIONS so that you have a finite set of allowed sessions. For instance, PRODIGY_ALLOWED_SESSIONS=alex,jo would only allow ?session=alex and ?session=jo, and any other value would raise an error. This prevents fat-finger errors with invalid session names.
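To illustrate, here is a small sketch that turns a list of annotator names into the comma-separated value for PRODIGY_ALLOWED_SESSIONS and the matching session URLs to hand out. Both helper functions are hypothetical conveniences, not part of Prodigy, and the base URL is the placeholder from your post.

```python
from urllib.parse import urlencode


def allowed_sessions_value(names):
    """Build the comma-separated value for PRODIGY_ALLOWED_SESSIONS."""
    cleaned = [name.strip() for name in names if name.strip()]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("duplicate session names")
    return ",".join(cleaned)


def session_urls(base_url, names):
    """Map each annotator name to the URL they should open."""
    base = base_url.rstrip("/")
    return {name: f"{base}/?{urlencode({'session': name})}" for name in names}


annotators = ["alex", "jo"]
print("PRODIGY_ALLOWED_SESSIONS=" + allowed_sessions_value(annotators))
for name, url in session_urls("http://MYIP:MYPORT", annotators).items():
    print(name, url)
```

Generating the URLs from the same list that feeds the environment variable keeps the two from drifting apart, so each annotator always gets one fixed session name.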
This is secondary to your key problem, so you can ignore it for now. Hopefully some of the ideas above help identify the crashing problem; afterwards, this may be something good to build into your process.
Hope this provides some avenues to explore. Let us know if this works (or doesn't). Also, feel free to provide logs or other details.
Hello, thank you and thanks for the speedy response!
The only change that has occurred is that we've changed the EC2 instance type from one with no GPU to one with a GPU, from an m4.4xlarge to a p3.2xlarge. I'm not sure if this could be related? I've monitored the runtime through htop in the terminal, though, and can't see Prodigy using much memory or CPU (max 2GB).
We've only used textcat.teach, and when I say we've not had this issue previously, I mean it was never this bad: it used to crash only for shorter periods of time and less regularly.
The documents vary in length, and there are definitely some quite long texts. However, length doesn't seem to be what's affecting the system, as you can label some very long texts very quickly and it can crash on the shortest piece of text. It also crashes as you select the label: it freezes with a highlighted frame of either red or green around the text, depending on whether you reject or accept the piece of text.
I've not noticed anything memory related.
The only other detail is that I seem to be able to reproduce the crash. For example, if I have a crashed session on: http://MYIP:MYPORT/?session=username3
then close the tab and reopen the same URL, it will let me smoothly label the same 4 items and then crash on the 5th, the exact piece of text that it crashed on previously.
However, if I open a new URL, say with username4, it will label smoothly for quite a while and sometimes doesn't crash even on very large pieces of text.
This is a good point. I will test it again with logging; hopefully this will highlight some issue.