We are also using a custom stream function that creates a generator of pre-signed S3 URLs.
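For context, here is a minimal sketch of what such a stream might look like. The function name `stream_presigned` and the task shape are assumptions, not your actual recipe; the signing step is injected as a callable so the sketch stays self-contained (in a real stream it would typically be `boto3.client("s3").generate_presigned_url(...)`):

```python
from typing import Callable, Iterable, Iterator

def stream_presigned(keys: Iterable[str], presign: Callable[[str], str]) -> Iterator[dict]:
    """Yield one annotation task per S3 key, with a pre-signed URL for the asset.

    `presign` is the signing step; with boto3 it would usually be something like:
        s3 = boto3.client("s3")
        presign = lambda key: s3.generate_presigned_url(
            "get_object", Params={"Bucket": "my-bucket", "Key": key}, ExpiresIn=3600
        )
    (bucket name and expiry here are illustrative).
    """
    for key in keys:
        # Lazily produce tasks so the server only signs URLs as they are pulled.
        yield {"image": presign(key), "meta": {"key": key}}
```

Because this is a generator, URLs are signed on demand as annotators pull from the stream rather than all up front.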
The app displays a "Loading..." message before showing the classic "Oops, something went wrong" error. After inspecting the browser console, we can see that the get_answers request has timed out.
Kubernetes ends up killing the app because it deems it unresponsive.
If you have a lot of users pulling from the stream concurrently, and there's a model in the loop, then the server might indeed take a long time to respond. The problem gets worse when you have example-selection logic such as active learning in place.
For instance, let's say you're using active learning and your model ends up scoring roughly 100 examples to find the 10 highest-impact ones. If all 26 annotators are doing that concurrently, the server is scoring a lot of data per request, and responses are going to stall.
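To make the cost concrete, here is a minimal sketch of that selection step under an uncertainty-sampling assumption (keep the examples whose model score is closest to 0.5). The names `uncertainty` and `select_top` are illustrative, not part of any real recipe, and the point is simply that every request has to touch the whole candidate pool:

```python
import heapq
from typing import Iterable, List

def uncertainty(example: dict) -> float:
    """Higher when the model's score is closest to 0.5, i.e. most uncertain."""
    return 1.0 - abs(example["score"] - 0.5)

def select_top(examples: Iterable[dict], k: int = 10) -> List[dict]:
    """Scan the full candidate pool once and keep the k most uncertain examples."""
    return heapq.nlargest(k, examples, key=uncertainty)
```

Even though `heapq.nlargest` is efficient, the model still has to produce a score for all ~100 candidates first, and that full scan happens per annotator pulling from the stream.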
I'll send you a calendar invite. There are a few things we could suggest, including a custom recipe that moves the model out into a separate service that the main app then communicates with.
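As a rough idea of what that could look like, here is a sketch of a stream that sends tasks to an external scoring service in batches instead of running the model in-process. Everything here is hypothetical: the endpoint `MODEL_URL`, the function names, and the batch size are placeholders, and the HTTP call is injectable so the sketch is self-contained:

```python
import json
import urllib.request
from typing import Iterable, Iterator, List

MODEL_URL = "http://model-service:8000/score"  # hypothetical endpoint for the model service

def post_json(url: str, payload: list) -> list:
    """POST a JSON batch to the model service and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def scored_stream(tasks: Iterable[dict], post=post_json, batch_size=10) -> Iterator[dict]:
    """Batch tasks and attach scores from the external service.

    The main app only does cheap HTTP calls here; the heavy model work
    (and any example-selection logic) lives in the separate service,
    which can be scaled and health-checked independently.
    """
    batch: List[dict] = []
    for task in tasks:
        batch.append(task)
        if len(batch) == batch_size:
            for t, score in zip(batch, post(MODEL_URL, batch)):
                yield {**t, "score": score}
            batch = []
    if batch:  # flush the final partial batch
        for t, score in zip(batch, post(MODEL_URL, batch)):
            yield {**t, "score": score}
```

Keeping the model behind its own service also means a slow scoring pass no longer blocks the annotation server's request handling, so Kubernetes has no reason to kill the app.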