Prodigy crashes on SIGPIPE

Hi,

I'm using Prodigy 1.10.5 and I've run into the following issue:

When a client ends the connection abruptly (during an ongoing HTTP request), Prodigy crashes due to SIGPIPE.
For an HTTP server this is undesirable behavior; SIGPIPE should be handled gracefully. Below is the strace output from just before the crash.

10:19:03 clock_gettime(CLOCK_MONOTONIC, {tv_sec=861, tv_nsec=327235367}) = 0
10:19:03 read(15<socket:[46892]>, 0x555729f8c3bc, 256000) = -1 ECONNRESET (Connection reset by peer)
10:19:03 write(15<socket:[46892]>, "zammSDgmXZ4l1JEX1JFJ85CuO3x3Iqsp"..., 278768) = -1 EPIPE (Broken pipe)
10:19:03 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=295, si_uid=0} ---
10:19:03 +++ killed by SIGPIPE +++
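
For reference, here is a minimal, standalone sketch (my own illustration, nothing Prodigy-specific) of how a write to a closed socket becomes a fatal SIGPIPE. CPython normally sets SIGPIPE to SIG_IGN at startup, so a broken pipe only raises BrokenPipeError; if the default disposition is restored, the same write kills the whole process, which matches the "killed by SIGPIPE" line above:

import signal
import socket

# Restore the default SIGPIPE disposition (terminate the process).
# CPython ignores SIGPIPE at startup, which is why a plain Python script
# normally sees BrokenPipeError instead of dying.
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

a, b = socket.socketpair()   # a pair of connected Unix stream sockets
b.close()                    # the "client" goes away
a.sendall(b"x" * 65536)      # write hits EPIPE, kernel raises SIGPIPE, process dies here
print("never reached")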

I am able to reproduce this in a Docker container on a fresh AWS instance: t2.micro, Amazon Linux 2 (amzn2-ami-hvm-2.0.20201126.0-x86_64-gp2).

Dockerfile:

FROM python:3.7

# Download a large (~4 MB) image so the request takes long enough to interrupt
RUN mkdir /images
RUN curl https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Caerte_van_Oostlant_4MB.jpg/1591px-Caerte_van_Oostlant_4MB.jpg --output /images/file.jpg

WORKDIR /prodigy
COPY prodigy-1.10.5-cp36.cp37.cp38-cp36m.cp37m.cp38-linux_x86_64.whl /prodigy
RUN pip install prodigy-1.10.5-cp36.cp37.cp38-cp36m.cp37m.cp38-linux_x86_64.whl

# Listen on all interfaces so the published port is reachable from the host
ENV PRODIGY_HOST=0.0.0.0
ENV PRODIGY_PORT=5000

CMD prodigy image.manual sigpipe_crash_repro /images --label cat,not_cat

Then:

$ docker build . -t sigpipe_repro
$ docker run -p 5000:5000 -it sigpipe_repro

Finally, open Prodigy in the browser and hit refresh before the request completes.

Hey @Buyan ! :wave:

Thanks for the detailed description and the complete example to try it! :nerd_face:

Nevertheless, I haven't been able to reproduce the issue :pensive:

I tried locally first, and then I created a full remote VM (in my case, on GCP), installed Docker, built the image with your instructions, and followed all the steps.

I also tried using Chrome's developer tools to simulate a slow connection, to be completely sure that the connection was terminated right in the middle of a response, while the server was sending the image. But still, it all seemed to "work correctly".

Could there be anything else interacting with it? Maybe any other AWS layer like a load balancer or similar?

On the other hand, the Prodigy web API is built with FastAPI and run with Uvicorn, and I have never seen this error with FastAPI and Uvicorn in general, so it seems quite strange. :thinking:

Hey @tiangolo, thanks for taking the time to investigate the issue.

I'm certain that this is specific to the underlying OS (I was unable to reproduce it on macOS, for example). I'm not sure which OS you were running on GCP, but I bet that's why you couldn't repro.

This crash can happen while bundle.js is being served - I'd guess this rules out FastAPI, but of course I don't know your exact setup:

write(15<socket:[1669760]>, "(i.push(\"\"),a.push(\"\\n\"));for(va"..., 535465) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=556, si_uid=0} ---
+++ killed by SIGPIPE +++

If I just run a plain Uvicorn 'example' app in the same container, strace also reports the EPIPE and SIGPIPE, but the process doesn't crash (a rough sketch of that app is below the trace):

write(13<socket:[1671656]>, "dworldworldworldworldworldworldw"..., 9377503) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=663, si_uid=0} ---
brk(0x55d93b294000)                     = 0x55d93b294000
epoll_ctl(3<anon_inode:[eventpoll]>, EPOLL_CTL_DEL, 13<socket:[1671656]>, 0x7ffd0ef1e9b4) = 0
close(13<socket:[1671656]>)
...
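
As a point of comparison, this is roughly the kind of minimal ASGI app I mean - a sketch from memory rather than the exact file I ran - which just sends a large plain-text response so the client can drop the connection mid-write:

# example.py - run with: uvicorn example:app --host 0.0.0.0 --port 5000
async def app(scope, receive, send):
    assert scope["type"] == "http"
    body = b"world" * 2_000_000  # big enough that the write outlives a quick client
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [[b"content-type", b"text/plain"]],
    })
    await send({"type": "http.response.body", "body": body})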

Chrome Dev Tools network debugging is what I was using initially, but the following Python snippet causes the server to crash as well and is much more convenient to use - just be sure to run it outside of the Docker container. I believe this also rules out any other 'AWS layers', as all the traffic stays on the host.

import socket

INSTANCE_IP = '127.0.0.1'
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

sock.connect((INSTANCE_IP, 5000))
# Request a large static asset...
sock.sendall(b'GET /bundle.js HTTP/1.0\r\n\r\n')
# ...read only the first 50 bytes of the response...
sock.recv(50)
# ...and close the connection while the server is still writing.
sock.close()

Thanks for the detailed report and the careful steps to reproduce it! :nerd_face::sunglasses::coffee:

That helped a lot to find the root cause of the problem.

We are already writing a fix for it. :bug:

In case you are curious, we had logic to handle possible broken Unix pipes in CLI commands, for things like exporting the annotations in a dataset, where the next command in the pipe could exit with errors.

But that logic conflicted with the same SIGPIPE signal the server receives on some systems when a client closes the connection in the middle of a response (as you demonstrated). With the new fix, broken pipes in the CLI are handled in a way that doesn't affect the internal signal handlers installed by Uvicorn. :sparkles:
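
If it helps to picture the idea, this is the general pattern (just an illustrative sketch, not the actual Prodigy code): instead of changing the process-wide SIGPIPE disposition, a CLI command can catch BrokenPipeError around its own output and exit cleanly, leaving the server's signal handling untouched:

import os
import sys

def write_lines(lines):
    # Illustrative only: handle a downstream command (e.g. `head`) closing
    # the pipe early, without touching the global SIGPIPE disposition.
    try:
        for line in lines:
            sys.stdout.write(line + "\n")
        sys.stdout.flush()
    except BrokenPipeError:
        # Point stdout at /dev/null so Python's shutdown flush doesn't raise again,
        # then exit with the conventional 128 + SIGPIPE (13) status.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(128 + 13)

if __name__ == "__main__":
    # e.g.: python this_script.py | head
    write_lines(str(i) for i in range(1_000_000))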

Just released v1.10.6, which includes the fix described above :slightly_smiling_face: