Prodigy crashes on SIGPIPE

Hi,

I'm using Prodigy 1.10.5 and I've run into the following issue:

When a client ends the connection abruptly (during an ongoing HTTP request), Prodigy crashes due to SIGPIPE.
For an HTTP server this is undesirable behavior; SIGPIPE should be handled gracefully. Below is the strace output from just before the crash.

10:19:03 clock_gettime(CLOCK_MONOTONIC, {tv_sec=861, tv_nsec=327235367}) = 0
10:19:03 read(15<socket:[46892]>, 0x555729f8c3bc, 256000) = -1 ECONNRESET (Connection reset by peer)
10:19:03 write(15<socket:[46892]>, "zammSDgmXZ4l1JEX1JFJ85CuO3x3Iqsp"..., 278768) = -1 EPIPE (Broken pipe)
10:19:03 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=295, si_uid=0} ---
10:19:03 +++ killed by SIGPIPE +++
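
For reference, here is a minimal, standalone sketch (my own illustration, nothing Prodigy-specific) of how a write to a closed socket becomes a fatal SIGPIPE. CPython normally sets SIGPIPE to SIG_IGN at startup, so a broken pipe only raises BrokenPipeError; if the default disposition is restored, the same write kills the whole process, which matches the "killed by SIGPIPE" line above:

import signal
import socket

# Restore the default SIGPIPE disposition (terminate the process).
# CPython ignores SIGPIPE at startup, which is why a plain Python script
# normally sees BrokenPipeError instead of dying.
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

a, b = socket.socketpair()   # a pair of connected Unix stream sockets
b.close()                    # the "client" goes away
a.sendall(b"x" * 65536)      # write hits EPIPE, kernel raises SIGPIPE, process dies here
print("never reached")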

I am able to reproduce this in a Docker container on a fresh AWS instance: t2.micro, Amazon Linux 2 (amzn2-ami-hvm-2.0.20201126.0-x86_64-gp2).

Dockerfile:

FROM python:3.7

# Download a large (~4 MB) image so the request takes long enough to interrupt
RUN mkdir /images
RUN curl https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Caerte_van_Oostlant_4MB.jpg/1591px-Caerte_van_Oostlant_4MB.jpg --output /images/file.jpg

WORKDIR /prodigy
COPY prodigy-1.10.5-cp36.cp37.cp38-cp36m.cp37m.cp38-linux_x86_64.whl /prodigy
RUN pip install prodigy-1.10.5-cp36.cp37.cp38-cp36m.cp37m.cp38-linux_x86_64.whl

# Listen on all interfaces so the published port is reachable from the host
ENV PRODIGY_HOST=0.0.0.0
ENV PRODIGY_PORT=5000

CMD prodigy image.manual sigpipe_crash_repro /images --label cat,not_cat

Then:

$ docker build . -t sigpipe_repro
$ docker run -p 5000:5000 -it sigpipe_repro

Finally, open Prodigy in the browser and hit refresh before the request completes.

Hey @Buyan ! :wave:

Thanks for the detailed description and the complete example to try it! :nerd_face:

Nevertheless, I haven't been able to reproduce the issue :pensive:

I tried locally first, and then I created a full remote VM (in my case, on GCP), installed Docker, built the image with your instructions, and followed all the steps.

I also tried using Chrome's developer tools to simulate a slow connection, to be completely sure that the connection was terminated right in the middle of a response, while the server was sending the image. But still, it all seemed to "work correctly".

Could there be anything else interacting with it? Maybe any other AWS layer like a load balancer or similar?

On the other hand, the Prodigy web API is built with FastAPI and run with Uvicorn, and I have never seen this error with FastAPI and Uvicorn in general, so it seems quite strange. :thinking:

Hey @tiangolo, thanks for taking the time to investigate the issue.

I'm certain that this is specific to the underlying OS (I was unable to reproduce it on macOS, for example). I'm not sure which OS you were running on GCP, but I bet that's why you couldn't repro.

This crash can happen while bundle.js is being served - I'd guess this rules out FastAPI, but of course I don't know your exact setup:

write(15<socket:[1669760]>, "(i.push(\"\"),a.push(\"\\n\"));for(va"..., 535465) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=556, si_uid=0} ---
+++ killed by SIGPIPE +++

If I just run a plain Uvicorn 'example' app in the same container, strace also reports the EPIPE and SIGPIPE, but the process doesn't crash (a rough sketch of that app is below the trace):

write(13<socket:[1671656]>, "dworldworldworldworldworldworldw"..., 9377503) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=663, si_uid=0} ---
brk(0x55d93b294000)                     = 0x55d93b294000
epoll_ctl(3<anon_inode:[eventpoll]>, EPOLL_CTL_DEL, 13<socket:[1671656]>, 0x7ffd0ef1e9b4) = 0
close(13<socket:[1671656]>)
...
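
As a point of comparison, this is roughly the kind of minimal ASGI app I mean - a sketch from memory rather than the exact file I ran - which just sends a large plain-text response so the client can drop the connection mid-write:

# example.py - run with: uvicorn example:app --host 0.0.0.0 --port 5000
async def app(scope, receive, send):
    assert scope["type"] == "http"
    body = b"world" * 2_000_000  # big enough that the write outlives a quick client
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [[b"content-type", b"text/plain"]],
    })
    await send({"type": "http.response.body", "body": body})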

Chrome Dev Tools network debugging is what I was using initially, but the following Python snippet causes the server to crash as well and is much more convenient to use - just be sure to run it outside of the Docker container. I believe this also rules out any other 'AWS layers', as all the traffic stays on the host.

import socket

INSTANCE_IP = '127.0.0.1'
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

sock.connect((INSTANCE_IP, 5000))
# Request a large static asset...
sock.sendall(b'GET /bundle.js HTTP/1.0\r\n\r\n')
# ...read only the first 50 bytes of the response...
sock.recv(50)
# ...and close the connection while the server is still writing.
sock.close()

Thanks for the detailed report and the careful steps to reproduce it! :nerd_face::sunglasses::coffee:

That helped a lot to find the root cause of the problem.

We are already writing a fix for it. :bug:

In case you are curious, we had logic to handle possible broken Unix pipes in CLI commands, for things like exporting the annotations in a dataset, where the next command in the pipe could exit with errors.

But that logic conflicted with the same SIGPIPE signal the server receives on some systems when a client closes the connection in the middle of a response (as you demonstrated). With the new fix, broken pipes in the CLI are handled in a way that doesn't affect the internal signal handlers installed by Uvicorn. :sparkles:
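
If it helps to picture the idea, this is the general pattern (just an illustrative sketch, not the actual Prodigy code): instead of changing the process-wide SIGPIPE disposition, a CLI command can catch BrokenPipeError around its own output and exit cleanly, leaving the server's signal handling untouched:

import os
import sys

def write_lines(lines):
    # Illustrative only: handle a downstream command (e.g. `head`) closing
    # the pipe early, without touching the global SIGPIPE disposition.
    try:
        for line in lines:
            sys.stdout.write(line + "\n")
        sys.stdout.flush()
    except BrokenPipeError:
        # Point stdout at /dev/null so Python's shutdown flush doesn't raise again,
        # then exit with the conventional 128 + SIGPIPE (13) status.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(128 + 13)

if __name__ == "__main__":
    # e.g.: python this_script.py | head
    write_lines(str(i) for i in range(1_000_000))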

Just released v1.10.6, which includes the fix described above :slightly_smiling_face: