[BrokenPipeError: [Errno 32] Broken pipe when using custom loader with remote RDS host

alexf_a · May 29, 2020, 3:55pm

Hello,

I just purchased the Prodigy tool and am trying to set it up in a virtual machine that is connected to a secure database (we're dealing with sensitive data that we don't want to persist on the machine).

I read, in the custom loader docs, that you can write a loader to load data from a SQL query and pipe it to prodigy.

Here is the Python I used to do that

import boto3
import psycopg2
import argparse
import json
import sys

#Parse command line args
parser = argparse.ArgumentParser("Parse args to load data from remote RDS")
parser.add_argument("host", type=str)
parser.add_argument("db", type=str)
parser.add_argument("table", type=str)
args = parser.parse_args()
#Create db connection
pw = "some_password"
conn = psycopg2.connect(database=args.db, user="user", password=pw, host=args.host, port="5432")
cur = conn.cursor()
#Get all rows
q = '''SELECT * FROM ''' + str(args.table)
cur.execute(q)
rows = cur.fetchall()
#Output to stdout for Prodigy to read
for row in rows:
    task = {"text": row[1]}
    #I also tried this with print
    sys.stdout.write(json.dumps(task))

Here is the command I run

python3 load_prodigy_data.py "host" "db_name" "table_name" | prodigy ner.manual new_db en_core_web_log

I tested out the data loading script without piping to prodigy and it worked fine.

Let me know what I'm missing

Alex

ines · May 29, 2020, 4:07pm

Hi! Your script looks reasonable Could you share the full error message and traceback of the broken pipe error (presumably raised by psycopg2)?

alexf_a · May 29, 2020, 4:23pm

Traceback (most recent call last):
  File "load_prodigy_data.py", line 25, in <module>
    sys.stdout.write(json.dumps(task))
BrokenPipeError: [Errno 32] Broken pipe
Killed

Here ya go

ines · May 29, 2020, 4:33pm

Thanks! Maybe you're actually hitting this. Could you try the suggestion from here and see what happens? https://stackoverflow.com/a/16865106/6400719

alexf_a · May 29, 2020, 4:47pm

Now the stack trace disappears and it just says

Killed

alexf_a · May 29, 2020, 8:15pm

Resolved! It was actually a memory issue with the VM...very misleading error

ines · May 30, 2020, 9:56am

Yay, glad it worked! (Maybe there's some Python magic you can use that ensures your errors are raised more explicitly as you're piping forward. It seems to be common issue, so maybe there's a solution.)

Topic		Replies	Views
Sample code for streaming examples from a database? usage , custom , solved	2	1045	September 9, 2018
Batch size ignored for custom loader? usage , streams	15	754	November 5, 2020
Custom loader and Stream not compatible? usage	6	33	August 29, 2024
Streaming data from a dataframe directly usage , solved , streams	3	891	August 27, 2020
Create Custom Loader usage , ner , custom	21	3876	August 14, 2019

[BrokenPipeError: [Errno 32] Broken pipe when using custom loader with remote RDS host

Related topics