BrokenPipeError: [Errno 32] Broken pipe when using custom loader with remote RDS host

Hello,

I just purchased the Prodigy tool and am trying to set it up in a virtual machine that is connected to a secure database (we're dealing with sensitive data that we don't want to persist on the machine).

I read in the custom loader docs that you can write a loader that loads data from a SQL query and pipes it to Prodigy.

Here is the Python script I used to do that:

```python
import argparse
import json
import sys

import psycopg2
from psycopg2 import sql

# Parse command-line args
parser = argparse.ArgumentParser(description="Load data from a remote RDS host")
parser.add_argument("host", type=str)
parser.add_argument("db", type=str)
parser.add_argument("table", type=str)
args = parser.parse_args()

# Create DB connection
pw = "some_password"
conn = psycopg2.connect(database=args.db, user="user", password=pw, host=args.host, port="5432")
cur = conn.cursor()

# Get all rows (quote the table name as an identifier instead of concatenating strings)
q = sql.SQL("SELECT * FROM {}").format(sql.Identifier(args.table))
cur.execute(q)
rows = cur.fetchall()

# Output one JSON task per line to stdout for Prodigy to read
for row in rows:
    task = {"text": row[1]}
    # I also tried this with print(json.dumps(task)), which adds the newline itself
    sys.stdout.write(json.dumps(task) + "\n")

cur.close()
conn.close()
```

Here is the command I run:

```bash
python3 load_prodigy_data.py "host" "db_name" "table_name" | prodigy ner.manual new_db en_core_web_lg
```

I tested the data loading script on its own, without piping to Prodigy, and it worked fine.
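Running the loader on its own shows that it prints the rows, but not that the output is in a shape Prodigy can parse. As far as I know, Prodigy reads piped input as newline-delimited JSON (one task per line), and `sys.stdout.write` (unlike `print`) does not add the newline, so every task ends up glued onto a single line. Below is a minimal sketch of a stand-alone format check; `check_jsonl` is a hypothetical helper, and requiring a string `text` field only mirrors the key the loader already emits.

```python
import json
from typing import Iterable, List


def check_jsonl(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in newline-delimited JSON task lines.

    Hypothetical helper: it only checks that each non-blank line is a JSON
    object with a string "text" field (the key the loader writes).
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            task = json.loads(line)
        except json.JSONDecodeError as exc:
            # Several objects on one line (no "\n" between them) fail here
            # with the message "Extra data".
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(task, dict) or not isinstance(task.get("text"), str):
            problems.append(f"line {lineno}: expected an object with a string 'text' field")
    return problems


good = ['{"text": "first"}\n', '{"text": "second"}\n']
# What sys.stdout.write(json.dumps(task)) without "\n" produces for two tasks:
glued = ['{"text": "first"}{"text": "second"}']
print(check_jsonl(good))   # []
print(check_jsonl(glued))  # ['line 1: invalid JSON (Extra data)']
```

Saved as its own file with something like `sys.exit(1 if check_jsonl(sys.stdin) else 0)` at the bottom, it could sit at the end of the same pipe in place of Prodigy, e.g. `python3 load_prodigy_data.py "host" "db_name" "table_name" | python3 check_jsonl.py` (file name hypothetical).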

Let me know what I'm missing :slight_smile:

Alex

Hi! Your script looks reasonable :slightly_smiling_face: Could you share the full error message and traceback of the broken pipe error (presumably raised by psycopg2)?

```
Traceback (most recent call last):
  File "load_prodigy_data.py", line 25, in <module>
    sys.stdout.write(json.dumps(task))
BrokenPipeError: [Errno 32] Broken pipe
Killed
```

Here ya go

Thanks! Maybe you're actually hitting the issue described in this answer. Could you try its suggestion and see what happens? https://stackoverflow.com/a/16865106/6400719

Now the stack trace disappears and it just says

Killed

Resolved! It was actually a memory issue with the VM. Very misleading error :thinking:
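Since the root cause was memory, it may be worth noting that `cur.fetchall()` loads the entire table into the loader's memory before a single task is written. Below is a minimal sketch that instead pulls rows in batches with the standard DB-API `fetchmany()` method and yields tasks as they arrive. The `stream_tasks` name and the batch size of 1000 are arbitrary choices for illustration, and it is only an assumption that the loader (rather than Prodigy or the spaCy model) was the main memory consumer on the VM. With psycopg2's default client-side cursor, `execute()` already downloads the whole result set, so keeping memory flat also needs a named (server-side) cursor, as noted in the code comment.

```python
import json
import sqlite3
from typing import Iterator


def stream_tasks(cursor, batch_size: int = 1000, text_index: int = 1) -> Iterator[dict]:
    """Yield {"text": ...} tasks from an already-executed DB-API cursor.

    Rows are pulled ``batch_size`` at a time with fetchmany() instead of all
    at once with fetchall(). ``text_index`` assumes the text is the second
    column, as in the loader script.
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty batch means the result set is exhausted
            return
        for row in batch:
            yield {"text": row[text_index]}


# With psycopg2, a named (server-side) cursor keeps the rows on the server until
# fetchmany() asks for them; the default cursor downloads everything on execute():
#     cur = conn.cursor(name="prodigy_loader")  # hypothetical cursor name
#     cur.execute(query)
#     for task in stream_tasks(cur):
#         print(json.dumps(task))

# Demo against an in-memory SQLite table standing in for the Postgres table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO docs (body) VALUES (?)", [(f"text {i}",) for i in range(5)])
cur = conn.execute("SELECT id, body FROM docs ORDER BY id")
tasks = list(stream_tasks(cur, batch_size=2))
for task in tasks:
    print(json.dumps(task))
```

The generator keeps the "one task per line" output unchanged while bounding how many rows Python holds at a time; the batch size trades round trips to the database against memory.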


Yay, glad it worked! (Maybe there's some Python magic you can use that ensures your errors are raised more explicitly as you're piping forward. It seems to be a common issue, so maybe there's a solution.)
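On the "more explicit errors" idea: when the process reading the pipe dies (here most likely the reader being killed for running out of memory, though that is an assumption), the writer only sees a generic `BrokenPipeError`. Below is a minimal sketch that catches it and prints a clearer message to stderr, including how many tasks were written before the reader went away. The function names, message wording, and the `_ClosedPipe` test stand-in are illustrative choices. The `os.devnull` redirect follows the pattern the Python `signal` module docs recommend, so the interpreter's final flush of stdout does not raise a second error. The same docs advise against resetting `SIGPIPE` to `SIG_DFL` to silence the error, since the program would then exit silently whenever any socket it writes to is interrupted. This still can't tell *why* the reader exited; the kernel log (e.g. `dmesg`) is where out-of-memory kills show up.

```python
import io
import json
import os
import sys
from typing import Iterable, TextIO, Tuple


def write_tasks(tasks: Iterable[dict], out: TextIO, err: TextIO) -> Tuple[int, bool]:
    """Write one JSON task per line; return (tasks written, completed?).

    "Written" counts tasks handed to ``out``; with a buffered stdout some of
    them may not have reached the reader before the pipe broke.
    """
    written = 0
    try:
        for task in tasks:
            out.write(json.dumps(task) + "\n")
            written += 1
        # Flush inside the try so a broken pipe surfaces here, not at exit.
        out.flush()
    except BrokenPipeError:
        err.write(
            f"loader: downstream process closed the pipe after {written} task(s); "
            "it may have crashed or been killed (e.g. out of memory)\n"
        )
        return written, False
    return written, True


def main(tasks: Iterable[dict]) -> None:
    """Entry point for the loader: write to stdout, exit 1 on a broken pipe."""
    _, completed = write_tasks(tasks, sys.stdout, sys.stderr)
    if not completed:
        # Pattern from the Python `signal` docs: point stdout at devnull so the
        # interpreter's final flush doesn't raise a second BrokenPipeError.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)


class _ClosedPipe(io.StringIO):
    """Stand-in for a pipe whose reader exits after ``fail_after`` writes (demo only)."""

    def __init__(self, fail_after: int):
        super().__init__()
        self.fail_after = fail_after

    def write(self, s: str) -> int:
        if self.fail_after <= 0:
            raise BrokenPipeError(32, "Broken pipe")
        self.fail_after -= 1
        return super().write(s)


err = io.StringIO()
written, completed = write_tasks(
    [{"text": "a"}, {"text": "b"}, {"text": "c"}], _ClosedPipe(fail_after=2), err
)
print(written, completed)  # 2 False
print(err.getvalue(), end="")
```

In the loader, the final `for` loop would be replaced by a call like `main({"text": row[1]} for row in rows)`.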