TypeError: prodigy.components.loaders.CSV() takes no keyword arguments

zparcheta · May 17, 2023, 1:47pm

Hi all! First of all, I am a new Prodigy user, so I am not sure whether I am using it incorrectly or whether there is actually a bug.
I am trying to load a stream:

stream = CSV("scorecard_ES_B.csv", delimiter=";")

But I get the following error:

stream = CSV("scorecard_ES_B.csv", delimiter=";")
TypeError: prodigy.components.loaders.CSV() takes no keyword arguments

I read in the documentation that CSV accepts "delimiter" as an additional keyword argument.
Please help.

Best regards

ryanwesslen · May 17, 2023, 7:56pm

hi @zparcheta!

Thanks for your question and welcome to the Prodigy community!

Yes, I think you're right. I got the same error. I went to the source code and found that the loader doesn't actually accept delimiter as an argument.

Prodigy pro tip: you can find the built-in code by looking at the Location: path printed by prodigy stats. This loader is in the file components/loaders.py. You could even modify it now so that the built-in loader works correctly for you.

The good news is this looks like a minor fix that I can put in a ticket. It likely won't get released until our next patch.

In the meantime, I've created a custom Python script that can convert your .csv to a .jsonl (Prodigy's preferred format):

# csv_to_jsonl.py
import csv
import json
import typer

def csv_to_jsonl(csv_file, jsonl_file, delimiter):
    # newline="" lets the csv module handle line breaks inside quoted fields
    with open(csv_file, "r", newline="", encoding="utf-8") as file:
        reader = csv.DictReader(file, delimiter=delimiter)
        rows = list(reader)

    jsonl_data = []
    for row in rows:
        # "text" becomes the task text; all remaining columns go into "meta"
        json_data = {"text": row.pop("text"), "meta": row}
        jsonl_data.append(json.dumps(json_data))

    with open(jsonl_file, "w", encoding="utf-8") as file:
        file.write("\n".join(jsonl_data))

    typer.echo(f"Conversion complete. JSONL file created: {jsonl_file}")

def main(csv_file: str, jsonl_file: str, delimiter: str):
    csv_to_jsonl(csv_file, jsonl_file, delimiter)

if __name__ == "__main__":
    typer.run(main)

Let's assume you have this input.csv (semicolon delimited):

text;column1;column2
"This is a sentence";"Some meta info";"Some other meta info"

The additional columns are optional; only include them if you want them to show up in the interface as metadata.

You can convert it to a .jsonl by running:

python csv_to_jsonl.py input.csv output.jsonl ";"

# output.jsonl
{"text": "This is a sentence", "meta": {"column1": "Some meta info", "column2": "Some other meta info"}}
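If you want to sanity-check the converted file before loading it into Prodigy, something like the sketch below should do. Note that read_jsonl is just a hypothetical helper name for this example, not a Prodigy function, and it only assumes that every line is a JSON object with a "text" key, as in the output above.

```python
import json


def read_jsonl(path):
    """Read a .jsonl file and check that every task has a "text" key.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    tasks = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            task = json.loads(line)
            if "text" not in task:
                raise ValueError(f"line {line_no} has no 'text' key")
            tasks.append(task)
    return tasks
```

Calling read_jsonl("output.jsonl") returns the tasks as a list of dicts, or raises if a line is malformed.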

Alternatively, you can use it as a custom loader by turning the script into a Prodigy recipe and piping its output into another recipe:

# csv_to_jsonl_prodigy.py
import csv
import json
import typer
import prodigy

@prodigy.recipe("csv_to_jsonl")
def csv_to_jsonl(csv_file, delimiter):
    # newline="" lets the csv module handle line breaks inside quoted fields
    with open(csv_file, "r", newline="", encoding="utf-8") as file:
        reader = csv.DictReader(file, delimiter=delimiter)
        rows = list(reader)

    for row in rows:
        # Print one JSON object per line so the output can be piped onward
        json_data = {"text": row.pop("text"), "meta": row}
        print(json.dumps(json_data))

def main(csv_file: str, delimiter: str):
    csv_to_jsonl(csv_file, delimiter)

if __name__ == "__main__":
    typer.run(main)

Notice that the only differences are adding the Prodigy decorator (@prodigy.recipe("csv_to_jsonl")), importing prodigy, and printing each JSON line instead of writing it to a .jsonl file.

python -m prodigy csv_to_jsonl input.csv ";" -F csv_to_jsonl_prodigy.py | python -m prodigy ner.manual ner_data blank:en --label ORG,PERSON -
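If you are building the stream inside your own Python code, as in your original snippet, a small generator can stand in for CSV(...) until the fix is released. This is only a sketch using the standard library: csv_stream is a hypothetical helper name, not a Prodigy loader, and it assumes your file has a "text" column like the input.csv above.

```python
import csv
from typing import Dict, Iterator


def csv_stream(path: str, delimiter: str = ";") -> Iterator[Dict]:
    """Yield one task dict per CSV row.

    Hypothetical helper for illustration; not part of Prodigy.
    Assumes the file has a header row containing a "text" column.
    """
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter=delimiter):
            text = row.pop("text")
            # Remaining columns are kept as metadata, as in the script above
            yield {"text": text, "meta": row}
```

You would then write stream = csv_stream("scorecard_ES_B.csv", delimiter=";") in place of the failing CSV(...) call. Because it is a generator, rows are read lazily rather than loaded into memory all at once.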

Sorry again for the hassle but thank you a lot for bringing this to our attention!

zparcheta · May 24, 2023, 6:52am

Dear @ryanwesslen

Thank you very much for your help. It is very useful.
