TypeError: prodigy.components.loaders.CSV() takes no keyword arguments

Hi all! First of all, I am a new Prodigy user, so I am not sure whether I am using it incorrectly or whether there is actually a bug.
I am trying to load a stream:

stream = CSV("scorecard_ES_B.csv", delimiter=";")

But I get the following error:

stream = CSV("scorecard_ES_B.csv", delimiter=";")
TypeError: prodigy.components.loaders.CSV() takes no keyword arguments

I read in the documentation that CSV accepts "delimiter" as an additional keyword argument.
Please help

Best regards

hi @zparcheta!

Thanks for your question and welcome to the Prodigy community :wave:

Yes, I think you're right. I got the same problem. I went to the source code and found we don't even use the delimiter as an argument :slight_smile:

Prodigy pro tip: you can find the built-in code by running prodigy stats and looking at the Location: path. The CSV loader is in the file components/loaders.py under that path, so you could even modify it yourself in the meantime so that the built-in loader works correctly.

The good news is this looks like a minor fix that I can put in a ticket. It likely won't get released until our next patch.

In the meantime, I've created a custom Python script that can convert your .csv to a .jsonl (Prodigy's preferred format):

# csv_to_jsonl.py
import csv
import json
import typer

def csv_to_jsonl(csv_file, jsonl_file, delimiter):
    # newline='' lets the csv module handle line breaks inside quoted fields
    with open(csv_file, 'r', newline='') as file:
        reader = csv.DictReader(file, delimiter=delimiter)
        rows = list(reader)

    jsonl_data = []
    for row in rows:
        # "text" becomes the task text; all remaining columns become metadata
        json_data = {"text": row.pop("text"), "meta": row}
        jsonl_data.append(json.dumps(json_data))

    with open(jsonl_file, 'w') as file:
        file.write('\n'.join(jsonl_data))

    typer.echo(f"Conversion complete. JSONL file created: {jsonl_file}")

def main(csv_file: str, jsonl_file: str, delimiter: str):
    csv_to_jsonl(csv_file, jsonl_file, delimiter)

if __name__ == "__main__":
    typer.run(main)

Let's assume you have this input.csv (semicolon delimited):

text;column1;column2
"This is a sentence";"Some meta info";"Some other meta info"

The additional columns are optional and you should only include them if you want them to be included in the interface as metadata.

You can convert it to a .jsonl by running:

python csv_to_jsonl.py input.csv output.jsonl ";" 
# output.jsonl
{"text": "This is a sentence", "meta": {"column1": "Some meta info", "column2": "Some other meta info"}}
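
Before pointing a recipe at the converted file, it can be worth checking that every line really is a JSON object with a "text" key. Here is a minimal sketch using only the standard library; check_jsonl is a hypothetical helper name for illustration, not part of Prodigy:

```python
import json

def check_jsonl(path):
    """Return the number of valid tasks in a .jsonl file.

    Raises ValueError if a non-empty line is not a JSON object
    or has no "text" key. (check_jsonl is a hypothetical helper.)
    """
    count = 0
    with open(path, "r", encoding="utf8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            task = json.loads(line)  # invalid JSON raises a ValueError subclass
            if not isinstance(task, dict) or "text" not in task:
                raise ValueError(f"Line {line_no} is not a task with a 'text' key")
            count += 1
    return count
```

For the example above, check_jsonl("output.jsonl") should return 1, since the file contains a single task.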

Alternatively, you can use it as a custom loader by turning the script into a Prodigy recipe and piping its output into another recipe:

# csv_to_jsonl_prodigy.py
import csv
import json
import typer
import prodigy

@prodigy.recipe("csv_to_jsonl")
def csv_to_jsonl(csv_file, delimiter):
    # newline='' lets the csv module handle line breaks inside quoted fields
    with open(csv_file, 'r', newline='') as file:
        reader = csv.DictReader(file, delimiter=delimiter)
        rows = list(reader)

    for row in rows:
        json_data = {"text": row.pop("text"), "meta": row}
        # Print one JSON object per line so the output can be piped into another recipe
        print(json.dumps(json_data))

def main(csv_file: str, delimiter: str):
    csv_to_jsonl(csv_file, delimiter)

if __name__ == "__main__":
    typer.run(main)

Notice that the only differences are adding the Prodigy decorator (@prodigy.recipe("csv_to_jsonl")), importing prodigy, and printing each JSON line instead of writing it to a .jsonl file. You can then pipe the output into another recipe, which reads from standard input when the source is set to -:

python -m prodigy csv_to_jsonl input.csv ";" -F csv_to_jsonl_prodigy.py | python -m prodigy ner.manual ner_data blank:en --label ORG,PERSON -
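
If you're building the stream inside your own Python code, as in your original CSV(...) call, a small generator on top of the standard library csv module could stand in for the built-in loader until the fix ships. This is only a sketch under a couple of assumptions: csv_stream is a hypothetical helper name (not part of Prodigy), and your file has a header row with a "text" column.

```python
import csv

def csv_stream(path, delimiter=",", text_column="text"):
    """Yield one task dict per CSV row (csv_stream is a hypothetical helper).

    The column named by text_column becomes "text"; every other
    column is kept under "meta".
    """
    # newline="" lets the csv module handle line breaks inside quoted fields
    with open(path, "r", newline="", encoding="utf8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        for row in reader:
            text = row.pop(text_column)  # raises KeyError if the column is missing
            yield {"text": text, "meta": row}
```

You would then replace the failing line with something like stream = csv_stream("scorecard_ES_B.csv", delimiter=";"). Because it is a generator, rows are read lazily as the stream is consumed.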

Sorry again for the hassle but thank you a lot for bringing this to our attention!

Dear @ryanwesslen

Thank you very much for your help. It is very useful.