Prodigy supports CSV files as a data source by default, and as detailed at https://prodi.gy/docs/api-loaders#input, it supports a "Meta" column for metadata that is then included in the task object as {"text":"task text provided", "meta":{"meta":"metadata value provided"}}. Is it really not possible to provide metadata with a custom key in the CSV format? I.e., I would ideally like to provide a column called "ID", resulting in {"meta":{"ID": ... }}, but that doesn't seem to work. I can't inspect the code because it's compiled.
Is this possible with the default CSV loader, or do I have to create my own?
Currently, if you have a column called "meta", its values will be added as metadata. This does mean that the CSV loader can only encode one column of metadata.
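To make that limitation concrete, here is a minimal sketch that mimics the behaviour described above using only Python's standard csv module. It is an illustration, not Prodigy's actual loader code; the function name row_to_task and the assumption that the text lives in a column called "text" are hypothetical. Only a column literally named "meta" is mapped, so a column like "ID" has nowhere to go.

```python
import csv
import io

def row_to_task(row):
    """Hypothetical mimic of the described behaviour (not Prodigy's code):
    only a column literally named "meta" is kept as metadata."""
    task = {"text": row["text"]}
    if "meta" in row:
        # The single meta column always ends up under the fixed key "meta"
        task["meta"] = {"meta": row["meta"]}
    return task

csv_data = "text,meta,ID\nHello world,source-a,42\n"
tasks = [row_to_task(row) for row in csv.DictReader(io.StringIO(csv_data))]
print(tasks)
# [{'text': 'Hello world', 'meta': {'meta': 'source-a'}}]
```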
Is there a reason why you can't turn the CSV data into JSONL? If you're using pandas, you could run something like the snippet below to turn a CSV table into a list of dictionaries, with the metadata columns nested under "meta".
import pandas as pd
df = pd.read_csv("data.csv")  # hypothetical file name
list_dict = [{**d, 'meta': {'meta1': d['col_1'], 'meta2': d['col_2']}}
             for d in df.to_dict(orient="records")]
You can then use srsly (e.g. srsly.write_jsonl) to save this to disk as a .jsonl file, which should let you use the JSONL loader.
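If you'd rather not depend on pandas, the same conversion can be sketched with the standard library alone. This version assumes the text sits in a column named "text" and moves every other column into "meta" under its own header name, so an "ID" column becomes {"meta": {"ID": ...}}. It writes with the json module as a stand-in for srsly; the function name csv_to_jsonl is hypothetical.

```python
import csv
import json

def csv_to_jsonl(csv_path, jsonl_path, text_col="text"):
    """Convert a CSV file to JSONL, moving every non-text column into "meta".

    Returns the number of tasks written.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as f_in, \
         open(jsonl_path, "w", encoding="utf-8") as f_out:
        for row in csv.DictReader(f_in):
            text = row.pop(text_col)
            # Every remaining column keeps its header name as the metadata key
            task = {"text": text, "meta": dict(row)}
            f_out.write(json.dumps(task, ensure_ascii=False) + "\n")
            count += 1
    return count
```

You would call something like csv_to_jsonl("data.csv", "data.jsonl") (hypothetical file names) and point the JSONL loader at the output. Note that all CSV values come through as strings.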
My team has built a data management system around Prodigy, and we wanted to support uploading CSV data source files. I was hoping to use the built-in functionality and support arbitrary metadata, but I can certainly create either a middleware layer to convert the CSV to JSONL or my own CSV data loader.
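For the custom-loader route, one option is a small generator that reads the uploaded CSV from any file-like object and yields task dicts lazily, which is the kind of iterable a custom recipe could pass on as its stream. This sketch assumes a hypothetical naming convention where columns prefixed with "meta_" become metadata keys (so "meta_ID" maps to {"meta": {"ID": ...}}), a plain "meta" column is kept as meta["meta"] to match the built-in behaviour, and other columns are copied to the top level of the task. The function name and prefix are illustrative choices, not part of Prodigy.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

META_PREFIX = "meta_"  # hypothetical convention for metadata columns

def stream_csv_tasks(f: TextIO, text_col: str = "text") -> Iterator[Dict]:
    """Yield one task per CSV row; "meta_<key>" columns go into task["meta"][<key>]."""
    for row in csv.DictReader(f):
        task = {"text": row[text_col], "meta": {}}
        for col, value in row.items():
            # Skip the text column and any surplus fields (DictReader keys them as None)
            if col is None or col == text_col:
                continue
            if col == "meta":
                # Keep a plain "meta" column the way the built-in loader does
                task["meta"]["meta"] = value
            elif col.startswith(META_PREFIX):
                # Strip the prefix so "meta_ID" becomes the key "ID"
                task["meta"][col[len(META_PREFIX):]] = value
            else:
                # Non-metadata columns are kept as top-level task fields
                task[col] = value
        yield task

upload = io.StringIO("text,meta_ID,meta_source,label\nHello,42,web,POS\n")
for task in stream_csv_tasks(upload):
    print(task)
# {'text': 'Hello', 'meta': {'ID': '42', 'source': 'web'}, 'label': 'POS'}
```

Because it is a generator, large uploads are processed row by row rather than loaded into memory at once.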