Importing a CSV with multiple labels

What’s the best way to import a CSV with multiple labels using db-in?

Alternatively, I could split up the dataset and import each label separately, but I don’t see a flag for setting the label, similar to the --answer flag.

I just read this other answer, and perhaps the best way is to convert my CSV into JSON with labels. I’m just wondering whether there is an easier way to import the labels directly from the CSV (they’re one-hot encoded).

Do you mean multiple labels as in, multiple labels on each example?

db-in lets you load data in all file formats supported by Prodigy. The CSV loader supports the column headers Text, Label and Meta, and will transform those accordingly on import or on load (see the “Input formats” section of the README for details). For example:

```
Text,Label,Meta
This is a sentence,POSITIVE,0.1
This is another sentence.,NEGATIVE,0.1
```

The first row becomes the following task:

```
{"text": "This is a sentence", "label": "POSITIVE", "meta": {"meta": 0.1}}
```
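To make the header mapping concrete, here is a small standalone Python sketch that mimics the transformation in the example: Text becomes "text", Label becomes "label", and Meta is nested under "meta". It is only an illustration built from the example, not Prodigy’s own loader code. The function name csv_row_to_task is hypothetical, and converting numeric Meta values to numbers is an assumption based on the 0.1 shown in the JSON.

```python
import csv
import io
import json


def csv_row_to_task(row):
    """Map one CSV row with Text/Label/Meta headers to a task dict.

    Hypothetical helper: an illustrative approximation of the example,
    not Prodigy's actual CSV loader.
    """
    task = {"text": row["Text"]}
    if row.get("Label"):
        task["label"] = row["Label"]
    if row.get("Meta"):
        meta = row["Meta"]
        try:
            # Assumption: numeric Meta values are stored as numbers.
            meta = float(meta)
        except ValueError:
            pass  # keep non-numeric meta values as strings
        task["meta"] = {"meta": meta}
    return task


data = (
    "Text,Label,Meta\n"
    "This is a sentence,POSITIVE,0.1\n"
    "This is another sentence.,NEGATIVE,0.1\n"
)
tasks = [csv_row_to_task(row) for row in csv.DictReader(io.StringIO(data))]
for task in tasks:
    # Print each task as one JSONL line.
    print(json.dumps(task))
```

The first printed line is {"text": "This is a sentence", "label": "POSITIVE", "meta": {"meta": 0.1}}, matching the task shown above.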

However, if your data is more complex than that, or if you’re trying to import NER annotations, the best strategy is probably to convert it to JSON or JSONL first and then import it via db-in.
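For a one-hot encoded CSV, a conversion script could look roughly like the sketch below. It assumes a hypothetical layout with one text column (named "text" here) and one column per label holding 0 or 1. It writes one binary task per text and label, using the "text", "label" and "answer" keys, which is the same idea as splitting the dataset up by label, just without separate files. The function names and column layout are illustrative assumptions; check which format your recipe expects before importing.

```python
import csv
import io
import json


def one_hot_to_tasks(rows, text_col="text"):
    """Turn one-hot encoded rows into one binary task per (text, label) pair.

    Assumption: every column other than text_col is a label column
    containing "1" (label applies) or "0" (label does not apply).
    """
    tasks = []
    for row in rows:
        text = row[text_col]
        for label, value in row.items():
            if label == text_col:
                continue
            # "1" means the label applies; anything else is treated as a reject.
            answer = "accept" if (value or "").strip() == "1" else "reject"
            tasks.append({"text": text, "label": label, "answer": answer})
    return tasks


def convert(csv_file, jsonl_file, text_col="text"):
    """Read a one-hot CSV from csv_file and write JSONL tasks to jsonl_file."""
    reader = csv.DictReader(csv_file)
    for task in one_hot_to_tasks(reader, text_col):
        jsonl_file.write(json.dumps(task) + "\n")


if __name__ == "__main__":
    # Hypothetical sample data with two label columns.
    sample = "text,POSITIVE,NEGATIVE\nGreat product,1,0\nTerrible service,0,1\n"
    out = io.StringIO()
    convert(io.StringIO(sample), out)
    print(out.getvalue(), end="")
```

To use it on real files, open the CSV with newline="" and pass both file objects to convert, then import the resulting JSONL as usual, e.g. prodigy db-in your_dataset converted.jsonl (the dataset and file names are placeholders).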

Awesome, this solves my issue. Thanks for answering and for building a great tool!
