Keeping information from training data in the dataset


I am testing Prodigy for NER training.
In my input CSV, I have a column with an ID that identifies each record. I'd like to keep that column in the dataset that I export from Prodigy after training. Is there a way to do that?


Hi! If you're loading from a CSV file, Prodigy will only read the text, meta and label columns (see here).

If you want to pass other arbitrary meta information (like IDs) through, the easiest way is to convert your data to JSONL (newline-delimited JSON). All other fields will be stored in the database with the annotation task, and when you run prodigy db-out, they should be included in the exported data.
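For the conversion itself, a minimal sketch using only the Python standard library could look like the following. The function name csv_to_jsonl, the file names, and the column names "text" and "id" are assumptions for illustration; adjust them to match your CSV. Every column other than the text column is copied into the task as an extra field.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path, text_column="text"):
    """Write one JSON object per CSV row, keeping every column as a field.

    Returns the number of rows written. The column names are whatever the
    CSV header contains; "text" is only an assumed default.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # The text to annotate goes under the "text" key.
            task = {"text": row[text_column]}
            # Copy all remaining columns (e.g. the ID) as extra fields.
            for key, value in row.items():
                if key != text_column:
                    task[key] = value
            dst.write(json.dumps(task, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling csv_to_jsonl("data.csv", "data.jsonl") on a file with the header id,text would write lines such as {"text": "...", "id": "42"}. You can then load data.jsonl as the source instead of the CSV. Note that values read from a CSV are strings, so the ID comes through as "42" rather than 42.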