Hi,
I am new to Prodigy and am trying to load a dataset with multiple, mutually exclusive labels. The JSON file with two examples looks like this: [ {"text": "This is about robot science", "label": "TECHNOLOGY"}, {"text": "This is about money","label": "ECONOMY"} ]
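A quick sanity check on a file like this can catch label typos before an annotation session. The sketch below uses only the Python standard library. It checks that every record has a non-empty "text" and one of the two expected labels, and it prints the records as JSON Lines (one object per line), a format Prodigy also reads as a source file (.jsonl). The helper names check_records and to_jsonl are hypothetical and not part of Prodigy.

```python
import json

# Labels the annotation session is expected to offer (assumed from the --label argument).
LABELS = {"TECHNOLOGY", "ECONOMY"}

def check_records(records, labels=LABELS):
    """Return a list of problems found; an empty list means every record looks fine."""
    problems = []
    for i, rec in enumerate(records):
        # Each record needs a non-empty string under "text".
        if not isinstance(rec.get("text"), str) or not rec["text"].strip():
            problems.append(f"record {i}: missing or empty 'text'")
        # The "label" value must be one of the expected labels.
        if rec.get("label") not in labels:
            problems.append(f"record {i}: unexpected label {rec.get('label')!r}")
    return problems

def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(rec) for rec in records) + "\n"

if __name__ == "__main__":
    data = json.loads(
        '[{"text": "This is about robot science", "label": "TECHNOLOGY"}, '
        '{"text": "This is about money", "label": "ECONOMY"}]'
    )
    print(check_records(data))  # []
    print(to_jsonl(data), end="")
```

If check_records returns a non-empty list, fix the listed records before starting the server.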
The data is imported as follows: prodigy db-in data_minimal data.json
Then I start Prodigy with the following command: prodigy textcat.manual data_minimal data.json --label TECHNOLOGY,ECONOMY
But I only get a single label that consists of both labels merged together.
What I really want to do is train a BERT model for email categorization and then use Prodigy to manually review and correct the emails that are misclassified.
Is there a basic tutorial besides the support pages? I couldn't find any really basic getting-started material on YouTube.
Thanks
Anders
Hi! Are you using Windows PowerShell by any chance? It has a quirk that requires you to wrap comma-separated strings like this one in quotes explicitly, so try --label "TECHNOLOGY,ECONOMY" instead.
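To illustrate why the comma matters: the --label value is meant to be a comma-separated list, so the recipe should end up with two separate labels. The sketch below is a hypothetical helper that only mimics that documented format; it is not Prodigy's own parsing code. It shows that if the comma is lost and both names arrive as one string, you get a single merged label.

```python
from typing import List

def parse_labels(arg: str) -> List[str]:
    """Split a comma-separated label string into individual labels.

    Hypothetical helper: it mimics the comma-separated --label format,
    not Prodigy's actual implementation.
    """
    return [label.strip() for label in arg.split(",") if label.strip()]

# The string passed through intact yields two labels:
print(parse_labels("TECHNOLOGY,ECONOMY"))  # ['TECHNOLOGY', 'ECONOMY']
# Without the comma, both names collapse into one label:
print(parse_labels("TECHNOLOGY ECONOMY"))  # ['TECHNOLOGY ECONOMY']
```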
Btw, you shouldn't need to import anything into your dataset before you start annotating: the data is read in automatically from the source file when you start the server. The dataset is meant for the collected annotations, not the raw data. If you import raw data, you end up with unannotated examples in your dataset, which is usually not what you want.
Thanks a lot for the quick and concise answer! Using quotes solved my first problem. If I understand you correctly, you recommend that I start the annotation session with the following command: