Multilabel text classification

Hi,
I am new to Prodigy and am trying to load a dataset with multiple, mutually exclusive labels. The JSON file, with two examples, looks like this:
[
{"text": "This is about robot science", "label": "TECHNOLOGY"},
{"text": "This is about money", "label": "ECONOMY"}
]

The data is imported as follows:
prodigy db-in data_minimal data.json

Then I start Prodigy with the following command:
prodigy textcat.manual data_minimal data.json --label TECHNOLOGY,ECONOMY

But I only get one label that consists of both labels merged.

What I really want to do is train a BERT model for email categorization and then use Prodigy to manually review emails that are wrongly classified.
Is there a basic tutorial apart from the support pages? I couldn't find anything really basic on YouTube about how to get started.
Thanks
Anders

Hi! Are you using Windows PowerShell by any chance? It seems to have a quirk that requires you to put strings explicitly in quotes, so try --label "TECHNOLOGY,ECONOMY" instead.
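
To illustrate why the two labels might end up merged, here is a minimal sketch. It rests on two assumptions that are not confirmed in this thread: that the --label value is split on commas, and that the shell hands the unquoted TECHNOLOGY,ECONOMY over as a single space-joined string. The helper split_labels is hypothetical and is not part of Prodigy.

```python
def split_labels(raw):
    # Hypothetical helper: mimics a parser that splits a
    # comma-separated option value into individual labels.
    return [label.strip() for label in raw.split(",") if label.strip()]

# Quoted in the shell: the comma survives, so two labels come out.
quoted = split_labels("TECHNOLOGY,ECONOMY")

# Assumed unquoted behaviour: the comma is lost before the value
# arrives, so there is nothing to split on and one label comes out.
unquoted = split_labels("TECHNOLOGY ECONOMY")

print(quoted)
print(unquoted)
```

If that is what happens, quoting the value keeps the comma intact so that it can be split into separate labels.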

Btw, you shouldn't have to import anything into your dataset before you start annotating: the data is read in automatically from the file when you start the server. The dataset is intended for the collected annotations, not the raw data. If you import raw data, you end up with unannotated examples in your dataset, which is typically not what you want.


Thanks a lot for the rapid and concise answer! Using quotes solved my first problem. If I understand you correctly, you recommend that I start the annotation session with the following command:

prodigy textcat.manual - data.json --label "TECHNOLOGY,ECONOMY"

Will Prodigy still save the annotated data in "data_minimal" then? Or where else will it be stored?
Anders

I can answer that. Prodigy saves the annotations directly to your database, into the dataset you name in the command. See details here. So it would look something like this:

prodigy textcat.manual email-labels ./data.json --label "TECHNOLOGY,ECONOMY"

assuming your data.json looks something like this:

[{"text": "some text"}, {"text": "some other text"}]

The above command would save your annotations into a dataset called email-labels.
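
If your source file still carries the "label" field from the original post, a small script can produce the text-only form shown above. This is only a sketch: the helper strip_labels is hypothetical, the two records are the ones from the original question, and whether the existing "label" key actually needs to be removed is an assumption.

```python
import json

def strip_labels(records):
    # Hypothetical helper: keep only the "text" field of each record
    # so the source file holds raw, unannotated examples.
    return [{"text": record["text"]} for record in records]

# The two examples from the original question.
records = [
    {"text": "This is about robot science", "label": "TECHNOLOGY"},
    {"text": "This is about money", "label": "ECONOMY"},
]

raw = strip_labels(records)

# Print a JSON array that can be saved as the source file.
print(json.dumps(raw, ensure_ascii=False))
```

Saving the printed array as data.json gives a source file in the same shape as the example above.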


Thanks a lot! It takes some time to get used to the syntax here, but this got me a long way.