No tasks available for textcat.teach with multiple labels

Hi

I am trying to use Prodigy for annotation. I have 30K unlabeled samples in a .txt file, and I want to annotate them with 10 labels. This is what I have done so far:

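import json

# Print one JSON task per non-empty line so Prodigy can read the stream from stdin
with open('user_says_from_orig_data_23_march_2.txt', 'r') as fin:
    for line in fin:
        text = line.strip()  # drop the trailing newline
        if text:
            print(json.dumps({'text': text}))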

I use this Python script to stream my data to Prodigy:

python my_script.py | prodigy textcat.teach lpdata3 en_core_web_sm --label  l1,l2,l3,l4,l5 etc.

The problem is that when I run this command, I can access the Prodigy web app, but after annotating 4 to 5 samples I see "No tasks available".

What am I doing wrong?
Is there a better way to proceed for this use case?

Also, if I don't pass any labels, I only see about 2K samples in the web app. Why is it not showing all 30K samples?

Thanks for the report! Are you using the latest version, Prodigy v1.4.1?

And are you using a patterns file with examples of the labels you're trying to annotate? When you start from scratch, Prodigy has no concept of any of those labels, and if you just stream in a lot of raw data, it can take a lot of time for the model to learn.

To pre-select texts based on keyword matches, you can pass in a patterns.jsonl file via the --patterns argument. The patterns file can include token descriptions like the ones used by spaCy's Matcher. For example:

{"label": "CITY", "pattern": [{"lower": "new"}, {"lower": "york"}]}
{"label": "CITY", "pattern": [{"lower": "paris"}]}

If you're classifying whether a text is about a city, the above patterns will tell Prodigy to select all texts mentioning "new york" or "paris" and label them "CITY", so you can say yes or no to them. You can find more details on this in your PRODIGY_README.html. Our video tutorial on training a new entity type also shows the usage of patterns. The patterns in textcat.teach work the same way.
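If you have a handful of keywords per label, you don't have to write the patterns file by hand. Here is a minimal sketch of generating one in Python. The labels and keywords in KEYWORDS are made-up placeholders, and the whitespace split assumes each word corresponds to one spaCy token, which may not hold for phrases with punctuation or hyphens.

```python
import json

# Hypothetical keyword lists per label; replace with terms typical of your own labels.
KEYWORDS = {
    "l1": ["refund", "money back"],
    "l2": ["delivery", "shipping"],
}

def build_patterns(keywords):
    """Turn {label: [phrases]} into Matcher-style token patterns, one dict per phrase."""
    patterns = []
    for label, phrases in keywords.items():
        for phrase in phrases:
            # Naive whitespace split; assumes it matches how spaCy tokenizes the phrase.
            tokens = [{"lower": word.lower()} for word in phrase.split()]
            patterns.append({"label": label, "pattern": tokens})
    return patterns

def write_patterns(patterns, path="patterns.jsonl"):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf8") as f:
        for entry in patterns:
            f.write(json.dumps(entry) + "\n")

if __name__ == "__main__":
    write_patterns(build_patterns(KEYWORDS))
```

You would then add --patterns patterns.jsonl to your textcat.teach command.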

Your script looks good – but you might not even need it. Prodigy supports loading in data from a .txt file, and the built-in loader includes a few additional checks, too – like making sure your example texts aren't empty strings.

prodigy textcat.teach lpdata3 en_core_web_sm user_says_from_orig_data_23_march_2.txt --label  l1,l2,l3,l4,l5 etc.

About only seeing 2K of your 30K samples: I'm not sure I understand this correctly – do you mean the total examples you're annotating? Prodigy's textcat.teach recipe uses the model in the loop to suggest the most relevant examples for annotation. These are usually the ones that the model is most unsure about, i.e. the ones with a prediction closest to 0.5. This also means that Prodigy won't ask you about all examples – only the most important ones that produce the best possible training data. This is usually faster and more efficient than labelling every single example.
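To make the "closest to 0.5" idea more concrete, here is a simplified illustration in plain Python. This is not Prodigy's actual implementation (the built-in sorter works on a stream and isn't reproduced here), and the scores and texts are invented for the example.

```python
def uncertainty(score):
    """1.0 for a score of exactly 0.5, falling to 0.0 for a score of 0 or 1."""
    return 1.0 - abs(score - 0.5) * 2

def most_uncertain(scored_examples, n):
    """Return the n (score, example) pairs whose scores are closest to 0.5."""
    ranked = sorted(scored_examples, key=lambda pair: uncertainty(pair[0]), reverse=True)
    return ranked[:n]

# Made-up model scores for a single label
scored = [
    (0.97, {"text": "clearly l1"}),
    (0.52, {"text": "borderline"}),
    (0.08, {"text": "clearly not l1"}),
    (0.41, {"text": "also unclear"}),
]

# Only the two examples the model is least sure about are kept
picked = most_uncertain(scored, 2)
```

The confident predictions (0.97 and 0.08) are left out, which is why you end up seeing far fewer examples than the file contains.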

If you need to annotate all examples in your dataset to create gold-standard annotations, you might want to use a different recipe instead and skip the active learning component. For example, you could use the choice interface, display each example with your label options and let the annotator select one or more. You can find an example of this in the custom recipes workflow.
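As a rough sketch of what that could look like, the functions below turn each line of your file into a task with one option per label. The label names and the helper names (read_texts, add_options, recipe_components) are placeholders, and the "choice_style" setting is an assumption about your Prodigy version, so check your PRODIGY_README.html. To use it as a recipe, you'd still need to register a function returning these components as described in the custom recipes workflow.

```python
LABELS = ["l1", "l2", "l3"]  # placeholder label names

def read_texts(path):
    """Yield one task per non-empty line of a plain-text file."""
    with open(path, encoding="utf8") as f:
        for line in f:
            text = line.strip()
            if text:
                yield {"text": text}

def add_options(stream, labels):
    """Attach the same list of label options to every task."""
    options = [{"id": label, "text": label} for label in labels]
    for task in stream:
        task["options"] = options
        yield task

def recipe_components(dataset, path, labels=LABELS):
    """Build the dictionary a custom recipe function would return."""
    stream = add_options(read_texts(path), labels)
    return {
        "dataset": dataset,    # dataset to save the annotations to
        "view_id": "choice",   # use the choice interface
        "stream": stream,      # every example, with no model in the loop
        "config": {"choice_style": "multiple"},  # assumed setting for multi-select
    }
```

Because there's no active learning here, every example in the file is shown once, in order.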