I have a news dataset of 70K headlines that I am trying to categorize by topic (10 topics in total). I store the headlines in a
txt file and also pass in a set of patterns in a JSONL file.
This is the command I use to run Prodigy:
prodigy textcat.teach my_dataset en_core_web_lg data/my-data.txt --label t1,t2,t3,... --patterns data/my-patterns.jsonl --loader txt
I associate around 20 pattern words with each topic.
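As an illustration, a patterns file of this kind could be generated along the following lines. This is only a sketch: the topic labels and keywords are hypothetical placeholders (not the real data), and it assumes the token-based pattern format, with one JSON object per line holding a "label" and a "pattern" key.

```python
import json

# Hypothetical topic -> keyword mapping; labels and words are placeholders,
# not the actual topics or pattern words used with the dataset.
topic_keywords = {
    "t1": ["election", "senate"],
    "t2": ["football", "olympics"],
}


def build_patterns(topic_keywords):
    """Return one token-based pattern dict per keyword."""
    patterns = []
    for label, words in topic_keywords.items():
        for word in words:
            # One token dict per whitespace-separated part of the keyword,
            # matched on the lowercase form of each token.
            tokens = [{"lower": part.lower()} for part in word.split()]
            patterns.append({"label": label, "pattern": tokens})
    return patterns


def to_jsonl(patterns):
    """Serialize the patterns as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(p) for p in patterns)


patterns = build_patterns(topic_keywords)
jsonl_text = to_jsonl(patterns)
print(jsonl_text)
```

The printed text could then be saved as the file passed to --patterns. Matching on the lowercase token form is a design choice here, so that capitalized words in headlines still match.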
Whether or not I run it with the patterns JSONL file, I keep getting a "No tasks available" message after around 20-30 annotations.
With 10 topics and 70K headlines, I doubt the active learning is already good enough to disambiguate between all the topics.
Many thanks for any help!