I'm sorry, I am not a native English speaker, so there may be a lot of mistakes. I hope you can forgive me.
In my case, I need to build a model to classify articles. The example has only 1 label, but I need thousands of labels.
Because I have so many labels, here is what I did:
1. build a jsonl file
{"text":"article_text1","label":"label1","answer":"reject"}
{"text":"article_text1","label":"label2","answer":"accept"}
{"text":"article_text1","label":"label3","answer":"accept"}
{"text":"article_text2","label":"label1","answer":"reject"}
{"text":"article_text2","label":"label2","answer":"accept"}
{"text":"article_text2","label":"label3","answer":"reject"}
......
2. import into dataset
3. use "textcat.batch-train"
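To make step 1 concrete, here is a rough sketch of how a file like this could be generated. It is only an illustration, not my exact script: the names `build_records`, `to_jsonl`, `all_labels` and `articles` are made up for this example, and it assumes every article gets one row per label, with "accept" for the labels that apply and "reject" for the rest.

```python
import json


def build_records(text, accepted_labels, all_labels):
    """Return one record per (text, label) pair (hypothetical helper)."""
    return [
        {
            "text": text,
            "label": label,
            # "accept" if the label applies to this article, else "reject"
            "answer": "accept" if label in accepted_labels else "reject",
        }
        for label in all_labels
    ]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


# Placeholder data, assumed for illustration only.
all_labels = ["label1", "label2", "label3"]
articles = {
    "article_text1": {"label2", "label3"},
    "article_text2": {"label2"},
}

records = []
for text, accepted in articles.items():
    records.extend(build_records(text, accepted, all_labels))

jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

Written this way, the number of rows is (number of articles) x (number of labels), which is why the dataset grows so quickly when more labels are added.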
The first time, I imported 49,760 rows from the JSONL file into the dataset, covering 80 labels.
The new model looks good, but now there is a new problem: I need to add more labels to the model.
If I keep adding rows to the same dataset, it will become huge.
"textcat.batch-train" will be very slow, and training on the data for the full set of labels will be a disaster.
If I use a new dataset together with the already trained model, it raises an exception:
"ValueError: operands could not be broadcast together with shapes"
How can I do this iteratively? I need to be able to keep adding labels.