I'm trying to improve a textcat model with the textcat.teach recipe,
using a dataset of 65k items, but when I open the Prodigy session, it only loads 2-3 items and then shows the "No tasks available" message.
It's nearly impossible that only 3 items are wrong.
If I reload the page, it shows the same items with other labels.
The dataset has no duplicates; I checked that beforehand.
Update: I raised the batch size from 10 to 100, and now it works, in theory.
"In theory" because it only shows a small number of categories, while the dataset has more items in the other categories.
Are you able to share a small subset of the data that demonstrates the same effect? If the same phenomenon occurs for the first ten examples, then that'd be interesting to investigate further. I might be able to reproduce the error locally if you have those examples.
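In case it helps with putting together that subset, here is a minimal sketch for pulling out the first ten records. It assumes the source is a JSONL file (one JSON object per line), which may not match your setup; the function name and the sample records are made up for illustration.

```python
import json
from itertools import islice


def first_examples(lines, n=10):
    """Return the first `n` non-empty JSONL records as parsed dicts."""
    non_empty = (line for line in lines if line.strip())
    return [json.loads(line) for line in islice(non_empty, n)]


# Hypothetical stand-in for the real source; with a file, pass the open handle.
source = ['{"text": "example %d"}' % i for i in range(25)]
subset = first_examples(source)
print(len(subset))
```

Parsing each line also confirms that the records in the subset are valid JSON before sharing them.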
One question about your labels. You seem to run --label /app/data/categories, which points to a file on disk. But, per the docs, the --label argument is meant to take the label names themselves. Do you have a label called /app/data/categories, or were you hoping to use many labels defined in a file?
The label argument is meant to accept a single string with the names of the labels to attach. I fear that you may be adding a label called /app/data/categories instead of the label names that you're interested in.
The example listed in the docs shows a string of label names separated by commas, not a file path.
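If the labels do live in a file, one way to get a comma-separated string from it is sketched below. This assumes the file holds one label per line, which I can't confirm from here; the function name and the sample labels are hypothetical.

```python
def labels_to_arg(text):
    """Join one-label-per-line text into a comma-separated string."""
    labels = [line.strip() for line in text.splitlines() if line.strip()]
    return ",".join(labels)


# Hypothetical file contents; with a real file, read it first, e.g.
# pathlib.Path("/app/data/categories").read_text(encoding="utf-8").
file_contents = "SPORTS\nPOLITICS\n\nTECH\n"
label_arg = labels_to_arg(file_contents)
print(label_arg)
```

The printed string could then be passed as the value of --label in place of the file path.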