I was using the train-curve recipe on an NER dataset with 8 entity types, but the recipe only reports how accuracy evolves across all entities combined, which makes it hard to know which label I should add more examples for. In my case, when I run the train recipe, I can see that some entities are predicted better than others.
Thanks.
The train-curve recipe runs the training several times with different portions of the data. It is intended to give you a rough idea of whether you're on the right track by simulating training with different dataset sizes. Outputting results per label would make the output a lot more verbose, and I'm not sure it would be very useful or conclusive: the results could be pretty arbitrary because you're holding back large portions of the data (e.g. 75% in the first run).
That said, under the hood, all the train-curve recipe does is call into train with different values for --factor (by default, 0.25, 0.5, 0.75 and 1.0). So if you want the detailed results for each experiment, you could just call into train directly.
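If you run the experiments yourself, you could score each run per label from the gold and predicted entity spans. Here is a minimal sketch in plain Python. It is not what the recipe does internally: the per_label_scores function and the (start, end, label) span format are assumptions made for illustration, and only exact span matches count as correct.

```python
from collections import Counter


def per_label_scores(gold, pred):
    """Compute exact-match precision, recall and F-score for each label.

    gold, pred: lists with one entry per example, each entry an iterable
    of (start, end, label) tuples. This format is an assumption.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_spans, pred_spans in zip(gold, pred):
        gold_spans, pred_spans = set(gold_spans), set(pred_spans)
        for _, _, label in pred_spans & gold_spans:
            tp[label] += 1  # predicted and correct
        for _, _, label in pred_spans - gold_spans:
            fp[label] += 1  # predicted but not in gold
        for _, _, label in gold_spans - pred_spans:
            fn[label] += 1  # in gold but missed

    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        p = tp[label] / n_pred if n_pred else 0.0
        r = tp[label] / n_gold if n_gold else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores
```

Calling this once per factor value on the same held-out evaluation set would give you a per-label curve, with the caveat above that the numbers for small portions can be noisy.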
I see. I just wanted to know whether adding more annotations for my worst-predicted entity type could help. I think I'll have to do that myself, with stratified sampling and a varying factor.
Thanks
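For anyone trying the same thing, the sampling step could look roughly like this. It is only a sketch under assumptions: subsample_label is a hypothetical helper, and examples are assumed to be dicts with a "spans" list whose items carry a "label" key. It keeps every example without the target label and only a fraction of the examples that contain it, so you can train on each subset and see how that one label's score changes.

```python
import random


def subsample_label(examples, label, factor, seed=0):
    """Keep all examples without `label` and a `factor` share of those with it.

    examples: list of dicts like {"text": ..., "spans": [{"label": ...}, ...]}
    (assumed format). factor: float between 0.0 and 1.0.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0.0 and 1.0")

    def has_label(eg):
        return any(span.get("label") == label for span in eg.get("spans", []))

    with_label = [eg for eg in examples if has_label(eg)]
    without_label = [eg for eg in examples if not has_label(eg)]

    rng = random.Random(seed)  # fixed seed so runs are repeatable
    n_keep = round(len(with_label) * factor)
    subset = without_label + rng.sample(with_label, n_keep)
    rng.shuffle(subset)
    return subset
```

Training once per value, e.g. 0.25, 0.5, 0.75 and 1.0, while always evaluating on the same held-out set, keeps the comparison between runs fair.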