Feature request: more details on train-curve recipe results

I was using the train-curve recipe on an NER dataset with 8 entity types, but the recipe only reports how accuracy evolves across all entities combined, which makes it hard to know which label needs more examples.
In my case, when I run the train recipe, I can see that some entities are predicted better than others.
Thanks.

The train-curve recipe runs the training several times with different portions of the data and is intended to give you a rough idea of whether you're on the right track by simulating training with different dataset sizes. Outputting results per label would make the output a lot more verbose, and I'm not sure it'd be very useful or conclusive. The per-label results could be pretty arbitrary because you're holding back large portions of the data (e.g. 75% in the first run).

That said, under the hood, all the train-curve recipe does is call into train with different values for --factor (by default, 0.25, 0.5, 0.75 and 1.0). So if you want the detailed results for each experiment, you could just call into train directly.
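
If the output of a given run doesn't break the scores down by label, you could compute them yourself from the gold and predicted spans. Here's a minimal sketch in plain Python, not part of the recipes: the function name `per_label_scores` and the `(start, end, label)` tuple layout are assumptions made for illustration, and only exact span matches count as correct.

```python
from collections import Counter


def per_label_scores(gold, pred):
    """Compute precision, recall and F-score per entity label.

    gold, pred: lists with one entry per document, each an iterable of
    (start, end, label) tuples. Hypothetical layout for this sketch.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_spans, pred_spans in zip(gold, pred):
        gold_spans, pred_spans = set(gold_spans), set(pred_spans)
        for span in pred_spans & gold_spans:  # exact matches
            tp[span[2]] += 1
        for span in pred_spans - gold_spans:  # predicted but not in gold
            fp[span[2]] += 1
        for span in gold_spans - pred_spans:  # in gold but missed
            fn[span[2]] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        p = tp[label] / n_pred if n_pred else 0.0
        r = tp[label] / n_gold if n_gold else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores
```

Running this once per factor on the same held-out evaluation set gives you one curve per label. With small evaluation sets the per-label numbers will be noisy, so treat them as a rough indication only.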


I see. I just wanted to know whether adding more annotations for my worst-performing entity type would help. I think I'll have to do that myself, with stratified sampling and a varying factor.
Thanks.
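
For anyone wanting to try the same thing, here is a rough sketch of the sampling step, assuming the examples are dicts with a "spans" list whose entries carry a "label" key. The function name `subsample_label` and the `seed` parameter are made up for this example. It keeps every example that doesn't contain the target label and only a `factor` share of the ones that do, so you can train on each subset and see how the score for that one label changes.

```python
import random


def subsample_label(examples, label, factor, seed=0):
    """Hold back a share of the examples that contain `label`.

    examples: list of dicts with a "spans" list of {"label": ...} dicts
    (assumed layout). factor: share of target-label examples to keep.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    with_label, without_label = [], []
    for eg in examples:
        labels = {span["label"] for span in eg.get("spans", [])}
        (with_label if label in labels else without_label).append(eg)
    n_keep = round(len(with_label) * factor)
    return without_label + rng.sample(with_label, n_keep)


# Hypothetical usage: one training subset per factor for the weakest label
# for factor in (0.25, 0.5, 0.75, 1.0):
#     subset = subsample_label(examples, "ORG", factor)
```

Note that an example containing both the target label and other labels is held back as a whole, so the scores for the other labels can shift a little too.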