Impact of active learning on NER accuracy interpretation

I am using NER (named entity recognition) and wonder whether active learning affects how the accuracy should be interpreted.

My thoughts are the following:

  • active learning selects only the most challenging examples
  • the accuracy on the evaluation set might therefore be lower than if I used randomly drawn examples for evaluation
  • that might mean that, e.g., a measured 65% is in reality 65+x%
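This selection bias can be illustrated with a small simulation (a sketch, not real NER: the pool size, the "difficulty" distribution, and the assumption that model confidence tracks true difficulty are all made up for illustration):

```python
import random

random.seed(0)

# Simulate a pool of examples. Each example has a "difficulty": the
# probability that a hypothetical NER model tags it correctly. We assume
# the model's confidence tracks this probability.
pool = [random.uniform(0.3, 1.0) for _ in range(10_000)]

def expected_accuracy(examples):
    # Expected accuracy = mean probability of a correct prediction.
    return sum(examples) / len(examples)

# Random evaluation set: drawn uniformly from the pool.
random_eval = random.sample(pool, 500)

# "Active" evaluation set: the 500 lowest-confidence (hardest) examples,
# mimicking uncertainty-based active learning selection.
active_eval = sorted(pool)[:500]

print(f"accuracy on random set: {expected_accuracy(random_eval):.2f}")
print(f"accuracy on hard set:   {expected_accuracy(active_eval):.2f}")
```

The accuracy measured on the uncertainty-selected set comes out well below the accuracy on the random set, even though the underlying model is identical, which is exactly the 65% vs. 65+x% gap described above.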

Overall, the question is theoretical: I can print the predictions, inspect them, and I am happy with the results. However, I started wondering because going from 500 to 1000 examples produced only a minimal increase in accuracy (which might also simply be correct).