sent.correct vs sent.teach accuracy degradation

TL;DR: Does using both sent.correct and sent.teach annotations in the same dataset reduce training performance?

I have 100 annotations in a job postings dataset that explicitly mark the correct sentence starts, created with sent.correct. Training on these with the default config and a 70/30 train/eval split reaches ~80% accuracy.

However, I noticed a significant decrease in training scores after I added 100+ more annotations using sent.teach.

It appears that the rejected answers from sent.teach hurt training performance. To recover, I copied the examples I wanted to keep out of the database, dropped the dataset, and imported them back in.
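For reference, here's a minimal sketch of that clean-up using Prodigy's Python database API. The dataset names and the answer-based filter are hypothetical; adjust them to however your examples are actually distinguished:

```python
from prodigy.components.db import connect

# Hypothetical dataset names for illustration.
SOURCE = "job_postings_sents"       # mixed sent.correct + sent.teach examples
CLEAN = "job_postings_sents_clean"  # examples we want to keep

db = connect()
examples = db.get_dataset(SOURCE)

# sent.teach collects binary accept/reject decisions, so one rough
# heuristic is to drop the rejected answers. Swap in whatever filter
# actually distinguishes your examples (e.g. "_view_id" or "_session_id").
kept = [eg for eg in examples if eg.get("answer") == "accept"]

db.add_dataset(CLEAN)
db.add_examples(kept, datasets=[CLEAN])
print(f"Kept {len(kept)} of {len(examples)} examples in {CLEAN!r}")
# Once you've verified the copy, the old dataset can be dropped:
# db.drop_dataset(SOURCE)
```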

How many examples with labels do you have? I usually prefer to have at least ~500 examples in a validation set before I take performance numbers very seriously. The reason is that there's a risk of overfitting to a subset of the data that isn't representative of the task.
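To put a rough number on that: with a 70/30 split of 100 examples you're evaluating on ~30 examples, and a quick back-of-the-envelope normal-approximation interval shows how wide the uncertainty around an 80% score is at that size:

```python
import math

def accuracy_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Rough 95% normal-approximation (Wald) interval for an accuracy score."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

for n in (30, 500):
    lo, hi = accuracy_ci(0.80, n)
    print(f"n={n:3d}: measured 80% accuracy -> plausible range {lo:.0%}-{hi:.0%}")

# n= 30: measured 80% accuracy -> plausible range 66%-94%
# n=500: measured 80% accuracy -> plausible range 76%-84%
```

So on ~30 eval examples, a drop of several points can easily be noise rather than a real effect of the sent.teach annotations.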

That said, if you have a relatively small dataset to train and score on, it's possible that the active learning approach, which keeps updating on newly labelled subsets, overfits a bit to those subsets.
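One way to guard against that, sketched below with hypothetical dataset names, is to carve off a fixed evaluation set once and keep it frozen while the active-learning annotations grow, so scores stay comparable between runs:

```python
import random
from prodigy.components.db import connect

# Hypothetical dataset names for illustration.
SOURCE = "job_postings_sents_clean"
TRAIN = "job_postings_sents_train"
EVAL = "job_postings_sents_eval"

db = connect()
examples = db.get_dataset(SOURCE)

# Fixed seed so the held-out set is reproducible between runs.
random.Random(42).shuffle(examples)
n_eval = int(len(examples) * 0.3)

for name, subset in [(EVAL, examples[:n_eval]), (TRAIN, examples[n_eval:])]:
    db.add_dataset(name)
    db.add_examples(subset, datasets=[name])
```

New sent.teach annotations then go into the training dataset only, and every run is scored against the same held-out examples. If I remember correctly, recent versions of prodigy train also let you pass a dedicated evaluation dataset directly instead of relying on a random split.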

My gut feeling is that this issue will go away once you have more labels. That's only a gut feeling though: if the issue persists with a much larger labelled dataset, it'd certainly be interesting to dig into it a bit more.