I'm creating a new dataset, and so far I've made about 500 annotations. I ran the train-curve command, and the score dropped from 0.30 to 0.29 at the last sample. What should I do at this point to avoid annotating a dataset that won't work? Are there strategies for this, such as going back over the annotations to make them more consistent, or troubleshooting to find the root cause? Or should I just keep annotating and hope it improves?