Hi community,
I am new to Prodigy and a bit confused about the process.
I am working on a project to classify the sentiment of news and journal articles with the following labels: positive, negative and neutral. I started the annotation process with a manual recipe, and so far we have collected about 500 samples. I ran an experiment, trained a model on that data and got an accuracy of 70%.
The manual process is still running, but I would now like to switch to the active learning mode using the model I have trained, and I am a bit confused about the steps I need to take.
Hi! In that case, you could take the model you trained on your 500 previously collected annotations, use it as the input model when you run textcat.teach, and save the resulting annotations to a new dataset. Based on the model's predictions, the recipe will then select the examples the model is most uncertain about and let you accept or reject them.
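To give a rough idea of what "most uncertain" means here, the sketch below ranks examples by how close the model's score for one label is to 0.5. This is only an illustration in plain Python, not Prodigy's own implementation (the recipe works on a stream rather than sorting a full list), and the function names and example scores are made up.

```python
def uncertainty(score):
    """Return 1.0 for a score of 0.5 and 0.0 for a score of 0 or 1."""
    return 1.0 - abs(score - 0.5) * 2


def most_uncertain(scored_examples, n):
    """Pick the n texts whose scores are closest to 0.5."""
    ranked = sorted(
        scored_examples,
        key=lambda pair: uncertainty(pair[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:n]]


# Hypothetical (score, text) pairs for a single label, e.g. "positive"
scored = [
    (0.95, "Profits soared this quarter."),
    (0.52, "The company released its annual report."),
    (0.10, "Shares collapsed after the scandal."),
    (0.45, "Analysts expect a stable outlook."),
]

# The two borderline texts (scores 0.52 and 0.45) are selected
picked = most_uncertain(scored, 2)
```

Examples the model is already confident about (scores near 0 or 1) are skipped, so your accept/reject decisions are spent where they are likely to change the model the most.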
Depending on the number of labels you have, it might make sense to start with a subset of labels, especially those that may need the most improvement. If you run the training with --label-stats, it will give you a breakdown of the accuracy per label, so you can see if there's one label in particular that stands out.
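If you want to check this yourself on held-out data, a per-label breakdown could be computed roughly like this. It is a minimal sketch with made-up labels and a hypothetical helper name; the metrics that --label-stats actually reports may differ from the simple per-label fraction used here.

```python
from collections import defaultdict


def per_label_accuracy(gold, predicted):
    """Map each gold label to the fraction of its examples predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    return {label: correct[label] / total[label] for label in total}


# Made-up gold labels and predictions, for illustration only
gold = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
pred = ["positive", "negative", "positive", "neutral", "positive", "negative"]

stats = per_label_accuracy(gold, pred)
# The label with the lowest score is a candidate to focus on first
weakest = min(stats, key=stats.get)
```

In this toy data the weakest label is "neutral", so that would be the one to pass to textcat.teach first.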
After you've annotated with textcat.teach, you can then train a new model from scratch using the previously collected annotations + the new annotations.