Hello!
I'm trying to create a model that classifies news article headlines by polarity and to update it with textcat.teach. There are three classes: positive, neutral, and negative. The classes are mutually exclusive, and each sample belongs to exactly one class.
When I updated the model, the score got worse.
What could be causing this? Is the task unsuitable for textcat.teach, or is there something wrong with my procedure?
Below are my steps.
Hi! The textcat.teach recipe only suggests examples for you to annotate, so as long as the annotations you collect are consistent, it doesn't matter whether they were created with textcat.teach or some other process – it's all about how you train from them.
Instead of updating the model artifact multiple times, try training from the blank base model using all annotations you've collected. If you train multiple times with different datasets, it's much harder to reason about the results and you may have to deal with "forgetting effects" etc.
It also looks like you're not using a dedicated evaluation set and are just holding back 10% of the data, which seems like very little. This means you can't really compare accuracy between training runs – what ends up in those 10% will be very different each time because the underlying data is different, and potentially not representative. That makes it very hard to know whether your model is actually improving.
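For example, assuming Prodigy v1.11+ (older versions use a different train syntax) and placeholder dataset names, training once from a blank pipeline on everything you've collected, with a dedicated evaluation dataset, could look roughly like this (adjust --lang to the language of your headlines):

```
# --textcat trains mutually exclusive categories; the "eval:" prefix marks
# a dedicated evaluation dataset, so the held-out examples stay the same
# across training runs
prodigy train ./model_output \
    --textcat headlines_manual,headlines_teach,eval:headlines_eval \
    --lang en
```

With a fixed headlines_eval dataset, the scores from different runs actually become comparable.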
The textcat.teach recipe only suggests examples for you to annotate,
I had completely misunderstood this! Thank you for the explanation.
Instead of updating the model artifact multiple times, try training from the blank base model using all annotations you've collected.
Following this advice, I changed step 4 as follows: train a blank base model using the annotation DB from step 1 together with the annotation DB from step 3. I also increased the eval split from 10% to 20%.
The model trained on the 1,000 annotations made with textcat.manual reached a Best F-Score of 64.987, so why does the score drop so much when they're combined with the annotations made with textcat.teach?
I wonder if there's a weird, unintended interaction here because of the different data types you've collected and the mix of complete and incomplete annotations: the textcat.manual annotations contain all labels and a definitive answer, whereas the textcat.teach annotations are binary yes/no answers and may not include the final answer (e.g. you may only know that label X doesn't apply to a given text). Prodigy should be able to handle both types, but maybe something is going wrong somewhere.
Could you try running prodigy data-to-spacy with your two datasets, training with spacy train directly, and checking the results you get there?
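In case it helps, with Prodigy v1.11+ and spaCy v3 that workflow would look roughly like this (paths and dataset names are placeholders):

```
# export both datasets into a spaCy corpus plus a training config
prodigy data-to-spacy ./corpus --textcat headlines_manual,headlines_teach --eval-split 0.2

# train with spaCy directly and inspect the detailed scores
python -m spacy train ./corpus/config.cfg \
    --paths.train ./corpus/train.spacy \
    --paths.dev ./corpus/dev.spacy \
    --output ./model_spacy
```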
Thanks for your advice! You're right that the textcat.teach annotations don't have a definitive answer. Unfortunately, I'm new to spaCy and spacy train seemed difficult to try, so I tried a few things that seemed easier instead:
1. I removed the samples with answer: reject.
2. I removed the samples that exist only in the textcat.teach annotations, which reduced the textcat.teach annotations from 4,000 to 1,600.
3. I made the two data formats the same.
The label and answer parts of the two datasets are formatted differently, as follows.
The active learning mostly happens during annotation and helps with example selection – it should obviously also have an impact on training, because the selected examples are better, but that effect is more indirect. The "magic" happens when you annotate.
If you want to experiment with converting your annotations, I would recommend doing it the other way around and creating one annotation per example with multiple options.
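For example, here's a minimal sketch of that conversion – not official Prodigy code, so double-check it against your data. It assumes the textcat.teach records look like {"text": ..., "label": ..., "answer": ...} and produces choice-style tasks with "options" and "accept", which is the shape textcat.manual creates with multiple labels; the label names and file name are placeholders:

```python
"""Merge binary textcat.teach annotations into one choice-style task per text."""
import json
from collections import defaultdict

LABELS = ["POSITIVE", "NEUTRAL", "NEGATIVE"]  # placeholder label names


def merge_binary_annotations(examples):
    # examples look like {"text": "...", "label": "POSITIVE", "answer": "accept"}
    decisions = defaultdict(dict)  # text -> {label: "accept" / "reject"}
    for eg in examples:
        if eg.get("answer") in ("accept", "reject"):
            decisions[eg["text"]][eg["label"]] = eg["answer"]

    merged = []
    for text, answers in decisions.items():
        accepted = [label for label, answer in answers.items() if answer == "accept"]
        if len(accepted) != 1:
            continue  # skip texts where the correct label is still unknown
        merged.append({
            "text": text,
            "options": [{"id": label, "text": label} for label in LABELS],
            "accept": accepted,
            "answer": "accept",
        })
    return merged


if __name__ == "__main__":
    # exported beforehand with: prodigy db-out headlines_teach > headlines_teach.jsonl
    with open("headlines_teach.jsonl", encoding="utf8") as f:
        examples = [json.loads(line) for line in f if line.strip()]
    for task in merge_binary_annotations(examples):
        print(json.dumps(task, ensure_ascii=False))
```

You could save the output to a JSONL file, load it into a new dataset with prodigy db-in, and then train on that together with your textcat.manual annotations.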
Also keep in mind that unless you use a dedicated evaluation set, the results you're seeing aren't necessarily comparable. If you're evaluating binary decisions on a selection of sparse binary annotations, you may see a higher number at the end, but that doesn't mean that your model is "better".