I'm working on sentiment analysis using textcat and have a few questions.
I have a dataset with 5500 annotations labeled with POSITIVE and NEGATIVE.
I exported the dataset, and for each accepted POSITIVE row I added the same row again as a rejected NEGATIVE, and vice versa.
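Roughly what my manipulation script does (this is a sketch, not my exact code; the field names follow Prodigy's JSONL export format):

```python
import json

# Map each label to its opposite for the flipped reject row.
OPPOSITE = {"POSITIVE": "NEGATIVE", "NEGATIVE": "POSITIVE"}

def add_reject_rows(examples):
    """For every example, also emit a copy with the opposite label
    and answer "reject"."""
    out = []
    for eg in examples:
        out.append(eg)
        flipped = dict(eg)
        # Note: the label strings must match the annotated ones exactly --
        # a lowercase "negative" here would create a brand-new label.
        flipped["label"] = OPPOSITE[eg["label"]]
        flipped["answer"] = "reject"
        out.append(flipped)
    return out

if __name__ == "__main__":
    rows = [{"text": "Jag älskar det här!", "label": "POSITIVE", "answer": "accept"}]
    for row in add_reject_rows(rows):
        print(json.dumps(row, ensure_ascii=False))
```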
I have created a Swedish model that I use as the base model for training.
I use it when I annotate and also when I train a new model with textcat.batch-train, always outputting to a new model.
In the forum you talk about training from a fresh model. Does my Swedish base model count as a fresh model, or do I need to create a completely new, empty one (nlp.to_disk) to train from?
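In case it matters, this is how I would create a completely empty Swedish model to train from (a sketch using plain spaCy; the output path is just an example):

```python
import spacy

# Create a blank Swedish pipeline: no vectors, no trained components.
nlp = spacy.blank("sv")

# Save it to disk so it can be passed as the base/"from" model.
nlp.to_disk("./blank_sv_model")
```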
My labels are POSITIVE and NEGATIVE, in uppercase. In the data manipulation script that adds a rejected negative row for each positive row, I added a new label in lowercase ("negative") by mistake. The output model ended up with four labels, and I got much better results on the new lowercase labels than on the annotated uppercase ones. Is there any explanation for this? Is anything stored in the "from" model during annotation or batch-train?
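A minimal illustration of what I believe happened: label names are compared as plain case-sensitive strings, so the lowercase slip silently created new classes instead of merging with the annotated ones.

```python
# The labels I annotated with:
annotated = ["POSITIVE", "NEGATIVE"]

# The labels my script added by mistake (note the lowercase):
slipped = ["positive", "negative"]

# "NEGATIVE" != "negative", so the model sees four distinct labels.
print(sorted(set(annotated + slipped)))
```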