I'm working on sentiment analysis using textcat and have a few questions.
I have a dataset with 5500 annotations labeled with POSITIVE and NEGATIVE.
I exported the dataset, and for each accepted POSITIVE example I added the same row again as a NEGATIVE reject, and vice versa (see the sketch below).
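Roughly, the manipulation script does this (simplified sketch; the file names are just examples, and I'm assuming the exported JSONL rows have "text", "label" and "answer" fields, which is what my export looks like):

```python
import json

# Load the exported annotations (file name is just an example)
with open("export.jsonl", encoding="utf8") as f:
    examples = [json.loads(line) for line in f]

augmented = list(examples)
for eg in examples:
    if eg.get("answer") != "accept":
        continue
    # Mirror each accepted example as a reject for the opposite label
    opposite = "NEGATIVE" if eg["label"] == "POSITIVE" else "POSITIVE"
    augmented.append(dict(eg, label=opposite, answer="reject"))

# Write out the augmented dataset for training
with open("augmented.jsonl", "w", encoding="utf8") as f:
    for eg in augmented:
        f.write(json.dumps(eg, ensure_ascii=False) + "\n")
```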
I have created a Swedish model that I use as the base model for training.
I use it when I annotate and also when I train a new model using textcat.batchtrain. I always output to a new model.
My questions:
- In the forum you talk about training from a fresh model. Does my Swedish base model act as a fresh model, or do I need to create a completely new, empty one (via nlp.to_disk) to train from?
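For reference, this is what I mean by an empty one (just a sketch, the path is made up):

```python
import spacy

# A blank Swedish pipeline saved to disk, which I could then pass in
# as the base model instead of my own Swedish model
nlp = spacy.blank("sv")
nlp.to_disk("models/blank_sv")  # example path
```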
- My labels are POSITIVE and NEGATIVE, in uppercase. In my data manipulation script, where I add a NEGATIVE reject row for each POSITIVE row, I accidentally added a new label with lowercase "negative". The output model ended up with 4 labels, and I got a much better result on the new lowercase label than on the annotated uppercase ones. Is there any explanation for this? Is anything stored in the base ("from") model during annotation or batch training?
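This is the kind of quick check I ran to confirm the mixed casing in the data that went into training (sketch, assuming the same JSONL fields and example file name as above):

```python
import json
from collections import Counter

# Count which label strings actually appear in the augmented dataset
with open("augmented.jsonl", encoding="utf8") as f:
    label_counts = Counter(json.loads(line)["label"] for line in f)

print(label_counts)  # shows POSITIVE/NEGATIVE plus the accidental lowercase label
```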
Thanks!