Best way to create a model for sentiment analysis

mikael · December 6, 2018, 1:35pm

I'm working on sentiment analysis using textcat and have a few questions.

I have a dataset with 5500 annotations labeled with POSITIVE and NEGATIVE.

I exported the dataset, and for each row with an accepted POSITIVE label I added the same row as a rejected NEGATIVE, and vice versa.
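The row duplication described above could be sketched roughly as follows. This is only an illustration, assuming the exported rows are dicts with `text`, `label` and `answer` keys and `accept`/`reject` answers; the helper name `add_complementary_rows` and the sample texts are hypothetical. It also upper-cases labels so that a stray lowercase label cannot slip in.

```python
import json

# Assumed binary label pair; the names come from the post above.
OPPOSITE = {"POSITIVE": "NEGATIVE", "NEGATIVE": "POSITIVE"}


def add_complementary_rows(examples):
    """For every accepted example, also emit the opposite label as a reject."""
    expanded = []
    for eg in examples:
        # Normalise case so "negative" and "NEGATIVE" end up as one label.
        label = eg["label"].upper()
        eg = dict(eg, label=label)
        expanded.append(eg)
        if eg.get("answer") == "accept" and label in OPPOSITE:
            expanded.append(dict(eg, label=OPPOSITE[label], answer="reject"))
    return expanded


# Hypothetical sample rows, one of them with a mistakenly lowercased label.
rows = [
    {"text": "Jättebra film!", "label": "POSITIVE", "answer": "accept"},
    {"text": "Riktigt dålig service.", "label": "negative", "answer": "accept"},
]
expanded_rows = add_complementary_rows(rows)
for row in expanded_rows:
    print(json.dumps(row, ensure_ascii=False))
```

Each accepted row is followed by its rejected counterpart, so two input rows become four output rows that use only the two uppercase labels.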

I have created a Swedish model that I use as the base model for training.
I use it when I annotate and also when I train a new model using textcat.batch-train. I always output to a new model.

My questions:

On the forum you talk about training from a fresh model. Does my Swedish base model count as a fresh model, or do I need to create a completely new, empty one (nlp.to_disk) to train from?
My labels are POSITIVE and NEGATIVE, in uppercase. In my data manipulation script, where I add a negative reject row for each positive row, I added a new label with lowercase negative by mistake. The output model ended up with 4 labels, and I got a much better result on the new lowercase labels than on the annotated uppercase ones. Is there any explanation for this? Is anything stored in the “from model” during annotation or batch training?
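A quick check before training can catch the kind of label-case slip described above. The snippet below is a minimal sketch, assuming each row is a dict with a `label` key; `find_case_variants` is a hypothetical helper, not part of Prodigy or spaCy.

```python
from collections import defaultdict


def find_case_variants(examples):
    """Return labels that appear under more than one upper/lowercase spelling."""
    spellings = defaultdict(set)
    for eg in examples:
        spellings[eg["label"].upper()].add(eg["label"])
    return {key: sorted(forms) for key, forms in spellings.items() if len(forms) > 1}


# Hypothetical rows reproducing the mistake: rejects written in lowercase.
rows = [
    {"text": "a", "label": "POSITIVE", "answer": "accept"},
    {"text": "a", "label": "negative", "answer": "reject"},
    {"text": "b", "label": "NEGATIVE", "answer": "accept"},
    {"text": "b", "label": "positive", "answer": "reject"},
]
variants = find_case_variants(rows)
print(variants)
```

An empty result means every label has a single spelling. With the rows above, both labels are reported, which matches the four-label model described in the question.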

Thanks!

honnibal · December 8, 2018, 5:44am

If your model already has weights for text classification, then yeah, I would recommend starting from a new model rather than resuming training. It's better to train from random weights each time instead of resuming from the previous training, because it's a bit easier to reason about, and you might avoid overfitting better. The other thing you might want to do is download some Swedish vectors from here: Word vectors for 157 languages · fastText. You can use these to initialise a model with spacy init-model. Pretrained vectors are likely to be pretty helpful for your problem.
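For the "new model" option without vectors, one possible sketch is to create a blank Swedish pipeline and save it to disk. It assumes spaCy is installed and relies on `spacy.blank` and `nlp.to_disk`; the function name `save_blank_model` and the output directory are hypothetical.

```python
from pathlib import Path


def save_blank_model(lang="sv", output_dir="blank_sv_model"):
    """Create an empty pipeline for `lang` and write it to `output_dir`."""
    # Imported here so the sketch only needs spaCy when it is actually called.
    import spacy

    nlp = spacy.blank(lang)  # tokenizer and language data only, no trained weights
    nlp.to_disk(output_dir)
    return Path(output_dir)


# Example call (requires spaCy):
# model_path = save_blank_model("sv", "blank_sv_model")
```

The saved directory contains no text classification weights, so training against it starts from random weights each time. A model built with spacy init-model from the fastText vectors would play the same role while also carrying the pretrained vectors.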

The situation you describe with the four labels is very confusing! I'm not sure what could be going on there. If you keep finding the same thing --- that this weird doubling of the labels improves the scores --- I'd be curious to dig a little deeper.
