I'm using the spaCy lg language model as the active learning model for my text classification annotation, and I want to know how it works.
The language model wasn't trained on my classification task, so how does the model do the classification? Does it try to classify using the label I'm providing? If so, does that imply I should choose a meaningful label for my classification problem? Or does the model just pick instances at random?
When you call nlp.begin_training, the model weights are initialized randomly. So before you update the model with your examples, it will predict something completely arbitrary, based on the random weights. The label names themselves have no impact – but of course, whether you have one label or five, and whether they're mutually exclusive, does make a difference.
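To make that concrete, here's a minimal sketch using the spaCy v2 API (the one textcat.teach builds on). The label name MY_LABEL and the example text are just placeholders – the point is that a freshly added textcat component predicts essentially arbitrary scores until it's been updated:

```python
import spacy

# Load the base model and add a fresh text classifier on top
nlp = spacy.load("en_core_web_lg")
textcat = nlp.create_pipe("textcat")
textcat.add_label("MY_LABEL")  # the name is just a string – it carries no meaning for the model
nlp.add_pipe(textcat, last=True)

# Initialize only the new textcat, leaving the pretrained components untouched
other_pipes = [p for p in nlp.pipe_names if p != "textcat"]
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()  # textcat weights are now random

doc = nlp("This product is great!")
print(doc.cats)  # e.g. {'MY_LABEL': 0.5...} – essentially a coin flip before any updates
```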
If you're training a model from scratch with a recipe like textcat.teach, the main difficulty is getting over the "cold start problem": the model needs to see enough positive and negative examples before its suggestions become meaningful enough to interact with. That's usually where the --patterns come in handy – they pre-select examples based on trigger words and phrases, so you can start off with enough positive examples to update the model in the loop.
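For example, a patterns file is just JSONL, where each entry has a label and either a token-based Matcher pattern or an exact phrase. The label and trigger phrases below are made-up placeholders for an imaginary complaints classifier, not anything from your task:

```python
import json

# Hypothetical patterns – swap in trigger words/phrases for your own label
patterns = [
    {"label": "COMPLAINT", "pattern": [{"lower": "refund"}]},       # token-based Matcher pattern
    {"label": "COMPLAINT", "pattern": "never buying this again"},   # exact phrase match
]

with open("complaint_patterns.jsonl", "w", encoding="utf8") as f:
    for entry in patterns:
        f.write(json.dumps(entry) + "\n")

# You'd then pass the file to the recipe, e.g.:
# prodigy textcat.teach my_dataset en_core_web_lg my_texts.jsonl \
#     --label COMPLAINT --patterns complaint_patterns.jsonl
```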