Prodigy batch train and contextual weights

madhujahagirdar · January 27, 2018, 9:07pm

If I take standard GloVe vector embeddings and use Prodigy's text classification batch train feature, does the batch train algorithm alter the word weights based on the contextual meaning of the words? Or do I have to run NER batch train first and then use text classification in Prodigy to take that into account?

honnibal · January 27, 2018, 9:12pm

There are two model architectures available for text classification:

The low_data=False architecture uses two convolutional layers after the word vectors, so the model can learn from phrases up to three words long.
The low_data=True architecture does not use convolutional layers, so it can only learn from single words.
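As a rough toy illustration of the difference (this is not spaCy's implementation, and the helper names `unigram_features` and `window_features` are made up for the example), a model that only sees single words cannot tell apart two texts that contain the same words in a different order, while a model that sees windows of three neighbouring tokens can:

```python
from collections import Counter

def unigram_features(tokens):
    """Bag of single words: word order is discarded."""
    return Counter(tokens)

def window_features(tokens, size=3):
    """Bag of contiguous windows of `size` tokens: local word order is kept."""
    return Counter(
        tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)
    )

a = "pain but no fever".split()
b = "fever but no pain".split()

# Same words, so the single-word view is identical for both texts.
print(unigram_features(a) == unigram_features(b))

# Different three-word windows, so the windowed view can separate them.
print(window_features(a) == window_features(b))
```

The real convolutional layers operate on word vectors rather than on literal token tuples, so this only sketches why a context window matters.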

By default low_data=True is enabled if you have <1000 examples. You can also pass low_data=False explicitly when creating the TextCategorizer object, to make sure you are getting contextual meanings. You can find the two architectures defined within spaCy here: https://github.com/explosion/spaCy/blob/master/spacy/_ml.py#L469

madhujahagirdar · January 27, 2018, 9:28pm

I have more than 1 million pre-classified texts for a specific label, so when I use Prodigy batch train, I assume it will learn the contextual meaning and the resulting spaCy model will contain weights adjusted to the healthcare-specific data. Once I have the spaCy model, can I use it as a generalized model for any classification problem, not just for the label it was trained on?

madhujahagirdar · January 27, 2018, 9:31pm

Additionally, does the text classification do all the standard tokenization, lemmatization, stop word removal, etc. during training? I assume yes (might be a dumb question).

honnibal · January 28, 2018, 4:59pm

The text classification tokenizes, but it doesn’t lemmatize or remove stop words. Those processes aren’t always good, and the CNN benefits from the presence of function words to learn from non-compositional phrases.
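A small sketch of why stop word removal can hurt, assuming a hypothetical stop list and a naive whitespace tokenizer (neither is what spaCy or Prodigy uses): once function words such as "no" are dropped, a negated sentence becomes indistinguishable from its positive counterpart.

```python
# A tiny, hypothetical stop list for illustration; real lists are much longer,
# and many of them do include negations such as "no" and "not".
STOP_WORDS = {"no", "not", "of", "the", "a", "is"}

def tokenize(text):
    """Naive whitespace tokenizer, used only for this example."""
    return text.lower().split()

def remove_stop_words(tokens):
    """Drop every token that appears in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]

positive = tokenize("Patient shows signs of infection")
negative = tokenize("Patient shows no signs of infection")

# With function words kept, the two texts differ.
print(positive == negative)

# After stop word removal, the negation is gone and the texts collapse together.
print(remove_stop_words(positive) == remove_stop_words(negative))
```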

madhujahagirdar · January 28, 2018, 6:55pm

Once I build a spaCy model, can I use it as a generalized model?
My understanding is that it might not be effective, as the Attend layer would optimize the probability weights in vector space for the current label under consideration. For a healthcare-specific use case, then, would it be better if I:

1. Take a corpus of a million events (docs) and do NER batch training using GloVe or word2vec as the starting point to develop a base model (applicable to all use cases), or develop healthcare-specific word2vec vectors if we have more than 10 million events?
2. Use Prodigy, and either manually annotate or use pre-annotated data, to develop a spaCy model for a specific use case (using the vectors from step 1).
3. Use that model to predict the outcome for a given text.

madhujahagirdar · January 29, 2018, 9:07pm

Any thoughts on the above?

honnibal · January 29, 2018, 9:47pm

I might be misunderstanding the question here. Let me know if it seems so.

The choice of doing NER training or text classification training mostly depends on your end goal. In some situations you can do sentence classification instead of NER to achieve some purpose; in other cases you definitely want entity recognition. It depends on the task, and what sort of output you need.

If per-sentence labels can solve your business need, I would definitely try to train the text classifier first. Sentence labels are generally easier for the model to predict, and also faster to annotate. Labelling specific sequences of text introduces a lot of difficult decisions your task might not care about. These decisions slow down learning because if the model is off by one token, it must consider its answer incorrect. Similar considerations apply during annotation.
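The off-by-one point can be sketched with a toy exact-match check. The example sentence, the `CONDITION` and `MENTIONS_CONDITION` labels, and the helper names are all invented for illustration and are not part of Prodigy or spaCy:

```python
def span_is_correct(gold, predicted):
    """Exact match: start, end and label must all agree."""
    return gold == predicted

def label_is_correct(gold_label, predicted_label):
    """A sentence-level label has no boundaries to get wrong."""
    return gold_label == predicted_label

tokens = ["Patient", "has", "type", "2", "diabetes", "mellitus"]

# Spans are (start_token, end_token_exclusive, label).
gold_span = (2, 6, "CONDITION")       # "type 2 diabetes mellitus"
predicted_span = (2, 5, "CONDITION")  # "type 2 diabetes", one token short

# The prediction is nearly right, but exact matching scores it as an error.
print(span_is_correct(gold_span, predicted_span))

# The equivalent sentence-level decision is simply right or wrong.
print(label_is_correct("MENTIONS_CONDITION", "MENTIONS_CONDITION"))
```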

madhujahagirdar · January 30, 2018, 1:59am

Maybe I can put it a different way. I have two text classification tasks to perform on the same set of millions of documents. I have pre-annotated data for task 1 but not for task 2. If I load the data for task 1 and run Prodigy batch train, it generates a spaCy model. Can I use it as a generalized model for both task 1 and task 2, or is it only applicable to task 1?

If the output model is for task 1 only, then my question is: how do I create a generic spaCy vector model for healthcare-specific data, one that is specific to the domain rather than to a task, using Prodigy or spaCy?
