Training/validation in textcat

Hello, I have some questions regarding textcat/spaCy. I found some answers here and on the GitHub page, but I would really appreciate your help with the rest.
I use this code:

from spacy.util import minibatch, compounding

# nlp, textcat, train_data/dev_data (lists of (text, annotation) tuples),
# train_texts/train_cats, dev_texts/dev_cats, n_iter, drop_rate and the
# helpers class_report() and evaluate() are defined earlier.

pipe_exceptions = ["textcat", "trf_wordpiecer", "trf_tok2vec"]
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

with nlp.disable_pipes(*other_pipes):  # only train textcat
    optimizer = nlp.begin_training()
    print('{:^5}\t{:^5}\t{:^5}\t{:^5}'.format('LOSS', 'Prec', 'Recall', 'Fscore'))
    train_losses = []
    dev_losses = []
    train_accs = []
    dev_accs = []
    for i in range(n_iter):
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(train_data, size=compounding(4., 32., 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=drop_rate, losses=losses)

        train_losses.append(losses['textcat'])  # save the training loss
        # accuracy on the training set
        train_accs.append(class_report(nlp.tokenizer, textcat, train_texts, train_cats)['accuracy'])

        losses_dev = {}
        batches_dev = minibatch(dev_data, size=compounding(4., 32., 1.001))
        for batch_dev in batches_dev:
            texts_dev, annotations_dev = zip(*batch_dev)
            # sgd=None is intended to skip the weight update on the dev data (see question 1)
            nlp.update(texts_dev, annotations_dev, sgd=None, losses=losses_dev)

        with textcat.model.use_params(optimizer.averages):
            # evaluate on the dev data split off in load_data()
            scores = evaluate(nlp.tokenizer, textcat, dev_texts, dev_cats)
        print('{0:.3f}\t{1:.3f}\t{2:.3f}\t{3:.3f}'  # print a simple table
              .format(losses['textcat'], scores['textcat_p'],
                      scores['textcat_r'], scores['textcat_f']))

        dev_losses.append(losses_dev['textcat'])
        # accuracy on the dev set
        dev_accs.append(class_report(nlp.tokenizer, textcat, dev_texts, dev_cats)['accuracy'])

++++++++++++++++++++++++++++++++++++

  1. Initially, I used the example from https://spacy.io/usage/training#textcat. Do I understand correctly that in that example, the losses calculated on dev_texts do not update the model or its parameters and are only used for printing? I am asking because I am wondering whether I need to split my data into train/test or into train/test/final evaluation. (See the sketch below for the alternative I have in mind.)
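
     What I have in mind is computing the dev loss from predictions only, without calling nlp.update() at all. This is only a rough sketch of my own, assuming spaCy v2's GoldParse and the pipe's predict()/get_loss() methods can be combined this way:

         from spacy.gold import GoldParse

         def dev_loss(nlp, textcat, dev_texts, dev_cats):
             # Predict without touching the weights, then score against the gold cats.
             docs = [nlp.tokenizer(text) for text in dev_texts]
             scores, _ = textcat.predict(docs)
             golds = [GoldParse(doc, cats=cats) for doc, cats in zip(docs, dev_cats)]
             loss, _ = textcat.get_loss(docs, golds, scores)
             return loss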

  2. With the code above and the 'bow' architecture, I got the following losses and plots:
    | LOSS  | Prec  | Recall | Fscore |
    |-------|-------|--------|--------|
    | 3.215 | 1.000 | 0.766  | 0.867  |
    | 2.314 | 1.000 | 0.844  | 0.915  |
    | 1.494 | 1.000 | 0.937  | 0.968  |
    | 1.138 | 1.000 | 0.969  | 0.984  |
    | 0.844 | 1.000 | 0.984  | 0.992  |
    | 0.607 | 1.000 | 0.984  | 0.992  |
    | 0.422 | 1.000 | 1.000  | 1.000  |
    | 0.408 | 1.000 | 1.000  | 1.000  |
    | 0.301 | 1.000 | 1.000  | 1.000  |
    | 0.275 | 1.000 | 1.000  | 1.000  |
    | 0.257 | 1.000 | 1.000  | 1.000  |
    | 0.181 | 1.000 | 1.000  | 1.000  |
    | 0.215 | 1.000 | 1.000  | 1.000  |
    | 0.115 | 1.000 | 1.000  | 1.000  |
    | 0.138 | 1.000 | 1.000  | 1.000  |
    | 0.132 | 1.000 | 1.000  | 1.000  |

    ...and so on; the losses keep going down.
    It looks strange to get such high accuracy; it seems wrong to me. What might be the reason for it? (The sketch below shows the sanity check I ran on my splits.)
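    To rule out obvious problems on my side, I checked the splits like this (train_texts, train_cats, dev_texts and dev_cats are the variables from the code above; the check itself is just my own idea):

        from collections import Counter

        # 1) Leakage: identical texts in both splits would inflate the dev scores.
        overlap = set(train_texts) & set(dev_texts)
        print("texts in both splits:", len(overlap))

        # 2) Class balance: a heavily skewed label distribution can make
        #    precision/recall/accuracy look deceptively good.
        def label_counts(cats):
            # cats is a list of {label: bool} dicts, as in the spaCy textcat example
            return Counter(label for c in cats for label, value in c.items() if value)

        print("train:", label_counts(train_cats))
        print("dev:  ", label_counts(dev_cats))
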

  3. I checked the thread "Help needed to get started with text classification", where there was a discussion about 'bow' vs. 'cnn'. Having compared several architectures myself, I have the impression that 'default' runs slower than 'ensemble', and I do not understand why; shouldn't they be the same? It might be related to my previous question about the overly high accuracy. In general, though, 'bow' does seem to give slightly higher accuracy and the fastest training time, as reported in that discussion. (The sketch below shows how I select the architecture.)
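
     For completeness, this is how I select the architecture when creating the pipe. It is a sketch based on my reading of the v2 docs, and as far as I understand "ensemble" is the default, which is why I expected 'default' and 'ensemble' to behave identically:

         import spacy

         nlp = spacy.blank("en")
         # "architecture" can be "bow", "simple_cnn" or "ensemble" (the default)
         textcat = nlp.create_pipe(
             "textcat",
             config={"exclusive_classes": True, "architecture": "bow"},
         )
         nlp.add_pipe(textcat, last=True)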

  4. Is it possible to link textcat with any explainers? Ideally, I would like to extract the learned weights to see which features are the most meaningful. (The sketch below is the closest I have come.)
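
     So far I have only tried wrapping the pipeline in a model-agnostic explainer. This is a rough sketch using the third-party lime package; the predict_proba() wrapper is my own helper, not a spaCy API:

         import numpy as np
         from lime.lime_text import LimeTextExplainer

         labels = list(textcat.labels)

         def predict_proba(texts):
             # LIME expects an (n_texts, n_labels) array of scores.
             docs = list(nlp.pipe(texts))
             return np.array([[doc.cats[label] for label in labels] for doc in docs])

         explainer = LimeTextExplainer(class_names=labels)
         explanation = explainer.explain_instance(dev_texts[0], predict_proba, num_features=10)
         print(explanation.as_list())  # tokens with the largest influence on the prediction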

  5. I am not sure whether I need to do any additional text preprocessing (so far I have done almost none). I found the tip "Tip: merge phrases and entities", but it was not about textcat, so I am not sure what the best strategy is here. (The sketch below shows how I read that tip.)
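
     In case it helps, this is how I understood the tip: merging entity spans into single tokens before training, e.g. with the built-in merge_entities component. A sketch, assuming a pretrained pipeline such as en_core_web_sm is installed:

         import spacy

         nlp = spacy.load("en_core_web_sm")
         merge_ents = nlp.create_pipe("merge_entities")
         nlp.add_pipe(merge_ents, after="ner")  # merge each entity span into one token

         doc = nlp("Apple Inc. is hiring in New York.")
         print([token.text for token in doc])  # "Apple Inc." and "New York" become single tokens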

Thank you.