List of existing classification models?

sdspieg · November 30, 2022, 3:43pm

I just watched Vincent's nice video (we love those - please keep them coming!) on using sentiment models to bootstrap a compliment classification model. I was not aware of the 'Sentimany' repo (nor of any of the other sentiment models), which made me wonder whether there is an overview somewhere of existing classification models, i.e. not just for sentiment but also for other classes (compliments, insults, threats, hate speech, aggressive language, irony, sarcasm, evidence, hypotheses, etc.), that could be used in similar ways as shown in the video. Thanks!

koaning · December 5, 2022, 11:04am

Happy to hear it!

So some of the models in sentimany were trained by me, and the full story is that I wrote that repo as part of an exercise explained on my personal blog. If this sounds interesting, you may also enjoy this episode of the Huggingface podcast where I talk more about it.

As far as "general pretrained models" go, I think there are two avenues.

There are models on huggingface that I re-use. Be aware that it's typically hard to know for sure whether these models were trained on clean data, or whether their training data is really relevant for your use case, but the models are available. Also keep in mind that many of them are BERT models, which take up a lot of compute.
You can also try to find relevant datasets and train your own models on those. I usually like to train a simple bag of words model in scikit-learn and export it via ONNX for re-use later. That's what sentimany also uses. If you're interested in learning more about that, there's a calmcode course here.
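To make the second avenue a bit more concrete, here is a minimal sketch of a bag of words classifier in scikit-learn with an ONNX export step. It is not the code sentimany uses. The toy texts and labels are made up, the function names `train_bow_classifier` and `export_to_onnx` are hypothetical, and the export assumes the `skl2onnx` package is installed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_bow_classifier(texts, labels):
    """Fit a bag of words classifier (word counts + logistic regression)."""
    pipe = make_pipeline(CountVectorizer(), LogisticRegression())
    pipe.fit(texts, labels)
    return pipe


def export_to_onnx(pipe, path):
    """Write the fitted pipeline to an ONNX file (assumes skl2onnx is installed)."""
    # Imported lazily so that training still works without skl2onnx.
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import StringTensorType

    # The pipeline takes a single string column, so the input shape is [n_rows, 1].
    onnx_model = convert_sklearn(
        pipe, initial_types=[("text", StringTensorType([None, 1]))]
    )
    with open(path, "wb") as f:
        f.write(onnx_model.SerializeToString())


# Toy data, made up for illustration: 1 = compliment, 0 = not a compliment.
texts = [
    "great work, well done",
    "you are wonderful",
    "lovely and kind person",
    "this is terrible",
    "you are awful",
    "horrible and rude person",
]
labels = [1, 1, 1, 0, 0, 0]

model = train_bow_classifier(texts, labels)
predictions = model.predict(["wonderful work", "awful and rude"])
```

Calling `export_to_onnx(model, "compliments.onnx")` afterwards would write the file to disk (the filename is just an example). A real dataset would need far more examples than this.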

As always, be careful that you don't put all your eggs in the "pretrained models"-basket. While they can help you prioritise data to annotate, they are just a proxy. Your dataset is probably going to be unique enough that there are edge cases that these pre-trained models fail to capture.
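As a rough sketch of what "prioritise data to annotate" could look like: score every unlabelled example with some proxy model and look at the highest scoring ones first. The keyword scorer below is a made-up stand-in for a real pretrained model, and both function names are hypothetical.

```python
def prioritise_for_annotation(texts, score_fn, top_k=None):
    """Order texts by proxy score, highest first, and optionally keep the top_k."""
    scored = [(score_fn(text), text) for text in texts]
    # sort() is stable, so texts with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [text for _, text in scored]
    return ranked if top_k is None else ranked[:top_k]


def toy_compliment_score(text):
    """Stand-in proxy: the fraction of tokens that are 'positive' words."""
    positive_words = {"great", "nice", "love", "wonderful"}
    tokens = [tok.strip(".,!?") for tok in text.lower().split()]
    if not tokens:
        return 0.0
    return sum(tok in positive_words for tok in tokens) / len(tokens)


unlabelled = [
    "The meeting is at noon.",
    "Great talk, I love it!",
    "Nice.",
]
queue = prioritise_for_annotation(unlabelled, toy_compliment_score, top_k=2)
```

The proxy only decides the order in which you look at examples. The labels still come from you, which is how the edge cases the proxy misses end up in your dataset.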
