Textcat with Floret Models?

imaurer · November 12, 2022, 3:20pm

I have been experimenting with Floret and I really appreciate how fast the vectorization runs and how it handles spelling variation.

Ideally, I will be using these word vectors in two ways:

1. Classifying sentences with a set of labels.
2. Loading the sentence vectors into a vector database (pgvector or Weaviate) to do KNN queries.

I would like to use Prodigy to create a text classification model built on these Floret word vectors to do the labeling. Is this possible?

koaning · November 14, 2022, 1:38pm

Prodigy's train command wraps spaCy's train command and adds some helpful settings. In particular, the --base-model setting lets you point to a spaCy pipeline on disk, which can be a pipeline that contains the floret vectors.

You can create a local pipeline with floret vectors using spaCy's init vectors CLI command, passing --mode floret. Once that pipeline exists locally, you should be able to point to it from prodigy train.
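As a rough sketch of those two steps, the snippet below only assembles and prints the two command lines so you can check them before running anything. The paths (`vectors.floret`, `./floret_model`, `./textcat_model`), the language code `en` and the dataset name `my_textcat_dataset` are placeholders I made up, and `--textcat` assumes mutually exclusive labels (there is also `--textcat-multilabel`). Whether the trained textcat component actually uses the static vectors depends on the training config, so treat this as a starting point rather than a guaranteed recipe.

```python
import shlex
import sys


def init_vectors_cmd(lang, vectors_path, output_dir):
    """Build the `spacy init vectors` call for a floret vectors file."""
    return [
        sys.executable, "-m", "spacy", "init", "vectors",
        lang, vectors_path, output_dir,
        "--mode", "floret",  # needed for floret vectors (spaCy v3.2+)
    ]


def prodigy_train_cmd(output_dir, dataset, base_model):
    """Build the `prodigy train` call that starts from a pipeline on disk."""
    return [
        sys.executable, "-m", "prodigy", "train", output_dir,
        "--textcat", dataset,
        "--base-model", base_model,
    ]


# Placeholder paths and dataset name, for illustration only.
steps = [
    init_vectors_cmd("en", "vectors.floret", "./floret_model"),
    prodigy_train_cmd("./textcat_model", "my_textcat_dataset", "./floret_model"),
]

for step in steps:
    # Print each command; pass `step` to subprocess.run(step, check=True)
    # once the paths point at real files.
    print(shlex.join(step))
```

The second command reuses the output directory of the first as its base model, which is the link between the two steps described above.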

In terms of a vector database, you might also get away with using a more lightweight library instead. You may enjoy using annoy or PyNNDescent. Should you choose annoy, I do recommend tuning the number of trees depending on how large your dataset is.

Let me know if you come across any issues though!
