Using a saved textcat model on a new data set


Sorry for the basic question, but I could not figure this out even after spending hours on the documentation and on this forum. Can you please explain how I can use my saved, trained model to classify the texts in a new, large data set? The insult classifier video shows how to do this for a single sentence or sample, but I want to label a whole unlabeled data set with the saved model. Again, sorry for taking your time with this basic question, but I am really stuck on it.

Thanks in advance.

Hi! The model you've trained is a regular spaCy model, so you can apply it to your large dataset just like you would with any other spaCy model. You'd load your texts and your model, process the texts with your model, and use the doc.cats attribute to access the predicted categories. doc.cats is a dictionary that maps each label to its predicted score. For large datasets, use nlp.pipe, which processes the texts as a stream and in batches; the spaCy docs on efficient processing cover this in more detail. The result could look something like this:

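import spacy
nlp = spacy.load("/path/to/your/model")  # placeholder: directory of your saved model
your_data = load_list_or_stream_with_lots_of_texts()  # placeholder: any iterable of strings
for doc in nlp.pipe(your_data):
    # Do something with the predicted categories here, e.g. print them
    print(doc.cats)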
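
To label the whole dataset and keep the results, the loop above could be extended along these lines. This is only a rough sketch: the helper names (top_label, label_texts, write_labels), the batch size and the CSV output format are made up for illustration and are not part of spaCy or Prodigy. It also assumes your categories are mutually exclusive, so it keeps just the highest-scoring label for each text.

```python
import csv


def top_label(cats):
    """Return (label, score) for the highest-scoring category, or (None, 0.0) if empty."""
    if not cats:
        return None, 0.0
    label = max(cats, key=cats.get)
    return label, cats[label]


def label_texts(nlp, texts, batch_size=64):
    """Yield (text, label, score) for each text, streaming the texts through nlp.pipe."""
    for doc in nlp.pipe(texts, batch_size=batch_size):
        label, score = top_label(doc.cats)
        yield doc.text, label, score


def write_labels(nlp, texts, out_path):
    """Write one CSV row per text: the text, its top label and that label's score."""
    with open(out_path, "w", newline="", encoding="utf8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label", "score"])
        writer.writerows(label_texts(nlp, texts))


# Usage (assumes spaCy is installed; the paths are placeholders):
# import spacy
# nlp = spacy.load("/path/to/your/model")
# write_labels(nlp, your_data, "labelled.csv")
```

If your labels are not mutually exclusive, you would probably want to keep every label whose score is above some threshold instead of only the top one.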