More efficient preprocessing with similarity

AnnaAnia · February 25, 2020, 10:18am

Hi,

We are using an entity annotated with Prodigy in your similarity function to suggest to a user the most appropriate product code out of 4.5k codes, based on how well the code description fits the entity.

Before applying the similarity function, we manipulate the word frequencies of the 4.5k code descriptions to surface the most important words. This gives us a huge column where each row has to be nlp'ed, which takes ca. 14 minutes.

Can you recommend a more efficient way of doing this?

ines · February 25, 2020, 11:14am

How are you currently doing it? Are you using nlp.pipe and disabling the components you don't need? See here for details on efficient processing: https://spacy.io/usage/processing-pipelines#processing
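For reference, here is a minimal sketch of batching with `nlp.pipe` while skipping a component. It assumes the spaCy v3 API; the blank pipeline, the `sentencizer` stand-in and the sample descriptions are only illustrative and not taken from the original setup.

```python
import spacy

# Assumption: spaCy v3. A blank pipeline with a sentencizer stands in for a
# real model so the sketch runs without downloading anything.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical code descriptions; the real column has about 4.5k rows.
descriptions = [
    "Stainless steel hex bolt",
    "Rubber sealing gasket",
    "Copper pipe fitting",
]

# nlp.pipe processes the texts in batches instead of one call per row.
# Components listed in `disable` are skipped for this call only.
docs = list(nlp.pipe(descriptions, disable=["sentencizer"], batch_size=256))
```

With a real model you would list whichever components the task does not need (for example the parser) in place of the stand-in. Which ones are safe to skip depends on what the similarity step actually uses.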

AnnaAnia · February 25, 2020, 12:04pm

Thanks Ines. We do need all the pipes, but we have been able to recode so that they get disabled for the largest task and restored afterwards. Thank you.
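A rough sketch of that disable-and-restore pattern, assuming spaCy v3, where `nlp.select_pipes` works as a context manager (in spaCy v2, current at the time of this thread, the equivalent was `nlp.disable_pipes`). The blank pipeline, the `sentencizer` stand-in and the sample texts are placeholders, not the actual code from this project.

```python
import spacy

# Assumption: spaCy v3, with a blank pipeline plus a sentencizer as a
# stand-in for the components of a real model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical code descriptions for illustration.
descriptions = ["Stainless steel hex bolt", "Rubber sealing gasket"]

# Inside the block the listed components are disabled; they are restored
# automatically when the block exits.
with nlp.select_pipes(disable=["sentencizer"]):
    active_during = list(nlp.pipe_names)
    docs = list(nlp.pipe(descriptions))

active_after = list(nlp.pipe_names)
```

Here `active_during` is empty and `active_after` contains the sentencizer again, so later steps that need the full pipeline are unaffected.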
