Is there a vectorized way to get [label, text]?

chbeyer · March 16, 2020, 4:54pm

Hi Ines!

I am wondering if there is a way to get [label, text] in one shot as a vector, without looping through the results? I understand that ner.pipe processes all the texts for you. But what you get back is an object that contains .ents with 'label' and 'text' in them.

Thank you very much!

ines · March 17, 2020, 12:11pm

I'm not 100% sure I understand the question correctly – so you want to extract the text and entity label as a vector/array instead of as strings?

Internally, spaCy stores everything as IDs and the strings are only computed when you access them. Same with the Span objects like the doc.ents, which are only views of the Doc. So instead of accessing the entity spans and getting their texts, you can also get the Token.orth, Token.ent_type and Token.ent_iob for each token, or use Doc.to_array for a single numpy array.
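
As a rough illustration of the ID-based storage described above, here is a minimal sketch. It assumes a blank English pipeline with entities set by hand (the example sentence and the labels "ORG" and "GPE" are made up for the demo, standing in for a trained NER model), and it uses Token.ent_type as the attribute that carries the entity label.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank pipeline is enough for the demo; no trained model is loaded.
nlp = spacy.blank("en")
doc = nlp("Apple is hiring in Berlin")

# Hypothetical entities set by hand, standing in for the output of a trained NER model.
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 4, 5, label="GPE")]

# Each token exposes integer IDs: (hash of the text, hash of the label, IOB code).
ids = [(t.orth, t.ent_type, t.ent_iob) for t in doc]

# The StringStore turns the hashes back into strings only when asked.
strings = nlp.vocab.strings
decoded = [
    (strings[orth], strings[ent_type])
    for orth, ent_type, _ in ids
    if ent_type != 0  # 0 means the token carries no entity label
]
print(decoded)
```

The trailing-underscore attributes (Token.orth_, Token.ent_type_) return the same strings directly; the integer versions are useful when you want to stay in ID space for as long as possible.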

chbeyer · March 17, 2020, 6:34pm

Thanks for your response, Ines!

To clarify, I created a list of text strings to feed into my trained prodigy model:
my_texts = list(my_pd_frame)
Run through my model:
mydocs = list(nlp.pipe(my_texts))

But I cannot convert my mydocs list to an np_array. Are you suggesting converting the nlp.pipe object?

Thank you very much in advance for your help!

ines · March 18, 2020, 9:45am

mydocs here is a list of spaCy Doc objects. Doc objects provide various methods and attributes for accessing the annotations – for instance, Doc.to_array, which outputs the attributes you're interested in as a numpy array. For example:

my_np_arrays = [doc.to_array(["ORTH", "ENT_TYPE", "ENT_IOB"]) for doc in nlp.pipe(my_texts)]
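
The arrays contain integer IDs rather than strings, so a decoding step is still needed to get back to [label, text]. The following is only a sketch under some assumptions: entities_from_array is a hypothetical helper (not part of spaCy), the entities are set by hand on a blank pipeline, and the IOB codes are taken to be 3 for "B" and 1 for "I". Multi-token entities are joined with a single space, which may not match the original whitespace exactly.

```python
import spacy
from spacy.tokens import Span

def entities_from_array(arr, strings):
    """Hypothetical helper: group B/I rows of a Doc.to_array result into [label, text] pairs."""
    ents = []
    for orth, ent_type, iob in arr:
        word = strings[int(orth)]
        if iob == 3:  # "B": first token of an entity
            ents.append([strings[int(ent_type)], word])
        elif iob == 1 and ents:  # "I": continuation of the current entity
            ents[-1][1] += " " + word
    return ents

# Assumption: blank pipeline with hand-set entities, standing in for a trained model.
nlp = spacy.blank("en")
doc = nlp("Apple is hiring in New York")
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 4, 6, label="GPE")]

# One row per token, one column per requested attribute.
arr = doc.to_array(["ORTH", "ENT_TYPE", "ENT_IOB"])
entities = entities_from_array(arr, nlp.vocab.strings)
print(arr.shape, entities)
```

If a plain Python loop is acceptable, [[ent.label_, ent.text] for ent in doc.ents] gives the same pairs with the original whitespace preserved; the array route mainly helps when you want to do further numeric work on the IDs first.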
