Vectors for Entity Linking

mattb · March 5, 2024, 6:27pm

I'm working toward an entity linking stage in my pipeline, but I've hit a wall. I've been using en_core_web_trf because I wanted the context sensitivity of the transformer model, and I have that trained and ready. But when I went to build my knowledge base for entity linking, I found that the model has no vector output.

I think I have a few choices, and I'm hoping someone can steer me in the right direction. I could 1) give up on transformers and use the en_core_web_lg model, or 2) figure out how to store the tensor output in the vector field of the KB.

The first path seems the most sensible, but I don't want to give up the context the transformer provides. The second path looks like a lot more coding, since I'd have to bring the tensor output into every step of entity linking.

The last idea I had was to add tok2vec to my pipeline just for the vectors used in the KB similarity lookup. I'd lose the context benefit in that one scenario, but would otherwise be OK. Obviously this means having both the transformer and tok2vec in my pipeline, which could maybe cause problems?

Let me know if I'm missing something or if you have a suggested path for me!

Matt

magdaaniol · March 6, 2024, 10:15am

Hi @mattb and welcome to the forum!

The short recommendation would be your strategy 2), i.e. setting custom hooks for a vector representation derived from the trf tensors.
This spaCy discussion thread covers exactly the same issue, so I would direct you there for more technical advice on how to achieve that: how to train an NEL model with transformers? · explosion/spaCy · Discussion #7315 · GitHub
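
To give a rough idea of what such a hook has to compute, here is a minimal, dependency-free sketch of the pooling step. It is not spaCy code: the function names, the toy wordpiece rows and the alignment lists are all made up for illustration, and mean pooling is just one reasonable choice. In a real pipeline the rows and the alignment would come from the transformer output stored on the doc.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(rows: Sequence[Sequence[float]], width: int) -> Vector:
    """Average equal-length rows column by column; zeros if there are no rows."""
    if not rows:
        return [0.0] * width
    return [sum(col) / len(rows) for col in zip(*rows)]


def token_vectors(
    wordpiece_rows: Sequence[Sequence[float]],
    alignment: Sequence[Sequence[int]],
    width: int,
) -> List[Vector]:
    """One vector per token: the mean of the wordpiece rows aligned to it."""
    return [
        mean_pool([wordpiece_rows[i] for i in idxs], width) for idxs in alignment
    ]


def make_span_vector_hook(
    tok_vecs: Sequence[Vector], width: int
) -> Callable[[int, int], Vector]:
    """Return a callable mapping a token range [start, end) to a pooled vector."""

    def hook(start: int, end: int) -> Vector:
        return mean_pool(tok_vecs[start:end], width)

    return hook


# Toy data standing in for transformer output (hypothetical): 4 wordpieces, width 2.
wordpiece_rows = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]
# Token 0 is built from wordpieces 0 and 1, token 1 from 2, token 2 from 3.
alignment = [[0, 1], [2], [3]]

tok_vecs = token_vectors(wordpiece_rows, alignment, width=2)
span_vector = make_span_vector_hook(tok_vecs, width=2)
entity_vector = span_vector(0, 2)  # pooled vector for the mention covering tokens 0 and 1
```

In spaCy, a callable like this would be registered through `doc.user_hooks["vector"]`, `doc.user_span_hooks["vector"]` and `doc.user_token_hooks["vector"]`, reading the tensors from `doc._.trf_data`. The exact layout of that object depends on the spacy-transformers version, so check it against what you have installed. The entity vector length configured for the KB also has to match the width of the pooled vectors.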
