Vectors for Entity Linking

I'm working toward an entity linking stage in my pipeline, but I've hit a wall. I've been using en_core_web_trf because I wanted the context sensitivity of the transformer model, and I have that model trained and ready. However, when I went to build my knowledge base (KB) for entity linking, I found that the model doesn't have any vector output to build the entity vectors from.

I think I have a few options, and I'm hoping someone can steer me in the right direction. I could either 1) give up on transformers and use the en_core_web_lg model, or 2) figure out how to store the tensor output in the vector field of the KB.

The first path seems the most sensible, but I don't want to give up the context the transformer provides. The second path looks like it will take a lot more coding to bring the tensor output into every step of entity linking.

The last idea I had was to add a tok2vec component to my pipeline just for the vectors used in the KB similarity lookup. I'd lose the context benefit in that one step, but would otherwise be fine. Obviously this means having both a transformer and a tok2vec in my pipeline. Could that cause problems?
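For any of these options, it may help to note that the KB itself just stores one fixed-length vector per entity and does not care which model produced it. Below is a minimal sketch, assuming spaCy v3 (the concrete class is InMemoryLookupKB from 3.5 on and KnowledgeBase before that). The names build_kb, descriptions, aliases and encode are hypothetical: encode stands for whatever turns an entity description into a vector, e.g. lambda text: nlp_lg(text).vector with en_core_web_lg for option 1). Using the same encoder for every entity keeps the stored vectors in one comparable space.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np
import spacy
from spacy.vocab import Vocab

try:  # spaCy >= 3.5: concrete in-memory KB class
    from spacy.kb import InMemoryLookupKB as KB
except ImportError:  # spaCy 3.0-3.4
    from spacy.kb import KnowledgeBase as KB


def build_kb(
    vocab: Vocab,
    descriptions: Dict[str, str],                  # hypothetical: entity ID -> description
    aliases: Dict[str, List[Tuple[str, float]]],   # hypothetical: alias -> [(entity ID, prior)]
    encode: Callable[[str], np.ndarray],           # hypothetical: text -> fixed-length vector
    freq: int = 1,
):
    """Build a KB whose entity vectors come from any text encoder."""
    # Encode every description once; all vectors must share the same width.
    vectors = {
        qid: np.asarray(encode(desc), dtype="float32")
        for qid, desc in descriptions.items()
    }
    if not vectors:
        raise ValueError("need at least one entity description")
    width = next(iter(vectors.values())).shape[0]

    kb = KB(vocab=vocab, entity_vector_length=width)
    for qid, vec in vectors.items():
        kb.add_entity(entity=qid, freq=freq, entity_vector=vec.tolist())

    # Each alias maps to candidate entities with prior probabilities (sum <= 1).
    for alias, candidates in aliases.items():
        kb.add_alias(
            alias=alias,
            entities=[qid for qid, _ in candidates],
            probabilities=[prob for _, prob in candidates],
        )
    return kb
```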

Let me know if I'm missing something or if you have a suggested path for me!


Hi @mattb and welcome to the forum!

My short recommendation would be your option 2), i.e. setting custom hooks so that the vector representations are derived from the transformer (trf) tensors.
This spaCy discussion thread covers exactly the same issue, so I'd direct you there for more technical advice on how to achieve that: how to train an NEL model with transformers? (explosion/spaCy, Discussion #7315 on GitHub)
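To give a rough idea of what the hook approach can look like, here is a minimal sketch for spaCy v3. It assumes that some earlier step has already written one transformer-derived row per token into doc.tensor (shape (len(doc), width)); it does not assume en_core_web_trf fills doc.tensor out of the box, and getting the transformer output into that form is the more technical part the linked discussion is about. The component name tensor_vector_hooks and the helper functions are hypothetical. doc.user_hooks, doc.user_span_hooks and doc.user_token_hooks are spaCy's standard hook dictionaries, and mean pooling is just one simple choice.

```python
import numpy as np
import spacy
from spacy.language import Language
from spacy.tokens import Doc, Span, Token

# ASSUMPTION: doc.tensor already holds one row per token, shape (len(doc), width).


def _doc_vector(doc: Doc) -> np.ndarray:
    # Mean-pool all token rows of doc.tensor.
    return doc.tensor.mean(axis=0)


def _span_vector(span: Span) -> np.ndarray:
    # Mean-pool only the rows that belong to this span's tokens.
    return span.doc.tensor[span.start : span.end].mean(axis=0)


def _token_vector(token: Token) -> np.ndarray:
    # A token's vector is simply its own row in doc.tensor.
    return token.doc.tensor[token.i]


def _has_tensor(obj) -> bool:
    # Works for Doc, Span and Token: look up the owning Doc's tensor.
    doc = obj if isinstance(obj, Doc) else obj.doc
    return doc.tensor is not None and doc.tensor.size > 0


@Language.component("tensor_vector_hooks")
def tensor_vector_hooks(doc: Doc) -> Doc:
    """Route .vector / .has_vector on docs, spans and tokens to doc.tensor."""
    doc.user_hooks["vector"] = _doc_vector
    doc.user_hooks["has_vector"] = _has_tensor
    doc.user_span_hooks["vector"] = _span_vector
    doc.user_span_hooks["has_vector"] = _has_tensor
    doc.user_token_hooks["vector"] = _token_vector
    doc.user_token_hooks["has_vector"] = _has_tensor
    return doc
```

Because the hooks read doc.tensor lazily, they only need the tensor to be filled before .vector is accessed. With the component in the pipeline, nlp(description).vector returns a pooled vector that could be stored as an entity vector in the KB, while span vectors are pooled over the span's own tokens only.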
