Replace sense2vec with a transformer model from Hugging Face

Hello, I watched this video https://youtu.be/59BKHO_xBPA where Ines explains how to do NER in Prodigy. My question is: how hard would it be, and what would be the steps, to replace sense2vec with a model from Hugging Face Transformers, or even any model that can compute embeddings for words?
prodigy sense2vec.teach ........
Thanks in advance.

Hi! That's a nice idea and shouldn't be too difficult to implement.

You can find the source of the sense2vec.teach recipe here. However, I think the terms.teach code might be a better place to start and use as a template, because it doesn't contain all the sense2vec-specific stuff for retrieving vectors keyed by word and tag (POS, entity label).

If you use a model via spacy-transformers, the above code may work almost out-of-the-box and you won't have to change much, since the .similarity() method uses the transformer model's embeddings if they're available.
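
For instance, a minimal sketch using one of the spaCy v2-era spacy-transformers models (the model name here is just an example of that generation of packages):

```python
import spacy

# Assumes a spaCy v2-era spacy-transformers pipeline; these models register
# similarity hooks so .similarity() is computed from transformer embeddings.
nlp = spacy.load("en_trf_bertbaseuncased_lg")
print(nlp("noodles").similarity(nlp("pasta")))
```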

But even if you do decide to re-implement it, the idea is pretty simple (a rough sketch follows the list):

  1. Keep a "target vector" of the seed terms and the terms you've accepted.
  2. Loop over the vocabulary and compare each term's similarity to the target.
  3. If it's above a certain threshold, send it out. Otherwise, skip it.
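
Here's a rough sketch of that loop, not the actual terms.teach recipe; it assumes a pipeline with word vectors (e.g. en_core_web_lg), and the seed terms and threshold are made up:

```python
import numpy
import spacy

nlp = spacy.load("en_core_web_lg")          # any pipeline with word vectors
accepted = ["sausage", "noodles", "pizza"]  # seed terms plus accepted terms
threshold = 0.6                             # made-up cut-off

def cosine(a, b):
    return a.dot(b) / (numpy.linalg.norm(a) * numpy.linalg.norm(b))

# 1. The "target vector" is the average of everything accepted so far.
target = numpy.mean([nlp.vocab[term].vector for term in accepted], axis=0)

# 2. Loop over the vocabulary entries that have a vector.
for key in nlp.vocab.vectors:
    word = nlp.vocab.strings[key]
    lex = nlp.vocab[word]
    if not lex.is_alpha or word in accepted:
        continue
    # 3. Send out terms above the threshold, skip the rest.
    if cosine(lex.vector, target) >= threshold:
        print(word)  # in a real recipe, yield this as a Prodigy task
```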

Awesome, thanks a lot! And by the way, thanks for the nice relation labelling feature 🙂

I took a look at pattern.match, but the docs say it only works with tokens in the vocabulary, which means I can't use it, since I want to use this to help with an NER task (multi-word entities) before manual labelling. Am I wrong?

I'm not sure I understand the question, sorry! So did you already collect terms using your custom terms.teach recipe, or are you still working on that?


No, I can't use terms.teach, since it can only handle tokens in the vocabulary, but I have entities that I want to find "synonyms" for, and they are multi-word spans, just as in your food NER example with sense2vec.

Well, you do need embeddings for those phrases, and you need to load those phrases from somewhere and iterate over them. In terms.teach, we're using the model's vocab. In sense2vec.teach, we're using the entries in the sense2vec vectors. So whichever embeddings you're using, you need to extract potentially similar candidates so you can check the similarity and decide what to suggest.
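
If you go with a Hugging Face model directly, the same idea could look roughly like this; the model name, the candidate phrases, and the mean-pooling choice are all just assumptions for illustration:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(phrases):
    # Mean-pool the last hidden states over the non-padding tokens.
    batch = tokenizer(phrases, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

accepted = ["fried rice", "spring rolls"]    # phrases you've accepted so far
candidates = ["chow mein", "traffic light"]  # loaded from your own data
target = embed(accepted).mean(0, keepdim=True)
scores = torch.nn.functional.cosine_similarity(embed(candidates), target)
for phrase, score in zip(candidates, scores):
    print(phrase, float(score))  # suggest the phrase if the score is high enough
```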

Ah, I see, I just have to modify the code in terms.teach even when using spacy-transformers.
Thanks!