I am looking to train a custom semantic dependency parser, but I am not sure if I can specify relations between phrases instead of single words. All the examples I have come across annotate dependencies at the word level. I am guessing that merging each phrase into a single token object is not a good idea in my case, because the phrase can contain terms that will hit in the embeddings model and help with the "embed" stage of training, using my already existing word2vec model. The phrase as a single token, however, will not hit anything in that word2vec model.
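To make the lookup problem concrete, here is a toy sketch (the tiny `w2v` dict and the example words are stand-ins, not my real model): each word of the phrase hits the word2vec vocabulary on its own, but the merged multi-word token does not.

```python
# Toy stand-in for a word2vec model: the vocabulary is keyed on single words.
w2v = {"custom": [0.1], "semantic": [0.2], "parser": [0.3]}

phrase_tokens = ["custom", "semantic", "parser"]
merged_token = " ".join(phrase_tokens)  # "custom semantic parser"

hits_as_words = [t in w2v for t in phrase_tokens]  # each word hits the vocab
hit_as_phrase = merged_token in w2v                # the merged token misses

print(hits_as_words, hit_as_phrase)
# → [True, True, True] False
```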
In summary, what format should I use to specify relations between multi-word phrases for training a dependency parser? Or am I missing something? Thanks a lot!
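For concreteness, here is the kind of word-level format I am imagining, modeled on spaCy-style (text, heads, deps) training tuples (the sentence, labels, and the idea of attaching phrase-internal words to the phrase head via a compound relation are just my guesses, not a confirmed recipe):

```python
# Hypothetical word-level annotation for the phrase "machine learning":
# tokens are kept separate, internal words attach to the phrase head via
# "compound", and only the head word carries the external relation.
TRAIN_DATA = [
    (
        "machine learning improves search",
        {
            # head index per token: "machine"->"learning", "learning"->"improves",
            # "improves" is ROOT (points to itself), "search"->"improves"
            "heads": [1, 2, 2, 2],
            "deps": ["compound", "nsubj", "ROOT", "dobj"],
        },
    ),
]

print(TRAIN_DATA[0][1]["deps"])
# → ['compound', 'nsubj', 'ROOT', 'dobj']
```

This would keep every word available for the word2vec lookup while still letting the phrase act as a unit through its head word.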
PS: Some related info (not a question). My plan is to use the mark recipe in Prodigy, with my custom code driving the phrase identification, to get enough lines tagged to get over the cold-start problem, and then use the dep.teach recipe after that. At this point I think (and hope) that should work.