I am using Prodigy to build a model that classifies the scholarly literature on COVID. My initial task is quite straightforward -- to separate empirical publications (i.e., actual science) from non-empirical publications (e.g., personal essays, position papers, critiques). My collection is around 275k publications, so this is definitely an NLP task. I'm really impressed with the results so far -- and, of course, the software!
Here is my question: I'm using the en_core_web_lg word vectors in the model. I would actually like to try using the word vectors trained on the PubMed research, which I can obtain here:
Here is the file manifest:
I'm interested in trying any of the .bin models, but I'm not having any luck getting them into my model. Are there any special configuration steps I need to perform after obtaining these files? These are Word2Vec models. Is there some documentation I missed?
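One route that might work, sketched under assumptions: spaCy's vector import expects word2vec text format rather than the binary .bin format, so the file may need converting first. The snippet below is a minimal sketch that assumes the .bin files follow the standard word2vec C binary layout (a header line with vocabulary size and dimension, then each word followed by little-endian float32 values). The function name `convert_bin_to_text` and any file paths are hypothetical, not part of Prodigy or spaCy.

```python
import struct


def _read_word(f):
    """Read bytes up to the next space and return them as a string."""
    chars = bytearray()
    while True:
        ch = f.read(1)
        if ch == b"":
            raise EOFError("file ended in the middle of a word")
        if ch == b" ":
            return chars.decode("utf-8", errors="replace")
        if ch != b"\n":  # some writers put a newline after each vector
            chars.extend(ch)


def convert_bin_to_text(bin_path, txt_path):
    """Stream a word2vec binary file into word2vec text format.

    Assumes the standard C binary layout; returns (vocab_size, dim).
    """
    with open(bin_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as dst:
        header = src.readline().split()
        vocab_size, dim = int(header[0]), int(header[1])
        dst.write(f"{vocab_size} {dim}\n")
        for _ in range(vocab_size):
            word = _read_word(src)
            raw = src.read(4 * dim)
            if len(raw) != 4 * dim:
                raise EOFError("file ended in the middle of a vector")
            vector = struct.unpack(f"<{dim}f", raw)
            dst.write(word + " " + " ".join(format(x, ".6g") for x in vector) + "\n")
    return vocab_size, dim
```

The resulting text file could then be handed to spaCy's own CLI: `python -m spacy init-model en ./pubmed_vectors --vectors-loc pubmed.txt` on spaCy v2, or `python -m spacy init vectors en pubmed.txt ./pubmed_vectors` on spaCy v3 (the paths here are placeholders). The output directory should, in principle, be usable wherever a model name such as en_core_web_lg is expected, though whether the PubMed vectors actually help the classifier would need to be checked empirically.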