Synthetic NER data

I would like to use Apache cTAKES or a similar system to extract healthcare concepts
from input text in the healthcare domain. Once I have the output from these systems,
can I convert it to Prodigy's format (JSON or another format), import it with db-in, and then train word2vec vectors with terms.train
so that I have a word2vec model specific to healthcare?

In general: yes, there should be no problem with using annotations from a different tool in Prodigy. You can either import them into a dataset, or just make a .jsonl file and use it as the input.
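
As a rough sketch of the conversion step: Prodigy's span-annotated tasks are JSON objects with a "text" key and a "spans" list of character offsets plus a label, one object per line in a .jsonl file. The snippet below assumes the other tool's output has already been reduced to (start, end, label) tuples per text. That input shape, the helper names, the example sentence and the SYMPTOM label are all made up for illustration and are not something cTAKES produces directly.

```python
import json


def to_prodigy_task(text, annotations):
    """Build one Prodigy-style task from text and (start, end, label) tuples.

    The tuple format is an assumption for this sketch; adapt it to whatever
    your extraction tool actually emits.
    """
    spans = []
    for start, end, label in sorted(annotations):
        # Skip offsets that do not fit inside the text.
        if not (0 <= start < end <= len(text)):
            continue
        spans.append({"start": start, "end": end, "label": label})
    return {"text": text, "spans": spans}


def to_jsonl(tasks):
    """Serialize tasks as newline-delimited JSON (one task per line)."""
    return "\n".join(json.dumps(task) for task in tasks)


# Hypothetical example input.
example_text = "Patient denies chest pain but reports nausea."
example_annotations = [(15, 25, "SYMPTOM"), (38, 44, "SYMPTOM")]

example_task = to_prodigy_task(example_text, example_annotations)
example_jsonl = to_jsonl([example_task])
print(example_jsonl)
```

Writing that string to a file gives you something you can either pass to a recipe as the input source or load into a dataset with db-in.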

However, I’m not sure a word2vec model is what you want to build. word2vec is trained on raw text, so you don’t need any annotations for it. Annotations can sometimes help as a preprocessing step: if you merge annotated multi-word phrases into single tokens before training, the model learns vectors for those longer phrases. Is that what you’re looking to do, or do you have something else in mind?