I would do this as a multiple-pass annotation procedure. First label the top-most category, DRUG, and then create a recipe that enqueues the examples you’ve annotated as DRUG for a second pass over the subcategories.
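The enqueuing step might look something like this. A minimal sketch, assuming a simple list of annotation records; the "text"/"spans" record layout and field names here are illustrative rather than a fixed schema:

```python
# Hypothetical first-pass annotations: keep only the examples that
# contain a DRUG span, so the second pass sees just those.
first_pass = [
    {"text": "She was started on citalopram last week.",
     "spans": [{"start": 19, "end": 29, "label": "DRUG"}]},
    {"text": "No medication changes were made.",
     "spans": []},
]

def enqueue_drug_examples(examples):
    """Yield the examples that contain at least one DRUG span."""
    for eg in examples:
        if any(span["label"] == "DRUG" for span in eg.get("spans", [])):
            yield eg

second_pass_queue = list(enqueue_drug_examples(first_pass))
```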
When you do the second pass, you might want to group the examples by type, i.e. by the distinct phrases that occur. If the token “citalopram” is a DRUG in a particular context, it’s probably always going to have the same subtype. So you can save yourself a lot of work by making that decision once, rather than for every occurrence of the phrase.
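To sketch that grouping (again with made-up records), you can key the queue by the entity’s surface form, so each phrase gets decided once:

```python
from collections import defaultdict

# Group DRUG spans by surface form so the subtype decision is made once
# per phrase, not once per occurrence (record layout is illustrative).
examples = [
    {"text": "Started citalopram.",
     "spans": [{"start": 8, "end": 18, "label": "DRUG"}]},
    {"text": "Continue citalopram.",
     "spans": [{"start": 9, "end": 19, "label": "DRUG"}]},
    {"text": "Added diazepam.",
     "spans": [{"start": 6, "end": 14, "label": "DRUG"}]},
]

by_phrase = defaultdict(list)
for eg in examples:
    for span in eg["spans"]:
        phrase = eg["text"][span["start"]:span["end"]].lower()
        by_phrase[phrase].append(eg)

# Two distinct phrases to decide, instead of three separate occurrences
print(sorted(by_phrase))
```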
If you can satisfy your objectives with subtype schemes that are unambiguous, that will make both the annotation and the machine learning much easier: you deal with ambiguity once, at the top-most category the NER model predicts. Then you can maintain manually vetted dictionaries that map common entities to your subtypes.
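The dictionary step is then just a lookup. The terms and subtype labels below are examples I’m making up:

```python
# Manually vetted mapping from known DRUG phrases to subtypes
# (entries are illustrative).
DRUG_SUBTYPES = {
    "citalopram": "ANTIDEPRESSANT",
    "lexapro": "ANTIDEPRESSANT",
    "diazepam": "SEDATIVE",
}

def resolve_subtype(entity_text):
    """Return the vetted subtype, or None if the entity is unknown
    and needs to fall back to a vector-based guess."""
    return DRUG_SUBTYPES.get(entity_text.lower())
```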
Finally, you might use the word vectors to resolve any entities the model has recognised that aren’t in your dictionaries. You would have a prototype vector for “antidepressant”, made by averaging the vectors of your antidepressant terms, and another prototype vector for “sedative”. Creating a prototype is as simple as making a
Doc object with all the terms in that category, since a Doc’s vector is the average of its tokens’ vectors. So you would have something like
antidepressants = Doc(nlp.vocab, words=['citalopram', 'lexapro', ...]), and then you’d ask which prototype the unresolved entity is most similar to.
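To make the comparison step concrete without loading a real vector model, here’s the same idea with toy vectors. The numbers and drug names are arbitrary stand-ins; in practice you’d take the vectors from your model and could just use the Doc similarity method:

```python
import numpy as np

# Toy word vectors standing in for real embeddings (values are made up).
vectors = {
    "citalopram": np.array([1.0, 0.2, 0.1]),
    "lexapro":    np.array([0.9, 0.3, 0.0]),
    "diazepam":   np.array([0.1, 0.9, 0.8]),
    "zolpidem":   np.array([0.2, 1.0, 0.7]),
}

def prototype(terms):
    """Average the vectors of the terms in a category."""
    return np.mean([vectors[t] for t in terms], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

antidepressant_proto = prototype(["citalopram", "lexapro"])
sedative_proto = prototype(["diazepam", "zolpidem"])

def classify(vec):
    """Assign the subtype whose prototype is most similar to vec."""
    sims = {
        "ANTIDEPRESSANT": cosine(vec, antidepressant_proto),
        "SEDATIVE": cosine(vec, sedative_proto),
    }
    return max(sims, key=sims.get)

# An unknown drug whose vector sits near the antidepressant cluster
print(classify(np.array([0.95, 0.25, 0.05])))  # → ANTIDEPRESSANT
```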