I’m just getting started with spaCy, Prodigy and natural language processing. Right now I have what should be a very simple task but, to be honest, it is taking far too much time. Here is the situation:
- I have a list of ~3000 pharmaceutical active ingredients.
- I have a lot of clinical notes from several hospitals.
- I need to build a report of which pharmaceutical active ingredients are mentioned in the clinical notes.
At the moment, I’m trying to create a new entity type, “Pharmaceutical Active Ingredient”, and train spaCy to recognize all of them. But I’m not sure this is the right approach: what I need to detect is the exact name of each pharmaceutical active ingredient, so a dictionary-style matching process against my list might be a better fit than a trained model.
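To make the matching idea concrete, here is a minimal sketch of what I have in mind, using only the Python standard library. The ingredient names, the sample notes and the helper names (`build_pattern`, `count_ingredients`) are made up for illustration, and I assume every ingredient name starts and ends with a letter or digit. I understand spaCy's `PhraseMatcher` is the in-library way to do this kind of exact lookup, but I have not got that far yet.

```python
import re
from collections import Counter

# Hypothetical sample data; my real list has ~3000 entries.
INGREDIENTS = ["paracetamol", "ibuprofen", "amoxicillin"]


def build_pattern(names):
    """Compile one case-insensitive regex that matches any listed name as a whole word."""
    # Longest names first, so a longer name wins over a shorter one it contains.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def count_ingredients(notes, names):
    """Count how often each listed ingredient appears across all notes."""
    pattern = build_pattern(names)
    counts = Counter()
    for note in notes:
        for match in pattern.finditer(note):
            counts[match.group(0).lower()] += 1
    return counts


notes = [
    "Patient given Paracetamol 500 mg and ibuprofen.",
    "Continue paracetamol; stop amoxicillin.",
]
report = count_ingredients(notes, INGREDIENTS)
for name, count in sorted(report.items()):
    print(name, count)
```

This only finds names that are spelled exactly as in the list (ignoring case), which is why I doubt it is enough on its own for my data.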
On the other hand, the clinical note texts come from an OCR process run over real scanned clinical notes, so whatever approach I use must tolerate OCR errors, for example misrecognized characters in the name of an active ingredient.
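As an illustration of the kind of tolerance I mean, here is a rough sketch that compares each token of a note against the list with `difflib.get_close_matches` from the Python standard library. The sample names, the sample sentence with OCR-style errors, the `fuzzy_find` helper and the `0.8` cutoff are all assumptions of mine, not something I have validated on real notes.

```python
import difflib
import re

# Hypothetical sample data; my real list has ~3000 entries.
INGREDIENTS = ["paracetamol", "ibuprofen", "amoxicillin"]


def fuzzy_find(text, names, cutoff=0.8):
    """Return (token, best_matching_name) pairs for tokens that resemble a listed name."""
    found = []
    # Naive tokenization: runs of letters and digits. Multi-word names are not handled.
    for token in re.findall(r"[A-Za-z0-9]+", text):
        close = difflib.get_close_matches(token.lower(), names, n=1, cutoff=cutoff)
        if close:
            found.append((token, close[0]))
    return found


# "paracetam0l" and "ibuprofcn" imitate typical OCR character confusions.
hits = fuzzy_find("Patient given paracetam0l and ibuprofcn daily.", INGREDIENTS)
for token, name in hits:
    print(token, "->", name)
```

A lower cutoff would catch more damaged names but would also produce more false matches, and comparing every token against ~3000 names this way may be slow on a large set of notes.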
I bought a Prodigy license because I thought it was the right tool for training spaCy to detect the active ingredients, but now I’m a bit lost.
I would really appreciate your help with this issue.
Thanks in advance and best regards,