I am a .NET C# developer. I have a project requirement to remove all PII from the text inside .doc/.docx files (patient notes from a GP practice).
This includes identifiers such as vehicle identification, etc.
I learned Python and tried spaCy. It's good, but I want to add custom entities such as street names and building names. I am really new to Python and
data science, so I am not sure what will work best for me. I looked at the Prodigy documentation for creating my own dataset, but it is not free, and even if I buy it
I am not sure how useful it would be in my case.
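To make the "custom entities" part concrete, here is a minimal rule-based sketch of the kind of thing I have in mind. It uses only Python's standard `re` module rather than spaCy (spaCy's `EntityRuler` is a rule-based component in the same spirit). The `PATTERNS` table, the street suffix list and the `redact` function are all hypothetical names I made up for illustration, and the patterns are assumptions that would need much broader coverage for real patient notes.

```python
import re

# Hypothetical, illustrative patterns only; real notes need far broader coverage.
STREET_SUFFIXES = r"(?:Street|St|Road|Rd|Avenue|Ave|Lane|Ln|Close|Drive|Dr)"

PATTERNS = {
    # A house number, one to three capitalised words, then a street suffix.
    "STREET": re.compile(
        rf"\b\d{{1,4}}\s+(?:[A-Z][a-z]+\s+){{1,3}}{STREET_SUFFIXES}\b"
    ),
    # A 17-character vehicle identification number (letters I, O and Q are not used).
    "VIN": re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b"),
}


def redact(text):
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Patient lives at 12 High Street and drives 1HGCM82633A004352."))
```

Rules like these need no annotated dataset, but they only catch what the patterns anticipate, which is why I am asking whether a trained model would be the better route.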
Can someone please guide me on this?