Adding proper nouns of people to existing model

Hi,
I am new to NLP and Prodigy.
I have a few thousand documents, each titled with a person's name, and I need to extract that name from the document text. The code I have written gives me about 60% accuracy, because the names are Indian.
How do I feed the model Indian / Asian names?
I have read a lot of examples where one feeds in sentences and then annotates them.

Moreover, if we have to train the system using sentences, is there a specific format that I can generate (from a DB) and simply load with the existing model?
Secondly, how do I update my existing model with new values?

Thanks,

Hi @tushar, welcome to Prodigy!

How do I feed the model Indian / Asian names?
I have read a lot of examples where one feeds in sentences and then annotates them.

When annotating names, it is important to annotate them in context, that is, inside the sentences where they occur. The same goes for labelling them. You can approach this in several ways:

  1. You can use patterns to pre-annotate or highlight the entities to help you during annotation. For example, you can check for titles (Dr., Mr., Ms., Ph.D., M.D.) or look names up in a gazetteer.
  2. If the name appears in a sentence, you can also use the surrounding parts of speech as cues. For example, a PERSON entity can be the doer or receiver of an action.
  3. If your documents follow a particular format (an application form, etc.), looking at a specific portion of the document may help, or simply looking for particular keys such as "Name", "First Name", etc.
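For the first option, here is a minimal sketch of how a list of known names could be turned into a patterns file. It assumes the usual token-based pattern layout (one JSON object per line, with a "label" and a "pattern" list of token attributes). The names and the helper functions `name_to_pattern`, `build_patterns` and `to_jsonl` are made up for illustration, and splitting on whitespace is a simplification that may not match the tokenizer for names containing punctuation.

```python
import json

def name_to_pattern(name, label="PERSON"):
    """Turn one full name into a token-based pattern (one dict per token)."""
    tokens = name.split()  # assumption: whitespace split approximates tokenization
    return {"label": label, "pattern": [{"lower": t.lower()} for t in tokens]}

def build_patterns(names, label="PERSON"):
    """Build patterns for a list of names, skipping blanks and case-insensitive duplicates."""
    seen = set()
    patterns = []
    for name in names:
        key = " ".join(name.lower().split())
        if not key or key in seen:
            continue
        seen.add(key)
        patterns.append(name_to_pattern(name, label))
    return patterns

def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Hypothetical gazetteer, e.g. the document titles pulled from the DB
names = ["Tushar Mehta", "Priya Nair", "tushar mehta"]
patterns = build_patterns(names)
print(to_jsonl(patterns))
```

The printed lines can be saved to a `.jsonl` file and used as a patterns file. Matching on the lowercase form keeps the patterns tolerant of inconsistent capitalization in the documents.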

Moreover, if we have to train the system using sentences, is there a specific format that I can generate (from a DB) and simply load with the existing model?

I'm not entirely sure what you mean by this, but it is possible to train directly from Prodigy. The training command takes the annotated dataset as its input, so you can reference the dataset there.
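If the question is about the file format, here is a rough sketch of generating pre-annotated examples from DB rows. It assumes JSONL input where each line has a "text" key and an optional "spans" list with character offsets ("start", "end", "label"). The `rows` data, the "doc_id" meta field and the `make_task` helper are hypothetical, and a plain string match on the title is only a starting point: the spans should still be reviewed, since the name may appear in a different form in the text.

```python
import json
import re

def make_task(doc_id, title_name, text, label="PERSON"):
    """Create one annotation task, marking every occurrence of the title name."""
    # \b keeps matches on word boundaries so partial words are not labelled
    pattern = re.compile(r"\b" + re.escape(title_name) + r"\b", re.IGNORECASE)
    spans = [
        {"start": m.start(), "end": m.end(), "label": label}
        for m in pattern.finditer(text)
    ]
    return {"text": text, "spans": spans, "meta": {"doc_id": doc_id}}

# Hypothetical rows as they might come back from the DB: (id, title, body)
rows = [
    (1, "Priya Nair", "The award was presented to Priya Nair in Kochi."),
    (2, "Tushar Mehta", "No name is mentioned in this text."),
]

tasks = [make_task(*row) for row in rows]
for task in tasks:
    print(json.dumps(task, ensure_ascii=False))
```

Documents where the title is not found come out with an empty "spans" list, which makes them easy to route to manual annotation.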

Secondly, how do I update my existing model with new values?

If you want to add new annotated data, you can either (1) resume training from the model you have already trained, using only the new data, or (2) merge the new and old datasets and train the NER model from scratch.
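As an illustration of option (2), here is a small sketch of merging two sets of annotated examples before retraining. It assumes both sets are available as lists of dicts with a "text" key, and that when the same text appears in both, the newer annotation should win. That rule and the `merge_examples` helper are assumptions for this example, not something prescribed by Prodigy.

```python
def merge_examples(old, new):
    """Merge two lists of annotated examples, keyed by text; newer entries win."""
    merged = {}
    for example in list(old) + list(new):
        merged[example["text"]] = example  # a later entry overwrites an earlier one
    return list(merged.values())

# Hypothetical annotation exports
old_examples = [
    {"text": "Priya Nair spoke first.", "spans": []},
    {"text": "The meeting ended early.", "spans": []},
]
new_examples = [
    {"text": "Priya Nair spoke first.",
     "spans": [{"start": 0, "end": 10, "label": "PERSON"}]},
]

combined = merge_examples(old_examples, new_examples)
print(len(combined), "examples after merging")
```

Training from scratch on the merged set is slower than resuming, but it avoids the model drifting towards only the most recent batch of annotations.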