Use Case Feasibility

Hi,

  • I am currently working on building an NER-based system that detects SSNs and account IDs in free-form text fields (unstructured text).
  • I have manually annotated around 1,000 sentences with tags, and I am planning to use this dataset to help annotate the remaining chunk of roughly 50,000 sentences using Prodigy.
    q1) Is it possible to create two new entity types using the dataset containing the 1,000 sentences?
    q2) If yes, what would be the ideal workflow?
    q3) Is the Prodigy model capable of picking up the context from the sentences (training data)?

Thanks

Yes, you can definitely try to train that. You can pre-train a model using the 1000 examples you already have, and then improve it with the other unlabelled examples you have. For instance, you could use a recipe like ner.make-gold to see the model's predictions and correct them by hand.
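To give you an idea, the end-to-end workflow could look roughly like the sketch below. The dataset names, file names and labels (`ssn_ner`, `annotations.jsonl`, `SSN`, `ACCOUNT_ID` etc.) are just placeholders, and the exact recipe arguments depend on your Prodigy version, so check `prodigy <recipe> --help`:

```bash
# 1) Import the 1000 manually annotated sentences into a dataset
prodigy db-in ssn_ner annotations.jsonl

# 2) Pre-train a model from those examples. Starting from a blank
#    English model here – see the note on training from scratch below.
#    (On older Prodigy versions you may need to pass a saved-out blank
#    model directory instead of the "blank:en" shortcut.)
prodigy ner.batch-train ssn_ner blank:en --output /tmp/ssn_model --label SSN,ACCOUNT_ID

# 3) Stream in the 50k unlabelled sentences, see the model's
#    predictions and correct them by hand
prodigy ner.make-gold ssn_ner_gold /tmp/ssn_model unlabelled.jsonl --label SSN,ACCOUNT_ID
```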

You probably want to train a new entity recognizer from scratch instead of updating a pre-trained model, because you'll likely see a lot of conflicts between the pre-trained entity types for numbers etc. and the types you want to add. Teaching a pre-trained model a completely new definition with so few examples is really tricky, and you'd always be fighting the existing weights.
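If you'd rather train in spaCy directly, a minimal from-scratch sketch using the spaCy 2.x API could look like this. The label names are placeholders for your two types, and the two training examples are made up – you'd plug in your 1,000 annotated sentences converted to spaCy's format:

```python
import random
import spacy

# Start from a blank English pipeline so the new entity types
# don't have to fight a pre-trained model's existing weights.
nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
ner.add_label("SSN")         # placeholder names for the two new types
ner.add_label("ACCOUNT_ID")

# Training data as (text, annotations) with character offsets.
# These two examples are invented for illustration.
TRAIN_DATA = [
    ("My SSN is 078-05-1120.", {"entities": [(10, 21, "SSN")]}),
    ("Wire it to account 12345678.", {"entities": [(19, 27, "ACCOUNT_ID")]}),
]

optimizer = nlp.begin_training()
for i in range(20):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer, drop=0.2, losses=losses)
    print("Iteration", i, losses)

nlp.to_disk("/tmp/ssn_model")
```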

You might also want to try augmenting your model with rules to improve the runtime accuracy.
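Since SSNs follow a very regular format, one option is spaCy's EntityRuler (available from spaCy 2.1). A rough sketch – the model path is a placeholder, and the token pattern assumes the default tokenizer splits something like "078-05-1120" on the hyphens, which you should verify on your own data:

```python
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load("/tmp/ssn_model")  # placeholder path to your trained model

# overwrite_ents=True lets rule matches take precedence over the
# statistical predictions for these very regular formats.
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns([
    # Match the XXX-XX-XXXX SSN format, one sub-pattern per token
    {"label": "SSN", "pattern": [
        {"SHAPE": "ddd"}, {"ORTH": "-"},
        {"SHAPE": "dd"}, {"ORTH": "-"},
        {"SHAPE": "dddd"},
    ]},
])
nlp.add_pipe(ruler)  # runs after the ner component by default

doc = nlp("Please verify 078-05-1120 before processing.")
print([(ent.text, ent.label_) for ent in doc.ents])
```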

In general, predicting spans of tokens based on the context is one of the key benefits of NER, yes :slightly_smiling_face:

Prodigy itself only ships with the annotation models – not the actual models you're training. The built-in recipes use spaCy for that, but you can also bring your own. spaCy's NER models (like many other similar implementations) are sensitive to the very local context, i.e. the surrounding words.
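As a quick illustration of that context-sensitivity, a model trained as sketched above would ideally label the same kind of number differently depending on the surrounding words (again, the model path is a placeholder):

```python
import spacy

nlp = spacy.load("/tmp/ssn_model")  # placeholder path
for text in ("My SSN is 078051120.", "Wire it to account 078051120."):
    doc = nlp(text)
    print(text, "->", [(ent.text, ent.label_) for ent in doc.ents])
```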