Help with building NER for job descriptions

Yes, exactly. The main idea is to get over the cold-start problem (where the model knows nothing): pre-train the model so it predicts *something*, use its predictions to collect better annotations, update the model with those new examples, and so on.

You might have to experiment with a few different approaches to find out what works best. Maybe it makes sense to start off by annotating a few hundred examples by hand to give the model something to learn from, and then move on to binary annotation. Maybe it works best to use match patterns to suggest candidates in context and accept/reject them. Hopefully Prodigy makes it easy to run those experiments and quickly try things out.
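For the pattern-based approach, here's a minimal sketch of what such a patterns file could look like. The file name and the terms are just placeholders – each line is a JSON object with a label and a spaCy token pattern, which recipes that take a `--patterns` argument can use to suggest candidates in context:

```python
import json

# Hypothetical example patterns – in practice you'd probably generate
# these from a terminology list or seed terms for your domain
patterns = [
    {"label": "SKILL", "pattern": [{"lower": "python"}]},
    {"label": "SKILL", "pattern": [{"lower": "machine"}, {"lower": "learning"}]},
    {"label": "SKILL", "pattern": [{"lower": "project"}, {"lower": "management"}]},
]

# Write one JSON object per line (JSONL), the format Prodigy expects
with open("skill_patterns.jsonl", "w", encoding="utf8") as f:
    for pattern in patterns:
        f.write(json.dumps(pattern) + "\n")
```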

I think you might want to generalise even one step further and label things like PROGRAMMING_LANGUAGE or PROGRAM. Predicting those basic "categories of things" based on the local context is often easier than trying to encode too much subtle information at once.
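To illustrate the idea, here's a quick sketch using spaCy's `EntityRuler` with a few hand-picked terms – just as a stand-in for what the model would eventually learn to predict from context:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# More general "categories of things" instead of encoding
# "is this a skill?" into the label scheme directly
ruler.add_patterns([
    {"label": "PROGRAMMING_LANGUAGE", "pattern": [{"lower": "python"}]},
    {"label": "PROGRAMMING_LANGUAGE", "pattern": [{"lower": "java"}]},
    {"label": "PROGRAM", "pattern": [{"lower": "excel"}]},
    {"label": "PROGRAM", "pattern": [{"lower": "photoshop"}]},
])

doc = nlp("Experience with Python and Excel is a plus.")
print([(ent.text, ent.label_) for ent in doc.ents])
# [('Python', 'PROGRAMMING_LANGUAGE'), ('Excel', 'PROGRAM')]
```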

If you've trained a model that's good at predicting things like programming languages, software/tools and other items people might put on their CVs, you can then move on to the next step and decide whether those entities are in fact skills – e.g. by looking at their position in the document.
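For example, a very simple rule-based second step could look like the sketch below. The model path, the section headings and the labels are all assumptions – you'd adjust them to whatever your data actually looks like:

```python
import spacy

# Assumes a trained NER model saved to ./skills_model and that job
# descriptions use headings like "Skills:" or "Requirements:"
nlp = spacy.load("./skills_model")

SKILL_HEADINGS = {"skills", "requirements", "qualifications"}

def extract_skills(text):
    skills = []
    in_skill_section = False
    for line in text.split("\n"):
        stripped = line.strip()
        # Treat lines ending in a colon as section headings
        if stripped.endswith(":"):
            in_skill_section = stripped.rstrip(":").lower() in SKILL_HEADINGS
            continue
        if in_skill_section:
            for ent in nlp(stripped).ents:
                if ent.label_ in ("PROGRAMMING_LANGUAGE", "PROGRAM"):
                    skills.append(ent.text)
    return skills

print(extract_skills("Requirements:\n3+ years of Python and Excel."))
```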

I discussed some ideas for an information extraction project on company reports in this thread – maybe some of these could be relevant to your project as well:

Btw, if you've browsed the forum, you might have seen this already – but if not, I'd definitely recommend checking out @honnibal's talk on solving different NLP problems and designing label schemes. The example around 11:38 is especially relevant: