Yes, exactly. The main idea here is that you want to get over the cold-start problem (where the model knows nothing): pre-train it so it predicts *something*, then use the existing model's predictions to collect better annotations, update the improved model with more examples, and so on.
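To make that a bit more concrete, here's a rough sketch of what one iteration of that loop could look like in plain spaCy (v3) code – the `SKILL` label, the single seed example and the training setup are just placeholders, not the "right" way to do it:

```python
# Minimal bootstrapping sketch: train on a tiny hand-annotated seed set, then
# use the half-trained model to pre-annotate new texts for correction.
# The SKILL label and the seed example are placeholders for illustration.
import random
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("SKILL")

# Seed data: (text, {"entities": [(start_char, end_char, label)]})
seed_data = [
    ("Experienced in Python and Docker.", {"entities": [(15, 21, "SKILL"), (26, 32, "SKILL")]}),
]

optimizer = nlp.initialize()
for epoch in range(10):
    random.shuffle(seed_data)
    for text, annotations in seed_data:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)

# The rough model can now pre-annotate new texts, which are usually much
# faster to correct than to annotate from scratch.
doc = nlp("Built services in Go and deployed them with Kubernetes.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

In practice you'd obviously want a few hundred examples and a proper train/eval split – the point is just that even a rough model gives you predictions you can correct instead of annotating everything from scratch.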
You might have to experiment with a few different approaches to find out what works best. Maybe it makes sense to start off by annotating a few hundred examples by hand to give the model something to learn from, and then move on to binary annotation. Maybe it works best to use match patterns to suggest candidates in context and accept/reject them. Hopefully Prodigy makes it easy to run those experiments and quickly try things out.
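For the match-pattern route, the patterns are a JSONL file of token-based match patterns (the same format spaCy's `Matcher` uses). Here's a quick sketch of generating one from a seed list – the terms and the `SKILL` label are only examples:

```python
# Write a patterns.jsonl file from a seed list of terms. Each line is a
# token-based match pattern; the "SKILL" label and the terms are placeholders.
import json

seed_terms = ["python", "javascript", "machine learning", "docker"]

with open("patterns.jsonl", "w", encoding="utf8") as f:
    for term in seed_terms:
        # One token description per whitespace-separated token, matched on the
        # lowercase form so "Python", "python" etc. are all suggested.
        pattern = {"label": "SKILL", "pattern": [{"lower": tok} for tok in term.split()]}
        f.write(json.dumps(pattern) + "\n")
```

You can then pass a file like that to a recipe such as `ner.teach` via `--patterns` to get candidates suggested in context as you annotate.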
I think you might want to generalise even one step further and label things like `PROGRAMMING_LANGUAGE` or `PROGRAM`. Predicting those basic "categories of things" based on the local context is often easier than trying to encode too much subtle information at once.
If you've trained a model that's good at predicting things like programming languages, software/tools and other things people might put on their CVs, you can then move on to the next step and decide whether those entities are in fact skills – e.g. by looking at their position in the document.
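Just to illustrate what that second step could look like: a very naive rule that only keeps entities predicted after a skills-like heading. The headings and label names here are assumptions for the sketch, and the section detection is deliberately simplistic:

```python
# A deliberately naive second step: the statistical model predicts generic
# entities (PROGRAMMING_LANGUAGE, PROGRAM), and a rule decides whether they
# count as skills based on where they occur in the document.
SKILL_HEADINGS = ("skills", "technical skills", "technologies")

def skill_entities(doc):
    """Keep entities that appear after a skills-like section heading."""
    text_lower = doc.text.lower()
    starts = [text_lower.find(h) for h in SKILL_HEADINGS if h in text_lower]
    if not starts:
        return []
    section_start = min(starts)
    return [
        ent
        for ent in doc.ents
        if ent.label_ in ("PROGRAMMING_LANGUAGE", "PROGRAM")
        and ent.start_char >= section_start
    ]

# Usage with a trained pipeline, e.g.:
# nlp = spacy.load("./my-cv-model")
# doc = nlp(cv_text)
# print(skill_entities(doc))
```

In a real project you'd probably want something smarter than string matching on headings, but even simple rules like this can get you surprisingly far once the entity predictions are reliable.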
I discussed some ideas for an information extraction project on company reports in this thread – maybe some of these could be relevant to your project as well:
Btw, if you've browsed the forum, you might have seen this already – but if not, I'd definitely recommend checking out @honnibal's talk on solving different NLP problems and designing label schemes. The example around 11:38 is especially relevant: