Parsing/Identifying sections in job descriptions

Hi! I remember answering some questions about a job description project a while ago, so maybe this thread might be useful, too:

By definition, a named entity is usually a "real world object" – like a person, an organisation, a product or another distinct name like that. This is also what the underlying statistical model is optimised for. If your goal is to label longer phrases or even paragraphs, this is not typically an end-to-end problem for named entity recognition.

Instead, your task might be a better fit for a combined approach: use the entity recognizer to label more generic entity types, the dependency parse to select the whole phrase you're interested in, and maybe the text classifier to assign top-level topics to the extracted paragraphs.
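To make the first two steps of that combined approach a bit more concrete, here's a rough sketch in spaCy (v3). It only covers the "generic entity, then expand via the parse" part, not the text classifier. The `SKILL` label, the example sentence and the `expand_to_phrase` helper are all made up for illustration, and the parse and entity are set by hand so the snippet runs without downloading a model. In a real project they'd come from your trained pipeline, and the expansion rule would need tuning to your data.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-annotated example standing in for the output of a trained pipeline.
# Heads are absolute token indices; "SKILL" is a hypothetical generic label.
words = ["We", "need", "experience", "with", "Python", "in", "production", "."]
heads = [1, 1, 1, 2, 3, 2, 5, 1]
deps = ["nsubj", "ROOT", "dobj", "prep", "pobj", "prep", "pobj", "punct"]
ents = ["O", "O", "O", "O", "B-SKILL", "O", "O", "O"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps, ents=ents)


def expand_to_phrase(ent):
    """Hypothetical helper: climb out of prepositional attachments, then
    return the full subtree of the token we end up on as a Span."""
    token = ent.root
    # Walk up while we're inside a prepositional phrase (and not at the root).
    while token.dep_ in ("pobj", "prep") and token.head.i != token.i:
        token = token.head
    # left_edge / right_edge are the outermost tokens of the subtree.
    return token.doc[token.left_edge.i : token.right_edge.i + 1]


# Pair each generic entity label with the longer phrase around it.
phrases = [(ent.label_, expand_to_phrase(ent).text) for ent in doc.ents]
print(phrases)
```

The idea is that the entity recognizer only has to find the short, well-defined mention ("Python"), and the parse does the work of recovering the longer phrase around it.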

I explain this idea in more detail in this thread, which should be pretty relevant to your use case as well:

Yes, that's one workflow we've built in. You can also use Prodigy to label your data 100% from scratch and without a model in the loop. The data can be exported in a straightforward JSONL file, so you don't have to use spaCy and can use the annotations with any other tool or process.
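As a small illustration of using the exported annotations outside of spaCy, here's a minimal sketch that reads JSONL with nothing but the standard library. It assumes records shaped like a typical NER export, with `text`, `spans` (each with `start`, `end`, `label`) and `answer` fields. The two sample records and the `load_accepted` helper are invented for the example, so check the field names against your own export.

```python
import json

# Two made-up records in the assumed export shape (one JSON object per line).
SAMPLE_JSONL = "\n".join([
    json.dumps({
        "text": "Experience with Python required",
        "spans": [{"start": 16, "end": 22, "label": "SKILL"}],
        "answer": "accept",
    }),
    json.dumps({"text": "Free snacks", "spans": [], "answer": "reject"}),
])


def load_accepted(lines):
    """Yield (text, [(start, end, label), ...]) for accepted records only."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue  # skip rejected or ignored examples
        spans = [(s["start"], s["end"], s["label"])
                 for s in record.get("spans", [])]
        yield record["text"], spans


# In practice you'd pass an open file handle instead of the sample string.
examples = list(load_accepted(SAMPLE_JSONL.splitlines()))
print(examples)
```

Plain `(text, spans)` tuples with character offsets like this are easy to convert into whatever format another tool or training process expects.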

In general, Prodigy's philosophy is to make annotation faster and more efficient by breaking larger tasks down into smaller decisions and automating/scripting as much as possible. Another thing we advocate for is running smaller experiments and iterating on your data. NLP is pretty experimental so you just need to try lots of stuff. For example, you might want to try out different labelling strategies to see which one is most promising. Can you teach the named entity recognizer your new entity definition, or does it make more sense to frame it as a text classification task? Do fine-grained or more generic categories work better? Is the data suitable and does the model improve if you add more data from source X? If you can try these things quickly by labelling a few examples and running a few experiments, you'll ideally spend less time shooting in the dark and be able to focus on the most promising solution sooner.

Btw, if you haven't seen it yet, you might also want to check out our prodigy-recipes repo, which shows some examples of how you can script Prodigy to build different annotation workflows and do your own automation:

That said, if you do find that Prodigy just isn't the right tool for you, we're happy to issue a refund. It's a pretty specific developer tool for a pretty specific use case, and we believe in keeping the scope focused. So Prodigy can't be the right tool for everything and everyone – and that's okay :slightly_smiling_face: