Slightly complicated NER question

Hi! First of all: Awesome tool!!

I’m trying to tackle a slightly more advanced problem now and I was hoping for some expert advice before I dove in. I’m looking to create a series of custom tags where recommendations/comparisons occur. For example, I may see sentences like:

In the scenario of [Problem], we recommend the use of [tool1] over …

Since [tool2] has [some failure mode], [tool1] is typically used for …

I would like to tag the problem, the recommended tool and the rejected tool as 3 separate tags. Since the two tools share the same dictionary, I’m assuming that I should just train one NER tag (tool) and use the dependency tools downstream to classify them. Is that a solid approach?

Also, when training multiple NER entities (e.g. Problem and Tool), is there a performance difference between introducing both simultaneously vs training them iteratively (i.e. train the first, then the second, then the third, etc.)? The tags occur at different frequencies, so they won’t be represented equally in the training data.



Glad you like it! I think your instincts about how to approach this sound good. Specifically:

  1. Yes, I would train the classifier for [tool] first, and use the same tag for both contexts. The NER model is best at learning the shape of the phrase and maybe a couple of words on either side. It’s not so good at things like relationships — that’s what the dependency parser is for. Make sure you use the terms.teach recipe first to make a terminology list; it should be very good for your problem. (See the video tutorial for details: )
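To make the division of labour concrete, here’s a sketch of the downstream step: once NER has tagged both mentions as TOOL, a couple of dependency rules can decide which is recommended and which is rejected. The token class below just simulates the attributes a spaCy parse would give you (`dep_`, `head`, entity label); the rules themselves are illustrative placeholders, not an exhaustive policy.

```python
# Sketch: classify TOOL entities as recommended vs. rejected using the
# dependency tree, after the NER model has found the spans.
# The Token class simulates a spaCy-style parse; rules are illustrative.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    text: str
    dep: str                        # dependency label
    head: Optional["Token"] = None  # syntactic head
    ent: str = ""                   # NER tag from the upstream model

def classify_tools(tokens):
    """Label each TOOL entity: the direct object of 'recommend' is
    RECOMMENDED; a tool governed by 'over' (prep -> pobj) is REJECTED."""
    labels = {}
    for tok in tokens:
        if tok.ent != "TOOL":
            continue
        if tok.dep == "dobj" and tok.head and tok.head.text == "recommend":
            labels[tok.text] = "RECOMMENDED"
        elif tok.dep == "pobj" and tok.head and tok.head.text == "over":
            labels[tok.text] = "REJECTED"
    return labels

# Hand-built parse of: "we recommend tool1 over tool2"
recommend = Token("recommend", "ROOT")
tool1 = Token("tool1", "dobj", head=recommend, ent="TOOL")
over = Token("over", "prep", head=tool1)
tool2 = Token("tool2", "pobj", head=over, ent="TOOL")
tokens = [Token("we", "nsubj", head=recommend), recommend, tool1, over, tool2]

print(classify_tools(tokens))  # {'tool1': 'RECOMMENDED', 'tool2': 'REJECTED'}
```

With a real pipeline you’d read the same attributes straight off the `Doc` object instead of building tokens by hand.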

  2. I would train the tool first. It’s more efficient for you to work on one thing at a time, and you’ll inevitably evolve your definitions of what you’re tagging over time. You can then have a more solid definition of what you’ll tag as “problem” given your experience on “tool”.

  3. In terms of performance differences, you’ll probably want to merge the datasets before running ner.batch_train on the combined data. We don’t have a tool for this yet because you might have conflicting spans, so you’ll need your own policy for resolving those.
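A minimal sketch of that merge step, assuming Prodigy-style records (`{"text": ..., "spans": [{"start", "end", "label"}]}`): combine examples by text, then flag span pairs that overlap with different labels so you can apply whatever resolution policy you settle on.

```python
# Sketch: merge two annotation datasets and flag conflicting spans.
# Record shape assumes Prodigy-style examples:
#   {"text": ..., "spans": [{"start": ..., "end": ..., "label": ...}]}

from collections import defaultdict

def merge_datasets(*datasets):
    """Pool spans from all datasets, grouping examples by their text."""
    merged = defaultdict(list)
    for dataset in datasets:
        for eg in dataset:
            merged[eg["text"]].extend(eg.get("spans", []))
    return [{"text": text, "spans": spans} for text, spans in merged.items()]

def find_conflicts(example):
    """Return pairs of spans that overlap but carry different labels —
    these are the cases that need a manual policy decision."""
    spans = sorted(example["spans"], key=lambda s: s["start"])
    conflicts = []
    for i, a in enumerate(spans):
        for b in spans[i + 1:]:
            if b["start"] < a["end"] and a["label"] != b["label"]:
                conflicts.append((a, b))
    return conflicts

tools = [{"text": "use spaCy", "spans": [{"start": 4, "end": 9, "label": "TOOL"}]}]
problems = [{"text": "use spaCy", "spans": [{"start": 0, "end": 9, "label": "PROBLEM"}]}]
merged = merge_datasets(tools, problems)
print(find_conflicts(merged[0]))  # one overlapping PROBLEM/TOOL pair
```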

  4. Try coming up with a rule-based approach using the existing dependency annotation before you dive into training the parser. If you can learn tags for a few categories of words or phrases, you might be able to come up with a nice rule set. For instance, maybe you want to classify the sentence as a “conditional recommendation” (A is better than B if you’re C), etc. There’s a bit of an art to this because you’re trying to come up with the best “good enough” approximation, based on your data.
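As a starting point for that rule set, even plain surface patterns can get you a useful “good enough” classifier before you touch the parser. The categories and cue phrases below are made-up placeholders; the point is the structure — ordered rules, most specific first.

```python
# Sketch: classify the *type* of recommendation in a sentence with a
# small ordered rule set. Categories and cue phrases are illustrative.

import re

RULES = [
    # Most specific rule first: "A is better than B if ..." reads as conditional.
    ("CONDITIONAL_RECOMMENDATION", re.compile(r"\bbetter than\b.*\bif\b")),
    ("COMPARISON", re.compile(r"\b(over|better than|rather than)\b")),
    ("FAILURE_MODE", re.compile(r"\b(fails|failure|breaks)\b")),
]

def classify_sentence(sentence):
    """Return the first matching category, or None if nothing fires."""
    lowered = sentence.lower()
    for label, pattern in RULES:
        if pattern.search(lowered):
            return label
    return None

print(classify_sentence("A is better than B if you're doing C"))
# CONDITIONAL_RECOMMENDATION
print(classify_sentence("we recommend tool1 over tool2"))
# COMPARISON
```

Once the categories stabilise, you can swap the regexes for dependency-based conditions without changing the overall shape of the rule set.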

A general tip: try to avoid making entity or dependency categorisations that can’t be localised to specific words and phrases. For instance, there are many ways of expressing a recommendation syntactically, so a single dependency pattern that covers all of them is going to be difficult to learn. The parser will struggle if you’re using the same tree-structure over very different groups of words. You can read about how the parser works here:

For optimal results in designing your dependency relation, you want to be thinking about the decisions the model will have to make to construct the tree. If you work through that post and can understand the dependency parsing algorithm, imagine you’re the parser making the decision. Give yourself two words of forward context, the stack, and the buffer — and think about how you have to choose the action. What information do you need to consider? If you actually step through this, you’ll get a better sense for what the model will and won’t be able to learn.
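The stepping-through exercise can be made tangible with a toy transition system. This is a bare-bones arc-standard sketch (not the actual parser implementation): SHIFT moves a word from the buffer to the stack, LEFT and RIGHT attach the top two stack items. In a real parser, a model chooses each action from the stack, the buffer, and a little forward context — exactly the decision the post asks you to imagine making.

```python
# Sketch: a minimal arc-standard transition system, for stepping
# through parsing decisions by hand. Actions here are given; a real
# parser predicts them from the stack, buffer, and forward context.

def parse(words, actions):
    """Apply a fixed action sequence; return (head, dependent) arcs."""
    stack, buffer, arcs = [], list(words), []
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT":    # second-from-top depends on top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT":   # top depends on second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# "we recommend tool1": we <- recommend -> tool1
actions = ["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"]
print(parse(["we", "recommend", "tool1"], actions))
# [('recommend', 'we'), ('recommend', 'tool1')]
```

Try tracing the state after each action: which words are on the stack, what’s left in the buffer, and what information you’d need at each step to pick the right action yourself.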