@wpm If you segment the document into paragraphs, you can run the NER over all the paragraphs though, right?
I think it makes good sense to use the text classifier during training to find paragraphs with a high enough density of entities to make your annotation effort productive. But at runtime, where you just want the tool to extract entities, you may as well run it over the whole text.
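Roughly, the split looks something like this (an illustrative sketch only: it assumes a pipeline that already has both a textcat and an NER component, and the `ENTITY_DENSE` label and blank-line paragraph splitting are just placeholders):

```python
import spacy

# Hypothetical pipeline with both a textcat and an NER component.
nlp = spacy.load("./my_pipeline")

def paragraphs_worth_annotating(text, threshold=0.5):
    """Training time: keep only paragraphs the classifier scores as
    dense enough in entities to be worth annotating."""
    for para in text.split("\n\n"):
        doc = nlp(para)
        if doc.cats.get("ENTITY_DENSE", 0.0) >= threshold:
            yield para

def extract_entities(text):
    """Runtime: skip the filtering and just run NER over the whole text."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```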
As far as joint learning goes, there are a few ways you could do this. One solution would be to share the CNN layer between the NER component and the text classifier. This may or may not help: sharing the weights between the POS tagger and parser does help a little, but the disadvantage is that you have to train the two components together, which is a pain.
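Schematically, the shared-layer idea looks something like this. This is a generic PyTorch sketch of one encoder feeding two heads, not spaCy's actual architecture, and all the names and sizes are made up:

```python
import torch
import torch.nn as nn

class SharedEncoderModel(nn.Module):
    """Illustrative only: one shared CNN encoder feeding both a
    per-token NER head and a whole-document textcat head."""
    def __init__(self, vocab_size, width, n_ner_tags, n_text_labels):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, width)
        self.encode = nn.Conv1d(width, width, kernel_size=3, padding=1)
        self.ner_head = nn.Linear(width, n_ner_tags)        # per-token tag scores
        self.textcat_head = nn.Linear(width, n_text_labels)  # per-document label scores

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        vectors = self.embed(token_ids).transpose(1, 2)              # (batch, width, seq)
        hidden = torch.relu(self.encode(vectors)).transpose(1, 2)    # (batch, seq, width)
        ner_logits = self.ner_head(hidden)                           # NER scores per token
        doc_vector = hidden.mean(dim=1)                              # pool over tokens
        textcat_logits = self.textcat_head(doc_vector)               # one score vector per doc
        return ner_logits, textcat_logits

# "Train the two together" means optimising a combined objective, e.g.:
# loss = ner_loss(ner_logits, ner_gold) + textcat_loss(textcat_logits, cat_gold)
```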
Another way to do joint NER and textcat would be to condition the NER labels on the type label applied to the text. For instance, you might jointly learn role labels for movie reviews with a scheme like `POSITIVE_ACTOR` and `NEGATIVE_ACTOR`.
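Building the training data for that scheme is just a matter of composing the document-level label with the entity role. Something along these lines, with a made-up data format:

```python
def compose_entity_labels(doc_label, entities):
    """Combine the document-level label (e.g. "POSITIVE"/"NEGATIVE" for a
    review) with each entity's role (e.g. "ACTOR") into a single NER label,
    so the NER model learns labels like POSITIVE_ACTOR directly."""
    return [
        (start, end, f"{doc_label}_{role}")
        for (start, end, role) in entities
    ]

# A positive review mentioning an actor in characters 10-22:
compose_entity_labels("POSITIVE", [(10, 22, "ACTOR")])
# -> [(10, 22, "POSITIVE_ACTOR")]
```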
While it’s not a joint strategy, a cheap way of including text classification labels as features would be to add the label as a token in the sentence (likely the first token). I doubt this would be very effective, though.
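For completeness, that hack is just:

```python
def add_label_token(words, label):
    """Cheap hack: prepend the text classification label as an extra
    "token" so the NER model can see it as a feature. Illustrative only,
    and as noted above, probably not very effective."""
    return [f"__{label}__"] + words

add_label_token(["Great", "performance", "by", "Tom", "Hanks"], "POSITIVE")
# -> ["__POSITIVE__", "Great", "performance", "by", "Tom", "Hanks"]
```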