I’m also interested in training NER and text classification on the same corpus so this thread is relevant to me. However, I may have a different use case than @cmtru because I want to do joint learning of these tasks.
As I’ve already described in other posts, I’m trying to do NER on documents, but the documents are tens of pages long. The context is likely too long for CNN- or LSTM-style methods to be effective, so I need a way to segment the documents into smaller pieces.
Luckily the entities I’m trying to extract appear in contexts about a paragraph in length. So if I can find the right “paragraph of interest” I can do a good job of extracting entities from it. These paragraphs of interest are themselves variable in form, so it’s a machine learning task to distinguish them from the other paragraphs in the document.
I’ve been framing this as a two stage process. First a binary text categorization model identifies the likely paragraphs of interest, then an NER model extracts the entities from those paragraphs. Both the text categorization and NER models are trained using Prodigy’s standard active learning techniques. (You and @honnibal have been helping me find a way to seed this process with phrases instead of just words.) And if I want two separate models, the reasons you give earlier in this post for training them separately make sense.
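To make the two-stage framing concrete, here's a minimal sketch of the control flow. The classifier and NER model are hypothetical stand-ins (a keyword rule and a regex) for the real trained Prodigy/spaCy models; only the hard-gate structure is the point.

```python
import re

def paragraph_of_interest_score(paragraph: str) -> float:
    """Stand-in for the binary text categorization model (stage 1)."""
    return 1.0 if "effective date" in paragraph.lower() else 0.0

def extract_entities(paragraph: str) -> list:
    """Stand-in for the NER model (stage 2): finds date-like spans."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", paragraph)

def two_stage(document: str, threshold: float = 0.5) -> list:
    entities = []
    for paragraph in document.split("\n\n"):
        # Stage 1: hard decision based on the classifier score
        if paragraph_of_interest_score(paragraph) >= threshold:
            # Stage 2: run NER only on paragraphs that pass the gate
            entities.extend(extract_entities(paragraph))
    return entities

doc = "Boilerplate.\n\nThe effective date is 2018-03-01.\n\nMore boilerplate, 2017-01-01."
print(two_stage(doc))  # ['2018-03-01'] -- the date outside the gated paragraph is never seen
```

Note that any entity in a paragraph the classifier rejects is lost for good, which is exactly the weakness that motivates the joint approach below.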
However, it seems like joint learning might be more effective. Instead of using the text classifier to make a hard decision about whether to examine a particular paragraph, it should merely contribute a probability. Likewise, the presence of the named entities I’m looking for can be a clue that the paragraph that contains them is one I care about. Basically I have two separate but related kinds of signal, and I want to combine them, both at runtime and during Prodigy’s active learning loop.
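A toy illustration of the difference between the hard decision and the soft combination, under the simplifying (and probably wrong in practice) assumption that the two signals are independent probabilities that can just be multiplied:

```python
def soft_score(p_paragraph: float, p_entity: float) -> float:
    # Both signals contribute; neither can veto the other outright.
    return p_paragraph * p_entity

def hard_score(p_paragraph: float, p_entity: float, threshold: float = 0.5) -> float:
    # The text classifier makes a hard decision first.
    return p_entity if p_paragraph >= threshold else 0.0

# A borderline paragraph (0.45) containing a very confident entity (0.95):
print(soft_score(0.45, 0.95))  # 0.4275 -- the entity stays in play
print(hard_score(0.45, 0.95))  # 0.0    -- discarded by the hard gate
```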
I don’t think Prodigy/spaCy is set up to do this kind of joint learning out of the box. Even if you have both NER and text categorization pipelines in the same model, the NER model doesn’t incorporate the textcat labels (as far as I can tell from watching @honnibal’s video tutorial about the NER model), and the text categorization model doesn’t take labeled NER spans as features. Am I correct about this?
I think if I want to do this kind of joint learning I have to write the model myself: maybe use spaCy to extract features and then write my own CNN or LSTM in Keras that does NER with an additional paragraph-of-interest feature, or maybe find a way to reframe paragraph detection as an attention mechanism. This seems doable, and since spaCy/Prodigy has a pluggable architecture I could incorporate a custom model, but it's still a lot of work, so I'm wondering whether there's an easier way to accomplish the task already built into these tools. (For instance, if I just attached a contained-in-a-paragraph-of-interest probability to each token as a feature, would that fold the segmentation signal into an NER model? Or is this a job for a multitask objective?)
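The feature-attachment idea in that parenthetical might look something like the following sketch: concatenate the paragraph-of-interest probability onto every token's embedding before it reaches the NER layers. Plain Python lists stand in for real embedding matrices here; the shapes, not the values, are the point.

```python
def augment_tokens(token_vectors: list, paragraph_prob: float) -> list:
    """Append the paragraph-of-interest score as one extra dimension per token.

    Every token in the same paragraph shares the same extra feature, so the
    downstream NER model can learn to weight its predictions by how likely
    the surrounding paragraph is to be one we care about.
    """
    return [vec + [paragraph_prob] for vec in token_vectors]

tokens = [[0.1, 0.2], [0.3, 0.4]]        # two tokens, 2-dim embeddings
augmented = augment_tokens(tokens, 0.9)  # paragraph scored 0.9 by the classifier
print(augmented)  # [[0.1, 0.2, 0.9], [0.3, 0.4, 0.9]]
```

This keeps the two models separate at training time (the classifier is frozen from the NER model's point of view), so it's a cheaper approximation of joint learning rather than the real thing.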
Do I have to roll my own joint learning system, or is this capability already built into spaCy/Prodigy in a way that I’m just overlooking?