Btw, right now I am detecting periods like Q1 2011, Jan-Jun, and June - September 2013 as DATE entities. I then want to apply a classifier to determine whether the period is a QUARTER, HALFYEAR, YEAR, etc.
My question is, should I:
preprocess the entity span before applying the classifier, e.g. change June - September 2013 into June - September, since I do not care about the absolute periods? Or can I rely on the classifier to figure that out?
or should I improve my NER model by creating a new PERIOD entity label that would exclude anything year-specific? I suppose I’d have to retrain DATE as well, otherwise I would end up with a lot of overlapping spans.
Also, should I have the classifier as a standalone model, i.e. not part of the other spaCy model?
Are you sure you want a classifier to do things like QUARTER, HALFYEAR etc.? The model’s going to have to learn that on a substring basis; it can’t do the maths itself. I think you’ll be better off resolving the dates and implementing the period logic yourself.
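To make "implementing the period logic yourself" a bit more concrete, here is a minimal sketch in plain Python that takes the text of a DATE span and derives the label from the number of months it covers. Everything in it is an assumption for illustration: the `classify_period` name, the regular expressions, the `OTHER` fallback label, and the decision to treat any three-month span as a QUARTER (not only calendar quarters). It ignores the year entirely, so no preprocessing of the span is needed.

```python
import re

# Month lookup keyed on the first three letters, so "Jun" and "June" both work.
MONTH_ABBR = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]
MONTH_INDEX = {abbr: i for i, abbr in enumerate(MONTH_ABBR, start=1)}

# Assumed mapping from span length in months to a period label.
LENGTH_TO_LABEL = {3: "QUARTER", 6: "HALFYEAR", 12: "YEAR"}

QUARTER_SHORTHAND = re.compile(r"\bq[1-4]\b")   # e.g. "Q1 2011"
HALFYEAR_SHORTHAND = re.compile(r"\bh[12]\b")   # e.g. "H2 2012"
YEAR_ONLY = re.compile(r"\s*\d{4}\s*")          # e.g. "2013"
# "<month> - <month>" or "<month> to <month>"; a trailing year is ignored.
MONTH_RANGE = re.compile(
    r"\b([a-z]{3})[a-z]*\.?\s*(?:[-\u2013]|\bto\b)\s*([a-z]{3})[a-z]*"
)


def classify_period(text):
    """Return QUARTER, HALFYEAR, YEAR or OTHER for a DATE span's text."""
    s = text.lower()
    if QUARTER_SHORTHAND.search(s):
        return "QUARTER"
    if HALFYEAR_SHORTHAND.search(s):
        return "HALFYEAR"
    if YEAR_ONLY.fullmatch(s):
        return "YEAR"
    match = MONTH_RANGE.search(s)
    if match:
        start = MONTH_INDEX.get(match.group(1))
        end = MONTH_INDEX.get(match.group(2))
        if start is not None and end is not None:
            # Inclusive month count; the modulo handles ranges that wrap
            # around the year boundary, such as "Nov - Jan".
            months = (end - start) % 12 + 1
            return LENGTH_TO_LABEL.get(months, "OTHER")
    return "OTHER"


if __name__ == "__main__":
    for example in ["Q1 2011", "Jan-Jun", "June - September 2013"]:
        print(example, "->", classify_period(example))
```

In a spaCy pipeline you could call something like this on `ent.text` for each entity in `doc.ents` whose `label_` is `DATE`. If the real data contains formats the patterns above miss, those will fall through to `OTHER`, which at least makes the gaps easy to spot.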