prohibit_action() is experimental, but it should work. What happened when you tried it?
I think NER is probably not a great fit for the type of task you’re doing. The NER model reads the sentence left-to-right, so if the context that disambiguates whether the entity is a start or end date occurs after the entity, the model will have no chance to get it right.
I think you should separate the task into two processes: one to recognise the dates, and another to recognise whether it’s a start date or an end date. If you have a lot of dates which are neither, you probably want to have a sentence classification model that tells you whether the sentence is relevant or not.
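To make the split concrete, here's a minimal sketch of the two-step approach using only the standard library. The date pattern and trigger words are simplified, hypothetical examples, not a complete solution for any real domain:

```python
import re

# Step 1: a crude rule for recognising date spans, e.g. "1 March 2021".
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|"
    r"July|August|September|October|November|December) \d{4}\b"
)

# Step 2: hypothetical trigger words for deciding the date's role.
START_TRIGGERS = {"from", "starting", "begins", "effective"}
END_TRIGGERS = {"until", "to", "ending", "expires"}

def extract_dates(text):
    """Recognise date spans, then assign a role by checking the words
    immediately before each span (a deliberately crude rule)."""
    results = []
    for match in DATE_RE.finditer(text):
        before = text[:match.start()].lower().split()[-2:]
        if any(w in START_TRIGGERS for w in before):
            role = "START"
        elif any(w in END_TRIGGERS for w in before):
            role = "END"
        else:
            role = "OTHER"
        results.append((match.group(), role))
    return results

print(extract_dates("The contract runs from 1 March 2021 until 28 February 2022."))
```

The point of the structure is that the two rules can be improved (or replaced with models) independently: a better date recogniser doesn't force you to retrain the role classifier, and vice versa.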
How the date recognition and role classification components should work together is an open question as well. You should probably start by just doing ner.manual annotation for a small evaluation set. This way you can get some experience with what's common in the text, and at the end of it you'll have a pilot evaluation set. Probably the most efficient next step would be to write matcher patterns and evaluate them on date recognition against your manual annotations. You might also find it useful to develop rules for whether each date is a start or end date.
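The evaluation step can be very simple. Here's a sketch of scoring rule output against a small manually annotated set, where gold and predicted spans are (start, end) character offsets and only exact matches count (the example offsets are made up):

```python
def precision_recall(gold_spans, pred_spans):
    """Exact-match precision and recall over span offsets."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)  # spans the rules got exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = [(18, 30), (37, 53)]  # from your ner.manual annotations
pred = [(18, 30), (60, 72)]  # from your matcher patterns
print(precision_recall(gold, pred))  # → (0.5, 0.5)
```

Exact-match scoring is strict (an off-by-one boundary counts as a miss), but it's a fine starting point for telling whether a change to your patterns helped or hurt.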
When considering whether to create a rule-based solution or a machine learning-based one, it's worth imagining the different effort-vs-efficacy curves the two approaches might have on your problem. It's usually the case that a machine-learned solution will eventually overtake a rule-based one, and on some problems, the machine learning solution pretty much dominates. For instance, if you're classifying news articles by topic and spend five minutes annotating, you'll almost certainly do better than spending five minutes making up rules. But on many problems, this isn't the case. For date recognition within a specific domain, you'll probably do better with rules until you have quite a lot of annotations. If the texts are well edited, the rules might actually be perfect, as you can exactly reverse-engineer the rules which went into generating the text.
Even if a machine learning solution will eventually take over, a rule-based system that works well early on is still worth creating as a bootstrapping step. You can use the rule-based system to help you annotate, making it much quicker to get to the point where the machine learning solution can take over. The key to doing this effectively is to switch between making evaluation data and doing the system work (whether that's on the training data, the rules or hyper-parameter selection). This way you can know whether you're moving in the right direction, figure out what to do next, and decide when to switch tactics.
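The bootstrapping idea can be sketched in a few lines: pre-fill each example with the rule system's spans, so the annotator only corrects mistakes instead of labelling from scratch. Here `rule_based_dates` stands in for any function returning (start, end, label) spans, and the task dicts loosely mirror the shape annotation tools typically consume (the exact format is an assumption):

```python
def prefill(texts, rule_based_dates):
    """Turn raw texts into annotation tasks pre-filled with rule output."""
    tasks = []
    for text in texts:
        spans = [{"start": s, "end": e, "label": label}
                 for s, e, label in rule_based_dates(text)]
        tasks.append({"text": text, "spans": spans})
    return tasks

# Usage with a dummy rule function that "finds" one hard-coded span:
tasks = prefill(["runs until 3 May 2020"], lambda t: [(11, 21, "DATE")])
print(tasks)
```

Correcting pre-filled suggestions is usually much faster than annotating cold, and as a bonus, the corrections tell you exactly where your rules are failing.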