Accept or reject partial pattern match under condition?

Let’s say I want to predict addresses, and I have patterns which are a subset of another pattern defined in the same patterns file. For instance:

{"label": "ADDR", "pattern": [{"shape": "ddd"}, {"is_alpha": true}, {"lower": "street"}]}
{"label": "ADDR", "pattern": [{"shape": "ddd"}, {"is_alpha": true}, {"lower": "street"}, {"lower": "nw"}]}

Both 123 Abc Street and 456 Def Street NW can be valid addresses; the latter occurs when a city uses geographic cardinal suffixes. Now suppose I receive a sentence that contains 456 Def Street NW, but only 456 Def Street is highlighted and NW is left out. Do I reject 456 Def Street and accept it at a later iteration when the full span is highlighted? Or do I ignore it instead, because the rejection might confuse the model when it comes to predicting instances like 123 Abc Street?

Simply put, whether I accept the first pattern (that is, whether it counts as a partial match) depends on the presence of the suffix (or on whether the text also matches a more detailed pattern). I also considered writing a separate patterns file for the geographic cardinals, and then combining the address and the geographic cardinal if they are adjacent. Would that be a better approach?
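
To make the "combine if adjacent" idea concrete, here is a rough sketch in plain Python. It assumes the matches from each patterns file are available as (start, end) token offsets with an exclusive end; `merge_adjacent` and the example offsets are made up for illustration and are not part of Prodigy or spaCy.

```python
def merge_adjacent(addresses, cardinals):
    """Extend each address span with a cardinal span that starts right where it ends.

    Both arguments are lists of (start, end) token offsets, end exclusive.
    """
    # Look up each cardinal span by the token index where it starts.
    cardinal_end_by_start = {start: end for start, end in cardinals}
    merged = []
    for start, end in addresses:
        # If a cardinal begins exactly at the address end, absorb it;
        # otherwise keep the address span unchanged.
        merged.append((start, cardinal_end_by_start.get(end, end)))
    return merged


tokens = ["I", "live", "at", "456", "Def", "Street", "NW"]
addresses = [(3, 6)]  # "456 Def Street"
cardinals = [(6, 7)]  # "NW"

for start, end in merge_adjacent(addresses, cardinals):
    print(" ".join(tokens[start:end]))  # prints "456 Def Street NW"
```

An address with no cardinal directly after it, such as 123 Abc Street, passes through unchanged.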

Ideally, you should always reject partial matches – especially if your goal is to train a model later on. Otherwise, you can easily end up with conflicting annotations, or with annotations that express the wrong thing. Rejecting is better than ignoring, because it allows you to define more fine-grained constraints for that particular context. Even if the partial match might be correct in a different context, in this context it's not.

If you accept a partial match on, say, "I live at 456 Def Street NW", the feedback the model gets is "In contexts like this, the analysis ['?', '?', '?', 'B-ADDR', 'I-ADDR', 'L-ADDR', '?'] is definitely correct." That's obviously bad. You'd rather give the feedback ['?', '?', '?', '?', '?', '?', '?'] and, in the next step, confirm ['?', '?', '?', 'B-ADDR', 'I-ADDR', 'I-ADDR', 'L-ADDR'] for this particular example and context. Even if for some reason you don't see a suggestion for the correct analysis, the model is still more likely to learn the right thing and end up producing the correct analysis (because you've explicitly rejected the incorrect partial one).
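
To see where those tag sequences come from, here is a small sketch that writes out the sparse tags for a single highlighted span. It assumes the B/I/L/U scheme used above, with '?' for tokens we know nothing about; `sparse_tags` is a hypothetical helper written for this post, not a Prodigy or spaCy function.

```python
def sparse_tags(n_tokens, start, end, label):
    """Tag one span (token offsets, end exclusive); every other token stays '?'."""
    if not 0 <= start < end <= n_tokens:
        raise ValueError("span must be non-empty and lie inside the sentence")
    tags = ["?"] * n_tokens
    if end - start == 1:
        # A single-token span gets the "unit" tag.
        tags[start] = f"U-{label}"
    else:
        tags[start] = f"B-{label}"      # beginning of the span
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"      # inside the span
        tags[end - 1] = f"L-{label}"    # last token of the span
    return tags


tokens = ["I", "live", "at", "456", "Def", "Street", "NW"]

partial = sparse_tags(len(tokens), 3, 6, "ADDR")  # "456 Def Street"
full = sparse_tags(len(tokens), 3, 7, "ADDR")     # "456 Def Street NW"

print(partial)  # ['?', '?', '?', 'B-ADDR', 'I-ADDR', 'L-ADDR', '?']
print(full)     # ['?', '?', '?', 'B-ADDR', 'I-ADDR', 'I-ADDR', 'L-ADDR']
```

The two sequences disagree on "Street" (L-ADDR vs. I-ADDR), so accepting the partial span would assert that the address ends before "NW" in this context.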

Btw, this thread also has more background on how the sparse annotations are interpreted:

Thanks for the explanation and the link! Will take a look at it.