ner.teach does not suggest multiple tokens

I think the problem here is that none of your patterns ever match – so all you get to see are the model's suggestions, which are essentially random, because it has no concept of your label "aliases" yet. Token-based patterns describe one token per dict – so in the example I quoted above, spaCy / Prodigy will be looking for a single token whose lowercase text matches "the existing 2021 notes", which will obviously never be true, because that string consists of 4 tokens.
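You can verify this with a quick sketch that prints the tokenization – a blank English pipeline (no trained components) is enough for illustration:

```python
import spacy

nlp = spacy.blank("en")  # a blank English pipeline is enough to inspect the tokenizer
doc = nlp("the existing 2021 notes")
print([token.text for token in doc])
# ['the', 'existing', '2021', 'notes'] – four tokens, so a single
# {"lower": "the existing 2021 notes"} dict can never match
```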

Instead, you could phrase the pattern like this:

{"label": "aliases", "pattern": [{"lower": "the"}, {"lower": "existing"}, {"lower": "2021"}, {"lower": "notes"}]}

Also keep in mind that the idea of patterns is to write actual "patterns", i.e. abstract descriptions of the tokens. The pattern above will only match the exact string "the existing 2021 notes" – so unless this is a super common phrase in your data, it likely won't produce good results.

Instead, you could take advantage of the other token attributes accepted by the Matcher – for example, "is_digit": true to match tokens like "2021", but also "1999" or "10". Or "like_num": true, which would match both "10" and "ten".

{"label": "aliases", "pattern": [{"is_digit": true}, {"lower": "notes"}]}

To test your patterns interactively and see whether they match the way you expect them to, check out our interactive Matcher demo:

Finally, I'm not 100% sure the entity definition you're going for here makes sense. Named entities should be internally consistent categories of "real-world objects" or concepts, ideally even proper nouns. In your case, the patterns describe pretty long phrases and sentence fragments, and teaching the existing model that sort of definition will be really difficult.

Instead, you might want to consider focusing on improving the existing predictions of the smaller components, and then using rules or the dependency parse to resolve the rest of the phrase (if the desired result is "the existing 2021 notes"). For example, the model already has a pretty solid definition of DATE and ORDINAL numbers. So instead of trying to teach it a completely different analysis, you could work on improving these predictions and, ideally, also the parser on your specific data. You can then use the dependency parse to get the rest: "2021" attaches to "notes", which is also modified by "existing" and the article "the". This is a much better approach than framing it as a named entity recognition task.
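Here's a rough sketch of that rule-based expansion (assuming en_core_web_sm is installed – whether the model actually predicts a DATE here, and how the phrase is parsed, will depend on your data):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
doc = nlp("The company redeemed the existing 2021 notes last week.")

for ent in doc.ents:
    if ent.label_ == "DATE":
        head = ent.root.head  # e.g. "2021" attaches to "notes"
        # expand to the full phrase spanned by the head's subtree
        phrase = doc[head.left_edge.i : head.right_edge.i + 1]
        print(phrase.text)  # ideally "the existing 2021 notes"
```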

These threads go into more detail on statistical predictions vs. rules:

Also separately linking @honnibal's talk on how to define NLP problems and solve them through iteration. It shows some examples of using Prodigy and discusses approaches for framing different kinds of problems and finding out whether something is an NER task, a better fit for text classification, or a combination of statistical and rule-based systems.