Thanks for the reply. I confess I don't have as much experience as you all with the processes you describe, so it's possible I'm being naive in expecting that multiple passes over the same text would be annoying.
More specifically, the task at hand is annotating customer service chats to build a chatbot. I am building models to do (or at least attempt) NER for rather finicky things like addresses (extracted down to suburb/postcode level), as well as more straightforward things like intent (for chatbot flow) and sentiment (just useful to have).
Were the task simply +ve or -ve sentiment, I would definitely agree that doing it separately makes sense: you can set up some sweet keyboard shortcuts and you're away. In this case, though, the intent and the extracted entities are quite closely tied, so the annotator has to mentally parse those aspects together anyway. For example, if someone is asking for help on a specific booking, that tells you the intent, and you may also have to tag a booking number. If the intent is "get a quote", you're probably tagging locations as well.
(Just for completeness: I did try existing NER classifiers, including spaCy's, for the location tagging, but they weren't very consistent. It didn't help that a lot of our inputs come through uncased, which understandably seems to hurt location tagging a lot. I've had decent success with a plain CRF plus some domain-specific features.)