Hi! It's hard to give specific advice here, because the right approach really depends on your documents, the types of data you want to extract and the features you want your model to use. Those choices will also inform how you set up the annotation task in the end. If your documents are mostly plain running text, a pipeline of OCR / text extraction followed by a regular NLP model, e.g. a text classifier, may work fine. If you're working with PDFs that have more complex layouts, framing it as a computer vision task may be the better option.
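To make the "text extraction plus text classifier" idea a bit more concrete, here's a rough sketch in plain Python. Everything in it is an assumption for illustration: `extract_text` is a hypothetical stand-in for whatever OCR or PDF extraction tool you end up using, the labels and training texts are made up, and the tiny naive Bayes classifier is just a placeholder for a real NLP model.

```python
import math
from collections import Counter, defaultdict


def extract_text(document: bytes) -> str:
    """Hypothetical placeholder for the OCR / text extraction step.

    A real pipeline would call an OCR engine or PDF text extractor here;
    this stub only decodes bytes so the example runs on its own.
    """
    return document.decode("utf-8", errors="ignore")


class NaiveBayesTextClassifier:
    """Tiny multinomial naive Bayes over lowercased whitespace tokens."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus add-one smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, purely for illustration.
train_docs = [
    (b"invoice number total amount due payment", "INVOICE"),
    (b"invoice total tax amount payable", "INVOICE"),
    (b"dear sir thank you for your letter regards", "LETTER"),
    (b"dear madam kind regards sincerely", "LETTER"),
]
texts = [extract_text(doc) for doc, _ in train_docs]
labels = [label for _, label in train_docs]
classifier = NaiveBayesTextClassifier().fit(texts, labels)

prediction = classifier.predict(extract_text(b"total amount due on this invoice"))
print(prediction)  # INVOICE
```

The point of splitting it this way is that the extraction step and the model are independent: you can swap in a better OCR tool or a proper text classifier without changing the rest. It also shows why this setup only works if the extracted text alone carries enough signal. If the label depends on where something sits on the page, the text-only features above won't capture it.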
For some general discussion around working with text-based images, you might also find this thread interesting: