Cool, I’ll take care of the preprocessing. However, should I include a whole earnings report in each row of the JSONL? I’ve just noticed that you recommend giving it only short phrases, so I’m not sure how to chunk my earnings reports into a JSONL training dataset. The reports also include some markup tables, and I suppose I should handle these without spaCy. I imagine a pipeline where I first do some “document layout analysis”, then send part of the document to spaCy and the rest to a table parser.
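
Here is a rough sketch of the routing I have in mind, so you can tell me if I'm off track. It assumes the tables are pipe-delimited (Markdown-style) and that blocks are separated by blank lines, which may not match the real reports. All function names and the record fields (`type`, `text`, `rows`) are made up for illustration, and the regex sentence splitter is only a stand-in for where spaCy's sentence segmentation would go.

```python
import json
import re


def split_blocks(report: str) -> list[str]:
    """Crude 'layout analysis': treat blank-line-separated chunks as blocks."""
    return [b.strip() for b in re.split(r"\n\s*\n", report) if b.strip()]


def is_table(block: str) -> bool:
    """Assumption: a table is 2+ lines that all start with a pipe character."""
    lines = block.splitlines()
    return len(lines) >= 2 and all(line.lstrip().startswith("|") for line in lines)


def parse_table(block: str) -> list[list[str]]:
    """Turn a pipe-delimited table into rows of cell strings."""
    rows = []
    for line in block.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:---:|
        if all(re.fullmatch(r":?-+:?", c) for c in cells):
            continue
        rows.append(cells)
    return rows


def split_sentences(text: str) -> list[str]:
    """Naive placeholder; spaCy's sentence segmentation would replace this."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def report_to_records(report: str) -> list[dict]:
    """Route prose blocks to sentence-level records and tables to the parser."""
    records = []
    for block in split_blocks(report):
        if is_table(block):
            records.append({"type": "table", "rows": parse_table(block)})
        else:
            for sentence in split_sentences(block):
                records.append({"type": "text", "text": sentence})
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)
```

With this, each report would become many short JSONL rows (one per sentence) rather than one giant row, and the tables would be kept out of the text rows entirely.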