I’m open to hacky suggestions, thank you. I don’t think I can train a good model without using additional features.
My overall challenges are outlined here. For training, I am considering chunking each HTML document and writing every chunk as one JSON line. For the final model, though, I will keep each whole document in a single Doc. Does that sound reasonable?
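
To make the chunking idea concrete, here is a rough sketch of what it could look like, using only the Python standard library. Everything in it is an assumption for illustration: the helper names (TextExtractor, html_to_chunks, to_jsonl), the record fields (doc_id, chunk, text), and the fixed word-count chunk size are all hypothetical, and the real pipeline may well split on something more meaningful, such as paragraphs or sections.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_chunks(html, max_words=50):
    """Strip markup and split the text into chunks of at most max_words words.

    The fixed word count is an arbitrary, hypothetical choice.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.parts).split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def to_jsonl(doc_id, html, max_words=50):
    """Return one JSON line per chunk; the field names are hypothetical."""
    lines = []
    for idx, chunk in enumerate(html_to_chunks(html, max_words)):
        record = {"doc_id": doc_id, "chunk": idx, "text": chunk}
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

Keeping doc_id and the chunk index in each record means the chunks can be traced back to, or regrouped into, their source document later, which may help when the final model works on whole documents instead.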