If you know that all your data is tabular, and you just want to annotate the text contents, you should probably develop a custom recipe. I also wouldn’t try to train a model using the standard components like the text classifier or the named entity recognizer: your data isn’t primarily textual, so you’ll probably be better off using other approaches.
If you’ve got a lot of these tables to annotate, you probably want to check just how many total field names you have. I expect you can probably assume that if you’ve mapped “Operating income (loss)” to “EDIT” once, that’s always going to be the name. I’m sure you’re not going to hit one exceptional table where “Operating income (loss)” instead needs to be annotated differently.
Even if you have hundreds of thousands of these tables to annotate, if you extract the unique text field names, you might find you only have around 3000 of them. If so, you probably want to do context-independent annotations: label each unique field name once, and just hit “ignore” if you do hit a case where it’s unclear.
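To make that concrete, here’s a rough sketch of what I mean by extracting the unique field names and then applying your annotations without looking at context. It assumes each table is a list of rows with the field name in the first cell, which may not match your data, and the function names (`normalize`, `unique_field_names`, `apply_labels`) are just made up for illustration.

```python
from collections import Counter


def normalize(name):
    # Collapse whitespace and ignore case so trivial variants count as one name.
    return " ".join(name.split()).casefold()


def unique_field_names(tables):
    # Assumption: a table is a list of rows, and row[0] holds the field name.
    counts = Counter()
    for table in tables:
        for row in table:
            counts[normalize(row[0])] += 1
    return counts


def apply_labels(tables, label_for):
    # label_for maps a normalized field name to the label you annotated once.
    # Names you ignored (or haven't seen yet) come back with a label of None.
    for table in tables:
        yield [(row[0], label_for.get(normalize(row[0]))) for row in table]
```

You’d annotate the keys of `unique_field_names(tables)` once, most frequent first, and then run `apply_labels` over the full set of tables.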
Finally, I wouldn’t worry about stuff like extracting SEK as currency and billions as the denomination in the annotations. You should just have a rule-based process for that.
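A rule-based process here can be as simple as a couple of regular expressions over the table header. This is only a sketch: the currency list, the scale words, and the idea that the units show up in a header string like “SEK billions” are all assumptions on my part, so adjust them to whatever your tables actually look like.

```python
import re

# Assumption: a small closed set of currency codes and scale words is enough.
CURRENCIES = ("SEK", "USD", "EUR")
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}

CURRENCY_RE = re.compile(r"\b(" + "|".join(CURRENCIES) + r")\b")
SCALE_RE = re.compile(r"\b(thousand|million|billion)s?\b", re.IGNORECASE)


def parse_units(header):
    # Returns (currency, multiplier). The currency is None if no known code is
    # found, and the multiplier defaults to 1 if no scale word is found.
    currency = CURRENCY_RE.search(header)
    scale = SCALE_RE.search(header)
    return (
        currency.group(1) if currency else None,
        SCALES[scale.group(1).lower()] if scale else 1,
    )
```

For example, `parse_units("SEK billions")` gives `("SEK", 1000000000)`, so you can keep the currency and denomination out of the annotation scheme entirely.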