I am a complete beginner in AI. I have what I think is a basic question:
What size should the raw text I pass to Prodigy be?
I am trying to create an AI that reads a PDF and then identifies the company that created that document.
Should I pass the entire PDF text as a single "raw text", or should I collect only some of the text from that PDF?
I am asking because when I run ner.manual, it treats each line of the input as one raw text, which made me wonder: is it better to pass the entire document text on a single line, or should I pass smaller pieces of text taken from the document?
The main thing that's important is that your training and runtime inputs should match. So if you're training on single paragraphs, your model should also be run on single paragraphs. As long as you have control over the preprocessing, that's usually no problem.
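
One way to keep training and runtime inputs consistent is to put the splitting logic in a single function and use it in both places. Below is a minimal sketch, assuming the PDF text has already been extracted to a plain string and that paragraphs are separated by blank lines. The function names `split_paragraphs` and `to_jsonl` and the sample text are made up for illustration; the output is one JSON object per line with a `"text"` key, which is the JSONL shape Prodigy reads for text tasks.

```python
import json
import re


def split_paragraphs(document_text):
    """Split raw document text into paragraphs on blank lines.

    Hypothetical helper: the blank-line rule is an assumption and may
    need adjusting for how your PDF extractor lays out text.
    """
    chunks = re.split(r"\n\s*\n", document_text)
    # Collapse line breaks and repeated spaces inside each paragraph,
    # then drop chunks that end up empty.
    paragraphs = [" ".join(chunk.split()) for chunk in chunks]
    return [p for p in paragraphs if p]


def to_jsonl(paragraphs):
    """Serialize paragraphs as JSONL: one {"text": ...} object per line."""
    return "\n".join(json.dumps({"text": p}) for p in paragraphs)


# Made-up sample standing in for text extracted from a PDF.
sample = (
    "Acme Corp\nAnnual Report 2020\n\n"
    "This report was prepared by Acme Corp.\n\n\n"
    "Page 1"
)

paragraphs = split_paragraphs(sample)
print(to_jsonl(paragraphs))
```

The printed lines can be saved to a `.jsonl` file and used as the annotation source. At runtime, the same `split_paragraphs` function would be applied to each new document before passing the paragraphs to the trained model, so the model sees the same kind of input it was trained on.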