I am trying to annotate texts as either CIPHER or not.
I created a dataset named “cipher”, and my texts are in a CSV file, test.csv.
I ran this command:
prodigy textcat.teach cipher en_core_web_lg reviews.txt --loader csv --label CIPHER
Then I opened the annotation server and got this error:
ValueError: Error while validating stream: no first example. This likely means that your stream is empty.
That error usually means that there’s nothing to load from the file – either because there’s nothing in there, or because no example of the correct format was found (for instance, if none of the records have a text).
In your example, you’re loading a file reviews.txt with the CSV loader. Are you sure that’s correct? And did you look at the README and check whether your data has the correct format? For CSV, the text should be available in a column “text” or “Text”. For TXT, each text should be on a new line. And for JSON or JSONL, each entry should have a key "text". You can find examples of this in your PRODIGY_README.html.
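To check a CSV before passing it to --loader csv, a quick stdlib-only sketch like the one below counts how many rows would yield a usable text. It only mirrors the column-name rule described above (a column named “text” or “Text”); it is not Prodigy's actual loader, and the helper name count_text_rows is hypothetical.

```python
import csv
import io
from typing import TextIO


def count_text_rows(f: TextIO) -> int:
    """Count rows with a non-empty value in a "text" or "Text" column.

    Hypothetical helper: it only checks the column-name rule described
    above and does not reproduce Prodigy's own CSV loader.
    """
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames or []
    # Use whichever accepted column name is present, if any
    column = next((name for name in ("text", "Text") if name in fieldnames), None)
    if column is None:
        return 0  # no usable column, so the stream would be empty
    # Missing cells come back as None, so guard before stripping
    return sum(1 for row in reader if (row.get(column) or "").strip())


# A file with a "text" header yields examples; one without it yields none
good = io.StringIO("text\nhello world\nxqz vbn lkj\n")
bad = io.StringIO("sentence\nhello world\n")
print(count_text_rows(good))  # 2
print(count_text_rows(bad))   # 0
```

For a real file, open it with open("test.csv", newline="", encoding="utf-8") and pass the file object in. A result of 0 would be consistent with the “no first example” error above.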
So I have text data that I want to annotate and then classify as text or cipher. Right now it's a CSV file where each row holds a separate text entry to be annotated or classified. What format should my CSV file be in? It looks like I need columns (text, label, meta), but right now I only have the text, and I am trying to build a model that predicts the labels.
If you’re using textcat.teach, you’re already passing in the label via the command line: --label CIPHER. So your CSV really only needs to contain one column, text. Alternatively, you could convert your data to plain text (one text per line) or JSON; just make sure to adjust the --loader argument in that case.
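As a sketch of the layouts described above, the stdlib-only Python below writes a list of texts as a CSV with a single “text” header, as plain text with one text per line, and as JSONL with a “text” key per line. The helper names and sample texts are hypothetical, and collapsing line breaks inside a text for the TXT output is an assumption (otherwise one text would be split across two lines). Whichever file you produce, set --loader to match it, as noted above.

```python
import csv
import io
import json
from typing import Iterable, TextIO


def write_text_csv(texts: Iterable[str], out: TextIO) -> None:
    """Write one text per row under a single "text" header (hypothetical helper)."""
    writer = csv.writer(out)
    writer.writerow(["text"])
    for text in texts:
        writer.writerow([text])  # csv.writer quotes commas and newlines as needed


def write_text_txt(texts: Iterable[str], out: TextIO) -> None:
    """Write one text per line (hypothetical helper).

    Assumption: internal line breaks are collapsed to spaces so that
    each text stays on exactly one line.
    """
    for text in texts:
        out.write(" ".join(text.splitlines()) + "\n")


def write_text_jsonl(texts: Iterable[str], out: TextIO) -> None:
    """Write one {"text": ...} object per line (hypothetical helper)."""
    for text in texts:
        out.write(json.dumps({"text": text}) + "\n")


texts = ["This is a normal sentence.", "Gur dhvpx oebja sbk"]

csv_buf = io.StringIO()
write_text_csv(texts, csv_buf)

txt_buf = io.StringIO()
write_text_txt(texts, txt_buf)

jsonl_buf = io.StringIO()
write_text_jsonl(texts, jsonl_buf)

print(csv_buf.getvalue())
print(txt_buf.getvalue())
print(jsonl_buf.getvalue())
```

When writing a real CSV file instead of a StringIO, open it with open("test.csv", "w", newline="", encoding="utf-8") so the csv module controls the line endings. No label column is written, since textcat.teach gets the label from --label CIPHER.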