Document length and annotation time

I tried to load a sample of texts from a CSV file (with a limited amount of metadata) in a Jupyter Notebook and annotate them with the following command:

! prodigy ner.manual my_set blank:en ./random_directives.csv --label ENTITY

The documents vary in length, and some of them are thousands of words long. Unfortunately, after just 10 to 15 documents Prodigy starts to slow down and even freezes briefly. Is the issue caused by the excessive document length? How can I solve the problem, besides shortening the documents and cleaning them as much as possible?

Prodigy doesn't require any data to be uploaded up front: when you start the server, the data is streamed in and then saved to the database as annotations come back.

A few thousand words shouldn't be a problem in terms of size; after all, it's just JSON being sent across a REST API. Still, you might want to set the PRODIGY_LOGGING=basic environment variable to see more logging output. It may give you some clues about what's taking so long.
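If you're starting Prodigy from a notebook, one way to enable logging for just that run is to launch the recipe as a subprocess with the variable added to a copy of the environment. This is a minimal sketch, assuming the `prodigy` package is installed in the same environment as the notebook kernel (so `python -m prodigy` works); the helper names `build_prodigy_call` and `run_prodigy` are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def build_prodigy_call(dataset: str, source: str, label: str,
                       logging_level: str = "basic") -> Tuple[List[str], Dict[str, str]]:
    """Return the command and environment for a ner.manual session with logging on."""
    # Use the kernel's own interpreter so the same Prodigy install is picked up.
    cmd = [sys.executable, "-m", "prodigy", "ner.manual", dataset, "blank:en",
           source, "--label", label]
    # Copy the current environment (PATH etc.) and add the logging variable.
    env = dict(os.environ, PRODIGY_LOGGING=logging_level)
    return cmd, env


def run_prodigy(cmd: List[str], env: Dict[str, str]) -> subprocess.CompletedProcess:
    """Start the Prodigy server; blocks until the server is stopped."""
    return subprocess.run(cmd, env=env, check=True)
```

Calling `run_prodigy(*build_prodigy_call("my_set", "./random_directives.csv", "ENTITY"))` starts the server with logging enabled. From a plain shell, prefixing the command with `PRODIGY_LOGGING=basic` has the same effect.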

That said, are you sure you want to annotate examples that are thousands of words long? It makes annotation more difficult, because your annotators have to read everything before they can submit a single answer, and it takes longer to collect individual data points. There's also no real advantage to annotating very long documents for NER, because your model's context window will always be much smaller. Also see here for background:
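If you want to keep the annotation units short, one option is to split each document into paragraph-sized chunks before starting the server. Below is a minimal sketch using only the Python standard library. It assumes the CSV has a column named `text` and that paragraphs are separated by blank lines; the function names, the 200-word cap and any output filename are hypothetical choices. Each chunk becomes one task in Prodigy's JSONL input format (`{"text": ..., "meta": {...}}`). The remaining CSV columns, plus the source row and chunk index, are kept in `meta` so annotations can be mapped back to the original document.

```python
import csv
import json
import re
from typing import Iterator

# A blank line (possibly containing whitespace such as "\r") marks a paragraph break.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def split_into_chunks(text: str, max_words: int = 200) -> Iterator[str]:
    """Yield paragraph-sized chunks, capping each one at `max_words` words.

    Words are re-joined with single spaces, so line breaks inside a
    paragraph are not preserved.
    """
    for paragraph in PARAGRAPH_BREAK.split(text):
        words = paragraph.split()
        for start in range(0, len(words), max_words):
            yield " ".join(words[start:start + max_words])


def csv_to_chunked_jsonl(csv_path: str, jsonl_path: str, max_words: int = 200) -> int:
    """Write one JSONL task per chunk and return how many tasks were written.

    Assumes the CSV has a `text` column; all other columns are copied into
    `meta`, together with the source row index and the chunk index.
    """
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as f_in, \
            open(jsonl_path, "w", encoding="utf-8") as f_out:
        for row_id, row in enumerate(csv.DictReader(f_in)):
            meta = {key: value for key, value in row.items() if key != "text"}
            for chunk_id, chunk in enumerate(split_into_chunks(row["text"], max_words)):
                task = {"text": chunk, "meta": {**meta, "row": row_id, "chunk": chunk_id}}
                f_out.write(json.dumps(task, ensure_ascii=False) + "\n")
                written += 1
    return written
```

For example, `csv_to_chunked_jsonl("./random_directives.csv", "./random_directives_chunks.jsonl")` writes the chunked file, which you can then pass to `ner.manual` in place of the CSV. Capping chunks at a fixed word count can cut through an entity at a chunk boundary; splitting on sentences (for example with spaCy's sentence segmentation) avoids that, at the cost of an extra dependency.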