When I run the above command in Prodigy, it opens the application in a web browser. Every time I refresh the browser, it skips 10 entries in the text file: on repeated refreshes I get the 1st entry, then the 11th, then the 21st, and so on.
Another problem: when I have annotated samples stored in the database and have to restart the service, Prodigy steps through all the samples in the JSONL file again, even those that were previously annotated.
Yes, this is currently expected: on each load, the app makes a request to the server and asks for the next batch (by default, the batch size is 10). Annotated tasks are sent back to the server periodically, so when a new batch is requested, Prodigy can't yet know whether a question that was previously sent out has already been annotated. (Annotating *all* sentences/examples is also a fairly specific goal that only applies to some use cases and data streams.)
If it's important to you that all sentences are annotated, and you do want to handle cases where the annotator refreshes their browser, you ideally want to reconcile the questions/answers at the end of a session and compare the `_task_hash` to find examples in your data that you don't have an answer for in the dataset. You can do this either in a custom recipe within the stream generator, or as a separate session that you run after the previous one has finished.
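The reconciliation step can be sketched roughly like this. The helper name `unanswered` is illustrative; in a real recipe the annotated examples would come from Prodigy's database (e.g. via `prodigy.components.db.connect()`), and the stream examples would already carry a `_task_hash` assigned by Prodigy's hashing:

```python
def unanswered(stream, annotated):
    """Yield stream examples whose _task_hash has no answer in the dataset."""
    answered = {eg["_task_hash"] for eg in annotated}
    for eg in stream:
        if eg["_task_hash"] not in answered:
            yield eg

# Toy data standing in for a Prodigy dataset and an input stream
annotated = [{"text": "first", "_task_hash": 1, "answer": "accept"}]
stream = [
    {"text": "first", "_task_hash": 1},
    {"text": "second", "_task_hash": 2},
]

# Only the example without a stored answer is re-queued
remaining = list(unanswered(stream, annotated))
```

In a custom recipe, you could wrap your stream generator with a filter like this so a follow-up session only asks the questions that are still missing answers.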
Prodigy is deliberately agnostic about what existing annotations in a dataset "mean". But you can tell it to explicitly skip identical questions that are already present in one or more datasets by using the `--exclude` option – for example, `--exclude dataset_one,dataset_two`.
The `--exclude` option works in one of my environments, but it does not work in another one, which is a Kubernetes deployment using Docker. Do you have any idea why that might be? It starts from the first line of the text file even though the samples were already annotated. I have confirmed that the repeated samples have the same input hash and task hash.
@sked Hm. Forgive me if this is a dumb question, but have you verified that the tasks are being persisted correctly in your Docker/Kubernetes setup? Like, are you sure it's not using the default SQLite driver (whose database file would be lost when the container stops running)?
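If that's the cause, one option is to point Prodigy at a database that lives outside the container, e.g. PostgreSQL, via `prodigy.json`. The connection details below are placeholders:

```json
{
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "dbname": "prodigy",
      "user": "username",
      "password": "xxx"
    }
  }
}
```

Alternatively, mounting the SQLite database file on a persistent volume would also survive container restarts.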
For my other question about skipping 10 entries: it would be handy if the batch size were configurable. I'd want to set it to 1 so that refreshes don't skip as much data.
You should be able to set `"batch_size": 1` in your `prodigy.json` and the batch sizes across the app will adjust accordingly. This will affect the number of examples requested from the stream, as well as the size of the batches sent back to the server.
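So the relevant `prodigy.json` would just contain:

```json
{
  "batch_size": 1
}
```

Note that with a batch size of 1, answers are also sent back to the server one at a time, so there's a bit more request overhead per annotation.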