Hi! The JSONL loader will load the file line by line and send out the examples in order. There's no magic going on here, and it really just iterates over the lines.
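Conceptually it's not much more than this (a minimal sketch of the idea, not Prodigy's actual loader code – the file path is just an example):

```python
import json

def jsonl_stream(path):
    # Read the file lazily: one line per example, kept in file order
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```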
Prodigy will skip duplicates and examples that are already in the dataset – so if you're using the same dataset for all your experiments and you annotate the first 20 texts in the first run and start the server again, Prodigy will resume at text 21. That's typically the desired behaviour, since you want to start where you left off and not repeat any examples. If you want to start at the beginning again, the easiest and cleanest solution would be to use a fresh dataset.
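The skipping logic is essentially a filter over the incoming stream. Here's a very rough illustration of the idea – Prodigy computes its own input and task hashes internally, so the `text`-based key below is purely for demonstration:

```python
def skip_seen(stream, seen):
    # 'seen' holds identifiers of examples that are already in the dataset
    for eg in stream:
        key = eg["text"]  # illustrative only – Prodigy uses its own hashes
        if key not in seen:
            seen.add(key)
            yield eg
```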
If you want to make sure that Prodigy only sends out a new batch of questions when all previous questions have been answered, you can also set `"force_stream_order": true` in your `prodigy.json`. By default, if you open up the app twice on two different devices, you'd get two different batches of examples: the first one and the second one. Prodigy will then wait to receive the answers. With `"force_stream_order": true`, Prodigy will keep sending the first batch until it has received the answers and only then move on to batch 2. This can be relevant if the order of the questions matters a lot and you don't want it to be disrupted if the user refreshes the app. Just make sure you only have one user per session then – otherwise, you'll end up with duplicates.
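For example, your `prodigy.json` could simply contain the setting alongside whatever else you already have in there:

```json
{
  "force_stream_order": true
}
```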
Yeah, if you call `list`, you're evaluating the whole generator and essentially tokenizing and loading 17k texts into memory. That's one of the main reasons we chose JSONL as the default file format: it can be read in line by line, and using generators, you can process the incoming texts in smaller batches, perform potentially expensive preprocessing and respond to outside state (like an updated model).