Apologies if this duplicates another question; I've searched the forum and can't find an answer that applies to this situation!
We're using Prodigy's `ner.manual` recipe with a `blank:en` model to annotate a set of 17,000 texts (some a few sentences long, some a few paragraphs) for a new label. We've had a couple of fits and starts (perfectly normal) as we adjust our annotation strategy, our patterns, and the set of texts in our JSONL file. As a result, we've done a fair bit of `db-in`-ing, `db-out`-ing, and db-deleting, and are now ready to get going again.
Our use case requires annotating sets of texts in a specific sequence. However, after regenerating the JSONL with the texts grouped in the desired order, Prodigy appears to start serving from a random line every time. Some of our database futzing was an attempt to clear what we thought might be a cached index that determines where Prodigy begins serving texts.
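One hypothesis worth ruling out (an assumption, not confirmed here) is that the recipe skips texts that are already saved in the dataset, so the first example served is the first *unannotated* line rather than line 1. The minimal pure-Python sketch below shows how that kind of filtering makes a perfectly ordered stream look like it starts mid-file. `filter_seen` is a hypothetical helper, and plain text matching stands in for whatever hashing Prodigy actually uses.

```python
import json
from typing import Iterable, Iterator


def read_jsonl_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL records lazily, preserving file order."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def filter_seen(stream: Iterable[dict], seen_texts: set) -> Iterator[dict]:
    """Hypothetical stand-in for deduplication: drop any example whose
    text was already annotated (real tools typically compare hashes)."""
    for eg in stream:
        if eg["text"] not in seen_texts:
            yield eg


if __name__ == "__main__":
    jsonl = [json.dumps({"text": f"text {i}"}) for i in range(6)]
    # Pretend the first three texts were saved in an earlier session
    already_annotated = {"text 0", "text 1", "text 2"}
    stream = filter_seen(read_jsonl_lines(jsonl), already_annotated)
    print(next(stream)["text"])  # "text 3": order is intact, but the start has moved
```

If this is what's happening, the position would depend on what is stored in the dataset rather than on any separate cache.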
- Is this how the JSONL loader is intended to behave (starting at what appears to us to be a random line)?
- I've tried a couple of times to modify the `ner.manual` recipe to turn the stream generator into a list, but that makes startup hang, I assume because Prodigy/spaCy then tries to tokenize all the texts at once. So I don't think that approach is practical, or that it would solve the problem anyway. Would using a different loader change the order in which texts are presented?
Thanks, as always, for your excellent work!
Edit: I used my special reading eyes on the loader documentation and will see if I can whip up a custom loader to do what I need!
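For reference, a custom loader along those lines could be as simple as a generator that reads the JSONL file line by line, so texts are yielded strictly in file order and nothing is loaded up front. This is a minimal sketch under stated assumptions: the name `load_in_order` and the file `texts.jsonl` are hypothetical, storing the original line number under `meta` is just an illustrative choice for checking order later, and printing one JSON object per line assumes Prodigy's documented option of reading a source of `-` from standard input (worth double-checking against the loader docs for your version).

```python
import json
import sys
from pathlib import Path
from typing import Iterator


def load_in_order(path: Path) -> Iterator[dict]:
    """Yield {"text": ...} records lazily, in the exact order of the file."""
    with path.open(encoding="utf8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Keep the original position so the serving order can be checked later
            yield {"text": record["text"], "meta": {"line": line_number}}


if __name__ == "__main__" and len(sys.argv) > 1:
    for eg in load_in_order(Path(sys.argv[1])):
        # One JSON object per line on stdout, so the output can be piped onward
        print(json.dumps(eg), flush=True)
```

Hypothetical usage, with placeholder dataset and label names: `python loader.py texts.jsonl | prodigy ner.manual my_dataset blank:en - --label MY_LABEL`.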