I have already annotated a dataset and exported the annotations using Prodigy's db-out command.
Now I want to increase the size of the dataset by programmatically augmenting the spans, and correspondingly the tokens, since I know the different values my named entities can take.
For this, I need to understand the tokens field in the JSONL file exported with db-out: every token has a key called ws. What does it mean?
Hi! "ws" stands for "whitespace" and indicates whether the token is followed by a space or not, just like Token.whitespace_ in spaCy. This allows you to reconstruct the original text from the tokens, and it can be used in the UI to display tokens in a more readable way. (If you leave out the key in the data you load in, it defaults to true, i.e. the token is followed by whitespace.)
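For your augmentation use case, here's a rough sketch of how "ws" can be used when swapping in a new entity value. It assumes the usual db-out token keys ("text", "start", "end", "id", "ws") and span keys ("start", "end", "token_start", "token_end", "label", with token_end inclusive), that the example's text is exactly what you get by joining the tokens with their whitespace, and that spans don't overlap. The helper names tokens_to_text and replace_span are just hypothetical names for this example, not part of Prodigy.

```python
import copy


def tokens_to_text(tokens):
    """Rebuild the text: append a space after every token whose "ws" is true."""
    # A missing "ws" key is treated as True (followed by whitespace)
    return "".join(t["text"] + (" " if t.get("ws", True) else "") for t in tokens)


def replace_span(example, span_idx, new_words):
    """Hypothetical helper: return a copy of `example` with the tokens of one
    span replaced by `new_words`, recomputing token offsets, ids, span
    offsets and the text. Assumes non-overlapping spans, inclusive token_end."""
    if not new_words:
        raise ValueError("new_words must not be empty")
    eg = copy.deepcopy(example)
    old = eg["tokens"]
    ts, te = eg["spans"][span_idx]["token_start"], eg["spans"][span_idx]["token_end"]
    # Inner replacement words are followed by a space; the last one keeps the
    # "ws" value of the original last token (e.g. no space before a period)
    new = [{"text": w, "ws": True} for w in new_words]
    new[-1]["ws"] = old[te].get("ws", True)
    words = old[:ts] + new + old[te + 1:]
    # Recompute character offsets and token ids from the token texts and "ws"
    tokens, offset = [], 0
    for i, t in enumerate(words):
        ws = t.get("ws", True)
        end = offset + len(t["text"])
        tokens.append({"text": t["text"], "start": offset, "end": end, "id": i, "ws": ws})
        offset = end + (1 if ws else 0)
    # Spans after the replaced one shift by the change in token count
    shift = len(new_words) - (te - ts + 1)
    spans = []
    for j, s in enumerate(eg["spans"]):
        if j == span_idx:
            s["token_end"] = ts + len(new_words) - 1
        elif s["token_start"] > te:
            s["token_start"] += shift
            s["token_end"] += shift
        s["start"] = tokens[s["token_start"]]["start"]
        s["end"] = tokens[s["token_end"]]["end"]
        spans.append(s)
    eg["tokens"], eg["spans"], eg["text"] = tokens, spans, tokens_to_text(tokens)
    return eg
```

Note that the copy keeps whatever other keys the original example had (e.g. its _input_hash and _task_hash), so you'll probably want to remove or recompute those before importing the augmented examples, otherwise they may be treated as duplicates.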