Feature Request: Option to skip the first N samples

It would be nice to have a skip option that ignores the first N samples of the data. This is especially useful when a recipe has to be restarted with the same dataset (for example after a code update or a crash).

Something like:

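import os

# Number of leading examples to skip, read from the SKIP environment variable
skip = int(os.getenv('SKIP', 0))
if skip > 0:
    # Consume the first `skip` items (assumes `stream` is an iterator/generator)
    for idx, _ in enumerate(stream):
        if idx == skip - 1:
            print('Skipped {} examples from source'.format(skip))
            break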

Another option would be to dedupe the examples between DATASET and SOURCE so we don’t label the same example twice. This should be optional, since you might want multiple labels per sample.
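
A rough sketch of what the dedupe could look like, just to illustrate the idea. Everything here is an assumption on my side: the function name `filter_seen`, the `allow_duplicates` flag, and the choice of the `"text"` field as the identity of an example are all made up for the example, and examples are assumed to be plain dicts.

```python
def filter_seen(stream, annotated, key=lambda eg: eg["text"], allow_duplicates=False):
    """Yield examples from `stream` that are not already in `annotated`.

    `annotated` stands in for the examples already stored in DATASET,
    `stream` for the examples coming from SOURCE. `key` decides when two
    examples count as the same (hypothetical default: the "text" field).
    """
    if allow_duplicates:
        # Opt-out: pass everything through so a sample can get multiple labels
        yield from stream
        return
    # Collect the keys of everything that has already been annotated
    seen = {key(eg) for eg in annotated}
    for eg in stream:
        if key(eg) not in seen:
            yield eg


annotated = [{"text": "a", "label": "X"}]
source = [{"text": "a"}, {"text": "b"}, {"text": "c"}]
remaining = list(filter_seen(source, annotated))
```

With the sample data above, `remaining` would only contain the "b" and "c" examples. Unlike the skip counter, this would not depend on the source being in the same order as in the previous run.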

Thanks