We are trying to review audio annotations made by our labelers. After base64-encoding the audio data in our JSONL file, we ended up with a 5 GB encoded JSONL file that should cover ~80 videos.
We ran the following locally:
cat ~/audio_b64.jsonl | prodigy audio.manual rev_audio - --loader jsonl
Instead of loading the file so we could review the annotations, Prodigy printed this warning in our terminal:
⚠ Warning: filtered 99% of entries because they were duplicates. Only 1 items were shown out of 77. You may want to deduplicate your dataset ahead of time to get a better understanding of your dataset size.
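Since 76 of the 77 entries were treated as duplicates, one thing we could check is whether the tasks share the same hash values. If the hashes don't take the base64 audio into account, or the records already carry identical hashes, every task could look the same to Prodigy. This is only a hypothesis. Below is a minimal sketch that streams the file one line at a time and counts repeated values. It assumes the base64 audio is stored under `"audio"` and that any existing Prodigy hashes are under `"_input_hash"` / `"_task_hash"`; the script name `check_dupes.py` is hypothetical.

```python
"""Sketch: look for repeated hashes in a Prodigy-style JSONL file.

Assumptions: one task per line, base64 audio under "audio", and any existing
Prodigy hashes stored under "_input_hash" / "_task_hash".
"""
import hashlib
import json
import sys
from collections import Counter


def count_duplicates(lines):
    """Return Counters for _input_hash, _task_hash and a SHA-256 of the audio field.

    A key of None means the field was missing on that record.
    """
    input_hashes = Counter()
    task_hashes = Counter()
    audio_digests = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        input_hashes[task.get("_input_hash")] += 1
        task_hashes[task.get("_task_hash")] += 1
        audio = task.get("audio") or ""
        audio_digests[hashlib.sha256(audio.encode("utf-8")).hexdigest()] += 1
    return input_hashes, task_hashes, audio_digests


def report(name, counter):
    # Only values seen more than once point at potential "duplicates".
    dupes = {key: n for key, n in counter.items() if n > 1}
    print(f"{name}: {len(counter)} distinct values, {len(dupes)} repeated")
    for key, n in sorted(dupes.items(), key=lambda kv: -kv[1])[:5]:
        print(f"  {str(key)[:20]!r} appears {n} times")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Iterating over the file object reads one line at a time,
    # so the whole 5 GB file is never held in memory at once.
    with open(sys.argv[1], encoding="utf-8") as f:
        counters = count_duplicates(f)
    for name, counter in zip(("_input_hash", "_task_hash", "audio sha256"), counters):
        report(name, counter)
```

Running `python check_dupes.py ~/audio_b64.jsonl` would show whether the hash fields collapse to one value while the audio digests are still distinct (which would point at the hashing), or whether the audio payloads themselves are identical (which would point at the encoding step).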
Any idea what might be causing this? And what alternative approach could we use?
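One workaround we could try, if the check shows the hashes collapsing while the audio differs, is to rewrite the file so that every task gets its own hash derived from its audio payload. This is a sketch under assumptions we haven't confirmed: that Prodigy keeps `_input_hash` / `_task_hash` values already present on incoming tasks, that the annotations live under `"audio_spans"`, and that a signed 32-bit integer is an acceptable hash value. The script name `rehash_tasks.py` is hypothetical.

```python
"""Sketch: give every task its own _input_hash/_task_hash before streaming it to Prodigy.

Assumptions: base64 audio under "audio", annotations under "audio_spans", and
that Prodigy keeps hashes already present on incoming tasks (verify for your
Prodigy version).
"""
import hashlib
import json
import sys


def to_int32(data: bytes) -> int:
    """Fold a SHA-256 digest into a signed 32-bit int (assumed to be an acceptable hash range)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big", signed=True)


def rehash(task: dict) -> dict:
    """Overwrite the task's hashes: input hash from the audio, task hash from audio + spans."""
    audio = (task.get("audio") or "").encode("utf-8")
    spans = json.dumps(task.get("audio_spans", []), sort_keys=True).encode("utf-8")
    task["_input_hash"] = to_int32(audio)
    task["_task_hash"] = to_int32(audio + b"|" + spans)
    return task


if __name__ == "__main__" and len(sys.argv) > 2:
    src, dst = sys.argv[1], sys.argv[2]
    # Stream line by line so the large input is never fully loaded into memory.
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                fout.write(json.dumps(rehash(json.loads(line))) + "\n")
```

Usage would be `python rehash_tasks.py ~/audio_b64.jsonl ~/audio_b64_rehashed.jsonl`, followed by the same command as before with the new file: `cat ~/audio_b64_rehashed.jsonl | prodigy audio.manual rev_audio - --loader jsonl`. Records with genuinely identical audio would still share an input hash, which is the desired behavior.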