Yes, your general approach is correct. JSONL is newline-delimited JSON, so to export data in that format, you can call json.dumps() on each record and append a \n to each resulting line. To load it back in, read the file line by line and call json.loads() on each line to turn it back into a dictionary.
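As a minimal sketch of that manual approach, using only the standard library: the function names dump_jsonl and load_jsonl and the sample records are made up for illustration and are not part of Prodigy.

```python
import json


def dump_jsonl(records):
    """Serialize an iterable of JSON-serializable objects to a JSONL string."""
    # One json.dumps() call per record, each followed by a newline.
    return "".join(json.dumps(record) + "\n" for record in records)


def load_jsonl(text):
    """Parse a JSONL string back into a list of Python objects."""
    # Blank lines (such as the one after the trailing newline) are ignored.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Hypothetical sample data for the round trip.
records = [{"text": "first example"}, {"text": "second example", "label": "A"}]
jsonl = dump_jsonl(records)
restored = load_jsonl(jsonl)
```

Because json.dumps() escapes newlines inside strings, each record is guaranteed to stay on a single line.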
Alternatively, you can use Prodigy’s internal helper functions util.read_jsonl (returns a generator) and util.write_jsonl. The code looks as follows:
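from pathlib import Path
import ujson


def read_jsonl(file_path):
    """Read a .jsonl file and yield its contents line by line.
    file_path (unicode / Path): The file path.
    YIELDS: The loaded JSON contents of each line.
    """
    with Path(file_path).open('r', encoding='utf8') as f:
        for line in f:
            try:  # hack to handle broken jsonl
                yield ujson.loads(line.strip())
            except ValueError:  # assumed: lines that fail to parse are skipped
                continue


def write_jsonl(file_path, lines):
    """Create a .jsonl file and dump contents.
    file_path (unicode / Path): The path to the output file.
    lines (list): The JSON-serializable contents of each line.
    """
    data = [ujson.dumps(line, escape_forward_slashes=False) for line in lines]
    # assumed: the serialized lines are joined with newlines and written out
    with Path(file_path).open('w', encoding='utf8') as f:
        f.write('\n'.join(data))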
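To check a round trip without installing ujson, here is a hedged sketch with the standard json module standing in for it. The names write_jsonl_stdlib and read_jsonl_stdlib, the file name, and the sample data are assumptions for illustration, not Prodigy's API. A malformed line is appended on purpose to show the "broken jsonl" handling.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl_stdlib(file_path, lines):
    """Write one JSON document per line (stdlib stand-in, hypothetical name)."""
    data = [json.dumps(line) for line in lines]
    with Path(file_path).open("w", encoding="utf8") as f:
        f.write("\n".join(data) + "\n")


def read_jsonl_stdlib(file_path):
    """Yield the parsed contents of each line, skipping lines that fail to parse."""
    with Path(file_path).open("r", encoding="utf8") as f:
        for line in f:
            try:
                yield json.loads(line.strip())
            except ValueError:  # json.JSONDecodeError is a subclass of ValueError
                continue


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "examples.jsonl"
    examples = [{"text": "hello"}, {"text": "world", "meta": {"id": 2}}]
    write_jsonl_stdlib(path, examples)

    # Append a deliberately malformed line; the reader should skip it.
    with path.open("a", encoding="utf8") as f:
        f.write("{not valid json\n")

    loaded = list(read_jsonl_stdlib(path))
```

Silently skipping bad lines is convenient for messy exports, but it also hides data loss, so it may be worth counting or logging the skipped lines if that matters for your data.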
Finally, there are also libraries that handle JSONL for you and give you more options. This one for example: