jsonl format

ines · August 29, 2018, 11:19am

Yes, your general approach is correct. JSONL is newline-delimited JSON: each line holds one complete JSON value. To export data in that format, serialize each record with json.dumps() and add a \n after it. To load it back in, read the file line by line and call json.loads() on each line to turn it back into a dictionary.
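A minimal sketch of that approach, using only the standard library's json module. The file name data.jsonl and the sample records are made up for illustration:

```python
import json

# Hypothetical example records; any JSON-serializable dicts work
records = [
    {"text": "Hello world", "label": "GREETING"},
    {"text": "See you later", "label": "FAREWELL"},
]

# Export: serialize each record with json.dumps() and end it with "\n"
with open("data.jsonl", "w", encoding="utf8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Import: read line by line, skip blank lines, and parse each line back into a dict
with open("data.jsonl", "r", encoding="utf8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)  # True
```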

Alternatively, you can use Prodigy’s internal helper functions util.read_jsonl (which returns a generator) and util.write_jsonl. Their code looks as follows:

```python
import ujson
from pathlib import Path


def read_jsonl(file_path):
    """Read a .jsonl file and yield its contents line by line.
    file_path (unicode / Path): The file path.
    YIELDS: The loaded JSON contents of each line.
    """
    with Path(file_path).open('r', encoding='utf8') as f:
        for line in f:
            try:  # hack to handle broken jsonl: skip lines that fail to parse
                yield ujson.loads(line.strip())
            except ValueError:
                continue


def write_jsonl(file_path, lines):
    """Create a .jsonl file and dump contents.
    file_path (unicode / Path): The path to the output file.
    lines (list): The JSON-serializable contents of each line.
    """
    data = [ujson.dumps(line, escape_forward_slashes=False) for line in lines]
    # Use a context manager so the file is flushed and closed reliably
    with Path(file_path).open('w', encoding='utf8') as f:
        f.write('\n'.join(data))
```
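If you'd rather avoid the ujson dependency, the same two helpers can be sketched with the standard library's json module instead (a swapped-in substitute, not Prodigy's code). The stdlib json.dumps() doesn't escape forward slashes, so no equivalent of escape_forward_slashes=False is needed. Because read_jsonl is a generator, you can also take just the first few records of a large file without parsing the rest. The file name big.jsonl and the {"id": ...} records are hypothetical:

```python
import itertools
import json
from pathlib import Path


def read_jsonl(file_path):
    """Yield one parsed record per line, skipping lines that aren't valid JSON."""
    with Path(file_path).open("r", encoding="utf8") as f:
        for line in f:
            try:
                yield json.loads(line.strip())
            except ValueError:  # json.JSONDecodeError subclasses ValueError
                continue


def write_jsonl(file_path, lines):
    """Write each JSON-serializable item as one line of the output file."""
    with Path(file_path).open("w", encoding="utf8") as f:
        f.write("\n".join(json.dumps(line) for line in lines))


write_jsonl("big.jsonl", ({"id": i} for i in range(1000)))

# islice stops after 3 items, so only the first 3 lines are parsed
first_three = list(itertools.islice(read_jsonl("big.jsonl"), 3))
print(first_three)  # [{'id': 0}, {'id': 1}, {'id': 2}]
```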

Finally, there are also dedicated libraries that handle JSONL for you and give you more options, for example jsonlines:

https://jsonlines.readthedocs.io/en/latest/
