- [[python - pandas - json]]

# Idea

Each line is a JSON object. It's like having a dictionary on each line. It works well for data streaming because every line is a complete record: a file can be read or appended one line at a time, and a damaged line doesn't affect the others. Unlike CSV, JSON has a single, widely implemented format specification, so it's far less susceptible to parsing issues.

```json
{"name": "Gilbert", "wins": [["straight", "7♣"]]}
{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}
{"name": "May", "wins": []}
{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}
```

## Python reading/writing jsonlines

```python
import json
import pandas as pd
import polars as pl

test = [{'a': 'b'}, {'a': 'b'}, {'a': 'b'}]

# write with the standard library: "w" overwrites, "a" appends
with open("data.jsonl", "a") as f:
    for item in test:
        f.write(json.dumps(item) + "\n")

# write with polars (this overwrites the file written above)
pl.from_dicts(test).write_ndjson("data.jsonl")

# read line by line (the default mode is read)
with open("data.jsonl") as f:
    for line in f:
        print(line)  # string, including the trailing newline
        print(json.loads(line))  # string to dict

# read into a DataFrame
pd_df = pd.read_json("data.jsonl", lines=True)
pl_df = pl.read_ndjson("data.jsonl")

# write a DataFrame
pd_df.to_json("file.jsonl", orient="records", lines=True)
pl_df.write_ndjson("file.jsonl")
```

## R read jsonlines

- [Fast JSON, NDJSON and GeoJSON Parser and Generator • yyjsonr](https://coolbutuseless.github.io/package/yyjsonr/index.html)

```r
library(yyjsonr)
read_ndjson_file("data.jsonl")
```

# References

- [python - Loading JSONL file as JSON objects - Stack Overflow](https://stackoverflow.com/questions/50475635/loading-jsonl-file-as-json-objects)
- [How to Love jsonl — using JSON Lines in your Workflow | by Alex Galea | Medium](https://galea.medium.com/how-to-love-jsonl-using-json-line-format-in-your-workflow-b6884f65175b)
- [JSON Lines format: Why jsonl is better than a regular JSON for web scraping | by Dmitry Narizhnykh | HackerNoon.com | Medium](https://medium.com/hackernoon/json-lines-format-76353b4e588d)
- [jsonlines - Create JSONL with Python - Stack Overflow](https://stackoverflow.com/questions/57071390/create-jsonl-with-python)
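
# Streaming sketch

The streaming point from the Idea section can be illustrated with the standard library alone. This is only a minimal sketch: `iter_jsonl` and `append_record` are hypothetical helper names, and silently skipping unparseable lines is one possible policy among several (logging or raising may suit other workflows better).

```python
import json
from typing import Iterator


def append_record(path: str, record: dict) -> None:
    """Append one record as a single JSON line (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters like "♣" readable in the file
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield parsed records one line at a time (hypothetical helper).

    Blank lines and lines that fail to parse are skipped, so a
    half-written final line does not prevent reading the rest.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # assumed policy: ignore the damaged line and keep going
                continue
```

Because `iter_jsonl` is a generator, only one line is held in memory at a time, e.g. `for rec in iter_jsonl("data.jsonl"): print(rec["name"])` for the poker records shown in the Idea section.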