- [[python - pandas - json]]
# Idea
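Each line is a complete, self-contained JSON object, like having one dictionary per line. It works well for data streaming and appending because records are written and parsed one line at a time, so a truncated or corrupt line does not affect the others. Unlike CSV, whose delimiter and quoting conventions vary between tools, JSON has a formal specification, so it is far less susceptible to parsing issues.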
```json
{"name": "Gilbert", "wins": [["straight", "7♣"]]}
{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}
{"name": "May", "wins": []}
{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}
```
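Because every line stands on its own, a reader can parse records one at a time and skip a damaged line without losing the rest. A minimal sketch using only the standard library; the `parse_jsonl` helper and the deliberately truncated third line are made up for illustration:

```python
import json

# Sample records from above, plus one deliberately truncated line
raw = "\n".join([
    '{"name": "Gilbert", "wins": [["straight", "7♣"]]}',
    '{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}',
    '{"name": "May", "wins": [',  # broken record (hypothetical corruption)
    '{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}',
])


def parse_jsonl(text):
    """Parse JSON Lines text; return (records, line numbers that failed)."""
    records, bad_lines = [], []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad_lines.append(lineno)
    return records, bad_lines


records, bad_lines = parse_jsonl(raw)
print([r["name"] for r in records])  # ['Gilbert', 'Alexa', 'Deloise']
print(bad_lines)                     # [3]
```

With a single JSON array, the same truncation would make the whole file unparseable.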
## Python reading/writing jsonlines
```python
import json

import pandas as pd
import polars as pl

test = [{"a": "b"}, {"a": "b"}, {"a": "b"}]

# Write with the standard library: "w" overwrites, "a" appends
with open("data.jsonl", "a", encoding="utf-8") as f:
    for item in test:
        f.write(json.dumps(item) + "\n")

# Write with polars (overwrites the file)
pl.from_dicts(test).write_ndjson("data.jsonl")

# Read line by line (the default mode is "r")
with open("data.jsonl", encoding="utf-8") as f:
    for line in f:
        print(line)              # raw string, including the trailing newline
        print(json.loads(line))  # string to dict

# Read into a DataFrame
pd_df = pd.read_json("data.jsonl", lines=True)
pl_df = pl.read_ndjson("data.jsonl")

# Write a DataFrame
pd_df.to_json("file.jsonl", orient="records", lines=True)  # pandas
pl_df.write_ndjson("file.jsonl")                           # polars
```
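The loop-based read and write above can be wrapped in two small helpers. This is only a sketch with the standard library: `write_jsonl`, `read_jsonl`, and the `example.jsonl` path are hypothetical names, not part of any package. Passing `ensure_ascii=False` keeps characters such as `♣` readable in the file instead of escaping them as `\u2663`.

```python
import json
import os
import tempfile


def write_jsonl(path, records, mode="w"):
    """Write an iterable of dicts, one JSON object per line ("a" appends)."""
    with open(path, mode, encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Lazily yield one parsed object per non-blank line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.jsonl")
write_jsonl(path, [{"name": "Gilbert", "wins": [["straight", "7♣"]]}])
write_jsonl(path, [{"name": "May", "wins": []}], mode="a")  # append one record

rows = list(read_jsonl(path))
print(rows)
```

Since `read_jsonl` is a generator, it never holds more than one line in memory, which is probably the main reason to prefer it over loading a large file into a DataFrame.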
## R reading jsonlines
- [Fast JSON, NDJSON and GeoJSON Parser and Generator • yyjsonr](https://coolbutuseless.github.io/package/yyjsonr/index.html)
```r
library(yyjsonr)
read_ndjson_file("data.jsonl")
```
# References
- [python - Loading JSONL file as JSON objects - Stack Overflow](https://stackoverflow.com/questions/50475635/loading-jsonl-file-as-json-objects)
- [How to Love jsonl — using JSON Lines in your Workflow | by Alex Galea | Medium](https://galea.medium.com/how-to-love-jsonl-using-json-line-format-in-your-workflow-b6884f65175b)
- [JSON Lines format: Why jsonl is better than a regular JSON for web scraping | by Dmitry Narizhnykh | HackerNoon.com | Medium](https://medium.com/hackernoon/json-lines-format-76353b4e588d)
- [jsonlines - Create JSONL with Python - Stack Overflow](https://stackoverflow.com/questions/57071390/create-jsonl-with-python)