OpenReplay Logo
12k
12k

JSONL validator

Validate, inspect and split JSONL fine-tuning datasets for OpenAI and Anthropic — per-line error reporting, tokens per example and a one-click train/validation split, in your browser.

Dataset format
Click to upload a .jsonl file, or drag & drop it here
Processed locally
0
Total lines
0
Valid records
0
Errors
0
Duplicates
0
Total tokens

About this tool

JSONL (JSON Lines) is the format fine-tuning APIs expect: one complete JSON object per line, each typically a chat example with a messages array of role and content pairs. A single malformed line can fail an entire upload, so this validator parses your file line by line, reports the exact line number and reason for every error, and pretty-prints each valid record so you can read it.

Beyond syntax, it checks records against the OpenAI and Anthropic fine-tune chat formats, counts examples, shows tokens per example with the shared tokenizer, flags duplicates, and lets you download a randomized train/validation split. Paste text or load a file; large files are parsed in chunks so the page stays responsive.

Your dataset is parsed entirely in your browser — no file is uploaded, which matters when training data is sensitive.

Frequently asked questions

What is the JSONL format?

JSON Lines stores one JSON value per line, separated by newlines. Unlike a single JSON array it streams naturally and lets tools process records one at a time, which is why fine-tuning and logging pipelines prefer it.

What does the validator check?

First that every line is valid JSON, with the line number of any that are not. Then, optionally, that each record matches the chosen fine-tune format — an OpenAI chat example, for instance, needs a messages array whose entries each have a role and content. It also surfaces duplicates and per-example token counts.

Is there a file-size limit?

There is no hard limit, but very large files are bound by your browser's available memory since everything runs locally. The parser processes lines in chunks to keep the interface responsive while it works.

How does the train/validation split work?

It shuffles the valid records and partitions them by the ratio you choose, then offers each set as a separate JSONL download — a quick way to hold out evaluation data before training.