
Dataset Tokens, Epochs & Steps Calculator

Convert dataset size, batch and sequence length into optimizer steps and epochs — and check repeat-data limits.

Outputs: optimizer steps, epochs over the data, and tokens per step (in millions).

The data-constrained scaling paper (Muennighoff et al. 2023) found that up to ~4 epochs of repetition is nearly as good as fresh data; returns diminish rapidly around 16 epochs, and by roughly 40 epochs extra passes are essentially worthless. The verdict badge applies that finding.

Formula

steps = budget ÷ (batch × seq)
epochs = budget ÷ dataset_tokens
Repeated data loses value past ~4 epochs.
References: Muennighoff et al. (2023), Scaling Data-Constrained Language Models
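
The two conversions in the formula can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name plan_from_budget and its parameter names are assumptions, the batch is taken to be the global batch counted in sequences, and rounding the step count up is a choice made here rather than something the formula specifies.

```python
import math


def plan_from_budget(token_budget, dataset_tokens, global_batch, seq_len):
    """Convert a token budget into optimizer steps and epochs.

    Hypothetical helper: assumes global_batch is counted in sequences and
    every sequence is fully packed to seq_len tokens.
    """
    if min(token_budget, dataset_tokens, global_batch, seq_len) <= 0:
        raise ValueError("all inputs must be positive")
    tokens_per_step = global_batch * seq_len
    # Round up so the final, partial step still covers the whole budget.
    steps = math.ceil(token_budget / tokens_per_step)
    epochs = token_budget / dataset_tokens
    return steps, epochs, tokens_per_step


# Example: 100B-token budget, 25B-token dataset, 1024 sequences of 2048 tokens.
steps, epochs, tokens_per_step = plan_from_budget(100e9, 25e9, 1024, 2048)
print(steps, epochs, tokens_per_step / 1e6)
```

With these example inputs the budget is exactly four passes over the dataset, which sits right at the ~4 epoch guideline.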

About Dataset Tokens, Epochs & Steps Calculator

Training plans live in three currencies — tokens, steps and epochs — and converting between them trips up everyone's spreadsheet. This calculator does the exchange: a token budget divided by your global batch and sequence length gives optimizer steps; divided by dataset size it gives epochs, with a research-backed verdict on whether your repetition count is healthy. The 4-epoch guideline from the data-constrained scaling work is built in, because the most expensive mistake in small-data training is believing the 40th pass still teaches anything.

How to use Dataset Tokens, Epochs & Steps Calculator

  1. Enter your values into Dataset Tokens, Epochs & Steps Calculator. Sensible, domain-typical defaults are pre-filled so you see a real result immediately.
  2. The result recomputes live using the formula shown on the page; there is no button to press.
  3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.

Why use Dataset Tokens, Epochs & Steps Calculator?

  • Computes Dataset Tokens, Epochs & Steps instantly in your browser — no sign-up, no upload, no server round-trip.
  • 100% free and unlimited, with the exact formula shown: steps = budget ÷ (batch × seq).
  • Runs entirely client-side, so every value you enter stays private on your device.
  • Live recompute as you type, with a worked example and authoritative references for trust.

Frequently asked questions

How many epochs are safe for LLM pretraining?

Muennighoff et al. measured it: up to ~4 epochs, repeated tokens are worth nearly as much as new ones; returns then diminish rapidly, falling off sharply around ~16 epochs and reaching essentially zero by ~40. If your plan exceeds 4, the better spend is usually more data (even lower quality) or a smaller model.
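
One way to turn those guidelines into a verdict is a simple threshold check. The sketch below is an assumption about how such a badge could work, not the calculator's real logic: the function name repetition_verdict, the default cut-offs of 4 and 16, and the label strings are all hypothetical.

```python
def repetition_verdict(epochs, near_fresh_limit=4.0, diminishing_limit=16.0):
    """Label a repetition count using the ~4 and ~16 epoch guidelines.

    Thresholds and label text are illustrative assumptions.
    """
    if epochs <= 0:
        raise ValueError("epochs must be positive")
    if epochs <= near_fresh_limit:
        return "ok: repeats are worth nearly as much as fresh tokens"
    if epochs <= diminishing_limit:
        return "caution: each extra pass is worth noticeably less"
    return "wasteful: extra passes add almost nothing"


for e in (1, 4, 8, 40):
    print(e, repetition_verdict(e))
```

The cut-offs are keyword arguments so they can be tightened or relaxed for a particular dataset without touching the logic.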

Why do fine-tuning recipes use 1–3 epochs?

Instruction datasets are tiny and the model already knows language — it only needs the format and behavior. Beyond ~3 epochs, memorization of specific completions sets in (eval loss rises, outputs parrot training examples). Small LR + few epochs is the standing recipe.
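
For fine-tuning, the plan usually starts from an epoch count rather than a token budget, so the conversion runs the other way. A minimal sketch, assuming packed sequences and a batch counted in sequences; the name steps_for_epochs and the example sizes are hypothetical.

```python
import math


def steps_for_epochs(epochs, dataset_tokens, global_batch, seq_len):
    """Optimizer steps needed to make `epochs` passes over the dataset.

    Hypothetical helper: inverts epochs = budget / dataset_tokens, then
    applies steps = budget / (batch * seq), rounding up.
    """
    if min(epochs, dataset_tokens, global_batch, seq_len) <= 0:
        raise ValueError("all inputs must be positive")
    token_budget = epochs * dataset_tokens
    return math.ceil(token_budget / (global_batch * seq_len))


# Example: 3 epochs over a 10M-token instruction set, 64 sequences of 1024 tokens.
print(steps_for_epochs(3, 10_000_000, 64, 1024))
```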

Do padding tokens count in these numbers?

They consume compute but teach nothing — naive padding can waste 30%+ of a 'token budget' on short-sequence data. Packed sequences (concatenating documents to fill the context) make the budget honest; this calculator assumes packed tokens.
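
To see how much of a budget padding can absorb, compare real tokens with the token slots a padded batch occupies. The sketch below is illustrative only: padding_waste is a hypothetical helper, it assumes one document per row padded to seq_len with longer documents truncated, and the document lengths are made-up example values.

```python
def padding_waste(doc_lengths, seq_len):
    """Fraction of token slots that hold padding when each document gets
    its own row of seq_len tokens (longer documents are truncated)."""
    if seq_len <= 0 or not doc_lengths:
        raise ValueError("need a positive seq_len and at least one document")
    real_tokens = sum(min(n, seq_len) for n in doc_lengths)
    total_slots = len(doc_lengths) * seq_len
    return 1.0 - real_tokens / total_slots


# Made-up document lengths in tokens, padded to a 2048-token context.
lengths = [300, 1200, 2048, 700, 1500]
print(f"{padding_waste(lengths, 2048):.1%} of slots are padding")
```

In this example roughly 44% of the slots are padding, so a budget counted in slots would overstate the tokens actually learned from. Packing the same documents end to end brings that fraction close to zero.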

How do I pick the token budget itself?

Start from Chinchilla (20 tokens/param) as the compute-optimal floor and over-train deliberately if the model will be served at scale — see our Chinchilla calculator for the trade-off. Then this tool converts the chosen budget into the steps your scheduler needs.
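
The hand-off from a Chinchilla-style budget to scheduler steps can be sketched as follows. The 20 tokens/param ratio is the rule of thumb quoted above; the function name chinchilla_steps, the rounding up, and the example model size are assumptions for illustration.

```python
import math


def chinchilla_steps(n_params, global_batch, seq_len, tokens_per_param=20.0):
    """Return (token_budget, optimizer_steps) for a tokens-per-parameter rule.

    Hypothetical helper: raise tokens_per_param above 20 to over-train
    deliberately for a model that will be served at scale.
    """
    if min(n_params, global_batch, seq_len, tokens_per_param) <= 0:
        raise ValueError("all inputs must be positive")
    token_budget = n_params * tokens_per_param
    steps = math.ceil(token_budget / (global_batch * seq_len))
    return token_budget, steps


# Example: 1B-parameter model, 512 sequences of 2048 tokens per step.
budget, steps = chinchilla_steps(1e9, 512, 2048)
print(budget, steps)
```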
