What is a historian in industrial automation?

A historian is a specialized time-series database designed to store and retrieve high-volume process data from SCADA systems. It uses deadband compression to efficiently store millions of data points while preserving trends and anomalies.

What is deadband compression?

Deadband compression is a storage optimization technique used by historians that only records a new value when it deviates from the last stored value by more than a configured threshold, reducing storage requirements by 90% or more.

Aevus Learn · Industrial Data · 9 min read

What is a Historian? The time-series database that runs every plant you've worked at.

Every alarm investigation that starts with "what was happening at 2:47 a.m. last Tuesday?" is a query against a historian. Every regulatory report, every operations review, every "why did the compressor trip" post-mortem — same. Historians are the most-used and least-discussed piece of the industrial data stack. Here's what they actually are, what makes them weird, and why you can't just put your tags in Postgres.

Aevus / Intrepid Logic · Intermediate · For engineers · data architects · Updated 2026-05-21

The 30-second version

A historian is a specialized time-series database for industrial process data. It ingests tag values (pressure, temperature, flow, valve position, motor speed — whatever the plant produces), stores them with timestamps, compresses them aggressively, and serves queries optimized for "show me this tag's values between time A and time B, possibly aggregated, possibly with deadband filtering."

Conceptually it's "Postgres for process tags." Mechanically, it's optimized for a wildly different workload than Postgres, which is why the dedicated category exists.

The one-sentence test: if you've ever asked "what was tank-7 level at 14:32:18 on March 5th, and what was the average for the hour before?", you've queried a historian. If you've ever generated a regulatory report rolling up daily production averages over a year, that's millions of historian queries underneath.

Why "just use a regular database" doesn't work

Process tags look deceptively simple. Each tag is a (timestamp, value, quality) tuple. Why not put them in Postgres or MySQL? Three reasons:

1. The write volume is enormous

A medium-sized industrial plant has 50,000-200,000 tags. Polled every 1-5 seconds, that's hundreds of billions to trillions of raw samples per year. A modest oil & gas operator with 200 wells, each averaging 100 tags, sampled every second, produces roughly 630 billion data points per year (20,000 tags × about 31.5 million seconds). A general-purpose RDBMS that writes one indexed row per sample struggles at that scale; specialized time-series engines and aggressive compression are structural requirements, not optimizations.
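
The arithmetic behind those figures is easy to check. The Python sketch below assumes fixed-interval polling and counts raw samples before any compression; the function name `samples_per_year` is illustrative and not part of any historian API.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (non-leap year)


def samples_per_year(tag_count: int, scan_interval_s: float) -> float:
    """Raw, uncompressed samples per year for tags polled at a fixed interval."""
    return tag_count * SECONDS_PER_YEAR / scan_interval_s


# 200 wells x 100 tags each, sampled once per second
operator_samples = samples_per_year(200 * 100, 1.0)

# Lower bound for a medium plant: 50,000 tags polled every 5 seconds
plant_samples = samples_per_year(50_000, 5.0)

print(f"operator: {operator_samples:,.0f}")
print(f"plant:    {plant_samples:,.0f}")
```

These are pre-compression counts; what actually lands on disk depends on the deadband and compression settings discussed later.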

2. The compression opportunity is enormous

Process variables don't change every sample. A tank level might be 8.42 meters this second and 8.42 meters next second and 8.42 meters for the next five minutes. A historian uses deadband compression (record a value only when it differs from the last stored value by more than a configured threshold) and swinging-door algorithms to preserve the shape of trends while cutting storage by an order of magnitude or more. Regular databases store every sample. Historians don't.
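
As a rough illustration of the deadband idea, the sketch below keeps a sample only when it differs from the last stored value by more than the threshold. It is a minimal Python sketch under that single assumption, not any vendor's implementation; production historians typically also force a write after a maximum elapsed time, which is omitted here. The name `deadband_filter` is hypothetical.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (timestamp_seconds, value)


def deadband_filter(samples: Iterable[Sample], deadband: float) -> List[Sample]:
    """Keep a sample only if it differs from the last STORED value by more than deadband."""
    stored: List[Sample] = []
    for ts, value in samples:
        # Always keep the first sample; after that compare against the last kept value.
        if not stored or abs(value - stored[-1][1]) > deadband:
            stored.append((ts, value))
    return stored


raw = [(0, 8.42), (1, 8.42), (2, 8.43), (3, 8.42), (4, 8.50), (5, 8.51), (6, 8.60)]
kept = deadband_filter(raw, deadband=0.05)
print(kept)  # three of the seven samples survive
```

Note that the comparison is against the last stored value, not the previous raw sample, so a slow drift is still recorded once it accumulates past the threshold.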

3. The queries are temporal-first

Almost every historian query has a time-range predicate (WHERE ts BETWEEN ... AND ...) and operates on a small number of tags. Historians physically organize storage by tag-then-time. Reading "all values of tag X between two timestamps" is essentially a streaming sequential read — orders of magnitude faster than a B-tree index scan against a row-oriented table.
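
To make the tag-then-time layout concrete, here is a toy in-memory sketch in Python. It assumes samples for each tag arrive in timestamp order, and the `TagStore` class and its method names are hypothetical. A real historian does the equivalent against on-disk blocks, but the access pattern is the same: find the start offset once, then read sequentially.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TagStore:
    """Toy store: one time-sorted series per tag (tag-then-time organization)."""

    def __init__(self):
        self._ts = defaultdict(list)
        self._values = defaultdict(list)

    def append(self, tag_id, ts, value):
        # Assumption: per-tag samples arrive in ascending timestamp order.
        self._ts[tag_id].append(ts)
        self._values[tag_id].append(value)

    def read_range(self, tag_id, start, end):
        """Return (ts, value) pairs with start <= ts <= end."""
        ts = self._ts.get(tag_id, [])
        vals = self._values.get(tag_id, [])
        lo = bisect_left(ts, start)   # binary search for the first sample in range
        hi = bisect_right(ts, end)    # and for one past the last sample in range
        return list(zip(ts[lo:hi], vals[lo:hi]))  # contiguous slice, no per-row lookups


store = TagStore()
for second in range(10):
    store.append("TANK7.LEVEL", second, 8.0 + second * 0.01)

window = store.read_range("TANK7.LEVEL", 3, 5)
```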

The historian landscape — who runs what

Product	Vendor	Notes
PI System (PI Server)	AVEVA (formerly OSIsoft)	The industry default. Decades of installed base. Sophisticated tag-metadata model (Asset Framework). Premium pricing.
Wonderware Historian	AVEVA	SQL-Server-backed historian. Common in plants running AVEVA SCADA stack.
FactoryTalk Historian	Rockwell Automation	Bundled with FactoryTalk SCADA; OEM'd from PI under the hood for many years.
Ignition Tag Historian	Inductive Automation	Native module on Ignition platform. Pluggable storage (Postgres, MS SQL, MySQL, Oracle). Modern, growing fast.
InfluxDB	InfluxData	General-purpose time-series database. Common when IT teams own the historian.
TimescaleDB	Timescale	Postgres extension for time-series. Easier path when ops teams already speak SQL.
AWS Timestream / Azure ADX / GCP Bigtable	Hyperscalers	Cloud-native time-series. Increasingly common for greenfield analytics workloads, often alongside an on-prem historian rather than replacing it.
Sparkplug-on-MQTT	Open	Not strictly a historian, but a transport pattern that pairs with cloud-native time-series for modern IIoT.

"What historian should we run?" is one of the most context-dependent questions in the industrial data stack. The answer is shaped more by which SCADA platform you've committed to and which team operates the database than by feature parity; most modern historians are functionally equivalent for the core workload.

The data model — what's actually stored

Every historian has the same conceptual model, called slightly different things by different vendors:

// One row per tag in the tag metadata table
{
  tag_id: "TANK7.LEVEL",
  description: "Crude Oil Tank 7 Level",
  units: "meters",
  data_type: "float",
  scan_rate: 1.0,        // seconds
  deadband: 0.05,        // only store if change > this
  eng_low: 0.0, eng_high: 15.0,
  retention_days: 3650   // 10-year retention
}

// Billions of rows in the value table
{ tag_id: "TANK7.LEVEL", ts: "2026-05-21T14:32:18.42Z", value: 8.42, quality: "GOOD" }

The quality flag is non-obvious to newcomers but operationally critical. Values come back as GOOD, BAD, UNCERTAIN, or specific failure modes (sensor failure, communication failure, manually overridden). A historian that returns 8.42 without indicating that the sensor has been stuck for 3 days is worse than no historian at all: it produces a credible-looking number for a dead sensor.
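
A small Python sketch shows why the flag matters in practice. It assumes rows shaped like the value-table example above and treats only `GOOD` samples as usable; the helper name `good_average` is hypothetical, and how UNCERTAIN values should be handled is a site-specific decision not covered here.

```python
def good_average(rows):
    """Average only the samples whose quality flag is GOOD; None if there are none."""
    good = [row["value"] for row in rows if row["quality"] == "GOOD"]
    return sum(good) / len(good) if good else None


rows = [
    {"tag_id": "TANK7.LEVEL", "ts": "2026-05-21T14:32:18Z", "value": 8.0, "quality": "GOOD"},
    {"tag_id": "TANK7.LEVEL", "ts": "2026-05-21T14:32:19Z", "value": 0.0, "quality": "BAD"},
    {"tag_id": "TANK7.LEVEL", "ts": "2026-05-21T14:32:20Z", "value": 9.0, "quality": "GOOD"},
]

naive = sum(row["value"] for row in rows) / len(rows)  # dragged down by the BAD 0.0
filtered = good_average(rows)                          # ignores the BAD sample
```

The naive average silently mixes a failed reading into the result, which is exactly the credible-looking-but-wrong number the quality flag exists to prevent.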

The queries that actually matter

Historians live or die by how well they answer five query patterns. Every operations engineer ends up writing some version of all five:

1. Point-in-time

"What was tank 7 level at 14:32:18 on March 5th?" The simplest query. The historian returns the stored value at that instant, or a value interpolated between the two surrounding samples. Used in incident forensics.

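The interpolation step in a point-in-time query can be sketched in a few lines of Python. This assumes linear interpolation over time-sorted samples with distinct timestamps; some historians hold the last value instead (step interpolation), and the function name `value_at` is illustrative.

```python
from bisect import bisect_right


def value_at(ts_list, values, t):
    """Value at time t: exact hit if stored, else linear interpolation; None outside the data."""
    if not ts_list or t < ts_list[0] or t > ts_list[-1]:
        return None
    i = bisect_right(ts_list, t)      # index of the first sample strictly after t
    if ts_list[i - 1] == t:
        return values[i - 1]          # a sample was stored exactly at t
    t0, t1 = ts_list[i - 1], ts_list[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


timestamps = [0, 10]
levels = [8.0, 9.0]
midpoint = value_at(timestamps, levels, 5)
```
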
2. Range

"Give me all values of tank 7 level between two timestamps." Used for trending and visualization. Typically returns post-compression values (which is what was actually stored) — not every original sample.

3. Aggregate

"What was the average / max / min / standard deviation of tank 7 level for each hour over the last 30 days?" Used in reporting and SPC analysis. Many historians can serve these without re-reading every sample because they maintain pre-computed summaries alongside the raw data.

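For illustration, an hourly rollup can be sketched in Python as below. It assumes `(epoch_seconds, value)` samples and uses a plain per-sample mean; on deadband-compressed data a time-weighted average is usually more faithful, and that refinement is deliberately left out. The name `hourly_aggregates` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def hourly_aggregates(samples):
    """Group (epoch_seconds, value) samples by hour and compute avg/min/max/count."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour_start = int(ts // 3600) * 3600  # truncate to the start of the hour
        buckets[hour_start].append(value)
    return {
        hour: {"avg": mean(vals), "min": min(vals), "max": max(vals), "count": len(vals)}
        for hour, vals in sorted(buckets.items())
    }


samples = [(0, 8.0), (1800, 10.0), (3600, 12.0)]
rollup = hourly_aggregates(samples)
```
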
4. Multi-tag time-aligned

"Get me tank 7 level and tank 7 inlet flow over the last 24 hours, both sampled at 1-minute intervals, time-aligned." Used in correlation analysis. The "time-aligned" part is harder than it looks — different tags have different scan rates and deadband compression, so the historian has to interpolate.

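One way to picture time alignment is to interpolate each tag onto a shared grid. The Python sketch below assumes linear interpolation, non-empty time-sorted series with integer-second timestamps, and an ascending grid; `resample` and `align` are hypothetical names, not a historian API. Grid points outside a tag's data come back as `None` rather than being extrapolated.

```python
def resample(series, grid):
    """Linearly interpolate time-sorted (ts, value) pairs onto an ascending grid."""
    out, i = [], 0
    for t in grid:
        # Advance so series[i] is the last sample at or before t (when one exists).
        while i + 1 < len(series) and series[i + 1][0] <= t:
            i += 1
        t0, v0 = series[i]
        if t == t0:
            out.append(v0)
        elif t < t0 or i + 1 == len(series):
            out.append(None)  # before the first or after the last sample
        else:
            t1, v1 = series[i + 1]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


def align(tags, start, end, step):
    """Return rows of (ts, value_for_tag_1, value_for_tag_2, ...) on a shared grid."""
    grid = list(range(start, end + 1, step))
    columns = [resample(series, grid) for series in tags.values()]
    return [(t, *(col[k] for col in columns)) for k, t in enumerate(grid)]


tags = {
    "TANK7.LEVEL": [(0, 8.0), (120, 10.0)],                      # sparse after compression
    "TANK7.INLET_FLOW": [(0, 100.0), (60, 130.0), (120, 100.0)],
}
aligned = align(tags, start=0, end=120, step=60)
```
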
5. Event-based

"When did tank 7 level cross 12.0 meters in the last week?" Used in alarm review and event detection. Historians sometimes have a separate event/alarm log; the query is typically against that, not against tag history.

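Where no separate event log is available, an upward crossing can be recovered from tag history by scanning consecutive samples. This Python sketch assumes time-sorted `(ts, value)` pairs and reports the timestamp of the first sample at or above the threshold; the function name `upward_crossings` is illustrative.

```python
def upward_crossings(samples, threshold):
    """Timestamps where the value moves from below the threshold to at or above it."""
    crossings = []
    for (_, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 < threshold <= v1:
            crossings.append(t1)
    return crossings


history = [(0, 11.5), (60, 11.9), (120, 12.1), (180, 12.3), (240, 11.8), (300, 12.0)]
events = upward_crossings(history, threshold=12.0)
```

Because it works on stored samples, the reported time is only as precise as the spacing between them; a loose deadband can delay or hide a brief excursion.
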
Compression, retention, and tradeoffs

Three knobs every historian deployment tunes, with operational consequences:

Deadband (per-tag)

How much a value must change before it's recorded. Set too tight, and you're storing sensor noise. Set too loose, and you lose excursions that matter. The right deadband is per-tag and depends on the sensor noise floor: a pressure transmitter with 0.01 PSI of noise should have a deadband of at least 0.02 PSI, while the temperature sensor next to it might need 0.5°F.

Compression algorithm

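PI's swinging-door algorithm and Gorilla compression (the basis of InfluxDB's float encoding) are the two most-cited, and they work differently. Swinging-door is lossy: it preserves trend shape (the curve still looks right when plotted) while discarding samples that stay within a configured deviation of a straight line. Gorilla is lossless: it keeps every stored sample and shrinks it with delta-of-delta timestamps and XOR-encoded floats. Modern systems compress process data 10-50× without operational loss.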
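
The swinging-door idea can be sketched compactly in Python. This is a simplified textbook version, assuming strictly increasing timestamps and a positive deviation `dev`; it is not PI's actual implementation, and the function name is illustrative. Two "doors" pivot from points `dev` above and below the last archived sample; while every new sample fits between them, nothing is stored.

```python
def swinging_door(samples, dev):
    """Simplified swinging-door trending over time-sorted (ts, value) pairs."""
    if len(samples) <= 2:
        return list(samples)
    kept = [samples[0]]
    anchor_t, anchor_v = samples[0]
    upper = float("-inf")  # steepest slope seen from the pivot at anchor_v + dev
    lower = float("inf")   # shallowest slope seen from the pivot at anchor_v - dev
    prev = samples[0]
    for t, v in samples[1:]:
        dt = t - anchor_t
        upper = max(upper, (v - (anchor_v + dev)) / dt)
        lower = min(lower, (v - (anchor_v - dev)) / dt)
        if upper > lower:
            # Doors have opened past parallel: archive the previous sample
            # and restart the doors from it using the current sample.
            kept.append(prev)
            anchor_t, anchor_v = prev
            dt = t - anchor_t
            upper = (v - (anchor_v + dev)) / dt
            lower = (v - (anchor_v - dev)) / dt
        prev = (t, v)
    kept.append(samples[-1])  # always keep the most recent sample
    return kept


ramp = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)]
peak = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 1.0), (4, 0.0)]
ramp_kept = swinging_door(ramp, dev=0.1)
peak_kept = swinging_door(peak, dev=0.1)
```

A straight ramp collapses to its two endpoints, while the peak keeps the turning point, which is the sense in which the trend shape survives.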

Retention policy

How long to keep data, at what fidelity. A typical industrial site keeps:

Last 30 days at full sampled resolution.
Last 2 years at hourly aggregates.
Last 10+ years at daily aggregates for regulatory / engineering review.

Some sectors mandate longer retention (pipelines under PHMSA, electric under NERC CIP) as well as audit-grade, tamper-evident retention, which means write-once storage, hash-chain verification, and off-site immutable copies.
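
Hash-chain verification is simple to sketch. The Python example below assumes SHA-256 and JSON-serialized records; the record layout, seed value, and function names are illustrative choices, not a regulatory or vendor format. Each entry's hash covers the previous entry's hash, so editing any archived record invalidates that entry and breaks the chain from that point on.

```python
import hashlib
import json


def build_chain(records, seed=b"genesis"):
    """Link records so each hash covers the record and the previous hash."""
    prev = hashlib.sha256(seed).hexdigest()
    chained = []
    for record in records:
        payload = json.dumps(record, sort_keys=True).encode()
        digest = hashlib.sha256(prev.encode() + payload).hexdigest()
        chained.append({"record": record, "prev": prev, "hash": digest})
        prev = digest
    return chained


def verify_chain(chained, seed=b"genesis"):
    """Recompute every hash; return False on the first mismatch."""
    prev = hashlib.sha256(seed).hexdigest()
    for entry in chained:
        payload = json.dumps(entry["record"], sort_keys=True).encode()
        expected = hashlib.sha256(prev.encode() + payload).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


ledger = build_chain([
    {"tag_id": "TANK7.LEVEL", "ts": "2026-05-21T14:32:18Z", "value": 8.42},
    {"tag_id": "TANK7.LEVEL", "ts": "2026-05-21T14:37:02Z", "value": 8.48},
])
intact = verify_chain(ledger)
```

A chain only proves tampering if the final hash is anchored somewhere the editor cannot reach, which is where the write-once storage and off-site copies come in.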

How Aevus reads historians

Aevus does not run its own historian. The customer's historian is the source of truth for historical process data, and Aevus subscribes to it as a read client. Three patterns:

Direct historian client. Aevus's edge collector connects to PI, Ignition, Wonderware Historian, or InfluxDB as a read-only client. Subscriptions configured per-customer for the specific tags / asset hierarchy in scope.
API-mediated. If the customer has wrapped their historian in a standard data API (PI Web API, Ignition Web Dev), Aevus uses the API rather than the native client. Better isolation, audit trail.
Replicated subset. For large historians, Aevus subscribes to a filtered subset replicated to a cloud time-series store (Timestream or InfluxDB Cloud) for analytics workloads. The customer's authoritative historian is unchanged.

Aevus never writes to the historian. Read-only by IAM at the cloud account boundary — same architectural pattern as our read-only OPC UA and Modbus postures. Architecturally enforced via IL-9000.

That's industrial historians.

If your team is running PI, Ignition, Wonderware, or any time-series store at scale and wants predictive intelligence on top of it without changing a thing in your existing stack, Aevus is the conversation.
