Aevus Learn · OT Networking · 8 min read

Network Redundancy in OT. Ring topologies, PRP, HSR — and when each one earns its cost.

Industrial networks are not enterprise networks. The traffic profile is different, the failure tolerance is different, the change-management discipline is different, and the redundancy options have their own vocabulary: STP, RSTP, MRP, PRP, HSR, DLR. Half of them are over-deployed. The other half are under-deployed. This article is a practical guide to where each one fits.

Aevus / Intrepid Logic · Intermediate · For engineers, network architects, security · Updated 2026-05-21

The 30-second version

Network redundancy in OT is the practice of designing the data network so that a single cable cut, switch failure, or port outage does not take down the control system. Five protocols dominate, with very different failover times and very different operational costs:

  • STP / RSTP — IT-borrowed spanning tree. Slow. Not OT-grade.
  • MRP (Media Redundancy Protocol) — sub-200ms ring failover. The workhorse of process-industry OT networks.
  • DLR (Device Level Ring) — Rockwell's flavor of ring redundancy for the EtherNet/IP world. ~3ms failover.
  • PRP (Parallel Redundancy Protocol) — every frame is sent simultaneously over two independent networks, and the receiver discards the duplicate. Zero failover time. Premium cost.
  • HSR (High-availability Seamless Redundancy) — ring topology that uses PRP's duplicate-frame method, sending each frame both ways around the ring. Zero failover. Common in IEC 61850 substations.
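The duplicate-discard behavior that gives PRP and HSR their zero failover time can be sketched in a few lines of Python. This is a simplified illustration, not the IEC 62439-3 algorithm: the `DuplicateDiscard` class, the (source, sequence number) key, and the 64-entry window are hypothetical choices for the example. In real equipment this logic lives in the link redundancy entity (LRE) of each node or RedBox, keyed on the sequence number carried in the PRP trailer or HSR tag.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DuplicateDiscard:
    """Simplified PRP/HSR-style receiver (hypothetical, for illustration).

    Each frame arrives twice, once per path. The first copy of a given
    (source, sequence) pair is delivered; the later copy is discarded.
    """

    window: int = 64  # recent sequence numbers remembered per source (assumed)
    _seen: dict = field(default_factory=dict, init=False)

    def accept(self, source: str, seq: int) -> bool:
        seen, order = self._seen.setdefault(source, (set(), deque()))
        if seq in seen:
            return False  # second copy from the other path: discard it
        seen.add(seq)
        order.append(seq)
        if len(order) > self.window:
            seen.discard(order.popleft())  # forget the oldest sequence number
        return True


# Frames from one sender arrive over LAN A and LAN B in arbitrary order.
rx = DuplicateDiscard()
arrivals = [("A", "relay-1", 1), ("B", "relay-1", 1),
            ("B", "relay-1", 2), ("A", "relay-1", 2),
            ("B", "relay-1", 3)]  # LAN A lost frame 3: no gap at the receiver
delivered = [(lan, seq) for lan, src, seq in arrivals if rx.accept(src, seq)]
print(delivered)  # [('A', 1), ('B', 2), ('B', 3)]
```

Whichever copy arrives first is delivered, so a path that stops delivering frames causes no gap and no switchover event at the receiver.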

Why OT redundancy is different from IT redundancy

Enterprise IT redundancy is mostly about availability of services. If a switch fails and a TCP connection drops for 30 seconds, the application retries. The user gets a spinner. Life goes on.

OT redundancy is about availability of the process. If a switch fails and the PLC loses comms with an HMI or a safety-interlock partner for 30 seconds, several things can go wrong at once: the operator may not be able to acknowledge an alarm, GOOSE messages between IEC 61850 protective relays may not be delivered, and the SIS may trigger a process shutdown. The consequences scale from "annoying" to "catastrophic" depending on the protocol and the application.

That's why OT redundancy protocols are designed for failover measured in milliseconds, or no failover gap at all, not the seconds-to-minutes failover that's acceptable in enterprise IT. Even RSTP's seconds-scale reconvergence is laughable in an OT context. MRP's worst-case 200 ms is acceptable for most process control. Zero-loss failover (PRP/HSR) is required for protective relaying.
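Whether a given failover time is "acceptable" comes down to arithmetic: the outage must be shorter than the time the consumer waits before declaring the connection lost, which for cyclic I/O is roughly the update cycle multiplied by the number of missed updates it tolerates. A minimal Python sketch, assuming the worst-case figures from the cheat sheet below; the 32 ms cycle and the tolerance of three missed updates in the example are hypothetical, and each I/O protocol names and configures this watchdog differently.

```python
# Worst-case failover times (ms), taken from the upper bounds in the cheat sheet.
FAILOVER_MS = {
    "RSTP": 2000,
    "MRP (200 ms)": 200,
    "MRP (30 ms)": 30,
    "DLR": 3,
    "PRP": 0,
    "HSR": 0,
}


def watchdog_ms(cycle_ms: float, missed_updates_tolerated: int) -> float:
    """Time the consumer waits before declaring the connection lost."""
    return cycle_ms * missed_updates_tolerated


def rides_through(protocol: str, cycle_ms: float, missed_updates_tolerated: int) -> bool:
    """True if a worst-case failover completes before the watchdog expires."""
    return FAILOVER_MS[protocol] < watchdog_ms(cycle_ms, missed_updates_tolerated)


# Hypothetical I/O connection: 32 ms update cycle, 3 missed updates tolerated (96 ms).
for proto in FAILOVER_MS:
    status = "OK" if rides_through(proto, 32, 3) else "connection drops"
    print(f"{proto:13} {status}")
```

With a 96 ms watchdog, a worst-case 200 ms MRP recovery drops the connection while the 30 ms profile does not. Lengthening the watchdog is the other lever, at the cost of slower detection of real faults.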

The deeper reason: in enterprise IT, the consumer of network service is a human, and humans tolerate a spinner. In OT, the consumer is a control loop or a protective relay, and those don't have a "wait and retry" mode that's safe.

The protocol cheat sheet

Protocol | Topology | Failover time | Where it fits
STP / RSTP | Mesh | STP: ~30-50 s with default timers; RSTP: ~1-2 s | Carryover from IT. Not OT-grade. Avoid for control-network use.
MRP | Ring | ≤ 200 ms (configurable down to 30 ms or 10 ms) | The default for industrial Ethernet rings. Process plants, factory floors.
DLR | Ring | ≤ 3 ms | EtherNet/IP networks. Rockwell-centric. Fast-failover process control.
PRP | Two parallel networks | 0 ms (no loss) | Substation automation, safety-critical. Premium cost: you build two networks.
HSR | Ring (dual-path) | 0 ms (no loss) | IEC 61850 substations. Combines ring topology with PRP-style packet duplication.
PROFINET MRPD | Ring | 0 ms for IRT frames (planned duplication) | PROFINET IRT-specific. Isochronous and motion-control applications that need bumpless redundancy.

The cost dimensions

Choosing a redundancy protocol is a cost/benefit balance across four dimensions:

1. Hardware cost

STP/RSTP/MRP/DLR run on standard managed industrial switches (Siemens SCALANCE, Cisco IE-series, Hirschmann RSP, Moxa EDS) that most OT shops already buy. PRP and HSR require dual-attached end devices, or PRP/HSR redundancy boxes (RedBoxes) that sit between the network and legacy devices. PRP also doubles the network infrastructure, so at scale a PRP design can cost 2-3× its single-network equivalent.
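A rough way to see where a 2-3× figure for PRP can come from is to count hardware in relative units. The sketch below is a hypothetical model, not a pricing tool: it assumes one switch costs 1.0 unit, that LAN A and LAN B each need the full switch count, and that each legacy singly-attached device gets its own RedBox (a worst case, since one RedBox can often front several devices). All numbers in the example are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Network:
    switches: int              # switches for ONE network carrying every device
    legacy_devices: int        # singly attached devices that need a RedBox under PRP
    redbox_cost: float         # RedBox cost relative to one switch (assumed)
    dual_attached: int = 0     # devices bought with native PRP interfaces
    dan_premium: float = 0.0   # extra cost per native PRP device, relative to one switch


def single_network_cost(n: Network) -> float:
    return float(n.switches)  # one switch == 1.0 cost unit


def prp_cost(n: Network) -> float:
    # Two complete LANs, plus RedBoxes for legacy gear, plus any native-DAN premium.
    return (2 * n.switches
            + n.legacy_devices * n.redbox_cost
            + n.dual_attached * n.dan_premium)


def prp_cost_ratio(n: Network) -> float:
    return prp_cost(n) / single_network_cost(n)


# Placeholder example: 10 switches, 5 legacy devices, RedBox at 0.8 of a switch.
print(round(prp_cost_ratio(Network(switches=10, legacy_devices=5, redbox_cost=0.8)), 2))  # 2.4
```

The ratio covers hardware only; the second set of cabling, installation labor, and the operational overhead described next come on top.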

2. Operational complexity

STP/RSTP are well understood by anyone with an IT networking background. MRP and DLR are conceptually simpler but require a correct ring topology setup. PRP requires duplicate networks and dual-attached devices. HSR requires every node in the ring to be HSR-capable or to sit behind a RedBox. Each step up in protocol sophistication is a step up in the training your team needs before it can troubleshoot the network.

3. Failure-mode visibility

A subtle gotcha: PRP and HSR mask single failures so well that operators can run with one path failed for months without noticing. The system is "fully working" by every indicator the operator sees, because that's the whole point of zero-failover redundancy. Without monitoring the health of each path independently, you may be running effectively non-redundant without knowing it.
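One way to make a masked failure visible is to compare per-path receive counters between polls: if one LAN's counter keeps climbing while the other's stands still, the system is running on one leg. A minimal sketch, assuming you can read a per-path receive-frame counter from each PRP/HSR node or RedBox (how, and under what object names, varies by vendor and is not assumed here) and that the counters are 32-bit and wrap; `silent_paths` is a hypothetical helper.

```python
COUNTER_MODULO = 2 ** 32  # assume 32-bit counters that wrap to zero


def silent_paths(prev: dict, curr: dict) -> list:
    """Return the paths whose receive counter did not advance between two
    polls while at least one other path did, i.e. redundancy is gone."""
    deltas = {path: (curr[path] - prev[path]) % COUNTER_MODULO for path in curr}
    if not any(deltas.values()):
        return []  # nothing moved on any path: idle or a total outage, not masking
    return sorted(path for path, delta in deltas.items() if delta == 0)


before = {"LAN A": 4_294_967_000, "LAN B": 1_200}
after = {"LAN A": 300, "LAN B": 1_200}  # LAN A wrapped and kept counting
print(silent_paths(before, after))  # ['LAN B']
```

A device reboot resets its counters, which this sketch would misread as traffic; a production check needs to detect resets separately.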

4. Compliance-driven mandate

Some sectors are explicit. In bulk electric, NERC CIP standards such as CIP-005, along with FERC Order 850, impose requirements on how control networks are designed and procured. IEC 61850 protective relaying calls for PRP or HSR for redundant inter-relay GOOSE messaging. NRC-regulated nuclear control systems have their own specific availability requirements. If you're in one of those sectors, the protocol is partly chosen for you.

How to choose, in practice

Step 1: Classify your traffic

Walk through every network segment and answer: "what happens if this is down for 200ms? For 2 seconds? For 30 seconds?" Most segments tolerate 200ms easily. A small subset (safety-critical, protective relaying) tolerates no interruption at all.

Step 2: Map traffic to tiers

For segments where 200ms failover is acceptable, MRP or DLR is right-sized. For segments where seconds are acceptable, RSTP is technically fine (though MRP usually wins on operational simplicity). For segments where zero failover is required, PRP or HSR.
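Steps 1 and 2 can be combined into a small lookup: record each segment's tolerable outage, then pick the least complex option whose worst-case failover fits inside it. A sketch under stated assumptions: the failover figures are the upper bounds from the cheat sheet, RSTP is left out because MRP usually wins on operational simplicity, DLR only applies on EtherNet/IP networks, and the segment names and tolerances are hypothetical.

```python
# (worst-case failover in ms, option), ordered from least to most costly and complex.
TIERS = [
    (200, "MRP"),
    (30, "MRP (30 ms profile) or DLR"),
    (10, "MRP (10 ms profile) or DLR"),
    (3, "DLR"),
    (0, "PRP or HSR"),
]


def recommend(tolerable_outage_ms: float) -> str:
    """Cheapest tier whose worst-case failover is shorter than the tolerable outage."""
    for failover_ms, option in TIERS:
        if failover_ms < tolerable_outage_ms:
            return option
    return "PRP or HSR"  # no interruption tolerated at all: zero-loss redundancy


# Step 1 output (hypothetical): segment -> tolerable outage in ms.
segments = {
    "operator HMI network": 2000,
    "process I/O ring": 250,
    "motion cell": 20,
    "relay GOOSE bus": 0,
}
for name, outage_ms in segments.items():
    print(f"{name:22} -> {recommend(outage_ms)}")
```

The strict comparison pushes a segment that tolerates exactly 200 ms to a faster tier; whether you want that margin is a judgment call.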

Step 3: Don't over-deploy

The single most common mistake: applying PRP/HSR to segments that don't need it, because "redundancy is better." PRP doubles the network hardware, HSR needs HSR-capable nodes or RedBoxes throughout the ring, and both add operational complexity. The right answer for the operator dashboard network is almost always MRP, not PRP.

Step 4: Instrument both paths

Whatever protocol you choose, monitor both legs of the ring or both parallel networks independently. Don't let "the system is fine" mean "no one has checked the standby path in six months." Aevus surfaces degradation on either path as a single signal across customer infrastructure.
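For rings, the equivalent check is how long the ring has been open: after a break, traffic flows over the link the ring manager normally keeps blocked, and a second fault would split the network. A hypothetical sketch, assuming you already poll a ring state from each MRP manager or DLR supervisor that you can map to "closed" or "open" (however your switches expose it), and an illustrative 15-minute grace period before alerting.

```python
from datetime import datetime, timedelta


def open_since(samples):
    """samples: (timestamp, state) pairs sorted by time, state "closed" or "open".
    Return when the ring last went open, or None if the latest sample is closed."""
    started = None
    for ts, state in samples:
        if state == "open":
            if started is None:
                started = ts  # first sample of the current open period
        else:
            started = None  # ring healed: reset
    return started


def needs_attention(samples, now, grace=timedelta(minutes=15)):
    """True if the ring has been running without redundancy longer than grace."""
    started = open_since(samples)
    return started is not None and now - started > grace


t0 = datetime(2026, 5, 1, 8, 0)
polls = [(t0, "closed"), (t0 + timedelta(minutes=5), "open"),
         (t0 + timedelta(minutes=10), "open")]
print(needs_attention(polls, now=t0 + timedelta(hours=1)))  # True
```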

The 2026 trend: software-defined OT networking

Three trends reshaping OT network redundancy in 2025-26:

TSN (Time-Sensitive Networking, IEEE 802.1)

TSN is a family of IEEE 802.1 amendments that add deterministic, bounded latency to standard Ethernet. It promises to do what PROFINET IRT and EtherCAT did with protocol-specific mechanisms, but on TSN-capable standard switches with a vendor-neutral specification. Adoption is slow but real. Where it matures, it changes the redundancy conversation: a TSN network can carry both critical and best-effort traffic on the same infrastructure with bounded latency.

OT-aware SD-WAN

Replacing legacy MPLS site-to-site WANs with software-defined WAN appliances that can do dual-carrier failover, application-aware routing, and dynamic encryption. Cisco, Fortinet, Versa Networks all sell OT-tuned variants. Reduces redundancy complexity at the WAN edge.

OT zero trust + microsegmentation

See our Zero Trust for OT article. The relationship: redundancy is about availability under failure. Microsegmentation is about availability under attack. They overlap — segmenting your network into zones with redundant connectivity per zone is both. Modern OT network design increasingly treats them as one architecture problem.

How Aevus monitors OT networks

Aevus doesn't replace OT network management (your switch vendor's tools handle that). It augments it with fleet-wide pattern detection:

  • Switch SNMP polling — port up/down, traffic counters, retry counters, ring status. Pulled via SNMP or vendor APIs.
  • Both-path health — for PRP/HSR networks, we monitor each path independently. "Working but degraded" gets surfaced before "failed."
  • Cross-site correlation — when 12 sites on the same vendor switches start exhibiting the same drift pattern, that's a model/firmware issue, not 12 independent site issues. We catch this at the fleet level.
  • Predictive port failure — long-term trends on optical-power / retry-count metrics that signal cable, connector, or transceiver degradation weeks before hard failure.
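The predictive-port-failure idea can be illustrated with a straight-line fit over daily receive-power readings from a transceiver's diagnostics. This is a deliberately simple sketch, not how Aevus models degradation: `days_until_threshold` is a hypothetical helper, and the readings and the -8.0 dBm alarm threshold in the example are placeholders (real thresholds come from the transceiver's datasheet or its own alarm settings).

```python
def days_until_threshold(readings_dbm, threshold_dbm):
    """Fit a least-squares line to daily receive-power readings (oldest first)
    and estimate how many days until the trend reaches threshold_dbm.
    Returns None if there is too little data or power is not declining."""
    n = len(readings_dbm)
    if n < 2:
        return None
    mean_x = (n - 1) / 2  # x is the day index 0..n-1
    mean_y = sum(readings_dbm) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings_dbm))
    slope = sxy / sxx  # dB per day
    if slope >= 0:
        return None  # flat or improving: no predicted crossing
    today = mean_y + slope * (n - 1 - mean_x)  # fitted value for the latest day
    return max(0.0, (threshold_dbm - today) / slope)


readings = [-5.0, -5.1, -5.2, -5.3, -5.4]  # placeholder: losing ~0.1 dB per day
print(round(days_until_threshold(readings, threshold_dbm=-8.0)))  # 26
```

A linear fit is the simplest possible trend model; noisy links need smoothing or a longer window before the estimate is worth alerting on.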

Aevus never reconfigures your network. No port changes, no VLAN changes, no failover initiations. The IL-9000 boundary denies any write to customer infrastructure — switches, PLCs, radios, anything.

That's OT redundancy.

If your team is designing or troubleshooting OT networks across multiple sites — or wants visibility into both paths of your redundant infrastructure rather than just the active one — Aevus is the conversation.