Case Study · Modeled Scenario · Pipeline Outage

A 6-hour pipeline outage. The post-mortem found the warning had been there for 11 days.

A modeled-scenario walk-through of a midstream pipeline shutdown — what the alarms said, what they didn't, and what an operational-intelligence layer would have surfaced before the trip. Composite operator. Real failure pattern. Modeled costs.

Aevus / Intrepid Logic · Modeled scenario · For operators and executives · Published 2026-05-21
Modeled scenario. Composite operator, not an actual customer. Failure pattern and instrumentation are drawn from real midstream-industry experience via the Aevus advisory board. Costs and timeline are realistic-modeled, not measured from a deployed Aevus customer. Aevus is pre-revenue; we don't claim outcomes we haven't yet delivered. We claim the pattern is real and the gap is the one we close.

The setup

"Sterling Midstream" runs a 312-mile natural-gas gathering and transmission system across north Texas. Six compressor stations, 47 mainline valves, 124 measurement points. Their SCADA is AVEVA OASyS with a Rockwell ControlLogix substrate at the stations and Schneider SCADAPack RTUs at the wellhead-tie-in points. The operations center runs three operators per shift on a 12-hour rotation.

Compressor Station 4 (CS-4) is a critical pinch point — 28% of the system's daily throughput passes through it. The station runs three reciprocating compressors, two online and one on rotation standby. The units were installed in 2017. A scheduled major overhaul was due for the lead unit in Q3 2026.

  • Miles of pipeline: 312
  • CS-4: 28% of system throughput
  • Operators per shift: 3
  • Years since lead-unit overhaul: ~9

What the operators saw

For 11 days leading up to the outage, the lead compressor at CS-4 had been throwing individually-routine alarms. None of them were P1 (critical). None tripped the unit. All cleared on acknowledgment. Each shift saw a few of them, acknowledged, moved on.

Pulled from the historian later in post-mortem, the alarm log showed a pattern that no single shift was positioned to see:

  • Bearing temperature P3 (advisory): 17 instances over 11 days, concentrated in the last 4 days.
  • Vibration X-axis P3 (advisory): 11 instances, escalating amplitude.
  • Vibration Y-axis P3 (advisory): 6 instances, lower amplitude but newly appearing in the last 3 days.
  • Lube oil pressure P2 (high priority): 4 instances of brief dips below the warning threshold, all of which auto-cleared.
  • Discharge temperature P3: 8 instances, trending upward.

None of these alarms was wrong. Each one accurately reflected the instrument's reading at the time. The HMI surfaced them, the operator acknowledged them, the historian recorded them. The system did exactly what it was designed to do.

What the system didn't do was tell anyone: "these alarms are correlated across 11 days of accelerating frequency, and they collectively describe a bearing failure progressing toward catastrophic loss of lubrication."
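The difference between the two views can be sketched in a few lines of Python. This is a minimal illustration, not Aevus's engine or an AVEVA historian API: the `AlarmEvent` record, the tag names, and both helper functions are hypothetical, and the sample events use invented timestamps rather than the scenario's actual log.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmEvent:
    day: int       # days since the first advisory (Day 0)
    shift: str     # "day" or "night"
    asset: str     # hypothetical asset identifier
    tag: str       # hypothetical tag name, e.g. "bearing_temp"
    priority: str  # "P1" .. "P3"


def per_shift_view(events, day, shift):
    """What one operator sees: alarms raised during a single shift."""
    return Counter(e.tag for e in events if e.day == day and e.shift == shift)


def per_asset_view(events, asset, window_days, today):
    """What a correlation layer sees: all alarms on one asset in a trailing window."""
    start = today - window_days
    return Counter(
        e.tag for e in events if e.asset == asset and start <= e.day <= today
    )


# Illustrative events only; the timestamps are invented for this sketch.
events = [
    AlarmEvent(0, "day", "CS-4/lead", "bearing_temp", "P3"),
    AlarmEvent(2, "night", "CS-4/lead", "bearing_temp", "P3"),
    AlarmEvent(3, "day", "CS-4/lead", "vibration_x", "P3"),
    AlarmEvent(4, "night", "CS-4/lead", "bearing_temp", "P3"),
    AlarmEvent(5, "day", "CS-4/lead", "vibration_x", "P3"),
    AlarmEvent(7, "day", "CS-4/lead", "vibration_y", "P3"),
]

shift_view = per_shift_view(events, day=5, shift="day")
asset_view = per_asset_view(events, "CS-4/lead", window_days=11, today=7)

print(dict(shift_view))  # a single advisory: looks routine
print(dict(asset_view))  # three different tags alarming on the same asset
```

The per-shift query returns one routine-looking advisory; the per-asset query over the trailing window returns three distinct tags on the same machine, which is the view no single shift had.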

Timeline of the failure

DAY 0
First P3 advisory. Bearing temp on lead-unit thrust bearing reads 4°F above its 90-day rolling baseline. Single occurrence. Auto-cleared on acknowledgment.
DAY 2-5
Pattern building, invisible. 8 more bearing-temp alarms across 3 shifts. 4 vibration X-axis P3s. No single shift saw the full pattern. Each operator saw "another one of those bearing alarms" and acknowledged.
DAY 7
Vibration Y-axis joins. The lateral mode joining the axial mode is the classic signature of a bearing entering distressed lubrication. Two of three operators on that day's shifts noted it in the shift log. Neither escalated to maintenance.
DAY 9
Lube oil pressure dips appear. Four brief dips below the warning threshold. Each auto-cleared in under 5 seconds. Day-shift operator opens a maintenance ticket: "lube oil system, intermittent low pressure." Ticket queued for the next planned maintenance window — 6 days out.
DAY 11, 3:42 AM
P1 critical: bearing temperature above ESD threshold. Unit auto-trips on safety interlock. Operator at the OPS center watches the trip, tries to start the standby unit. Standby fails to come online in 90 seconds (it was on quarterly cooldown — restart sequence had not been simulated in 6 months). System pressure begins falling.
3:51 AM
Low-discharge-pressure cascade. Three downstream measurement points report pressure below contracted-delivery minimums. Pipeline control sends shutdown signal to nine wellhead tie-ins on the upstream side. ~30% of system flow halts.
4:18 AM
Field tech reaches CS-4. Confirms bearing failure on lead unit — outer race spalled, lubrication contaminated. Standby unit started after extended cooldown; flow restored to 65% capacity. Lead unit out of service for ~6 weeks.
9:30 AM
Full system restoration. Bypass routing re-established for flow that doesn't need CS-4. Two customer delivery contracts in breach for the 6-hour window.

The cost

  • Critical outage duration: 6.2h
  • Lost throughput value (modeled): $285K
  • Contractual breach exposure: $140K
  • Emergency bearing-set + machining + labor: $420K
  • Lead-unit out of service: ~6 wk
  • PHMSA notification (no findings): 1

Total modeled cost: ~$850K direct + 6-week capacity hit. Compare that to a planned bearing replacement during a scheduled outage: ~$45K. The cost of the failure being unplanned is ~19× the cost of catching it.
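The arithmetic behind those two figures is short enough to check directly. The sketch below uses only the modeled numbers from the breakdown above; the variable names are ours.

```python
# All figures in thousands of USD, copied from the modeled cost breakdown.
lost_throughput = 285
breach_exposure = 140
emergency_repair = 420
planned_replacement = 45

unplanned_total = lost_throughput + breach_exposure + emergency_repair
ratio = unplanned_total / planned_replacement

print(unplanned_total)   # 845, which the text rounds to ~$850K
print(round(ratio, 1))   # 18.8, which the text rounds to ~19x
```

The 6-week capacity hit is not priced in, so the ratio is a floor on the modeled gap rather than a full accounting.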

What Aevus would have surfaced

Day 5 — multi-mode alarm correlation

Aevus's alarm-correlation engine pattern-matches across measurement modes (temperature + vibration) on the same asset. By Day 5, the bearing-temp + vibration-X correlation would have triggered a Severity-3 advisory: "compressor 4-A is exhibiting the classic multi-mode signature of progressing bearing distress; recommend planned-window inspection within 14 days."

Day 7 — escalation on Y-axis appearance

The Y-axis vibration appearing is operationally meaningful. Aevus would have automatically escalated to Severity-2 with an explainable note: "lateral vibration mode now present on lead bearing in addition to axial mode; this signature historically precedes catastrophic bearing failure by 5-10 days in similar reciprocating units. Recommend immediate maintenance window."

Day 9 — Severity-1 on lube oil pressure dips

The lube oil pressure dips were auto-clearing on each occurrence but were the capstone signal of the failure pattern. Aevus would have surfaced these as Severity-1 immediately: "active failure mode in progress; trip predicted within 48-72 hours; recommend controlled shutdown before unplanned trip."
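The three escalation steps above can be read as a small rule ladder. The following is a hedged paraphrase of that narrative in Python, not the actual Aevus correlation engine: the tag names, the `severity` function, and the exact rule ordering are assumptions made for illustration.

```python
def severity(active_tags):
    """Return the severity (1 = most urgent) implied by the alarm tags seen on
    one asset, or None if the bearing-distress pattern is not present.

    Hypothetical rules paraphrasing the Day 5 / Day 7 / Day 9 steps.
    """
    tags = set(active_tags)

    # Base pattern: temperature and vibration alarming on the same asset.
    if not {"bearing_temp", "vibration_x"} <= tags:
        return None
    if "lube_oil_pressure" in tags:
        return 1  # active failure mode: recommend controlled shutdown
    if "vibration_y" in tags:
        return 2  # second vibration axis present: immediate maintenance window
    return 3      # temperature + one vibration axis: planned-window inspection


day5 = {"bearing_temp", "vibration_x"}
day7 = day5 | {"vibration_y"}
day9 = day7 | {"lube_oil_pressure"}

print(severity({"bearing_temp"}))  # None
print(severity(day5))              # 3
print(severity(day7))              # 2
print(severity(day9))              # 1
```

A single bearing-temp advisory on its own matches no rule, which mirrors how each alarm looked to the shift that acknowledged it. A production engine would also weigh frequency and amplitude trends, which this sketch deliberately leaves out.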

"The whole point of operational intelligence above SCADA is to do the pattern recognition that no individual shift can do. The alarms were never wrong. They were just never read in correlation."— Aevus design rationale

Modeled outcome with Aevus

With a Day-7 Severity-2 escalation, the maintenance team performs a controlled-shutdown inspection of the CS-4 lead unit on Day 8. Bearing distress is confirmed. Planned replacement is scheduled for Day 12, with the standby unit covering the load (it is brought up to ready-state in advance, not in an emergency). No 3:42 AM trip. No customer contracts in breach. No PHMSA notification. Total spend: ~$45K planned. Total avoided cost: ~$805K + the 6-week capacity hit.

Why the conventional SCADA missed it

1. Alarm rationalization done per-tag, not per-failure-mode

Sterling's alarm rationalization (a great practice) was set up tag by tag: each sensor's thresholds are tuned to filter false positives. That works for individual instruments. It doesn't catch the case where multiple instruments each raise only low-priority alarms on their own but collectively indicate a failure mode.
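One way to picture the difference is to score the same set of active alarms two ways. This is a sketch under stated assumptions: the `FAILURE_MODES` signature, the tag names, and both scoring functions are hypothetical, and treating all five alarm types from the scenario as one bearing-distress signature is our simplification.

```python
PRIORITY_RANK = {"P1": 1, "P2": 2, "P3": 3}

# Hypothetical signature: tags that together describe one failure mechanism.
FAILURE_MODES = {
    "bearing_distress": {
        "bearing_temp", "vibration_x", "vibration_y",
        "lube_oil_pressure", "discharge_temp",
    },
}


def worst_tag_priority(alarms):
    """Per-tag view: the single most urgent priority among active alarms."""
    return min(alarms.values(), key=PRIORITY_RANK.__getitem__)


def mode_coverage(alarms, mode):
    """Per-failure-mode view: fraction of the mode's signature tags alarming."""
    signature = FAILURE_MODES[mode]
    return len(signature & set(alarms)) / len(signature)


# Active alarm tags and priorities as listed in the scenario's alarm log.
active = {
    "bearing_temp": "P3",
    "vibration_x": "P3",
    "vibration_y": "P3",
    "lube_oil_pressure": "P2",
    "discharge_temp": "P3",
}

print(worst_tag_priority(active))                 # P2
print(mode_coverage(active, "bearing_distress"))  # 1.0
```

Scored per tag, the worst item is a P2 that auto-cleared. Scored per failure mode, every tag in the signature is alarming at once.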

2. Shift-bounded operator memory

Each operator saw only their 12-hour window. The pattern only emerged over multiple shifts and multiple days. Without an analytics layer doing the multi-shift integration, the pattern was invisible to the people best positioned to act on it.

3. The maintenance backlog buffer

Day-9's maintenance ticket was sized as "advisory, can wait 6 days." That sizing was correct given the information available to the operator. With the multi-mode correlation context, the ticket would have been "emergency, deploy now" — and the cost differential of being wrong about that judgment is ~19×.

What this case study is, and isn't

What it is: a modeled walk-through of a real and recurring failure pattern — reciprocating-compressor bearing failure with classic multi-mode signature — showing how individually-routine alarms hide a collectively-actionable signal.

What it isn't: a deployed-customer testimonial. Aevus is pre-revenue. When we have measured outcomes from real deployments, we'll publish those with the customer's name and explicit numbers — clearly labeled as measured.

If you've worked through a 3 AM trip like this.

If your operations team has ever stared at a post-mortem and realized the alarms were there all along, that's the exact gap we're built to close. The conversation takes 30 minutes; the lookback you'd run during it is worth more than that on its own.