A 6-hour pipeline outage. The post-mortem found the warning had been there for 11 days.
A modeled-scenario walk-through of a midstream pipeline shutdown — what the alarms said, what they didn't, and what an operational-intelligence layer would have surfaced before the trip. Composite operator. Real failure pattern. Modeled costs.
The setup
"Sterling Midstream" runs a 312-mile natural-gas gathering and transmission system across north Texas. Six compressor stations, 47 mainline valves, 124 measurement points. Their SCADA is AVEVA OASyS with a Rockwell ControlLogix substrate at the stations and Schneider SCADAPack RTUs at the wellhead-tie-in points. The operations center runs three operators per shift on a 12-hour rotation.
Compressor Station 4 (CS-4) is a critical pinch point — 28% of the system's daily throughput passes through it. The station runs three reciprocating compressors, two online and one on rotation standby. The units were installed in 2017. A scheduled major overhaul was due for the lead unit in Q3 2026.
What the operators saw
For 11 days leading up to the outage, the lead compressor at CS-4 had been throwing individually routine alarms. None of them was P1 (critical). None tripped the unit. All cleared on acknowledgment. Each shift saw a few of them, acknowledged them, and moved on.
Pulled from the historian later in post-mortem, the alarm log showed a pattern that no single shift was positioned to see:
- Bearing temperature P3 (advisory): 17 instances over 11 days, concentrated in the last 4 days.
- Vibration X-axis P3 (advisory): 11 instances, escalating amplitude.
- Vibration Y-axis P3 (advisory): 6 instances, lower amplitude but newly appearing in the last 3 days.
- Lube oil pressure P2 (high priority): 4 instances of brief dips below the warning threshold, all of which auto-cleared.
- Discharge temperature P3: 8 instances, trending upward.
None of these alarms was wrong. Each one accurately reflected the instrument's reading at the time. The HMI surfaced them, the operator acknowledged them, the historian recorded them. The system did exactly what it was designed to do.
What the system didn't do was tell anyone: "these alarms are correlated across 11 days of accelerating frequency and they collectively describe a bearing failure progressing toward catastrophic loss of lubrication."
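As a rough illustration of that missing view, the sketch below looks at each tag across the full 11-day window and flags the ones whose alarms are concentrated in the final four days. Only the per-tag totals (17, 11, 6, 4, 8) come from the historian summary above; the tag names, the day-by-day spread, the 0.6 threshold, and the helper functions are assumptions made for illustration, not Aevus's actual logic.

```python
# Hypothetical day of occurrence (1..11) for each alarm instance.
# Totals per tag match the case study; the daily spread is invented.
ALARM_DAYS = {
    "bearing_temp": [1, 3, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11],
    "vibration_x": [2, 4, 5, 6, 7, 8, 9, 9, 10, 10, 11],
    "vibration_y": [9, 9, 10, 10, 11, 11],
    "lube_oil_pressure": [9, 10, 10, 11],
    "discharge_temp": [3, 5, 6, 7, 8, 9, 10, 11],
}

WINDOW_DAYS = 11
RECENT_DAYS = 4


def recent_share(days, window_days=WINDOW_DAYS, recent_days=RECENT_DAYS):
    """Fraction of a tag's alarms that fall in the last `recent_days` days."""
    cutoff = window_days - recent_days
    recent = sum(1 for day in days if day > cutoff)
    return recent / len(days)


def accelerating_tags(alarm_days, threshold=0.6):
    """Tags whose alarms cluster at the end of the window.

    An evenly spread tag would score about 4/11 (0.36), so 0.6 is an
    arbitrary illustrative cut-off, not a tuned value.
    """
    return [tag for tag, days in alarm_days.items()
            if recent_share(days) >= threshold]


if __name__ == "__main__":
    for tag, days in ALARM_DAYS.items():
        print(f"{tag}: {len(days)} alarms, recent share {recent_share(days):.2f}")
    print("accelerating:", accelerating_tags(ALARM_DAYS))
```

No single 12-hour shift holds enough of this data to compute the share; it only becomes visible when the whole window is read at once.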
Timeline of the failure
The cost
Total modeled cost: ~$850K direct + 6-week capacity hit. Compare that to a planned bearing replacement during a scheduled outage: ~$45K. The cost of the failure being unplanned is ~19× the cost of catching it.
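The ratio quoted above is simple arithmetic on the two modeled figures. The short check below reproduces it; the variable names are illustrative, and the 6-week capacity hit is left out because the scenario does not put a dollar value on it.

```python
# Modeled figures from the scenario, in US dollars.
UNPLANNED_DIRECT_COST = 850_000
PLANNED_REPLACEMENT_COST = 45_000

# How many planned replacements one unplanned failure costs.
cost_ratio = UNPLANNED_DIRECT_COST / PLANNED_REPLACEMENT_COST

# Direct spend avoided by catching the failure in a planned window.
avoided_direct_cost = UNPLANNED_DIRECT_COST - PLANNED_REPLACEMENT_COST

print(f"ratio: ~{cost_ratio:.1f}x")          # ratio: ~18.9x
print(f"avoided: ${avoided_direct_cost:,}")  # avoided: $805,000
```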
What Aevus would have surfaced
Day 5 — multi-mode alarm correlation
Aevus's alarm-correlation engine pattern-matches across modes (temperature + vibration) on the same asset. By Day 5, the bearing-temp + vibration-X correlation would have triggered a Severity-3 advisory: "compressor 4-A is exhibiting the classic multi-mode signature of progressing bearing distress; recommend planned-window inspection within 14 days."
Day 7 — escalation on Y-axis appearance
The Y-axis vibration appearing is operationally meaningful. Aevus would have automatically escalated to Severity-2 with an explainable note: "lateral vibration mode now present on lead bearing in addition to axial mode; this signature historically precedes catastrophic bearing failure by 5-10 days in similar reciprocating units. Recommend immediate maintenance window."
Day 9 — Severity-1 on lube oil pressure dips
The lube oil pressure dips were auto-clearing on each occurrence but were the capstone signal of the failure pattern. Aevus would have surfaced these as Severity-1 immediately: "active failure mode in progress; trip predicted within 48-72 hours; recommend controlled shutdown before unplanned trip."
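The three escalation steps above can be read as a small set of rules over which alarm modes have appeared on one asset. The sketch below is one minimal way to express that idea, assuming a first-match-wins rule table; the tag names, the rule table, and `severity_for` are hypothetical and are not a description of how Aevus is implemented.

```python
from typing import Optional

# Rules ordered from most to least severe; the first match wins.
# Each rule lists the alarm modes that must all have been seen on the asset.
ESCALATION_RULES = [
    (1, frozenset({"bearing_temp", "vibration_x", "lube_oil_pressure"})),
    (2, frozenset({"bearing_temp", "vibration_x", "vibration_y"})),
    (3, frozenset({"bearing_temp", "vibration_x"})),
]


def severity_for(seen_modes) -> Optional[int]:
    """Return the most severe matching level, or None if no rule matches."""
    seen = frozenset(seen_modes)
    for severity, required in ESCALATION_RULES:
        if required <= seen:  # every required mode has been observed
            return severity
    return None


if __name__ == "__main__":
    day5 = {"bearing_temp", "vibration_x"}
    day7 = day5 | {"vibration_y"}
    day9 = day7 | {"lube_oil_pressure"}
    for label, modes in [("Day 5", day5), ("Day 7", day7), ("Day 9", day9)]:
        print(label, "-> Severity", severity_for(modes))
```

A single mode on its own, such as discharge temperature alone, matches no rule here, which mirrors the point that each alarm was routine in isolation.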
Modeled outcome with Aevus
With a Day-7 Severity-2 escalation, the maintenance team performs a controlled-shutdown inspection of CS-4 lead unit on Day 8. Bearing distress confirmed. Planned replacement scheduled for Day 12 using the standby unit (which is brought up to ready-state in advance, not in an emergency). No 3:42 AM trip. No customer contracts in breach. No PHMSA notification. Total spend: ~$45K planned. Total avoided cost: ~$805K + the 6-week capacity hit.
Why the conventional SCADA missed it
1. Alarm rationalization done per-tag, not per-failure-mode
Sterling's alarm rationalization (a great practice) was set up tag-by-tag: each sensor's thresholds are tuned to filter false positives. That works for individual instruments. It doesn't catch the case where multiple instruments each raise only low-priority, self-clearing alarms on their own tags but collectively indicate a failure mode.
2. Shift-bounded operator memory
Each operator saw only their 12-hour window. The pattern only emerged over multiple shifts and multiple days. Without an analytics layer doing the multi-shift integration, the pattern was invisible to the people best positioned to act on it.
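To make the shift-bounded view concrete, the sketch below buckets one tag's alarms into 12-hour shifts and compares the busiest single shift with the multi-day total. The alarm times are invented for illustration; only the shift length and the total of 17 bearing temperature advisories come from the scenario.

```python
from collections import Counter

SHIFT_HOURS = 12  # shift length stated in the case study

# Hypothetical alarm times, in hours since the start of the 11-day window.
BEARING_TEMP_ALARM_HOURS = [
    5, 60, 100, 130, 150, 170, 178, 185, 196,
    205, 214, 222, 230, 238, 245, 252, 260,
]


def per_shift_counts(alarm_hours, shift_hours=SHIFT_HOURS):
    """Count alarms per shift, keyed by shift index (0 = first shift)."""
    return Counter(int(hour // shift_hours) for hour in alarm_hours)


counts = per_shift_counts(BEARING_TEMP_ALARM_HOURS)
busiest_shift = max(counts.values())  # what any one operator saw at most
window_total = sum(counts.values())   # what the historian holds overall

print(f"busiest single shift: {busiest_shift} alarms")
print(f"whole window: {window_total} alarms")
```

With this spread, no operator sees more than two advisories in a shift, while the window as a whole holds seventeen.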
3. The maintenance backlog buffer
Day-9's maintenance ticket was sized as "advisory, can wait 6 days." That sizing was correct given the information available to the operator. With the multi-mode correlation context, the ticket would have been "emergency, deploy now" — and the cost differential of being wrong about that judgment is ~19×.
What this case study is, and isn't
What it is: a modeled walk-through of a real and recurring failure pattern (reciprocating-compressor bearing failure with a classic multi-mode signature), showing how individually routine alarms hide a collectively actionable signal.
What it isn't: a deployed-customer testimonial. Aevus is pre-revenue. When we have measured outcomes from real deployments, we'll publish those with the customer's name and explicit numbers — clearly labeled as measured.
If you've worked through a 3 AM trip like this.
If your operations team has ever stared at a post-mortem and realized the alarms were there all along, that's the exact gap we're built to close. The conversation takes 30 minutes; the lookback you'd run during it is worth more than that on its own.

