ML Anomaly Detection for Grid Telemetry: Methodology and Validation

Why Simple Thresholds Fail for Grid Anomaly Detection

The most common approach to anomaly detection in industrial SCADA environments is threshold-based alarming: configure a high and low limit for each measurement, and fire an alarm when a reading falls outside those limits. It is simple, auditable, and widely understood by operations teams.

It is also inadequate for detecting the kind of early-stage anomalies that precede grid incidents.

Consider a substation serving a large commercial district. Its peak load on a summer weekday afternoon might be 85% of capacity. On a winter Sunday night, that same substation might sit at 20% of capacity. A reading of 60% of capacity is completely normal on a weekday afternoon — but on a winter Sunday night, it represents a significant and potentially concerning deviation from expected behavior. A static threshold set to alarm at, say, 80% would miss the Sunday night anomaly entirely, while potentially generating nuisance alarms on hot summer afternoons when 85% is exactly what you would expect.

This is the core problem with threshold-based anomaly detection for grid telemetry: normal is not a fixed number. It is a distribution that varies by time of day, day of week, season, temperature, and dozens of other factors that differ by substation and by the load profile of what that substation serves.

Effective anomaly detection requires a model of what normal looks like for each substation, at each point in time — and the ability to detect deviations from that model in real time.

Our Approach: Per-Substation Statistical Baselines

HOP Sensors uses a per-substation baseline model that captures each site's normal behavior across multiple time dimensions. The core statistical approach is Z-score anomaly detection against a continuously updated rolling baseline — but the implementation has several design choices that matter significantly for operational performance.

Z-Score Anomaly Detection

A Z-score measures how many standard deviations a reading lies from the mean of its reference distribution: z = (x − μ) / σ. A Z-score of 0 means the reading equals the mean. A Z-score of ±2 means the reading is two standard deviations from the mean; under a normal distribution roughly 5% of readings fall at least that far out, so it is unusual but not alarming. A Z-score of ±4 means the reading is four standard deviations from the mean; under a normal distribution fewer than 1 in 10,000 readings fall that far out, which is rare enough to warrant attention. The HOP Sensors anomaly score is derived from the Z-score but normalized to a 0–1 scale and capped to avoid runaway values on extreme readings.
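As a minimal sketch of the scoring step, the Python below computes a Z-score and maps it to a capped 0–1 score. The document does not specify the normalization, so the linear ramp and the cap of 6 standard deviations (`Z_CAP`) are assumptions chosen for illustration, and `z_score` and `anomaly_score` are hypothetical names.

```python
Z_CAP = 6.0  # assumed saturation point in standard deviations (illustrative only)


def z_score(x: float, mean: float, std: float) -> float:
    """Number of standard deviations between x and the reference mean."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (x - mean) / std


def anomaly_score(x: float, mean: float, std: float, z_cap: float = Z_CAP) -> float:
    """Hypothetical 0-1 score: |z| scaled linearly and capped at 1.0."""
    return min(abs(z_score(x, mean, std)) / z_cap, 1.0)


# A 60% reading against a (hypothetical) 20% +/- 5% Sunday-night baseline is z = 8,
# which saturates the score; a 30% reading is z = 2 and scores about 0.33.
print(anomaly_score(60.0, 20.0, 5.0))  # 1.0
print(round(anomaly_score(30.0, 20.0, 5.0), 2))  # 0.33
```

Capping keeps a single extreme reading from dominating anything computed downstream; a smooth mapping such as `math.tanh(abs(z) / k)` would serve the same purpose.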

Time-Stratified Baseline Windows

Rather than computing a single rolling mean and standard deviation across all historical readings, we stratify the baseline by time. Each substation maintains separate baseline statistics for each hour-of-day and day-of-week combination — 168 distinct baseline windows per measurement point. A reading taken at 2 PM on a Tuesday is compared against the baseline computed from all previous 2 PM Tuesday readings at that substation, not against the global mean.
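The 168 windows can be addressed with a single hour-of-week index. This sketch assumes timestamps are already in the substation's local time, so that weekday and hour reflect the local load pattern; `window_index` and the dictionary layout are illustrative, not part of any documented interface.

```python
from datetime import datetime

HOURS_PER_DAY = 24
WINDOWS_PER_POINT = 7 * HOURS_PER_DAY  # 168 hour-of-day x day-of-week windows


def window_index(ts: datetime) -> int:
    """Map a local timestamp to its hour-of-week window: Monday 00:xx -> 0, Sunday 23:xx -> 167."""
    return ts.weekday() * HOURS_PER_DAY + ts.hour


# Hypothetical storage: one set of running statistics per (measurement point, window).
baselines = {}

# 2 PM on Tuesday 4 June 2024 lands in window 1 * 24 + 14 = 38, so it is compared
# only with earlier Tuesday 2 PM readings from the same measurement point.
print(window_index(datetime(2024, 6, 4, 14, 30)))  # 38
```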

This stratification is the most important design choice in the baseline model. It is what allows the system to distinguish between a reading that is high for this time of day at this substation (anomalous) and a reading that is high in absolute terms but normal for the current operating conditions (not anomalous).

Exponentially Weighted Updates

Baselines are not static — they update continuously as new readings arrive. We use exponentially weighted moving averages for both the mean and variance estimates, with a decay factor tuned to give approximately 30 days of meaningful weight to historical readings. This means the baseline adapts to slow, structural changes in a substation's load profile (a new large commercial customer coming online, seasonal demand shifts) while remaining stable enough to detect fast-moving anomalies without chasing them.
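A minimal sketch of an exponentially weighted mean and variance, one instance per baseline window, using the standard incremental update. The document does not say how the decay factor maps to "approximately 30 days"; parameterizing it by a half-life counted in updates (`alpha_for_half_life`) is an assumption. Because each hour-of-week window only receives readings during its own hour, that half-life would be counted in that window's updates rather than in wall-clock time.

```python
import math
from dataclasses import dataclass


@dataclass
class EwmaBaseline:
    """Exponentially weighted mean/variance for one baseline window."""
    alpha: float          # weight of each new reading, 0 < alpha <= 1
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def update(self, x: float) -> None:
        if self.count == 0:
            # Seed with the first reading; variance starts at zero.
            self.mean, self.var = x, 0.0
        else:
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            # Incremental exponentially weighted variance.
            self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        self.count += 1

    @property
    def std(self) -> float:
        return math.sqrt(self.var)


def alpha_for_half_life(updates: float) -> float:
    """Decay factor at which a reading's weight halves after `updates` further updates."""
    return 1.0 - 0.5 ** (1.0 / updates)


b = EwmaBaseline(alpha=0.5)
b.update(10.0)
b.update(20.0)
print(b.mean, b.std)  # 15.0 5.0
```

A smaller `alpha` (longer half-life) makes the baseline steadier and slower to absorb structural load changes; a larger one adapts faster but risks "chasing" a developing anomaly.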

Multi-Measurement Composite Scoring

Each substation generates readings across multiple measurement points — voltage, current, frequency, power factor, and others depending on the instrumentation available. Anomalies in real grid incidents rarely appear in isolation; a developing fault typically produces correlated deviations across multiple measurements simultaneously.

The per-substation anomaly score is a weighted composite of the individual measurement Z-scores, with weights tuned to reflect the operational significance of each measurement type. A frequency deviation combined with an unexpected current spike produces a higher composite score than either anomaly in isolation — reflecting the higher likelihood that the combination represents a genuine fault condition.
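The composite can be sketched as a weighted mean of absolute Z-scores. The values in `WEIGHTS` are hypothetical placeholders (the document says the weights are tuned but does not publish them), and the weighted-mean form is one plausible choice; it reproduces the property described above, since a second corroborating deviation raises the composite.

```python
# Hypothetical weights per measurement type; not published values.
WEIGHTS = {"voltage": 1.0, "current": 1.0, "frequency": 2.0, "power_factor": 0.5}


def composite_score(z_scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of |z| over the measurements reported by this substation.

    Unlisted measurement types get a neutral weight of 1.0 (an assumption).
    """
    total_weight = sum(weights.get(m, 1.0) for m in z_scores)
    if total_weight == 0:
        return 0.0
    return sum(weights.get(m, 1.0) * abs(z) for m, z in z_scores.items()) / total_weight


quiet = {"voltage": 0.3, "current": 0.2, "frequency": 0.1, "power_factor": 0.4}
freq_only = {**quiet, "frequency": 4.0}
freq_and_current = {**freq_only, "current": 4.0}
print(composite_score(freq_only) < composite_score(freq_and_current))  # True
```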

False Positive Management

An anomaly detection system that cries wolf is worse than useless — it trains operators to ignore alerts, which defeats the entire purpose. False positive management is not an afterthought in our approach; it is a first-class design requirement.

We manage false positives through three mechanisms:

Adaptive alert thresholds. The anomaly score threshold that triggers an operator alert is not fixed. It is set per-substation based on that site's historical false positive rate during the calibration period. A substation with inherently noisy telemetry (perhaps due to its instrumentation or load characteristics) gets a higher threshold than a substation with clean, stable readings. The goal is a consistent false positive rate across the fleet, not a consistent threshold.
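One way to turn a target false positive rate into a per-substation threshold is to take an upper quantile of the anomaly scores observed during calibration. This sketch assumes the calibration data is largely free of genuine incidents, so exceedances approximate false positives; `calibrate_threshold` and the 5% target in the usage lines are illustrative, not documented settings.

```python
import math


def calibrate_threshold(calibration_scores: list[float], target_alert_rate: float) -> float:
    """Threshold chosen so that at most target_alert_rate of the calibration
    scores lie strictly above it."""
    if not calibration_scores:
        raise ValueError("calibration_scores is empty")
    if not 0.0 < target_alert_rate < 1.0:
        raise ValueError("target_alert_rate must be in (0, 1)")
    ordered = sorted(calibration_scores)
    n = len(ordered)
    allowed_above = math.floor(target_alert_rate * n)  # exceedances we tolerate
    return ordered[n - allowed_above - 1]


# Same target rate, different noise levels -> different thresholds.
clean = [i / 200 for i in range(100)]  # hypothetical quiet substation
noisy = [i / 100 for i in range(100)]  # hypothetical noisy substation
print(calibrate_threshold(clean, 0.05), calibrate_threshold(noisy, 0.05))  # 0.47 0.94
```

Both hypothetical sites end up with the same share of calibration scores above their thresholds, which is the "consistent false positive rate, not a consistent threshold" goal.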

Alert suppression windows. Known operational events — scheduled maintenance, planned switching operations, seasonal ramp events — can be flagged in advance, and the anomaly scoring engine applies suppression during those windows. An anomaly score that would normally trigger an alert during a planned load transfer is held rather than delivered, avoiding the nuisance alarm while preserving the audit trail.
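A minimal sketch of suppression that holds, rather than drops, alerts falling inside pre-registered windows, so every threshold crossing still reaches an audit log. The `SuppressionWindow` and `AlertGate` types and the half-open `[start, end)` window semantics are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class SuppressionWindow:
    start: datetime
    end: datetime      # exclusive
    reason: str        # e.g. "planned load transfer"


@dataclass
class AlertGate:
    windows: list[SuppressionWindow] = field(default_factory=list)
    audit_log: list[tuple[datetime, float, str]] = field(default_factory=list)

    def should_deliver(self, ts: datetime, score: float, threshold: float) -> bool:
        """Return True if an alert should reach operators; log every crossing."""
        if score <= threshold:
            return False
        for w in self.windows:
            if w.start <= ts < w.end:
                # Held, not delivered, but kept for the audit trail.
                self.audit_log.append((ts, score, f"held: {w.reason}"))
                return False
        self.audit_log.append((ts, score, "delivered"))
        return True


gate = AlertGate([SuppressionWindow(datetime(2024, 6, 4, 9), datetime(2024, 6, 4, 11),
                                    "planned load transfer")])
print(gate.should_deliver(datetime(2024, 6, 4, 10), 0.9, 0.7))  # False (held)
print(gate.should_deliver(datetime(2024, 6, 4, 12), 0.9, 0.7))  # True
```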

Minimum duration filtering. Single-reading anomalies are noted in the system but do not trigger operator alerts. An alert is only delivered when the anomaly score exceeds the threshold for a configurable minimum duration: by default, two consecutive 5-second scoring cycles (10 seconds in total). This filters out transient sensor glitches and measurement noise that resolve immediately.
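The minimum-duration rule can be sketched as a run-length counter over scoring cycles. Firing once, on the cycle where the run first reaches `min_cycles`, rather than on every later cycle, is an assumption here, as is the `DurationFilter` name; recording the shorter excursions as noted anomalies is left out.

```python
class DurationFilter:
    """Deliver an alert only after `min_cycles` consecutive above-threshold cycles.

    The default of 2 cycles corresponds to 10 s at a 5 s scoring interval.
    """

    def __init__(self, min_cycles: int = 2) -> None:
        if min_cycles < 1:
            raise ValueError("min_cycles must be >= 1")
        self.min_cycles = min_cycles
        self.run_length = 0

    def step(self, score: float, threshold: float) -> bool:
        """Feed one scoring cycle; return True on the cycle the alert should fire."""
        self.run_length = self.run_length + 1 if score > threshold else 0
        # Fire once per excursion, when the run first reaches the minimum.
        return self.run_length == self.min_cycles


f = DurationFilter()
print([f.step(s, 0.5) for s in [0.9, 0.1, 0.9, 0.9, 0.9, 0.2]])
# [False, False, False, True, False, False]
```

The isolated first spike never fires; the sustained excursion fires exactly once.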

The Cold Start Problem

A per-substation baseline model requires historical data to be meaningful. A substation that was connected to HOP Sensors yesterday has no baseline — it cannot be compared against its own history because that history does not exist yet.

We handle cold start through a two-phase onboarding process. During the first 14 days of a new substation connection, the system operates in observation mode: readings are ingested and stored and baselines are built, but anomaly scoring is not yet active. During this period, the system builds the time-stratified baseline windows across all 168 hour-of-day/day-of-week combinations. Because 14 days is exactly two weeks, each of those windows has been populated on only two separate occasions by the time observation mode ends.

After 14 days, anomaly scoring activates, but with conservative (higher) alert thresholds. As the baseline accumulates more data and stabilizes over the following 30 days, thresholds are progressively tightened toward their calibrated values. The full calibration period is 44 days from first connection — after which the model is operating with a meaningful baseline and appropriately tuned thresholds.
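The onboarding schedule can be expressed as a function of days since connection. The 14-day and 30-day durations come from the text; the linear tightening curve and the `conservative`/`calibrated` inputs are assumptions, since the document says only that thresholds are progressively tightened.

```python
from typing import Optional

OBSERVATION_DAYS = 14  # observation mode: baselines built, no scoring
TIGHTENING_DAYS = 30   # thresholds move from conservative to calibrated


def onboarding_threshold(days_since_connect: float, conservative: float,
                         calibrated: float) -> Optional[float]:
    """Return None while scoring is inactive, else the alert threshold for this day."""
    if days_since_connect < OBSERVATION_DAYS:
        return None
    progress = min((days_since_connect - OBSERVATION_DAYS) / TIGHTENING_DAYS, 1.0)
    # Linear interpolation (assumed): conservative at day 14, calibrated from day 44 on.
    return conservative + (calibrated - conservative) * progress


for day in (7, 14, 29, 44, 60):
    print(day, onboarding_threshold(day, conservative=0.9, calibrated=0.6))
```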

Validation Against Historical Data

Anomaly detection models are only useful if they can be shown to detect real anomalies. We validate the HOP Sensors approach against historical SCADA datasets with known incident timestamps.

Incident Type                      Events Tested   Detection Rate   Median Lead Time / Latency   False Positive Rate
Feeder overload (developing)       47              94%              4m 20s before trip           2.1%
Transformer thermal anomaly        23              91%              11m 40s before alarm         1.8%
Frequency deviation event          61              98%              8s before threshold breach   0.9%
Sudden load drop (fault upstream)  34              88%              22s after onset              3.4%
Sensor/instrumentation fault       19              74%              varies                       8.2%
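The Detection Rate and Median Lead Time columns of a table like this can be computed from incident and alert timestamps along the following lines. The matching rule (an alert counts if it falls between `lookback` before and `grace` after the labeled incident time) and both window lengths are assumptions; the document does not state how alerts were matched to incidents.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


def evaluate(incidents: list[datetime], alerts: list[datetime],
             lookback: timedelta, grace: timedelta) -> tuple[float, Optional[timedelta]]:
    """Return (detection rate, median lead time) for one incident type.

    Lead time is the incident time minus the earliest matching alert, so a
    negative value means the alert came after the labeled incident time.
    """
    lead_times = []
    for t in incidents:
        matches = [a for a in alerts if t - lookback <= a <= t + grace]
        if matches:
            lead_times.append(t - min(matches))
    rate = len(lead_times) / len(incidents) if incidents else 0.0
    return rate, (median(lead_times) if lead_times else None)


day = timedelta(days=1)
t0 = datetime(2024, 1, 1, 12, 0)
incidents = [t0, t0 + day, t0 + 2 * day]
alerts = [t0 - timedelta(minutes=5), t0 + day - timedelta(minutes=3)]  # third incident missed
rate, lead = evaluate(incidents, alerts, lookback=timedelta(minutes=30), grace=timedelta(minutes=1))
print(f"{rate:.0%} detected, median lead {lead}")  # 67% detected, median lead 0:04:00
```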

Sensor and instrumentation faults show the highest false positive rate and lowest detection rate — an expected result, since a malfunctioning sensor produces readings that look anomalous from a statistical standpoint but do not represent real grid conditions. Distinguishing a sensor fault from a real anomaly is an active area of development; the current approach uses cross-measurement correlation to flag likely instrumentation issues separately from operational anomalies.
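As one illustration of cross-measurement checking (not the HOP Sensors implementation), a large deviation that no coupled measurement corroborates can be labeled a suspected instrumentation fault. The coupling map (two redundant current sensors on the same feeder), the two Z thresholds, and the `classify_deviations` name are all hypothetical.

```python
# Hypothetical coupling: measurements expected to move together in a real event,
# here two redundant current sensors on the same feeder.
COUPLED = {
    "current_ct1": ["current_ct2"],
    "current_ct2": ["current_ct1"],
}


def classify_deviations(z_scores: dict[str, float], z_alert: float = 4.0,
                        z_quiet: float = 1.0) -> dict[str, str]:
    """Label each large deviation 'operational' or 'suspected_sensor_fault'."""
    labels = {}
    for name, z in z_scores.items():
        if abs(z) < z_alert:
            continue
        partners = [p for p in COUPLED.get(name, []) if p in z_scores]
        # No available coupled measurement corroborates it -> likely instrumentation.
        if partners and all(abs(z_scores[p]) < z_quiet for p in partners):
            labels[name] = "suspected_sensor_fault"
        else:
            labels[name] = "operational"
    return labels


print(classify_deviations({"current_ct1": 6.0, "current_ct2": 0.2}))
# {'current_ct1': 'suspected_sensor_fault'}
print(classify_deviations({"current_ct1": 6.0, "current_ct2": 5.5}))
# {'current_ct1': 'operational', 'current_ct2': 'operational'}
```

Measurements with no coupled partner default to "operational" here, because there is nothing to cross-check them against.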

What This Means in Practice

For a utility operator, the practical implication of this approach is an anomaly detection system that behaves like a knowledgeable colleague rather than a blaring alarm. It knows what normal looks like for each substation at each time of day. It does not alert on every transient deviation. And when it does alert, the alert carries a meaningful signal — not just that something is outside a threshold, but that something is statistically inconsistent with that site's learned behavior.

That is the difference between an anomaly detection system that operators learn to trust and one they learn to ignore. We believe that trust is the most important metric for any alerting system in critical infrastructure operations.

Request a demo and we will walk through how the anomaly scoring behaves on your specific network topology and measurement types.