# Lockout/Tagout in Data Centres: Energy Isolation Without Compromising Uptime

By Zentri Team, May 1, 2026

The data centre is the only industrial environment where the entire purpose of the facility is to never stop running. Every other site has scheduled downtime, planned shutdowns, or seasonal maintenance windows. Data centres do not. The promise the operator makes to customers is that the lights stay on continuously, that workloads keep executing, and that the facility's redundancy architecture absorbs whatever maintenance activity is happening behind the scenes.

That promise creates a particular problem for lockout/tagout. The OSHA framework that governs energy isolation under [29 CFR 1910.147](https://www.osha.gov/laws-regs/regulations/standardnumber/1910/1910.147) was written for environments where equipment can be taken offline. In a data centre, that assumption breaks down on day one. Maintenance has to happen on running infrastructure, around continuous operations, without interrupting the loads that the facility exists to serve. A standard LOTO programme that does not account for that constraint will create either downtime or unsafe shortcuts.

## What makes data centre LOTO different

The energy sources that need controlling in a data centre are familiar ones. High-voltage electrical at the utility feed and main switchgear. Lower-voltage distribution through PDUs and rack-level circuits. Mechanical hazards in chillers, pumps, and CRAH units. Stored energy in UPS battery systems and generator starting circuits. Hydraulic pressure in some cooling systems. None of this is exotic. The complication is the operating context.

Concurrent maintainability is the design principle that defines most modern data centre infrastructure. Uptime Institute Tier III certification requires that any single component can be removed for planned maintenance without affecting the IT load. Tier IV adds fault tolerance, so the facility survives any single failure in addition to any planned activity. Both designs assume that redundancy paths absorb maintenance impact. That assumption only holds if the LOTO procedure preserves the redundancy throughout the work.
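The single-component rule can be expressed as a check over a power map. The following is a minimal sketch, assuming a toy topology with invented component names and treating the utility source as always present; it illustrates the idea and is not how Uptime Institute assesses a design.

```python
def reachable(feeds, source, removed):
    """Components still fed from `source` once `removed` is taken out."""
    if source == removed:
        return set()
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(c for c in feeds.get(node, []) if c != removed)
    return seen


def blocks_concurrent_maintenance(feeds, source, loads):
    """Map each component to the loads that go dark if it alone is removed."""
    components = set(feeds) | {c for kids in feeds.values() for c in kids}
    offenders = {}
    for component in sorted(components - set(loads) - {source}):
        lost = set(loads) - reachable(feeds, source, component)
        if lost:
            offenders[component] = sorted(lost)
    return offenders


# Hypothetical topology: rack-02 is single-corded to the A side only.
TOPOLOGY = {
    "utility": ["swgr-A", "swgr-B"],
    "swgr-A": ["ups-A"],
    "swgr-B": ["ups-B"],
    "ups-A": ["pdu-A1"],
    "ups-B": ["pdu-B1"],
    "pdu-A1": ["rack-01", "rack-02"],
    "pdu-B1": ["rack-01"],
}

print(blocks_concurrent_maintenance(TOPOLOGY, "utility", ["rack-01", "rack-02"]))
```

An empty result means every component can be removed on its own without dropping a load. In this example the single-corded rack makes every A-side component an offender.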

Add to this the contractor density. Most data centres run with a small permanent operations team and a much larger pool of specialised contractors who handle electrical work, generator service, UPS replacement, chiller maintenance, and the dozens of other discipline-specific activities that keep the facility running. Each contractor has to interact with the LOTO programme without inside knowledge of the facility's specific architecture.

And then there is the [NFPA 70E](https://www.nfpa.org/codes-and-standards/all-codes-and-standards/list-of-codes-and-standards/detail?code=70E) overlay. Data centre electrical work that cannot be fully de-energised falls under the energised work permit framework, with arc flash boundary calculations and specific PPE requirements that the standard 1910.147 procedure does not address. The result is a LOTO environment where the standard playbook is genuinely insufficient.

## The dependency mapping problem

Most data centre LOTO failures come from the same root cause: the technician applying the lockout did not have a clear picture of what would be affected downstream. The intent is correct. The procedure is documented. But the dependency map between the isolation point and the running load is incomplete.

Uptime Institute's 2025 Annual Outage Analysis found that nearly 40 per cent of organisations had suffered a major outage caused by human error in the previous three years. Of those incidents, 85 per cent stemmed from staff failing to follow procedures or from flaws in the procedures themselves. The proportion caused by failure to follow procedures rose by ten percentage points compared with 2024 [1].

Power remains the leading root cause of impactful data centre outages, at 45 per cent in 2025, most often originating in UPS-related issues during maintenance or replacement [1]. The financial consequences are significant: 54 per cent of operators reported that their most recent serious outage cost more than $100,000, and one in five reported a cost of more than $1 million [1]. Most operators (80 per cent) believe better management and processes would have prevented their most recent downtime [1].

The pattern matters because it points to a specific intervention. The procedural failures that drive most data centre outages are not random. They are concentrated in moments where the technician executing the work did not have visibility into what was downstream of the isolation point. A breaker tagged out for maintenance turns out to have been the active feed for a downstream PDU because the supposed B-side feed had been switched offline three weeks earlier and not noted in the dependency record. A switch isolation that was meant to affect a single rack ends up taking down a row because the redundancy group definition was out of date. A contractor who had been briefed on the procedure executed it correctly against documentation that no longer matched the live state of the facility.

These are dependency mapping problems before they are LOTO problems. The lockout itself was applied correctly. The map underneath it was wrong.

## Three principles for safe data centre LOTO

Data centre LOTO programmes that stay clear of the cascade failure mode share three operational principles. None of them are unique to any one platform or vendor, but all of them are difficult to maintain manually at scale.

The first principle is map before you lock. Every isolation event needs an explicit understanding of the equipment hierarchy from the utility feed down to the affected racks. That includes primary power paths, redundant paths, emergency power paths, and the specific dependencies between switchgear, UPS systems, PDUs, and downstream loads. Without that map in place at the point of isolation, the technician is operating on assumptions about facility state. Those assumptions are the source of the cascade failures the Uptime data captures.
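To make the idea concrete, here is a minimal sketch of a downstream impact check. It assumes a hypothetical power map stored as a plain dictionary; the component names and the `loads_dropped` helper are invented for illustration. The point is that the answer depends on what is already out of service, not only on the design topology.

```python
from collections import deque

# Hypothetical power map: each key feeds the components listed under it.
FEEDS = {
    "utility": ["swgr-A", "swgr-B"],
    "swgr-A": ["ups-A"],
    "swgr-B": ["ups-B"],
    "ups-A": ["pdu-A1"],
    "ups-B": ["pdu-B1"],
    "pdu-A1": ["rack-01", "rack-02"],
    "pdu-B1": ["rack-01", "rack-02"],
}
SOURCES = {"utility"}
LOADS = {"rack-01", "rack-02"}


def energised(feeds, sources, out_of_service):
    """Return every component still reachable from a live source."""
    seen = set()
    queue = deque(s for s in sources if s not in out_of_service)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        for child in feeds.get(node, []):
            if child not in out_of_service and child not in seen:
                queue.append(child)
    return seen


def loads_dropped(feeds, sources, loads, already_offline, isolate):
    """Loads that are live now but lose every feed after the isolation."""
    offline = set(already_offline)
    before = energised(feeds, sources, offline)
    after = energised(feeds, sources, offline | {isolate})
    return sorted((set(loads) & before) - after)


# According to the design record, isolating swgr-A affects nothing.
print(loads_dropped(FEEDS, SOURCES, LOADS, set(), "swgr-A"))       # []
# With pdu-B1 already offline, the same isolation drops both racks.
print(loads_dropped(FEEDS, SOURCES, LOADS, {"pdu-B1"}, "swgr-A"))  # ['rack-01', 'rack-02']
```

The second call is the stale-record scenario: the lockout is applied correctly, but the map underneath it no longer matches the facility.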

The second principle is verify redundancy stays active throughout the work. A/B redundancy in a data centre is not static. The B-side feed that was meant to absorb the loss of the A-side may itself have been derated by a recent contractor visit, or temporarily reconfigured for an unrelated maintenance task. The LOTO procedure has to confirm, at the moment of isolation, that the redundant path is genuinely live and capable of carrying the full load. Not last week, not according to the design document, right now. This is one of the most common gaps in mature programmes that have not been updated as the facility has evolved.
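The point-of-isolation check can be sketched as a capacity comparison. The field names, the derating factor, and the optional safety margin below are assumptions chosen for the example; a real check would read live values from the facility's monitoring system.

```python
from dataclasses import dataclass


@dataclass
class FeedStatus:
    name: str
    online: bool
    rated_kw: float   # nameplate capacity
    derate: float     # 1.0 = full capacity, 0.7 = limited to 70 per cent
    load_kw: float    # load measured at the moment of the check

    @property
    def usable_kw(self) -> float:
        """Capacity actually available right now."""
        return self.rated_kw * self.derate if self.online else 0.0


def can_absorb(to_isolate: FeedStatus, survivor: FeedStatus, margin: float = 0.0) -> bool:
    """True if the surviving feed can carry both loads, less a safety margin."""
    combined_kw = to_isolate.load_kw + survivor.load_kw
    return combined_kw <= survivor.usable_kw * (1.0 - margin)


a_side = FeedStatus("A", online=True, rated_kw=500, derate=1.0, load_kw=180)
b_design = FeedStatus("B", online=True, rated_kw=500, derate=1.0, load_kw=190)
b_today = FeedStatus("B", online=True, rated_kw=500, derate=0.7, load_kw=190)

print(can_absorb(a_side, b_design))  # design state: 370 kW fits within 500 kW
print(can_absorb(a_side, b_today))   # derated to 350 kW: the isolation should not proceed
```

The two B-side records differ only in the derating factor, which is the kind of change that a design document does not capture.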

The third principle is document downstream impact for every isolation. Even when the work is routine, the record of what was affected, what redundancy state was verified, and what the post-work test confirmed becomes part of the operational history of the facility. When something does eventually go wrong, that record is the difference between a fast root cause analysis and a multi-day investigation. NFPA 70E energised work permits require similar documentation for energised electrical work, but the same discipline applies to all isolation events in a critical environment.
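One way to picture such a record is as a small structured object that is written at the time of isolation and updated after the post-work test. The field names here are illustrative assumptions, not a schema required by OSHA or NFPA 70E.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class IsolationRecord:
    isolation_point: str
    performed_by: str
    downstream_affected: list[str]
    redundancy_verified: dict[str, bool]  # redundant path -> confirmed live
    post_work_test: str = "pending"
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialise the record for the facility's operational history."""
        return json.dumps(asdict(self), sort_keys=True)


record = IsolationRecord(
    isolation_point="swgr-A",
    performed_by="contractor-17",
    downstream_affected=["ups-A", "pdu-A1"],
    redundancy_verified={"swgr-B": True, "ups-B": True},
)
record.post_work_test = "passed"
print(record.to_json())
```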

Implementing these three principles manually means maintaining current dependency diagrams, redundancy state records, and impact assessments across every asset in the facility. The work is straightforward in concept and brutal in practice. Most data centre operators acknowledge that their dependency documentation drifts behind the live state of the facility, and that the manual reconciliation between design records and current operations is one of their most consistent operational gaps.

## NFPA 70E and arc flash considerations

For data centre work that cannot be fully de-energised, NFPA 70E imposes additional requirements that sit alongside the OSHA 1910.147 framework. In recent editions of the standard, an energised electrical work permit is required when work is performed within the restricted approach boundary of exposed conductors or circuit parts operating at 50 volts or more, or when interacting with the equipment creates an increased likelihood of arc flash injury even though no conductors are exposed. Arc flash boundary calculations, incident energy assessments, and PPE category selection follow from the equipment ratings and the specific work being performed.

This matters in data centres because some tasks genuinely cannot be performed under full isolation. Live IR thermography, voltage testing during fault investigation, and certain switchgear maintenance activities require a hybrid approach where the worker is protected by NFPA 70E controls while the broader facility continues to operate under its normal redundancy state.

A data centre LOTO programme that addresses 1910.147 in isolation from NFPA 70E is incomplete. The two standards complement each other in critical environments, and the documentation, training, and authorisation framework needs to bridge both. An authorised employee under 1910.147 is not automatically a qualified person under NFPA 70E. The training and qualification records have to satisfy both standards independently, particularly for contractors performing specialised electrical work.
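The independence of the two qualifications can be shown as a simple authorisation gate. This sketch assumes each qualification is tracked as an expiry date, which is a simplification for illustration; the standards define their own training and retraining requirements, and the record fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkerRecord:
    name: str
    loto_authorised_until: Optional[date]  # 1910.147 authorised employee
    qualified_70e_until: Optional[date]    # NFPA 70E qualified person


def _current(expiry: Optional[date], on: date) -> bool:
    """A qualification counts only if it exists and has not lapsed."""
    return expiry is not None and expiry >= on


def may_perform(worker: WorkerRecord, on: date, energised_work: bool) -> bool:
    """Check each qualification separately; one never implies the other."""
    if not _current(worker.loto_authorised_until, on):
        return False
    if energised_work and not _current(worker.qualified_70e_until, on):
        return False
    return True


today = date(2026, 5, 1)
contractor = WorkerRecord("contractor-17", date(2026, 12, 31), None)

print(may_perform(contractor, today, energised_work=False))  # LOTO work only
print(may_perform(contractor, today, energised_work=True))   # lacks the 70E record
```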

## Where dependency mapping software fits in

The dependency mapping problem is where digital tooling makes the most practical difference. Maintaining a current, accurate map of facility power dependencies is one of those tasks that is easy on paper and hard in practice. The map drifts. The contractors who replace UPS modules update the local diagram but not the master record. The redundancy group definitions made during commissioning fall out of step with operational reality. By the time someone needs the map for a real isolation, the gap between the documentation and the live facility has widened to the point where the procedure cannot be safely executed against it.

Zentri's [ImpactView](https://www.zentri.cc/features/impactview) was built specifically to address this in the data centre context. The platform represents the equipment hierarchy as a tree structure from utility feed down to individual racks, with explicit support for primary, backup, and emergency power paths. A/B redundancy groups are mapped as first-class objects, with real-time status verification at the point of isolation. The interactive flow diagram is generated from the defined relationships, so the visual representation always matches the underlying data rather than being a separately maintained PDF.

The downstream impact preview is the operational core of the feature. Before any isolation is initiated, the system shows exactly which downstream equipment will be affected and which redundant paths will absorb the load. Cycle detection prevents circular dependency definitions that would otherwise create silent gaps in the map. None of this replaces the operational discipline of running a sound LOTO programme, but it removes the manual reconciliation work that is the most common point of failure.
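For readers unfamiliar with cycle detection, the general technique is a depth-first search that flags any feed relationship leading back to a component already on the current path. The sketch below is a generic illustration on a hypothetical dictionary-based map. It is not a description of how ImpactView implements the check.

```python
def find_cycle(feeds):
    """Return one circular feed path as a list, or None if the map is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for child in feeds.get(node, []):
            state = colour.get(child, WHITE)
            if state == GREY:
                # The child is already on the current path: a cycle.
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in feeds:
        if colour.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


valid_map = {"utility": ["swgr-A"], "swgr-A": ["ups-A"], "ups-A": ["pdu-A1"]}
broken_map = {"swgr-A": ["ups-A"], "ups-A": ["pdu-A1"], "pdu-A1": ["swgr-A"]}

print(find_cycle(valid_map))   # None
print(find_cycle(broken_map))  # ['swgr-A', 'ups-A', 'pdu-A1', 'swgr-A']
```

Rejecting a definition when `find_cycle` returns a path keeps the map a strict hierarchy, so an impact preview cannot loop back on itself or hide a component behind a circular reference.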

## The bigger picture

Data centre LOTO is not a different standard. It is the same standard executed in an environment where the cost of getting it wrong shows up in customer-visible outages rather than safety incidents alone. The programmes that handle it well treat dependency mapping as a primary operational practice rather than a documentation exercise. The programmes that struggle treat the LOTO standard as if it were enough on its own.

If your data centre LOTO programme does not currently include live dependency mapping at the point of isolation, that is the first gap worth closing. See how Zentri's [data centre solution](https://www.zentri.cc/solutions/data-centres) handles it, or [book a demo](https://www.zentri.cc/demo) to walk through it with our team.

## References

[1] Uptime Institute, Annual Outage Analysis 2025. Industry research on data centre outage causes, frequency, severity, and human factors. <https://intelligence.uptimeinstitute.com/resource/annual-outage-analysis-2025>