Factors that caused UK’s IT disaster are widespread

30 Nov 2004

The factors and practices that caused a failed IT upgrade to paralyse more than 60,000 PCs at the UK’s Department for Work and Pensions (DWP) – causing some 80,000 civil servants to resort to writing out giro cheques by hand for more than 800,000 pensioners – are more widespread in Irish and UK organisations than most people are willing to believe, a leading software maker has warned.

Some 80pc of desktop PCs at the DWP were paralysed last week when a failed IT upgrade took them offline – one of the largest disasters in the UK Government’s IT history. It is understood that an upgrade from Windows 2000 to XP across a small number of PCs – rumoured to be only seven – was taking place but went horribly wrong, affecting email, benefits processing and internet and intranet connectivity across 60,000 PCs. Outsourcing giant EDS is understood to have been providing a patch for the seven PCs, but a request went out to make it live across 80pc of computers, causing the major outage.

Paul Arthur, field director for the northern European region at BMC, said that the IT disaster was caused by a lack of understanding of the business impact of IT change. It could have been prevented, he said, by the introduction of change management methodology and configuration software that takes a snapshot of IT systems prior to failure and returns them to their original state. He argued that with 836,700 people claiming benefits in the UK, it is critical to understand that any change in the IT infrastructure could have an impact on the business.

“There are two major issues at stake here,” Arthur said. “Firstly, it [DWP] didn’t appear to understand the business impact of the IT change. If they had, they would never have risked it. Secondly, management of the IT change itself was a poor factor. It spread to a wider environment than it should have. When it went wrong it didn’t have a mechanism to fall back on, such as an ‘image’ of the original configuration. If you’re going to do anything like this, you have to conduct an analysis of the impact of the change on the business and be damn sure you have a recovery and back-up plan.”

Arthur cited the disaster as a classic example of IT not aligning itself with the business. An upgrade such as this, he said, would not have caused such havoc had established processes been in place.

He said he believed that the practices and processes that led to the 60,000-PC outage are commonplace across most business and government organisations in Ireland and the UK. “It’s a common practice. People are making changes to IT environments all the time. But are they aware of the business impact of the change failing?

“The way to get around this is to be fully aware of the impact of a successful change and an unsuccessful change. Make strategic decisions from a business perspective rather than simply thinking of it as a routine IT change.

“I also think it was a configuration management issue,” Arthur continued. “It thought it was just deploying software. Instead they affected the entire business. Being a Government monopoly, the DWP is not going to lose any competitive edge here, but it is going to lose time and money for the amount of consultants and outsourcing that will have to be done. As a result, other cost saving changes that were planned for the DWP will possibly have to be put on hold while they fix this problem. Those changes could have resulted in better services to end user customers.”

Arthur concluded: “The DWP is going to have to start again on what its business continuity plan is in the event of something like this happening again. Organisations need to take a bigger view on why they invest in IT. It’s not just about enabling people to send emails to one another but about providing better services to end customers.”

By John Kennedy