Dealing with the data explosion


30 Mar 2005

Businesses are drowning in a sea of digital data. As more business becomes electronic, more data is being collected. In turn, that data needs to be stored in a manner compliant with data protection and other legislation, backed up in case of disaster, and the information of value to the business made available in real time. Quite a challenge for IT departments to deal with.

To give some sense of the issue on a global scale, How Much Information, a University of California, Berkeley project, estimated that five exabytes of new information were created in 2002, which is equivalent to about 800MB for every man, woman and child on the planet. Interestingly, 92pc of new information is stored on magnetic media, primarily hard disks, while film represents 7pc of the total, paper 0.01pc, and optical media such as DVDs and CDs just 0.002pc. The study also found that new stored information is growing at a rate of about 30pc a year.
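As a quick back-of-the-envelope check of that per-capita figure, the snippet below works through the arithmetic. It assumes decimal units (one exabyte as 10^18 bytes) and a 2002 world population of roughly 6.3 billion, neither of which is specified in the study as cited here.

```python
# Rough check of the How Much Information per-capita figure.
# Assumptions (not stated in the article): 1 exabyte = 10**18 bytes
# and a 2002 world population of about 6.3 billion people.
new_information_bytes = 5 * 10**18      # five exabytes of new information
world_population = 6.3 * 10**9
per_person_mb = new_information_bytes / world_population / 10**6
print(f"{per_person_mb:.0f}MB per person")   # prints "794MB per person"
```

That works out at roughly 794MB a head, consistent with the "about 800MB" headline figure.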

So how are businesses dealing with this massive explosion in data that needs to be stored and managed? With the prices of hard disks and other storage hardware tumbling, it would seem relatively straightforward to simply throw more hardware at the problem. After all, storage area networks (SANs), dedicated pools of storage networked using high-speed fibre channel technology, are now at a price level that even relatively small companies can afford. Network attached storage, which also provides a central pool of storage for your servers, is available for hundreds of euros per device.

Unfortunately there is a law of diminishing returns at work and simply adding more hardware does not solve the problem. “In the past the view was that you simply went to a disk vendor and purchased more capacity,” says Mark O’Neill, a storage specialist with Computer Associates (CA). “That was an easy option, but you still have to manage it, back up and restore the data — that’s where the significant overhead is.”

In addition, while acquiring a new high-capacity SAN is relatively cheap, managing and optimising its use is not a trivial task, and the SAN also needs a supporting infrastructure around it.

“The big difficulty for small to medium-sized businesses (SMBs) putting in fibre is do they have the backbone to support it, including fibre switches and cables,” explains Neil Mullaney, joint managing director of Unitech Systems. “You also need replication of the elements so you don’t lose access to your storage. The investment in the network can easily be a bigger investment than the storage.”

Alternative technologies such as internet small computer system interface (iSCSI) and fibre channel over internet protocol (FCIP), which run over IP networks rather than fibre channel, are currently being touted as an answer to this problem, although the jury is still out on the impact they are likely to have. Local implementations of these technologies are also rare.

“Fibre channel SANs are still the core of the enterprise — that hasn’t been impacted by the new technologies,” says Basil Bailey, CEO of Xpert Technology. “The argument that one will replace the other is irrelevant. Fibre channel is still highly relevant where synchronous storage is required while the new technology is more relevant in getting information out to the periphery of an organisation.”

Rather than looking at alternatives, most companies are adopting information lifecycle management (ILM), a strategy built around moving data through tiers of storage as its value to the business declines. Typically tier 1 would be a fibre channel SAN while archive material eventually makes its way down to tape.

“ILM has been around for years,” says Howard Roberts, principal consultant of Sabeo Technologies. “I remember StorageTek talking about it in 1974 when it had the concept of the right information in the right place at the right cost. So it’s been around on the mainframe forever — it’s just now it’s starting to be talked about in open systems.”

The key to ILM is categorising data when it is created; that categorisation then determines what disk or media type the data is stored on for as long as it remains valuable to the company. The big challenge is automating movement of the data according to business rules without creating a huge administrative effort. This is where a close marriage between technology and policies is required to ensure seamless transfer of the data. As a result, in an effort to automate these tasks, software has become an increasingly central part of the storage story.
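To make the idea concrete, the sketch below shows one way such business rules might be expressed in software. The tier names, categories and age thresholds are illustrative assumptions for this sketch, not any vendor's actual product or API.

```python
# A minimal sketch of automating ILM-style tiering rules.
# All names and thresholds below are invented for illustration.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Record:
    name: str
    category: str          # assigned when the data is created
    last_accessed: date

# Business rules: (category, maximum age, storage tier), checked in order.
TIER_RULES = [
    ("financial", timedelta(days=90),  "tier1-fc-san"),    # fibre channel SAN
    ("financial", timedelta(days=365), "tier2-sata-disk"), # cheaper disk
    ("general",   timedelta(days=30),  "tier1-fc-san"),
    ("general",   timedelta(days=180), "tier2-sata-disk"),
]
ARCHIVE_TIER = "tier3-tape"   # anything older migrates down to tape

def place(record: Record, today: date) -> str:
    """Return the storage tier a record belongs on under the rules above."""
    age = today - record.last_accessed
    for category, max_age, tier in TIER_RULES:
        if record.category == category and age <= max_age:
            return tier
    return ARCHIVE_TIER

# A ledger untouched for about five months drops to cheaper SATA disk.
print(place(Record("q3-ledger.xls", "financial", date(2004, 11, 1)),
            date(2005, 3, 30)))   # -> tier2-sata-disk
```

The point of encoding the rules as data rather than scattering them through scripts is precisely the one made above: the policy can be reviewed and changed by the business without a huge administrative effort each time.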

Regulations such as the Sarbanes-Oxley Act of 2002, Basel II, the requirements of US Food and Drug Administration legislation and our own Data Protection Acts are extending the period for which organisations are required to store data. Even after the information has lost any value to the company, an organisation could be required to restore it to meet the requirements of the legislation.

Compliance with such regulations is having an impact not only on data storage policies but on the entire way organisations are doing business. “People seem to think compliance with legislation such as Sarbanes-Oxley or from Securities and Exchange Commission rules is just another form to fill in, but it fundamentally changes the way a company functions,” says Tony Quinn, country sales manager, EMC Ireland.

Paul Marnane, business unit manager for enterprise servers and storage at Hewlett-Packard, believes that in order to help organisations tackle their compliance issues, vendors will have to change the way they sell the technology. “You can’t just say this is the latest storage solution,” says Marnane. “The IT industry itself has to come to terms with the legal requirements of different industries and tick all the boxes for them.”

One of the major contributors to data growth is undoubtedly email. While it has definitely become a vital business tool, it is also a repository for users’ personal messages and attachments such as space-hogging audio and video clips, pictures and Office documents — most of which have no value to the organisation.

“Many companies are now talking about understanding the business information being transacted in email,” says Quinn. “The amount of key correspondence held in email versus fax or letters is huge. You may need to come back to that in a year or two, but you don’t want to store it on your production server as that would be prohibitively expensive. You need standards in place to trap it and then have it available if it’s needed.”

Vendors such as EMC and Hewlett-Packard have responded to this challenge with email archive products that allow you to retain and manage email as a record of business transacted.

“Email archive solutions are mostly implemented for compliance reasons,” says Simon O’Gorman, consultant systems engineer with Horizon. “There is a dictum in the industry that if a piece of data hasn’t been read in 90 days it won’t be read. From the day-to-day business point of view data more than 90 days old has limited impact but from a legislation point of view you need to be able to trace your older transactions.”

As data grows exponentially it raises a second challenge: how to back up that data cost effectively and efficiently, and whether the business will be able to restore what it needs in the time frame required by the business or industry regulators.

The technology that has had the biggest impact on backup strategy in the past few years is the Serial ATA hard disk. Serial ATA refers to the computer bus used to transfer data to and from the disks; compared with the Parallel ATA interface it replaces, it provides significantly more bandwidth and scales more easily. As a result, high-performance disk-based storage is now available at commodity prices. Tape-based backup is consequently declining in popularity as more reliable disk-based backup, which used to be prohibitively expensive, becomes the standard. Storage vendors such as EMC are even releasing systems that look like tape to a user’s backup software but are actually disk based, so the changeover can be totally seamless.
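As a toy illustration of that "looks like tape, is actually disk" idea, the class below offers a sequential, tape-like interface while writing to an ordinary disk file. The interface is invented for this sketch; real virtual tape products emulate actual tape drives at the hardware and protocol level, which is what lets existing backup software run unchanged.

```python
# Toy illustration only: a disk file dressed up with a sequential,
# tape-like interface. This API is invented for the sketch.
class VirtualTape:
    def __init__(self, path: str):
        self._path = path

    def append(self, data: bytes) -> None:
        # Writes are strictly sequential, as they would be on tape.
        with open(self._path, "ab") as f:
            f.write(data)

    def rewind_and_read(self) -> bytes:
        # "Rewinding" simply means reading the file from the start.
        with open(self._path, "rb") as f:
            return f.read()

cartridge = VirtualTape("/tmp/vtl-cartridge-001.bin")
cartridge.append(b"nightly backup payload\n")
print(cartridge.rewind_and_read())
```

The backup application sees only append and rewind operations, while underneath the data sits on random-access disk, which is why restores from such systems are so much faster than from physical tape.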

“There will always be tape and some form of removable media such as DVD,” says Roberts. “But you need to try to get the mix between them right — disk is very reliable but it’s not the fastest.”

One of the advantages of disk-based backup is that there are no tapes that need to be securely transported and stored offsite. The disks you back up to can easily be in a secure data centre managed by a service provider rather than in your own computer room. With tapes you need to physically transfer them from your own premises to a secure facility of your own or a third party’s. The inherent risks of that approach were highlighted last February when Bank of America lost a number of backup tapes as they were being shipped to a secure centre. More embarrassingly for the bank, the tapes in question contained confidential account data on US federal employees, including senators, which could potentially be used to access their accounts or make purchases with their credit cards.

By John Collins