Doing More With Less: Approaches to De-duplication

07/05/09 | in: De-Duplication

Symantec note that “As companies integrate disk and discover its benefits as a more active component of their backup environments, they soon realize that they cannot keep all of their backup data on disk. Despite declines in the cost of disk storage, they still lack the available capacity to recover most data from disk locally in the data centre. Data de-duplication is a disk-based technology that enables companies to eliminate duplicate backup data and significantly decrease the storage, and in some cases bandwidth, consumption.”
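To make the idea of eliminating duplicate backup data concrete, here is a minimal sketch of hash-based de-duplication. It assumes fixed-size 4 KB chunks fingerprinted with SHA-256. Commercial products use their own, often variable-length, segmentation schemes, and nothing here describes how Data Domain or PureDisk work internally. The function names are hypothetical. Each unique chunk is stored once, a "recipe" of fingerprints rebuilds the original stream, and the de-duplication ratio is the logical data backed up divided by the physical data stored.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, recipe): store maps fingerprint -> chunk bytes, and
    recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        # Only the first occurrence of a chunk consumes storage.
        store.setdefault(fingerprint, chunk)
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original byte stream from the chunk store and recipe."""
    return b"".join(store[fp] for fp in recipe)


def dedup_ratio(data: bytes, store) -> float:
    """Logical bytes backed up divided by physical bytes actually stored."""
    stored = sum(len(chunk) for chunk in store.values())
    return len(data) / stored if stored else 0.0


if __name__ == "__main__":
    # Hypothetical workload: the same 8 KB of data backed up on five nights.
    nightly = b"A" * 4096 + b"B" * 4096
    backup_stream = nightly * 5
    store, recipe = deduplicate(backup_stream)
    assert restore(store, recipe) == backup_stream
    print(f"{dedup_ratio(backup_stream, store):.1f}x")  # prints "5.0x"
```

Five identical nightly backups occupy the space of one, giving a 5:1 ratio. Real backup sets change a little each night, so actual ratios depend heavily on the data.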

We agree with these observations, and we would also note that a ‘one size fits all’ approach to de-duplication does not deliver the most effective solution. S3 focuses on two de-duplication technologies: Data Domain’s appliance-based technology and Symantec’s software approach, PureDisk.

The first point to make is that we do not consider these to be fundamentally competitive. At the most obvious level, PureDisk is targeted at NetBackup users; although it is designed to work with Symantec software, it does not directly de-duplicate data from Enterprise Vault email archives or Backup Exec. Data Domain appliances, by contrast, de-duplicate data from any source. Nor is PureDisk, as some argue, the obvious solution for SMB environments: it is based on a Linux architecture, and most SMBs we meet run Windows-based infrastructures. That said, let’s not forget that NetBackup is the dominant corporate backup technology, so PureDisk potentially has an important role to play.

So where do these two technology approaches fit?

Where the backup application is not NetBackup (for example BakBone or CommVault), Data Domain would be S3’s default recommendation as the clear technology and market leader in appliance-based de-duplication.

In many ways we see PureDisk as a competitor to backup applications such as EVault rather than to Data Domain. This is because, in distributed environments, it allows NetBackup, the market-leading backup technology, to make the best use of available network bandwidth by minimizing the amount of data transmitted between sites and to the data centre.
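The bandwidth saving comes from de-duplicating at the source: the remote site fingerprints its data and sends only chunks the central store does not already hold. The sketch below illustrates that general idea under simple assumptions (fixed 4 KB chunks, SHA-256 fingerprints, an in-memory dict standing in for the data-centre store). It is not PureDisk’s actual protocol, the function names are hypothetical, and the cost of exchanging fingerprints over the WAN is ignored.

```python
import hashlib
from typing import Dict


def source_side_backup(data: bytes, target_store: Dict[str, bytes],
                       chunk_size: int = 4096) -> int:
    """Send only chunks the central store lacks; return payload bytes sent.

    Assumption: target_store stands in for the data-centre chunk index, and
    the fingerprint lookups themselves are treated as free.
    """
    sent = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in target_store:
            target_store[fingerprint] = chunk  # "transmit" the new chunk
            sent += len(chunk)
    return sent


if __name__ == "__main__":
    data_centre: Dict[str, bytes] = {}
    # Hypothetical remote-office data: three distinct 4 KB chunks.
    monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    # On Tuesday only the last chunk has changed.
    tuesday = monday[:8192] + b"D" * 4096
    print(source_side_backup(monday, data_centre))   # prints 12288
    print(source_side_backup(tuesday, data_centre))  # prints 4096
```

The first backup sends everything, but Tuesday’s backup sends only the one changed chunk, which is why source-side de-duplication suits remote sites on constrained links.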

In highly distributed corporate environments, a mix of Data Domain and PureDisk will often be the most efficient answer. The Data Domain appliance will satisfy the throughput requirements of the data centre, where it can be tightly integrated with NetBackup, and will often achieve a higher de-duplication factor. PureDisk will satisfy the requirements of remote NetBackup users and make efficient use of network bandwidth.

Incidentally, Data Domain is the first vendor to release a product based on Symantec’s NetBackup OpenStorage (OST) API, which is designed to facilitate backup to disk without emulating tape. Working together, these approaches offer the opportunity to leapfrog virtual tape libraries (VTL). In the words of Matt Kixmoeller, Symantec VP of product marketing: “Stop pretending these disk devices are tape and start treating them as disk.”

Whatever approach users take, de-duplication is certainly one of the key technologies in helping them to do ‘more with less’.