Disk drive capacities keep increasing while the cost per GB keeps falling. With SSD prices dropping and the huge performance gains they bring, organisations are, now more than ever, looking at different ways to spend their I.T. budget. Backup is an unproductive necessity: it consumes resources with no apparent gain. It should almost be a set-and-forget task, but the problem is the amount of data being backed up, the time it takes, and the variety of physical and virtual servers and applications involved. The need for reliable backup only becomes blatantly apparent when data is lost!
Is de-duplication the answer? The short answer is yes, but it comes at a cost that isn't always contemplated when choosing a backup solution. Today de-duplication technology is almost everywhere, from the smallest NAS systems through to enterprise-class storage and backup solutions.
De-duplication condenses repetitive data into blocks and creates a hash for each one. The technology compresses data at ratios from 10:1 up to 100:1 and is primarily used in de-duplication backup appliances or backup software; a 50TB backup could shrink to 1TB or less, depending on the type and compressibility of the data. The issue with de-duplication is that if anything gets corrupted you may have lost 100% of your backup data, whereas if you had 50 LTO Ultrium tapes and threw one away you would still have 98% of your backup data. In reality many companies purchase two de-duplication appliances or replicate the data between sites.
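The mechanism can be sketched in a few lines: split the data into fixed-size blocks, hash each block, and store each unique block only once. This is a minimal illustration, not any vendor's implementation; the 4KB block size and SHA-256 are assumptions for the example.

```python
import hashlib

def dedupe(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and store each unique block once."""
    store = {}   # hash -> block: only unique blocks are actually kept
    recipe = []  # ordered list of hashes needed to rebuild the original data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep the first copy only
        recipe.append(digest)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Rebuild the original data from the block store and the recipe."""
    return b"".join(store[h] for h in recipe)
```

The sketch also shows the fragility mentioned above: every repeated block exists exactly once in `store`, so corrupting that single stored block breaks every backup whose recipe references it.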
Why not replicate data between sites? This is a simple idea, but what if someone deleted a file a month ago and now needs it restored? The replication software should be capable of file versioning, which lets you retain "x" number of copies before the oldest is overwritten. Your replication software should also support single-instance copy, so that if you send a file to 100 people only one copy of the file is transferred. Replication software is fine for restoring files, but for applications and databases it becomes unsuitable due to the complexity of the restoration process.
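The "retain x copies" behaviour described above can be sketched as a simple rotation: before each new copy lands, existing versions shuffle down one slot and the oldest falls off the end. The function name, the `.1`/`.2` suffix scheme and the retention count of 3 are all illustrative assumptions.

```python
import shutil
from pathlib import Path

def _ver(dst: Path, n: int) -> Path:
    """Version 0 is the live copy; older versions get numeric suffixes."""
    return dst if n == 0 else Path(f"{dst}.{n}")

def replicate_with_versions(src: Path, dst: Path, keep: int = 3) -> None:
    """Copy src to dst, rotating dst -> dst.1 -> dst.2 so `keep` versions survive."""
    for n in range(keep - 1, 0, -1):      # shuffle existing versions down a slot
        newer = _ver(dst, n - 1)
        if newer.exists():
            shutil.copy2(newer, _ver(dst, n))
    shutil.copy2(src, dst)                # newest copy takes the live slot
```

After two replications of a changing file, `dst` holds the latest contents and `dst.1` holds the previous version, so last month's deleted file is still recoverable while it remains within the retention window.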
Backup to local disk using a SAN/NAS/DAS etc. This is faster than the de-duplication method, as it doesn't need to compress the data and create hashes, and it costs less. The issue with a disk-only backup is the amount of disk needed to store the backups, and because it is typically local it is not good for disaster recovery. Below is an example of a typical backup schedule.
1. Weekly full backup of 20TB retained for 1 month (we'll exclude the months that have 5 weekends to make it easier)
2. Monthly backup kept for 1 year
3. Daily incremental backup 5% change
4. Data volume growth 20% per annum
Based on the above we would need roughly a 372TB disk-based backup solution, which consumes quite a bit of power and needs monitoring in case of disk or RAID failure.
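The schedule above can be turned into a rough capacity estimate. The breakdown below is an assumption for illustration; the exact total depends on how retention periods overlap and when the 20% growth lands, so it brackets rather than reproduces the figure quoted above.

```python
TB = 1.0
full = 20 * TB                            # size of one full backup

weekly_fulls  = 4 * full                  # weekly fulls retained for 1 month
monthly_fulls = 12 * full                 # monthly fulls retained for 1 year
incrementals  = 4 * 6 * (0.05 * full)     # ~6 daily 5% incrementals/week, kept 1 month

base = weekly_fulls + monthly_fulls + incrementals
capacity = base * 1.20                    # headroom for 20% annual data growth
print(f"base {base:.0f}TB, with growth {capacity:.0f}TB")
# prints "base 344TB, with growth 413TB"
```

Whatever assumptions you plug in, the answer lands in the hundreds of terabytes, which is the point: retention policy, not the live data set, drives the size of a disk-only backup solution.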
Put all your data in the cloud. This is a great way to store data: you no longer have to worry about application updates and patches, you can access it anywhere, and you pay for processing power as and when required. Running applications in the cloud can be ideal for some businesses, and you will often find a portion of an organisation's data in the cloud and the remainder on-site, but backups are typically done locally due to limited internet bandwidth.
The downside is typically price and data protection. The price is based on four things:
1. The volume of data sent, charged per gigabyte
2. The volume of data retrieved per month
3. The monthly management fee
4. The number of copies of your data you want to store
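Those four charge components make a monthly bill easy to model. The rates below are purely illustrative placeholders, not any provider's actual pricing.

```python
def monthly_cloud_cost(gb_sent: float, gb_retrieved: float,
                       gb_stored: float, copies: int,
                       send_rate: float = 0.01,      # per GB sent (assumed rate)
                       retrieve_rate: float = 0.09,  # per GB retrieved (assumed rate)
                       store_rate: float = 0.02,     # per GB stored (assumed rate)
                       mgmt_fee: float = 50.0) -> float:
    """Toy model summing the four charge components listed above."""
    return (gb_sent * send_rate
            + gb_retrieved * retrieve_rate
            + copies * gb_stored * store_rate
            + mgmt_fee)
```

For example, sending 1,000GB, retrieving 100GB and keeping two stored copies of 1,000GB would come to 10 + 9 + 40 + 50 = 109 currency units at these rates; note how the extra copies and the retrievals, not the upload, dominate the bill.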
Cloud data protection is a bigger issue, as many organisations need to ensure that the data remains within national boundaries, that the company providing the cloud service isn't going to pull the plug, and that ownership of the data is clear if the cloud storage bill isn't paid.
Trust your storage and do nothing. Believe it or not, many people take this approach; some are lucky, but most are not. Any type of computer or disk system is prone to failure: electrical faults, viruses, malicious damage, acts of god and accidents can all destroy data, which is why we shouldn't rely on having just one copy of it.
A good example of a backup system is D2D2T (disk to disk to tape). Backups are sent to disk first, where the data can also be de-duplicated. The copy to tape can then, if required, take place outside the normal backup window. Alternatively you could specify one backup job that automatically backs up to tape and others that back up to disk only.
When the data is written to LTO Ultrium tape the data is written in the original file format.
The benefits of a D2D2T system are a reduced backup window (disk is faster), quicker restores and data that is easier to replicate. Data can be taken offsite for regulatory compliance and disaster recovery, and the LTO Ultrium tapes can then be used to restore large amounts of data at relatively low cost.
Another method is to put in place a tiered storage solution whereby data that hasn't been accessed within a given time period, or that matches a file type or size rule, is moved to lower-cost storage platforms, freeing up tier 1 storage and reducing your backup window and complexity.
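A tiering policy like that boils down to a filter over the file system. The sketch below finds demotion candidates by age and size; the 90-day and 1MB thresholds are assumptions, it uses last-modified time for portability (access times are often disabled on modern mounts), and the actual move to the lower tier is left out.

```python
import time
from pathlib import Path

def files_to_demote(root: Path, days_idle: int = 90, min_size: int = 1_000_000):
    """Yield files under root not modified within `days_idle` days and at
    least `min_size` bytes: candidates for a lower-cost storage tier."""
    cutoff = time.time() - days_idle * 86400
    for p in root.rglob("*"):
        if p.is_file():
            st = p.stat()
            if st.st_mtime < cutoff and st.st_size >= min_size:
                yield p
```

Run periodically, a policy like this keeps tier 1 holding only recently active data, which is what shrinks both the nightly backup window and the amount of premium disk you have to buy.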
Since the launch of LTO Ultrium tape technology, and now with the ninth generation of Ultrium storing 18TB natively and a roadmap stretching several generations further, tape still provides:
1. The lowest cost per Gigabyte of any storage technology
2. Far lower energy consumption than any disk based system
3. Greater reliability compared to disk
4. Backup speed not far off disk
5. Easy to create a second or third copy
6. Easily stored
Whilst tape appears to have a diminishing role in business, we feel it still has a part to play in I.T. departments, so long as the backup requirement and process have been analysed and all alternatives and options considered.