搜索
热搜: music
门户 Wiki Wiki Reference view content

Manipulation of data and dataset optimization

2014-7-26 17:19| view publisher: amanda| views: 1002| wiki(57883.com) 0 : 0

description: It is frequently useful or required to manipulate the data being backed up to optimize the backup process. These manipulations can provide many benefits including improved backup speed, restore speed, ...
It is frequently useful or required to manipulate the data being backed up to optimize the backup process. These manipulations can provide many benefits including improved backup speed, restore speed, data security, media usage and/or reduced bandwidth requirements.
Compression
Various schemes can be employed to shrink the size of the source data to be stored so that it uses less storage space. Compression is frequently a built-in feature of tape drive hardware.
Deduplication
When multiple similar systems are backed up to the same destination storage device, there exists the potential for much redundancy within the backed up data. For example, if 20 Windows workstations were backed up to the same data repository, they might share a common set of system files. The data repository only needs to store one copy of those files to be able to restore any one of those workstations. This technique can be applied at the file level or even on raw blocks of data, potentially resulting in a massive reduction in required storage space. Deduplication can occur on a server before any data moves to backup media, sometimes referred to as source/client side deduplication. This approach also reduces bandwidth required to send backup data to its target media. The process can also occur at the target storage device, sometimes referred to as inline or back-end deduplication.
Duplication
Sometimes backup jobs are duplicated to a second set of storage media. This can be done to rearrange the backup images to optimize restore speed or to have a second copy at a different location or on a different storage medium.
Encryption
High capacity removable storage media such as backup tapes present a data security risk if they are lost or stolen.[13] Encrypting the data on these media can mitigate this problem, but presents new problems. Encryption is a CPU intensive process that can slow down backup speeds, and the security of the encrypted backups is only as effective as the security of the key management policy.
Multiplexing
When there are many more computers to be backed up than there are destination storage devices, the ability to use a single storage device with several simultaneous backups can be useful.
Refactoring
The process of rearranging the backup sets in a data repository is known as refactoring. For example, if a backup system uses a single tape each day to store the incremental backups for all the protected computers, restoring one of the computers could potentially require many tapes. Refactoring could be used to consolidate all the backups for a single computer onto a single tape. This is especially useful for backup systems that do incrementals forever style backups.
Staging
Sometimes backup jobs are copied to a staging disk before being copied to tape. This process is sometimes referred to as D2D2T, an acronym for Disk to Disk to Tape. This can be useful if there is a problem matching the speed of the final destination device with the source device as is frequently faced in network-based backup systems. It can also serve as a centralized location for applying other data manipulation techniques.
Managing the backup process
As long as new data are being created and changes are being made, backups will need to be performed at frequent intervals. Individuals and organizations with anything from one computer to thousands of computer systems all require protection of data. The scales may be very different, but the objectives and limitations are essentially the same. Those who perform backups need to know how successful the backups are, regardless of scale.
Objectives
Recovery point objective (RPO)
The point in time that the restarted infrastructure will reflect. Essentially, this is the roll-back that will be experienced as a result of the recovery. The most desirable RPO would be the point just prior to the data loss event. Making a more recent recovery point achievable requires increasing the frequency of synchronization between the source data and the backup repository.[14]
Recovery time objective (RTO)
The amount of time elapsed between disaster and restoration of business functions.[15]
Data security
In addition to preserving access to data for its owners, data must be restricted from unauthorized access. Backups must be performed in a manner that does not compromise the original owner's undertaking. This can be achieved with data encryption and proper media handling policies.
Limitations
An effective backup scheme will take into consideration the limitations of the situation.
Backup window
The period of time when backups are permitted to run on a system is called the backup window. This is typically the time when the system sees the least usage and the backup process will have the least amount of interference with normal operations. The backup window is usually planned with users' convenience in mind. If a backup extends past the defined backup window, a decision is made whether it is more beneficial to abort the backup or to lengthen the backup window.
Performance impact
All backup schemes have some performance impact on the system being backed up. For example, for the period of time that a computer system is being backed up, the hard drive is busy reading files for the purpose of backing up, and its full bandwidth is no longer available for other tasks. Such impacts should be analyzed.
Costs of hardware, software, labor
All types of storage media have a finite capacity with a real cost. Matching the correct amount of storage capacity (over time) with the backup needs is an important part of the design of a backup scheme. Any backup scheme has some labor requirement, but complicated schemes have considerably higher labor requirements. The cost of commercial backup software can also be considerable.
Network bandwidth
Distributed backup systems can be affected by limited network bandwidth.
Implementation
Meeting the defined objectives in the face of the above limitations can be a difficult task. The tools and concepts below can make that task more achievable.
Scheduling
Using a job scheduler can greatly improve the reliability and consistency of backups by removing part of the human element. Many backup software packages include this functionality.
Authentication
Over the course of regular operations, the user accounts and/or system agents that perform the backups need to be authenticated at some level. The power to copy all data off of or onto a system requires unrestricted access. Using an authentication mechanism is a good way to prevent the backup scheme from being used for unauthorized activity.
Chain of trust
Removable storage media are physical items and must only be handled by trusted individuals. Establishing a chain of trusted individuals (and vendors) is critical to defining the security of the data.
Measuring the process
To ensure that the backup scheme is working as expected, key factors should be monitored and historical data maintained.
Backup validation
(also known as "backup success validation") Provides information about the backup, and proves compliance to regulatory bodies outside the organization: for example, an insurance company in the USA might be required under HIPAA to demonstrate that its client data meet records retention requirements.[16] Disaster, data complexity, data value and increasing dependence upon ever-growing volumes of data all contribute to the anxiety around and dependence upon successful backups to ensure business continuity. Thus many organizations rely on third-party or "independent" solutions to test, validate, and optimize their backup operations (backup reporting).
Reporting
In larger configurations, reports are useful for monitoring media usage, device status, errors, vault coordination and other information about the backup process.
Logging
In addition to the history of computer generated reports, activity and change logs are useful for monitoring backup system events.
Validation
Many backup programs use checksums or hashes to validate that the data was accurately copied. These offer several advantages. First, they allow data integrity to be verified without reference to the original file: if the file as stored on the backup medium has the same checksum as the saved value, then it is very probably correct. Second, some backup programs can use checksums to avoid making redundant copies of files, and thus improve backup speed. This is particularly useful for the de-duplication process.
Monitored backup
Backup processes are monitored by a third party monitoring center, which alerts users to any errors that occur during automated backups. Monitored backup requires software capable of pinging[clarification needed] the monitoring center's servers in the case of errors. Some monitoring services also allow collection of historical meta-data, that can be used for Storage Resource Management purposes like projection of data growth, locating redundant primary storage capacity and reclaimable backup capacity.
Conclusion
    Wikiquote has a collection of quotations related to: Backup
Because of a considerable overlap in technology, backups and backup systems are frequently confused with archives and fault-tolerant systems. Backups differ from archives: archives are the primary copy of historical data, usually retained for the long term, while backups are a secondary copy of current data, kept on hand to replace the primary copy of that data. Backup systems differ from fault-tolerant systems in the sense that backup systems assume that a fault will cause a data loss event and fault-tolerant systems assure a fault will not.

This article possibly contains original research. Please improve it by verifying the claims made and adding inline citations. Statements consisting only of original research should be removed. (November 2009)
The more important the data, the greater the need to back it up.
The Retention period plays a very important role in deciding which media to use for backup.
Backups are dependent on the restore strategy, which should be tested.
Storing the copy near the original is unwise, since many disasters such as fire, flood, theft, and electrical surges may affect the backup at the same time, in which case both the original and the backup medium would be lost.
Automated backup and scheduling should be considered, as manual backups can be affected by human error.
Incremental backups should be considered to save the amount of storage space and to avoid redundancy.
Backups can fail for a wide variety of reasons. A verification or monitoring strategy is an important part of a successful backup plan.
Multiple backups on different media, stored in different locations, should be used for all critical information.
Backed up archives should be stored in open and standard formats, especially when the goal is long-term archiving. Recovery software and processes may have changed, and software may not be available to restore data saved in proprietary formats.
System administrators and other IT professionals are often dismissed for failing to devise and implement backup processes suitable to their organization.
Events
On 5 May 1996, during a fire at the headquarters of Crédit Lyonnais, a major bank in Paris, system administrators ran into the burning building to rescue backup tapes because they didn't have off-site copies. Crucial bank archives and computer data were lost.[17][18]
Privacy Rights Clearinghouse has documented[19] 16 instances of stolen or lost backup tapes (among major organizations) in 2005 and 2006. Affected organizations included Bank of America, Ameritrade, Citigroup, and Time Warner.
On 3 January 2008, an email server crashed at TeliaSonera, a major Nordic telecom company and internet service provider. The company then discovered that the last serviceable backup set was from 15 December 2007. This affected 300,000 customer email accounts.[20][21]
On 27 February 2011 a software bug on Gmail caused 0.02% of its users to lose all their email. The messages were successfully restored from tape backups hours after the event.[22]
up one:Backupnext:Transaction log

About us|Jobs|Help|Disclaimer|Advertising services|Contact us|Sign in|Website map|Search|

GMT+8, 2015-9-11 21:06 , Processed in 0.156560 second(s), 16 queries .

57883.com service for you! X3.1

返回顶部