On- and Off-site Data Protection
Offsite Data Protection Service (ODPS)
ODPS description:
- Rotational auto-snapshots of a dataset or zvol according to the configured retention-interval plan
- Asynchronous replication of auto-snapshot deltas to local or remote destinations.
Working modes:
- Rotational auto-snapshots of a dataset or zvol on the local server only. The task definition omits the destination node in the create-task command.
- Asynchronous replication of auto-snapshot deltas to local or remote destinations (sketched after this section), where the destination is:
- another dataset or zvol within the same ZFS pool,
- a dataset or zvol on a different ZFS pool,
- a dataset or zvol on a remote node.
NOTE: Rotational auto-snapshots on both source and destination are created according to retention plans. It is possible to have different retention plans for the source and destination pools.
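A minimal sketch of the replication mechanism, assuming it maps onto standard incremental zfs send/receive piped over SSH (consistent with the SSH encryption note further below); the dataset names, snapshot names and remote host used here are placeholders, not actual ODPS configuration or internals:

    #!/usr/bin/env python3
    # Sketch only: incremental replication of an auto-snapshot delta over SSH.
    # Dataset names, snapshot names and the remote host are placeholders.
    import subprocess

    SRC = "pool0/data"             # placeholder source dataset
    DST = "backup0/data"           # placeholder destination dataset (must already exist)
    REMOTE = "root@backup-node"    # placeholder destination node

    def replicate_delta(prev_snap: str, new_snap: str) -> None:
        """Send only the delta between two auto-snapshots to the remote node."""
        send = subprocess.Popen(
            ["zfs", "send", "-i", f"{SRC}@{prev_snap}", f"{SRC}@{new_snap}"],
            stdout=subprocess.PIPE,
        )
        # Replication to a remote destination is tunnelled over SSH (encrypted).
        recv = subprocess.run(["ssh", REMOTE, "zfs", "receive", "-F", DST], stdin=send.stdout)
        send.stdout.close()
        if send.wait() != 0 or recv.returncode != 0:
            raise RuntimeError("replication round failed; source snapshots are not rotated")

    if __name__ == "__main__":
        subprocess.run(["zfs", "snapshot", f"{SRC}@auto-0200"], check=True)
        replicate_delta("auto-0100", "auto-0200")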
Options:
- It is possible to attach, list and detach backup nodes (remote nodes used for asynchronous replication)
- It is possible to create a backup task in the following modes:
- only source node details provided -> snapshots are created locally, no replication
- source and destination node details provided -> snapshots are created locally and replicated to the destination. Both source and destination keep rotational auto-snapshots.
- optionally it is possible to use mbuffer (with the mbuffer size parameter) as a tool for buffering data streams (see the pipeline sketch after this list)
- It is possible to get details of all backup tasks.
- It is possible to delete a backup task.
- It is possible to get the status of the ODPS service:
- service status
- last entries from the logs
- It is possible to debug a backup task by running it in dry mode in order to check what is wrong.
- It is possible to restart all tasks so that the task configuration is reset.
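A sketch of where mbuffer typically sits in a send/receive pipeline when stream buffering is enabled; the -s/-m values below are only illustrative (they stand in for the "mbuffer size" parameter mentioned above, not a documented ODPS default), and the dataset and host names are placeholders:

    #!/usr/bin/env python3
    # Sketch only: buffering a zfs send stream with mbuffer on both ends of the SSH link.
    import subprocess

    buf = "1G"                                    # illustrative buffer size ("mbuffer size")
    pipeline = (
        "zfs send -i pool0/data@auto-0100 pool0/data@auto-0200"
        f" | mbuffer -q -s 128k -m {buf}"         # smooth out bursts on the sending side
        " | ssh root@backup-node"
        f" 'mbuffer -q -s 128k -m {buf} | zfs receive -F backup0/data'"
    )
    subprocess.run(pipeline, shell=True, check=True)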
Important notes:
- Only Genesis NX2 can be used as a destination node.
- Replication will not be performed as long as the destination dataset or zvol does not exist. The user needs to create the destination dataset/zvol manually.
- Replication will not be performed if the dataset or zvol on the destination is in use, e.g. by an iSCSI Target with an active session. Data from a particular snapshot can be accessed only via a clone created from that snapshot.
- User snapshots created on the destination dataset or zvol will be deleted by the ODPS service during a rotation round.
- User snapshots created on the source dataset or zvol are not deleted by the ODPS service during a rotation round.
- Snapshots on both source and destination that have clones are not deleted by the ODPS service during a rotation round.
- When a replication round fails because it is not possible to replicate a snapshot to the destination, e.g. caused by:
- lack of communication between nodes
- busy dataset on destination (used e.g. by iSCSI Target)
- existence of user’s own snapshot with clone on destination,
- existence of user’s own snapshot on destination created before first replication
then source snapshots are not rotated. At the next replication round, when the conditions for replication are met, rotation of snapshots on both source and destination is performed (see the decision sketch after this list).
- When the nodes have different sets of snapshots (no common snapshot between source and destination), snapshots on the destination are deleted and re-replicated from the source.
- The ODPS service is activated when at least one backup task exists in the system (at pool import and system start).
- The ODPS service is deactivated when there are no backup tasks in the system (at pool export).
- Replication to a remote destination is encrypted (SSH).
- ODPS replication processes are killed at pool export/destroy.
- An ongoing replication process is not killed when an ODPS task is deleted. After this process finishes, no further replications are performed.
- When a backup plan is created like this: 1min_every_10min, the backup task won't start. The retention time must always be bigger than the interval, e.g. 10min_every_1min.
- (to confirm) A source snapshot that is being replicated (replication of an older snapshot is still in progress) blocks rotation of snapshots on the source. New snapshots, however, are created according to the schedule plan.
- (to confirm) When many destinations are used, only one replication is performed at a given time.
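The failure handling described above (no source rotation until a round succeeds, and a full re-sync when no common snapshot exists) can be summarised with a small decision sketch; the function and field names are hypothetical and purely illustrative, they are not part of ODPS:

    def plan_round(source_snaps, dest_snaps, last_round_ok):
        """Hypothetical sketch of the rotation / re-sync decisions described above."""
        common = set(source_snaps) & set(dest_snaps)
        if not common:
            # No common snapshot between source and destination:
            # destination snapshots are deleted and re-replicated from the source.
            return {"delete_on_destination": list(dest_snaps), "full_resync": True, "rotate": False}
        if not last_round_ok:
            # Previous round failed (no communication, busy destination, foreign snapshot/clone):
            # source snapshots are not rotated until a round succeeds again.
            return {"full_resync": False, "rotate": False}
        # Normal case: replicate the newest delta, then rotate snapshots on both
        # sides according to their (possibly different) retention plans.
        return {"full_resync": False, "rotate": True}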
Important notes (clustered environment):
- It is possible to use ODPS in a clustered environment when both source and destination are clusters (two node pairs). Each source node must then attach each destination node using its physical IP address, and tasks must be created with the IP of the destination cluster (VIP).
- In the following configuration: source (cluster) -> destination (single node), when the destination node is inaccessible over the network, it does not break failover on the source (cluster).
- Ongoing replication processes are killed when an automatic failover or a manual move is performed. This applies only to the moved pools; replication processes connected with other pools keep running.
Retention plans:
The ODPS plan consists of a series of retention-period-to-interval associations: "retention_every_interval,retention_every_interval,retention_every_interval,...".
Both intervals and retention periods use standard units of time or multiples of them, with full names or a shortcut according to the following list: second|sec|s, minute|min, hour|h, day|d, week|w, month|mon|m, year|y
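A hedged sketch, assuming the plan string is split on commas and on the literal "_every_" separator, of how such a plan could be parsed and validated, including the rule from the notes above that retention must be bigger than the interval; this is illustrative code, not the actual ODPS parser:

    import re

    # Unit shortcuts from the list above, mapped to seconds (month = 30 days here for illustration).
    UNITS = {
        "second": 1, "sec": 1, "s": 1,
        "minute": 60, "min": 60,
        "hour": 3600, "h": 3600,
        "day": 86400, "d": 86400,
        "week": 604800, "w": 604800,
        "month": 2592000, "mon": 2592000, "m": 2592000,
        "year": 31536000, "y": 31536000,
    }

    def to_seconds(value: str) -> int:
        number, unit = re.fullmatch(r"(\d+)([a-z]+)", value).groups()
        return int(number) * UNITS[unit]

    def parse_plan(plan: str):
        """Parse "retention_every_interval,..." and check that retention > interval."""
        pairs = []
        for item in plan.split(","):
            retention, interval = item.split("_every_")
            r, i = to_seconds(retention), to_seconds(interval)
            if r <= i:
                # e.g. "1min_every_10min" is rejected: such a task would never start.
                raise ValueError(f"retention must be bigger than interval: {item}")
            pairs.append((r, i))
        return pairs

    print(parse_plan("10min_every_1min,1d_every_1h"))   # OK
    # parse_plan("1min_every_10min")                    # raises ValueError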
Known issues:
- ODPS can cause temporary inconsistency of the internal cache of ZFS resources. This temporary inconsistency can lead to errors displayed in the GUI that report missing ZFS resources (usually missing snapshots). The issue happens because ODPS refreshes the cache using a hook script: the cache is refreshed before and after any snapshot is removed by ODPS. Using a hook script (the only possible way to refresh the cache when using external software) leaves a small window during which the cache is inconsistent. If the GUI requests information about snapshots between the snapshot removal and the cache refresh, the cache is inconsistent and the GUI gets information about non-existing items, which may lead to errors. The most common place for this error is volume editing: snapshots are checked in this window in order to lock editing of the name if the volume has any snapshots. After the fixes, the issue is only temporary: the cache is refreshed shortly afterwards, and most probably, when the window is closed and reopened, the error does not pop up because the cache is consistent again.
- Source and destination should be of the same type (zvol-zvol, dataset-dataset). It is possible to create a task with, for example, a zvol source and a dataset destination, but when started it shows an error.