Deduplication

Latest revision as of 13:27, 3 December 2021

Deduplication is a process that eliminates redundant copies of data and reduces storage overhead. The deduplication ratio, in turn, measures the original size of the zpool's data against its size after redundancy has been removed. Deduplication can be set on zvols or datasets, but the deduplication ratio is displayed per pool rather than per zvol or dataset, because they all reside on the pool.
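As a minimal sketch, deduplication is enabled per dataset while the ratio is tracked pool-wide; the pool and dataset names below ("tank", "tank/data") are hypothetical examples, and the commands require a system with ZFS and an existing pool:

```shell
# Enable deduplication on a single dataset (names are examples)
zfs set dedup=on tank/data

# Verify the property on the dataset
zfs get dedup tank/data
# The resulting deduplication ratio is nevertheless reported
# at the pool level, not per dataset (see below).
```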

Things to be taken into consideration

Before using deduplication, the following should be taken into account:

  • hardware - deduplication consumes a large amount of memory, so if the system has to process other tasks at the same time, its performance may drop significantly
  • need for quick access to the data - deduplication pays off mainly when archiving or backing up data, as it saves disk space. When there is only a small amount of repetitive data, deduplication can merely lengthen write times.

It is also worth calculating the memory requirements in the following way:

  • each deduplication table (DDT) entry takes about 320 bytes, so multiply the number of allocated blocks by 320 bytes. For example: 1.08 million allocated blocks x 320 bytes ≈ 345.6 MB, meaning that this amount of memory is required for deduplication
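The rule of thumb above can be checked with plain shell arithmetic; the block count of 1.08 million is just the example figure from the text:

```shell
# Estimate DDT memory: allocated blocks x ~320 bytes per DDT entry
blocks=1080000          # example: ~1.08 million allocated blocks
bytes_per_entry=320
total=$((blocks * bytes_per_entry))
echo "$total bytes"     # prints "345600000 bytes", i.e. ~345.6 MB
```

On a live pool, `zpool status -D <pool>` prints the DDT statistics from which the number of allocated entries can be read.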

Once deduplication is enabled, it has an impact on the whole pool, since a global DDT with deduplication entries is created for it. On a pool where no deduplication has taken place, the zpool deduplication ratio is 1.00. If the value is greater than 1.00, deduplication has taken place. Disabling deduplication will not cause the value to return to 1.00.
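The pool-wide ratio can be read as follows; "tank" is again a hypothetical pool name and a live ZFS pool is assumed:

```shell
# A dedupratio of 1.00x means no deduplicated data on the pool;
# anything above 1.00x means deduplication has taken place.
zpool get dedupratio tank

# The same value appears in the DEDUP column of the pool listing
zpool list tank
```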


Removing deduplicated data

After deduplication has been disabled on the zpool, the deduplicated data must be rewritten, e.g. by performing a send/receive operation on the pool. Only then will the zpool deduplication ratio be reset to 1.00; otherwise the old data is left in a deduplicated state.

To remove deduplicated data, disable deduplication and transfer the data from the sources that had deduplication enabled to a location where deduplication is not enabled.
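One way to carry out such a transfer with ZFS tools is a snapshot plus send/receive into a dataset without deduplication; this is a sketch only, all names ("tank/data", "tank/data_plain", "@migrate") are hypothetical, and the data should be backed up before attempting it:

```shell
# Turn deduplication off so new writes are not deduplicated
zfs set dedup=off tank/data

# Rewrite the existing data by sending it to a fresh dataset
# (the receiving dataset must not have dedup enabled or inherited)
zfs snapshot tank/data@migrate
zfs send tank/data@migrate | zfs receive tank/data_plain

# Only after verifying the copy, the old dataset can be destroyed:
# zfs destroy -r tank/data
```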

If the deduplication ratio on the given pool returns to 1.00, it can be assumed that there is no deduplicated data left on the pool.