MPIO global settings
Revision as of 12:08, 8 February 2017
Multipath specific options
This section attempts to explain some of the multipath-specific settings that are present in the API and the GUI. Unfortunately this is quite hard, because most of the available information is found only in the manual pages, which in many cases simply do not explain enough. There are of course more resources on the internet, for example the Red Hat documentation or the SUSE documentation about multipath. But in some cases the multipath implementation described there is slightly different from the one used in Debian. The documentation there is also mostly a slightly modified copy of the manual pages which, as mentioned, are sometimes not detailed enough to understand everything. Moreover, testing some of the options requires very specific hardware that is currently unavailable. Because of that, some of the following sections contain assumptions that have not always been extensively tested. At the end of the document is a copy of the multipath.conf manual page with the original description of all options.
Each section below is titled with the name of the setting as used in the GUI. If this name differs much from the original multipath option name, the name used in the multipath documentation is given in parentheses. The same applies to option values, which are usually presented as a list. Values used by GenesisZX as defaults are marked with (default) next to the value name, after the original name if one is given.
Path grouping policy
To understand this option it is necessary to explain how multipath organizes paths. When a disk is connected through multiple paths and a multipath device is created, those paths are organized into path groups. Sometimes there is just one group containing all paths, but that is not always the case. At any point in time data is transferred through a single path group, which is selected according to its priority. Within the selected path group data is by default transferred in round-robin fashion to increase throughput. Multipath has a few policies for creating path groups.
- Multibus (default) - only one path group is created, containing all available paths.
- Failover - every path is put in a separate group.
- Group by node name - one group is created for each target node name. According to the documentation this setting has something to do with FC targets, and paths are grouped by the names of the nodes that export a particular target (this is what I understand).
- Group by priority value - paths with the same priority are put in the same group.
- Group by serial number - paths that have the same serial number are put in one group. I assume that paths can have different serial numbers because the serial number is modified or reported by controllers or other devices used to connect the disks. Those devices can report different (modified) serial numbers for what is actually the same disk. It may make sense to put such paths in different groups because, for example, a particular path physically takes a different route.
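The grouping policies above can be illustrated with a small sketch. It is not multipath code: the path records and their field names (dev, node, prio, serial) are hypothetical, and the policy strings are the multipath.conf path_grouping_policy values as I understand them.

```python
from collections import defaultdict

# Hypothetical path records; the field names are illustrative only.
PATHS = [
    {"dev": "sdb", "node": "node-a", "prio": 50, "serial": "S1"},
    {"dev": "sdc", "node": "node-a", "prio": 10, "serial": "S1"},
    {"dev": "sdd", "node": "node-b", "prio": 50, "serial": "S1"},
]

def group_paths(paths, policy):
    """Return the path groups (lists of device names) a policy would create."""
    if policy == "multibus":
        # One group holding every path.
        return [[p["dev"] for p in paths]]
    if policy == "failover":
        # One group per path.
        return [[p["dev"]] for p in paths]
    # The remaining policies group paths that share one attribute.
    key = {
        "group_by_node_name": "node",
        "group_by_prio": "prio",
        "group_by_serial": "serial",
    }[policy]
    groups = defaultdict(list)
    for p in paths:
        groups[p[key]].append(p["dev"])
    return list(groups.values())

for policy in ("multibus", "failover", "group_by_node_name",
               "group_by_prio", "group_by_serial"):
    print(policy, group_paths(PATHS, policy))
```

With these three example paths, Multibus and Group by serial number both produce a single group, while Group by priority value separates sdc from the two paths with priority 50.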
Failback
This setting specifies how multipath handles a path group that recovers from failure. Depending on the setting, multipath may do nothing, or it may evaluate the priority of the recovered path group and switch to it if it has a higher priority than the currently used group.
- Manual (default) - multipath does not switch back automatically to a path group that has recovered. However, if a path group has recovered and all others have failed, multipath will switch to this last available group even though it was previously unavailable. This setting won’t cause any break in transfer.
- Immediate - failed path groups are monitored, and as soon as multipath notices that a path group has recovered it is enabled immediately. A recovered path group is only switched to if it has a higher priority than the one multipath switched to. If the priorities are the same, multipath has no reason to switch path groups.
- Custom value - the number of seconds that have to pass after path group recovery before multipath can switch to it. This setting is similar to Immediate, but instead of switching immediately multipath waits the given number of seconds. Basically, if a path group with a higher priority than the currently used one recovers, multipath will switch to it only once it has been available for the specified time.
- Follower - the manual says that this setting allows automatic failback only when the first path in a path group becomes active.
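The failback rules described above can be written down as a small decision function. This is only a sketch of my understanding, not multipath's implementation; the function and parameter names are made up, and the Follower setting is left out because its exact behaviour is not clear from the manual.

```python
def should_fail_back(policy, recovered_prio, active_prio,
                     seconds_since_recovery=0, active_group_failed=False):
    """Decide whether to switch to a path group that has just recovered.

    policy is "manual", "immediate" or a number of seconds (custom value).
    """
    if active_group_failed:
        # The recovered group is the only usable one, so it is used
        # regardless of the failback setting.
        return True
    if recovered_prio <= active_prio:
        # No reason to switch to a group that is not better.
        return False
    if policy == "manual":
        return False
    if policy == "immediate":
        return True
    if policy == "followover":
        raise NotImplementedError("Follower is not modelled in this sketch")
    # Custom value: the group must have been available for that many seconds.
    return seconds_since_recovery >= int(policy)

print(should_fail_back("immediate", recovered_prio=50, active_prio=10))
print(should_fail_back(30, 50, 10, seconds_since_recovery=10))
```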
Path selector
This option specifies the algorithm used to load-balance traffic across the paths in the active path group.
- Round-robin (default) - data is split across all paths in the active group and the same amount of data is sent through each path. Multipath simply sends one part of the data to the first path, the next part to the second path and so on. Each part has the same size, but it is possible that a weighted version of the round-robin algorithm is used. In that case paths with a higher weight can get more data to transfer.
- Queue-length - the next piece of data is sent through the path that has the shortest queue of data waiting to be sent.
- Service-time - similar to Queue-length, because it also sends the next piece of data to the path with the shortest queue of data waiting to be sent. But the size of the piece of data that is going to be sent is chosen relative to the speed of the particular path.
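A rough sketch of the three selectors follows. It only illustrates the idea of each algorithm; the function names and the dictionaries of in-flight I/O and relative throughput are hypothetical, and the service-time variant is one plausible reading (queued I/O divided by path speed), not a confirmed description of what multipath does.

```python
from itertools import cycle

def round_robin(paths):
    """Yield paths in turn: first, second, ..., then start over."""
    return cycle(paths)

def queue_length(in_flight):
    """Pick the path with the fewest I/Os waiting to be sent."""
    return min(in_flight, key=in_flight.get)

def service_time(in_flight, throughput):
    """Pick the path whose queue would drain soonest.

    Assumption: estimated time = queued I/O / relative throughput.
    """
    return min(in_flight, key=lambda p: in_flight[p] / throughput[p])

rr = round_robin(["8:112", "8:144"])
print([next(rr) for _ in range(4)])
print(queue_length({"8:112": 3, "8:144": 1}))
# 8:112 has the longer queue but is assumed to be four times faster.
print(service_time({"8:112": 4, "8:144": 3}, {"8:112": 4, "8:144": 1}))
```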
Path checker
This setting tells multipath how to check the state of a path.
- Direct I/O (default) - read the first sector of the disk without using any cache.
- Test Unit Ready - use the SCSI command “Test Unit Ready” to check whether the disk is available. In response to this command the device reports whether it is accessible by the client application.
- EMC Clariion, RDAC storage controller and HP storage array - these settings are specific to particular hardware. Some hardware vendors provide custom path checker options, which can be used with the specified hardware.
Path priority routine (prio)
This setting allows choosing the program used to obtain the priority of a path. A higher value means a higher path priority. The priorities of the paths in a path group are summed, and the group with the highest priority is used when the currently active group fails.
- Const (default) - generates the same priority (value 1) for all paths. Basically this means that a path group has a higher priority if it has more paths. This setting also means that the weighted round-robin algorithm is never used.
- Random - generates a random priority in the range 1 - 10 and assigns it to the path.
- SCSI-3 ALUA - the path priority is generated from the SCSI-3 ALUA status. A detailed description can be found in this report. In short, according to that report the path priority is generated in the following way:
  - Both paths are active: both path priorities are set to 50.
  - One path is active and one is in the non-optimized state: the priority of the active path is 50 and the priority of the non-optimized path is 10.
  - One path is active and one is in the standby state: the priority of the active path is 50 and the priority of the standby path is 1.
  - One path is active and one is in the unavailable state: the priority of the active path is 50 and the priority of the unavailable path is 0.
  - One path is active and one is in the offline state: the priority of the active path is 50 and the priority of the offline path is 0.
  - One path is active and one is in the transitioning state: the priority of the active path is 50 and the priority of the transitioning path is 0.
- Vendor specific settings: EMC arrays, HP storage array, Hitachi HDS Modular storage arrays, NetApp arrays, RDAC storage controller - these settings can be used with specific hardware. With these settings multipath communicates with the hardware to generate a proper path priority, and quite possibly paths with faster transfer get a higher priority.
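The ALUA values listed above and the summing of priorities per group can be sketched as follows. The table simply restates the numbers from the list; the state names used as dictionary keys and the function names are my own, not identifiers from multipath.

```python
# Priority per ALUA state, as listed above.
ALUA_PRIO = {
    "active": 50,
    "non-optimized": 10,
    "standby": 1,
    "unavailable": 0,
    "offline": 0,
    "transitioning": 0,
}

def group_priority(states):
    """Sum the priorities of all paths in one path group."""
    return sum(ALUA_PRIO[s] for s in states)

def best_group(groups):
    """Return the name of the group with the highest summed priority."""
    return max(groups, key=lambda name: group_priority(groups[name]))

groups = {
    "group1": ["active", "non-optimized"],   # 50 + 10
    "group2": ["standby", "standby"],        # 1 + 1
}
print(group_priority(groups["group1"]), group_priority(groups["group2"]))
print(best_group(groups))
```

With Const every path contributes 1, so the same sum just counts the paths in each group.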
Queue disabling (flush_on_last_del)
The manual says that this option disables queueing when the last path to a device is removed. I ran some tests by writing data (using dd) to a multipath device and disconnecting the paths to the device, or the device itself. With both available settings, “yes” and “no” (default), it behaved the same way: as expected, the transfer was simply broken. Some additional searching on Google revealed that this option was introduced as a bug fix. In short, it was reported that without this option (when it is set to no) LVM commands might hang indefinitely when the underlying disk is removed (which causes all paths to be removed).
Path retry (no_path_retry)
This option specifies what should happen when paths fail and no usable path is left. It controls whether data should still be queued in that situation or not. It is also possible to specify how many times multipath should retry before it stops queueing and fails the I/O.
- Disabled (fail) (default) - I/O is failed immediately and no data is queued.
- Infinite (queue) - data is always queued and never failed.
- Custom value - the number of retries multipath makes before it disables queueing.
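The three kinds of value can be summarised in a short sketch. It assumes, based on my reading of the manual, that the retry count applies while no path is usable; the function name and the retries_done counter are hypothetical.

```python
def on_no_usable_path(no_path_retry, retries_done):
    """Return "queue" or "fail" for I/O that cannot be sent right now.

    no_path_retry is "fail", "queue" or a number of retries (custom value).
    """
    if no_path_retry == "fail":
        return "fail"      # Disabled: never queue
    if no_path_retry == "queue":
        return "queue"     # Infinite: always queue
    # Custom value: keep queueing until the retries are used up.
    return "queue" if retries_done < int(no_path_retry) else "fail"

print(on_no_usable_path("fail", 0))
print(on_no_usable_path(5, 4))
print(on_no_usable_path(5, 5))
```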
No. of I/O request (rr_min_io and rr_min_io_rq)
This section discusses two settings, rr_min_io and rr_min_io_rq, because they are connected with each other. In the GUI they are called “No. of I/O request for BIO based multipath” (rr_min_io) and “No. of I/O request for request based multipath” (rr_min_io_rq). The multipath manual describes rr_min_io as the minimum number of I/Os that have to be performed before multipath can switch to the next path in the same group, and this value applies only to BIO-based (block-based) multipaths. The second setting, rr_min_io_rq, is described as the minimum number of requests that have to be routed before switching to the next path, and it applies only to request-based multipaths. In general these settings look fairly simple: they set the minimum amount of I/O that has to be sent through one path before multipath switches to the next one. But there is a gotcha, because we do not know whether a particular multipath is request-based or block-based. I have no idea where such information can be fetched from. Consequently the user has no idea about the type of multipath in use either. There is an assumption that it might be related to the underlying device, but it is only an assumption and gives no clue how to actually check it.
It is worth mentioning that the multipath documentation for Red Hat and SUSE states that rr_min_io is not used in newer kernels; it is only relevant for kernel version 2.6.31 and earlier. Moreover, I have found some articles on the internet about block-based and request-based multipaths, for example this one. In a nutshell, block-based multipaths have some issues with load balancing, and using request-based multipaths resolves those issues. According to the article, the multipath user space software is able to create both types of multipaths, and request-based should be preferred because of its improvements. What I think is that the current implementation of multipath used in GenesisZX always uses request-based multipaths and the rr_min_io setting is never actually used. I searched for a way to check whether a multipath is request-based or block-based. On a Red Hat mailing list I found the following post, in which someone asks when rr_min_io is used and when rr_min_io_rq is used. The answer given there agrees with the Red Hat documentation (rr_min_io is for older kernels), but it also shows how to check which value is set in device mapper. I checked it on a virtual machine that has multipath configured on iSCSI storage, and according to the method described on the mailing list rr_min_io_rq was in use.
To check which value is used, execute the command:
dmsetup table DEVICE_NAME
# DEVICE_NAME could be for example /dev/dm-10
The command returns the following output:
0 41943040 multipath 0 0 1 1 round-robin 0 2 1 8:112 4 8:144 4
Numbers like 8:112 and 8:144 are major:minor device numbers, and the value just after each of them is the one that multipath sets as the minimum I/O before switching to the next path. The above output was obtained for a configuration with two paths in the group. In my case this value is the same as the value set for rr_min_io_rq. Of course I changed it several times, and each time dmsetup showed the same value that I put in the multipath configuration. One more test was performed, this time with multipath created on a SATA disk. In this case rr_min_io_rq was also used. This agrees with my assumption that currently only request-based multipaths are created.
The default value of rr_min_io_rq is 1 and of rr_min_io is 1000.
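Reading the per-path value out of the dmsetup output by eye is error prone, so here is a small parsing sketch. The token layout it assumes (feature count, hardware handler count, number of groups, initial group, then per group: selector, selector argument count, path count, per-path argument count, paths) is my reading of the example line above and should be treated as an assumption; the function name is made up.

```python
def parse_multipath_table(line):
    """Extract the selector and per-path arguments from a 'dmsetup table' line."""
    tok = line.split()
    if len(tok) < 3 or tok[2] != "multipath":
        raise ValueError("not a multipath table line")
    i = 3
    i += 1 + int(tok[i])          # skip feature count and the features
    i += 1 + int(tok[i])          # skip hw handler count and its arguments
    n_groups = int(tok[i])
    i += 2                        # skip group count and initial group number
    groups = []
    for _ in range(n_groups):
        selector = tok[i]
        i += 2 + int(tok[i + 1])  # skip selector name, arg count and args
        n_paths, n_path_args = int(tok[i]), int(tok[i + 1])
        i += 2
        paths = []
        for _ in range(n_paths):
            dev = tok[i]
            args = [int(a) for a in tok[i + 1:i + 1 + n_path_args]]
            paths.append((dev, args))
            i += 1 + n_path_args
        groups.append({"selector": selector, "paths": paths})
    return groups

LINE = "0 41943040 multipath 0 0 1 1 round-robin 0 2 1 8:112 4 8:144 4"
for group in parse_multipath_table(LINE):
    for dev, args in group["paths"]:
        print(group["selector"], dev, args)
```

For the example line this reports two round-robin paths, 8:112 and 8:144, each with the single argument 4, which is the value compared against rr_min_io_rq in the test described above.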
Path weight
This option selects the method used to assign weights to the paths in a group. The available settings are explained first, followed by an explanation of what this option is for (to some extent it is an assumption). The setting allows two values:
- Uniform (default) - all paths have the same weight.
- Priorities - the weight of each path is calculated by multiplying the path priority by rr_min_io_rq (or by rr_min_io if it is used, but as explained in the previous section it probably is not).
Now the purpose of path weight needs to be explained. Most probably it is used by the weighted round-robin algorithm, which calculates how much data should be sent through each path in the active path group. A path with a higher weight is considered faster than a path with a lower weight, so the algorithm simply sends more data through it in order to balance the load better. With the Uniform setting the weighted round-robin algorithm is effectively disabled, because each path has the same weight. With Priorities the weight makes an actual difference only if multipath sets different priorities for particular paths: if the priorities are equal then the weights of all paths are equal too, because rr_min_io_rq can only be set for the whole multipath and not for a single path. See path priority routine to check how multipath can assign priorities to paths.
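The weight calculation can be shown with a few lines of arithmetic. This is a sketch of the rule as described in this section, with the assumption that under Uniform every path simply gets rr_min_io_rq as its weight; the function name and example device names are hypothetical.

```python
def path_weights(priorities, rr_min_io_rq, policy="uniform"):
    """Return the weight of each path for the given path weight setting.

    priorities maps a path name to the priority assigned by the prio routine.
    """
    if policy == "uniform":
        # Every path gets the same weight (assumed to be rr_min_io_rq itself).
        return {path: rr_min_io_rq for path in priorities}
    # Priorities: weight = path priority * rr_min_io_rq
    return {path: prio * rr_min_io_rq for path, prio in priorities.items()}

# Different priorities (for example from SCSI-3 ALUA) give different weights.
print(path_weights({"8:112": 50, "8:144": 10}, 4, "priorities"))
# Equal priorities (for example from Const) give equal weights again.
print(path_weights({"8:112": 1, "8:144": 1}, 4, "priorities"))
```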