MPIO global settings: Difference between revisions

From Scalelogic Wiki
migrate>Mi-S
(9 intermediate revisions by 3 users not shown)
== Path grouping policy ==


To understand this option, it helps to know how multipath organizes paths. When a disk is connected over multiple paths and a multipath device is created, those paths are organized into path groups. Sometimes there is just one group containing all paths, but that is not always the case. At any given time, data is transferred through a single path group, which is selected according to its priority. Within the selected path group, data is distributed in round-robin fashion by default to increase throughput. Multipath offers several policies for creating path groups.


#'''Multibus''' (default) - a single path group is created that contains all available paths.
#'''Failover''' - every path is placed in its own group.
#'''Group by node name''' - one group is created for each target node name.
#'''Group by priority value''' - paths with the same priority are placed in the same group.
#'''Group by serial number''' - paths that report the same serial number are placed in one group. We assume that paths to the same disk can report different serial numbers because some controllers or other devices used to connect disks modify the serial number they report. Placing such paths in separate groups can make sense because, for example, they take physically different routes.
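
The grouping policies above can be sketched in a few lines of Python. This is only an illustration of the descriptions in this section, not the multipath implementation: the <code>Path</code> record and its field names are hypothetical, and the policy strings follow the names used in multipath.conf.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    # Hypothetical path record; the field names are illustrative only.
    dev: str
    prio: int
    serial: str
    node_name: str


def group_paths(paths, policy):
    """Return a list of path groups (lists of Path) for the given policy."""
    if policy == "multibus":
        # One group holding every path.
        return [list(paths)] if paths else []
    if policy == "failover":
        # One group per path.
        return [[p] for p in paths]
    keys = {
        "group_by_prio": lambda p: p.prio,
        "group_by_serial": lambda p: p.serial,
        "group_by_node_name": lambda p: p.node_name,
    }
    if policy not in keys:
        raise ValueError(f"unknown policy: {policy}")
    groups = defaultdict(list)
    for p in paths:
        groups[keys[policy](p)].append(p)
    return list(groups.values())
```

For three paths where two share a priority, <code>group_by_prio</code> yields two groups, while <code>multibus</code> yields one and <code>failover</code> yields three.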


== Failback ==


This setting specifies how multipath handles a path group that recovers from failure. Depending on the setting, multipath either does nothing or evaluates the priority of the recovered path group and switches to it if its priority is higher than that of the group currently in use.


#'''Manual''' (default) - multipath does not switch back automatically to a path group that has recovered. However, if one path group has recovered and all others have failed, multipath switches to that last available group. This setting does not interrupt transfers.
#'''Immediate''' - failed path groups are monitored, and as soon as multipath detects that a path group has recovered, it is enabled. Multipath switches to the recovered group only if it has a higher priority than the group currently in use; if the priorities are equal, there is no reason to switch.
#'''Custom value''' - the number of seconds that must pass after a path group recovers before multipath may switch to it. This works like Immediate, except that multipath switches to a recovered higher-priority group only after it has been available for the specified time.
#'''Follower''' - automatic failback is allowed only when the first path in the path group becomes active.
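
The failback rules above can be summarized as a small decision function. This is a sketch of the behaviour described in this section only; the function name and parameters are hypothetical, and the Follower setting is deliberately not modelled.

```python
def should_fail_back(mode, recovered_prio, active_prio, seconds_available=0.0):
    """Decide whether to switch to a path group that has just recovered.

    mode is "manual", "immediate", or a number of seconds (the custom value).
    seconds_available is how long the recovered group has been usable.
    """
    if mode == "manual":
        # Never switch back automatically.
        return False
    if recovered_prio <= active_prio:
        # Equal or lower priority gives no reason to switch.
        return False
    if mode == "immediate":
        return True
    # Custom value: wait until the group has been available long enough.
    return seconds_available >= float(mode)
```

With a custom value of 30, a recovered group with a higher priority is selected only once it has been available for at least 30 seconds.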


== Path selector ==


This option specifies the algorithm used to balance traffic across the paths in the active path group.


#'''Round-robin''' (default) - data is split across all paths in the active group, and the same amount of data is sent through each path: multipath sends one part to the first path, the next part to the second path, and so on. Each part has the same size unless the weighted variant of round-robin is in use, in which case paths with a higher weight receive more data.
#'''Queue-length''' - the next piece of data is sent through the path with the shortest queue of data waiting to be sent.
#'''Service-time''' - similar to queue-length, but the choice also takes the relative speed of each path into account: the next piece of data goes to the path expected to finish its outstanding data soonest.
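
The three selectors can be compared with a minimal Python sketch. It only mirrors the descriptions above; the <code>PathState</code> record, its fields, and the function names are hypothetical and do not correspond to kernel data structures.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class PathState:
    # Hypothetical per-path counters used only for this illustration.
    name: str
    queued_ios: int = 0           # I/Os waiting on this path
    queued_bytes: int = 0         # bytes waiting on this path
    relative_throughput: int = 1  # higher means a faster path


def round_robin(paths):
    """Return an iterator that hands out paths in turn, forever."""
    return cycle(paths)


def queue_length(paths):
    """Pick the path with the fewest I/Os waiting."""
    return min(paths, key=lambda p: p.queued_ios)


def service_time(paths):
    """Pick the path expected to drain its outstanding data soonest."""
    return min(paths, key=lambda p: p.queued_bytes / p.relative_throughput)
```

Note that queue-length and service-time can disagree: a fast path with a longer queue may still be the better choice for service-time.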


== Path checker ==


This setting tells multipath how to check the state of a path.


#'''Direct I/O''' - reads the first sector of the disk, bypassing any cache.
#'''Test Unit Ready''' (default) - uses the SCSI “Test Unit Ready” command to check whether the disk is available. The device's response indicates whether it is accessible to the client application.
#'''EMC Clariion, RDAC storage controller and HP storage array''' - these settings are specific to particular hardware. Some hardware vendors provide custom path checkers that can be used with their hardware.


== Path priority routine (prio) ==


This setting selects the routine used to obtain the priority of a path. A higher value means a higher priority. The priorities of the paths in a path group are summed, and the group with the highest priority is used when the currently active group fails.


#'''Const''' (default) - assigns the same priority (value 1) to all paths. This means that a path group has a higher priority if it contains more paths. With this setting the weighted round-robin algorithm never has any effect.
#'''Random''' - assigns each path a random priority in the range 1 - 10.
#'''SCSI-3 ALUA''' - path priority is derived from the SCSI-3 ALUA state of the path, as follows:
##Both paths are active: both path priorities are set to 50.
##One path is active and one is in the non-optimized state: the priority of the active path is 50 and the priority of the non-optimized path is 10.
##One path is active and one is in the standby state: the priority of the active path is 50 and the priority of the standby path is 1.
##One path is active and one is in the unavailable state: the priority of the active path is 50 and the priority of the unavailable path is 0.
##One path is active and one is in the offline state: the priority of the active path is 50 and the priority of the offline path is 0.
##One path is active and one is in the transitioning state: the priority of the active path is 50 and the priority of the transitioning path is 0.
#'''Vendor specific settings: EMC arrays, HP storage array, Hitachi HDS Modular storage arrays, NetApp arrays, RDAC storage controller''' - these settings can be used with specific hardware. With them, multipath communicates with the hardware to generate a proper path priority, and quite possibly paths with faster transfer get a higher priority.
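
As a worked example of the ALUA values listed above, the following sketch maps each state to its priority and sums the priorities per path group, as this section describes. The state labels used as dictionary keys and the function names are assumptions made for the illustration.

```python
# Priority per ALUA state, taken from the list above.
# The key spellings are illustrative labels, not official identifiers.
ALUA_PRIO = {
    "active": 50,
    "non-optimized": 10,
    "standby": 1,
    "unavailable": 0,
    "offline": 0,
    "transitioning": 0,
}


def group_priority(states):
    """Sum the priorities of all paths in one path group."""
    return sum(ALUA_PRIO[s] for s in states)


def best_group(groups):
    """groups maps a group name to the list of its path states.

    Returns the name of the group with the highest summed priority.
    """
    return max(groups, key=lambda name: group_priority(groups[name]))
```

Under these values a group with one active path (50) outranks a group with two non-optimized paths (20).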


== Queue disabling (flush_on_last_del) ==


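This option disables queueing when the last path to a device is removed. It accepts “yes” and “no” (default). The option was introduced as a [https://bugzilla.redhat.com/show_bug.cgi?id=430494 bug fix]: without it (when it is set to “no”), LVM commands could hang indefinitely when the underlying disk was removed, which in turn removes all of its paths.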


== Path retry (no_path_retry) ==


This option specifies what should happen when no usable path to the device remains. It controls whether data is still queued at that point or failed immediately. It is also possible to specify how many times multipath should retry before it stops queueing.


#'''Disabled (fail)''' (default) - I/O is failed immediately and no data is queued.
#'''Infinite (queue)''' - data is always queued and I/O is never failed.
#'''Custom value''' - the number of retries that multipath makes before it stops queueing and fails I/O.
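
The three settings can be expressed as a single check. This is a minimal sketch under the assumption that a "retry" is one failed attempt to find a usable path; the function name and the <code>"fail"</code> / <code>"queue"</code> strings follow the labels in the list above.

```python
def queueing_enabled(no_path_retry, failed_retries):
    """Return True while I/O is still queued instead of being failed.

    no_path_retry is "fail", "queue", or a retry count (the custom value).
    failed_retries is how many retries have already failed.
    """
    if no_path_retry == "fail":
        return False
    if no_path_retry == "queue":
        return True
    return failed_retries < int(no_path_retry)
```

With a custom value of 5, data is queued for the first five retries and failed afterwards.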


== No. of I/O request (rr_min_io and rr_min_io_rq) ==


This section discusses two related settings, rr_min_io and rr_min_io_rq. In the GUI they are called “No. of I/O request for BIO based multipath” (rr_min_io) and “No. of I/O request for request based multipath” (rr_min_io_rq). rr_min_io is the minimum number of I/Os that must be performed on a path before multipath can switch to the next path in the same group; it applies only to block (BIO) based multipath devices. rr_min_io_rq is the minimum number of requests that must be routed to a path before multipath can switch to the next one; it applies only to request based multipath devices. Together, these settings define the minimum amount of I/O sent through one path before multipath moves on to the next.


According to the Red Hat and SUSE multipath documentation, rr_min_io is relevant only for kernel 2.6.31 and earlier. Background on block based and request based multipath is given in [https://www.kernel.org/doc/ols/2007/ols2007v2-pages-235-244.pdf this paper], and a [https://www.redhat.com/archives/dm-devel/2014-October/msg00165.html dm-devel mailing list post] shows how to check which value is in effect in device mapper.


Default value for rr_min_io_rq is 1 and for rr_min_io is 1000.


To check which value is in use, execute the following command, where DEVICE_NAME could be, for example, /dev/dm-10:

 dmsetup table DEVICE_NAME

The command returns output like the following:

 0 41943040 multipath 0 0 1 1 round-robin 0 2 1 8:112 4 8:144 4

Values such as 8:112 and 8:144 are major:minor device numbers, and the value right after each of them is the minimum number of I/Os before multipath switches to the next path. The output above was obtained for a configuration with two paths in one group. In tests with iSCSI storage and with a SATA disk, this value always matched the configured rr_min_io_rq, which suggests that only request based multipath devices are currently created.
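
For scripted checks, a line such as <code>0 41943040 multipath 0 0 1 1 round-robin 0 2 1 8:112 4 8:144 4</code> from <code>dmsetup table</code> can be split into path groups. The parser below is a sketch that assumes the field layout visible in that sample (feature count, hardware handler count, number of path groups, next path group, then per group: selector, selector argument count, path count, per-path argument count, and the paths); the function name is hypothetical.

```python
def parse_multipath_table(line):
    """Parse one multipath line of `dmsetup table` output into path groups."""
    tok = line.split()
    if len(tok) < 3 or tok[2] != "multipath":
        raise ValueError("not a multipath table line")
    i = 3
    n_features = int(tok[i])
    i += 1 + n_features          # skip the feature arguments
    n_hw_args = int(tok[i])
    i += 1 + n_hw_args           # skip the hardware handler arguments
    n_groups = int(tok[i])
    i += 2                       # also skip the next-path-group index
    groups = []
    for _ in range(n_groups):
        selector = tok[i]
        n_selector_args = int(tok[i + 1])
        i += 2 + n_selector_args
        n_paths, n_path_args = int(tok[i]), int(tok[i + 1])
        i += 2
        paths = []
        for _ in range(n_paths):
            dev = tok[i]
            args = tok[i + 1:i + 1 + n_path_args]
            paths.append((dev, args))
            i += 1 + n_path_args
        groups.append({"selector": selector, "paths": paths})
    return groups
```

Applied to a line of this shape, each path comes back as a major:minor string together with its per-path arguments, the first of which is the minimum I/O count discussed in this section.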
== Path weight ==


This option selects the method used to assign weights to the paths in a group. It allows two values:


#'''Uniform''' (default) - all paths have the same weight.
#'''Priorities''' - the weight of each path is calculated by multiplying the path priority by rr_min_io_rq (or by rr_min_io if that is in use, which, as explained in the previous section, is unlikely).


Most probably, path weight is used by the weighted round-robin algorithm, which calculates how much data should be sent through each path in the active path group. A path with a higher weight is considered faster than a path with a lower weight, so the algorithm sends more data through it to balance the load better. With the Uniform setting, weighted round-robin is effectively disabled because every path has the same weight. With the Priorities setting, the weight has an actual effect only if multipath assigns different priorities to the paths: if the priorities are equal, the weights are equal too, because rr_min_io_rq can be set only for the whole multipath device and not for a single path.
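
The weight calculation and its effect on round-robin can be illustrated as follows. This is a sketch of the behaviour assumed in this section, not the kernel algorithm; the function names are hypothetical, and the mode strings mirror the two settings above.

```python
def path_weights(priorities, rr_min_io_rq, mode="uniform"):
    """Return the number of I/Os sent on each path before switching.

    priorities maps a path (for example "8:112") to its priority.
    """
    if mode == "uniform":
        return {dev: rr_min_io_rq for dev in priorities}
    if mode == "priorities":
        return {dev: prio * rr_min_io_rq for dev, prio in priorities.items()}
    raise ValueError(f"unknown mode: {mode}")


def schedule(weights, total_ios):
    """Return the path used for each of total_ios I/Os, in order."""
    if total_ios > 0 and not any(w > 0 for w in weights.values()):
        raise ValueError("at least one path needs a positive weight")
    order = []
    while len(order) < total_ios:
        for dev, weight in weights.items():
            # Stay on this path for up to `weight` I/Os, then move on.
            take = min(weight, total_ios - len(order))
            order.extend([dev] * take)
    return order
```

With priorities 50 and 10 and rr_min_io_rq set to 1, the first path carries 50 I/Os for every 10 on the second; with equal priorities the schedule is identical to the Uniform setting.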


 
[[Category:Help topics]]
 

Latest revision as of 10:03, 3 November 2023

Path grouping policy

To understand this option, it helps to know how multipath organizes paths. When a disk is connected through multiple paths and a multipath device is created, those paths can be organized into path groups. Sometimes there is just one group containing all paths, but this is not always the case. At any point in time, data is transferred through a single path group, which is selected according to its priority. Within the selected path group, data is distributed round-robin by default to increase throughput. Multipath offers the following policies for creating path groups.

  1. Multibus (default) - only one path group is created, containing all available paths.
  2. Failover - every path is put in a separate group.
  3. Group by node name - one group is created for each target node name.
  4. Group by priority value - paths with the same priority are put in the same group, so each group contains only paths of equal priority.
  5. Group by serial number - paths that report the same serial number are put in one group. We assume that paths to the same disk can report different serial numbers because some controllers or other devices used to connect the disk modify the serial number they report. Putting such paths in different groups can make sense, for example when a particular path physically takes a different route.
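
The grouping policies above can be illustrated with a minimal Python sketch. It is not how multipath is implemented; the path records and the function name group_paths are hypothetical, and the policy strings simply mirror the corresponding multipath.conf keywords.

```python
from collections import defaultdict

# Hypothetical path records; the attribute values are made up for illustration.
PATHS = [
    {"name": "sda", "node": "tgt-a", "prio": 50, "serial": "S1"},
    {"name": "sdb", "node": "tgt-a", "prio": 10, "serial": "S1"},
    {"name": "sdc", "node": "tgt-b", "prio": 50, "serial": "S1"},
    {"name": "sdd", "node": "tgt-b", "prio": 10, "serial": "S1"},
]

# Which path attribute each "group by ..." policy compares.
GROUP_KEYS = {
    "group_by_node_name": "node",
    "group_by_prio": "prio",
    "group_by_serial": "serial",
}


def group_paths(paths, policy):
    """Return a list of path groups (lists of path names) for a policy."""
    if policy == "multibus":
        # One group containing every path.
        return [[p["name"] for p in paths]]
    if policy == "failover":
        # Every path in its own group.
        return [[p["name"]] for p in paths]
    key = GROUP_KEYS[policy]
    groups = defaultdict(list)
    for p in paths:
        groups[p[key]].append(p["name"])
    return list(groups.values())


for policy in ("multibus", "failover", "group_by_node_name", "group_by_prio"):
    print(policy, group_paths(PATHS, policy))
```

With this sample data, group_by_prio yields two groups (the priority-50 paths and the priority-10 paths), while group_by_serial yields a single group because all four paths report the same serial number.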

Failback

This setting specifies how multipath handles a path group that recovers from a failure. Depending on the setting, multipath may do nothing, or it may evaluate the priority of the recovered path group and switch to it if its priority is higher than that of the currently used group.

  1. Manual (default) - multipath does not switch back automatically to a path group that has recovered. However, if a path group recovers and all other groups have failed, multipath switches to this last available group even though it was previously unavailable. This setting does not interrupt transfers.
  2. Immediate - failed path groups are monitored, and as soon as multipath detects that a path group has recovered, it switches to that group, but only if the recovered group has a higher priority than the group multipath switched to after the failure. If the priorities are equal, multipath has no reason to switch path groups.
  3. Custom value - the number of seconds that must pass after a path group recovers before multipath may switch to it. This setting is similar to immediate, but instead of switching right away multipath waits the given number of seconds. In other words, if a path group with a higher priority than the currently used one recovers, multipath switches to it only after it has stayed available for the specified time.
  4. Follower - automatic failback is allowed only when the first path in the path group becomes active.
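
The decision described for the manual, immediate and custom-value settings can be summarized in a short Python sketch. This is only a model of the text above, not multipath code; the function should_fail_back and its parameters are hypothetical, and the follower setting is left out because it depends on which path in the group recovered.

```python
def should_fail_back(failback, recovered_prio, active_prio,
                     seconds_since_recovery=0, active_group_alive=True):
    """Decide whether to switch to a recovered path group.

    failback is "manual", "immediate", or an integer number of seconds
    (the custom value). All names here are illustrative assumptions.
    """
    if not active_group_alive:
        # The recovered group is the only usable one, so it is used
        # regardless of the failback setting.
        return True
    if failback == "manual":
        return False
    if recovered_prio <= active_prio:
        # No reason to switch to a group of equal or lower priority.
        return False
    if failback == "immediate":
        return True
    # Custom value: the group must have stayed available long enough.
    return seconds_since_recovery >= failback


print(should_fail_back("immediate", recovered_prio=100, active_prio=20))
print(should_fail_back(30, recovered_prio=100, active_prio=20,
                       seconds_since_recovery=10))
```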

Path selector

This option specifies the algorithm used to balance traffic across the paths in the active path group.

  1. Round-robin (default) - data is split across all paths in the active group, and the same amount of data is sent through each path. Multipath sends one part of the data through the first path, the next part through the second path, and so on. Each part has the same size, but it is possible that a weighted version of the round-robin algorithm is used, in which case paths with a higher weight can get more data to transfer.
  2. Queue-length - the next piece of data is sent through the path that has the shortest queue of data waiting to be sent.
  3. Service-time - similar to queue-length in that the next piece of data also goes to the path with the shortest queue of data waiting to be sent, but the size of that piece is chosen relative to the speed of the particular path.
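
The difference between queue-length and service-time can be sketched in Python as follows. The sketch rests on an assumption that goes beyond the description above: service-time is modelled as picking the path with the smallest estimated completion time, computed as outstanding bytes divided by a relative throughput value. The path records and both function names are hypothetical.

```python
# Hypothetical per-path statistics for illustration only.
PATHS = [
    {"name": "sda", "in_flight": 2, "in_flight_bytes": 8192, "throughput": 1},
    {"name": "sdb", "in_flight": 3, "in_flight_bytes": 12288, "throughput": 4},
]


def pick_queue_length(paths):
    """Pick the path with the fewest outstanding requests."""
    return min(paths, key=lambda p: p["in_flight"])["name"]


def pick_service_time(paths):
    """Pick the path expected to drain its queue soonest.

    Assumption: estimated time = outstanding bytes / relative throughput.
    """
    return min(paths, key=lambda p: p["in_flight_bytes"] / p["throughput"])["name"]


print(pick_queue_length(PATHS))
print(pick_service_time(PATHS))
```

In this sample, queue-length prefers sda because it has fewer requests queued, while service-time prefers sdb because its higher relative throughput outweighs its longer queue.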

Path checker

This setting tells multipath how to check the state of a path.

  1. Direct I/O - read the first sector of the disk without using any cache.
  2. Test Unit Ready (default) - use the SCSI command “Test Unit Ready” to check whether the disk is available. In response to this command, the device reports whether it is accessible to the client application.
  3. EMC Clariion, RDAC storage controller and HP storage array - these settings are specific to particular hardware. Some hardware vendors provide custom path checkers, and these options are intended for use with that hardware.

Path priority routine (prio)

This setting selects the program used to obtain the priority of a path. A higher value means a higher priority. The priorities of the paths in a path group are summed, and the group with the highest priority is used when the currently active group fails.

  1. Const (default) - generates the same priority (value 1) for all paths. In practice this means that a path group has a higher priority if it contains more paths. This setting also means that the weighted round-robin algorithm is never used.
  2. Random - generates a random priority in the range 1 - 10 and assigns it to the path.
  3. SCSI-3 ALUA - path priority is derived from the SCSI-3 ALUA status of the path, in the following way:
    1. Both paths are active: both path priorities are set to 50.
    2. One path is active and one is in the non-optimized state: the priority of the active path is 50 and the priority of the non-optimized path is 10.
    3. One path is active and one is in the standby state: the priority of the active path is 50 and the priority of the standby path is 1.
    4. One path is active and one is in the unavailable state: the priority of the active path is 50 and the priority of the unavailable path is 0.
    5. One path is active and one is in the offline state: the priority of the active path is 50 and the priority of the offline path is 0.
    6. One path is active and one is in the transitioning state: the priority of the active path is 50 and the priority of the transitioning path is 0.
  4. Vendor-specific settings: EMC arrays, HP storage array, Hitachi HDS Modular storage arrays, NetApp arrays, RDAC storage controller - these settings can be used with the specific hardware. With these settings multipath communicates with the hardware to generate a proper path priority, and quite possibly paths with faster transfer get a higher priority.
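
As a worked example of the rules above, the following Python sketch maps the listed ALUA states to their priorities and sums them per path group. The dictionary keys, the function names and the sample groups are hypothetical; only the numeric priorities come from the list above.

```python
# Priority per ALUA state, taken from the list above.
ALUA_PRIO = {
    "active": 50,
    "non_optimized": 10,
    "standby": 1,
    "unavailable": 0,
    "offline": 0,
    "transitioning": 0,
}


def group_priority(states):
    """Sum the priorities of all paths in one path group."""
    return sum(ALUA_PRIO[state] for state in states)


def best_group(groups):
    """Return the name of the group with the highest summed priority."""
    return max(groups, key=lambda name: group_priority(groups[name]))


# Two hypothetical groups, as "group by priority value" would build them.
groups = {
    "optimized": ["active", "active"],
    "fallback": ["non_optimized", "non_optimized"],
}
print({name: group_priority(states) for name, states in groups.items()})
print(best_group(groups))
```

Here the group of two active paths has a summed priority of 100 and the group of two non-optimized paths has 20, so the first group is preferred.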

Queue disabling (flush_on_last_del)

This option disables queueing when the last path to a device is removed.

Path retry (no_path_retry)

This option specifies what should happen when a path fails. It controls whether data should still be queued while the path is failed. It is also possible to specify how many times multipath should retry sending data before it fails the path.

  1. Disabled (fail) (default) - the path is immediately considered failed and no data is queued.
  2. Infinite (queue) - data is always queued and the path is never failed.
  3. Custom value - the number of attempts multipath makes before it fails the path.

No. of I/O request (rr_min_io and rr_min_io_rq)

This section discusses two settings, rr_min_io and rr_min_io_rq, because they are closely related. In the GUI they are called “No. of I/O request for BIO based multipath” (rr_min_io) and “No. of I/O request for request based multipath” (rr_min_io_rq). rr_min_io is the minimum number of I/Os that must be performed on a path before multipath can switch to the next path in the same group; it applies only to BIO-based multipath devices. rr_min_io_rq is the minimum number of requests that must be routed to a path before multipath can switch to the next path; it applies only to request-based multipath devices. Together, these settings define the minimum amount of data that must be sent through one path before multipath switches to the next one.

Default value for rr_min_io_rq is 1 and for rr_min_io is 1000.
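
The effect of rr_min_io_rq on round-robin scheduling can be sketched in Python. This is a simplified model, not the kernel implementation; the generator round_robin and the device names are hypothetical.

```python
from itertools import cycle, islice


def round_robin(paths, rr_min_io_rq):
    """Yield the path used for each successive request.

    Each path receives rr_min_io_rq requests in a row before the
    selector moves on to the next path in the group.
    """
    for path in cycle(paths):
        for _ in range(rr_min_io_rq):
            yield path


# With the default of 1, consecutive requests alternate between paths.
print(list(islice(round_robin(["sda", "sdb"], 1), 4)))
# With a value of 2, each path gets two requests before the switch.
print(list(islice(round_robin(["sda", "sdb"], 2), 6)))
```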

Path weight

This option selects the method used to assign weights to the paths in a group. Two values are available:

  1. Uniform (default) - all paths have the same weight.
  2. Priorities - the weight of each path is calculated by multiplying the path priority by rr_min_io_rq (or by rr_min_io if that setting is the one in use, which, given the previous section, is unlikely).


Path weight is most probably used by a weighted round-robin algorithm that calculates how much data should be sent through each path in the active path group. A path with a higher weight is considered faster than a path with a lower weight, so the algorithm sends more data through it to balance the load better. With the uniform setting, weighted round robin is effectively disabled because every path has the same weight. With the priorities setting, weight has an actual effect only if multipath assigns different priorities to the paths: if all priorities are equal, the weights are equal too, because rr_min_io_rq can be set only for the whole multipath device and not for a single path.
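
The weight calculation can be worked through with a small Python sketch. It follows the description above, which is itself partly an assumption about how multipath uses weights. The function names are hypothetical, and treating the uniform weight as equal to rr_min_io_rq is an additional assumption made only so that the example has a concrete value.

```python
def path_weights(priorities, rr_min_io_rq, mode="uniform"):
    """Return a weight per path for the "uniform" or "priorities" mode."""
    if mode == "uniform":
        # Assumption: every path simply gets rr_min_io_rq as its weight.
        return {name: rr_min_io_rq for name in priorities}
    # Priorities mode: weight = path priority * rr_min_io_rq.
    return {name: prio * rr_min_io_rq for name, prio in priorities.items()}


def traffic_share(weights):
    """Fraction of requests each path would receive (weights must not all be 0)."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


priorities = {"sda": 50, "sdb": 10}
print(traffic_share(path_weights(priorities, 1, mode="uniform")))
print(traffic_share(path_weights(priorities, 1, mode="priorities")))
```

With priorities 50 and 10 and rr_min_io_rq set to 1, the weights are 50 and 10, so sda would carry five sixths of the requests. If both paths had the same priority, the shares would be equal in either mode.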