Comments on: Beware of svctm in Linux’s iostat http://www.xaprb.com/blog/2010/09/06/beware-of-svctm-in-linuxs-iostat/ Thu, 02 May 2013 12:36:53 +0000

By: Zahid Haseeb http://www.xaprb.com/blog/2010/09/06/beware-of-svctm-in-linuxs-iostat/#comment-19936 Fri, 16 Mar 2012 20:06:40 +0000
@Rudolf
“Sometimes I have very high await and svctm, but very low %util:”

await = time waiting in the queue + service time
For example, await = 2 + 15 is already a high await time, and most of it is service time.

I think your low %util comes from low IOPS: even with a high service time, utilization stays low when there are very few I/Os per second. The high service time itself makes me suspect an old hard disk, which would explain why each request takes so long. I would double-check that service time figure.
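
To make the point above concrete, here is a minimal sketch that splits await into queue wait and service time and estimates utilization from the request rate. The function names are made up for illustration, and the numbers are taken from the dm-11 row Rudolf posted in this thread. Utilization is estimated as IOPS times service time, the same U = XS relation discussed elsewhere in the comments.

```python
def queue_wait_ms(await_ms: float, svctm_ms: float) -> float:
    """Time spent waiting in the queue: await minus service time."""
    return await_ms - svctm_ms


def utilization(iops: float, svctm_ms: float) -> float:
    """Fraction of time the device is busy, estimated as IOPS * service time."""
    return iops * svctm_ms / 1000.0


# Values from the dm-11 row in this thread: r/s = 0.00, w/s = 0.02,
# await = 476.00 ms, svctm = 476.00 ms.
iops = 0.00 + 0.02
print("queue wait (ms):", queue_wait_ms(476.00, 476.00))
print("utilization (%):", round(100 * utilization(iops, 476.00), 3))
```

With only 0.02 requests per second, even a service time of nearly half a second keeps the device busy for under 1% of the interval, which is consistent with the low %util in that row.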

By: Rudolf http://www.xaprb.com/blog/2010/09/06/beware-of-svctm-in-linuxs-iostat/#comment-18965 Tue, 30 Nov 2010 14:37:55 +0000
Sometimes I have very high await and svctm, but very low %util:

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util Conc. Util.
Thu Nov 4 02:33:13 CET 2010 dm-11 0.00 0.00 0.00 0.02 0.00 0.03 2.00 0.01 476.00 476.00 0.79 C:0.952 U:0.952

Also calculating the above Concurrency and Utilisation (here shown as C: and U:) does not help me to understand what is happening on the server.

By: Rudolf http://www.xaprb.com/blog/2010/09/06/beware-of-svctm-in-linuxs-iostat/#comment-18964 Tue, 30 Nov 2010 11:29:06 +0000
I added the formulas for Concurrency and Utilisation to the output of iostat, multiplied by 100 to get comparable numbers:

sda 0.33 2.44 1.37 0.97 93.15 27.27 46.57 13.64 51.44 0.03 13.38 4.49 1.05 C:3.13092 U:1.05066
sda 16.12 24.32 105.01 1.00 4693.49 202.60 2346.75 101.30 46.19 1.32 12.50 9.05 95.91 C:132.513 U:95.9391
sda 5.59 28.97 41.06 12.39 1594.41 330.87 797.20 165.43 36.02 4.65 86.99 17.69 94.54 C:464.962 U:94.553
sda 3.90 19.02 32.83 0.70 1198.80 157.76 599.40 78.88 40.45 1.18 35.19 27.71 92.91 C:117.992 U:92.9116
sda 4.40 10.69 31.57 0.90 1294.71 92.71 647.35 46.35 42.73 1.20 36.71 27.76 90.13 C:119.197 U:90.1367
sda 5.00 19.20 34.40 0.90 1316.80 160.80 658.40 80.40 41.86 1.16 32.92 26.02 91.86 C:116.208 U:91.8506
sda 3.20 19.52 31.53 5.21 1257.26 198.60 628.63 99.30 39.63 1.66 45.07 24.86 91.31 C:165.587 U:91.3356
sda 4.40 13.19 41.96 0.60 1313.89 110.29 656.94 55.14 33.46 1.14 26.84 22.37 95.20 C:114.231 U:95.2067
sda 18.10 6.00 43.00 0.70 2532.80 52.80 1266.40 26.40 59.17 1.65 37.86 22.43 98.02 C:165.448 U:98.0191
sda 27.10 22.50 56.30 9.30 3708.80 254.40 1854.40 127.20 60.41 3.81 58.17 14.78 96.94 C:381.595 U:96.9568
sda 3.60 13.00 40.90 1.40 1260.80 115.20 630.40 57.60 32.53 1.04 24.48 21.77 92.08 C:103.55 U:92.0871
sda 3.90 9.01 39.24 0.90 1145.15 79.28 572.57 39.64 30.50 1.23 30.75 23.75 95.34 C:123.43 U:95.3325
sda 1.50 13.59 41.16 0.80 684.92 115.08 342.46 57.54 19.07 0.99 23.62 22.27 93.45 C:99.1095 U:93.4449
sda 1.00 21.90 67.60 11.70 709.60 268.80 354.80 134.40 12.34 8.16 102.87 11.65 92.40 C:815.759 U:92.3845
sda 0.00 18.12 0.00 14.51 0.00 261.06 0.00 130.53 17.99 1.16 79.71 1.69 2.45 C:115.659 U:2.45219

Utilisation seems to be the same as %util, but I still don't get the meaning of Concurrency.

Can anybody help?
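
One way to read the C: column, offered as a sketch rather than a definitive answer: the numbers above are reproduced by Little's law, concurrency = arrival rate × time in system, using await as the time in system. That makes concurrency the average number of requests outstanding at the device (queued plus in service). The function names below are hypothetical, and the column positions assume the usual `iostat -x` order (r/s, w/s, ..., avgqu-sz, await, svctm, %util).

```python
def concurrency(r_s: float, w_s: float, await_ms: float) -> float:
    """Little's law: average requests in the system = rate * time in system."""
    return (r_s + w_s) * await_ms / 1000.0


def utilization(r_s: float, w_s: float, svctm_ms: float) -> float:
    """Utilization law: busy fraction = rate * service time."""
    return (r_s + w_s) * svctm_ms / 1000.0


# (r/s, w/s, await, svctm) taken from two of the sda rows above.
rows = [
    (1.37, 0.97, 13.38, 4.49),
    (67.60, 11.70, 102.87, 11.65),
]
for r_s, w_s, await_ms, svctm_ms in rows:
    c = concurrency(r_s, w_s, await_ms)
    u = utilization(r_s, w_s, svctm_ms)
    print(f"C*100 = {100 * c:.3f}  U*100 = {100 * u:.3f}")
```

Under that reading, C divided by 100 should land close to the avgqu-sz column of the same row (for example, about 8.16 for the row with C:815.759), so it describes how many requests are in flight on average rather than how busy the disk is.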

By: Rudolf http://www.xaprb.com/blog/2010/09/06/beware-of-svctm-in-linuxs-iostat/#comment-18960 Tue, 30 Nov 2010 01:09:33 +0000
Fantastic discussion – thank you!

I am still interested in concrete examples or calculations that show how to tell that a system (a database or another application) has a “disk problem”, meaning that service times are too large and that striping across more disks is needed to fix it.

Would the above formula

U = XS
utilization = (r/s + w/s) * (svctm/1000)

be the right way to say that the disk (or disk array) is too small / too slow to handle the burden?

Is this formula “better” than what “iostat” calculates for %util?

Thank you for any hint.
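
A caveat on the formula above, as a hedged sketch: to my understanding, iostat does not measure svctm directly but derives it from %util and the request rate. If that assumption holds, computing U = XS from svctm simply recovers %util and cannot be "better" than it. The helper names are hypothetical; the sample values come from one of the sda rows posted in this thread (r/s = 41.06, w/s = 12.39, svctm = 17.69, %util = 94.54).

```python
def svctm_from_util_ms(util_percent: float, r_s: float, w_s: float) -> float:
    """Service time implied by utilization and IOPS (assumed iostat derivation)."""
    iops = r_s + w_s
    if iops == 0:
        return 0.0
    return (util_percent / 100.0) * 1000.0 / iops


def util_from_svctm_percent(r_s: float, w_s: float, svctm_ms: float) -> float:
    """U = X * S, expressed as a percentage."""
    return (r_s + w_s) * svctm_ms / 1000.0 * 100.0


derived = svctm_from_util_ms(94.54, 41.06, 12.39)
print(f"svctm implied by %util: {derived:.2f} ms")
print(f"%util implied by svctm: {util_from_svctm_percent(41.06, 12.39, 17.69):.2f}")
```

The two directions agree to within rounding of the printed columns, which is what one would expect if one value is computed from the other.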

By: Nathan Webb http://www.xaprb.com/blog/2010/09/06/beware-of-svctm-in-linuxs-iostat/#comment-18927 Fri, 19 Nov 2010 01:07:55 +0000
Hi Darcy,

I suggest that you should read through the comments to get an understanding of where we are up to with this.

Just to reiterate, svctm is the average amount of time that the disk spends servicing IOs, excluding wait time. await is svctm plus the average time spent waiting in the queue.

The examples you have used are problematic, and give a false understanding of the differences and meanings of service time and average wait time.

Firstly, as I’ve stated (and restated), “Sure, ‘utilization’ is wrong for an array, but for a single disk this is ok, isn’t it?”. So discussions about a 10 disk array and 10 concurrent IOs aren’t useful as the results will be erroneous.

Your next example, of 10 IOs arriving one at a time, is rather unusual. Most operations arrive in a random manner, usually modelled mathematically as a Poisson process. With a random, memoryless arrival pattern, some transactions will arrive at roughly the same time, and some of those transactions will need to wait to be serviced. In your example, each transaction arrives exactly as the previous one completes, and therefore the wait time = 0. This can happen, but only in unusual, non-random circumstances. A wait time of 0 means that await = (svctm + 0) = svctm, as your example shows. Normally when there are multiple IOs, wait time will be non-zero, and await will not equal svctm.
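
The difference between the two arrival patterns can be illustrated with a small simulation. This is only a sketch of a single device serving requests in arrival order; the 10 ms service time, the 20 ms mean gap between random arrivals, and the function name are all assumptions chosen for illustration.

```python
import random


def mean_wait_ms(arrivals, service_ms):
    """Average time requests spend queued before service starts (FIFO, one server)."""
    free_at = 0.0      # time at which the device finishes its current request
    total_wait = 0.0
    for t in arrivals:
        start = max(t, free_at)       # wait if the device is still busy
        total_wait += start - t
        free_at = start + service_ms
    return total_wait / len(arrivals)


SERVICE_MS = 10.0
N = 10000

# Each request arrives exactly when the previous one completes.
back_to_back = [i * SERVICE_MS for i in range(N)]

# Random (Poisson) arrivals: exponential gaps with a 20 ms mean.
rng = random.Random(42)
t = 0.0
poisson = []
for _ in range(N):
    t += rng.expovariate(1.0 / 20.0)
    poisson.append(t)

print("back-to-back mean wait (ms):", mean_wait_ms(back_to_back, SERVICE_MS))
print("Poisson mean wait (ms):     ", mean_wait_ms(poisson, SERVICE_MS))
```

In the back-to-back case the mean wait is zero, so await equals svctm. With random arrivals the device is idle half the time on average, yet some requests still queue behind others, so the mean wait is positive and await exceeds svctm.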

This statement is wrong: “await is the average time for a single IO operation to complete, svctm is the average time over all of the IO operations”

It’s wrong for several reasons, one reason being that a single IO operation doesn’t have an average time. An average can only apply to multiples.

Now, to address some of the confusion, and the question that Bhavin asks, there are ways to estimate the concurrency. You can use the average size of each IO, and knowledge about the RAID configuration, to very roughly estimate the concurrency (over long periods of time). BTW, I made a typo earlier when I said that iostat overestimates the svctm for concurrent transactions. That should be underestimates, as Darcy’s example so clearly shows. I would also use the physical disk characteristics (calculate theoretical service time) to validate my estimates. This can be useful, but I generally just create a model that links the application response time with the number of disk IOs. If that doesn’t work, then I might do the above to fine-tune my model.
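
Why svctm is underestimated for concurrent transactions can be sketched with made-up numbers modelled on the array example discussed in this thread. The sketch assumes the reported service time is the device's busy time divided by the number of completed IOs; the function name and the figures (10 IOs, 10 ms each) are hypothetical.

```python
def reported_svctm_ms(busy_ms: float, completed_ios: int) -> float:
    """Service time as busy time per completed IO (assumed iostat-style calculation)."""
    return busy_ms / completed_ios if completed_ios else 0.0


TRUE_SERVICE_MS = 10.0   # each IO really takes 10 ms on its disk
IOS = 10

# Single disk: IOs run one after another, so busy time is 10 * 10 ms.
serial_busy_ms = TRUE_SERVICE_MS * IOS

# 10-disk array: all 10 IOs overlap, so the device is busy for only 10 ms.
parallel_busy_ms = TRUE_SERVICE_MS

print("single disk svctm (ms):", reported_svctm_ms(serial_busy_ms, IOS))
print("array svctm (ms):      ", reported_svctm_ms(parallel_busy_ms, IOS))
```

On the single disk the reported value matches the true 10 ms, while for the fully overlapped array it comes out at 1 ms, a tenfold underestimate. Dividing the true service time by the reported one gives a very rough concurrency estimate, which is the kind of cross-check against physical disk characteristics described above.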

It’s definitely not precise, and as xaprb makes clear, the RAID controller will do some merging and re-ordering, etc…
