# Xaprb

Stay curious!

## How to subtract in SQL over samples that wrap

This article explains how to do subtraction in SQL over samples that wrap back to zero when they exceed a boundary.

A reader wrote in with a question about how to find how much traffic has passed through a network interface. The reader has a script that samples the interface’s statistics and stores them in a database. The statistics wrap back around when they exceed the maximum size of an integer, so it’s not a strictly increasing sequence. The question, paraphrased, is “how can I find out how much traffic has gone through the interface in any given time period?”

A key assumption is that the counter never wraps back to zero more than once between samples. If it does, all hope is lost.

### Setup

To simplify the math, pretend the counter wraps at 1,000 and you have the following table:

```create table samples(
num int not null auto_increment primary key,
bytes int not null
);

insert into samples(bytes) values
(100), (900),
(230), (700), (982),
(163), (600);

select * from samples;
+-----+-------+
| num | bytes |
+-----+-------+
|   1 |   100 |
|   2 |   900 |
|   3 |   230 |
|   4 |   700 |
|   5 |   982 |
|   6 |   163 |
|   7 |   600 |
+-----+-------+```

### How much traffic?

A manual calculation is easier than it looks, and solving this by hand is the key to solving it in SQL. You don’t have to do a bunch of nasty math, like subtracting 982 from 163 (that’s already too hard for me). You just have to notice where the counter wraps. You can find these places by seeing where the number decreases from one sample to the next. In the example, the counter wraps twice: from 900 to 230, and from 982 to 163. Here’s the data, graphed with wraps “unrolled.”

There are several ways to proceed from here. One way is to calculate the traffic as 1,000 times the number of wraps. Then you just do a little math to “clean up the edges:” subtract the first number in the sequence, and then add the last number. This gives (2 * 1000) – 100 + 600, which is 2500.

Another approach is to go row by row, summing the differences from the previous row and the last row. When the counter wraps, you add 1000 before taking the difference. This math gives the same answer. This is a lot harder to do by hand.

Either technique works given an arbitrary start and end point in the sequence. Now let’s see how to do these in SQL.

### Problem: find the “previous” row

While these methods seem easy to humans, they resist many relational solutions because of the notion of “previous row.” SQL is set-oriented, and doesn’t do iterative row-by-row data manipulation. If you try to do this by grouping each strictly increasing set of data together and using aggregate functions like `SUM`, you’ll have trouble. You need the values from the “previous set” to do that, and that doesn’t work like you might want it to.

However, it’s not that hard to get the current and last row matched up side-by-side so you can operate upon them within the context of a single row:

```select cur.num, cur.bytes, prev.bytes as prev_bytes
from samples as cur
left outer join samples as prev on cur.num = prev.num + 1;
+-----+-------+------------+
| num | bytes | prev_bytes |
+-----+-------+------------+
|   1 |   100 |       NULL |
|   2 |   900 |        100 |
|   3 |   230 |        900 |
|   4 |   700 |        230 |
|   5 |   982 |        700 |
|   6 |   163 |        982 |
|   7 |   600 |        163 |
+-----+-------+------------+```

Once you think of “previous” in SQL terms, it becomes easy. Armed with this tool, we are ready to take on the queries.

### Technique 1: count wraps and clean up the edges

Now that we’ve figured out how to find the “previous row,” how can we express the “count wraps and clean up edges” algorithm in SQL? Brace yourself:

```select 1000 * sum(t1.wraps) - t2.start + o.bytes as total
from samples as o
inner join (
select cur.num, count(prev.num) as wraps
from samples as cur
left outer join samples as prev on cur.num = prev.num + 1
and cur.bytes < prev.bytes
group by cur.num
) as t1 on t1.num <= o.num
cross join (
select bytes as start from samples order by num limit 1
) as t2
where o.num = 7
group by o.num```

Anything I say about that query will probably make it harder to understand, so I'll just count on you reading it carefully. It may help to remove some of it so you can see the intermediate results:

```select sum(t1.wraps) as wraps, t2.start, o.bytes
from samples as o
inner join (
select cur.num, count(prev.num) as wraps
from samples as cur
left outer join samples as prev on cur.num = prev.num + 1
and cur.bytes < prev.bytes
group by cur.num
) as t1 on t1.num <= o.num
cross join (
select bytes as start from samples order by num limit 1
) as t2
group by o.num;
+-------+-------+-------+
| wraps | start | bytes |
+-------+-------+-------+
|     0 |   100 |   100 |
|     0 |   100 |   900 |
|     1 |   100 |   230 |
|     1 |   100 |   700 |
|     1 |   100 |   982 |
|     2 |   100 |   163 |
|     2 |   100 |   600 |
+-------+-------+-------+```

You can also write it as a correlated subquery, instead of a subquery in the `FROM` clause:

```select 1000 * (
select count(*) from samples as cur
inner join samples as prev on cur.num = prev.num + 1
and cur.bytes < prev.bytes
where cur.num <= samples.num
)
- (select bytes from samples order by num limit 1)
+ samples.bytes as total
from samples
where num = 7```

Both queries need `WHERE` clauses in multiple places to make them behave if you want anything other than the full range of data summed up. For example, if you want to sum over rows 3 through 6, the first query becomes

```select 1000 * sum(t1.wraps) - t2.start + o.bytes as total
from samples as o
inner join (
select cur.num, count(prev.num) as wraps
from samples as cur
left outer join samples as prev on cur.num = prev.num + 1
and cur.bytes < prev.bytes
where cur.num > 3
group by cur.num
) as t1 on t1.num <= o.num
cross join (
select bytes as start from samples where num >= 3 order by num limit 1
) as t2
where o.num = 6
group by o.num```

The problem with both queries is the `<=` predicate, which turns them into O(n2) algorithms. They're essentially a cross-join. Plus, they're hard to understand. It turns out that the simplest method by hand is complicated in SQL.

### Method 2: Adjust when there's a wrap

The second method I showed above is more complex for humans, but it's actually simpler to do in SQL:

```select sum(
if (cur.bytes >= prev.bytes,
cur.bytes - prev.bytes,
cur.bytes - prev.bytes + 1000)) as total
from samples as cur
inner join samples as prev on cur.num = prev.num + 1
-- optional WHERE clause for choosing start/end:
-- where cur.num > 3 and cur.num <= 6```

A slightly more compact way to write this is

```select sum(
cur.bytes - prev.bytes + if(cur.bytes >= prev.bytes, 0, 1000)) as total
from samples as cur
inner join samples as prev on cur.num = prev.num + 1
-- where cur.num > 3 and cur.num <= 6```

This query is both simpler and more efficient than the first method I showed. If your platform doesn't support `IF()`, use a `CASE` statement.

### Method 3: do it with user-variables in MySQL

It's possible to do even better than the simple join technique on MySQL. Using some MySQL-specific tricks, you can make this query a once-through, low-cost algorithm, much the way you might do it by hand or in a programming language that supports iteration. If you want to know how this works, and why the query has to be written in such a contorted way, read my article on advanced user-variable techniques in MySQL.

```set @last_bytes := null;

select sum(greatest(
if(bytes >= @last_bytes,
bytes - @last_bytes,
coalesce(bytes + 1000 - @last_bytes, 0)),
least(0, @last_bytes := bytes)
)) as bytes
from samples order by num;```

This is a bit trickier to write than some of the other user-variable examples I've shown, because you can't use `@last_bytes is null` in any `IF()` or `CASE` statement. If you do, the query optimizer will look at `@last_bytes` at compile time, see that the statement can be optimized out and replaced with a constant, and your query will not work as you expect it to.

### Summary

In this article I've shown you three methods to do SQL math on a counter that wraps around to zero when it reaches a limit. I showed them to you in order of increasing efficiency, but the second method is both the simplest and the most platform-independent (and probably the most sane).

Can you think of any other ways to do this, or any other uses for these kinds of techniques? Write into the comments!

Written by Baron Schwartz

February 19th, 2007 at 8:42 am

Posted in Uncategorized

Tagged with , ,

### 4 Responses to 'How to subtract in SQL over samples that wrap'

1. This is particularly useful for me, as I take some OS and MySQL statistics every 5 minutes, and MySQL has counters that reset upon restart as well as roll over when they pass a certain number. So it’s useful to be able to have this — in case anyone is wondering, “why would I use this”?

Sheeri

19 Feb 07 at 6:19 pm

2. I don’t guess I’m really getting this. What’s to say a value can’t be wrapped more than once? Also, what would cause it to wrap at 900 when the upper limit is 1000?

Bill Minton

20 Feb 07 at 10:30 am

3. Bill, thanks for writing in. It absolutely could wrap more than once. However, in real life, most things I know of (like traffic through a network interface) wrap at some large number, such as the size of an unsigned integer. If you sample it often enough, you should be able to get a fairly smooth curve and very low probability of two wraps in one. For example, if you know your counter increases at 1000 per second and you sample every second, there’s no way two wraps could sneak past you.

The value didn’t wrap at 900; I sampled it at 900, it increased to 1000 and back to 230 before I sampled it again.

Xaprb

20 Feb 07 at 4:48 pm

4. Oh ok, you’re sampling it. I was trying to figure out why it would write out a value lower than it’s threshold.

Now I’ll have to re-read the article with a little better understanding. :)

Bill Minton

20 Feb 07 at 5:08 pm