Xaprb

Stay curious!

Archive for August, 2009

OpenSQL Camp events in 2009

without comments

There are two OpenSQL Camp events in 2009. This is a conference about any and all open-source databases. The slogan of the inaugural event was “of, by, and for the community.”

The first of this year’s events just ended last weekend in Germany, and I’m really sad that neither I nor anyone else from Percona could attend. But fortunately the amazing crew of organizers has already posted videos of the sessions!

The other is upcoming in November in Portland, Oregon USA. I am going to try to attend this one. I encourage you to not only attend, but to participate in planning this event. Thus far in its short life, the OpenSQL Camp conference series has been entirely a grassroots effort. You can be part of that.

I also see that the call for proposals for PostgreSQL West 2009 is still open for a few more days. I just wish I could attend all these conferences. Especially if you’re near the west coast of the USA, I encourage you to attend this event — I went to East 2008 and East 2009, and they were really worth it.

Written by Xaprb

August 29th, 2009 at 9:58 pm

How to find per-process I/O statistics on Linux

with 22 comments

Newer Linux kernels have per-process I/O accounting and you can use the iotop tool to find out what’s performing I/O, but in many cases I’m trying to find the source of an I/O problem in an older kernel. I found sort of a hack-ish way to do that today, while trying to figure out why a system was basically unresponsive.

I found a post on Stack Overflow that showed a way you can get per process I/O statistics from the kernel even in older kernels. I adapted this to my needs, and wrote a little script.

Here’s how you use it. First, get it:

wget http://aspersa.googlecode.com/svn/trunk/iodump

Then turn on kernel messages about I/O:

echo 1 > /proc/sys/vm/block_dump

This makes the kernel start writing messages about every I/O operation that takes place. Now all you have to do is get those messages and feed them into my script:

while true; do sleep 1; dmesg -c; done | perl iodump

Wait a little while, then cancel the script. The results should look something like the following:

root@kanga:~# while true; do sleep 1; dmesg -c; done | perl iodump
^C# Caught SIGINT.
TASK                   PID      TOTAL       READ      WRITE      DIRTY DEVICES
firefox               4450       4538        251       4287          0 sda4, sda3
kjournald             2100        551          0        551          0 sda4
firefox              28452        185        185          0          0 sda4
kjournald              782         59          0         59          0 sda3
pdflush                 31         30          0         30          0 sda4, sda3
syslogd               2485          2          0          2          0 sda3
firefox              28414          2          2          0          0 sda4, sda3
firefox              28413          1          1          0          0 sda4
firefox              28410          1          1          0          0 sda4
firefox              28307          1          1          0          0 sda4
firefox              28451          1          1          0          0 sda4

I deliberately generated a bunch of I/O by deleting my Firefox history and cache.

Got any better ideas, warnings, etc? Post them in the comments.

Written by Xaprb

August 23rd, 2009 at 1:26 am

A Review of Beginning Database Design by Clare Churcher

with 5 comments

Beginning Database Design: From Novice to Professional

Beginning Database Design: From Novice to Professional

Beginning Database Design: From Novice to Professional. By Clare Churcher, Apress, 2007. Page count: 230 pages. (Here’s a link to the publisher’s site.)

My wife bought a copy of this book, and recently I took it off her bookshelf to give it a read myself.

I found the book very lucid and readable. The author does not drag us through a bunch of formalisms, nor does she attempt to force the book to be readable through the use of comics, pop-culture references, or other artificial devices. Instead, she draws on her real-life experience helping people design databases, and presents several examples that nearly anyone will understand without effort. She doesn’t rely on hard-to-understand examples such as an employee-supervisor database, which is badly overused and really not that clear anyway in my opinion.

The book begins with a guided tour of the development process, laying the foundation for understanding relational database design and approaching some topics such as initial requirements and use cases. It doesn’t spend long here, before moving on to understanding and developing a data model based on reality. Along the way, the author tackles many topics such as relationships, techniques for spotting areas where there might be ambiguities, how to explore those further and arrive at hidden requirements, and so forth.

The examples strike me as just realistic enough, just complicated enough, and just simple enough to be understandable to everyone. For example one, of the sample projects is about delivering takeout meals. Another is about sports teams. Others include taking samples of insect populations on farms. I think these examples really serve well to explain problems such as many to-many-relationships.

This book is one of the few in which I have seen chapter summaries used well. Most authors simply do not know how to summarize their own work and distill it into something that the reader will actually be able to use as a reference. This author certainly does. Her summaries are very clear and concise. They do not begin with “in this chapter, we covered…”

The book trends towards object oriented design, which is probably a good thing, considering that this model seems most suited for representing real-world concepts with which non-technical people might be familiar. However, the book does not get into elaborate explanations of UML or object notation. It explains the bare necessities of object-oriented concepts — inheritance, association, attributes, and relationships. The author also warns people away from overusing things that novices tend to rely on too much (inheritance!). I can only agree, having seen people go through this process myself.

About halfway through the book, we enter the realm of a relational schema. Up until this point in the book, everything has been about representing real-world data in the simplest and most logical model possible. Now the book shows how to go from that model to relational schema. Each of the major concepts introduced earlier in the book is explained in terms of databases, but without being tied to any particular database. The summary in this chapter is particularly useful.

The next chapter is about normalization. Again, I appreciate how she gracefully sidesteps the tired old explanations of this topic, which always seem to be copied and pasted from one book to the next. She covers functional dependencies, first normal form, second normal form, third normal form, and more advanced normal forms. Including the summary, this chapter is less than 20 pages long, and it contains everything that it needs to, in my opinion.

Chapter 9 explains a little bit more about the intricacies of keys and constraints. This is one of the topics where I think other books really fall short. They either gloss over or skip this topic, or they approach it too formally.

After this, there is a brief introduction to how to write queries, which straddles the divide between relational and everyday terminology. Next up is a chapter on user interfaces, followed by a brief look at other ways to represent the data model, such as object-oriented databases. Personally, I feel that this section adds less value to the book, because 99% of people are going to be using relational databases. However, there is a very nice section on using Excel spreadsheets, which is probably even more useful than databases for a lot of people. It is certainly less often covered in the literature elsewhere.

There is a concluding chapter, which is only four pages long. All in all, the real content in this book is about 230 highly readable pages. I think this book hits the sweet spot for people who understand a particular problem domain very well, but don’t understand the technobabble that most of programmers subject them to. It should be extremely useful for professionals who have learned how to model things in Excel spreadsheets or in databases such as FileMaker. My guess is that it will help people to organize their thoughts and really understand things that they have only intuited. For those who don’t understand data modeling at all, the learning curve will probably be steeper, but I think this book holds up well as an intensive study guide.

Written by Xaprb

August 22nd, 2009 at 12:07 pm

Posted in PostgreSQL,Review,SQL