Archive for August, 2009
OpenSQL Camp events in 2009
There are two OpenSQL Camp events in 2009. This is a conference about any and all open-source databases. The slogan of the inaugural event was “of, by, and for the community.”
The first of this year’s events just ended last weekend in Germany, and I’m really sad that neither I nor anyone else from Percona could attend. But fortunately the amazing crew of organizers has already posted videos of the sessions!
The other is upcoming in November in Portland, Oregon USA. I am going to try to attend this one. I encourage you to not only attend, but to participate in planning this event. Thus far in its short life, the OpenSQL Camp conference series has been entirely a grassroots effort. You can be part of that.
I also see that the call for proposals for PostgreSQL West 2009 is still open for a few more days. I just wish I could attend all these conferences. Especially if you’re near the west coast of the USA, I encourage you to attend this event — I went to East 2008 and East 2009, and they were really worth it.
How to find per-process I/O statistics on Linux
Newer Linux kernels have per-process I/O accounting and you can use the iotop tool to find out what’s performing I/O, but in many cases I’m trying to find the source of an I/O problem in an older kernel. I found sort of a hack-ish way to do that today, while trying to figure out why a system was basically unresponsive.
I found a post on Stack Overflow that showed a way you can get per process I/O statistics from the kernel even in older kernels. I adapted this to my needs, and wrote a little script.
Here’s how you use it. First, get it:
wget http://aspersa.googlecode.com/svn/trunk/iodump
Then turn on kernel messages about I/O:
echo 1 > /proc/sys/vm/block_dump
This makes the kernel start writing messages about every I/O operation that takes place. Now all you have to do is get those messages and feed them into my script:
while true; do sleep 1; dmesg -c; done | perl iodump
Wait a little while, then cancel the script. The results should look something like the following:
root@kanga:~# while true; do sleep 1; dmesg -c; done | perl iodump ^C# Caught SIGINT. TASK PID TOTAL READ WRITE DIRTY DEVICES firefox 4450 4538 251 4287 0 sda4, sda3 kjournald 2100 551 0 551 0 sda4 firefox 28452 185 185 0 0 sda4 kjournald 782 59 0 59 0 sda3 pdflush 31 30 0 30 0 sda4, sda3 syslogd 2485 2 0 2 0 sda3 firefox 28414 2 2 0 0 sda4, sda3 firefox 28413 1 1 0 0 sda4 firefox 28410 1 1 0 0 sda4 firefox 28307 1 1 0 0 sda4 firefox 28451 1 1 0 0 sda4
I deliberately generated a bunch of I/O by deleting my Firefox history and cache.
Got any better ideas, warnings, etc? Post them in the comments.
A Review of Beginning Database Design by Clare Churcher
Beginning Database Design: From Novice to Professional. By Clare Churcher, Apress, 2007. Page count: 230 pages. (Here’s a link to the publisher’s site.)
My wife bought a copy of this book, and recently I took it off her bookshelf to give it a read myself.
I found the book very lucid and readable. The author does not drag us through a bunch of formalisms, nor does she attempt to force the book to be readable through the use of comics, pop-culture references, or other artificial devices. Instead, she draws on her real-life experience helping people design databases, and presents several examples that nearly anyone will understand without effort. She doesn’t rely on hard-to-understand examples such as an employee-supervisor database, which is badly overused and really not that clear anyway in my opinion.
The book begins with a guided tour of the development process, laying the foundation for understanding relational database design and approaching some topics such as initial requirements and use cases. It doesn’t spend long here, before moving on to understanding and developing a data model based on reality. Along the way, the author tackles many topics such as relationships, techniques for spotting areas where there might be ambiguities, how to explore those further and arrive at hidden requirements, and so forth.
The examples strike me as just realistic enough, just complicated enough, and just simple enough to be understandable to everyone. For example one, of the sample projects is about delivering takeout meals. Another is about sports teams. Others include taking samples of insect populations on farms. I think these examples really serve well to explain problems such as many to-many-relationships.
This book is one of the few in which I have seen chapter summaries used well. Most authors simply do not know how to summarize their own work and distill it into something that the reader will actually be able to use as a reference. This author certainly does. Her summaries are very clear and concise. They do not begin with “in this chapter, we covered…”
The book trends towards object oriented design, which is probably a good thing, considering that this model seems most suited for representing real-world concepts with which non-technical people might be familiar. However, the book does not get into elaborate explanations of UML or object notation. It explains the bare necessities of object-oriented concepts — inheritance, association, attributes, and relationships. The author also warns people away from overusing things that novices tend to rely on too much (inheritance!). I can only agree, having seen people go through this process myself.
About halfway through the book, we enter the realm of a relational schema. Up until this point in the book, everything has been about representing real-world data in the simplest and most logical model possible. Now the book shows how to go from that model to relational schema. Each of the major concepts introduced earlier in the book is explained in terms of databases, but without being tied to any particular database. The summary in this chapter is particularly useful.
The next chapter is about normalization. Again, I appreciate how she gracefully sidesteps the tired old explanations of this topic, which always seem to be copied and pasted from one book to the next. She covers functional dependencies, first normal form, second normal form, third normal form, and more advanced normal forms. Including the summary, this chapter is less than 20 pages long, and it contains everything that it needs to, in my opinion.
Chapter 9 explains a little bit more about the intricacies of keys and constraints. This is one of the topics where I think other books really fall short. They either gloss over or skip this topic, or they approach it too formally.
After this, there is a brief introduction to how to write queries, which straddles the divide between relational and everyday terminology. Next up is a chapter on user interfaces, followed by a brief look at other ways to represent the data model, such as object-oriented databases. Personally, I feel that this section adds less value to the book, because 99% of people are going to be using relational databases. However, there is a very nice section on using Excel spreadsheets, which is probably even more useful than databases for a lot of people. It is certainly less often covered in the literature elsewhere.
There is a concluding chapter, which is only four pages long. All in all, the real content in this book is about 230 highly readable pages. I think this book hits the sweet spot for people who understand a particular problem domain very well, but don’t understand the technobabble that most of programmers subject them to. It should be extremely useful for professionals who have learned how to model things in Excel spreadsheets or in databases such as FileMaker. My guess is that it will help people to organize their thoughts and really understand things that they have only intuited. For those who don’t understand data modeling at all, the learning curve will probably be steeper, but I think this book holds up well as an intensive study guide.






