Xaprb

Stay curious!

Archive for December, 2009

A review of Pentaho Solutions by Roland Bouman and Jos van Dongen

with one comment

Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL, by Roland Bouman and Jos van Dongen (Wiley, 2009). About 570 pages. (Here’s a link to the publisher’s site.)

The book is big in part because it’s about a GUI tool, so there is the requisite number of screenshots (but not too many). It is structured into four parts, each on a different topic.

The first part is four chapters on getting started with Pentaho: from a quick-start through installing, configuring, and understanding the Pentaho BI Stack. Pentaho is a complex suite of tools, and there’s a handy architecture diagram to help you grok what the parts are and how they fit together. You’ll learn about topics such as Mondrian and configuring the database connection pool.

The second part is a primer on dimensional modeling and DW design. It uses a sample database that the authors developed for the book, which you can download from the publisher’s website. You’ll learn about star schemas and data marts. You can skip this part if you’re familiar with BI concepts in general, and just want to learn how Pentaho implements them.

Part three is about Pentaho data integration. The first chapter is a primer on integration, which again you’ll be able to skip if you know your stuff. Then you’ll walk through topics such as generating dimensions, and designing and deploying data integration solutions with Kettle and Spoon.

Part four is about designing and building BI applications with Pentaho: learning about its metadata layer; using the reporting tools; scheduling, subscriptions, and bursting; OLAP; data mining; and building dashboards. This is about half the book, really. There’s a lot to it — it’s all about how to take a generic and flexible suite of tools and get something specific and useful out of it. If you’ve ever done that, you’ll know why this could occupy half a book. This isn’t a simple suite of tools that only does one thing well.

In the end, this is a good beginner-to-intermediate book for people who want to learn about data warehousing, business intelligence, Pentaho, or all of the above. If you don’t know anything about these topics, you’ll find the entire book quite useful. If you know a lot about BI and DW, you’ll probably get the most out of the Pentaho-specific bits. On the other hand, people who already have an advanced level of proficiency with Pentaho will probably know much of what’s in this book, and those seeking to build advanced solutions presumably know a lot about general BI concepts, too. So this is probably not the book for you if you already know what you’re doing with Pentaho.

Proprietary BI systems cost at least an arm and a leg, and possibly more. That’s why open-source BI is such a hot topic. If you’re looking to get acquainted with Pentaho, I think this is an excellent book — that’s what I got it for, and I wasn’t disappointed. Now if only I could find a similar book for Jaspersoft.

Written by Xaprb

December 13th, 2009 at 11:11 pm

Version 1.1.5 of improved Cacti templates released

with 4 comments

I’ve released version 1.1.5 of my improved Cacti templates for MySQL and other components of a LAMP application. This is a pure bug-fix release. One of the bug fixes prevents spikes in graphs, but requires you to rebuild your RRD files. There are upgrade instructions on the project wiki for this and all releases. Use the project issue tracker to view and report issues, and use the project mailing list to discuss the templates and scripts.

The full changelog follows:

2009-12-13: version 1.1.5

  * Support for getting slave lag via mk-heartbeat was broken (issue 87).
  * The memcached stats command hung because it lacked "quit" (issue 65).
  * The COUNTER data type caused spikes; switched to DERIVE instead (issue 41).
  * LOCK WAIT in an InnoDB transaction could cause an error (issue 91).
  * The cache file name didn't include the MySQL port (issue 82).
  * Added the -q option to the SSH command to quell missing homedir warnings.
  * The --port option to the MySQL templates could not be null.
  * The log_bytes_flushed and log_bytes_written were renamed (issue 81).
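
For anyone wondering why the COUNTER-to-DERIVE change matters, here is a rough sketch of how the two RRD data source types treat a counter that goes backwards. It assumes the spikes came from counters resetting (for example, after a server restart), which is the usual cause but is my assumption here, not something the changelog entry spells out. The function names and the numbers are made up for illustration; this is a simplified model, not RRDtool's actual code.

```python
# Simplified model of how an RRD data source turns two successive
# readings into a per-second rate. Illustrative only; the function
# names are hypothetical and this is not RRDtool's implementation.

def counter_rate(prev, curr, step):
    """COUNTER: a decrease is assumed to be a 32-bit or 64-bit wrap."""
    delta = curr - prev
    if delta < 0:
        delta += 2**32              # assume a 32-bit counter wrapped
        if delta < 0:
            delta += 2**64 - 2**32  # otherwise assume a 64-bit wrap
    return delta / step


def derive_rate(prev, curr, step, minimum=0):
    """DERIVE with a minimum of 0: a decrease is stored as unknown (None)."""
    rate = (curr - prev) / step
    if minimum is not None and rate < minimum:
        return None
    return rate


# Hypothetical readings taken 300 seconds apart, with a restart in
# between that reset the counter from 1,000,000 to 500.
spike = counter_rate(1_000_000, 500, 300)  # enormous bogus rate
gap = derive_rate(1_000_000, 500, 300)     # None, i.e. a gap in the graph
```

With COUNTER, the reset looks like a wrap and produces a rate in the millions per second, which is the spike. With DERIVE and a minimum of 0, the same pair of readings becomes an unknown value, so the graph shows a short gap instead.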

Written by Xaprb

December 13th, 2009 at 9:28 pm

Posted in PHP, SQL, Sys Admin

InnoDB is a NoSQL database

with 9 comments

As long as the whole world is chasing this meaningless “NoSQL” buzzword, we should recognize that InnoDB is usable as an embedded database without an SQL interface. Hence, it is as much of a NoSQL database as anything else labeled with that term. And I might add, it is fast, reliable, and extremely well-tested in the real world. How many NoSQL databases have protection against partial page writes, for example?

It so happens that you can slap an SQL front-end on it, if you want: MySQL.

Written by Xaprb

December 13th, 2009 at 12:08 pm

Posted in SQL
