Xaprb

Stay curious!

Four types of database abstraction layers

with 15 comments

Quite a few people have chimed in on a recent discussion about PHP, MySQL, database abstraction layers, and performance. I think enough viewpoints have been covered that I don’t need to comment, but one question I don’t see answered is “what are the qualities of a good SQL abstraction layer?” I think it’s a very interesting — and complicated — question. As it turns out, the term has several meanings, and I think it’s important to understand them.

I started drafting this article in February, but put it aside until the recent spate of articles prompted me to finish it. Here are links to some of those articles:

This issue has been discussed before, too. For example,

That explains why I’m not going to jump into the fray with my thoughts on the developerWorks article; everything I’d say has been said before. I just want to see if I can clarify what people mean by “database abstraction layer,” and lay out some ways to think about what’s good and bad in such a system.

Types of abstraction layers

People sometimes say “SQL abstraction layer” or “database interface” fairly loosely, assuming everyone knows what they mean. Not so — I’ve seen at least four distinct meanings in common usage:

  1. A software library to connect to a database server and issue queries, fetch results etc.
  2. A software library to present a common API to different database servers.
  3. A software library to automatically generate portable SQL queries.
  4. A software library to map Object-Oriented Programming to a relational database (Object-Relational Mapping, or ORM)

Most libraries will also provide related functionality such as escaping quotes for preventing SQL injection attacks, getting server status, controlling transactions, and so forth.

Each type of interface usually builds upon the types that precede it in my numbering scheme. Each type has different goals, which you have to understand before you can decide what criteria to use when measuring goodness or badness.

Type 1: Libraries that provide access to a database

Libraries that connect to specific database software, issue queries, and return results are generally written at a fairly low level, and their interfaces usually map directly to the specific server software they’re written for. For example, the PHP mysql_ functions are clearly just hooks into the MySQL drivers — without any real abstraction as a high-level programmer would think of it.

Type 1 software doesn’t really have a goal, except enabling access from the programming language to the database.

Type 2: Libraries that present a common interface to different server software

Type 2 software does much the same as Type 1, but it abstracts away the guts of Type 1 software so every database system can be accessed with the same functions. This is what Jeremy means when he says

I use a revolutionary new programming technique. Instead of littering my code with those calls, I put my core data access layer into a library–a separate piece of reusable code that I can include in various parts of my application and… reuse!

Perl’s DBI is a good example of Type 2 software. What is DBI, and what are its goals? From the homepage:

The DBI is the standard database interface module for Perl. It defines a set of methods, variables and conventions that provide a consistent database interface independent of the actual database being used.

In other words, you don’t have to look at documentation to know “what function fetches a row from my results if I’m connected to Firebird instead of MySQL?” You learn DBI, and that’s it. PHP’s PDO is similar too. I think this is what Jeremy means when he talks about a “Neutral API,” though his article doesn’t really make that very clear.

Type 2 software’s goal is presenting your code a common API for different database systems.

Type 3: Libraries that write portable SQL

The third type of software tries to abstract away the differences in SQL dialect between different database systems. For example, in some databases, you insert a row by the following SQL:

insert into table(col, col, col) values (val, val, val)

In others, you may do this:

insert into table set col = val, col = val, col = val

Type 3 software wants to help you avoid writing SQL so you can express in your code what you want the SQL to do, and let the abstraction layer sort out how to tell the database server to do it. This is typically accomplished with a non-SQL interface, such as in PHP’s PEAR::MDB2 package. From the documentation:

It provides a common API for all support RDBMS. The main difference to most other database abstraction packages is that MDB2 goes much further to ensure portability. Among other things MDB2 features:

  • An OO-style query API

The emphasis is mine. Type 3 software writes the queries for you behind the scenes. You don’t write SQL. Programmers who advocate Type 3 software consider this a Good Thing, because they believe portable SQL will prevent vendor lock-in:

… and when they decide to use another DBMS instead of MySQL (and they undoubtedly will at some point), the conversion will be painless.

Again, Type 2 hides API differences, but Type 3 goes further and hides SQL differences as well.

Type 3 software’s goal is complete SQL portability between all supported systems.

Type 4: Object-relational mapping software

The fourth type of software maps database objects to code objects. This is called ORM (Object-Relational Mapping). The typical mapping is that a database table is a class in an object-oriented language, and a row in the table is an instance of the class. Again in Perl, Class::DBI is an example of this. The paradigm of having one object per row in the table breaks out of the relational model where operations are performed on sets, not individual rows, but it can be handy for some uses.

Type 4 software writes the SQL for CRUD operations for you behind the scenes, so in that sense it is similar to Type 3 software. It also relies on Type 2 facilities so you can treat a row as an object everywhere, whether it’s in Oracle or SQL Server.

Type 4 software’s goal is to enable treating database rows as objects in your code.

What makes a good database abstraction layer?

I’ve shown there are at least four completely different types of “database abstraction layer,” and they have very different goals. It makes sense that “goodness” should be measured by different criteria as well, right? In fact, I think there should be some criteria in common, and some will be different.

The common criteria should be — at a minimum — speed/efficiency, correctness, good documentation, and platform portability (by which I mean, the system should compile and run on various hardware and operating system platforms).

Beyond that, each type will have different criteria:

  1. Type 1’s criteria should be just the basics I named above.
  2. Type 2’s criteria should be an interface that presents an adequate set of functionality for every database server, in a way that is “intuitive” or “elegant,” whatever that means in the given language. It should be a thin wrapper around Type 1 software. It also needs to allow invoking database-specific operations. In other words, common functionality ought to be presented in a common form, but database-specific operations should never be taken away from the programmer.
  3. Type 3’s criteria should be that the auto-generated code is truly portable among databases, and it should result in writing much less code in your application. It, like Type 2, should also allow the programmer to invoke database-specific operations when needed. Since it is designed as a high-level abstraction away from the underlying SQL, it should also be “elegant.”
  4. Type 4 software should hide the fact that an object is really a row in the database. The SQL it generates to hide this should be completely database-independent. Since it, like Type 3, is a convenience for the programmer, it ought to be very “elegant,” and should provide lots of conveniences, such as knowing when a column is a foreign key and allowing navigation between a row and its related rows in the foreign key table in an object-oriented fashion.

My opinions

Without getting into the discussions I mentioned above, I definitely have some opinions on what’s good and bad about various database access software.

First, I think Type 1 software is too low-level for most application code, as others have said. The only reason I’d use it is if I really wanted raw access to a specific database, and couldn’t accept any performance overhead. In every circumstance I can think of, it would be much better from a cost, coding and maintenance standpoint to call some Type 2 software and let it re-dispatch the call (remember, I’m assuming the Type 2 software should be a very thin wrapper around Type 1).

Regarding Type 2 software, some of the systems I’ve seen are definitely better than others. For example, I don’t really care for Perl’s DBI. I know that’s heresy, but I think it could have done a much better job separating some concepts out. DBI basically treats the SQL world as a collection of two types of things: connections and statements. It has no separate concept of a result set, for example. In my opinion, that makes it pretty hard to remember what you get when you invoke an operation, and where you ought to be fetching rows from. It just doesn’t make sense to me to fetch rows from a statement after executing it. Executing a statement should return a result, and I should fetch rows from the result, not the statement. I think DBI is awkward to use, and that’s after I’ve been programming with it for seven years! I still have to look up the documentation to figure out what I need to do, after all this time.

In fact, I think the best-designed system I’ve seen is the Microsoft .NET System.Data class library. In this library, every concept is represented separately. For example, you open a Connection, create a Command object, and when you execute the Command, you get back an appropriate object to do further work. For example, Command.ExecuteReader() returns an object from which you can fetch rows. I’m not a fan of Microsoft, as you probably know, but I find this design highly intuitive.

I generally dislike Type 3 software, and I think anyone who’s ever written serious applications that require real performance from a database system will probably agree, for fairly obvious reasons. For one thing, platform-independent SQL is a myth. Easy and/or painless conversion between different database systems is, too. It does not exist in the real world. And I don’t agree with those who assert it’s a common requirement, or that it would be a good thing. I think porting from one system to another is generally rare, and trying to write “portable” systems when there’s no clear need is going to cause nothing but problems. YAGNI.

I also dislike PHP’s PEAR libraries. For example, in PEAR::DB and PEAR::MDB2, you never know what type of object you’re going to get back from an operation! I said above a Type 3 SQL abstraction layer should be “elegant” and should result in writing less code. The PEAR error-handling paradigm is not elegant. Just one example: every action has to be followed by an if statement to check for an error. That’s what error handling is for. You shouldn’t have to write if statements — the software should throw an error when there’s an error! In my opinion, using these libraries results in ugly, complex code. And remember this: error-handling code is a huge risk anyway, because it’s so hard to test adequately. Anything that makes error-handling harder to do well is to be avoided.

So I dislike Type 3 software; I’m also not very excited by Type 4, though I don’t have such a strong aversion to it. It can be handy at times, but it can also promote a variety of truly bad practices (as Sheeri Kritzer wrote recently, over-using surrogate keys can be one such bad practice), and sometimes betrays a programmer’s partial or complete lack of understanding of relational systems — or worse yet, unwillingness or inability to think. I’ve been drafting an article on ORM for six months, so I’ll save my full rant for now, but ORM systems usually betray a shallow understanding of Object-Oriented Programming, too.

Summary

This has been a longish article, but that’s because these are complicated issues. I hope I’ve teased things into some sort of order that helps you understand how methods of database access differ, and how database access software differs accordingly. I’ve laid out my taxonomy of such methods, from raw access to Object-Relational Mapping systems, stated the goal of each type of system, and defined what I think are useful criteria to measure how good such a system is. Finally, I told you why I prefer Type 2 systems for most uses.

Written by Xaprb

August 13th, 2006 at 2:22 pm

Posted in Coding, SQL

15 Responses to 'Four types of database abstraction layers'

Subscribe to comments with RSS or TrackBack to 'Four types of database abstraction layers'.

  1. Just to clarify type 3 a bit more from my POV. MDB2 is more about giving people the ability to construct portable SQL statements themselves. So for example there is a method that will construct a portable call to concat a bunch of strings in SQL. So its more type 2.5 I guess. Type 3 would be more what I have done with the LiveUser_Admin storage layer that generates queries (SELECT’s and CRUD) from an array structure definition.

    Lukas

    13 Aug 06 at 5:55 pm

  2. The PEAR error-handling paradigm is not elegant. Just one example: every action has to be followed by an if statement to check for an error.

    This is not true:

    function my_error_handing_function($pear_error_object) {
       // do something with error object
    }
    PEAR::setErrorHandling(PEAR_ERROR_CALLBACK, 'my_error_handling_function');

    … and you can do away with all those if statements.

    jpk

    13 Aug 06 at 6:20 pm

  3. In reply to Lukas, certain statements (such as the CRUD behind an ORM) can be auto-generated in a portable way, but more complex queries, which really use the power of a relational database, can’t in my experience. It takes an expert with insight about the data, how it’s used, how it’s indexed, and so on (plus, when you get a 10-way join with 5 nested subqueries, it’s simpler to write it in SQL than via OO methods). You query a table with 40 million rows differently from one with 40 rows, even if they have the same schema. Generating an INSERT from arrays is perhaps the easiest usage. This is where I tend to favor ORM-like systems… and when used to good advantage, they remove the need to write simple queries, and let a programmer focus on the more complex ones.

    This reminds me of my last employer, where anything NOT in a stored procedure was verboten. That was a terrible policy. I ended up writing and maintaining almost a hundred silly “select * from table where blah = ?” queries in stored procedures, in a system that had only a couple dozen complex queries. If I’d been able to auto-generate those, I’d have cut about 10% of my application’s code and had fewer bugs resulting from schema changes without changes in the corresponding code.

    To jpk, I never knew about that. Perhaps then at least part of my dislike of PEAR could be remedied by fixing the documentation, which always shows error checking via if statements!

    Xaprb

    13 Aug 06 at 6:42 pm

  4. On further thought, still in reply to Lukas, I should point out that like any categorization scheme, mine is arbitrary and imperfect. There’s definitely an in-between for any library. I don’t mean to pigeon-hole any specific code library.

    Xaprb

    13 Aug 06 at 6:47 pm

  5. [...] Found this article through planet MySQL. In it he goes over 4 different “types” of abstraction layers that typically are collectively called a “database abstraction layer” – though they are very different from one another. I found it an interesting read. [...]

  6. What should also be mentioned when talking about layers like these: They help you in the common case. So you probably are not writing 10-way joins most of the time. You are also not writing code that will have horrible performance if you stick to portable ANSI SQL. So these layers help you get done faster and in a more portable fashion for the common case.

    But obviously even if you use a DBAL you can put in some code that is RDBMS specific, or you can hand write some joins even if you use an ORM for those extreme cases. The result is that you get fast performance, fast development time and you minimize the effort required to port to a different product.

    Lukas

    14 Aug 06 at 2:33 am

  7. You seems to have missed type 1.5, that is what I typically use for my application :)

    Meaning you still keep interface like mysqli but extend it with goodness needed for your own application. In my case I extend it with profiling logging, debugging, handling slave failure etc. You also can extend it to do lazy connecting if you want to.

    Type 2 could be good but in PHP there was no fast uniform access for all databases for years. We’ll see how PDO goes. Also I surely do not care about code working with other databases – there would be many changes needed If I decide to do a switch as I’m actively using MySQL only feature. So I would rather adjust my isolated database access code in this case.

    One more good thing about such approach – you can plug it in into third party code very easily, Have got application which uses MySQLi object ? And it gets trivial to make this code go via your abstraction library.

    Peter Zaitsev

    14 Aug 06 at 4:39 am

  8. Hi Lukas,

    You are also not writing code that will have horrible performance if you stick to portable ANSI SQL.

    That is not true. In every system I’ve worked with, there are examples of ANSI standard SQL queries — and simple ones at that — which are terribly optimized. This is why I won’t use auto-generated code for anything beyond a one-row select with a simple WHERE clause, and even then only if I know there’s no way it will burn me.

    I agree that they help you in the common case, if your common case is those one-row selects. As I mentioned at my previous job, there was about a 4-to-1 ratio of those in one specific system. However, I’ve written other systems where the “common case” is between 100 and 5,000 lines of SQL. I’ve done more systems like that than systems with one-row selects.

    As for “you minimize the effort required to port to a different product,” again that’s not a benefit for anything I have ever worked on. Telling my bosses we should make it easy to switch from SQL Server to MySQL if we ever wanted to would have been hilarious. The company ran on SQL Server. They ported from a UNIX system to Windows ten or fifteen years ago, when they probably had 5% of their current codebase, and it took many, many years. To port to anything else now would be completely impossible. That may sound like an overstatement, but if you saw how huge the codebase is, you’d agree. The portability argument in a company like that is completely irrelevant. It’s an enormous undertaking even to upgrade versions of SQL Server and make sure things don’t break. Nobody wants to hear about anything worse than that.

    Xaprb

    14 Aug 06 at 6:58 am

  9. Peter, I wrote my own Type 2 system for PHP and have used it for many years, since I was so averse to PEAR. I added some of those features into it when needed, so I see what you are saying.

    I enjoy my Type 2 (maybe 2.5 because it does have a feature to auto-generate an INSERT/UPDATE) system because it makes more sense for me to code with. It works the way I think. I’ve never tested, but I think it has barely any overhead. But it also only connects to MySQL. I never had to connect to anything else, so I only wrote one back-end. If I were working with a system that I wanted to support several different databases, I’d write the new back-end, and then ship the product with two sets of queries.

    Xaprb

    14 Aug 06 at 7:07 am

  10. I agree with Peter Z on this one, and I think the recent SqlConnection class I posted in the CacheEngine article is an example of a Type “1.5″. Nothing you don’t need, real simple, lazy loading… It’s worked fast and true for me for years with only a few modifications needed for certain projects at certain times… Also, over the years, I’ve used all of the different types of DBAL detailed above. I’ve found, just through doing this stuff a while, that my personal taste is the Type 2 and below. That doesn’t mean that the others aren’t good. I think a lot of it often comes down to personal preference and what is more important to you, from a feature perspective…

    Jay Pipes

    14 Aug 06 at 10:30 am

  11. Funny you mention it because mine is called SqlConnection too :-)

    Maybe I’ll get onto a train of thought here and actually post a couple more things I have drafted along these same lines — my own code, my ORM article, and some neat things ORM-ish systems can actually be great for, such as an in-database, single-query ACL system.

    Xaprb

    14 Aug 06 at 11:06 am

  12. [...] For related reading, I recently read an article discussing four types of data abstraction layers, ranging from simple to complex, and points out their similarities and differences. For those interested in learning more about different data abstraction layers, this article is a good read. It also has several links to other related articles, so there’s a lot of reading to do on the subject! [...]

  13. This is really sharp. Looking forward to the ORM article. I think howere, there may be a contingent defense of type 3 … it is a good thing for writing code that inovled schlepping data from one place to another. After all, this kind of code tends not to neeed maximum efficiency and the simple fact that you are writing code to move data from one place to another means that you are functioning in a heterogenous data environment. In this situation, DBIx:Abstract is a real boon.

    Daniel Lathrop

    20 Oct 06 at 7:08 pm

  14. Was the ORM article ever published?

    Ed

    29 Jun 09 at 2:37 am

  15. I am not sure, but I don’t see it among my drafts. So I either published it (and can’t find it) or abandoned it. Maybe I was thinking of http://www.xaprb.com/blog/2008/10/26/the-power-of-a-good-sql-naming-convention/

    Xaprb

    29 Jun 09 at 7:49 am

Leave a Reply