It’s tax season, and this year I decided to export my GnuCash data into a MySQL database for analysis. This article is about that process, including source code for the export script and a couple of simple queries.
The source code is available on GitHub.
GnuCash is a powerful double-entry accounting system for managing personal and/or business finances. I’ve used it since about 2002, and find it adequate for my needs. However, the reporting is widely acknowledged to be one of its weakest points. Reporting is implemented in the Scheme programming language, and if the included reports aren’t enough, most people probably won’t be able to write their own. Personally, I find it much easier (for certain reports) to write SQL queries instead. With that in mind, I set out to write a little script that will insert my GnuCash data into a MySQL database.
It’s actually a fairly simple task; the GnuCash developers chose XML as the file format, so the data is easily accessible from other programs. I decided to export only the account structure and transactions, leaving alone the business features such as customers and invoices.
Before I started, though, I spent some time investigating the built-in PostgreSQL backend. The default data format is XML, but the GnuCash developers also have support for storing the data in a PostgreSQL database. Unfortunately, I couldn’t get it to work. At least in version 1.8.11, it seems to be broken. In fact, the GnuCash website and developer mailing lists indicate this is a low priority for right now, as they are working towards a new version which uses an updated graphical toolkit (GTK+). Apparently the source is high-quality, but a bit out of date since the developers plan to revisit that once the new version is ready.
Satisfied that I’m not reinventing the wheel, I went ahead with a script to export the data myself. Since the XML format is really straightforward, I decided to do the simplest thing I could get to work. Perl seemed like a good choice. I am familiar with expat, and Perl has an expat module, so that was also a natural choice for the XML parsing. The basic idea is to push the data through the script like drinking from a firehose, and when it detects certain things – the start of an element, some character data, the end of an element – call a function to handle the data. Depending on what the data is (what element I’m currently examining, etc) I either store it for later reference, or push it into the database. I also automated creating the necessary table structure in the database.
Access to a MySQL database is required. The script uses tables called account, transaction and split. The DDL used to create the tables is at the end of the Perl script file. I used InnoDB tables so I’d have transaction support.
The script requires these Perl modules: DBI, Term::ProgressBar, and XML::Parser::Expat. You can run it with the
--help option to see how to run it.
These queries assume precision decimal math. Versions of MySQL less than 5.0 use imprecise math. I wrote an article on MySQL and decimal math. If you have any issues with these queries, it may help to consult that article.
This query finds all unbalanced non-equity transactions by summing the splits:
select s.amount, a.name, t.description, t.posted from account as a inner join ( select transaction, sum(amount) as amount, max(account) as account from split group by transaction having sum(amount) <> 0 ) as s on s.account = a.id inner join transaction as t on t.id = s.transaction where a.type <> 'EQUITY'
This query sums all expenses for 2005 by month and account:
select date_format(posted, '%Y-%m') as month, name, sum(amount) as amount from transaction as t inner join split as s on s.transaction = t.id inner join ( select id, name from account where type='EXPENSE' ) as a on a.id = s.account where year(posted) = 2005 group by date_format(posted, '%Y-%m'), name order by date_format(posted, '%Y-%m'), name;
This query finds average monthly expenditures by account since January 2005:
select @num_months := count(distinct date_format(posted, '%Y-%m')) from transaction where posted >= '2005-01-01'; select cast(sum(amount) / @num_months as decimal(8,2)) as 'Average monthly amount', concat(coalesce(grandparent_name, ''), if(grandparent_name is null, '', ' > '), coalesce(parent_name, ''), if(parent_name is null, '', ' > '), name) as name from ( select date_format(posted, '%Y-%m') as month, a.name, aa.name as parent_name, aaa.name as grandparent_name, sum(amount) as amount from transaction as t inner join split as s on s.transaction = t.id inner join ( select id, name, parent from account where type='EXPENSE' ) as a on a.id = s.account left outer join account as aa on aa.id = a.parent left outer join account as aaa on aaa.id = aa.parent where posted >= '2005-01-01' group by date_format(posted, '%Y-%m'), a.name ) as x group by name order by name;