Archive for the 'World Wide Web' Category

How to install and maintain multiple WordPress blogs easily

My wife has a site that needs two WordPress blog installations. The URLs differ by a subdirectory name. Both blogs need to be (URL-wise) subdirectories of /blog/. They need to be completely independent of each other, yet use the same custom theme. And there used to be just a single blog, which was not in a subdirectory; its permalinks must not break. (It has nice URLs with the date and title in them, not post ID-style URLs.) And because I’m the husband, I get to maintain it, so tack “easy to maintain” onto the requirements (it must be easy to upgrade WP in both blogs, for example). In this article I’ll show you how I did it with a single .htaccess file, a single copy of WordPress, a single MySQL database with a separate set of tables for each blog, and a single configuration file.

Fixing URLs

As I mentioned, there used to be a blog at /blog/ which must not break. Suppose this blog was about dogs and my wife has recently started blogging about cats. She wants two completely independent blogs: /blog/dogs/ and /blog/cats/. The old permalink structure, e.g. /blog/2006/03/01/dogs-are-great/, must now redirect to /blog/dogs/2006/03/01/dogs-are-great/. How to do this?

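I’m not a mod_rewrite wizard, but I figured there must be a way. And indeed there is: if an incoming URL isn’t already under /blog/dogs/ or /blog/cats/, it can be rewritten and redirected to the new URL. Matching on the leading path segment rather than anywhere in the URL matters, because an old permalink such as /blog/2006/03/01/dogs-are-great/ contains “dogs” in its title. Here’s the code, which goes in /blog/.htaccess:

RewriteBase /blog/
RewriteCond %{REQUEST_URI} !^/blog/(dogs|cats)(/|$)
RewriteRule ^(.*)$ http://www.furryfriends.org/blog/dogs/$1 [R=301,L]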

(By the way, the furryfriends thing is just an example, not the real site name).

So far, so good. That works just fine: when I access a URL without dogs or cats in it, it redirects me. But I need to do more: I need rewrite rules to match the date-and-title permalinks both blogs will use. I accomplish that like so:

RewriteCond %{REQUEST_URI} ^/blog/(dogs|cats)(/|$)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(dogs|cats)(/|$) /blog/$1/index.php [L]

This is basically the same thing WordPress usually does, but I’ve made it tolerate either dogs or cats and figure out which installation should get the request. The .htaccess file lives in /blog/, not inside /dogs/ or /cats/ where it would be hard to maintain (it would get wiped out with upgrades). I can see different ways of doing this, but this is the way I chose. So here’s the whole file:


RewriteEngine On

# Anything to the old address (e.g. /blog/foo/bar) goes to the new address
# (e.g. /blog/dogs/foo/bar).  The condition is anchored to the first path
# segment so that old slugs containing "dogs" or "cats" still redirect.
RewriteBase /blog/
RewriteCond %{REQUEST_URI} !^/blog/(dogs|cats)(/|$)
RewriteRule ^(.*)$ http://www.furryfriends.org/blog/dogs/$1 [R=301,L]

# If that fired, then we didn’t reach this code.  If we did, then this rule
# should do what a normal WP rule does.
RewriteCond %{REQUEST_URI} ^/blog/(dogs|cats)(/|$)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(dogs|cats)(/|$) /blog/$1/index.php [L]
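
If you want to reason about which requests end up where without poking at a live server, here is a rough model of the two rules as a plain JavaScript function. It is only a sketch, not Apache: the function name route and the exists callback (standing in for the -f and -d checks) are made up for illustration. It also assumes the "is this already a dogs or cats URL?" test is anchored to the first path segment after /blog/; a plain substring test would treat an old slug like dogs-are-great as already belonging to a blog.

```javascript
// Hypothetical model of the rules in /blog/.htaccess; this is not Apache itself.
// `exists` stands in for the -f / -d filesystem checks and is an assumption.
function route(requestUri, exists = () => false) {
  // Rule 1: anything not already under /blog/dogs/ or /blog/cats/
  // is redirected into the dogs blog, keeping the rest of the path.
  if (!/^\/blog\/(dogs|cats)(\/|$)/.test(requestUri)) {
    const rest = requestUri.replace(/^\/blog\/?/, '');
    return {
      action: 'redirect',
      to: 'http://www.furryfriends.org/blog/dogs/' + rest,
    };
  }
  // Rule 2: real files and directories are served untouched...
  if (exists(requestUri)) {
    return { action: 'serve', to: requestUri };
  }
  // ...and everything else is handed to that blog's index.php.
  const blog = requestUri.match(/^\/blog\/(dogs|cats)/)[1];
  return { action: 'rewrite', to: '/blog/' + blog + '/index.php' };
}

console.log(route('/blog/2006/03/01/dogs-are-great/'));
console.log(route('/blog/cats/2007/11/23/hello/'));
```

Feeding it a handful of old and new permalinks is a cheap way to spot a URL that would fall through the cracks before touching the real .htaccess file.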

Are there any better ways of doing this? I’m curious. Leave a comment if you know of one.

Fixing the maintenance headache

Installing two copies of WordPress and then customizing both is a pain. And it makes upgrades harder, too. I’d have to upgrade them both, fiddle with plugins (some of them are customized, too), and so on. Even backups would be more complicated. It would be all too easy to screw up and delete some data. There are just so many ways this is a bad idea.

It occurred to me that I could use a single copy and turn the dogs/ and cats/ subdirectories in the filesystem into symbolic links. (Windows users, you can stop reading now: this won’t work for you).

To make the blogs, the WordPress installation, and the custom blog theme all independent of each other, I created the following filesystem hierarchy:

blog/
   wordpress/
      2.3.2/
         [The usual WP files are here]
      wp-content/
         plugins/
         uploads/
         themes/
            my_custom_theme/

What I’ve done is separate the custom bits — the parts that don’t ship with WordPress — away from the files I want to upgrade when I upgrade WordPress. How will this work, though?

I’ll make symbolic links from the dogs/ and cats/ directories to the currently installed version of WordPress. So, from the blog/ directory, I type the following at the command line:

$ ln -s wordpress/2.3.2/ dogs
$ ln -s wordpress/2.3.2/ cats
$ cd wordpress/2.3.2/
$ rm -rf wp-content/
$ ln -s ../wp-content wp-content

The directory hierarchy now looks like this:

blog/
   cats/ -> wordpress/2.3.2/
   dogs/ -> wordpress/2.3.2/
   wordpress/
      2.3.2/
         [The usual WP files are here]
         wp-content/ -> ../wp-content
      wp-content/
         plugins/
         uploads/
         themes/
            my_custom_theme/

This is looking pretty good! There’s only one minor detail missing: because both blogs are running literally the same code via the magic of symlinks, each blog is trying to access the same database tables. I need to customize the WordPress configuration file, too. I’ll just give each installation a different table name prefix in wp-config.php:

$table_prefix  = strpos($_SERVER['REQUEST_URI'], '/blog/cats/') !== false ? 'wp_cats_' : 'wp_dogs_';

And voila, it works perfectly now. I accessed the two URLs, ran through the installation procedure twice, and have two completely independent blogs running the same code in the same database.

The upgrade procedure

So, this is all a little complicated, right? What if I’ve forgotten how I did it when I upgrade next time, or what if someone else does it instead of me? I wrote myself a little README file to fix this. Here’s what it says:

This is how to upgrade Lynn's blog.

The two blogs are actually using shared files, which are symlinked to make
it so there is only one copy of files.  You can't change the files in one
without changing them in the other.

The wp-content subdirectory is symlinked.

The wp-config file is customized so it will work in either blog:

$table_prefix  = strpos($_SERVER['REQUEST_URI'], '/blog/cats/') !== false ? 'wp_cats_' : 'wp_dogs_';

To upgrade, 

 1. Download the latest version and unpack it inside wordpress/ as 2.3.2/
    or whatever version it is.
 2. Then go into that directory.
 3. Remove the wp-content/ directory completely.
 4. Then symlink it like this: ln -s ../wp-content wp-content
 5. Now re-customize wp-config.php
 6. Go back to the blog/ directory.  rm dogs cats
 7. ln -s wordpress/2.3.2/ dogs
 8. ln -s wordpress/2.3.2/ cats

It’s still a manual process, but it should take me all of thirty seconds. I’m okay with that. As long as I remember there’s a README file, that is!


My apologies if Bad Behavior blocked you

To cut down on comment spam, I have Bad Behavior enabled on this blog, and there was a minor issue with the version I had, though in general it has been wonderful. My apologies if it blocked you. I didn’t get any email from folks saying they were blocked, but it blocked me! Apparently a fairly common complaint. I’ve upgraded now and I’m not seeing any more issues.

There’s a lot of hate mail towards Bad Behavior’s author because of this, so I want to try to cancel some of that out: this plugin has saved me many gray hairs. Thanks.


Why is Embarq hijacking my DNS?

Isn’t this the same thing that happened a few years ago with ICANN or Verisign or one of those big names? (strangely, I can’t find relevant search results about this!).

I clicked on my toolbar shortcut for Toggl and my Embarq DSL service redirected me to a search-results page instead of telling my browser the truth. This makes me mad. The core layers of the Internet are designed the way they are for a reason and I don’t want to “opt out” of a stupid DNS hijacking stunt I never opted into.

Here’s a screenshot of what happens when I type in any old non-existent (or, in Toggl’s case, timing-out) domain name.

[Screenshot: Embarq screwing with my DNS]

And here’s what happens when I do a DNS lookup:

baron@kanga:~$ dig www.toggl.com

; <<>> DiG 9.4.1-P1 <<>> www.toggl.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27795
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.toggl.com.                 IN      A

;; ANSWER SECTION:
www.toggl.com.          22      IN      A       66.199.249.106

;; Query time: 72 msec
;; SERVER: 208.33.159.39#53(208.33.159.39)
;; WHEN: Fri Nov 23 15:50:14 2007
;; MSG SIZE  rcvd: 47

baron@kanga:~$ ping www.toggl.com
PING www.toggl.com (66.199.249.106) 56(84) bytes of data.
64 bytes from 66-199-249-106.reverse.ezzi.net (66.199.249.106): icmp_seq=1 ttl=53 time=79.2 ms

Did I mention that this makes me mad? Time to get on the phone.

PS: it looks like Verizon is doing it too.


pair Networks is now carbon-neutral

I’m a big fan of pair Networks, my hosting company. Their service has been outstanding; the few times I’ve ever had a glitch with my shared hosting, they have been responsive beyond the call of duty and done whatever it takes to fix the issue. I use them to host not only a half-dozen of my own sites, but family and client sites as well, plus some other groups I’m involved with. It has been a uniformly excellent experience.

Now I see pair Networks has gone carbon-neutral too. While such labels can be abused, and I wouldn’t really trust this announcement from just anyone, I trust them. I’m happy to see them trying to reduce their environmental impact. Go pair, go!

Side note: pair is not the cheapest (and I’m accepting gifts if you feel the urge), but every other hosting provider I’ve heard people rave about for cheapness eventually ends up being a sore point — even the biggest in the industry — I’ll name no names. Sometimes there’s no way to know if someone is good without trying them for six months and seeing how they handle problems. I am also involved with enterprises that use Blue Ridge InternetWorks, who is also top-notch and employs a number of people I respect a lot.


JavaScript Number Formatting Library v1.3 released

Download Number Formatting Library

I’ve updated my JavaScript Number Formatting Library to version 1.3. This release adds the ability to customize how not-a-number (NaN), positive infinity and negative infinity are formatted. All you need to do is set the appropriate constant in Number.prototype:

  • Number.prototype.NaN
  • Number.prototype.posInfinity
  • Number.prototype.negInfinity

For more documentation, see the original article on JavaScript number formatting.
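
As a quick illustration, setting the three constants might look like the snippet below. The property names come from the list above; the idea that each one holds the string to display, and the particular strings chosen here, are my assumptions for the sake of the example, so check the library source before relying on them.

```javascript
// Property names are from the release notes above.
// Assumption: each property holds the text to show for that special value.
Number.prototype.NaN = 'n/a';
Number.prototype.posInfinity = '+∞';
Number.prototype.negInfinity = '-∞';
```

Set these once, after loading the library and before formatting any numbers.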


JavaScript formatting library update

This is a quick update on the state of my JavaScript date formatting libraries and date chooser, and JavaScript number formatting library. It’s been a while since I wrote them, and as you can tell my interests have turned to many other things, but they remain the best JavaScript formatting and parsing libraries I’ve seen.

I originally started this post in May of 2006, intending to use the libraries to demonstrate how HTML tables can contain multi-dimensional data, and use the seldom-used HTML elements like TFOOT to generate aggregate data about the table. This was going to be the follow-up to my tables and data with CSS post. I had a rough draft sketched out somewhere: a table full of numbers, dates, currencies and strings. A drop-down menu and a “format paintbrush” would let you reformat it all on the fly, and it would all be generated from semantic information attached to the table cells, not hard-coded into the page.

This was only practical because of the efficiency of my libraries; to reformat entire date regions in the table in real-time, for example, you’d need to parse the value as a date in one format, then reformat it for output in another. It was to be a showcase of how much efficiency matters for some things.

Tangent: I suppose it’s less important for people who aren’t still running 500MHz laptops these days, but efficiency really matters for me; a lot of these flashy sites these days simply take too much CPU for my little old computer to run well. I stubbornly resist getting a new computer because I cringe at the thought of the environmental cost, but I’m slowly breaking down; it’s gotten to the point my battery won’t charge, and Dell doesn’t even have a record of my service tag anymore. Spare parts for these things are long since unavailable.

Now I’m involved with quite different things, since I’m working more in programming and less in the Internet space. The good news is others keep reading and using all of my work — not just the recent work — which makes me happy. Just the other day Liran Tal wrote to tell me he’s using my Javascript libraries in the Daloradius project (check it out, it’s pretty cool). The date-parsing library found its way into some ExtJS tools that extend the YUI libraries, too.

And a few days ago someone sponsored an improvement to the number-formatting libraries.

Who knows — someday I may end up building some browser GUI systems again and use these. In the meantime it’s encouraging that they remain useful to people.

Note: This episode is pre-recorded. I’m taking a short hiatus from blogging and will respond to your comments when I return.


JavaScript number-formatting library updated

Download number-functions

I’ve released a new version of my powerful, flexible, efficient JavaScript number-formatting library, which is probably the best available. This release adds a fix for zero-padding negative numbers.

If you find bugs, please send me test cases I can use to reproduce and add to the unit test suite. One test per line, like “input”, “format”, “expected” is best. For example, this is a great test case:

-1, "#,#.00", "-1.00"

I can plug that directly into the unit test suite, run it, and if it gives back “-01.00” it will fail the test. This makes it much easier and more convenient for me to fix bugs.
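
To show why that one-line format is so convenient, here is a small sketch of how such a line could be turned into a test. This is not my actual test suite: parseTestCase, runTestCase and the formatter argument are hypothetical names, and the formatter is whatever function you want to put under test. The trick is simply that the line becomes a valid JSON array once you wrap it in brackets.

```javascript
// Parse one test-case line of the form: input, "format", "expected".
// Wrapping the line in [ ] turns it into a JSON array.
function parseTestCase(line) {
  const [input, format, expected] = JSON.parse('[' + line + ']');
  return { input, format, expected };
}

// Run one case against any formatter with the shape (number, format) -> string.
// `formatter` is a placeholder for the function under test.
function runTestCase(line, formatter) {
  const { input, format, expected } = parseTestCase(line);
  const actual = formatter(input, format);
  return { pass: actual === expected, actual, expected };
}

// A deliberately buggy stand-in formatter that reproduces the zero-padding bug.
const buggy = () => '-01.00';
console.log(runTestCase('-1, "#,#.00", "-1.00"', buggy));
```

A bug report written this way needs no translation on my end, which is the whole point.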

Sponsoring bug fixes wouldn’t hurt either ;-)


My unorthodox CAPTCHA blocked thousands of spam comments every week

I wrote a custom image-less CAPTCHA for my blog a while ago. I didn’t write it as a plugin, so I lost it when I upgraded WordPress a couple of weeks ago. Not having this protection was an eye-opening experience, and vindicated what I asserted in my original posts: a naive question-and-answer system is highly effective at stopping spammers, probably as effective as scrambled images. Read on for the details.

In my original article, I hypothesized that CAPTCHAs with scrambled images just make it hard for real people to use websites, and probably don’t provide any additional protection over less obnoxious methods. I thought there was probably a sweet spot at which humans don’t find the system intrusive, and yet it’s just a tiny bit too hard for most spammers to bother cracking it. After all, comment spammers are mostly targeting wide-open WordPress installations. Why work hard at the small fraction that resist comment spam when there are so many easy targets?

(Actually, knowing what I know about search engine optimization, I’d go after the hard-to-get ones myself if I wanted quality links, but the comment spam I get is clearly about quantity, not even an attempt to look like quality).

How much spam do I get?

My little system of multiple-choice questions such as “which of the following is blue? a) sky b) grass …” seemed to cut out the vast majority of comment spam, but I never quite knew how much until I took it away and replaced it with a default installation of WordPress 2.1. In the old system, I had to delete a comment or two a day from the moderation queue. Wanna guess how much spam I built up in a week with nothing but Akismet in the new installation? From Sunday night May 13th to the next Sunday night, I got over 1,800 spam comments.

What about Akismet?

“Ah,” you say, “but that’s really no problem. You say you had Akismet installed; it should catch most of them.” Yes, but it also catches valid comments, which I value highly and don’t want to throw away. I had to pore through the spam queue and find them. If you’ve ever tried that with 1,800 comments in the spam bucket — holy cow, that’s all but impossible. I had to log into my MySQL database at the command line and start nuking them with LIKE patterns just to get it down to something manageable. Even a couple dozen spam comments a day in the spam queue would push me over the edge. If I had to deal with thousands in the spam bucket, and dozens that weren’t caught by Akismet, I’d turn off comments.

I needed a challenge question just to stop the hemorrhaging. Instead of writing my own this time, I decided to try using a pre-built plugin. I chose the popular “did you pass math?” plugin. It is, like most WordPress plugins, not perfect — but it’s good enough. I’m down to about 15 spam comments a day in the moderation queue now. With Akismet helping, that becomes quite manageable.

Notice — and this surprised me — the “did you pass math” plugin lets through more spam than my custom solution. I’d bet dollars to donuts that’s because it’s both popular and not customized per-blog. My system was unique, so it makes sense that it worked better.

So much for the naysayers

There’s a lot of “wisdom” floating around the web (some of it in the comments on my earlier posts, showing me how easy it would be to bypass my custom solution) that says CAPTCHAs don’t work at all, and you should just use Bayesian filters and the like. I never believed it. Now I have proof. Was my system easy to break? Absolutely, and that’s why it wasn’t a hassle for real people to use. Did it work great despite its flaws? You bet.

I may re-write my solution as a plug-in at some point, if I get time. Till then, good enough is good enough, just as it always has been.


How to create input masks in HTML

Download HTML Input Mask

Note that this is not compatible with all browsers, has known problems and limitations, and I am not maintaining it or replying to requests for help. Thanks! (But also note that you are free to change and redistribute under the license terms, which you should read after downloading)

Have you ever wanted to apply an input mask to an HTML form field? Input masks are common in traditional GUI applications, but HTML has no such feature. This article introduces a library that adds input masks to form fields with unobtrusive JavaScript.

What’s an input mask?

View the Demo

Input masks are guides to help users enter data in the correct format. They typically do not actually validate data; they just ensure the right types of characters are entered in the right places. Typical uses are for dates, times, social security numbers, phone numbers, and credit card numbers. The user enters un-formatted input, and the mask takes care of adding dashes and other separators in appropriate places.

For example, in the United States most people use MM/DD/YY format to write dates. A well-written GUI application honors the user’s locale and creates an appropriate input mask, such as ##/##/##, for date entry. The user types the numbers, and the program inserts the slashes. If the user types something other than a number, that character is discarded, not entered into the field.

How to do this with JavaScript

There are several problems you need to solve to simulate this in a web browser. First things first: let’s state the requirements.

  1. Help the user avoid entering invalid characters.
  2. Automatically insert separators as the user types.
  3. Constrain the length of the input.

Second, let’s create a spec for the masking syntax. In Windows Forms programming, controls have a Mask property, and other GUI libraries have similar functionality. The full behavior of these masks is complex. For an example, see the MSDN documentation for masked edit controls. You can get a lot of that functionality with a simpler specification, though. The following will suffice for many uses:

  1. The mask only allows one type of character for the entire mask. For example, the mask can allow either all digits or all alphanumerics, but you can’t constrain one character to be a digit while letting other characters accept alphanumerics.
  2. The mask specifies the placeholders for input with spaces, and separators as non-spaces.

An example mask, then, has two parts: the format, which says which places can accept user input, and the type, which says what type of character can go in those places. We’ll see how to actually do this later.

The third problem is to unobtrusively attach the masking functionality to input fields, with gracefully degrading behavior if the browser doesn’t support it, and without adding a lot of markup to your forms to specify the mask format and type. This is easy, using the principles I laid out in an earlier article on using classes to specify data types. This technique is 100% appropriate because classes aren’t just hooks for CSS, they’re general-purpose processing information. This lets you easily specify a) which inputs get masks, and b) which type of mask they get.

How it works

To add masks to form fields, reference my library, then make the page’s load event fire the Xaprb.InputMask.setupElementMasks() function in my library. This will find all elements with the class input_mask, which specifies that the element should get a mask. Each element should also have a mask_??? class, where the ??? specifies which mask to attach. The library takes care of the rest.

By the way, this library depends on the Prototype library, so you will also need to reference that in your page. If you don’t, you won’t get an error, but nothing will happen.

The setup function iterates over the elements and connects a callback to the onkeypress event. The callback is created by another function. To decide which mask to apply, it does a regular expression match against the element’s className. If the element’s class is “input_mask mask_date_us“, the regular expression captures “date_us,” and looks up the date_us mask. Here’s how that is defined:

      date_us: {
         format: '  /  /    ',
         regex:  /\d/
      }

The format property is a string with spaces where input should go, and other characters get inserted automatically. The regex property is a regular expression that matches a valid character, in this case a digit.
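
The class-name lookup described above can be sketched in a few lines. This is an illustration rather than the library’s actual code: maskNameFromClass is a made-up name, and I’m assuming the mask name is whatever follows the mask_ prefix.

```javascript
// Pull the mask name out of a class attribute such as "input_mask mask_date_us".
// Returns null if the element isn't flagged with input_mask or names no mask.
function maskNameFromClass(className) {
  const classes = className.split(/\s+/);
  if (!classes.includes('input_mask')) return null;
  for (const c of classes) {
    const m = /^mask_(\w+)$/.exec(c);
    if (m) return m[1];
  }
  return null;
}

console.log(maskNameFromClass('input_mask mask_date_us'));
```

Because the information rides along in the class attribute, the form needs no extra markup, and a browser without JavaScript just sees an ordinary input.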

Here’s how the callback function works: when it fires, it checks each character in the form field’s value. If there’s a space in that place in the mask’s format string, it looks to see if the character matches the mask’s regular expression. If so, the character is valid for that place in the input; if not, the character is rejected. If there isn’t a space in that place in the format string, the character from the format string is copied into the form field (this is how separators are automatically inserted).
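
Here is a stripped-down sketch of that logic as a pure function, with no event handling. It is not the library’s real callback: applyMask is a hypothetical name, and it assumes the mask’s regex has no global flag. The date_us definition is the one shown earlier.

```javascript
// The date_us mask from the article: spaces are input slots, "/" is a separator.
const dateUs = { format: '  /  /    ', regex: /\d/ };

// Rebuild a field value so it conforms to the mask.
// Assumes mask.regex is not a global (/g) regex.
function applyMask(value, mask) {
  let out = '';
  for (const ch of value) {
    // Copy in any separators that belong at the current position.
    while (out.length < mask.format.length && mask.format[out.length] !== ' ') {
      out += mask.format[out.length];
    }
    // Stop once every slot in the mask is filled (length constraint).
    if (out.length >= mask.format.length) break;
    // Keep the character only if it is valid for an input slot.
    if (mask.regex.test(ch)) out += ch;
  }
  return out;
}

console.log(applyMask('12252007', dateUs));
```

Running the function over its own output gives the same string back, so it is safe to call on every keypress with the field’s current value.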

Demonstration

Enough talk, let’s see it in action. This demonstration of Javascript form input masks shows a few of the masks I discussed above: US date, time, and phone number.

If you like the way the form input fields look, you can thank the fine folks at Particletree. I borrowed the styling from their article on how to make forms suck less (it makes the borders of the input areas easier to see).

Limitations

Since this is really just a hack on top of existing HTML form inputs, there are some things that will never work quite as well as a natively designed widget (the same is true for my JavaScript Combo Box widget). Here are some of the limitations:

  • No Unicode or international characters (this might be easy to fix).
  • No spaces as placeholders. Sometimes you might want spaces between user input, rather than non-space separators.
  • Only one type of character for the entire input; you can’t constrain the first character to be a digit, and the second a letter.
  • It doesn’t show the mask ahead of time and let the user ‘fill in’ the missing characters; instead, it reveals the mask as the user types.
  • You can’t have two adjacent separators.
  • You can’t type into the middle of the text; all input you type is appended to the end.
  • It hijacks things like Ctrl+A to select all.

Despite the length of that list, these are such minor things (except for maybe international characters) that it’s practically a complete implementation. And as far as I know, everything here could be solved easily. I just haven’t done it, because you haven’t yet told me which things are problems for you (hint, hint: leave a comment, and patches are very welcome). I deliberately kept things really simple in this first version. Future versions can get fancier, or not.

Conclusion

So that’s it! Simple, lightweight, intuitive input masks. With a proper form validation library on the back-end, you should be able to use this to help your users enter data in the format you desire. Again, let me know what you think, and by all means improve this, and send me the results!


A PHP implementation of the XML DOM

Download dom4php

Several years ago I wrote a pure PHP library for manipulating XML documents with the Document Object Model (DOM) in PHP 4, without external libraries such as libxml. This is often useful on shared hosting providers, where you can’t get C extensions installed. The library uses PHP4’s built-in SAX functions, which are enabled by default. Today I’m re-releasing this library under the LGPL.

Introduction

It’s not too hard to build a DOM implementation on top of SAX. In fact, many DOM libraries actually use this technique. You just need to know the DOM core specification really well, and understand SAX really well. Everything else is easy, haha. The truth is, I don’t know how well I knew the spec back then, and I’ve no time to check right now, so you’ll have to let me know.

Since I wrote this years ago, before I was enamored of unit testing, I don’t know how good it is. I’ve used it for several years in production systems without ever looking at the actual code again — I just use it and take for granted that it works. I may or may not have time to actually write tests for it (probably not, sorry). Maybe you can help me with that. It shouldn’t be hard, but I just don’t have the time for it.

If you do want to hack the source, I encourage you to be ready to use a debugger. Getting references right is the tricky part. There are lots of references to be built and manipulated in a structure as complex as the DOM, and handling references correctly in PHP 4 is anything but easy for most people.

Documentation

I never wrote much documentation for this library, but I might attempt to remedy that at some point (I probably don’t have time though — sorry). In the meantime, here’s a synopsis to get you started:

<?php

# Create a parser and parse a simple document.
include_once("XmlParser.php");
$parser   = new XmlParser($encoding = 'ISO-8859-1'); # encoding is optional
$document = $parser->parse('<p class="test"><strong>this is a document</strong></p>');

# Add a text node.
$text =& $document->createTextNode('foozle');
$document->childNodes[0]->appendChild($text);

# Navigate around the document a bit, starting at the new node we just added.
$strong =& $text->previousSibling;
echo "The content of the node is '" . $strong->childNodes[0]->data . "'\n";

# Serialize the XML document to a string.  Do NOT use print_r() as the cyclic
# data structures will cause problems.  Instead, create an instance of the
# XmlSerializer class.
include_once("XmlSerializer.php");
$serializer = new XmlSerializer("XML");
echo $serializer->serializeNode($document);
echo "\n";

?>

The real documentation is the DOM core specification, as I said. The object you get back from calling parse() is a Document, and you just use the DOM as normal after that.

Differences from the DOM spec

The DOM spec is pretty heavy-weight, and coding something like this in pure PHP isn’t as efficient as using a C library. I made a couple of compromises for simplicity, performance, and convenience. The result should be a nearly complete DOM implementation, with much less code and overhead than it would take to follow the spec exactly. Here are the differences from the official specification:

  1. ID attributes (refer to the XML spec if you don’t know what that means) are assumed to be named “id” and are kept in a lookup table with the document. This makes sure you can’t duplicate an ID, and provides fast access to any element by ID. If you need to change the name from “id” to something else, you can do that.
  2. Attributes aren’t object-ified. Instead, attributes are stored as a lighter-weight associative array with each Node. You can set and retrieve attributes with object methods, but they aren’t objects themselves.
  3. Node contains some convenience methods not found in the official spec. These are, for example, getElementsByAttributeValue(). Most of them are only used internally, but a few are meant for external use too.
  4. Many of the interfaces in the official spec aren’t really necessary for an 80% solution, including DOMImplementation and NamedNodeMap. I omit those.
  5. No support for namespaces or namespace methods (e.g. createAttributeNS)

There may be other differences too, but I can’t think of them right now. Write into the comments if you see anything I missed. By the way, if you need some of the missing pieces such as NamedNodeMap, I can provide skeleton classes for you; I originally coded them, but then deleted them.

License

I’m releasing this under the GNU LGPL. At one time I had licensed it under the normal GPL, but this isn’t appropriate for a library, so I’m re-licensing it.

Feedback welcome, and thanks for all the fish

Please do leave feedback in the comments. Since I wrote this years ago and haven’t really thought about it since then, I have no idea how good it is — I can only say I haven’t run into any bugs in a while. Maybe I haven’t implemented some things I should have, or maybe there are braindead things I’ve done, who knows. Regardless, I hope you find it helpful.

See you next time!
