Archive for November, 2005

How to analyze statistics from SQL Query Analyzer

Microsoft SQL Query Analyzer prints out statistics as text, and it can be tough to wade through it all. Here is an awk script for quickly analyzing those statistics.

Analyzing the stats

First, remember you can use SQL Server Profiler to measure your query’s performance. Profiler is a very powerful tool, and everyone writing queries should be familiar with it.

For one-off queries, though, it may be faster to simply turn on the desired statistics and run the query, then grab the output and analyze it directly. Here is one quick way to analyze the output caused by set statistics io on and set statistics time on: use awk!

Here is a simple awk program. Save it to a file named sql-stats.awk:

/CPU time/ {
    cpu += $4
    elapsed += $9
}
/Scan count/ {
    scans += substr($5, 1, index($5, ",") - 1)
    reads += substr($8, 1, index($8, ",") - 1)
    physi += substr($11, 1, index($11, ",") - 1)
    ahead += substr($14, 1, index($14, ".") - 1)
}
END {
    printf("Scans:            %7d\n", scans);
    printf("Logical reads:    %7d\n", reads);
    printf("Physical reads:   %7d\n", physi);
    printf("Read-ahead reads: %7d\n", ahead);
    printf("CPU time:         %7d ms\n", cpu);
    printf("Elapsed time:     %7d ms\n", elapsed);
}

Now follow these steps to sum the statistics for easy consumption:

  • get awk, if you don’t have it
  • turn on the desired statistics
  • execute your query
  • switch from the results tab to the messages tab, and copy the messages to the clipboard
  • paste the messages into a text file called stats.txt
  • execute awk against the file as follows:
C:> awk -f sql-stats.awk stats.txt

The results should look something like this:

Scans:               1110
Logical reads:     531208
Physical reads:       187
Read-ahead reads:  131895
CPU time:          146922 ms
Elapsed time:      200718 ms

Watch out for query caches!

Remember, for unbiased results, you need to initialize your caches to a known state before comparing queries:

dbcc freeproccache
dbcc dropcleanbuffers

Please share if you have other ideas. You could get as fancy as you want with awk, but this solves my common need.

Technorati Tags:No Tags

You might also like:

  1. How to find missing values in a sequence with SQL
  2. Version 3.0 of mysqlreport released

The four essential tools for software engineers

I rarely blog on other people’s content, but I can’t agree more strongly with this list of essential tools for software engineering:

There are an endless pile of tools engineers are fond of using in their development process, but there are only four that they really need:

  • Editor
  • Compiler
  • Version control
  • Bug tracking

The first two tools are for the individual, and the second two are team tools, at least in most cases. Which tools are used, and how well they work, depends on many factors; in a professional environment, it’s largely a corporate culture issue.

Technorati Tags:No Tags

No related posts.

Why not to use CSS for columnar layouts

Using DIV tags and CSS to format a single text into multiple columns is just as bad as using tables for layout. This article explains why.

Everyone agrees tables are not for layout, because tables are supposed to contain tabular data. Similarly, DIV elements aren’t the right tool for columns, because they are supposed to group content into a generic container — either to group related elements simply for grouping’s sake, or to identify actual content and apply meta-data such as language or class. From the spec:

The DIV and SPAN elements, in conjunction with the id and class attributes, offer a generic mechanism for adding structure to documents.

The keyword is structure. Using DIV elements to lay out text into columns isn’t structural, it’s presentational. Consider how columnar text is typically accomplished: DIV elements, floated left so they’ll stack next to each other. This is not semantically meaningful! In particular, the text must be split into multiple DIVs to get it into columns. Now the text, which is one piece of content and should not be divided, has been divided as though it’s many pieces of content.

Many people don’t like to admit it, taking a sort of “end justifies the means” attitude and forcing the content into columns with hacks, but HTML and CSS currently provide no mechanism for presenting a single text in multiple columns (CSS 3, which is not finalized, will provide a mechanism, and beta versions of Firefox currently support it). What bothers me the most about the hacks is those who approve of and admire them. Violating standards is nothing to be happy about or advocate. Those who do so are certainly not on the cutting edge. The cutting edge is about semantics and standards, not about ignoring semantics and violating standards.

If you must put your text into columns, there’s a good AListApart article about laying out columns with JavaScript. The advantage of this (as opposed to actually coding separate DIVs) is that the content is in a single DIV until the JavaScript works on it. In other words, the document your Web server delivers to the user agent has all the content in a single DIV. Only JavaScript-enabled user agents will see multiple columns. For practical purposes, that really means only visual browsers at this point. This is acceptable to me, because it doesn’t mess with the structure of the document, unless it’s in a user agent that doesn’t actually present structure to the user anyway.

For purists though, there’s no getting around it until CSS 3: anything you do to format a single text into multiple columns is a hack, also known as an abuse. How quick we are to criticize people who used tables for layout, but how soon we forget our high-minded principles when we can’t have what we want.

Technorati Tags:No Tags

You might also like:

  1. How to use CSS to go beyond separation of content and presentation
  2. Simple and complex types in XML Schema
  3. How to display an HTML table as a folder tree
  4. Automatic image captions with unobtrusive JavaScript
  5. How to install beautiful X11 cursors

RAISERROR severity levels demystified

RAISERROR’s severity levels are defined in the Sql Server Books On Line, and give vague guidance about which severity level is best. The short and sweet is as follows:

  • below 11 are warnings, not errors
  • 11-16 are available for use
  • above 16 are system errors
  • there is no behavioral difference among 11-16

The best thing to do is just get everyone to use the same value and reduce confusion. It may be useful to use different values, to signal a desired behavior to the code that catches the error. In this case, a company standard or policy is in order. Also, keep in mind even though you may not use severities this way now, you very well might someday — in which case, you’ll be happy you started using them consistently now.

Technorati Tags:No Tags

No related posts.

How to Break Web Software

I recently did a technical review of How to Break Web Software: Functional and Security Testing of Web Applications and Web Services, by James A. Whittaker and Mike Andrews. My thoughts: it’s well worth reading.

Though what I reviewed wasn’t yet a final draft (my job was to help find technical and other errors, make suggestions on organizing the information and so forth), the content was (mostly) all there. I own other work by Whittaker, and I encourage anyone who’s interested to read this book when it’s published. It is appropriate for a variety of audiences. I will not recommend anything to anyone — I don’t want to be in the position of giving advice, for several reasons — but allow me to “suggest that you consider owning a copy” if you make things people will view with a web browser. Here’s a sampling of folks for whom I consider this book especially relevant:

  • a tester
  • a programmer
  • a team leader or manager
  • a creative copywriter

Security is everyone’s job, and basic understanding of how security can be violated will change the way you think. It’s a good thing.

I am not paid for this veiled endorsement. It is unsolicited, but approved by the authors :^).

Technorati Tags:No Tags

You might also like:

  1. Progress on High Performance MySQL, Second Edition
  2. What is your favorite database design book?
  3. Why you need to read this blog

How to make file names cross-platform

Using a filesystem, such as an external hard drive, across mutiple platforms can sometimes cause annoying behaviors. For example, filenames may be changed to all uppercase or all lowercase, which may cause problems on a system with case-sensitive filenames or vice versa. Here are my suggestions for minimizing conflicts.

Use only letters and numbers, underscores, and dashes

MS Windows prevents special characters in filenameAvoid using characters that any system treats as special characters, not just your own. PCs running Microsoft Windows will generally not allow you to type a special character into a filename, but Mac OS X will. UNIX and Linux will too, if you escape the characters in the argument (though Linux/UNIX won’t allow / even if you escape it). Windows can be forced to accept : as part of a filename, but the resulting file’s name apparently truncates at the :.

I have a friend who uses Mac OS X and needs to share files via an external drive with a PC running Windows at her job. She used the / character in some filenames, which prevented her from copying files to the drive. I recommend avoiding the characters \ / : * ? " < > | no matter what OS or filesystem you are using. These characters are not portable.

For maximum safety, use only letters and numbers, dashes, and underscores.

Use mixed-case names

The next trouble I experienced was with another external hard drive formatted as FAT32. The same friend gave me a website to rewrite. I copied the files from her hard drive to the Windows XP laptop my employer provided, rewrote the site, and then copied it back. Unfortunately, in the process every all-lowercase file and directory name got converted to all-uppercase. I believe this happened when she used her Mac to read the files back off the drive. I’ve experienced similar problems with a Linux driver for VFAT filesystems, such as USB flash drives, except in this case every all-uppercase filename got translated to all-lowercase! I believe this is some filesystem driver “being smart,” though I haven’t investigated further. I think the filesystem driver is deciding “well, it’s all uppercase, so obviously it’s some DOS thing, and lowercase filenames are more readable…” Yuck. This wasn’t just a case of ugliness, either. The destination for the website was a Linux server, where filenames are case-sensitive, so dozens of links broke.

In each case, when the file or directory name was mixed-case it survived without mangling. This led me to my next filesystem portability decision: from now on, I’m going to use InternalCapitalLetters to name files. I typically like lowercase with dashes because it’s easier to type, but I’ll do a little extra work to save myself these types of troubles in the future.

Always treat filenames as case-insensitive

Another lesson I’ve learned is to treat every filesystem as though it is case-insensitive. I personally abhor case-insensitive anything, such as the way SQL or VB are case-insensitive. Filesystems, such as Microsoft Windows filesystems, are no exception. The reality is I have to work with them, though. What this really means in practice is avoiding filenames in a single directory that differ only by letter case, such as File.txt and file.txt. This is not a big deal to do, but if I don’t do it, it becomes a big deal the moment I try to view a directory with such files on a case-insensitive filesystem.

I can’t recall where I saw this, but I also believe I once saw a filesystem where a file and a directory could be named the same thing without a conflict. I don’t know about other filesystems, but in UNIX a directory is just a file like any other, so I’m not sure where I’d have seen this behavior. If I’m not dreaming it up, though, it’s something else to watch out for.

The bottom line

The bottom line is, from now on I’m going to:

  • avoid special characters, and go for the least common denominator
  • use mixed-case letters to ensure a “smart” filesystem driver doesn’t mangle my filenames
  • make sure I don’t use identical-except-for-lettercase for filenames in a directory
  • ask others to do these things too, if I have to work with them
Technorati Tags:No Tags

You might also like:

  1. Windows XP’s built-in unzipping functionality is not trustworthy
  2. How to use Linux’s CONFIG_IKCONFIG_PROC feature
  3. How to find and fix invalid character data in MySQL
  4. How to install and maintain multiple WordPress blogs easily
  5. My personal SQL coding standards

Efficient alternatives to Microsoft SQL Server’s openxml

Microsoft SQL Server’s openxml function is highly CPU-intensive. This article demonstrates more efficient alternatives to openxml.

Why people use openxml

A common usage for openxml is to use XML to pass a “list” or “table” of values into a stored procedure. Here is a scenario I’ve seen a lot at my current job:

  1. An ASP accepts a string as input, such as “ipod car adapter”
  2. The ASP parses the input into space-separated values, then concatenates them together into XML
  3. The ASP passes the XML to a stored procedure
  4. The sproc uses openxml to populate a temporary table from the XML
  5. The sproc does some work (such as searching a product catalog)

This is fine, except openxml can cause a huge CPU spike, which is bad news in a Web-facing search server during holiday shopping season.

I have rewritten such a search sproc to remove openxml, and found it to account for upwards of three-fourths of the total cost, even in an otherwise very expensive sproc. I have anecdotal wisdom from other DBAs about people who over-use openxml and end up with systems that run hot all the time.

A list of simple delimited values doesn’t need XML — it’s overkill. A better alternative is to pass the string directly into the sproc, and write a UDF to return a table with the elements of the string. It’s still string parsing, but it’s better than using XML.

Solution 1: a WHILE loop in a user-defined function

Here is a user-defined function that will split a delimited set of words into a table and return the table:

CREATE function dbo.fn_SplitWords (
    @Words varchar(8000),
    @Delim char(1))
    returns @Words_table table(word varchar(255), ident int identity not null)
as begin

    declare @Wordstart int, @WordEnd int, @DoubleDelim varchar(2)
    
    set @Wordstart = 1
    set @WordEnd = 1
    set @DoubleDelim = replicate(@Delim, 2)

    -- Prepare the data
    while charindex(@DoubleDelim, @Words) > 0
        set @Words = replace(@Words, @DoubleDelim, @Delim)
    -- Left-trim it
    if left(@Words, 1) = @Delim
    begin
        if @Delim = ' '
            set @Words = ltrim(@Words)
        else
            set @Words = substring(
                @Words,
                patindex('%[^' + @Delim + ']%', @Words),
                len(@Words) - patindex('%[^' + @Delim + ']%', @Words) + 1)
    end
    -- Right-pad it
    if right(@Words, 1) <> @Delim
        set @Words = @Words + @Delim

    while @Wordstart > 0
    begin
        -- Extract the next word
        set @WordEnd = charindex(@Delim, @Words, @Wordstart)
        if @WordEnd > @WordStart
        begin
            insert into @Words_table select substring(@Words, @Wordstart, @WordEnd - @Wordstart)
            set @Wordstart = @WordEnd + 1
        end
        else set @Wordstart = 0 -- Terminate the loop
    end
    return
end

Please note the bug I’m avoiding in the while loop above. I explain the SQL Server 2000 replace bug in another post.

Here are some test calls for the UDF:

select * from dbo..fn_SplitWords('this is a test call', ' ')
select * from dbo..fn_SplitWords(' this is a test call', ' ')
select * from dbo..fn_SplitWords('this is a test call ', ' ')
select * from dbo..fn_SplitWords('this-is-a-test-call', '-')
select * from dbo..fn_SplitWords('this is a test      call', ' ')
select * from dbo..fn_SplitWords(' ', ' ')
select * from dbo..fn_SplitWords('', ' ')

After I wrote this, I saw someone else did the same thing elsewhere, though in my opinion very poorly implemented and explained. In any case, here’s a link for the sake of completeness: Treat Yourself to Fn_Split()

A more efficient approach

Instead of using a loop as I did above, it’s actually much more efficient to use an integers table and a JOIN to parse the tokens apart. This approach is slightly less flexible, and doesn’t handle all the special cases I handled above with my UDF, such as tokens being separated by several delimiters instead of just one. Regardless, it is absolutely a better way to go, as long as the input is well-formed. It doesn’t use any nonstandard SQL, either — it’s a relational solution to the problem. Here are three resources where you can learn more about this extremely elegant technique:

Other advantages to a UDF

This approach has other advantages over openxml, too:

  1. it’s easy to string-ify a table for input to another sproc from within SQL (see my article about concatenating strings in SQL for more)
  2. it is much simpler, needs much less code, and is easier to understand and maintain
  3. the UDF can be used directly in a FROM clause; there is no need to create a temporary table or table variable unless you want to store and re-use the values
  4. you can use table variables instead of temporary tables, saving disk I/O and locks in the tempdb database
  5. you can pass more data without running into upper limits on the size of varchar, because space-delimited is much less verbose than XML

I would be remiss if I didn’t mention the downsides:

  1. string-parsing is never efficient, and can be error-prone
  2. you will need to create and maintain UDFs (in my case, I need at least two — one for strings and one for integers)
  3. you have less flexibility about types and schemas; this technique is only convenient for simple cases

More about efficiency

SQL server seems to be smart enough to reuse resources within a query batch, so the high cost of using openxml only seems to happen on the first invocation in a batch. When I benchmarked it with a thousand iterations, the string-parsing solution’s constant cost appeared to be about half the constant cost of openxml — not a significant improvement. However, in the common case where it’s used only once, the string-parsing is much more efficient because there is no startup cost.

All in all, I think string-parsing is the lesser of the evils.

Technorati Tags:No Tags

You might also like:

  1. More alternatives to openxml

A bug in Microsoft SQL Server’s replace() function

I have found a bug in Microsoft SQL Server’s replace() function. The function treats char and varchar differently, and replacements will fail in some cases. Here is a demonstration of the bug:

declare @string varchar(40),
    @find char(2),
    @replace char(1)
set @string = 'two  spaces'
set @find = '  '
set @replace = ' '
if replace(@string, @find, @replace) = 'two spaces'
    print 'Replacement worked'
else
    print 'Replacement failed'

Result: “Replacement failed.” This is clearly incorrect behavior. To make it more interesting, here are some test cases that succeed:

  1. Change the type of @find to varchar(2).
  2. Change the value of @string to '  ' (two spaces).
  3. Use replicate(@replace, 2) instead of @find.

I found another discussion about this on Google groups, but it’s incoherent, doesn’t demonstrate the problem clearly, and speculates about the internal causes rather than suggesting ways to avoid the behavior (pointless). So I count that as “not posted elsewhere” and offer it here.

Technorati Tags:No Tags

You might also like:

  1. Efficient alternatives to Microsoft SQL Server’s openxml
  2. How to simulate the GROUP_CONCAT function
  3. How to find and fix invalid character data in MySQL
  4. More alternatives to openxml
  5. Less is more

Seldom-used HTML form elements

Some of the neatest parts of standard HTML are the least-used. Forms in particular can become much more usable and accessible when marked up correctly with a few standard HTML elements. These are LABEL (and the related CSS outline property), OPTGROUP, LEGEND, and FIELDSET.

Before I launch into this article, I challenge you to read the list of HTML 4 elements and ask yourself how many of these you use regularly. If you are like most web designers (including myself), you may over-use familiar elements and attributes, and perhaps not even realize some of the elements exist. The good news is, now is a great time to start using HTML’s capabilities more fully, because even though you may not know about or use them, browsers support them very well, and have for years.

LABEL elements make forms act more like traditional GUI apps

LABEL elements work in concert with a form input. You already know what they are, because they are used in almost all user interfaces in traditional GUI software. The label “targets” an input by ID. For example, if a website has a login form like so,

<form action="login.cgi">
<input type="text" name="username" id="username" />
Username
</form>

You might surround Username with a LABEL, thusly:

<form action="login.cgi">
<input type="text" name="username" id="username" />
<label for="username">Username</label>
</form>

The browser will then activate the username field when the user selects the LABEL. LABELs are particularly useful for accessibility, but they make forms more usable, too. Consider how you are accustomed to interacting with a list of radio buttons, each with text beside it. In a typical GUI application, the radio button is NOT used to select which option is the active one. Instead, it merely serves as a visual indicator of which option the user chose. Most users (you can verify this for yourself) will select the text, not the radio button, because they know it is “clickable.” Many users will also try clicking on the text in web pages, before they realize they have to click on the radio button itself. Even users who are used to surfing the web will first automatically try clicking on the text, then click on the button! Providing LABELs makes the text “clickable” and meets the user’s expectations. It also makes the form easier to use, because it is not necessary to use the mouse as carefully; there is more clickable area.

Here is a sample form, for your testing pleasure:

I like my forms to be…




A word about styling: Internet Explorer styles labels with a dashed button when the input is selected (In IE, HTML form elements are actually rendered by the underlying OS, not the browser). Firefox does not. CSS2 does provide for styling, which should allow the designer to mark up labels so they look and feel as expected across all browsers. The outline property is meant for this, but unfortunately does not have broad and consistent browser support yet. However, it’s easy to add styles to hint that the label is attached to the input. For instance, adding a background color, making the cursor into a “hand,” and so forth can all help.

OPTGROUP creates a menu of logically grouped choices

The OPTGROUP element groups OPTIONs in a SELECT menu. You have doubtless seen SELECT menus (perhaps even written some yourself!) like the following:

This typically comes with some server-side or client-side code to make sure the user selects one of the “real” entries, not one of the “headings.” This is exactly the problem OPTGROUP was designed to accomplish. Here is the form rewritten with standard, plain vanilla HTML (nothing fancy here! Netscape 4 can even handle this!):

FIELDSET and LEGEND create logical groupings of inputs

Just as OPTGROUP groups related values together, FIELDSET groups related form inputs together. Again, this should be familiar from GUI applications. Preferences dialogs are one common place where you see this metaphor all the time. Here is a quick example:

Goofiness of Example
Wickedness of Example

I think this example should speak pretty clearly for itself.

Summary

Good old HTML 4: it’s more than half a decade old now. In terms of the Internet, that’s a lifetime. It’s good to be familiar with its full capabilities, and more importantly, to know when to use or not to use a particular HTML element.

Technorati Tags:No Tags

You might also like:

  1. How to implement CAPTCHAs without images
  2. How to create input masks in HTML
  3. How to label Excel and OpenOffice.org XY scatter plots
  4. How to create a VB6 console program
  5. More alternatives to openxml

How to guard your privacy with blacklists and whitelists

This article explains how to protect your privacy online, without impairing your ability to surf, by creating a content blacklist and a cookie whitelist. In my opinion, this is the most effective combination of privacy controls that doesn’t inhibit normal browsing.

Motivation: don’t tell “them” any more about myself than I have to

I work in Internet commerce, and I’m excruciatingly aware of the dozens of companies whose sole purpose is to gather statistics about surfers. They tie these statistics to personally identifiable information in a variety of ways, and they know more about you than you’d ever believe. What usually happens is Company X (a fictional e-commerce site) will purchase services from Tracker Inc (a fictional company) to help them understand how surfers use their own site. Company X’s privacy policy usually says your personally identifiable information isn’t tracked or disclosed, and they’re being honest. It becomes an issue when Tracker Inc places a cookie on your computer while you browse Company X’s site. This cookie persists, even when you leave that site, and when you browse Company Y’s site, the cookie serves to tell Tracker Inc “this is the same person.” Tracker Inc only does for Companies X and Y what they promised: they analyze site usage patterns, etc. But they are also independently aggregating the information they gather, and analyzing it in a much broader context. Then they sell it.

How companies track surfers

The most common way to track surfers is a combination of images (sometimes called beacons, 1-pixel GIFs, tracking GIFs and so forth) and cookies. Company X’s website will have an image embedded in it. The image is hosted by Tracker Inc, so every time you view a page on Company X’s site, Tracker Inc gets a request for an image. The request typically contains encoded information. As a simple example, when you request http://www.companyx.com/shoppingcart.html, your browser might also request an image from http://www.trackerinc.com/tracker.gif?cartitem=ipod40gb. At the same time, your browser gets a cookie set on it, such as globalid=1234. This cookie identifies you when you go to Company Y’s website.

How effective is anti-spyware software?

There is a variety of software available to block cookies and such. Caveat emptor! Much of this software is spyware itself! Some of it is also poor-quality and may cause your computer to crash, slow down web browsing, interfere with the proper functioning of pages, and so forth. More to the point though, it doesn’t block your browser from requesting the image from Tracker Inc. It just prevents Tracker Inc setting a cookie. Tracker Inc can still track you, because when they get the image request, your IP address goes along with it. They can use this to identify you even without a cookie. The best thing to do is prevent your browser asking for the image to begin with. You don’t need it. It’s probably a little 1-pixel image hidden in the footer or somewhere else inconspicuous. Its only purpose is to track you.

How to set up a cookie whitelist and a content blacklist

I recommend a combination of cookie whitelists and content blacklists to protect your privacy. I use Mozilla Firefox and recommend you do too, for a variety of reasons: it is more secure, faster, adheres to standards, etc. Most importantly for my purposes, it gives you fine-grained control of privacy and allows extensions, one of which is a great content-blocker. In other words, your browser can guard your privacy without requiring you to install other software. The remainder of this article explains how I set Firefox up to blacklist content and whitelist cookies.

First, I go look at my cookies in the preferences dialog under Privacy:Cookies (to open the preferences dialog, use the Tools:Options menu option in Windows, Edit:Preferences in Linux, and Firefox:Preferences in Mac). I just scan the cookies I see there. Some of them are bound to be “good” or “useful,” such as my email provider, del.icio.us (bookmarks), RSS feeds and so forth. Others are not. These are ones like doubleclick, bluestreak, statcounter, hitbox and so on. These are the Bad Guys (I’m not picking on them exclusively; I’m just using them as examples). I don’t block cookies from them, however! That would be a blacklist. I’m just making a list of the Bad Guys I see.

Why look at cookies if I’m not going to block them? Well, I didn’t get cookies from the Bad Guys by accident. The cookies are there because my browser requested an image, ad, or other content from them. These cookies are clues to domains whose content I need to block. I feed these domains into my AdBlock pattern. If you don’t know what AdBlock is, head over to their website and check it out. It’s designed to block content, which can be used both to remove obnoxious ads and to block invasions on your privacy. Once I’ve blocked the domains, I shouldn’t get any more cookies from them, and I won’t be requesting tracking GIFs from them. They should be prevented from tracking me. It’s feasible to make a Bad Guy blacklist because the market space for Bad Guys is fairly small (there is not much competition in the surfer-tracking, statistic-gathering market).

While I’m there, I also make a list of cookies I think my browser needs for my favorite sites to work right. I’ll feed this into my cookie whitelist below. I also occasionally turn cookies back on and just see who is setting cookies on my computer. This helps me discover more Bad Guys (though I haven’t found any new ones in a long while).

The second part of my method for making my browser only do what I tell it is simply whitelisting cookies. If I deny all cookies, except those from specific domains, I don’t have to be bothered by constant “do you want to allow this cookie?” dialogs, and I don’t have to forbid cookies from all possible Bad Guys. A whitelist says NO cookies are allowed, except from certain cites. This is different from a blacklist, which says ALL cookies are allowed except from certain sites. If I decide I want a cookie for a site, I tell my browser to allow it. A cookie whitelist is feasible because the number of sites that really need cookies on my browser to work is very small. Generally, I only need to allow sites to set cookies if I want to log in for shopping and such.

I’ll break it down into steps:

  1. Open the preferences dialog as above.
  2. Delete existing cookies so you can start with a clean slate. (DO NOT choose the “Don’t allow” option. This is a blacklist, not a whitelist; you don’t want a blacklist).
  3. Click OK to dismiss the cookie dialog. You should still have the preferences dialog open.
  4. Un-check the checkbox to “allow sites to set cookies.”
  5. Click the “Exceptions” button and enter sites that should be able to set cookies. I usually enter the top-level domain of the site. For example, if I want to get into http://members.site.com, I allow cookies from site.com.
  6. If at any point a site isn’t working right (won’t let me log in, etc) I just add it to my whitelist. Easy.

Whether I need a cookie whitelist is debatable. If I’ve blacklisted all content from the Bad Guys, maybe I don’t; maybe I can just allow anyone to set cookies. You decide for yourself. One of the downsides to the cookie whitelist is I might start fetching content from a Bad Guy without knowing it (since the images are tiny, I usually only find out about them by looking at cookies).

I’m not naive. I don’t pretend this is a perfect solution, but it makes me feel better about my privacy online. I hope it has helped you too.

Technorati Tags:No Tags

You might also like:

  1. Permit Cookies: a Firefox extension that makes cookie whitelisting easy
  2. Copyright statement, privacy policy and terms of use
  3. How to exploit an insecure order of access to resources
  4. Stock images are too popular
  5. An alternative to canonical URIs