Friday, April 13, 2007

10 Worst AdWords Campaign Management Mistakes

On today’s highly competitive Google AdWords pay-per-click (PPC) search engine, it is more important than ever to ensure that your PPC campaigns are optimized to their utmost potential. You should be achieving maximum return on investment (ROI) for the keywords or phrases that are most relevant to your business and most likely to provide targeted traffic to your website. With ever-growing cost-per-click (CPC) prices across the various PPC search engines, it is essential that you avoid certain mistakes that will undoubtedly result in poorly performing PPC campaigns.

The Mistakes to Avoid

  1. Long list of less than targeted keywords
  2. Not identifying unique aspects of your product or service
  3. Lack of keywords in your ad text
  4. Directing users solely to your home page
  5. Creation of single Ad Groups
  6. Utilizing single campaigns
  7. Using broad match only
  8. Failure to optimize Ad Serving for your ads
  9. Not tracking results
  10. Entering the content network without modifying bids

Long List of Less Than Targeted Keywords

When you first set out to create your AdWords campaign, it is of the utmost importance that you do not go "keyword crazy"; in other words, you must not create long lists of irrelevant and generic keywords. For example, if you were an automotive dealership, it would not be in your best interest to target the keyword "truck". The reason is that the cost per click (CPC) for such a generic keyword would be incredibly high compared to a more descriptive, relevant keyword such as "T-Z783 Extended Cab". An example of an irrelevant keyword that would not produce conversions if you strictly conducted automotive sales would be "tail light covers": the phrase may bring visitors to your website, but if they do not find what they are looking for when they get there, they will be gone just as quickly as they arrived.

Not Identifying Unique Aspects of Your Product or Service

Before implementing your AdWords campaign you must first understand exactly what it is that makes you stand out from your competition. By identifying your unique products or services you will have a lot more clarity on how to rise above your competitors and zero in on the keywords or phrases that are unique to your business. I would recommend that you perform an analysis of your competition: have a look at what they are doing and which phrases they are using. After conducting a competitive analysis, and after understanding what makes your products or services unique, you will be able to come up with a strategy that will topple your competitors.

Lack of Keywords in Your Ad Text

When creating your descriptive ad copy it is imperative that you find a way to inject your keywords into your title and description while maintaining the delicate balance of clarity and relevance. Your ad copy should be tailored in such a way that visitors who read it know exactly what they are getting into when they click on your ad, which brings me to my next point.

Directing Users Solely to Your Home Page

Failing to take the time to decide which destination URL should be assigned to which ad, and instead pointing all ads in a campaign to your homepage, is an oversight that I come across far too often. Once you have compiled your list of relevant keywords that describe the unique products or services of your business, why on earth would you then send everyone to your homepage and leave them to navigate through your site in the hope of finding what they are looking for? Instead, send them straight to the page that contains exactly what was described to them within your ad copy. For example, if you were an automotive dealership and your ad contains the keyword "T-Z783 Extended Cab", then instead of sending visitors to www.auto-motive-dealership.com, send them to www.auto-motive-dealership.com/T-Z783_Extended_Cab.html.

Creation of Single Ad Groups

Categorizing ads that target related keywords into a common Ad Group gives you a much higher level of control over your entire campaign. Let's say that you were a sporting goods store: start by grouping all ads targeted towards hockey skates into a single Ad Group. You would then create another Ad Group targeting hockey sticks, another containing hockey gloves, and so on. Organizing your Ad Group structure in this manner gives you the ability to create in-depth reports on the performance of each Ad Group.

Utilizing Single Campaigns

Once you have your Ad Groups sorted into easy-to-identify categories, you can move on to the next step of creating relevant campaigns. From the example above, you have created Ad Groups for separate products: hockey skates, sticks, gloves and so on. Now it is time to gather all of those Ad Groups into one campaign entitled "hockey equipment". You would then repeat the process, creating Ad Groups for tennis (one for shoes, one for racquets, and so on), and once again drop them all into a single campaign entitled "tennis equipment". Having highly organized campaigns is the key to determining which ads are producing the optimal conversions; the sketch below shows one way to picture the resulting hierarchy.
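Purely as an illustration (the store, product names and keywords here are made up), the campaign-to-Ad-Group-to-keyword hierarchy could be pictured as a nested data structure like this:

# Hypothetical sketch of an account structure for a sporting goods store.
# Campaigns contain Ad Groups; each Ad Group holds its own tightly related keywords.
account = {
    "hockey equipment": {                          # campaign
        "hockey skates": ["hockey skates", "junior hockey skates", "goalie skates"],
        "hockey sticks": ["hockey sticks", "composite hockey sticks"],
        "hockey gloves": ["hockey gloves", "leather hockey gloves"],
    },
    "tennis equipment": {                          # campaign
        "tennis shoes": ["tennis shoes", "clay court tennis shoes"],
        "tennis racquets": ["tennis racquets", "graphite tennis racquets"],
    },
}

# Reporting per Ad Group then becomes a matter of walking this structure.
for campaign, ad_groups in account.items():
    for ad_group, keywords in ad_groups.items():
        print(f"{campaign} > {ad_group}: {len(keywords)} keywords")

The point is not the code itself but the shape: tightly themed keywords at the bottom, one theme per Ad Group, one product line per campaign.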

Using Broad Match Only

When you do not take advantage of the keyword matching options that are available to you, chances are you are missing out on potential customers and paying a higher CPC. Broad matches are usually less targeted than exact and phrase matches. Broad matching is the default option: your ads will appear for expanded matches such as plurals or relevant keyword variations. When utilizing phrase match, your ad will appear for search terms in the order that you specify, sometimes with other terms around them. Exact matching is by far the most targeted option to use: your ad will appear only for the exact keyword specified. Negative keywords are also a fantastic option for specifying searches that you do not want your ads to appear for.

Broad match (the default option): blue widget

Phrase match (surround the keyword in quotes): "blue widget"

Exact match (surround the keyword in square brackets): [blue widget]

Negative match (place a minus sign before the keyword): -blue widget

Failure To Optimize Ad Serving For Your Ads

When you take advantage of the AdWords ad serving options, you are essentially choosing to show your most popular ads more often. The AdWords platform will give weight to ads with the highest click-through rates (CTRs) and display them more often than ads with lower CTRs within the same Ad Group.

Not Tracking Results

In order to have any idea of your AdWords campaign's performance, you must be able to see which keywords work as well as which do not. Google AdWords supplies a vast array of very useful tracking tools. Google has also built Google Analytics into the user interface: a marvellous web analytics tool that provides you with in-depth reporting on all aspects of your campaign performance. I cannot stress enough the importance of creating goals for your AdWords campaign to measure your success by.

Entering The Content Network Without Modifying Bids

Within the AdWords platform you have recently been given the ability to set different bids for the content network than for the search network. If you do not set separate, lower bids for certain keywords on the content network, you will be paying more per click than you should be. After lowering the prices on those keywords, you will often find that the number of click-throughs you attain remains much the same as it was at the higher bid.

Conclusion

The purpose of this article was to raise awareness of common mistakes and to eliminate the frustrations that can emerge when managing Google AdWords campaigns. The points mentioned above are compiled from management mistakes that I have stumbled upon time and time again, in the hope of assisting you in creating a marketing campaign that generates dramatic increases in the profits of your business.

The Definitive Guide to Web Character Encoding

Character encoding. You may have heard of it, but what is it, and why should you care? What can happen if you get it wrong? How do you know which one to use?

We'll look into the details in a minute, but for now let's just say that a character encoding is the way that letters, digits and other symbols are expressed as numeric values that a computer can understand.

A file -- an HTML document, for instance -- is saved with a particular character encoding. Information about the form of encoding that the file uses is sent to browsers and other user agents, so that they can interpret the bits and bytes properly. If the declared encoding doesn't match the encoding that has actually been used, browsers may render your precious web page as gobbledygook. And of course search engines can't make head nor tail of it, either.

What's the Difference?

Why does it matter which form of encoding we choose? What happens if we choose the "wrong" one?

The choice of character encoding affects the range of literal characters we can use in a web page. Regular Latin letters are rarely a problem, but some languages need more letters than others, and some languages need various diacritical marks above or below the letters. Then, of course, some languages don't use Latin letters at all. If we want proper -- as in typographically correct -- punctuation and special symbols, the choice of encoding also becomes more critical.

What if we need a character that cannot be represented with the encoding we've chosen? We have to resort to entities or numeric character references (NCR). An entity reference is a symbolic name for a particular character, such as &copy; for the © symbol. It starts with an ampersand (&) and should end with a semicolon (;). An NCR references a character by its code position (see below). The NCR for the copyright symbol is &#169; (decimal) or &#xA9; (hexadecimal).

Entities or NCRs work just as well as literal characters, but they use more bytes and make the markup more difficult to read. They are also prone to typing errors.
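As a rough illustration (assuming the page itself is saved as UTF-8), here is how the literal copyright character compares, byte for byte, with its entity and NCR forms:

# Byte cost of a literal © (saved as UTF-8) versus its entity and NCR forms.
literal = "©".encode("utf-8")       # 2 bytes: C2 A9
entity = "&copy;".encode("ascii")   # 6 bytes
ncr_dec = "&#169;".encode("ascii")  # 6 bytes
ncr_hex = "&#xA9;".encode("ascii")  # 6 bytes

for label, data in [("literal", literal), ("entity", entity),
                    ("NCR decimal", ncr_dec), ("NCR hex", ncr_hex)]:
    print(f"{label}: {len(data)} bytes")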

What Affects the Choice?

A number of parameters should be taken into consideration before we choose a form of encoding, including:

  • Which characters am I going to use?
  • In which encodings can my editor save files?
  • Which encodings are supported by the various components in my publishing chain?
  • Which encodings are supported by visitors' browsers?

Let's consider each of these issues in turn.

Character Range

The first parameter we need to consider is the range of characters we're going to need. Obviously, a site that's written in a single language uses a more limited range of characters than a multilingual site -- especially one that mixes Latin letters with Cyrillic, Greek, Hebrew, Arabic, Chinese, and so on.
If we want to use typographically correct quotation marks, dashes and other special punctuation, the "normal" encodings fall short. This is also true if we need mathematical or other special symbols.

Text Editor Capabilities

Some authors prefer to use regular text editors like Notepad or Vim; others like a point-and-click WYSIWYG tool like Dreamweaver; some use a sophisticated content management system (CMS). Regardless of personal preference, our choice of editors affects our choice of encoding. Some editors can only save in one encoding, and they won't even tell you which one. Others can save in dozens of different encodings, but require you to know which one will suit your needs.

Other Components

A publishing chain consists of more than an editor. There's always a web server (HTTP server) at the far end of the chain, but there can be other components in between: databases, programming or scripting languages, frameworks, application servers, servlet engines and more.

Each of these components may affect your choice of encoding. Maybe the database can only store data in one particular encoding, or perhaps the scripting language you're using cannot handle certain encodings.

It's not possible to enumerate the capabilities of all the different editors, databases, and so on in this article, because there are simply too many of them. You need to look at the documentation for your components before choosing the encoding to use.

Browser Support

Some encodings -- like US-ASCII, the ISO 8859 series and UTF-8 -- are widely supported. Others are not. It is probably best to avoid the more esoteric encodings, especially on a site that's intended for an international audience.

What is a Character Encoding?

A character is the smallest unit of writing that's capable of conveying information. It's an abstract concept: a character does not have a visual appearance. "Uppercase Latin A" is a different character from "lowercase Latin a" and from "uppercase Cyrillic A" and "uppercase Greek Alpha".

A visual representation of a character is known as a glyph. A certain set of glyphs is called a font. "Uppercase Latin A", "uppercase Cyrillic A" and "uppercase Greek Alpha" may have identical glyphs, but they are different characters. At the same time, the glyphs for "uppercase Latin A" can look very different in Times New Roman, Gill Sans and Poetica chancery italic, but they still represent the same character.


The set of available characters is called a character repertoire. The location (index) of a given character within a repertoire is known as its code position, or code point.

The method of numerically representing a code point within a given repertoire is called the character encoding. Unfortunately, the term "character set", or "charset", has been used both for repertoires and for encodings, so it is best to avoid it altogether.

Encodings are normally expressed in terms of octets. An octet is a group of eight binary digits, i.e., eight ones and zeros. An octet can express a numeric range between 0 and 255, or between 0x00 and 0xFF, to use hexadecimal notation.

A Brief History

The early computers didn't have a standardised character encoding, but this didn't matter much, because computers could rarely communicate with one another back then. When inter-computer communication became possible, the need for encoding standards became apparent. A common early repertoire/encoding was EBCDIC, another was the American Standard Code for Information Interchange, a.k.a. ASCII. The U.S. version, US-ASCII, has been standardised as ISO 646.

ASCII uses only seven bits (ones and zeros), which means it can represent 128 numbers: 0 through 127, inclusive. The 0-31 range is reserved for C0 control characters and 127 is reserved for DEL (delete), which leaves a total of 95 printable characters. That's enough for the English alphabet in uppercase and lowercase, plus digits and some common (and, admittedly, some less common) punctuation. But it's not enough to take in the accented characters and diacritical marks necessary for many European languages, let alone any writing that doesn't use Latin letters. Mutually incompatible national versions of ASCII used to be commonplace, but they don't work for international information exchange.

The ISO 8859 series was an attempt to provide alternatives for languages other than English. It is a superset of ASCII, i.e., the first 128 code points are the same in ASCII and all versions of ISO 8859. But ISO 8859 uses eight bits and can thus represent 256 characters (0-255). It is therefore sometimes, incorrectly, called "8-bit ASCII". The range from 128 to 159 (0x80 to 0x9F) is reserved for C1 control characters.

The most common version for Western languages is ISO 8859-1, a.k.a. ISO Latin-1. It contains a number of accented versions of vowels, plus various special characters. It has since been revised as ISO 8859-15, mainly to accommodate the Euro sign (€).

ASCII and the ISO 8859 series are both character repertoires and encodings. The code points range from 0 to 127 for ASCII and from 0 to 255 for ISO 8859. The encoding is a simple one-to-one, since one octet can comfortably express the whole range. "Uppercase Latin A" has code point 65 (0x41) and is encoded as 65 (01000001).
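A quick sketch shows that one-to-one relationship between code point and octet for these single-octet encodings:

# In ASCII and the ISO 8859 series, each character is encoded as exactly one octet,
# and the octet value equals the code point.
for ch in ("A", "é"):   # "é" has code point 233 (0xE9) in ISO 8859-1
    data = ch.encode("iso-8859-1")
    print(f"{ch!r}: code point {ord(ch)} (0x{ord(ch):02X}) -> octet {data.hex().upper()}")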

Microsoft, never known for following someone else's standard when it can create its own, has also created a number of character repertoires/encodings. These were called "code pages" in DOS, and CP850 was the code page used for Western languages.

One of the most common Microsoft repertoires/encodings is known as Windows-1252. While very similar to ISO 8859-1, it's not identical. The range reserved for C1 control characters in the ISO encodings is used by Microsoft to provide certain handy characters that aren't available in the ISO series, such as typographically correct quotation marks and dashes.

For languages that don't use Latin letters, similar specialized repertoires/encodings were devised. The problem was that there was no repertoire/encoding that could be used for combinations of such languages.

Unicode / ISO 10646

The solution to this problem is called Unicode -- a character repertoire that contains most of the characters used in the languages of the world. It can accommodate over a million code points, and already contains around a hundred thousand characters. Unicode is divided into "planes" of 64K characters each. The only one used in most circumstances is the first plane, known as the basic multilingual plane, or BMP.

The first 256 code points in Unicode are compatible with ISO 8859-1, which also means that the first 128 code points are compatible with US-ASCII. Code points in Unicode are written in hexadecimal, prefixed by a capital "U" and a plus sign (e.g., U+0041 for "uppercase Latin A" (code point 65, or 0x41)).

A version of Unicode that has been standardised by ISO is called ISO 10646 (the number is no coincidence; compare to US-ASCII's ISO 646). There are minor differences between Unicode and ISO 10646, but nothing that we mere mortals need to worry about.

ISO 10646 is important, because it is the character repertoire that's used by HTML.

But ISO 10646 is only a repertoire. We need an encoding to go with it. Since the repertoire can represent more than a million code points, a one-to-one encoding would be very inefficient. We'd need 32 bits (four octets) for each character, and that would be quite a waste, especially for Western languages. Such an encoding (UTF-32) exists, but it is rarely used. Another one is UTF-16, which uses two octets for most characters (and four for characters outside the BMP), but it hasn't quite caught on for web content.

Instead, a more efficient (for Western languages) encoding known as UTF-8 has become the recommended way forward. It uses a variable number of octets to represent different characters. The ASCII range (U+0000 to U+007F) is encoded one-to-one, a single octet per character. Other characters need two, three or four octets. (The original definition of UTF-8 allowed up to six octets per character, but the current standard limits it to four.)
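A small sketch makes the variable-length nature of UTF-8 visible; each character below needs a different number of octets:

# UTF-8 uses one octet for ASCII and two to four octets for everything else.
samples = ["A", "é", "€", "中", "𝄞"]   # ASCII, Latin-1 range, Euro sign, CJK, outside the BMP
for ch in samples:
    data = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch!r} -> {len(data)} octet(s): {data.hex()}")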

Which Encoding Should I Choose?

For an English-only site, it doesn't matter all that much. Unless you want to use some typographically correct punctuation (curly quotes, etc.), plain old US-ASCII will be sufficient. ISO 8859-1 has become something of a de facto standard for Western sites, and may be of interest if you prefer spellings like "naïve" or "rôle" or "smörgåsbord."

For those of us who need to write in some other Western European language, such as French, Spanish, Portuguese, Italian, German, Swedish, Norwegian, Danish or Finnish, ISO 8859-1 works quite well. Those who need the diacritical marks of Czech or Polish, or completely separate alphabets like Greek or Cyrillic, can choose from other versions of the ISO 8859 series.

As I've mentioned, specialized encodings exist for Hebrew, Arabic and Oriental scripts as well. But what if you need to mix English, Russian, Greek and Japanese on the same site? Or even on the same page?

I would recommend using UTF-8 wherever possible, since it can represent any character in the ISO 10646 repertoire. Even if you only write in English, UTF-8 gives you direct access to typographically correct quotation marks, several dashes, ellipses, and more. And if you need to write in Greek or Japanese, you can do so without having to muck about with entities or NCRs.

On a multilingual site, it's certainly possible to use different encodings for different pages, but think of the maintenance nightmare. Why not use UTF-8 for everything and stop worrying?

Unfortunately, though, a few minor problems are associated with using UTF-8 -- even in this day and age.

UTF-8 Problems

The first problem with using UTF-8 is that not all editors or publishing tools support it. You'd think that all software would support UTF-8 after all these years, but sadly this is not so.

The next problem is something called a byte order mark, or BOM: a sequence of two (UTF-16) or three (UTF-8) octets at the very start of a file. For UTF-16 it tells the computer whether the most or least significant octet comes first; for UTF-8, where byte order isn't an issue, it merely acts as a signature. Some browsers don't understand the BOM and will output it as text, while some editors won't allow us to omit it.

A minor problem is that some ancient browsers don't support UTF-8 (even without the BOM). However, those should be few and far between these days.
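If you suspect your editor has quietly added a BOM, a quick check like this will reveal and strip it (the filename is just an example):

# Detect and remove a UTF-8 byte order mark (the octets EF BB BF) at the start of a file.
BOM = b"\xef\xbb\xbf"

with open("index.html", "rb") as f:   # hypothetical file
    data = f.read()

if data.startswith(BOM):
    print("UTF-8 BOM found; stripping it")
    with open("index.html", "wb") as f:
        f.write(data[len(BOM):])
else:
    print("no UTF-8 BOM present")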

ISO 8859 Problems

If you're publishing in English, French and German, and encounter problems with UTF-8, you may choose to go with our trusted old friend: ISO 8859-1. But there are still a few pitfalls to look out for.

Many editors under Windows will use Windows-1252 as the default (or only!) encoding. If you save files as Windows-1252 and declare the encoding to be ISO 8859-1, it usually works. This is because the two are very similar.

But if you use certain literal characters, like typographically correct quotation marks, dashes, ellipses, and so on, you'll run into trouble. These characters are not part of ISO 8859-1. In Windows-1252, they're located in the range that the ISO encoding reserves for C1 control characters -- in other words, those code points are invalid in ISO 8859-1. Copying from another Windows application, like Word, is a particularly likely cause of problems.

The W3C's HTML validator will catch these types of invalid characters and report them as errors.
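You can see the mismatch directly: the handy Windows-1252 punctuation sits exactly in the octet range that ISO 8859-1 reserves for C1 control characters. A small sketch:

# Typographic punctuation in Windows-1252 occupies the 0x80-0x9F octet range,
# which ISO 8859-1 reserves for C1 control characters.
for name, ch in [("right single quote", "\u2019"), ("left double quote", "\u201c"),
                 ("en dash", "\u2013"), ("ellipsis", "\u2026")]:
    octet = ch.encode("windows-1252")
    print(f"{name}: Windows-1252 octet 0x{octet.hex().upper()}")
    try:
        ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        print("  ...and it cannot be represented in ISO 8859-1 at all")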

Problems with Other Encodings

UTF-8 and the ISO 8859 series are well supported by modern browsers. Most browsers also support quite a few other encodings, but if you choose an exotic encoding, you run the risk that some visitors won't be able to read your content.

In some countries in which the Latin alphabet isn't used, web developers may use a font that offers the required characters and not care about the encoding at all. This is most unwise. Any visitor who doesn't have that particular font installed will see nothing but gibberish. And those "visitors" include Google and the other search engines.

Specifying the Encoding

Once you've chosen the encoding you'll use, you must make sure that the proper information is passed to browsers, search engines, and so on.

Web pages are served using the HyperText Transfer Protocol (HTTP): a browser sends a request via HTTP and the server sends a response back via HTTP. The response consists of two parts: headers and body, separated by a blank line. The headers provide information about the body (content). The body contains the requested resource (typically an HTML document).

For HTML, encoding information should be sent by the web server using the Content-Type header:

"Content-Type: text/html; charset=utf-8"

You may also wish to provide an HTTP equivalent in HTML that will declare the encoding when the page is viewed offline. You can do so using a META element in the HEAD-section of your document:

"meta http-equiv="Content-Type" content="text/html; charset=utf-8" "

Note, however, that any real HTTP header will override a META element, so it's imperative that you set up the web server correctly. For Apache, you can do so by editing the configuration file (/etc/httpd.conf on most *nix systems). The directive should look something like this:

"AddDefaultCharset UTF-8"

For Microsoft IIS, this setting needs to be located within its numerous dialog boxes.
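If your pages are generated by a script rather than served as static files, you can set the header there instead. A minimal sketch using Python's standard http.server module (for local experimentation only; the page content is invented):

# Explicitly send the charset in the Content-Type header for a generated page.
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = "<html><head><title>smörgåsbord</title></head><body><p>naïve rôle</p></body></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")   # encode the body as UTF-8...
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")   # ...and say so
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()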

For XML -- including properly served XHTML -- the encoding should be specified in the XML declaration at the top of the file. In these cases, the Content-Type header should not contain any encoding information at all. XML parsers are only required to support UTF-8 and UTF-16, which makes the choice somewhat easier:

""

Note that this does not apply to XHTML served as text/html, because that's not really XHTML at all, so the XML declaration doesn't work.

Summary

Choosing the right character encoding is important. If you choose an encoding that's unsuitable for your site (e.g. using ISO 8859-1 for a Chinese site), you'll need to use lots of entities or NCRs, which will bloat file sizes unnecessarily.

Unfortunately, choosing an encoding isn't always easy. Lack of support within the various components in the publishing chain can prevent you from using the encoding that would best suit your content.

Use UTF-8 (without a BOM) if at all possible, especially for multilingual sites.
And perhaps the most important thing of all: the encoding you declare must match the encoding you used when saving your files!
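That last point is worth demonstrating. If a file is saved as UTF-8 but interpreted as ISO 8859-1, every non-ASCII character turns into a pair of junk characters. A quick sketch of the effect:

# What happens when the declared encoding does not match the actual one.
original = "smörgåsbord naïve rôle"    # text saved as UTF-8
saved = original.encode("utf-8")       # the actual octets in the file
garbled = saved.decode("iso-8859-1")   # a browser reading them as ISO 8859-1
print(garbled)                         # prints: smÃ¶rgÃ¥sbord naÃ¯ve rÃ´le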

Reducing Your Website's Bandwidth Usage

Over the last three years, this site has become far more popular than I ever could have imagined. Not that I'm complaining, mind you. Finding an audience and opening a dialog with that audience is the whole point of writing a blog in the first place.

But on the internet, popularity is a tax. Specifically, a bandwidth tax. When Why Can't Programmers.. Program? went viral last week, outgoing bandwidth usage spiked to nearly 9 gigabytes in a single day:

codinghorror bandwidth usage, 2/24/2007 - 3/4/2007

That was enough to completely saturate two T1 lines-- nearly 300 KB/sec-- for most of the day. And that includes the time we disabled access to the site entirely in order to keep it from taking out the whole network.* After that, it was clear that something had to be done. What can we do to reduce a website's bandwidth usage?

1. Switch to an external image provider.

Unless your website is an all-text affair, images will always consume the lion's share of your outgoing bandwidth. Even on this site, which is extremely minimalistic, the size of the images dwarfs the size of the text. Consider my last blog post, which is fairly typical:

Size of post text:    ~4,900 bytes
Size of post image:   ~46,300 bytes
Size of site images:  ~4,600 bytes

The text only makes up about ten percent of the content for that post. To make a dent in our bandwidth problem, we must deal with the other ninety percent of the content-- the images-- first.

Ideally, we shouldn't have to serve up any images at all: we can outsource the hosting of our images to an external website. There are a number of free or nearly-free image sharing sites on the net which make this a viable strategy:

  • Imageshack
    ImageShack offers free, unlimited storage, but has a 100 MB per hour bandwidth limit for each image. This sounds like a lot, but do the math: that's 1.66 MB per minute, or about 28 KB per second. And the larger your image is, the faster you'll burn through that meager allotment. But it's incredibly easy to use-- you don't even have to sign up-- and according to their common questions page, anything goes as long as it's not illegal.

  • Flickr
    Flickr offers a free basic account with limited upload bandwidth and limited storage. Download bandwidth is unlimited. Upgrading to a paid Pro account for $25/year removes all upload and storage restrictions. However, Flickr's terms of use warn that "professional or corporate uses of Flickr are prohibited", and all external images require a link back to Flickr.

  • Photobucket
    Photobucket's free account has a storage limit and a download bandwidth limit of 10 GB per month (that works out to a little over 14 MB per hour). Upgrading to a paid Pro account for $25/year removes the bandwidth limit. I couldn't find any relevant restrictions in their terms of service.

  • Amazon S3
    Amazon's S3 service allows you to direct-link files at a cost of 15 cents per GB of storage, and 20 cents per GB transfer. It's unlikely that would add up to more than the ~ $2 / month that seems to be the going rate for the other unlimited bandwidth plans. It has worked well for at least one other site.

I like ImageShack a lot, but it's unsuitable for any kind of load, due to the hard-coded bandwidth limit. Photobucket offers the most favorable terms, but Flickr has a better, more mature toolset. Unfortunately, I didn't notice the terms of use restrictions at Flickr until I had already purchased a Pro account from them. So we'll see how it goes. Update: it looks like Amazon S3 may be the best long-term choice, as many (if not all) of these photo sharing services are blocked in corporate firewalls.

Even though this ends up costing me $25/year, it's still an incredible bargain. I am offloading 90% of my site's bandwidth usage to an external host for a measly 2 dollars a month.

And as a nice ancillary benefit, I no longer need to block image bandwidth theft with URL rewriting. Images are free and open to everyone, whether it's abuse or not. This makes life much easier for legitimate users who want to view my content in the reader of their choice.

Also, don't forget that favicon.ico is an image, too. It's retrieved more and more often by today's readers and browsers. Make favicon.ico as small as possible, because it can have a surprisingly large impact on your bandwidth.

2. Turn on HTTP compression.

Now that we've dealt with the image content, we can think about ways to save space on the remaining content-- the text. This one's a no-brainer. Enable HTTP compression on your webserver for a roughly two-thirds reduction in text bandwidth. Let's use my last post as an example again:

Post size:                   63,826 bytes
Post size with compression:  21,746 bytes

We get a 66% reduction in file size for every bit of text served up on our web site-- including all the JavaScript, HTML, and CSS-- by simply flipping a switch on our web server. The benefits of HTTP compression are so obvious it hurts. It's reasonably straightforward to set up in IIS 6.0, and it's extremely easy to set up in Apache.

Never serve content that isn't HTTP compressed. It's as close as you'll ever get to free bandwidth in this world. If you aren't sure that HTTP compression is enabled on your website, use this handy web-based HTTP compression tester, and be sure.
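If you prefer to check from the command line, a rough sketch with Python's standard library does the same job (the URL is just an example):

# Ask for gzip and see whether the server actually compresses the response.
import urllib.request

req = urllib.request.Request("http://www.example.com/",
                             headers={"Accept-Encoding": "gzip"})
with urllib.request.urlopen(req) as resp:
    encoding = resp.headers.get("Content-Encoding", "none")
    size = len(resp.read())   # urllib does not decompress, so this is the on-the-wire size
print(f"Content-Encoding: {encoding}, {size} bytes transferred")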

3. Outsource your RSS feeds.

Many web sites offer RSS feeds of updated content that users can subscribe to (or "syndicate") in RSS readers. Instead of visiting a website every day to see what has changed, RSS readers automatically pull down the latest RSS feed at regular intervals. Users are free to read your articles at their convenience, even offline. Sounds great, right?

It is great. Until you realize just how much bandwidth all that RSS feed polling is consuming. It's staggering. Scott Hanselman told me that half his bandwidth was going to RSS feeds. And Rick Klau noted that 60% of his page views were RSS feed retrievals. The entire RSS ecosystem depends on properly coded RSS readers; a single badly-coded reader could pummel your feed, pulling uncompressed copies of your RSS feed down hourly-- even when it hasn't changed since the last retrieval. Now try to imagine thousands of poorly-coded RSS readers, all over the world. That's pretty much where we are today.

Serving up endless streams of RSS feeds is something I'd just as soon outsource. That's where FeedBurner comes in. Although I'll gladly outsource image hosting for the various images I use to complement my writing, I've been hesitant to hand control for something as critical as my RSS feed to a completely external service. I emailed Scott Hanselman, who switched his site over to FeedBurner a while ago, to solicit his thoughts. He was gracious enough to call me on the phone and address my concerns, even walking me through FeedBurner using his login.

I've switched my feed over to FeedBurner as of 3pm today. The switch should be transparent to any readers, since I used some ISAPIRewrite rules (the IIS equivalent of mod_rewrite) to do a seamless, automatic permanent redirect from the old feed URL to the new feed URL:

# do not redirect feedburner, but redirect everyone else
RewriteCond User-Agent: (?!FeedBurner).*
RewriteRule .*index.xml$|.*index.rdf$|.*atom.xml$ http://feeds.feedburner.com/codinghorror/ [I,RP,L]

And the best part is that immediately after I made this change, I noticed a huge drop in per-second and per-minute bandwidth on the server. I suppose that's not too surprising if you consider that the FeedBurner stats page for this feed is currently showing about one RSS feed hit per second. But even compressed, that's still about 31 KB of RSS feed per second that my server no longer has to deal with.

It's a substantial savings, and FeedBurner brings lots of other abilities to the table beyond mere bandwidth savings.

4. Optimize the size of your JavaScript and CSS

The only thing left for us to do now is reduce the size of our text content, with a special emphasis on the elements that are common to every page on our website. CSS and JavaScript resources are a good place to start, but the same techniques can apply to your HTML as well.

There's a handy online CSS compressor which offers three levels of CSS compression. I used it on the main CSS file for this page, with the following results:

Original CSS size:          2,299 bytes
After removing whitespace:  1,758 bytes
After HTTP compression:       615 bytes

We can do something similar to the JavaScript with this online JavaScript compressor, based on Douglas Crockford's JSMin. But before I put the JavaScript through the compressor, I went through and refactored it, using shorter variables and eliminating some redundant and obsolete code.

Original JS size:           1,232 bytes
After refactoring:            747 bytes
After removing whitespace:    558 bytes
After HTTP compression:       320 bytes

It's possible to use similar whitespace compressors on your HTML, but I don't recommend it. I only saw reductions in size of about 10%, which wasn't worth the hit to readability.

Realistically, whitespace and linefeed removal is doing work that the compression would be doing for us. We're just adding a dab of human-assisted efficiency:


                  Raw          Compressed
Unoptimized CSS   2,299 bytes  671 bytes
Optimized CSS     1,758 bytes  615 bytes

It's only about a 10 percent savings once you factor in HTTP compression. The tradeoff is that CSS or JavaScript lacking whitespace and linefeeds has to be pasted into an editor to be effectively edited. I use Visual Studio 2005, which automatically "rehydrates" the code with proper whitespace and linefeeds when I issue the autoformat command.
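If you want to measure that tradeoff for your own files, here is a rough sketch of the raw-versus-minified-versus-compressed comparison (the filename is a placeholder, and the whitespace squeeze is deliberately crude):

# Compare a stylesheet raw, whitespace-stripped, and gzip-compressed, to see how much
# minification still buys you once HTTP compression is turned on.
import gzip
import re

with open("style.css", "r", encoding="utf-8") as f:   # hypothetical file
    raw = f.read()

# Crude whitespace squeeze; a real CSS compressor does much more (and more safely).
minified = re.sub(r"\s+", " ", raw).strip()

for label, text in [("raw", raw), ("minified", minified)]:
    data = text.encode("utf-8")
    print(f"{label}: {len(data)} bytes, gzipped: {len(gzip.compress(data))} bytes")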

Although this is definitely a micro-optimization, I think it's worthwhile since it reduces the payload of every single page on this website. But there's a reason it's the last item on the list, too. We're just cleaning up a few last opportunities to squeeze every last byte over the wire.

After implementing all these changes, I'm very happy with the results. I see a considerable improvement in bandwidth usage, and my page load times have never been snappier. But, these suggestions aren't a panacea. Even the most minimal, hyper-optimized compressed text content can saturate a 300 KB/sec link if the hits per second are coming fast enough. Still, I'm hoping these changes will let my site weather the next Digg storm with a little more dignity than it did the last one-- and avoid taking out the network in the process.

* the ironic thing about this is that the viral post in question was completely HTTP compressed text content anyway. So of all the suggestions above, only the RSS outsourcing would have helped.