System Administration

Tools for the toolbox
 Monday, August 26, 2002

Communities of Scale

Userland's John Robb discusses the difference in cost between a centralized content management model and a decentralized one. He brings up some interesting points.

Say you have 100,000 people that you want to enable to publish a weblog. Given that these are weblogs and not simple homepages (like GeoCities) that are published once and forgotten, a couple of things change. People use the functionality daily, if not several times a day.

The nature of the medium encourages frequent publication, but any subject that captures your enthusiasm could be just as frequently updated. In fact, that's exactly what we want.

They also build massive sites. Compared to a one or two page "designed" personal home page on the last generation's site builders, the user of a weblog system will quickly find themselves publishing sites with hundreds if not thousands of pages. They get big fast.

The difference between weblogging software and sites like Tripod and GeoCities is that the user interface of the latter doesn't scale. The GeoCities and Tripod editors have built-in limits on the number of pages that may be published. It would be interesting to compare the backends of Blogger and LiveJournal.

If done centrally, you could probably put a thousand or two weblogs on a single server. That would take 50-100 servers, extensive rack space, and a huge budget for admin of those servers given that there is complex functionality on the server. In a decentralized model, you could put 10-20 k weblogs on a single static server. That would require only 5-10 servers (a single rack) and a very low admin budget.

This depends entirely on how the system's built.
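The server counts in the quote are plain division, and a rough sketch makes it obvious how much the answer swings on the weblogs-per-server figure. The densities below are the ones quoted above; the function names are mine, purely for illustration.

```python
import math


def servers_needed(weblogs: int, weblogs_per_server: int) -> int:
    """Round up: a partly filled server is still a server to buy and run."""
    return math.ceil(weblogs / weblogs_per_server)


def server_range(weblogs: int, densities: tuple[int, int]) -> tuple[int, int]:
    """Return (fewest, most) servers for a (low, high) density range."""
    low_density, high_density = densities
    # The higher the density, the fewer servers are needed.
    return (servers_needed(weblogs, high_density),
            servers_needed(weblogs, low_density))


TOTAL_WEBLOGS = 100_000

# Weblogs per server, as quoted above.
CENTRALIZED = (1_000, 2_000)
DECENTRALIZED = (10_000, 20_000)

print(server_range(TOTAL_WEBLOGS, CENTRALIZED))    # (50, 100)
print(server_range(TOTAL_WEBLOGS, DECENTRALIZED))  # (5, 10)
```

Change the density and the whole budget changes with it, which is why the architecture matters more than the headline number.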

First, let's distinguish between the centralized and decentralized models. In the centralized model, all functions are hosted on the server side. In the decentralized model given here, the server exposes your works to the world, but editing and other tasks are completed on your desktop. The classic web hosting arrangement is this decentralized model. For ease of use, and to get away from the complexity of synchronizing filesystems over FTP, many vendors provided web-based HTML editors and moved to the centralized model. But at some point it becomes easier to buy GoLive or Dreamweaver, unless you're using Blogger, Movable Type, or one of the Userland tools. Think of the server as your upstream cache.

In a centralized environment, the systems performing the editing tasks are often segregated from those serving the final product. Usage shows that pages are read more frequently than they are written, so the editing systems need only support a small fraction of the total membership. If you store the data separately from the presentation and render the two on the fly, then you run into some processing bottlenecks, but most on-line editors generate static HTML of one variety or another. The point is that you don't need 50-100 servers for 100,000 weblogs unless you're Manila.
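To make the static-HTML point concrete, here is a minimal sketch of rendering at publish time rather than on every request. It is not how any particular product works; the template, the function names, and the file layout are all assumptions for illustration.

```python
import tempfile
from html import escape
from pathlib import Path

# Hypothetical page template; a real system would have something richer.
TEMPLATE = "<html><head><title>{title}</title></head><body>{body}</body></html>"


def render(title: str, body: str) -> str:
    """Merge data and presentation. This is the expensive step."""
    return TEMPLATE.format(title=escape(title), body=escape(body))


def publish(static_root: Path, slug: str, title: str, body: str) -> Path:
    """Editing tier: render once, when the post is written."""
    path = static_root / f"{slug}.html"
    path.write_text(render(title, body), encoding="utf-8")
    return path


def serve(static_root: Path, slug: str) -> str:
    """Serving tier: a read is only a file read, with no template work."""
    return (static_root / f"{slug}.html").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    publish(root, "hello", "Hello", "First post")
    # Many reads, zero additional renders.
    pages = [serve(root, "hello") for _ in range(3)]
    print(pages[0])
```

The render cost is paid once per write, so the machines doing the rendering only have to keep up with the writers, not the readers.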

It's very easy to throw hardware and bandwidth at sites and make them scale. The costs are more or less distributed depending on your architecture. What's difficult is building a scalable community and finding like-minded souls. And thus we have the Radio Community Server, the Blogging Ecosystem, blogdex and others, or the less-sophisticated GeoCities Member Pages directory.

2:39:13 PM
categories: Writing Online, System Administration