The recent discussions were prompted by a report from Microsoft blogging evangelist Robert Scoble that Microsoft had altered its RSS feeds to reduce server load. While Scoble overstated the issues at Microsoft, the resulting chatter among blogging technologists surfaced numerous strategies to impose discipline on feed-hungry RSS clients.
Last month Scoble wrote that RSS feeds from Microsoft's corporate blogs were growing unwieldy. "Bandwidth usage was growing faster than MSDN's ability to pay for, or keep up with, the bandwidth," he wrote. "Terabytes of bandwidth were being used up by RSS."
Scoble's post was widely debated by bloggers before being corrected by Microsoft's Sara Williams. "In a nutshell, our RSS traffic is negligible compared to all the traffic generated by Windows Update, MSN, downloads, and the rest of microsoft.com," wrote Williams. "We were motivated to reduce the size of the blogs.msdn.com home page primarily for operational efficiency's sake."
RSS, an XML format known as Really Simple Syndication (or Rich Site Summary, as Netscape's early implementation was known), was popularized by weblogs and has since been adopted by many major news sites to deliver headlines to readers. The bandwidth problems are driven by inefficiencies in the way desktop RSS clients update feeds. The default setting on many newsreaders is to refresh headlines once an hour, even if the feed's content has not changed. Some publishers have compared the resulting traffic to an hourly denial-of-service attack.
The discussion of Microsoft's RSS issues focused on the need to improve the handling of RSS feeds. Newsreader developers were encouraged to support features that allow greater control over the timing of feed retrieval, especially HTTP conditional GET, which lets a client ask whether a feed has changed since its last fetch and skip the download if it has not. RSS pioneer Dave Winer urged wider use of the time to live (ttl) element in RSS 2.0, which tells clients how many minutes a feed can be cached before it is refreshed. The standard also allows publishers to use the skipHours and skipDays elements to discipline newsreaders. "This way control over polling is shared between the client and the server," Winer writes. "In the current mechanism, the server has no say in how often polls take place."
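As a rough illustration of how a well-behaved newsreader might combine these mechanisms, the Python sketch below decides whether a poll is allowed under a feed's ttl, skipHours and skipDays values, and builds the request headers for a conditional GET. The function names and the 60-minute default are assumptions made for this example, not part of any particular newsreader; the header names (If-None-Match, If-Modified-Since) are the standard HTTP ones.

```python
from datetime import datetime, timedelta, timezone

# RSS 2.0 skipDays uses English day names; skipHours uses hours 0-23 in GMT.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def may_poll(now, last_fetch, ttl_minutes=60, skip_hours=(), skip_days=()):
    """Return True if a polite client may refetch the feed at `now` (UTC).

    `ttl_minutes`, `skip_hours` and `skip_days` are assumed to have been
    parsed from the feed's <ttl>, <skipHours> and <skipDays> elements.
    """
    if now.hour in skip_hours:
        return False
    if DAYS[now.weekday()] in skip_days:
        return False
    if last_fetch is None:  # never fetched before
        return True
    return now - last_fetch >= timedelta(minutes=ttl_minutes)


def conditional_headers(etag=None, last_modified=None):
    """Build conditional GET headers from validators saved on the last fetch.

    A server that supports them answers "304 Not Modified" with an empty
    body when the feed is unchanged, so the feed is not sent again.
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last = now - timedelta(minutes=20)
    print(may_poll(now, last, ttl_minutes=60))  # too soon under a 60-minute ttl
    print(conditional_headers(etag='"abc123"'))
```

The two checks are complementary: the ttl and skip elements reduce how often a client asks at all, while the conditional GET keeps each remaining request cheap when nothing has changed.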
On the server side, Slashdot is managing RSS load by allowing one RSS request every 30 minutes, citing an "absolutely ridiculous amount of abuse we get on a daily basis from poorly implemented headline readers." Violators are hit with a 72-hour ban.
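Slashdot's actual implementation is not described here, but the general policy can be sketched in a few lines of Python. The class below is a hypothetical, in-memory illustration: it permits one request per client every 30 minutes and bans a client for 72 hours on the first early request. A production system would likely tolerate a few violations before banning and would persist its state.

```python
class FeedThrottle:
    """Hypothetical per-client throttle: one request per interval, then a ban."""

    def __init__(self, min_interval=30 * 60, ban_duration=72 * 3600):
        self.min_interval = min_interval  # seconds between allowed requests
        self.ban_duration = ban_duration  # seconds a violator stays banned
        self.last_seen = {}               # client id -> time of last request
        self.banned_until = {}            # client id -> time the ban ends

    def allow(self, client, now):
        """Return True if `client` may fetch the feed at time `now` (seconds)."""
        if now < self.banned_until.get(client, 0.0):
            return False  # still serving a ban
        last = self.last_seen.get(client)
        self.last_seen[client] = now
        if last is not None and now - last < self.min_interval:
            # Polled too soon: start the ban (simplification: no warnings).
            self.banned_until[client] = now + self.ban_duration
            return False
        return True


if __name__ == "__main__":
    throttle = FeedThrottle()
    print(throttle.allow("203.0.113.7", 0))     # first request is served
    print(throttle.allow("203.0.113.7", 600))   # ten minutes later: refused
```

Clients would typically be keyed by IP address, as in the usage lines above, which is itself an assumption; readers behind a shared proxy would then be throttled as one client.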
Compressing feeds is another server-side strategy to manage RSS. "Because RSS is primarily text, I've seen a reduction of 80% of the bandwidth when delivering RSS feeds in a compressed format," notes Tristan Louis. "That represents a fairly large gain in bandwidth that can then accommodate more users."
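To show the idea in miniature, the Python sketch below gzips a feed only when the client's Accept-Encoding header offers gzip. The encode_feed function and the sample feed are invented for this example, the header parsing ignores quality values for brevity, and the savings on a real feed will depend on its content.

```python
import gzip


def encode_feed(body, accept_encoding=""):
    """Return (payload, extra_headers), gzipping when the client allows it.

    Simplified: q-values in Accept-Encoding (e.g. "gzip;q=0") are ignored.
    """
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    if "gzip" in offered:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}


# A made-up feed with repetitive markup, as RSS feeds typically have.
ITEM = ("<item><title>Headline {n}</title>"
        "<link>http://example.com/{n}</link>"
        "<description>Summary text for story {n}.</description></item>")
SAMPLE_FEED = (
    '<?xml version="1.0"?><rss version="2.0"><channel><title>Example</title>'
    + "".join(ITEM.format(n=n) for n in range(50))
    + "</channel></rss>"
).encode("utf-8")


if __name__ == "__main__":
    payload, headers = encode_feed(SAMPLE_FEED, "gzip, deflate")
    saved = 100 * (1 - len(payload) / len(SAMPLE_FEED))
    print(f"{len(SAMPLE_FEED)} bytes -> {len(payload)} bytes ({saved:.0f}% saved)")
```

In practice most web servers can apply this compression through configuration, so a publisher rarely needs to write it by hand.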
More sophisticated traffic management tools are also on the wish list for future development of RSS. "While RSS publishers know how many feeds are being pushed out, there is little, in the way of information as to what percentage of those feeds is being read," Louis said. "Stronger metrics need to be developed to get an understanding of passive vs. active subscribers (passive subscribers are subscribers that receive the feed but do not read it, while active subscribers are actually reading the content and clicking through)." Such data would also be critical for the growing number of publishers seeking to offer ads in their RSS feeds.
Critics of RSS' bandwidth issues were urged to join efforts to improve the standard. "Yes, RSS has room for improvement, but it's not bad today - you just have to understand what you're doing," writes Microsoft's Williams. "At the same time, there's tons of headroom for improvements to the spec, improvements to the client software, and improvements in server implementations ... Lots of room for innovation."
Posted by Rich Miller in Performance