April 08, 2004

The whole web in RAM?

Topix.net (here and here) discusses the case for storing the whole of the web's text in RAM: the benefits of doing so, how Google may already be doing it, and what advantages their infrastructure may give them in deploying other services. [via locussolus].

Note that some of the commenters doubt the arguments against disk-based search algorithms; I haven't thought it through, but life is certainly easier if you don't have to worry about seek times. 100,000 servers with 1 GB each is 10^14 bytes of RAM, or 100 TB. Google says it searches 4,285,199,774 (≈ 4.3 × 10^9) web pages, so that's roughly 23 KB per indexed page, if I've done the arithmetic right. That's plenty, though there is doubtless some disk activity going on anyhow. In either case, there are certainly advantages to knowing how to build, maintain, and use very large cluster farms, which was the main point of the topix.net post.
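The arithmetic is easy to check mechanically. Here's a minimal Python sketch that redoes the numbers, using only the figures quoted above (the server count and per-server RAM are the post's estimates, not anything confirmed by Google); the seek-time comparison at the end uses assumed typical 2004-era figures for scale.

    # Back-of-envelope check of the RAM-per-page arithmetic above.
    # All inputs are the figures quoted in the post, not measurements.

    servers = 100_000             # estimated number of Google servers
    ram_per_server = 10**9        # 1 GB each, in bytes
    pages = 4_285_199_774         # page count from Google's front page, April 2004

    total_ram = servers * ram_per_server   # 10^14 bytes
    bytes_per_page = total_ram / pages     # RAM available per indexed page

    print(f"Total RAM: {total_ram:.2e} bytes (~{total_ram / 10**12:.0f} TB)")
    print(f"RAM per indexed page: {bytes_per_page / 1024:.1f} KB")

    # For scale on the seek-time point: a 2004-era disk seek is on the
    # order of 8 ms, versus roughly 100 ns for a RAM access.
    seek_time = 8e-3      # seconds per disk seek (assumed)
    ram_access = 100e-9   # seconds per RAM access (assumed)
    print(f"Disk seek / RAM access: ~{seek_time / ram_access:.0e}x")

    # Output:
    # Total RAM: 1.00e+14 bytes (~100 TB)
    # RAM per indexed page: 22.8 KB
    # Disk seek / RAM access: ~8e+04x

So the 23 KB-per-page figure holds up, and the seek-time point is a difference of about five orders of magnitude per random access, which is why avoiding disk seeks makes life so much easier.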

Posted by Mark Liberman at April 8, 2004 11:10 AM