Monday, October 18, 2004

Topix.net Weblog: Memory resident

A while ago I was chatting with my old boss Wade about a nifty algorithm I found for incremental search engines, which piggybacked queued index writes onto the reads that front-end requests were issuing anyway, to minimize excess disk head seeks. I thought it was pretty cool.
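
In sketch form, the idea looks something like this (the WriteQueue name and the 1 MB "nearby" window are my own inventions for illustration, not anything from a real engine):

```python
import bisect

class WriteQueue:
    """Pending index writes, kept sorted by target byte offset on disk."""

    NEARBY = 1 << 20  # writes within ~1 MB of a read ride along "free"

    def __init__(self):
        self.pending = []  # sorted list of (offset, data) tuples

    def enqueue(self, offset, data):
        bisect.insort(self.pending, (offset, data))

    def piggyback(self, read_offset):
        """Pop the queued writes near a read we have to do anyway.

        The head travels to read_offset regardless, so flushing writes
        that land within NEARBY bytes costs no extra seek."""
        lo = bisect.bisect_left(self.pending, (read_offset - self.NEARBY,))
        hi = bisect.bisect_left(self.pending, (read_offset + self.NEARBY + 1,))
        nearby = self.pending[lo:hi]
        del self.pending[lo:hi]
        return nearby

q = WriteQueue()
q.enqueue(5_000_000, b"posting-list update")
print(q.piggyback(4_500_000))  # a read lands nearby -> the write flushes free
```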

Wade smacked me on the head (gently) and asked why I was even thinking about disk anymore. Disk is dead; just put the whole thing in RAM and forget about it, he said.

Orkut is wicked fast; Friendster isn't. How do you reliably make a scalable web service wicked fast? Easy: the whole thing has to be in memory, and user requests must never wait for disk.

A disk head seek is about 9ms, and the human perceptual threshold for what seems "instant" is around 50ms. So if you have just one head seek per user request, you can support at most 5 hits/second on that server before users start to notice latency. If you have a typical filesystem with a little database on top, you may be up to 3+ seeks per hit already. Forget caching; caching helps the second user, and doesn't work on systems with a "long tail" of zillions of seldom-accessed queries, like search.
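
For concreteness, here's the budget arithmetic as a quick script, using the same 9ms and 50ms figures as above:

```python
SEEK_MS = 9      # one disk head seek
BUDGET_MS = 50   # the "feels instant" threshold

print(BUDGET_MS // SEEK_MS)  # 5 -- seeks one request can afford before it feels slow

# Upper bound on hits/second for a single disk arm, by seeks per hit:
for seeks in (1, 3, 6):
    print(seeks, 1000 // (SEEK_MS * seeks))  # 111, 37, 18
```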

It doesn't help that a lot of the scheduling algorithms found in standard OS and database software were developed when memory was scarce, and so are stingy about their use of it.

The hugely scalable AIM service stores everything in memory across a distributed cluster, with the relational database stuck off to the side, relegated to making backups of what's live in memory. Another example is Google itself; the full index is stored in memory. Servers mmap their state when they boot; no disk is involved in user requests after everything has been paged in.
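
A minimal sketch of that boot-time trick in Python (the index.dat file and the offsets are hypothetical; madvise needs Python 3.8+ on a Unix-y OS):

```python
import mmap

# Map the on-disk index once at startup and fault every page in,
# so no user request ever waits on the disk arm.
with open("index.dat", "rb") as f:
    index = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

index.madvise(mmap.MADV_WILLNEED)           # hint: read it all in now
for off in range(0, len(index), mmap.PAGESIZE):
    index[off]                              # touch one byte per page to force it resident

# From here on, a lookup is a plain memory read:
posting = index[1024:1040]                  # made-up offset, for illustration
```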


The biggest RAM database of all...

An overlooked feature that made Google really cool in the beginning was their snippets. This is the excerpt of text that shows a few sample sentences from each web page matching your search. Google's snippets show just the parts of the web page that have your search terms in them; earlier search engines always showed the same couple of sentences from the start of the page, no matter what you had searched for.
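
The extraction itself is the easy part; in toy form it's something like this (real snippets handle multiple hits, sentence boundaries, and term highlighting):

```python
def snippet(page_text, terms, width=60):
    """Toy sketch: return the window of the page around the first place
    a search term actually hits, instead of the page's opening lines."""
    low = page_text.lower()
    hits = [low.find(t.lower()) for t in terms]
    hits = [h for h in hits if h != -1]
    if not hits:
        return page_text[:2 * width]        # no hit: fall back to the page top
    start = max(0, min(hits) - width)
    return "..." + page_text[start:min(hits) + width] + "..."

print(snippet("Call me Ishmael. Some years ago, never mind how long precisely, "
              "having little or no money in my purse...", ["money"]))
```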

Consider the insane cost of implementing this simple feature. Google has to keep a copy of every web page on the Internet on their servers in order to show you the piece of the page where your search terms hit. Everything is served from RAM; disk is touched only at boot. And they have multiple separate search clusters at their colocation facilities. This means that Google is currently storing multiple copies of the entire web in RAM. My napkin is hard to read with all these zeroes on it, but that's a lot of memory. Talk about barrier to entry.
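
Here's one reading of my napkin, with the zeroes spelled out; every input is an assumption (the page count is roughly what Google advertised on its home page in 2004, the rest are guesses):

```python
PAGES = 4.3e9          # assumption: roughly Google's advertised 2004 index size
BYTES_PER_PAGE = 10e3  # assumption: ~10 KB of stored text per page
CLUSTERS = 3           # assumption: a few replicated search clusters

tb_per_copy = PAGES * BYTES_PER_PAGE / 1e12
print(tb_per_copy, tb_per_copy * CLUSTERS)  # ~43 TB per copy, ~130 TB in RAM total
```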
Posted by skrenta at February 2, 2004 10:48 PM
Comments

This also means that the snippet service will be costing Google more and more to maintain as the web grows exponentially...
Posted by: Ben at February 25, 2004 10:23 AM

The point about the perception of latency vs. disk seek time is an interesting one. But I think your maths is a little off. ISTM you could support over 100 hits per second if you take a minimum of 9ms per hit. However, more than five hits within a 50ms period will put you outside the "window", which I think is what you were getting at.
Posted by: Mark Allerton at February 25, 2004 11:58 AM

Here's what someone who deals with big databases for my employer says:

"This was the holy grail in database administration a few years ago: 'keep it all in ram!'. While the performance of such systems is astounding, the paradigm is very difficult to implement on anything besides a data mine. (Search engines are good examples of data mines.)

"The reason? Data mines aren't updated in realtime -- they use a point-in-time snapshot of their data. Because it is just a copy of the 'real' db, there is no risk of data loss if the system goes down. The drawback to data mines is that data is not being updated in realtime.

"While I'm at it...

"The other end of the database spectrum is ruled by transactional systems. Electronic banking is a classic example -- lots of data, all interdependent, and all changing in real-time. Our systems are transactional. We keep our customers' data current to the millisecond, with a deliberate focus on accuracy and security. In this regard, we kick google's bu**, because there is nothing real-time about google! ;)"

"I hope this info is useful!"
