Interview with Jim Gray, Manager, Microsoft Bay Area Research Center

Jim Gray won the 1998 Turing Award "for seminal contributions to database and transaction processing research." More recently, he has been working as a Distinguished Engineer in Microsoft's Scalable Servers Research Group, based in San Francisco, on the creation of terabyte-sized distributed online databases. Talking with Glyn Moody, Gray reflects on his career, the power of Web services, and the arrival of sentient machines later this century.

Q. How did you first get involved in working with databases?

A. My primary interest had been in theoretical computer science - I got a PhD in computer science at Berkeley. I went to work at IBM and I was working on a variety of things, but they all more or less revolved around operating systems and programming languages and applications.

My boss came to me one day, gave me some career advice, and said, "You know, IBM has more operating systems than they need" - this was about 1971 - "and we have more programming languages than we need. We also have more operating systems and programming language researchers than we need. If you have an interest in making a contribution, as opposed to just polishing a round ball, you would be well advised to work on networking or databases, because those are areas where we are completely clueless."

Q. How did your work on transaction processing come out of that?

A. We were a group working in the general area of database systems. And so we were sitting around in a room and the question came up, who is going to do what? Since I was the operating systems guy in the group, I got to do all the operating systems stuff. It so happens that that includes things like system startup and shutdown, security and authorisation, and the basic issues of concurrency control and execution, and so on. When you get involved in startup and shutdown, you also get involved in restart - what happens when things fail. So I fell heir to the whole business of cleaning up the mess after programs crashed and bringing the world back to the state it was in before the crash.
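The recovery problem Gray describes - bringing the world back to the state it was in before the crash - is the core idea behind write-ahead logging. The following is a minimal illustrative sketch of that idea, not the actual System R mechanism; the class and names here are invented for the example.

```python
# Minimal write-ahead-log sketch: record the old value BEFORE applying a
# change, so that after a crash any uncommitted changes can be undone.

class MiniWAL:
    def __init__(self):
        self.data = {}         # the "database"
        self.log = []          # append-only log: (txn, key, old_value)
        self.committed = set()

    def write(self, txn, key, value):
        # Write-ahead rule: log the old value before mutating the database.
        self.log.append((txn, key, self.data.get(key)))
        self.data[key] = value

    def commit(self, txn):
        self.committed.add(txn)

    def recover(self):
        # Walk the log backwards, undoing writes from uncommitted transactions.
        for txn, key, old in reversed(self.log):
            if txn not in self.committed:
                if old is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old

db = MiniWAL()
db.write("t1", "x", 1)
db.commit("t1")
db.write("t2", "x", 99)   # t2 never commits - pretend the system crashed here
db.recover()              # x is restored to 1
```

Real recovery managers also handle redo of committed work and log the log to stable storage; this sketch only shows the undo half of the contract.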

Q. Did fault tolerance arise in a similarly organic way?

A. Indeed. I was a great fan of something called defensive programming, and still am. Defensive programming says whenever anybody calls you, you check all your parameters and whenever you do anything with some information that you have you check first that the information is correct. And when you're about to return your results, you give it a sniff test to make sure the results look OK.
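Defensive programming as Gray describes it - check parameters on entry, verify what you are told, sniff-test results before returning them - might look like this. The function and its checks are an illustrative sketch, not code from the interview.

```python
def average(values):
    # Check the parameters whenever anybody calls you.
    if not isinstance(values, list) or not values:
        raise ValueError("average() requires a non-empty list")
    if not all(isinstance(v, (int, float)) for v in values):
        raise TypeError("average() requires numeric values")

    result = sum(values) / len(values)

    # "Sniff test" the result before returning it: a mean must lie
    # between the smallest and largest input.
    assert min(values) <= result <= max(values), "result outside input range"
    return result
```

Called with bad input, this code is the one that fails - which is exactly the effect Gray describes next: the defensive module catches everyone else's mistakes and so looks like the least reliable code in the system.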

Unfortunately, when you put a lot of that stuff in your code, your code is the least reliable code in the system. Because whenever anybody calls you with bad parameters, yours is the code that fails. So if the system crashed, it probably crashed in my code, because I was basically the only person who was checking what was going on. And so people would complain to me a lot about how much the system was crashing and how bad my code was. I probably did have the code with the most bugs, but I worked fairly hard to make the system restart automatic and fast and tried to put in a system that would tolerate faults by quickly recovering and resetting the state to a good state.

Q. What about your more recent interest in scalable servers - how did that follow from the work on fault tolerance?

A. Subsequently I went and worked at Tandem Computers. I was very interested in fault tolerance, and they were at the time building computers that were called non-stop systems. The basic idea of Tandem was suggested by the name - as in a tandem bicycle: you had multiple computers working on the task, and if one of the computers or one of the discs got in trouble, the remaining computers could continue to deliver service. There are many aspects of the Tandem architecture, or this modular architecture. We see it today in Web farms. The usual Web farm at AOL or MSN or Google or Inktomi or Yahoo is literally thousands of servers. So the desire to build very, very large computers out of many small ones is an outgrowth of my experiences at Tandem. These are now called blade servers and Beowulf clusters, but the more generic term is just scalable computing.

Q. How did the TerraServer project come about?

A. I left Tandem and went to work at Digital Equipment Corporation, and Digital about 1994 went out of the software business, so it was time for me to leave. I took a leave of absence for a year, and then I came to work at Microsoft. I've been at Microsoft since about '95. We were chartered as a research group to work on scalable systems.

We actually had a pretty simple problem on our hands: we needed to find an application that would be interesting to millions of people, that could be put on the Internet, that would show off our technologies, that would not be offensive to anybody, and that involved very large databases. Interesting to everyone and offensive to no one is a really big challenge. We pretty quickly eliminated porn: although it's interesting to a lot of people, it offends an equal number. We also came to the conclusion that any traditional database application wasn't going to give us a large database.

It was clear we needed to have an image database. We had some relationship to people who were doing spatial databases, so we decided to try and take the US Geological Survey (USGS) image of the United States, which is a 1 metre resolution photograph of the United States, and put that online, and work with the Russian space agency to put their assets online.

So we started building the TerraServer in about '96. In late '96 we had a demonstration of it, and in mid '97 it came online. It's been online ever since. When it first came online it was only about 600 gigabytes, which now seems pretty modest - at the time it seemed huge to us. It's now in the 5 terabyte range of online data, and it is in fact derived from about 20 terabytes of raw data.

Historically, we got data from the USGS in the form of tapes. That was, in 1997, 1998 or even in the year 2000, the most economical way of moving data. But the USGS has learned that the best and least expensive way of moving data is to write them to disc rather than to tape, and then ship the discs. The USGS is a fan of FireWire discs, so they deliver to us a fairly large box of FireWire discs, in round numbers a terabyte or two of data, which we then make a copy of and forward to the next recipient of the data.

The alternative scheme is rather than sending around discs, we send around entire computers. The virtue of sending an entire computer is you just plug it into the network - you don't have to go through any discussions of what's the file system, and what's the format and all that sort of stuff. So I think in the future people will actually find themselves archiving and exchanging computers rather than raw disc drives.
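The economics behind shipping discs or whole computers can be made concrete with a little arithmetic. All the numbers below are illustrative assumptions for the era being described, not figures from the interview.

```python
# Effective bandwidth of shipping a terabyte overnight versus sending it
# over a leased line. Assumed numbers, chosen only to show the shape of
# the comparison.

terabyte_bits = 1e12 * 8      # 1 TB expressed in bits
shipping_hours = 24           # overnight courier
link_mbps = 1.5               # a T1-class leased line, circa 2000s

# Bits delivered per second by the courier, in Mbit/s.
ship_mbps = terabyte_bits / (shipping_hours * 3600) / 1e6

# Time to push the same terabyte through the leased line, in days.
link_days = terabyte_bits / (link_mbps * 1e6) / 86400

print(f"Shipping 1 TB overnight ~ {ship_mbps:.0f} Mbit/s effective")
print(f"1 TB over a {link_mbps} Mbit/s link takes ~ {link_days:.0f} days")
```

Under these assumptions the courier delivers roughly 90 Mbit/s of effective bandwidth, while the leased line needs about two months for the same terabyte - which is why tapes, discs, and eventually whole computers kept winning.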

Q. Isn't that rather ironic in the age of the Internet?

A. It is, and it has to do with the cost of sending data through the PTTs. I believe that if price reflected cost, this would not be required, but, at least in America, the communication links are priced at approximately the same rate as voice-grade links. I think that the telcos are terrified that if they lowered their prices and made bandwidth essentially free, every neighbourhood would set up a PBX, would have one link, and wouldn't pay any subscription services. There are many, many fibre optic cables running right in front of my building that are just completely dark. The cable's there, the cost of delivering it to me would be close to zero, but for tariff reasons I can't get access to that stuff - the quoted tariff is a thousand dollars per megabit per second per month. It is throttling many of our design decisions. It does cause us to act in strange ways.

Q. How has your work developed beyond TerraServer?

A. The project that I've been involved with since about '98 is helping the astronomers get their data online and get it all integrated as one large Internet-scale database, called the Virtual Observatory. The idea is that most astronomy data goes straight into a computer - no person looks through the lens of the Hubble Space Telescope; it all comes down to Earth and goes onto computer discs. That's true even of the terrestrial telescopes, which are generating gigabytes of data a day. People can't look at gigabytes per day. Only computers look at the data, and people look at the output of the computer programs.

The premise is that we can cross-correlate that data and have a better telescope than any other telescope in the world. It would cover all of the known data, so it would have this temporal dimension of going back to the beginning of recorded history. It would be all spectral bands - radio, infrared, ultraviolet - and it would be from all parts of the world. It would be a much more powerful telescope than any individual one. And, it can cross-correlate the data with the scientific literature.

Q. How big are the datasets?

A. The SkyServer is one part of the Virtual Observatory. In round numbers it's about 12 terabytes of released data at this point. We've been evolving the design since 2000, and we're adding features, but it is not our main thrust any more; the real excitement now is to take several archives and glue them together.

Q. How will you be doing that?

A. I'm very enthusiastic about using Web services to do it. We built a prototype called SkyQuery. If you go to Skyquery.net, that is a portal that knows about many different archives. You can go there and pose a query, and the portal decomposes your query, sends it to the relevant archives, and brings back the information synthesised from those archives. The portal uses SOAP and WSDL and XML and datasets to talk among the Sky nodes, as they're called. It would be impossible for us to do this without Web services. So much has been done for us in terms of having a standard representation, being able to move stuff in and out of a database, being able to convert it to XML in a heartbeat, having the Internet as a substructure.
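The fan-out-and-merge shape of such a portal can be sketched in a few lines. The real SkyQuery speaks SOAP/WSDL over HTTP; here each archive node is just a local function, and the node names and result fields are hypothetical stand-ins invented for the example.

```python
# Sketch of a portal that fans a query out to several archive "nodes"
# and merges the results. In the real system each node call would be a
# SOAP request described by the node's WSDL; here a node is a function.

def sdss_node(query):
    # Hypothetical stand-in for an optical-survey archive service.
    return [{"ra": 180.0, "dec": 1.2, "source": "SDSS"}]

def first_node(query):
    # Hypothetical stand-in for a radio-survey archive service.
    return [{"ra": 180.0, "dec": 1.2, "source": "FIRST"}]

def portal(query, nodes):
    # Decompose/forward step reduced to its essence: send the query to
    # every relevant node and concatenate what comes back.
    results = []
    for node in nodes:
        results.extend(node(query))
    return results

hits = portal("objects near ra=180, dec=1.2", [sdss_node, first_node])
```

A real portal would also push the cross-match into the nodes (sending each one the partial results of the others) rather than merging naively at the centre, but the standard envelope formats are what make even this simple federation possible.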

Q. Google has changed the way most of us use the Internet. Looking forward, how do you see even larger databases affecting people's use of the Internet?

A. Well, so, as far as I know - and this is just a conjecture - I believe that Google is in the 1 to 5 petabyte range at this point. The interesting thing about Google is that it indexes the surface Web; it doesn't index the deep Web. The astronomy stuff I've been talking about is part of the deep Web. Google has focused a lot on text search, but they are generalising: they are doing things like Orkut, which is an index of friends, they're doing Froogle, and they are moving into other spaces. But for somebody like the astronomers, there's not a commercial model that would cause the Google guys to want to do all of the astronomy data.

So I suspect that we will have Google indexing the surface Web and text, and I think other people will do similar things. If you want to dig deep into Amazon you'll probably go to Amazon; if you want to dig deep into IBM or Microsoft, you will go to IBM or Microsoft; and if you want to dig deep into astronomy you will go to an astronomy portal. It will be a two or more level architecture.

Q. What about the knock-on effect of these technologies on everyday life?

A. The thing I've been talking about and working on is the enterprise-level scalable servers, big database, etc., for a community like the astronomers. My colleagues are working on a project called MyLifeBits, and that project is trying to record all of their personal experiences, what they see, what they hear. It's very much inspired by the work of Vannevar Bush on memex.

A challenge that we all face is that the information avalanche has arrived and we are buried under a mountain of email and documents. I find myself spending an increasing amount of time looking for things. I don't think it's just that I'm getting older and my memory is failing. I think it's actually that there's more stuff and much of what I do has a lot of context associated with it, so I need to go back and refer to things from a long time ago.

What we're trying to do is to augment people's intelligence and make it easier for them to find things, and to take the information that they have and organise it and summarise it and make it more accessible. That is a huge focus at Microsoft. Our strategic intent - the phrase you say when people ask, "What does your company do?" - is Information at your Fingertips. The astronomy stuff I'm doing is Information at your Fingertips for the astronomy community; the MyLifeBits stuff is Information at your Fingertips for all of us.

Q. Where do you think we are ultimately heading with computers?

A. I believe that Alan Turing was right and that eventually machines will be sentient. And I think that's probably going to happen in this century. There's much concern that that might work out badly; I actually am optimistic about it.