Of all the subjects of this book, Donald Knuth perhaps least needs an introduction. For the past four decades he has been at work on his multivolume masterwork The Art of Computer Programming, the bible of fundamental algorithms and data structures, which American Scientist included on its list of the top 12 physicalsciences monographs of the century, in the company of works by Russell and Whitehead, Einstein, Dirac, Feynman, and von Neumann. He popularized the use of asymptotic (a.k.a. Big-O) notation in analyzing algorithms, invented LR parsing, and defended goto statements from Dijkstra’s criticism.
But he is not simply a theorist. After finishing Volume III of The Art of Computer Programming in 1976, Knuth took what was supposed to be a year off to write the typesetting software TeX and METAFONT so he could see his books typeset to his own satisfaction. Ten years later he was done, having along the way invented a new style of programming, “literate programming,” and an algorithm for breaking paragraphs of text into lines for typesetting that is still pretty much the state of the art.
His numerous awards have included the first Association for Computing Machinery Grace Murray Hopper Award (1971), the Turing Award (1974), and the National Medal of Science (1979). In 1990 he stopped using email, explaining that his job was not “to be on top of things” but “to be on the bottom of things” deeply understanding and explaining large areas of computer science so he could explain them in his books.
In this interview we talked about Knuth’s enthusiasm for literate programming, his ambivalence about black boxes, and what he sees as a regrettable “overemphasis on reusable software.”
Seibel: When did you learn to program?
Knuth: I was a freshman at Case Institute of Technology. This was the fall of 1956 and during that quarter or semester they got a computer.
Seibel: This was the IBM 650?
Knuth: It was the 650, yeah. That was the first computer that they made more than a hundred of. I think they had thousands of them but maybe not ten thousand. But it was the first mass-produced computer, so even Case got one.
I was employed in the statistics lab sorting cards. I would tabulate data for the statisticians and that helped supplement my scholarship. There was this room on the first floor with a window in it and you could see this machine behind the window with lights flashing. It looked pretty fascinating.
One afternoon a guy from the lab went to the blackboard and was explaining to the three of us freshmen what this machine did. I found a manual for the machine and they had example ten-line programs. It seemed to me they were kind of stupid—it looked like there was a way to improve even those little programs.
And it turned out it was possible to go at night and touch the machine. This was unusual. I think Dartmouth and Case were the only universities that let undergraduates touch machines. Other places they had professionals and you submitted decks of cards and got your answers the next day. But at Case it was hands-on. They just said, “Oh, yeah, watch out for this; you don’t want to do that; it’ll mess up the machine,” so we had a really nice chance to play with it.
Anyway, I got to see if one of my little changes to the program would also work, and it did. So I said, “My goodness, this is pretty amazing. I’m only a freshman and I can do better than what was in this book—this might be something I have talent for.” Well, it turned out I did have a talent for it but not in the way I thought, because almost anybody in the world could have done better than that program in that particular manual.
The machine was a decimal machine, so it wasn’t quite as strange as if I had to learn binary arithmetic, although I played with binary arithmetic a little bit when I was in high school. But the fact that it was decimal made it somehow more human or something—comfortable. I can still remember the machine language—sixty-five is reset-add-lower—it helps me making up passwords and things now.
Seibel: Uh-oh; you just revealed your secret there.
Knuth: Yeah, right. Then I decided I would write a little program to calculate the prime factors of a number. It was about 100 lines long. I would come at night when nobody else was using the machine, and debug it. And I found more than 100 bugs in my 100-line program. But 2 weeks later I had a program that would find prime factors of any 10-digit number that you dialed into the console switches.
That was how I learned programming—basically taking one program that I made up myself and sitting at a machine over a period of some weeks, and kept getting it to work a little better and a little better.
My second program was converting between binary and decimal. But my third program was a program to play tic-tac-toe and that was what really made me a programmer.
I had to use data structures for that. I made three versions of tic-tac-toe, one of which was self-learning so that it would start out knowing nothing about the game and then it would remember every time it lost a game that the moves it made were suspicious and the moves that the opponent made were good, and it would upgrade the quality of certain positions and downgrade the quality of other positions, and then after you played 400 games it would do a fairly decent job of tic-tac-toe.
Seibel: It seems a lot of the people I’ve talked to had direct access to a machine when they were starting out. Yet Dijkstra has a paper I’m sure you’re familiar with, where he basically says we shouldn’t let computer science students touch a machine for the first few years of their training; they should spend all their time manipulating symbols.
Knuth: But that’s not the way he learned either. He said a lot of really great things and inspirational things, but he’s not always right. Neither am I, but my take on it is this: Take a scientist in any field. The scientist gets older and says, “Oh, yes, some of the things that I’ve been doing have a really great payoff and other things, I’m not using anymore. I’m not going to have my students waste time on the stuff that doesn’t make giant steps. I’m not going to talk about low-level stuff at all. These theoretical concepts are really so powerful—that’s the whole story. Forget about how I got to this point.”
I think that’s a fundamental error made by scientists in every field. They don’t realize that when you’re learning something you’ve got to see something at all levels. You’ve got to see the floor before you build the ceiling. That all goes into the brain and gets shoved down to the point where the older people forget that they needed it.
Seibel: I’ve asked the people I’ve talked to for this book about how much they’ve read The Art of Computer Programming. Most have used it as a reference but a few said they’ve read it cover to cover. Should every programmer be able to read your books? It’s pretty mathematically intense stuff.
Knuth: I sometimes wonder if I can read them. I’m trying to organize a lot of wisdom that surrounds the topic that I’m discussing and I gather it from all these places where it appeared in parts and put it into some unity that can be carried forward, and gets the history right, and corrects bugs and obscurities in the original sources.
Like in the parts that I’m writing now, I’m starting out with stuff that’s in math journals that is written in jargon that I wouldn’t expect very many programmers to ever learn, and I’m trying to dejargonize it to the point where I can at least understand it. I try to give the key ideas and I try to simplify them the best I can, but then what happens is every five pages of my book is somebody’s career.
In other words, there’s still so much more beyond any five pages of my book that you can make a lifetime’s worth of study, because there’s just that much in computer science. Computer science doesn’t all boil down to a bunch of simple things. If it turned out that computer science was very simple, that all you needed to do was find the right 50 things and then learn them really well, then I would say, “OK, everybody in the world should know those 50 things and know them thoroughly.”
But it isn’t that way. I’ve got thousands of pages and exercises, and I write it down and put it in the book so that I don’t have to have it all in my head. I have to come back to it and learn it again. And I have the answers to the exercises because I know that ten years from now I won’t remember how to do the darn thing and it will take me a long time to reconstruct it. So I give myself at least the clues to how to reconstruct stuff.
I’m constantly torn between saying, “Well, this is too complicated; you’d better not talk about it at all,” and the other feeling that people are saying, “But all you’ve put in your book is just so trivial; there’s nothing good.” I can argue at any particular time that I should cut everything out or that I have way too little.
What it really boils down to is, all of the really cool things that can be explained in a half a page have to be in my book, on some half a page. And all the things I’ve seen that are just too good to be left out. So I find out that the section I just wrote about binary decision diagrams, it turned out that I had more than 260 exercises because there just was more and more stuff that seemed to me there would be more than a trivial audience for. But I’m not saying everybody is the audience for all 260 of these things. Still, I know there are a large number, for each of these, that are going to appreciate it.
I consider it amazing that some people do go cover to cover in my books. In most cases I know that people are going to pick and choose the parts that they like. But they know that if they dig further then they’ll get something that has only one subset of jargon describing it instead of all different kinds of notations and terminology—if I didn’t write the books it would be much harder for people to find stuff out. That’s what turns me on.
Also, I try to explore the territory in a way that is most relevant to a practical programmer rather than the most academic cachet for getting something published that’s theoretically interesting but wouldn’t really be used in a real program.
The things that I leave out are where somebody has a data structure that saves a factor of log log n only when n gets bigger than two to the million. And there are lots and lots of papers that are doing that. They’re playing games where in principal, if computers were godlike, then we could have algorithms that are faster. But even an algorithm like a balanced tree or AVL tree, I don’t use in my own programs unless I know that it’s going to be a really big tree.
Seibel: What do you use?
Knuth: I use an ordinary binary search tree with a little trick for randomizing it that I just put in.
Seibel: Speaking of practical work, in the middle of working on The Art of Computer Programming you took what turned into a ten-year break to write your typesetting system TeX. I understand you wrote the first version of TeX completely away from the computer.
Knuth: When I wrote TeX originally in 1977 and ’78, of course I didn’t have literate programming but I did have structured programming. I wrote it in a big notebook in longhand, in pencil.
Six months later, after I had gone through the whole project, I started typing into the computer. And did the debugging in March of ’78 while I had started writing the program in October of ’77. The code for that is in the Stanford archives—it’s all in pencil—and of course I would come back and change a subroutine as I learned what it should be.
This was a first-generation system, so lots of different architectures were possible and had to be discarded until I’d lived with it for a while and knew what was there. And it was a chicken-and-egg problem—you couldn’t typeset until you had fonts but then you couldn’t have fonts until you could typeset.
But structured programming gave me the idea of invariants and knowing how to make black boxes that I could understand. So I had the confidence that the code would work when I finally would debug it. I felt that I would be saving a lot of time if I waited six months before testing anything. I had enough confidence that the code was approximately right.
Seibel: And the time savings would be because you wouldn’t spend time building scaffolding and stubs to test incomplete code?
Knuth: Right.
Seibel: Other than the fact that they’re so nicely typeset now, do you think your books would be very different if you hadn’t spent ten years writing TeX?
Knuth: Good question. The experience of using structured programming in a not purely academic way—in other words, I’m not just thinking about invariants in toy programs, but in real programs—probably had a fair influence on how I’m describing algorithms in the new stuff I write now. Or if it hasn’t, it should.
I wouldn’t have known about caching and trends in the way computers change and things like that if I was just going on in the same mold of going from the literature to writing my books. While I was writing Volumes I, II, and III, I wasn’t writing programs like TeX that that would be more typical of a large programming practice. They would be toy programs. So it gave me more of a perspective on numbers and quantity.
It’s really amazing, though, when you’re writing a book, the influences that will make you choose different words. It’s mysterious how it gets in there. That’s the most important influence of writing TeX—that it gave me a different kind of a mental take on things so that my sentences come out different. They’ll be a little bit less hedgy. There is a tone that comes out in the whole thing, of confidence or something.
Seibel: Do you think you were a dramatically better programmer when you finished TeX than when you started?
Knuth: Well, yes, because of literate programming.
Seibel: So you had better tools, but had you actually improved your skills?
Knuth: I learned a terrific amount while I was doing it. One of the things I learned was how much software occupies the brain. It was a much more difficult task than I expected. I couldn’t teach classes full-time and write software full-time. I could teach classes full-time and write a book full-time but software required so much attention to detail. It filled that much of my brain to the exclusion of other stuff. So it gave me a special admiration for people who do large software projects—I would never have guessed it without having been faced with that myself.
Seibel: So programming is harder than writing books, and somewhere I read something where you said that it’s impossible to estimate how long it will take to write books. Does that then mean that it’s even harder to estimate how long programming will take?
Knuth: Yeah, right. That’s a very good corollary.
This year I’ve written probably three major programs which are pushing one hundred pages of code—literate code, with 8.5x11 pages. Two of them are related to each other, so it’s more like two and a half major programs. And about 150 small programs. Probably more than I did the previous year. So I programmed galore this year on small programs but also, a couple of them were things that took a month or more to do.
Seibel: And did you expect them to take a month?
Knuth: Well, I expected one of them to take a month. I knew that it wasn’t going to be easy but I didn’t know how much richness there was going to be, so I added more features as I got to using them. I think it is always going to be true that a person who manages programmers should not expect it to be predictable.
Seibel: In addition to writing The Art of Computer Programming and TeX, you’re also the inventor of—and advocate for—literate programming, a way of writing code so it can be more easily read by people. And you wrote WEB and CWEB, tools that implement literate programming languages based on Pascal and C.
Knuth: So you say advocate—it’s sort of my shtick to say that this is good. But I also am the kind of guy that’s uncomfortable preaching or trying to convert someone. I think programming is a lot like religion; people have their beliefs. Some people like to force their beliefs on others. Others say, you know, here’s what I think; I can’t prove that this is the best thing, but it sure works for me. Then you hope that other people will try it and come to the same conclusion. But I don’t like going out and telling people what they ought to believe.
Seibel: Well, maybe you can explain why you like it so much and how it differs from illiterate programming.
Knuth: The first rule of writing is to understand your audience—the better you know your reader the better you can write, of course. The second rule, for technical writing, is say everything twice in complementary ways so that the person who’s reading it has a chance to put the ideas into his or her brain in ways that reinforce each other.
So in technical writing usually there’s redundancy. Things are said both formally and informally. Or you give a definition and then you say, “Therefore, such and such is true,” which you can only understand if you’ve understood the definition.
Or you’ll say, “We define a equals the such-and-such to be the set of all leading elements.” So this informal term, the set of all leading elements, is complemented by the mathematical description of how we constructed the set a.
So literate programming is based on this idea that the best way to communicate is to say things both informally and formally that are related. And it just provides a natural framework for switching between the natural language, English, and the formal language, C or Lisp or whatever is your formal language, and putting this together. So that, to me, has to be a win for documentation.
Now, the other thing is, as I write the program, I don’t have to present it in the form that the compiler wants to see it. I present it in the form that I think is easiest for a reader to understand.
You can write your code bottom-up and make subroutines that give you bigger and bigger things and your confidence builds because now you can do more things. Other people write top-down; they start out and say, “Well, I have this problem to solve, so first I’ll do this and then I’ll do this.”
When I write a literate program I can choose between these as I like. And almost always the way my final program comes out is in the order in which I actually thought of the things myself. So I’ll start out and I’ll say, “I have this problem to solve, so first I’m going to have to solve this and then I’m going to have to solve that.”
But then I say, “Now let’s start building some tools bottom-up.” We have the goal in mind but we build a few bottom-up tools and then we’ll go back and do a little top-down. But in what order we do this is, first I write about what I thought about the first day I had to work on this problem. And then, the next chapter would be the thing I decided to tackle next.
And I start to tackle the thing that’s most worrying to me but that I’m also ready to solve at the moment. Instead of postponing something ’til an evil day, if I’m ready to do it now, I get that out of the way. But it’s a different order—it’s neither top-down nor bottom-up. It’s psychologically, “What do I find is the thing that’s going to make me most satisfied to get done next and I’m ready to do it?” It doesn’t have too many unknowns in it. So the freedom to put the program into that human-understandable order is very important to me.
Now, why hasn’t this spread over the whole world and why isn’t everybody doing it? I’m not sure who it was who hit the nail on the head—I think it was Jon Bentley. Simplified it is like this: only two percent of the world’s population is born to be super programmers. And only two percent of the population is born to be super writers. And Knuth is expecting everybody to be both.
I don’t think we’re going to increase the total number of programmers in the world to more than two percent—I mean programmers who really resonate with the machine and that’s their bread and butter that they’ve been born to do. But now that people are blogging, I’ve seen a great rise in the average ability of ordinary everybody to express themselves. So the second part of that that argument isn’t so strong anymore.
I tried it only a limited amount at Stanford. I worked with a group of undergraduates. They would write their programs on a summer project and I introduced them to this idea of literate programming. There were only seven of them working with me that summer. And six of them loved it to the point that they’re still using it today. The seventh one hated it. His idea of writing a literate program was to take an ordinary program and put a wrapper around it and say, “This is module one,” and so on. Of course, Stanford admits people who are good writers, so this isn’t a random sample.
Seibel: Have you ever written a literate program where you dramatically reorganized it into a different order for explication? It’s hard for me to imagine that stream of consciousness is always the best organizing principle.
Knuth: It just hardly ever happened. I can’t remember going back and really changing the order of the chapters. It just always seemed like there was almost only one choice what to attack next. I can’t explain it exactly, but it just seemed that one would segue into the next.
Seibel: Do you write literate code for programs that no one but you will ever see?
Knuth: Exactly. This is what literate programming is so great for—I can talk to myself. I can read my program a year later and know exactly what I was thinking.
Seibel: Does that always work?
Knuth: Well, it’s often a lot harder to understand a year later than before. But compare that to what I had without it. It doesn’t make the complicated thing trivial, but it’s just way better than any other method I know.
I just printed out a small subset of a large collections of subroutines that are all written in C that are pretty much state of the art for manipulating BDDs—Boolean decision diagrams. This is the opposite of CWEB—this is what almost everyone in the world does when they’re developing packages now. They do it by means of fairly disciplined commenting conventions that are understood by a large community. And the code is not real difficult to understand because it’s separated out into a logical form and you’ve got these header files and I can see the data structures and there are comments on each part of the data structure explaining what it’s doing. So this is another style of programming that works.
Yet I can’t help but think that it’s considerably below what can be achieved with literate programming. Because of lots of intangible things that I can’t prove. The most convincing to me was that I believe that I’ve written some programs that I could not have written at all without literate programming. For example, the MMIX simulator would have been such an intellectual challenge that if I had to do it in the conventional style, I don’t think I ever would have finished it. Just separating it out into subroutines wasn’t enough to simplify it to make it intellectually manageable.
This is a simulator that takes an extremely general specification of a computer: what functional units it has, how many instructions can execute at a time, the caching strategies, how the bus works, how the output works, how are you doing branch prediction, and how you maintain the pipeline.
So you can imagine a computer that would have six division units and a pipeline of certain stages and this will simulate it. Would you be able to calculate prime numbers faster if you had this machine? You don’t have to build the machine.
I’m not saying it’s impossible to take that program and put it into subroutines, but I would never have finished it that way. And also it’s only 170 pages and it is possible to understand it—I’m not the only person in the world who understands it.
Seibel: I read your literate reimplementation of the game Adventure and noticed that it seemed somewhat monolithic. It seemed like the literate style, because it lets you interpolate things, draws you away from breaking things down into subroutines.
Knuth: This is true. Instead of calling a subroutine, it’s like inline subroutines all the way through. The idea of a subroutine is there but it’s not in your final program as a subroutine. It’s more like a macro in some ways. But the thing is, at the conceptual level, the subroutine-calling mechanism isn’t necessary if you have some other way to do it in the language you’re using.
In Adventure I don’t think I actually removed subroutines from Don Woods’s Fortran program—I was taking his Fortran program and putting it into English and C. But it’s true that when you look at my code for TeX, the number of subroutines that you’re in, on the subroutine stack, might be 4 or 5 whereas a program written by somebody else, without literate programming, it might be 50 or 100.
What I try to work on is units that correspond to the way I have it in my head, rather than the way a logician might want it to be in some formal system. My programs are supposed to match my intuition more than somebody else’s rigid framework.
Of course, eventually it has to go into a computer, which is rigid, which has its precise rules of understanding. But to me the idea of the right kind of a program is something that matches the way I think as closely as possible rather than something that matches the machine as closely as possible. I have to find the way to do the conversion, but my source text tries to stay closer to my brain than to the machine.
I also believe that literate programming is a powerful style of documentation that can be used to communicate across groups of people. I had many people who understood the code for TeX well enough to construct scenarios where it would fail. I think more people, at one point in their life, understood that program than any other program of its size.
Seibel: Did you ever find, even with that, that you still had people who had read the thing and then sent you questions that made you think, “Wow, they really missed something here”?
Knuth: Of course. That always happens, but it’s a mistake in my exposition. Let me give you a simple example. In The Art of Computer Programming I’m talking about the early history of bit-oriented operations and I have the following sentence: “The Manchester Mark 1 computer, built about the same time as the EDSAC, included not only bitwise and but also or and exclusive or. When Alan Turing wrote its first programming manual in 1950 he remarked that bitwise not can be obtained by using exclusive or in combination with a row of ones.”
Now, in my sentence I’m saying, “Alan Turing wrote its first programming manual,” meaning the first programming manual for the Manchester Mark 1. But four or five readers independently said, I must have meant “his”: “When Alan Turing wrote his first programming manual in 1950”.
Well, actually, he had written other programming manuals, so what I said was correct but it was misinterpreted by people. So now I say, “When Alan Turing wrote the first programming manual for the Mark I, in 1950….”
Mathematical things: similarly I’ll get people who miss it. So then I’ll say, you know, I actually said it correctly, but I know I still have to change it and make it better.
Seibel: When you publish a literate program, it’s the final form of the program, typically. And you are often credited with saying, “Premature optimization is the root of all evil.” But by the time you get to the final form it’s not premature—you may have optimized some parts to be very clever. But doesn’t that make it hard to read?
Knuth: No. A good literate program will show its history. A good literate program will say, “Here’s the obvious way to do it and then why we don’t follow that road?”
When you put subtle stuff in your program, literate programming shines because you don’t just have the code that does it but also your documentation. You say, “This is a dirty trick here, it works because—” and then you state very carefully the reasons and the assumptions.
I’ll use dirty tricks for two reasons. One is, if it’s really going to give me a performance improvement and my application is one that the performance improvement is going to be appreciated. Or sometimes I’ll say, “This is tricky; I couldn’t resist being tricky today because it’s so cute.” So just for pure pleasure. In any case, I document it; I don’t just put it in there.
Seibel: Would that be more in the prose?
Knuth: It’s in the prose part. I don’t show the code that I’ve taken out. I could.
Seibel: Is there any facility in CWEB for actually including code that isn’t part of the application, so rather than just document it in the prose, you can say, “Here’s a really dirt-simple version of this function.”
Knuth: You just have the code but you never use it. It comes out in the documentation saying this code is never used.
Seibel: So it would just be a fragment that you would never reference?
Knuth: Yeah. Also, I have code in there that I can then invoke from the debugger. I can say, “Call such-and-such with such-and-such parameters.” The subroutine is never actually called in the program itself, but it’s there in the documentation. So I can stop a program in the middle and I can call this subroutine and it’ll take a look and see how it’s doing, see how big things have gotten.
Seibel: So by the same token you could write, “Section one—here’s a naive implementation of this algorithm; section two—here’s a slightly souped-up version of section one; and section three, here’s the one we actually use which you would never understand if you hadn’t read the first two sections.”
Knuth: Exactly. I have some programs on the Web that solve the 15 puzzle. And I go through three different versions. And I say, “Read version one or you’ll never understand version two. And read version two or you’ll never understand version three.”
I write a whole variety of different kinds of programs. Sometimes I’ll write a program where I couldn’t care less about efficiency—I just want to get the answer. I’ll use brute force, something that I’m guaranteed I won’t have to think—there’ll be no subtlety at all so I won’t be outsmarting myself. There I’m not doing any premature optimization.
Then I can change that into something else and see if I get something that agrees with my brute-force way. Then I can scale up the program and go to larger cases. Most programs stop at that stage because you’re not going to execute the code a trillion times. When I’m doing an illustration for The Art of Computer Programming I may change that illustration several times and the people who translate my book might have to redo the program, but it doesn’t matter that I drew the illustration by a very slow method because I’ve only got to generate that file once and then it goes off to the publisher and gets printed in a book.
But right now I’m working on combinatorial algorithms, which are, by definition, humongous-size problems. So in order to have interesting examples in my book I’ve got to write programs that solve problems that readers will say, “Oh, yeah, I couldn’t have done that just by simple methods, so I need to learn something about the art of programming or it’ll take 100 years to solve this problem by the brute-force method.”
Combinatorial algorithms are fascinating because one good idea can save you ten orders of magnitude in running time. But I don’t sneer at ideas that save you twenty percent when you’re doing it a trillion times. Because if you can save a hundred nanoseconds in a loop that’s being done a trillion times, I think you’re saving a day. If the code is going to be used a lot it can really pay off so you’ve got to go to subtle tricks that aren’t easy to understand.
About a year ago I saw a review in Computing Reviews—the guy was reviewing a book; the title was Programming Tricks or something like this. And the thrust of the review was, “If I ever caught any of the programmers working for me using any of these tricks, I would fire them.” And so naturally I went out and looked at the book because I thought, “This is a book I want to see and learn from. Unfortunately, the tricks weren’t actually that good.”
Seibel: Were they really firing offenses?
Knuth: They were very weak, actually. It wasn’t presented systematically and everything, but I thought they were pretty obvious. It was a different culture entirely. But the guy who said he was going to fire people, he wants programming to be something where everything is done in an inefficient way because it’s supposed to fit into his idea of orderliness. He doesn’t care if the program is good or not—as far as its speed and performance—he cares about that it satisfies other criteria, like any bloke can be able to maintain it. Well, people have lots of other funny ideas.
People have this strange idea that we want to write our programs as worlds unto themselves so that everybody else can just set up a few parameters and our program will do it for them. So there’ll be a few programmers in the world who write the libraries, and then there are people who write the user manuals for these libraries, and then there are people who apply these libraries and that’s it.
The problem is that coding isn’t fun if all you can do is call things out of a library, if you can’t write the library yourself. If the job of coding is just to be finding the right combination of parameters, that does fairly obvious things, then who’d want to go into that as a career?
There’s this overemphasis on reusable software where you never get to open up the box and see what’s inside the box. It’s nice to have these black boxes but, almost always, if you can look inside the box you can improve it and make it work better once you know what’s inside the box. Instead people make these closed wrappers around everything and present the closure to the programmers of the world, and the programmers of the world aren’t allowed to diddle with that. All they’re able to do is assemble the parts. And so you remember that when you call this subroutine you put x0, y0, x1, y1 but when you call this subroutine it’s x0, x1, y0, y1. You get that right, and that’s your job.
Seibel: Many people will agree with you that, yes, it’s more fun to write the code yourself. But other than the fun—
Knuth: It’s not only fun. The job of a mathematician is to make proofs but almost never, when you’re solving a mathematical problem, do you find a theorem for which the hypotheses are exactly what you need for the problem you’re solving. Almost always you’ve got something that’s sort of like the theorem that’s in the book. So what you do is you look at the proof of that theorem and you say, “Oh, here’s how I have to change that proof in order to prove the hypothesis that I really have.” So mathematical books are packed with theorems, but you never plug in exactly the theorem—you want to see that proof because it’s one time in a hundred when you’ll find just the theorem that you wanted. I think it’s exactly the same with software.
Seibel: Yet isn’t software—I think you’ve said it yourself—about the most complex thing human beings have ever made?
Knuth: It was Dijkstra, I think, first, but yeah. It’s because the complexity of putting something together is so nonuniform. Pure mathematics tends to have a few rules that are applied universally—three or four axioms that describe a system. Computer programs have many parts—step one is different from step two is different from step three. It brings together all these things and they have to fit in an intricate way.
Seibel: So given that complexity, it seems like at some point you have to take some black boxes and say, “OK, we know how this thing works, and we can use it.” If we were forced to look inside every black box, we’d never finish.
Knuth: I’m not saying black boxes are useless. I’m saying that if I’m not allowed to open ’em up—if I have to do everything with a library or something like this, I would come up with much, much worse results and much slower.
Seibel: Slower-running or slower-developing?
Knuth: Both. Well, OK, I can get programs to work in a hurry, so I can’t claim that. It’s just taking me longer because I have to search through more reference manuals and find the right parts, so it’s more of a search problem than a creative problem.
Seibel: A while back the guy who wrote the standard Java collection libraries wrote an article about how there had been a bug in their implementation of binary search for nine years. Basically they took the min and the max and added them together and divided by two. But, of course, if that addition overflows, that’s a bug. So, it’s bad the standard library had a bug in it, but they found it eventually and fixed it. If everyone wrote binary search themselves, the percentage of binary searches that would be wrong would probably be quite high.
Knuth: That’s true. And that’s just one of a huge number of examples. Binary search is a particular example that we started out our programming classes in the ’70s. The first day of class everyone writes a program for binary search and we collect them and the TAs take a look. And you find that fewer than ten percent are correct. And there are four or six different bugs. But not with respect to the overflow that you mentioned, which is a new one—it didn’t occur in my classes that we thought adding these two numbers might be a problem.
But this black-box idea—why do I hate it so much? We’re talking arithmetic, so say you have a program for matrix multiplication. You have a matrix multiply black box and then you change the data type from real to complex and so then you’ve got a complex matrix multiply box and it takes 4xn3 steps instead of n3 steps. If the real case takes a certain time, t, then the complex case takes time 4t. But if you’re allowed to open the box, then you’ll only need three real matrix multiplications instead of four because there’s an identity that will describe the product of two complex matrices. That’s just one small example—it just goes on and on.
I’ll have a priority queue or I’ll have some kind of a heap structure; whatever it is—binary search—I’ll have a good source for the algorithm but it won’t quite fit what I want, so I’ll adapt it every time. And I think I’ll be better off. I know I’m going against a lot of people who think their job is to write things that everybody is going to use, so if there are any bugs in it, they get to fix ’em and everybody else’s program will start working better now. OK. I’m unhappy with that kind of world. I like to see their program.
When I wrote Volume I of The Art of Computer Programming people didn’t realize that they could use linked lists in their own programs, that they could use pointers for data structures.
If you had a problem that had something beyond arrays, you would go to somebody’s package, or an interpreted language like IPL-V or Lisp. There were also Fortran versions and you could get these subroutines and then you would learn how to use that package and you would do your program in that. It was completely preposterous for somebody to teach an ordinary programmer how to include linked lists in their program. Everything was supposed to be done by these canned routines.
But general packages have got to have all of this extra machinery in order to handle cases that only come up for a small number of the users. So my book sort of opened people’s eyes: “Oh my gosh, I can understand this and adapt it so I can have elements that are in two lists at once. I can change the data structure.” It became something that could be mainstream instead of just enclosed in these packages.
Well, I’m seeing the same thing now that I’m writing about BDD structures. At the moment there are three or four packages of subroutines that work with BDDs but the thrust of what I’m writing now for The Art of Computer Programming is that you too can program simple versions of BDDs for lots of applications and it’ll go very far. You can use them for lots of different kinds of problems where you don’t need all the bells and whistles of these other packages and these things you can do are easy to understand and easy to put into programs you’re writing yourself.
Earlier this year I finished my section on bitwise tricks and techniques and these are things that have been a black art in the hacker community for years. I decided it’s time to say that there’s a theory of these things that you can understand the ideas, how they fit together. You can use them yourself with confidence. And you can build on things and do amazing things that last year you didn’t know how to do well at all. It went through the underground until now; it’s something that we might as well teach to people—something that deserves to be common knowledge.
I write a lot of programs and I can’t claim to be typical but I can claim that I get a lot of them working for a large variety of things and I would find it harder if I had to spend all my time learning how to use somebody else’s routines. It’s much easier for me to learn a few basic concepts and then reuse code by text-editing the code that previously worked.
Seibel: How has the way you think about programming changed from those earliest days to now?
Knuth: We already talked about literate programming—that’s a radical departure, that I’m viewing myself as an expositor rather than trying to just put together the right instructions. Dijkstra came out with that same evolution. In the end his programs were even more literate than mine in the sense that they didn’t even go into the machine. They were only literate.
And he was one of the people largely responsible for structured programming, where we saw patterns that we could use to scale up our program so that we could make larger programs and still keep our head straight. You write a program that’s ten times as big but you don’t have to lose ten times as much sleep over it because you have tools that allow you to put things together reliably in a larger system. That was definitely different.
So that’s one important aspect, the idea of the abstractions that we have to understand, the abstractions that allow us to deal with large systems and still be pretty confident that we’re in control and we know what we’re doing, even though they’re a mind-bogglingly complex things.
There are lots of other things that look like they’re important changes but to me they don’t seem to make that much difference. These are the surface, the different type of syntactic sugar and the different dialects of languages that we have. There are many different flavors that appeal to different personality types. Some people are more logical than I am, for example. They really like to have lots of parentheses, and things matching up and saying that, “I’m now going to start something,” and then at the end you say, “I’m now going to finish it.” And that’s not as appealing to me. That’s not the way I think. But that’s the way other people think and there’s no one best way to think.
To me one of the most important revolutions in programming languages was the use of pointers in the C language. When you have nontrivial data structures, you often need one part of the structure to point to another part, and people played around with different ways to put that into a higherlevel language. Tony Hoare, for example, had a pretty nice clean system but the thing that the C language added—which at first I thought was a big mistake and then it turned out I loved it—was that when x is a pointer and then you say, x + 1, that doesn’t mean one more byte after x but it means one more node after x, depending on what x points to: if it points to a big node, x + 1 jumps by a large amount; if x points to a small thing, x + 1 just moves a little. That, to me, is one of the most amazing improvements in notation.
Seibel: So that’s certainly powerful compared to what preceded it. But since then, a lot of people have decided that having raw pointers to memory is pretty dangerous and that they’d rather have references that behave a lot like pointers but without some of the dangers.
Knuth: Pointers have gone out of favor to the point now where I had to flame about it because on my 64-bit computer that I have here, if I really care about using the capability of my machine I find that I’d better not use pointers because I have a machine that has 64-bit registers but it only has 2 gigabytes of RAM. So a pointer never has more than 32 significant bits to it. But every time I use a pointer it’s costing me 64 bits and that doubles the size of my data structure. Worse, it goes into the cache and half of my cache is gone and that costs cash—cache is expensive.
So if I’m really trying to push the envelope now, I have to use arrays instead of pointers. I make complicated macros so that it looks like I’m using pointers, but I’m not really. In a way it’s a small thing and it’s going out of fashion. But to me it was an important idea in notation at the lower levels; when I’m working and debugging and so on I’m still very grateful to Thompson and Ritchie. I don’t know who came up with that particular one.
Seibel: Are there any other important parts of your programming toolkit?
Knuth: Change files are something that I’ve got with literate programming that I don’t know of corresponding tools in any other programmers’ toolkits, so let me explain them to you.
I had written TeX and Metafont and people started asking for it. And they had 200 or 300 combinations of programming language and operating system and computer, so I wanted to make it easy to adapt my code to anybody’s system. So we came up with the solution that I would write a master program that worked at Stanford and then there was this add-on called a change file which could customize it to anybody else’s machine.
A change file is a very simple thing. It consists of a bunch of little blobs of changes. Each change starts out with a few lines of code. You match until you find the first line in the master file that agrees with the first line of your change. When you get to the end of the part of the change that was supposed to match the master file, then comes the part which says, “Replace that by these lines instead.”
Maybe the change says, “Replace these six lines by these twelve lines. Or by no lines. As soon as you get a match, you stick in the twelve that you changed. Then you go onto the next one.” You’ve got to write the changes in order—it doesn’t do anything intelligent for matching; it just says, “Go until you find the first line of the next change that has to match some line in the master file.”
It’s a system that only takes an hour to program and it’s good enough for the purpose. Then all the tools that we have for literate programming, the weave and tangle programs, will take the master file and the change file.
So every so often I’d have to release a new master program. All these hundreds of people around the world had their change files—maybe their six lines that were supposed to match mine no longer matched, so they might have to make some changes. But they wouldn’t have to do very much. Every time I would correct a bug it would almost automatically work—the bug fix would also apply to their program. This solved the problem very simply and it worked. Anybody could figure it out and do it.
The extreme example of this was when TeX was adapted to Unicode. They had a change file maybe 10 times as long as the master program. In other words, they changed from an 8-bit program to a 16-bit program but instead of going through and redoing my master program, they were so into change files that they just wrote their whole draft of what they called Omega as change files, as a million lines of change files to TeX’s 20,000 lines of code or something. So that’s the extreme.
But now I use change files all the time because I’m writing programs for myself that I’m using writing my book—I’ve got lots of problems that I want to solve and I want to experiment with different versions. Like yesterday I wanted to find out how big a Boolean circuit is for multiplication of n-bit numbers. I have a program that takes any Boolean function and finds out how big its BDD is. So I’ve got a program that takes any Boolean function and computes its BDD.
In my original program you input the truth table of the function online—it says, “Give me a truth table,” and I type in a hexadecimal number, because I had a lot of small functions that I’m using as examples. But it only works for small functions, the ones that I want to type in the truth table. Then I’ve got a big function like, “Multiply all pairs of 8-bit numbers.” This is a function of 16 variables—there are 8 bits in x and 8 bits in y. So I write a little change file that takes out this interactive dialog and replaces it with a program that makes a truth table for multiplication.
Then I changed that by saying, “Let’s read the bits from right to left instead of from left to right, which gives you a different BDD.” Or, “Let me try all Boolean functions of six variables and I’ll run through them all and find out which one has the largest BDD.” But all of these are customizations of my original thing.
I’ll have maybe 15 variants of that program that are easily understood. This was an unexpected spin-off from literate programming because of our need for sending out master files to a lot of people that were changing it for their own system; I’m now using it in a completely different way.
Seibel: It seems sort of obvious why that would be useful for you in the kind of work you’re doing, where you want to do a lot of variations on different themes.
Knuth: Yeah, I’m writing a book.
Seibel: Do you think this mechanism could be more broadly applicable?
Knuth: I have no idea. I’m not sure how it would work if I was in a team of 50 people. But I hope that the idea of an individual programmer writing programs in order to learn something is not a dying breed.
Seibel: In your earliest days you were writing machine code; then you found structured programming, which provided literally a structure for organizing programs. And then you invented literate programming, which gave you another way to structure programs. Since the invention of literate programming, has there been anything else that’s as dramatically changed the way you think about programming?
Knuth: I’ve got better debugging tools for literate programming; that’s basically all.
Seibel: OK, let’s talk about debugging. What better tools do you have now?
Knuth: It turned out that the inventors of the GNU debugger realized that you could have preprocessors writing programs. So you can correlate the low-level stuff to a high-level source in a completely different language. So I’m writing in CWEB but I still never have to look at the lower-level things because it’ll flash my CWEB source as I’m stepping through the program.
Seibel: So that’s a facility built into GDB which CWEB takes advantage of?
Knuth: And was built into GDB because it was built into C to have __LINE__ directives. We had to work to make use of the __LINE__directives, but it works beautifully. The computer is sitting there with a binary instruction but GDB knows that this came from something in my WEB source file even though WEB came 10, 20 years after C. So it was a very good, very forward-looking part of their design to make that work.
Seibel: So you use GDB. What other debugging techniques do you use?
Knuth: I’ll add a lot of code that will check to see if my data structures, with all their redundancies, are properly done. This sanity checking might slow the whole thing down by a factor of 100 if I turn it on.
For instance, I had a complicated data structure that involves reference counts. So I’m writing some pretty complicated programs, and getting these reference counts correct is mystifying. Every once in a while I have to increase the reference count or decrease the reference count. But when a pointer is in a register or is a parameter to a subroutine, does that count as a reference in the data structure or not? So I’ve got the sanity check written that goes through the millions of counts seeing how many references are really made and are the numbers correct? Then I’ll do a little computation and check the whole thing. That way errors will be detected billions of steps before they would surface in a crash.
There was a program that does multiplication in a new way, so I tried it exhaustively. I made 256 numbers and I multiplied each of them by each other one, but after each one I would do a sanity check. I multiply 2 by 3—fails! So I fixed that. And then something else. Finally I got it to where all 256 by 256 were working and getting the right answer.
So that’s an important debugging technique for me. Maybe ten percent of the code is devoted to something that I don’t need except when I’m debugging. And the sanity-check code also documents the data structure.
I’ll also write something that gives a nice symbolic form of a data structure so I don’t have to decode a whole bunch of binary things. Then, if necessary, I can print out a data structure in some decent structured form or I can dump it out in a file and I can write another program that analyzes it to find out what’s going wrong.
Seibel: Related to invariants and various kinds of assertions, folks like Dijkstra would argue that we’ve got to put very formal assertions at every step of our program so that we can then prove our programs correct. I’ve read where you’ve talked about wanting to prove your programs “informally correct”; what’s your take on the idea that we should go beyond that and formally prove things correct?
Knuth: On one hand you have this impossibility of ever having something proved. Somebody will say they have a program that’s verified and it’s only verified because it met its specifications according to some verifier. But the verifier might have a bug in it. The specifications might have bugs in them. So you never know that the program is correct. You have more reason to believe it, but you never get to the end of the loop. It’s theoretically impossible.
The very first paper by Tony Hoare about formal proof, “Proof of a Program: FIND,” was a great achievement and advanced the state of the art. But there were two or three bugs in that proof. It hadn’t occurred to them that you had to verify that subscripts lie in-bounds or something like this. There’s always a chance for gaps. Still, he had verified it much more than anybody else had at that point.
Now, the program that I did yesterday—I have no idea how I would state all of the assertions that are there. I would never get done because I wouldn’t have any more confidence in my assertions than I would in the program.
Or TeX, for example, is a formal mess. It was intended to be for human use, not for computer use. To define what it means for TeX to be correct would be incomprehensible. Some methods for formal semantics are so complicated that nobody can comprehend the definition of correctness.
Seibel: When you were working on TeX you wrote a really horrendous torture test of the program.
Knuth: Right.
Seibel: How do you get in the frame of mind to do that? Programmers often tend to want to protect their baby, and so they don’t test as hard as they could.
Knuth: Well, I’ve been a nitpicker all my life. So if I can get my kicks out of finding errors then I just have to make sure that I forget that I was the author of the program. I try to imagine that somebody else was the author. But otherwise it’s fairly easy for me to get into attack mode. I don’t know why.
For example, some of the best work I did for Burroughs Corporation was to debug their hardware designs. Their engineers would show me the specs for their computer and I would look at it and I would try to construct examples where they would be off by 1 or something. I got more than 200 bugs out of their B-5000–series machines before they went into production, although it had passed the simulators.
Seibel: So essentially you were inventing programs that were correct according to the semantics of the language but the machine would then execute incorrectly?
Knuth: Right. Certainly if their floating point isn’t calculating the right product of two numbers, I would try to find examples of numbers where the floating point didn’t work. But also there were cases where they were implementing a stack in hardware and they had cases where registers would be empty or not at the top of the stack and I would find scenarios where their logic would get screwed up.
Seibel: Did you have a systematic way of doing that? How did you find them?
Knuth: Am I just a mean guy? I don’t know. But if I’m trying to prove something—a theorem in mathematics—instead of proving that it’s correct, it’s easier for me, usually, to say, “Well, find a counterexample.” I can get psyched up to find a hole in this or explain why it doesn’t work. And then when I can’t find any holes, then I see the proof.
I think it’s just my personality that I like to attack things and find errors. My juices are working when I’m playing the game as the opponent rather than if I’m just sitting there trying to say, “Oh, yeah; now why is this working?”
Seibel: It’s curious that that’s what gets you going, yet your life’s work is explaining things. Do you think that approach somehow feeds into how you explain things?
Knuth: The only thing I can claim for my explanations is that I try to match a natural brain process of seeing things in two different ways at a time in order to understand something better. I think the key is usually to have a stereo view instead of a one-dimensional view. I don’t know how that affects this attacking business.
But when you’re attacking—when you’re playing the game, trying to defeat something—it arouses competitive hormones or something that just stimulates the brain in a way that probably just means that I’m trying more than one way to get at it. A good explanation is a similar thing. A good explanation somehow combines different viewpoints.
Seibel: Another thing that came out of working on TeX, which you described in “The Errors of TeX,” was a log of every error that you found in the program. Folks like the Software Engineering Institute people say that part of a mature software-engineering process is keeping track of all your bugs and learning how to prevent the same kind of errors in the future. But you said that having kept this log, it doesn’t help you prevent future errors.
Knuth: Yeah. Though it’s hard to say that I wouldn’t have been even worse without the log.
Seibel: But you didn’t feel like, “Ah, now that I’ve seen this I won’t do it again.”
Knuth: I just got to recognize my sins. People keep coming back for absolution, if you know theological terms.
Seibel: So you find yourself now making bugs in your programs and then saying, “Oh, I’ve done it again, that same kind of bug.”
Knuth: Yeah.
Seibel: So why is that? Is there something about the nature of the mistakes that makes it hard to distill a lesson that will prevent making them again?
Knuth: I think it’s probably more that I’ll try harder things. I always try things that are at my limit. If I had to go back and write those kinds of programs again, the easier ones, I wouldn’t make so many mistakes. But now that I know some more, I’m trying to write harder stuff. So I make mistakes because I’m always operating at my limit. If I only stay in comfortable territory all the time, that’s not so much fun.
Seibel: So if you just kept writing typesetting systems for the rest of your life?
Knuth: Yeah, I would get those pretty good. But we keep raising the bar and then we stumble on it. We’re dealing with—as we said earlier—things that are on the edge of what human beings can handle and more complicated than have been done before.
If we restrict ourselves to the things that are really easy, then that’s not satisfactory because our appetite is always to push the boundary and go until it gets to something we can barely do. And once we’ve got to there, then we’re going to want to push that boundary and so on.
So inevitably we’re going to have bugs unless we decide we’re never going to write anything that stretches our capabilities. So how are we going to do it better? Every three years there’ll be another buzz word as to something that’s going to solve all these problems and make it really work. Extreme programming was one the last two or three years. Before that there was something else. Somebody will come up with another supposedly silver bullet and there’ll be a lot of people jumping on that bandwagon and then they’ll find, “Oh, it’s still hard.”
Seibel: Has the kind of person who can be a good programmer changed over time?
Knuth: Pretty much a constant in my experience, over a long period of years, is that every time I’m exposed to 100 people from some population or other, except majors in computer science, 2 of them are programmers in the sense that they really resonate with the machine. Wasilla, Alaska, has 10,000 people, so it’s probably got 200 programmers.
Seibel: So has programming changed enough that the kind of person who falls in that two percent has changed? Or is it still really the same?
Knuth: I don’t know—you can use the word programming in different senses. We’re always making tools that are intended to make more of a match between people’s brains and getting something done in a computer. I’m mostly talking about the way a machine really works when the machine is being pushed to the envelope rather than just getting an answer out.
We’ve got machines that are so powerful now that people who aren’t really good at programming, in my esoteric sense, are able to get answers out of these machines that would have taken a huge expert to do on old machines. But with the new machines, the people that I’m talking about are going to be doing the problems that couldn’t be handled by the old machines.
So there’s that change and then there’s the change that I’m really worried about: that the way a lot of programming goes today isn’t any fun because it’s just plugging in magic incantations—combine somebody else’s software and start it up. It doesn’t have much creativity. I’m worried that it’s becoming too boring because you don’t have a chance to do anything much new. Your kick comes out of seeing fun results coming out of the machine, but not the kind of kick that I always got by creating something new. The kick now is after you’ve done your boring work then all of the sudden you get a great image. But the work didn’t used to be boring.
Seibel: But you still find the kind of programming you do interesting?
Knuth: Oh my God, yes. I’ve got this need to program. I wake up in the morning with sentences of a literate program. Before breakfast—I’m sure poets must feel this—I have to go to the computer and write this paragraph and then I can eat and I’m happy. It’s a compulsion; that I have to admit.
OK, let me show you the program I wrote yesterday. I’m multiplying huge integers that are way bigger than the universe—they’re special integers that you can compress the representation down, and so I can deal with them even though I couldn’t represent them in an ordinary notation, and I’ve been multiplying these integers that are inconceivably large and I’ve been squaring them and finding out how they look after squaring them. I’m very puzzled about what’s going on, but this is exciting to me.
Seibel: You’re an academic but also have worked on big systems and have done some work in industry. How do you see the relation between academic computer science and industrial practice?
Knuth: It’s gone in waves. In the ’60s the academics were way ahead of the industry and the programs that were produced in industry, except for maybe airline-reservation systems, were laughable to everybody in universities.
By 1980 the situation had pretty much reversed and the programs that were being written by people in universities were laughed at by the people in industry because the universities had gone into theological mode and you weren’t allowed to use goto statements. I’m exaggerating to simplify, but basically there were no-nos in university programs that were keeping people’s hands tied, and the people in industry didn’t have to worry about that.
But then in universities people came up with some better ideas about networking and dealing with large pieces of data and so on, and got ahead. So it goes back and forth. But the trend in a lot of the algorithm and datastructure community has not been to my liking when they have lots of data structures that are just… baroque is the only word I can think of. They’re intricate and clever and you have to admire them for the intellectual challenge, but I find them sterile. They don’t connect with life; they’re working in another world. It’s an OK world and it’s got its structure, and they’re friendly and nice people, but it doesn’t appeal to me personally and it doesn’t really relate to practice.
I don’t know why it’s important to me if something relates to practice or not. There are mathematicians who never think about anything finite, and they hardly ever come down to countably infinite—they publish terrific papers just talking about kinds of infinity that are mind-boggling and they’re able to make sense out of it and that gives them satisfaction. And there are similar things like that in algorithms. But for me I’m turned on much more by the ideas that I would be able to use in my machine.
Seibel: In 1974 you said that by 1984 we would have “Utopia 84,” the sort of perfect programming language, and it would supplant COBOL and Fortran, and you said then that there were indications that such language is very slowly taking shape. It’s now a couple of decades since ’84 and it doesn’t seem like that’s happened.
Knuth: No.
Seibel: Was that just youthful optimism?
Knuth: I was thinking about Simula and trends in object-oriented programming when I wrote that, clearly. I think what happens is that every time a new language comes out it cleans up what’s understood about the old languages and then adds something new, experimental and so on, and nobody has ever come to the point where they have a new language and then they want to stop at what’s understood. They’re always wanting to push further.
Maybe someday somebody will say, “No, I’m not going to be innovative; I’m just going to be clean and simple, and I’m going to stick to it.” Pascal was started with that philosophy but then didn’t continue. Maybe we’ll get to a time when somebody will say, “Let’s set our sights lower and really try to make something that’s going to be stable.” It might be a good idea.
Seibel: Isn’t part of the problem that while there are misfeatures, there are also missing features and if a thing is missing you’ve got to make up something to fill that gap.
Knuth: Yeah, that’s right. It’s got to be extensible somehow. Java didn’t make itself extensible in a good way.
Seibel: You’ve designed some languages yourself—probably the most widely used of which is TeX.
Knuth: So TeX is a programming language but I had to put in those features kicking and screaming. Guy Steele, Terry Winograd, Leslie Lamport, and different people needed things when they were using TeX as a front end for their material. I think Terry Winograd was writing a book on the syntax of natural languages, so he had some really powerful macros that he wanted to write in order to make the diagrams in his book. That pushed TeX a lot towards becoming a programming language in the earliest days.
Seibel: Do you ever wish you had focused more on the design of the language, as a language?
Knuth: I don’t know. I guess so. In a way I resent having every language be universal because they’ll be universal in a different way. It’s a little bit like Unix having 30 definitions of regular expressions under one roof—depending on which part of Unix you’re using you’ve got a slightly different flavor of regular expressions. If every tool that you have includes a Turing machine inside, is this really the way to go? I was really thinking of TeX as something that the more programming it had in it, the less it was doing its real mission of typesetting.
When I put in the calculation of prime numbers into the TeX manual I was not thinking of this as the way to use TeX. I was thinking, “Oh, by the way, look at this: dogs can stand on their hind legs and TeX can calculate prime numbers.”
Seibel: But people use the fact that it’s a Turing-complete programming language to do typesetting-related computations. If it wasn’t Turing-complete they would be unable to do those things.
Knuth: Yeah, that’s right. I wrote a programming language for simulation in the ’60s that I had to work hard to kill because it had a lot of users, but then when Simula came out I liked Simula better and I told people to stop using my SOL language. Mostly I don’t consider that I have great talent for language design.
With TeX I was interacting with hundreds of years of human history and I didn’t want to throw out all of the things that book designers have learned over centuries and start anew and say, “Well, forget that guys; you know, we’re going to be logical now.” In this case, the name of the game was mostly to take an enormously complicated problem and find a fairly small set of primitives that would support it. Instead of having 1,000 primitives, I have 100 primitives or something like that. But going down to 50 primitives, 10 primitives—which we would do if we wanted to be mathematically clean—I believe wouldn’t work. The problem of making books goes too much into the complexity of the world, which just doesn’t want to be simplified.
Seibel: I haven’t really done a study, but it seems like the vast majority of mathematical and scientific papers are typeset with TeX these days. There must have been things you’ve seen typeset in TeX that made you think, “Wow, my program played a part in this.”
Knuth: Well, the proof of Fermat’s Last Theorem was one of those. It’s one of the most famous mathematical papers. And it happens all the time that I see books that I know wouldn’t have been written if the authors had had to go through channels the way they used to. It’s again a little bit of the black-box thing.
It used to be, you would have to type something up and that would go into a compositor and it would come back in galley proofs, and so on. You’re going through all kinds of levels of people who aren’t mathematicians and then coming out with the product at the end. So you don’t dare do anything that would confuse any of the people in that line.
But if you can see yourself what it’s going to look like, and you can make up a notation that isn’t in somebody’s style sheet because it just happens to be the right one for your problem, then you’re encouraged to do a much better job.
So this brings me great satisfaction all the time when I know that people have been able to cut through this and their creativity goes directly to the reader.
Seibel: Do you feel like programmers and computer scientists are aware enough of the history of our field? It is, after all, a pretty short history.
Knuth: There aren’t too many that are scholars. Even when I started writing my books in 1963, I didn’t think people knew what had happened in 1959. I was reading in American Scientist last week about people who had rediscovered an algorithm that Boyer and Moore had discovered in 1980. It happens all the time that people don’t realize the glorious history that we have. The idea that people knew a thing or two in the ’70s is strange to a lot of young programmers.
It’s inevitable that in such a complicated field that people will be missing stuff. Hopefully with things like Wikipedia, achievements don’t get forgotten the way they were before. But I wish I could also instill in more people the love that I have for reading original sources. Not just knowing that so-and-so gets credit for doing something, but looking back and seeing what that person said in his own words. I think it’s a tremendous way to improve your own skills.
It’s very important to be able to get inside of somebody else’s way of thinking, to decode their vocabulary, their notation. If you can understand something about the way they thought and the way they made a discovery, then that helps you make your own discoveries. I often read source materials of what brilliant people have said about this stuff in the past. It’ll be expressed in unusual ways by today’s conventions, but it’s worth it to me to penetrate their notation and to try to get into their idea.
For example I spent a good deal of time trying to look at Babylonian manuscripts of how they described algorithms 4,000 years ago, and what did they think about? Did they have while loops and stuff like this? How would they describe it? And to me this was very worthwhile for understanding about how the brain works, but also about how they discovered things.
A couple of years ago I found an old Sanskrit document from the 13th century that was about combinatorial math. Almost nobody the author knew would have had the foggiest idea what he was talking about. But I found a translation of this document and it was speaking to me. I had done similar kinds of thinking when I was beginning in computer programming.
And so to me reading source materials is great enrichment for my own life and creativity.
I was unable to pass that on to any of my students. There are people alive now in computer science who are doing this well—a few. But I could count on the fingers of one hand the people who love source materials the way I do.
I’ve got lots of collections of source code. I have compilers, the Digitek compilers from the 1960s were written in a very interesting way. They had their own language and they used identifiers that were 30 characters long but very descriptive, and their compilers ran circles around the competition at the time—this company made the state-of-the-art compilers of 1963 or ’64.
And I’ve got Dijkstra’s source code for the THE operating system. I haven’t read that. I’ve just skimmed it so far but I collected it because I’m sure it would be interesting to read if I had time.
One time I broke my arm—fell off a bike—and I had a month where I couldn’t do anything much, so I read source code that I had heard had some clever ideas in it that hadn’t been documented. I think those were all extremely important experiences for me.
Seibel: How do you tackle reading source code? Even reading something in a programming language you already know is a tricky problem.
Knuth: But it’s really worth it for what it builds in your brain. So how do I do it? There was a machine called the Bunker Ramo 300 and somebody told me that the Fortran compiler for this machine was really amazingly fast, but nobody had any idea why it worked. I got a copy of the source-code listing for it. I didn’t have a manual for the machine, so I wasn’t even sure what the machine language was.
But I took it as an interesting challenge. I could figure out BEGIN and then I would start to decode. The operation codes had some two-letter mnemonics and so I could start to figure out “This probably was a load instruction, this probably was a branch.” And I knew it was a Fortran compiler, so at some point it looked at column seven of a card, and that was where it would tell if it was a comment or not.
After three hours I had figured out a little bit about the machine. Then I found these big, branching tables. So it was a puzzle and I kept just making little charts like I’m working at a security agency trying to decode a secret code. But I knew it worked and I knew it was a Fortran compiler—it wasn’t encrypted in the sense that it was intentionally obscure; it was only in code because I hadn’t gotten the manual for the machine.
Eventually I was able to figure out why this compiler was so fast. Unfortunately it wasn’t because the algorithms were brilliant; it was just because they had used unstructured programming and hand optimized the code to the hilt.
It was just basically the way you solve some kind of an unknown puzzle—make tables and charts and get a little more information here and make a hypothesis. In general when I’m reading a technical paper, it’s the same challenge. I’m trying to get into the author’s mind, trying to figure out what the concept is. The more you learn to read other people’s stuff, the more able you are to invent your own in the future, it seems to me.
We ought to publish code. The Lions Book is available. And Bill Atkinson’s programs are now publicly available thanks to Apple, and it won’t be too long before we’ll be able to read that. That’s well-documented code with lots of pioneering graphics algorithms in it.
Seibel: Certainly with open source there’s a lot more code out there to read than there use to be.
Knuth: Yeah, that’s right. But the more varieties of different kinds of notations are still useful—don’t only read the people who code like you.