Oral-History:David Kuck

About David Kuck

David J. Kuck was a professor in the Computer Science Department the University of Illinois at Urbana-Champaign from 1965 to 1993. While at the University of Illinois at Urbana-Champaign he developed the Parafrase compiler system (1977), which is used as a testbed for the development of vectorization and program transformation. He also led the construction of the CEDAR project, a supercomputer completed in 1988. He founded Kuck and Associates (KAI) in 1979 to build optimizing compilers especially focused upon exploiting parallelism. KAI was later acquired by Intel. Kuck is a fellow of the American Association for the Advancement of Science, the Association for Computing Machinery, and the Institute of Electrical and Electronics Engineers. He is also a member of the National Academy of Engineering. He has won the Eckert-Mauchly Award from ACM/IEEE and the Charles Babbage Outstanding Scientist Award. Kuck holds two patents and has published over 100 papers.

In the interview, he discusses the development of Parafrase, his contributions to the design of parallel computing and memory, including Cedar and numerical software, and his funding from the National Science Foundation. He compares NSF funding to that of the Department of Energy and DARPA, concluding that the NSF spread small amounts of money to a large number of start-up projects which, if they grew, would then receive larger amounts from other agencies.

About the Interview

DAVID KUCK: An Interview Conducted by Andrew Goldstein, Center for the History of Electrical Engineering, August 15, 1991

Interview #132 for the Center for the History of Electrical Engineering, The Institute of Electrical and Electronics Engineering, Inc.

Copyright Statement

This manuscript is being made available for research purposes only. All literary rights in the manuscript, including the right to publish, are reserved to the IEEE History Center. No part of the manuscript may be quoted for publication without the written permission of the Director of IEEE History Center.

Request for permission to quote for publication should be addressed to the IEEE History Center Oral History Program, IEEE History Center, 445 Hoes Lane, Piscataway, NJ 08854 USA or ieee-history@ieee.org. It should include identification of the specific passages to be quoted, anticipated use of the passages, and identification of the user.

It is recommended that this oral history be cited as follows:

David Kuck, an oral history conducted in 1991 by Andrew Goldstein, IEEE History Center, Piscataway, NJ, USA.

Interview

INTERVIEW: David Kuck

INTERVIEWER: Andrew Goldstein

DATE: August 15, 1991

PLACE: Telephone Interview

Beginnings of Parafrase Project

Goldstein:

I want to talk to you about, like I described, some of the work that you did with National Science Foundation support. I would like to call your attention to the fact that I am recording this conversation. Is that okay with you?

Kuck:

Sure. Fine.

Goldstein:

I received the letter that you wrote us back in April where you describe the Parafrase Project, and I was hoping that you could tell me some details about how it got started, how the NSF got involved, and some details about what it is you were trying to do.

Goldstein:

I had been working on the Iliac IV project, which is a big parallel machine effort here at Illinois, and writing compilers and inventing programming languages and looking at applications, and realizing how difficult that whole subject was and was going to be if parallel processing was ever to get started. I started thinking about what could you do that would automatically translate programs and make it easier to program new machines using old programs. That was the original goal.

Goldstein:

Translate programs written in FORTRAN to —

Kuck:

FORTRAN. Right. Actually, let me erase that remark. What I was really frustrated about was the fact, with Iliac IV, programming the machine was very difficult and the architecture probably was not very well suited to some of the applications we were trying to run. The key idea was that I did not think we had a very good match in Iliac IV between applications and architecture. And I said to myself the real way to figure out what is needed in an architecture is to take the reality of the world, which is a large body of FORTRAN programs, and analyze those things in terms of the kinds of constructs people use, how they use them, how often they use them, and then to derive from that an architecture. That was the real thing that our first National Science Foundation proposal and that my thinking set out to do.

Goldstein:

And how were you going to conduct this analysis?

Kuck:

We started thinking about how to do that. I would say that was in 1969 or something like that. And we started thinking then about how do you analyze a FORTRAN program. Nobody had done much of anything. There was a little bit of activity at Burroughs in Paoli in the research center, and then there was some activity by Jerry Estrin at the University of California at Los Angeles, to build a program graph and do something with it. We just started inventing ideas about how you would take the program and the idea was that we would put the program in as parallel a form as we could automatically and then see what that looked like architecturally. We fooled around and I got a few students looking at it, and at some point in the early 1970s we wrote an NSF proposal on the basis of having a little bit of software or some ideas about how to do this.

Goldstein:

The work you had been doing up to that point had been funded through Illinois?

Kuck:

Right.

Goldstein:

One thing I wanted to back up and just ask you about, did you consider that the software that existed at the time, its structure had been influenced by the architecture available?

Kuck:

You mean that the programs were written in some particular way?

Goldstein:

Right. To optimize the architecture.

Kuck:

In fact, that is a good remark and it is a common question these days, but in reality in those days everybody who wanted to get performance out of a machine pretty much wrote an assembly language. Compilers for the CDC-6600 were not until the early 1970s able to do what you could get when you went to assembly language. The hotshot programmers all descended into assembly language and the FORTRAN programs were kind of — I mean they certainly did some things that were architecturally oriented, but I think that problem has become more serious as the years go by.

Goldstein:

When you say you analyzed these programs, I do not have a sense of how such an analysis was conducted. Did you introduce any concepts or when you said that Estrin at UCLA had been doing program graphs, could you describe that approach?

Kuck:

They were looking more at scheduling and so on. In the 1960s people had built multiprocessors and there was a certain amount of interest in how do you program them, how do you find tasks that are rather large and that you could put on one of two or four processors. I believe that both the Burroughs people who had come out of the D-825 and Estrin and his students were more or less motivated by that. I just saw it as related work and an interesting starting point. Jerry Estrin also had students in those days looking at evaluating arithmetic expressions in parallel. They were Dan Bovet and Jean Loup Baer.

So we came at the problem from really two angles. One is how you get a dependence graph of the statements in a program, that this has to come before that basically. That was the concept that both Burroughs people and UCLA people had been looking at. That was the kind of global view. And then the local view was well within a node of graph, and we took our nodes down to a much lower level than I think those people had. We said how do you make the little atomic things that you find in this graph going parallel. How do you do expressions and how do you do arithmetic recurrences, and so on, fast, and then how do you get the links in that graph broken as much as possible. Because every time you break one, you have another potential activity that can go in parallel.

Goldstein:

A node in the graph would be a program statement and then you tried to work out what sequences were necessary, what statements had to come before other statements?

Kuck:

Exactly.

Goldstein:

Why did you try to execute all these algorithms, reformulate them in parallel with that?

Kuck:

Because I was looking for a parallel machine. Physicists have a physical world to deal with, chemists have a real world to deal with, my real world was all the FORTRAN programs I could get my hands on. And what did those things imply about what you want in an architecture, a parallel architecture.

Goldstein:

It sounds like you had gone in with having made the decision to create a parallel architecture.

Kuck:

Right.

Goldstein:

You say all the FORTRAN programs you could find. Where did you collect these programs? From colleagues of yours who were working with it or —

Kuck:

Yes. In the beginning we got some programs from the national labs, we got some programs from the University of Illinois, from student jobs up to faculty members’ programs. It was not that hard to collect a lot of FORTRAN.

General Purpose vs. Special Purpose

Goldstein:

What things did you discover? Can you think of anything specific that you discovered through the analysis that influenced your architecture?

Kuck:

What happened then was, in a paper that was published I believe in 1972 with my two students, we got the first, we called it the analyzer in those days, because we were trying to analyze programs to infer architectures. We got that analyzer working, and we were able to analyze, I do not know, 20 or 30 programs. We wrote a paper in the IEEE Computer Transactions that I would say was probably the first paper that was any kind of experimental evidence that you could automatically do this kind of translation. So we felt we had a good start. We felt that it was a success at that point. But then we started to realize two different things. One of them was that the software we were writing was what you really needed as a compiler for any machine that you were going to end up with, and therefore you needed to keep doing that kind of work, the analyzer work, more intensely to get good compilers in the future. That was a kind of positive revelation and it was obvious. The negative thing is actually still discovered and rediscovered and debated by people. Let me say what the conclusion is first, and then try to work my way back.

Goldstein:

Okay.

Kuck:

If you are going to build a general purpose machine, you have to be ready to deal with a tremendous range of stuff. And if you are going to back off even to a given field like computational chemistry, unless you want to build a non-programmable machine that runs one code basically, you still have to have that architecture that is kind of general purpose.

Goldstein:

Right.

Kuck:

Audio File: MP3 Audio
(132_-_kuck_-_clip_1.mp3)

In other words, in every field if you do not focus down beyond let’s say computational chemistry or that level, you are going to find people with algorithms that span the whole range of thought processes, and those things are going to get translated into programs that in reality exploit a big range of what you can write in FORTRAN. So the thing that I discovered, I do not know exactly when I discovered this, but in the early 1970s, was that I would be able to focus what you needed in an architecture down to certain pieces of FORTRAN. And I can remember giving talks in which we talked about special purpose machines to do linear algebra, to do this and that.

But what I came to realize was that any special purpose machine was going to be almost unprogrammable and very narrow. And that a general purpose machine needed to do almost everything. So it changed. At some point there we realized that what we should really be working on was compilers than can take any FORTRAN program and do things with them. And then, on the other hand, architectural building blocks. When we went back to the NSF for more money, we started talking about parallel memories, parallel interconnection networks that were of a very general purpose nature.

Goldstein:

So you considered some special purposes machines, but —

Kuck:

I thought it was a dead end.

Goldstein:

What was necessary in order to expand the potential applications of the machine?

Kuck:

General purpose versus special purpose?

Goldstein:

Right.

Kuck:

This same idea has been hit upon by many people over the years. We had something called wave fronting in our very first paper, go through a two dimensional array. Many years later H. T. Kung at Carnegie Mellon started talking about systolic arrays, very special. In the early papers people actually related it back to our paper and talked about wave fronting and certain two dimensional problems, and those systolic arrays would be special purpose chips, basically. That kind of discussion continues to go on in architectural circles and it especially goes on in the user side, in the physical sciences and mainly in physics and chemistry.

People say, "Let's build a special purpose box." So I am saying that these ideas of special purpose versus general purpose have been pursued again and again and again by people and I am sure I was not the first person to think of it in the early 1970s. People talked about checkers. Arthur Samuels talked about a checkers playing machine in the 1950s, I believe. The conclusion, I think, that no one would argue with today is that if you give me an algorithm I can design you a computer that is one or two orders of magnitude more cost effective than anything that is general purpose.

Goldstein:

Okay.

Kuck:

The question then is, who cares about that thing? Because it is going to be essentially not programmable, and it is going to take a certain size of problem, etc. And countless people have busted their pick on "Let's build a quantum chromo dynamics machine," "Let's build a computational chemistry machine." And physicists get money to do it, and actually there is a bunch of them running loose right now under the HPCC aegis saying, "Give us $40 million and we'll solve the quantum chromo-dynamics problem." And it is true that you can do that until you change your algorithm or change your basic ideas. There is a group at Columbia who has been doing this kind of thing. Ken Wilson, when he was at Cornell, kept talking about it and so on. So my view though for the last 20 years has been, "That is okay if you want to just solve one problem." And that was my objection to Kung with the systolic arrays.

Then the thoughts were, okay, these things are now chips, and they are so cheap that we can hang a whole array of different kinds of them on the bus of a VAX. Then when you come to a certain subroutine that does a certain kind of linear system, you just call this chip up. Well, sounds good, but it leads to a lot of other architectural and especially compiler difficulties. You know, recognizing those things and making it all work right. So the answer is, except in an imbedded processor that is in the wing of an airplane — all it ever has to do is take the data from this sensor and process it and so on — that idea does not really work too well.

Goldstein:

That is, again, a conclusion based on your experience with the real world. You found the way these programs work.

Kuck:

We found that, right, that there was such a range of things that happened in programs that I just turned my back on that approach basically.

Goldstein:

Was it an economic analysis to decide that it is just simply not —

Kuck:

My idea was I do not want to build special purpose things. There is a kind of a sharp dichotomy between special purpose and general purpose .

Influence of Work

Goldstein:

Having decided to build the general purpose machines, what would be the highlights and the novel features of the architecture. You write that these were vector machines. And that was new?

Kuck:

Our very first work actually was not vectors. It had a very general parallel model to it. Then we started going in the direction of vector machines. We designed at Burroughs the Burroughs Scientific Processor, which was a Single Instruction Multiple Data, but it was a memory-to-memory pipelined machine. And the Iliac IV was like that, and in modern times the connection machine is like that. Then we decided by the late 1970s that SIMD really was too inflexible, and went MIMD, Multi Instruction Multi Data.

Goldstein:

You say you built that for Burroughs. I mean one thing you wrote was that you had influenced almost all vectorizing parallelized compiling efforts.

Kuck:

I do not think that is an immodest statement, but —

Goldstein:

Could you give me some examples of important machines that you can see your influence in?

Kuck:

I think that in the compiling side if you look at the papers that are written about the Crays or the idea machines or the Japanese big vector machines or the attached array processors, they either refer to our papers in the early days or — I started a company with a couple of my students in 1979, and now they are doing business with almost every company. So either direct, either one way or the other I guess we have had our influence.

Goldstein:

What is the nature of the influence? Just the approach in general or specific techniques?

Kuck:

I guess that maybe you could say there were two things, three things. One is people still refer to that 1972 paper as solid evidence that something could be done. We actually wrote a series of, I would say, three papers over the 15-year period in which we did more comprehensive experiments about what we could do at a later date. I guess that gave a lot of people some confidence that something was possible. Second thing was we published lots of algorithms and so on, and people always seem to refer to those when they talk about how they are writing their compiler. The third thing is we handed Parafrase around, and now at Kuck & Associates they sell some software that lots of people have bought, so that some people are using this stuff directly.

Goldstein:

Who did you distribute to? Do you have any idea?

Kuck:

Parafrase I was actually sent to a number of people. I would say among the more influential things that has happened was we gave it to Fran Ellen's group at IBM in Yorktown Heights. That led IBM people to get very interested in this subject. She has continued until present with a group of people. And then as it turns out Ken Kennedy was on sabbatical there at the time, and he worked on it and took the thing back to Rice. And I believe today that Rice and Illinois are probably the only two places that — are the first two places that people talk about when they talk about this subject.

Goldstein:

Right.

Kuck:

So we got him started and we got her started, and we keep doing it ourselves.

Goldstein:

What motivated you to send it on up to IBM? What that something?

Kuck:

We made it available to people, and Fran Allen wanted it. Sent it to other people. I would say it was a very hard thing to use, and I would say that the IBM-Rice effort was the most substantial fallout from it.

Goldstein:

Okay. When you say it was hard to use, could you describe?

Kuck:

It is an enormous piece of code, needed a big IBM machine to run it. It was written in PL-1, which was by that time ten years old, and so the language continued to die as time went by.

Goldstein:

How was it used? How would a user begin?

Kuck:

I am not sure what you mean. The problem with it was that it would take an enormous amount of resources. You need a very big memory and a lot of disk space, and it ran slowly. And it was okay, and then when you wanted to use it, there were lots of switches and options on the thing. It was the result eventually of a dozen Ph.D. theses. So in order to transform a program in a certain way you would have to set these switches up to do an experiment. It took a big investment of time to figure out how to use it.

Purpose of Parafrase I

Goldstein:

I just want to be sure I understand exactly its purpose. A piece of existing code would be entered as data and it would be translated into a different code that would run on a machine with a different architecture?

Kuck:

You are almost right. It was translated into a source language that looked like FORTRAN but had a lot of explicit parallelism in it. Then that was one thing you got out. The other thing you got out was a two-page (on a line printer) summary of all kinds of details about what was in that program. And a dependence graph and sort of the loop structure and what kind of recurrences there were and what kind of branching there was, all kinds of things that we thought were interesting. Then you could set the number of processors like to any power of 2, and you would have to give it a problem, a certain piece of data. Or you would have to set the loop limits, I should say. You did not really execute the program, you just set the loop on it. Then it would tell you about the speed up and efficiency and various other things on these various architectures.

Goldstein:

Did it require much custom tailoring to different architectures?

Kuck:

No. That was all built in.

Goldstein:

By that you mean?

Kuck:

It generated this kind of report. If you ran it on a 4-processor SIMD machine and an 8-processor SIMD machine, on those numbers of processors MIMD machine, what speed would you get. Then that would assume the memory. You could set some parameters about how the memory was going to work and things like that, right? You could get a kind of a snapshot of the architectural expectations for this compiler on a certain range of machines. What people would do would be work on certain passes to try to make the compiler better and get a better result.

Parallel Networks and Memories

Goldstein:

I see how this could be used for people with machines that employed some particular architecture, but did your work here inspire your work in new architectures?

Kuck:

As a related activity. I suppose it was in our second NSF grant. The first one was probably in 1970 or 1971, so the next one was three years later. We started talking about parallel interconnection networks and parallel memories. I should mention some of the students. That first Parafrase paper — we did not call it Parafrase in those days — was written with Steve Chen, who designed the Cray YMP and XMP. He has now started his own company, SSI in Wisconsin. And Yoichi Muraoka, who is now a very famous professor in Japan. Then we started on the networking business. Duncan Lawire, who is now the computer science department head at Illinois and was the co-founder of our center here five, six years ago, wrote a classical paper from his thesis about the so-called Omega Network, which was a shuffle exchange network.

In those days I started to realize that we needed big parallel interconnection networks. Actually, Yoichi Muraoka and I, one long weekend, figured out how you could decompose a crossbar switch to smaller switches. And I said, “Boy, this is amazing, but the telephone company should know about this.” I started making phone calls and I found out that the telephone company in fact knew all about that. They knew about the decomposition, but they did not know about controlling the network. In particular, when you want to make a telephone call the time to set the switches in the network is, let’s say, proportional to the time of dialing the telephone or something like that.

Goldstein:

Okay.

Kuck:

It is seconds. But we wanted to set the switch in a memory cycle time, in microseconds or in nanoseconds now. I had a lot of discussion over the phone with the famous guy Benes there and it took me a while to realize that he thought there was no problem with this whole field. Then, when we finally got on the same wave length he realized, I realized, that he had only solved half the problem. A crossbar switch has, for inputs and outputs, N-squared cross points in it.

Goldstein:

And everybody can speak to each other.

Kuck:

Right. What Benes proved was that you could do it in n log n switches, and that the delay through the network was log n time. Well, that is the same as the delay through a crossbar really.

Goldstein:

Okay.

Kuck:

And I said, “Well, that is great, but I have to set that thing in log n time.” And he said, “Well, why? and it is okay,” so then we had a cultural difference there. Then, we started working on how can you set those switches in logarithmic time. And Duncan actually in his thesis by fooling around with Benes type networks, what we did was we narrowed it down, because don’t want everybody to be talking to everybody else necessarily at any moment. We have very regular patterns coming from the memory to the processors. But there are lots of different ones. So Duncan described in his thesis these Omega networks that was a whole big class of patterns that we could observe through Parafrase were common in programs in the data structures. And he proved how to set the switches in real time.

Goldstein:

Alright.

Kuck:

So as you go through the network it sets itself, essentially, and it does what you want it to do. Well, his thesis is, I would say, referred to it. Any issue of the computer transactions you pick up there is one more guy beating that subject, even today.

Goldstein:

I am puzzled by this. You say Parafrase could detect patterns and it could detect patterns in the data structure?

Kuck:

I skipped the whole subject of parallel memories. We started looking at how do you design a parallel memory that has no conflicts when you go at certain data structures. And the network also would not have any conflicts and would control itself in real time. That was the objective. What does that mean? Well, if I have matrix A and matrix B and the program is adding those two matrices together, then I want to fetch rows or columns or some way that FORTRAN would specify and I want that stuff to come out of the memory so that in a parallel memory every bank is giving me a piece of data on every cycle. Then, when it hits the switch, it can flow to the processors where A-1-1 is going to be added to B-1-1 in one processor and A-1-2 to B-1-2 in another processor. So you want all that to happen.

Now, if I am multiplying matrices, then I need rows of one and columns of another. If I am doing some other thing with P-E-E problems, I may need diagonals of matrices, and I may need a red-black checkerboard pattern so the stride is not always one. It may be that I want every other element or another. This kind of information is all statically available in a FORTRAN program. Do you see what I mean? It says A-sub-2i plus 7 plus B-sub-3i minus 6 equals C-sub-j or something.

Goldstein:

Right.

Kuck:

So you look at all that stuff and you try to figure out what it means in terms of storing it and then aligning it. We call them alignment networks for that reason. And then into the BSP and into the other machines that sort of notion has gone forward.

Goldstein:

I think I see the problem you are describing, but I do not see how Parafrase helped people with it.

Kuck:

It would be possible in Parafrase to simply collect statistics about what kind of memory access patterns you need. I mean the simplest thing is you just code all this stuff. But, then, in reality, the memory is always too small. The dimension of the matrix has to be folded over somehow and packed into the memory and all that sort of thing. So when you start thinking about partitioning these arrays and so on.

Goldstein:

You say that AT&T was one application here for phone switching. Are there any other notable —

Kuck:

Well, okay. The point I am trying to make is that in computers in the 1960s there were multiprocessors with like up to four processors.

Goldstein:

Okay.

Kuck:

And there would be a crossbar switch, and that was that. Some people would talk about 8 or 16 processors in a god awful crossbar. We started saying you could go up to hundreds or thousands of processors if you used this kind of Benes style or Omega Network, as we called it, because the logarithm of a thousand is — Okay, a thousand squared is a million, but the thousand times the logarithm of a thousand is only ten thousand.

Goldstein:

Right.

Kuck:

So it is almost linear in that sense. That kind of discussion led to people actually building machines like that. It has just taken a long time, but there are a lot of experimental machines, IBMs, RP-3, Cedar system that we built here, lots of other academic projects. In industry Bolt Baranek and Newman built the Butterfly and the Monarch and this PC-2000 and so on that were all that way. Other parallel machines were built during the 1980s and the companies have come and gone. A number of people have tried various variations on this kind of network. It is still not settled what to do about these networks, and in reality there are four networks that you can consider: crossbar, shuffle, hypercube or two dimensional array. The hypercube people at Caltech started saying, “Well, hypercube is the best switch.” And then we get into discussions about scalability, and if you really want to build them a big machine and so on. In reality this subject is not ended yet. There is still a lot of debate.

Goldstein:

What advantages does your system offer over these other three?

Kuck:

In a nutshell I claimed, I still claim, it is what you want because, okay, crossbars just got too expensive. Even today people are not going beyond 32 by 32. A mesh, a north-south, east-west type of mesh, is a very weak connection, and people build them because they are very easy to build, but it takes you a long time to get to the other side of the machine if you have to. The two that are fast are hypercube and a shuffle. And in fact, the hypercube is a little bit faster than a shuffle in terms of the number of stages you have to go through, although they are both, for the numbers that we are talking about, they both look like the logarithm.

And the difference then is how many wires are there per processor or per memory or whatever, and the shuffle has a fixed number of wires independent of the size of the machine. From a packaging point of view, it’s an easy thing to [inaudible word] with, whereas the hypercube every time you double the size of the machine you add another wire. And by wire I mean, you know, 32 or 64 wires, basically. You can see it in the new Intel Hypercube. They concluded that they could not build a hypercube bigger than 200, and then they would switch to a mesh. And it is — So now you will start engaging me in a technical debate about what is going on today, and all I would like to say is there is still a lot of confusion.

Goldstein:

Perhaps I missed this, but by what name is the system that you advocate called?

Kuck:

Our machine here is called Cedar. And these networks are usually called shuffle exchange networks.

Involvement of NSF

Goldstein:

I would like to find out about your history with the NSF. You say you approached them in 1971, or you began receiving funds from them in 1971. What led to your involvement with the NSF? Why did you turn to them?

Kuck:

It was the most prestigious and obvious place. It is true that John Pasta had been our department head here, and at about that time he went to the NSF. That was another motivation, I guess. I would not want it to be said that he gave me the money, but it was true that I felt the friendly audience there.

Goldstein:

Were they sympathetic to your particular project? Was this research that they were looking to promote?

Kuck:

The NSF has always been kind of neutral about what it does. But they certainly were happy to help. They gave me money, and the first time I asked. It was John Lehman, who is still there and who has actually been involved with some of my grants ever since.

Goldstein:

Were they very interactive? Did they follow up and offer guidance or suggestions?

Kuck:

I would say that they have always been very friendly and supportive and so on. They are not people who interact with you very much really.

NSF, DOE and DARPA

Goldstein:

How did their funding relate to the Defense Advanced Research Projects Agency? I think we were talking about your work before, and correct me if I am wrong. Bruce Barnes, I think, suspected that you received money from the NSF and then after your work was off the ground started getting a lot of money from DARPA. Is that true?

Kuck:

Okay. By 1980 my little group here had I guess three different NSF grants. With Ahmed Sameh we were doing some numerical algorithms, and parallel, and with Duncan we were doing architecture, and then the Parafrase work that went on. And then there was a lot of interest in the early 1980s in building machines and high performance computing and so on. So at that point we wrote a proposal to build Cedar, to try out some of these ideas. In the end what we had was two identical proposals: one to the NSF and one to Department of Energy, okay? In the 1970s the DOE, through Livermore and then through Germantown, had started to support this work as well.

So by the early 1980s we wrote a very large — I would say $7½ million, I guess it was — DOE proposal, and an approximately $1½ million NSF proposal. As I say, the words were exactly the same, so they kind of commingled funds and they both agreed to this; since they had been supporting it and they thought we were ready to do something, they both agreed to the same terminology. And that was our big starting point here for the center in 1984 or so, or 1985. Subsequently, we got money from DARPA and the Air Force and lots of industrial sponsors and so on. But we continue to get NSF money.

Goldstein:

At what stage? Has NSF always been the principal funder?

Kuck:

No. In terms of the dollar amount after the center started, they have not been not the largest funder, no.

Goldstein:

Who is?

Kuck:

The DOE.

Goldstein:

But not DARPA.

Kuck:

DARPA has given us substantial amounts of money, but the DOE has given us at the level of one or two million dollars a year over the last seven or eight years.

Parafrase II

Goldstein:

There is the Parafrase work, which you say continued to develop. In what ways did it develop?

Kuck:

As a matter of fact, by the time we started the center, Parafrase I had been sent out to lots of people, and it had been, I would say, successfully used by only a very small number, for some of the reasons I mentioned before. But it was a tremendous training ground. In other words, lots of people who have gone off to work in companies or in academia have worked on it and we have learned a lot and lots of papers have been written based on it. And yet we still had a lot of ideas that had not been tried. I think we have been doing more experimental work than most other people have been doing, so we wanted to keep that nature going.

So we said, “Let’s do Parafrase II. Let’s start all over from scratch, write the thing in a more portable, modular, blah blah blah way.” Some of the guys here, in particular Constantine Polychronopoulos, who was my student and stayed here as a faculty member, with a lot of other new students, now have a Parafrase II effort that started about five or six years ago. And Parafrase II has been working and is usable by people for the last two years, and that has been sent to, I do not know, 25 other sites, or more than that maybe, by now.

Goldstein:

Universities mostly?

Kuck:

Mostly universities.

Parallel Processors and Workstations

Goldstein:

Then there is the Cedar machine

Kuck:

Right. Well, other things happened. As a direct consequence of the papers we wrote, when Alliant Computer Systems was founded in the early 1980s they came out here and worked with us and actually built their machine to kind of work in the way we were proposing. They are still in business selling parallel machines with up to 28 processors.

Goldstein:

But that was a private enterprise.

Kuck:

That is a company on the outside, right. Steve Chen, when he went to Cray, had a certain amount of interaction with us. My company had a lot of interaction with CDC and ETA and the Cyber-205 and the ETA machines. Now, this technology through Kuck & Associates over the last six months — Okay, let me back up, if you care to hear about that.

Goldstein:

Go ahead.

Kuck:

We worked a lot on memory hierarchy management in Parafrase and discovered ways of managing the paging behavior of a program, in virtual memory. This was in the 1970s. That technology and those ideas have continued to be worked on. BLAs 3 — basic linear algebra subroutine library — in numerical linear algebra, that all the dense linear algebra is based on, were formulated here at the center by the applications guys using those ideas. And now every computer has the BLAs running on it. Then, at Kuck & Associates they have been moving in the workstation direction, they have customers building parallel workstations. But, with the RISC processors and the cash management problems, it was discovered that they can make some of the standard benchmarks go fast by their techniques. At this moment, I guess by this fall, every workstation manufacturer has licensed this stuff on a uniprocessor. It is actually making the spec benchmark go, I would say, 20 to 30 percent faster, sometimes even more than that, on all the current workstations.

Numerical Software

Goldstein:

Then the other area is work on numerical software?

Kuck:

Right. That was mainly the work of Ahmed Sameh, but I guess we were probably co-PIs on some of those. I am not sure about the details. We wrote some papers together. He has become a leader in parallel numerical linear algebra and was a co-founder of the center here and he just became the Computer Science department head at the University of Minnesota. He just left in the last couple weeks.

Goldstein:

His work was founded on your work in parallelism, on the parallel architecture you had been focusing on, the new numerical techniques for the new architecture?

Kuck:

Right. He was a student here and did some parallel algorithms for Iliac IV that just kept going.

Goldstein:

Are there any particularly noteworthy applications to his, to some of the software developed in [inaudible word] problems?

Kuck:

The Blas 3 for example was done by his group and him. The Blas 3 that I mentioned that, those basic building blocks. They use techniques of memory management that started really in Parafrase I, but that is a very commonly used piece of software now. Beyond that, it gets more esoteric. I think a lot of parallel algorithms work refers to the work that he has done, but that is the most popular thing.

Goldstein:

On the work on architecture you have mentioned that some of the basic areas have been parallel memories, paging and virtual memories. Any other large areas?

Kuck:

I think the other thing that we, besides the network that we take credit for and are pushing, is in Cedar, a kind of, what we call, clustered hierarchical machine. Since Alliant was so close to us, we built Cedar out at Alliant, and we think of them as clusters in which you do a small parallel computation. They have a shared memory. Then there is another level at which you hook some of those together in parallel and add a shared memory behind that first shared memory, and then you can just keep going that way. And that approach is the Cedar approach. Steve Chen publicly acknowledges that that is what he and his IBM related projects are doing. A lot of Japanese projects talk about clusters, and some other domestic ones are at this point too. Alliant is going in the cluster direction. In terms of our influence on industry, I guess I can say that lots of such things, guided self scheduling algorithms of Polychronopolous, Convex acknowledged that they put into the hardware of their parallel machine.

Nature of NSF Support

Goldstein:

It seemed to me like you characterize the National Science Foundation as a steady source of support that allowed you a great degree of independence. Is that accurate?

Kuck:

That is true.

Goldstein:

Do you have anything you want to add about them? Did they help to stimulate this area of research in a time when it was just being founded?

Kuck:

I would say they did, and they have supported me, just to repeat, from the beginning, and they have supported other people. They have done a good job in what is now almost a 20-year period. Some things go from ideas to actually working things in real software and real hardware. If I were to criticize the NSF, or come close to it, I would say that they have been a little bit conservative about supporting large projects. As small ideas grow into larger projects that need to be demonstrated, they continuously talk about supporting that kind of activity, but their reviewing process and their basic commitments tend to shy away from it.

Goldstein:

Do you think they rely on, say, the DOE or DARPA to pick up? The thing I am wondering about is if one could make the case that they seed certain areas, but then in an effort to be as democratic or as broad as they can, do not pump large sums of money.

Kuck:

Yes. That is the reality. The only problem is that it can leave an ambitious and aggressive group of people kind of dangling at some point. In other words, you start going and going, and you are a leader intellectually and you never have any trouble really getting the NSF money. I would say among assistant professors they think probably that the NSF is the most prestigious and sometimes the most difficult place to get money from. I think it is easier sometimes to go to the mission-oriented agencies and get $100,000 to study some little thing that they want to have studied.

Goldstein:

Has that been the case consistently since the early 1970s?

Kuck:

You probably shouldn’t quote me on that, but it is the feeling that you would get from a lot of people. You can sort of strike a deal with a contract administrator at the Office of Naval Research, but you can never strike a deal with anybody at the NSF to a first approximation. They always rely on peer review.

Goldstein:

Right.

Kuck:

But, the problem of scientific support — and Erich Bloch tried to do something about it — is that you sometimes find the Army or the Navy supporting some kind of stupid large project in a university with lots of money because it has grown up in that agency. But you find the NSF tending not to do that, and there has to be this crossover point for people. You have to kind of go out and figure out what is going on in those mission-oriented agencies, and that is sometimes difficult for people and it is, I would say, it is also sometimes counterproductive for people.

Goldstein:

Has the NSF been interested in supporting graduate students? Is that a particular interest of theirs?

Kuck:

Yes. They do a good job of supporting graduate students, and they do a good job of supporting up to a small group of faculty working on a single project. But if you want to build something that needs a lot of capital expenditure or if you just need to hire a lot of programmers to help in a big software project, that is where it starts to get thin. Now they have gone into engineering research centers, and science and technology centers — Eric Block’s ideas, which are good ideas. But the problem with those is that people are forced to form very large consortia of faculty and students. So you end up with what looks like a big project and the big, big bucks, but the details show up as still lots of faculty members with a few students and maybe a number of different universities. It is really the same level of support on a per capita basis when you get right down to it.

Goldstein:

Is there is anything you want to add about your research, some high points to your research or particularly influential things that you feel ought to have special attention drawn to them.

Kuck:

I do not think so. I think we have spent lots of time here, and I think that you certainly asked lots of good questions, so I cannot think of anything else at this point.