# Difference between revisions of "Oral-History:Thomas Kailath"

(37 intermediate revisions by 6 users not shown) | |||

Line 1: | Line 1: | ||

== About Thomas Kailath == | == About Thomas Kailath == | ||

− | + | [[Image:Thomas_Kailath_2573.jpg|thumb|left]] | |

+ | [[Thomas Kailath|Thomas Kailath]] was born in Poona, India, in 1935. He received the BE degree in telecommunications engineering from the University of Poona in 1956 and the MS and Sc.D. degrees in EE from MIT in 1959 and 1961, respectively. He claims to be the first student from India to receive a doctorate in EE from MIT. His master's thesis explored "Sampling Models for Time-Variant Filters," and his doctoral dissertation addressed "Communication via Randomly Varying Channels." From 1961 to 1962 he worked at JPL in Pasadena, California, and concurrently taught part-time at the California Institute of Technology. From 1963 to the present he has been a professor in the EE department at Stanford University. He was involved in the development of Stanford's Information Systems Laboratory from 1971 to 1981. In 1981, Kailath became a co-founder of Integrated Systems, Inc., a small company specializing in the development and licensing of high-level CAE (Computer-Aided Engineering) software and hardware products for the analysis, design and implementation of control systems in a variety of applications. | ||

+ | His research has focused on statistical data processing in communications and control. Control theory was his original field, but it led him by the mid-seventies to work on what is, in today's terminology, model-based signal processing, including applications such as Linear Predictive Coding (LPC) and [[Milestones:Speak & Spell, the First Use of a Digital Signal Processing IC for Speech Generation, 1978|Speak and Spell]]. During the 1980s, his research focused on algorithms related to speech, VLSI, and antenna array processing. Among his awards are the Outstanding Paper Prize for 1965-1966 of the IEEE Information Theory Group; the Outstanding Paper Prize for 1983 of the IEEE Acoustics, Speech and Signal Processing Society; the 1989 Technical Achievement Award of the IEEE Acoustics, Speech and Signal Processing Society; the Society Award of the IEEE Signal Processing Society, May 1991; the IEEE Circuits & Systems Society Education Award, May 1993; and the Education Medal of the IEEE, 1995. He is a fellow of the IEEE (1970) [fellow award for "inspired teaching of and contributions to information, communication, and control theory"] and of the Institute of Mathematical Statistics (1975), and served as President of the IEEE Information Theory Group in 1975. | ||

− | + | The interview says little of Kailath’s early years or indeed of his schooling. The interview primarily consists of Kailath discussing the various contributions he and his peers have made to the signal processing field. In addition, Kailath presents a relatively detailed historical perspective of the field of signal processing in general. His research has focused on statistical data processing in communications and control. However, more recently, he has worked in model-based signal processing, including applications such as Linear Predictive Coding (LPC) and Texas Instruments’ “Speak and Spell.” During the 1980s, his research focused on algorithms related to speech, VLSI, and antenna array processing. Kailath focuses the interview on his perceptions of model-based and non-model-based research, stating that the gap between theoretical research and industrial application is now closing at a faster rate than in the past. | |

− | == About the Interview == | + | == About the Interview == |

− | Thomas Kailath:An Interview Conducted by | + | Thomas Kailath: An Interview Conducted by Andrew Goldstein, Center for the History of Electrical Engineering, 13 March 1997 |

− | + | Interview #328 for the Center for the History of Electrical Engineering, The Institute of Electrical and Electronics Engineers, Inc. | |

− | + | == Copyright Statement == | |

+ | This manuscript is being made available for research purposes only. All literary rights in the manuscript, including the right to publish, are reserved to the IEEE History Center. No part of the manuscript may be quoted for publication without the written permission of the Director of IEEE History Center. | ||

+ | Request for permission to quote for publication should be addressed to the IEEE History Center Oral History Program, 39 Union Street, New Brunswick, NJ 08901-8538 USA. It should include identification of the specific passages to be quoted, anticipated use of the passages, and identification of the user. | ||

− | + | It is recommended that this oral history be cited as follows: | |

− | + | Thomas Kailath, an oral history conducted in 1997 by Andrew Goldstein, IEEE History Center, New Brunswick, NJ, USA. | |

− | + | == Interview == | |

− | + | Interview: Thomas Kailath | |

− | + | Interviewer: Andrew Goldstein | |

− | |||

− | |||

+ | Date: 13 March 1997 | ||

+ | Place: Stanford University, Palo Alto, California | ||

=== Education === | === Education === | ||

− | ''' | + | '''Goldstein:''' |

− | Can you tell me something about your education? How did you become involved with signal processing technologies? | + | Can you tell me something about your education? How did you become involved with signal processing technologies? |

− | + | '''Kailath:''' | |

− | I was an undergraduate in India in telecommunications engineering. It was a pretty standard education not very mathematical, but I always had some mathematical interests. I learned about Information Theory from an article in ''Popular Science | + | I was an undergraduate in India in telecommunications engineering. It was a pretty standard education not very mathematical, but I always had some mathematical interests. I learned about Information Theory from an article in ''Popular Science Magazine''. It said something about [[Claude Shannon|Shannon]] and [[Norbert Wiener|Wiener]]. We didn’t have [[Claude Shannon|Shannon’s]] book in our college library, but Wiener’s ''Cybernetics'' and his book on filtering theory were available when I was an undergraduate in the mid-'50s. That fascinated me. I especially enjoyed Wiener’s introductory chapters. That was my first exposure to the mathematical side of electrical engineering. I was very fortunate to have a good professor in India who encouraged me to go beyond the usual boundaries in academics and in my personal life. |

− | + | A friend of my father’s, Dr. G. S. Krishnayya, pushed me to apply to study abroad. Given our economic status at that time, that was an inconceivable thing to do. Normally one just sought to work for the government for a moderate but secure income. When I graduated in 1956 as a radio engineer, I had a job offer to work for All India Radio. But by then I had already applied to Harvard and MIT, and fortunately I got offers from both. By then I think I’d already begun to read about information theory. I wrote to Shannon at Bell Labs, and he replied that he was going to be at MIT. So I went there. My interest in signal processing really began at MIT where I studied information theory, Wiener filtering, and communication through radio channels. I’ve always felt that my main interest was "signal processing," but I interpreted the term in a broader sense than I think the early IEEE Audio group initially had. | |

− | |||

− | A friend of my father’s, Dr. G. S. Krishnayya, pushed me to apply to study abroad. Given our economic status at that time, that was an inconceivable thing to do. Normally one just sought to work for the government for a moderate but secure income. When I graduated in 1956 as a radio engineer, I had a job offer to work for All India Radio. But by then I had already applied to Harvard and MIT, and fortunately I got offers from both. By then I think I’d already begun to read about information theory. I wrote to Shannon at Bell Labs, and he replied that he was going to be at MIT. So I went there. My interest in signal processing really began at MIT where I studied information theory, Wiener filtering, and communication through radio channels. I’ve always felt that my main interest was "signal processing," but I interpreted the term in a broader sense than I think the early IEEE Audio group initially had. | ||

− | |||

− | |||

=== Entry into signal processing === | === Entry into signal processing === | ||

Line 51: | Line 51: | ||

'''Kailath:''' | '''Kailath:''' | ||

− | The Audio group really got a new lease on life with the re-discovery of the FFT in the mid-1960s, and the adoption by the MIT group of the FFT as a tool for signal analysis. Actually, that part of the subject began at MIT earlier with Bill Linvill and Sample-Data System Theory, which was the precursor to Digital Filtering theory. That subject began to take off after Jim Kaiser left MIT and went to Bell Labs. | + | The Audio group really got a new lease on life with the re-discovery of the FFT in the mid-1960s, and the adoption by the MIT group of the FFT as a tool for signal analysis. Actually, that part of the subject began at MIT earlier with Bill Linvill and Sample-Data System Theory, which was the precursor to Digital Filtering theory. That subject began to take off after Jim Kaiser left MIT and went to Bell Labs. |

− | + | I worked on statistical signals, which come up in Wiener’s theory of estimations and [[Claude Shannon|Shannon’s]] communication theory, both in analog (continuous time) and digital (discrete-time) form. My publications at first were mostly in signal detection theory and information theory and dealt largely with random processes. In the late 1960s, my work moved towards control theory, and dealt with what are called state-space models of systems. This was a very interesting development. Next, in the mid-1970s, a number of developments led me to work on some mathematical topics that turned out to be closely related to Linear Predictive Coding, which was used in TI's very successful [[Milestones:Speak & Spell, the First Use of a Digital Signal Processing IC for Speech Generation, 1978|Speak & Spell]] toy. So I began to publish in the Transactions on Signal Processing, first in 1977 and then more extensively after 1982. | |

− | + | The view we took of signal processing is what would today be called model-based signal processing. The other distinction is parametric signal processing versus non-parametric signal processing. The FFT-based approach was largely non-parametric. It took, for example, a thousand pieces of data, and just regarded it as a collection of numbers and processed it efficiently. I learned from Control Theory that often these thousand pieces were really determined by a smaller number, say ten, of so-called state variables whose evolution could be kept track of much more easily. | |
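The contrast between treating a record as "a collection of numbers" and treating it as the output of a small state model can be illustrated with a minimal sketch. It is not taken from the interview: the two-state recurrence, its coefficients `A1` and `A2`, and the function name `simulate` are assumptions chosen only to keep the example short (the interview speaks of roughly ten state variables).

```python
# Illustrative only: a two-state linear recurrence, not a model from the interview.

def simulate(a1, a2, x0, x1, n):
    """Generate n samples of x[k] = a1*x[k-1] + a2*x[k-2]."""
    samples = [x0, x1]
    while len(samples) < n:
        samples.append(a1 * samples[-1] + a2 * samples[-2])
    return samples[:n]

# Hypothetical coefficients giving a slowly decaying oscillation.
A1, A2 = 1.8, -0.9

# The "thousand pieces of data".
data = simulate(A1, A2, 1.0, 0.5, 1000)

# Model-based view: four numbers (two coefficients and two initial state
# values) are enough to regenerate every one of the thousand samples.
rebuilt = simulate(A1, A2, data[0], data[1], 1000)
```

Under these assumptions the full record carries no more information than the model and its initial state, which is the sense in which a parametric description can be far more compact than the raw data.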

− | + | This model-based approach can be seen in many fields of science, but in control the systematic use and development of models was really the contribution of [[Rudolf E. Kalman|Rudy Kalman]]. I had gotten interested in the famous Kalman filter through my work in feedback communications, and I spent some time on it, so I was well-attuned to the power of that point of view. In the 1960s, I started writing a book on detection theory, but I never finished it because I got interested in control theory and state-space theory. I recently wrote a textbook on that subject. | |

− | + | '''Goldstein:''' | |

− | + | Is that the one on linear systems? | |

− | + | '''Kailath:''' | |

− | + | Yes. It has a very long preface, which discusses some of my educational philosophy. So, that was my entry into signal processing. I still publish in information theory and communications and control theory, but more and more, the large majority of my papers is in signal processing. | |

− | + | ==== Control systems background ==== | |

− | + | '''Goldstein:''' | |

− | + | Was your exposure to control systems unique? Were there other people in your position who saw signal processing problems in a similar way? | |

− | + | '''Kailath:''' | |


− | + | I cannot fully say. To some extent the LPC people, including [[Oral-History:Manfred Schroeder|Manfred Schroeder]], [[Bishnu S. Atal|Bishnu Atal]], and Itakura were doing that. But they, or at least Itakura, were using a so-called lattice filter model with a special kind of parametrization. It was sort of in-between parametric and non-parametric. My students and I were among the early ones who introduced the state-space point of view, which is much more model-based. The only other major name that comes to mind is [[Alan S. Willsky|Alan Willsky]] of MIT, but he's much younger than I am, and started a few years later. | |

− | + | '''Goldstein:''' | |

− | + | The way you told the story before made it sound like that development was due to the peculiar history of your training in control systems. | |

− | + | '''Kailath:''' | |

− | |||

− | |||

− | + | In the book it says exactly that. No, I wasn’t trained in control systems. In fact, at MIT I never took a course in control; MIT was the center of information theory in communications. One of the first Ph.D. theses I supervised at Stanford was in the area of feedback communications. | |

− | + | Very briefly, the idea is that information is sent to a satellite that collects data and sends it back. But a satellite has limited power, so it tends to be noisy data. However, you have the possibility of having a lot of power on the ground, so you can send back clean questions to the satellite. You can say, for example, “This data was poor. Re-send it.” This leads to what may be called a recursive scheme. As we get more data, we have to keep updating what we think the satellite is sending. | |

− | + | Communications and signal processing people at that time were not interested in this data updating process. They usually had a fixed piece of data, like a speech waveform. They analyzed it via fast Fourier transforms. If you had one more piece of data, you had to take the whole Fourier transform again. But from the control people at Stanford, and actually through my students who took their courses, I learned about the Kalman filter algorithms for updating. The concept of state that I mentioned is critical to that. So through a Ph.D. student, I began to learn about this method of processing data in order to solve this communications problem. Then I got interested in it for its own sake, and found more problems to study. That’s how I got into control. | |
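The difference between reprocessing a whole record and updating a small state as each new sample arrives can be sketched as follows. This is an illustration under stated assumptions, not the algorithm used in the thesis work described above: it estimates a single constant from noisy readings, which is only the simplest special case of a Kalman-type recursion, and the names `batch_estimate`, `RecursiveEstimator`, and the sample readings are invented for the example.

```python
def batch_estimate(measurements):
    """Reprocess the whole record; the work grows with the record length."""
    return sum(measurements) / len(measurements)


class RecursiveEstimator:
    """Keeps a small state (estimate and count) and updates it per sample."""

    def __init__(self):
        self.estimate = 0.0
        self.count = 0

    def update(self, measurement):
        self.count += 1
        gain = 1.0 / self.count  # weight given to the new sample
        # Correct the previous estimate by a fraction of the prediction error.
        self.estimate += gain * (measurement - self.estimate)
        return self.estimate


# Made-up noisy readings of a constant quantity.
readings = [2.1, 1.9, 2.4, 1.8, 2.0]

est = RecursiveEstimator()
for r in readings:
    latest = est.update(r)  # constant work per new sample
```

After the last reading, `latest` agrees with `batch_estimate(readings)` up to rounding, but the recursive version never revisits earlier data; it only carries its state forward.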

− | + | === Signal processing research === | |

− | + | ==== Integrated circuits ==== | |

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | ==== Integrated circuits ==== | ||

'''Kailath:''' | '''Kailath:''' | ||

− | In the 1960s I had worked mostly in communication and information theory in random processes. In the 1970s I worked mostly in control. In the 1980s I worked mostly in signal processing. That was the evolution. As I mentioned earlier, the analysis of LPC turned out to be close to the mathematics that we had hit upon through our work in control. That was called fast algorithms. At about that time, Stanford, which was very strong in VLSI, was putting together one of the largest university centers for research in VLSI integrated circuits. My challenge was in finding uses for the computational power of integrated circuits. | + | In the 1960s I had worked mostly in communication and information theory in random processes. In the 1970s I worked mostly in control. In the 1980s I worked mostly in signal processing. That was the evolution. As I mentioned earlier, the analysis of LPC turned out to be close to the mathematics that we had hit upon through our work in control. That was called fast algorithms. At about that time, Stanford, which was very strong in VLSI, was putting together one of the largest university centers for research in VLSI [[Integrated Circuits|integrated circuits]]. My challenge was in finding uses for the computational power of integrated circuits. |

− | + | Now the Speak & Spell chip had implemented the LPC algorithm in a certain non-traditional way for signal processors. It was built in a cascade modular structure, a series of relatively simple and uniform blocks. This was much easier to build with integrated circuits than a random connection of logic. Our math theories had been leading us to studying signal processing in terms of these cascades, and so we now began to look at problems in VLSI signal processing. A DARPA project, jointly awarded to [[James D. Meindl|Jim Meindl]] of our IC Lab, Forrest Baskett of the Computer Lab, and I, helped build a group to do this. I think it was one of my former students, S. Y. Kung, who took the initiative in this, but I was actually one of the founding members of the signal processing technical committee on VLSI. | |

− | + | ==== Committees ==== | |

− | + | '''Goldstein:''' | |

− | + | That was a committee here at Stanford? | |

− | + | '''Kailath:''' | |

− | + | No. The Signal Processing Society has various technical sub-committees for different areas. Neural Networks is one that has been created recently. Array Processing and Image Processing are others. The VLSI committee was started some time in the mid-1980s. | |

− | + | So by the 1980s I was already heavily into signal processing, working on three kinds of problems. One was fast algorithms for problems in speech and seismic signal processing, the other was VLSI design, and the third was antenna array processing. | |

− | + | ==== Sensor Array Processing ==== | |

− | + | '''Kailath:''' | |

− | + | You have signals coming from different directions, which you have to determine by using an antenna array. The traditional methods for doing that are really equivalent to taking the DFT, or Discrete Fourier Transform, of the data. It’s a spatial DFT rather than a temporal DFT, but it’s similar. Incidentally, the FFT was also rediscovered by an antenna engineer, W. Butler, as a way of speeding up the antenna processing. If you have sinusoidal waves in noise, you would tend to get peaks of the DFT where the sine waves are. Similarly, when you take the spatial DFT of signals coming from different directions, you tend to get peaks at the directions where these signals are coming from. But this is a non-parametric method: it doesn’t take account of the fact, for example, that these are plane waves coming in, or that there are only two or three of them, and they may have a temporal structure. | |

− | + | The DFT doesn’t care about those things. You have some numbers, you take the DFT. Then, a Stanford Ph.D. student, Ralph Schmidt, said, “Well, even if you have no noise, that method can’t solve the problem exactly. But, if you properly use the information that there are only, say, three plane waves, then we can get an exact solution without noise.” To do this required nonlinear calculations--finding the eigenvalues and eigenvectors of a matrix, et cetera. So that launched a new field of model-based Sensor Array Processing. I supervised six or seven Ph.D. theses in that area. It was timely because those were the days of SDI, and that was one of the funding sources for this because they were interested in tracking missile directions. | |
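The point that a spatial DFT cannot locate a direction exactly even without noise, while a method that uses the plane-wave model can, may be illustrated with a deliberately simplified sketch. It is not Schmidt's eigenvector-based method: it assumes a single noiseless plane wave, a uniform line array with half-wavelength spacing, and an angle measured from broadside, and the function names `array_snapshot`, `dft_angle`, and `model_angle` are hypothetical.

```python
import cmath
import math


def array_snapshot(theta_deg, num_sensors):
    """Noiseless plane wave on a uniform line array, half-wavelength spacing."""
    phase_step = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase_step * m) for m in range(num_sensors)]


def dft_angle(snapshot):
    """Non-parametric: take the largest spatial-DFT bin and map it to an angle."""
    n = len(snapshot)

    def magnitude(k):
        return abs(sum(x * cmath.exp(-2j * math.pi * k * m / n)
                       for m, x in enumerate(snapshot)))

    peak = max(range(-n // 2, n // 2), key=magnitude)
    return math.degrees(math.asin(2.0 * peak / n))


def model_angle(snapshot):
    """Model-based: one plane wave advances by a fixed phase per sensor."""
    acc = sum(b * a.conjugate() for a, b in zip(snapshot, snapshot[1:]))
    return math.degrees(math.asin(cmath.phase(acc) / math.pi))


snap = array_snapshot(20.0, 8)
coarse = dft_angle(snap)   # limited to the DFT's angular grid
exact = model_angle(snap)  # uses the known single-plane-wave structure
```

With eight sensors the DFT estimate can only fall on one of eight grid directions, so a wave arriving from 20 degrees is reported at the nearest grid point, whereas the phase-based estimate recovers the angle to within rounding. Handling several simultaneous waves, as in the interview, requires the matrix eigen-analysis mentioned above and is beyond this sketch.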

− | ''' | + | '''Goldstein:''' |

− | + | Was that work conducted for a specific project? Was there a specific large-scale project for the feedback communication system you were talking about? | |

− | + | '''Kailath:''' | |

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | No. That was just a Ph.D. thesis that was done in 1966, actually by my first Ph.D. student, Piet Schalkwijk. We were looking for interesting problems and this came up. Communication satellites were already well established by then. I have done a lot of theoretical things, but I like them to be related to problems that other people are interested in. So this seemed to be a nice challenge, and in fact our results got a lot of attention from information theorists. | + | No. That was just a Ph.D. thesis that was done in 1966, actually by my first Ph.D. student, Piet Schalkwijk. We were looking for interesting problems and this came up. Communication satellites were already well established by then. I have done a lot of theoretical things, but I like them to be related to problems that other people are interested in. So this seemed to be a nice challenge, and in fact our results got a lot of attention from information theorists. |

==== Funding and collaboration with students ==== | ==== Funding and collaboration with students ==== | ||

− | ''' | + | '''Goldstein:''' |

− | How would you become aware of those kinds of problems? What’s the source of your inspiration and funding? | + | How would you become aware of those kinds of problems? What’s the source of your inspiration and funding? |

− | + | '''Kailath:''' | |

− | + | <flashmp3>328 - kailath - clip 1.mp3</flashmp3> | |

− | + | Well, that’s a good question. I would say that by and large these different areas I mentioned were first identified as interesting areas to work on given the challenges of the time. Then we got funding for them from basic research agencies, which by and large gave us a lot of freedom to explore what we thought best. The times were easier then for funding. | |

− | Another factor was that I had changed from the MIT model I started with. A professor would have a few students working with him, each on a different topic. The professor did his own research and the students talked to each other, but not so much about their own work. That’s how I was in my first years at Stanford. But, for example, I realized I had to learn about control theory. I went and sat in some courses, but it was easier for me to learn one-on-one. I began to find that graduate students would teach me one-on-one, and then in a few years, we began to follow the pattern that I guess I learned from my solid-state colleagues at Stanford. They often had, as experimentalists, teams of students working together. I began also to realize that it was better to work with students rather than in parallel, as I did until I became a full professor. When there was a team of students, and we all worked together in a big area, we got a lot more done, and over time we were able to build up a body of knowledge in different problem areas. | + | Another factor was that I had changed from the MIT model I started with. A professor would have a few students working with him, each on a different topic. The professor did his own research and the students talked to each other, but not so much about their own work. That’s how I was in my first years at Stanford. But, for example, I realized I had to learn about control theory. I went and sat in some courses, but it was easier for me to learn one-on-one. I began to find that graduate students would teach me one-on-one, and then in a few years, we began to follow the pattern that I guess I learned from my solid-state colleagues at Stanford. They often had, as experimentalists, teams of students working together. I began also to realize that it was better to work with students rather than in parallel, as I did until I became a full professor. When there was a team of students, and we all worked together in a big area, we got a lot more done, and over time we were able to build up a body of knowledge in different problem areas. |

− | + | At Stanford there were a lot of very good students and I’m sort of a sucker for good students. I needed to find funding and problems for them. That was one of the driving forces that inspired our coming up with research topics that had some relevance in the outside world, so we could get funding. For example, there was funding for VLSI and SDI. Our ideas seemed relevant, so we got money for them. There never was enough money to cover expenses, so it was always a scramble. The 1980s were a very busy time. At one point, I had a group of twenty-four people: two secretaries, fifteen students, four or five post-docs, visitors, et cetera. Also, our own computer network, named after the Little Rascals! We got lots of things done. In fact, some companies came out of it. The area we were working on, sensor array processing, has become hot again. As we were winding down on the SDI side, we began to develop its potential for cellular and mobile communications. | |

− | |||

− | At Stanford there were a lot of very good students and I’m sort of a sucker for good students. I needed to find funding and problems for them. That was one of the driving forces that inspired our coming up with research topics that had some relevance in the outside world, so we could get funding. For example, there was funding for VLSI and SDI. Our ideas seemed relevant, so we got money for them. There never was enough money to cover expenses, so it was always a scramble. The 1980s were a very busy time. At one point, I had a group of twenty-four people: two secretaries, fifteen students, four or five post-docs, visitors, et cetera. Also, our own computer network, named after the Little Rascals! We got lots of things done. In fact, some companies came out of it. The area we were working on, sensor array processing, has become hot again. As we were winding down on the SDI side, we began to develop its potential for cellular and mobile communications. | ||

==== Commercial applications ==== | ==== Commercial applications ==== | ||

− | ''' | + | '''Goldstein:''' |

− | This is interesting. I’m always interested in cases where the academic research spawns a commercial side. These were companies that came out of the antenna array work? | + | This is interesting. I’m always interested in cases where the academic research spawns a commercial side. These were companies that came out of the antenna array work? |

− | + | '''Kailath:''' | |

− | Yes. There was one of them, which was called ArrayComm. One of my students who was here for many years, Dick Roy, started that along with another associate, Paulraj. I had actually started a company in a control-related field many years earlier that was fairly successful. It's now a public company, Integrated Systems, Inc., doing software for embedded systems. | + | Yes. There was one of them, which was called ArrayComm. One of my students who was here for many years, Dick Roy, started that along with another associate, Paulraj. I had actually started a company in a control-related field many years earlier that was fairly successful. It's now a public company, Integrated Systems, Inc., doing software for embedded systems. |

− | + | That took us to about the end of the 1980s. Then, via a circuitous route, DARPA asked us to study for the first time the application of advanced control and signal processing techniques in manufacturing. Manufacturing tends to be sophisticated, but empirical, whereas a model-based approach is more mathematical. A model-based approach means you make a mathematical model of some phenomenon. It is always inaccurate because the world is very complicated, but the models have to be simple. The power of mathematics and statistics is that even simple models allow you to predict things, and you can correct deficiencies in the model by using feedback, which is another key principle of control engineering. We successfully applied that way of thinking to [[Semiconductors|semiconductor]] manufacturing. | |

− | + | '''Goldstein:''' | |

− | + | When did DARPA come to you? | |

− | + | '''Kailath:''' | |

− | + | In 1989. | |

− | + | '''Goldstein:''' | |

− | + | Let me look back for just a second. You said that at the beginning of the 1980s, Stanford had this VLSI laboratory, and you had gotten involved with them in a signal processing project. Can you tell me how that worked? | |

− | + | '''Kailath:''' | |

− | + | I was one of five faculty, one from computers, and three from [[Semiconductors|semiconductors]]: Linvill, [[James D. Meindl|Meindl]], and [[Oral-History:James Gibbons|Gibbons]]. Their interest was largely in technology, and it was very expensive to fabricate these VLSI circuits. The issue was, what do you do with all this computer power? What you can do is signal and image processing. So, they called this new laboratory, built with a lot of industry support, the Center for Integrated Systems. I believed in the concept, and we started a successful research group. In fact, one of my colleagues who joined our first project in this area has already been involved in founding three companies related to VLSI. A couple of my students later formed a company, Silicon Design Experts, Inc., which they have just sold, which came out of our theoretical work. But my early interactions with the CIS were very minimal at that time. The people there were more interested in the technology, and we were more interested in the theory. But then when DARPA asked us to work on control and signal processing in manufacturing, any manufacturing, the CIS was the obvious place. They already had a project with TI, which we got involved with, and we worked quite closely with them. So about ten years after my work on the start of the CIS, we began to do some work with them. | |

− | + | '''Goldstein:''' | |

− | + | I see. Do you know how the work that they did at CIS has spun out into the commercial sector? | |

− | + | '''Kailath:''' | |

− | + | It happened in many ways. There is a company that was in the news a couple of weeks ago that came out of CIS. They have a very high-speed architecture for chips. Intel has just licensed it, and is going to pay them a royalty of one percent for every chip they sell. So, that’s pretty high impact. But that's only one of several others. | |

− | + | '''Goldstein:''' | |

− | + | Well, that’s really what I’m getting at. Is the general model that the development might have been patented by the faculty member, and then perhaps that faculty member then has the choice to either commercialize it him or herself, or license it? | |

− | + | '''Kailath:''' | |

− | + | Yes. Stanford has a mechanism for this. Some of these companies are just people going out, and it’s not necessarily patents. We have a few patents, one or two, which are licensed. But Stanford has an Office of Technology Licensing. It was a pioneering office, actually, that took these patents and attempted to interest outside companies in purchasing them. After a few years when more faculty got interested in starting companies, they would license the inventions back to the faculty companies, and under reasonably generous terms. But the revenues from the licenses are shared by the University, by the Department, and by the investigator. | |

− | + | '''Goldstein:''' | |

− | + | So, how does it benefit TI to help to sponsor the Center? | |

− | + | '''Kailath:''' | |

− | + | That’s a very touchy issue, because “Intellectual Property Rights” is a really big issue. I don’t know the details of this, but I think Stanford owns the rights to everything that Stanford faculty and students do. The main benefit that companies get is that they can have their own people in the Center. So they have more knowledge of what’s going on, and will have a few months’ lead over somebody else because of their involvement in the Center. CIS has annual review visits back and forth. | |

− | + | '''Goldstein:''' | |

− | + | You said that you became involved in signal processing maybe towards the end of the 1970s. | |

− | + | '''Kailath:''' | |

− | + | Yes. | |

− | + | ==== Displacement theory ==== | |

− | + | '''Goldstein:''' | |


− | Can you tell me the state of the art then? What were the interesting theoretical frontiers or applications? How has it changed since that time? | + | Can you tell me the state of the art then? What were the interesting theoretical frontiers or applications? How has it changed since that time? |

− | + | '''Kailath:''' | |

− | Let me first make a separate remark on the pace of technology that I’ve made on other occasions. When I came to this country in 1957, there were no jet planes and it was cheaper to come by sea. So we took a ship to England. We had to spend a few days in London and then sail to New York. I visited Imperial College because a senior classmate of mine was there. I met Professor Denis Gabor. This was before he got a Nobel Prize for inventing holography, but he was a famous guy there. He was working on adaptive filtering, and very proud of it. People were much more accessible in those days. He showed me his scheme for it, which included three or four racks of equipment. He said, “It looks like a lot, but it’s just strings and sealing wax. We don’t have as much money as the Americans. Norbert Wiener and MIT are doing much more fancy things. But look, I’m making these things work.” Then the technology moved from racks to boxes. Then from boxes to cards. Now to a collection of chips on a board and perhaps even to a single chip. It’s amazing. All in less than forty years. It's quite an evolution, and it’s accelerating. | + | Let me first make a separate remark on the pace of technology that I’ve made on other occasions. When I came to this country in 1957, there were no jet planes and it was cheaper to come by sea. So we took a ship to England. We had to spend a few days in London and then sail to New York. I visited Imperial College because a senior classmate of mine was there. I met Professor [[Dennis Gabor|Denis Gabor]]. This was before he got a [[Nobel Prize|Nobel Prize]] for inventing holography, but he was a famous guy there. He was working on adaptive filtering, and very proud of it. People were much more accessible in those days. He showed me his scheme for it, which included three or four racks of equipment. He said, “It looks like a lot, but it’s just strings and sealing wax. We don’t have as much money as the Americans. 
[[Norbert Wiener]] and MIT are doing much more fancy things. But look, I’m making these things work.” Then the technology moved from racks to boxes. Then from boxes to cards. Now to a collection of chips |

+ | on a board and perhaps even to a single chip. It’s amazing. All in less than forty years. It's quite an evolution, and it’s accelerating. | ||

− | + | '''Goldstein:''' | |

− | That tracks along with the development of manufacturing technology, or solid-state technology. It’s independent from the signal processing technology. | + | That tracks along with the development of manufacturing technology, or solid-state technology. It’s independent from the signal processing technology. |

− | + | '''Kailath:''' | |

− | That’s right. But the biggest uses of these chips are going to be for signal processing, and image processing in particular, because of the massive amounts of data that have to be crunched. There’s another side. There are databases to be organized, which means more software development. However, to finally return to your original question. | + | That’s right. But the biggest uses of these chips are going to be for signal processing, and image processing in particular, because of the massive amounts of data that have to be crunched. There’s another side. There are databases to be organized, which means more software development. However, to finally return to your original question. |

− | < | + | My first paper in the <i>Transactions on Signal Processing</i> was in October 1977 on LPC, linear prediction. Some of our mathematics was related to that. There was an unsolved problem related to LPC. Atal and [[Oral-History:Manfred Schroeder|Schroeder]] posed the problem so as to introduce Toeplitz equations, which are a particular kind of structure in matrix equations; using this structure helped speed up the solution. But you could get a better physical model by using certain non-Toeplitz equations. People didn’t use the better model, because they thought these non-Toeplitz equations things don’t have the particular nice structure, and so you can’t get the nice implementation of LPC. However, based on some of our math, we said, “No, this other method, the so-called covariance method, has structure too, except it’s not so explicit.” |

− | + | '''Goldstein:''' | |

− | + | The covariance method is one of these non-Toeplitz types? | |

− | + | '''Kailath:''' | |

− | + | It has structures that are not explicit. In fact, we slowly invented a name for it: it’s called displacement theory. I published it first in 1979 in a mathematics journal, but I had been working on the idea for about ten years. Anyway, the October 1977 paper was the first paper that showed that equations with this apparently non-structured matrix could also be solved in a fast way. | |

− | + | ==== Speech signal processing ==== | |

− | + | '''Goldstein:''' | |

− | ''' | ||

− | Could you give me an example of what system a non-structured matrix can model better? | + | Could you give me an example of what system a non-structured matrix can model better? |

− | + | '''Kailath:''' | |

− | Speech, for example. It turns out that in speech signal processing, the boundaries are a headache, because you only observe things over a finite interval. To make the mathematics come out right, i.e. simple, you can’t have that sort of discontinuity, so people artificially extended the data out beyond the boundaries. That was done in order to get these structured matrices. They knew this was an artificial extension and that the facts were being distorted. So, what could you do if you just took the data that you had and didn’t prejudice it with assumptions about the missing data. The new model has structure too, but it’s a less simple or less explicit structure than with the first more artificial assumptions. The new model is closer to the physical situation. | + | Speech, for example. It turns out that in speech signal processing, the boundaries are a headache, because you only observe things over a finite interval. To make the mathematics come out right, i.e. simple, you can’t have that sort of discontinuity, so people artificially extended the data out beyond the boundaries. That was done in order to get these structured matrices. They knew this was an artificial extension and that the facts were being distorted. So, what could you do if you just took the data that you had and didn’t prejudice it with assumptions about the missing data. The new model has structure too, but it’s a less simple or less explicit structure than with the first more artificial assumptions. The new model is closer to the physical situation. |

− | + | This might come up in extending LPC coding to be useful for non-nasal sounds. If you know the terminology, the systems in LPC have only poles and no zeros. Zeros arise when you have two interfering signals. For example, air through the nose and through the mouth interfere and make nulls in the spectrum. LPC can’t model that so well. So our analysis could enable better models of nasal sounds, perhaps. This is one potential application, though I don't know if it has been seriously pursued. | |
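The contrast between the two ways of setting up the LPC equations can be checked numerically. The sketch below is an illustration, not the algorithm from the 1977 paper: the function names, the test signal, and the order `p = 6` are all assumptions. It builds the normal-equation matrix both ways and measures the rank of the "displacement" R − Z R Zᵀ, where Z shifts a matrix down and right by one position. Extending the data with zeros gives a Toeplitz matrix (displacement rank at most 2). Using only the observed samples gives a non-Toeplitz matrix whose displacement rank is still at most 4, which is the hidden structure a fast solver can exploit.

```python
import math

def autocorrelation_matrix(x, p):
    """Autocorrelation method: data assumed zero outside the window -> Toeplitz."""
    n = len(x)
    r = [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(p)]
    return [[r[abs(i - j)] for j in range(p)] for i in range(p)]

def covariance_matrix(x, p):
    """Covariance method: sums run only over fully observed samples."""
    n = len(x)
    return [[sum(x[t - i] * x[t - j] for t in range(p, n))
             for j in range(1, p + 1)] for i in range(1, p + 1)]

def is_toeplitz(m, tol=1e-9):
    """True when every diagonal of m is constant."""
    size = len(m)
    return all(abs(m[i][j] - m[i - 1][j - 1]) <= tol
               for i in range(1, size) for j in range(1, size))

def displacement(m):
    """Return M - Z M Z^T, where Z is the one-step down-shift matrix."""
    size = len(m)
    return [[m[i][j] - (m[i - 1][j - 1] if i > 0 and j > 0 else 0.0)
             for j in range(size)] for i in range(size)]

def rank(m, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    rows, cols = len(a), len(a[0])
    scale = max(abs(v) for row in a for v in row) or 1.0
    r = 0
    for c in range(cols):
        if r == rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(a[i][c]))
        if abs(a[piv][c]) <= tol * scale:
            continue
        a[r], a[piv] = a[piv], a[r]
        for i in range(r + 1, rows):
            f = a[i][c] / a[r][c]
            for j in range(c, cols):
                a[i][j] -= f * a[r][j]
        r += 1
    return r

# Hypothetical test signal and model order, chosen only for illustration.
p = 6
x = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(40)]
R_auto = autocorrelation_matrix(x, p)
R_cov = covariance_matrix(x, p)
print(is_toeplitz(R_auto), rank(displacement(R_auto)))
print(is_toeplitz(R_cov), rank(displacement(R_cov)))
```

The rank bound of 4 for the covariance case follows because shifting the summation window by one sample only changes two boundary terms, on top of the first row and column that any matrix contributes.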

− | + | In the mid-1970s, I wasn’t that oriented to the actual applications. We were interested in the math. The next papers in the Transactions were in 1981 and 1982, on adaptive filtering. Then soon after that I began to publish papers on VLSI, and on the antenna array processing community, with special Technical Activity Committees for them. We were the leaders in both those areas. | |

− | + | ==== Adaptive filtering ==== | |

− | + | '''Goldstein:''' | |

− | + | Can you give me some of the background on adaptive filtering? You said that you first encountered it I guess in 1956, at Imperial College. | |

− | |||

− | + | '''Kailath:''' | |

− | + | Yes, as I said, in [[Dennis Gabor|Gabor's]] laboratory there. He mentioned Wiener at MIT. I went to MIT, but I wasn’t interested in that field. One of the big names in adaptive filtering was my colleague [[Bernard Widrow|Bernie Widrow]]. He invented a famous algorithm called the LMS algorithm. He had studied and worked at MIT, but was just leaving MIT when I first met him. He came to Stanford and he invented this LMS adaptive algorithm, which is a very simple way of massaging data. He will tell you more about it. It allows you to deal with data as it comes, without having a lot of models for it. It’s very successful and widely used. It has a certain mathematical structure to it. Professor Widrow is less of a mathematician, much more of an engineer, so he didn’t pursue those directions. But some of the math that we got into through LPC was related, because similar kinds of equations are solved in adaptive filtering. I said to a visitor I had here, Professor [[Raj Reddy|Reddy]] from India, “You know, there’s all this talk about adaptive filtering. Why don’t you see what these guys are doing there, and see whether our techniques can help?” Reddy didn’t know much about adaptive filtering either, but he started to look into it. We had some discussions, and we found that in fact our techniques were applicable. My intuition was right. Our ideas could be used to speed-up adaptive filtering algorithms. | |

− | + | We worked on what is called the RLS, Recursive Least Squares algorithm. It’s a competitor to LMS, except that it is a little more complicated. The LMS algorithm is a very simple algorithm. However, there is a price for this. Suppose it’s used to identify an unknown system. LMS takes a long time to identify that system, it converges slowly. But it had been known from the time of [[Carl Friedrich Gauss|Gauss]] that there’s a much faster way using what is called Recursive Least Squares. It required a greater complexity. If you had n pieces of data, Widrow's algorithm used about 2n operations, whereas this used n squared. Well, when we came into this field, we showed that you could implement the least squares with, say, seven to ten n rather than n squared. That’s a big difference. Of course, it’s still not 2n, but it’s in the ballpark. | |
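The trade-off described here, a cheap update that converges slowly versus a costlier one that converges quickly, can be seen in a small system-identification experiment. The sketch below is a minimal illustration under stated assumptions: the unknown four-tap system, the step size `mu`, the initial matrix `P`, and the noise-free setup are all hypothetical choices. The RLS shown is the conventional form that costs on the order of n² operations per sample, not the fast order-n versions mentioned in the interview.

```python
import random

def lms_step(w, u, d, mu):
    """One LMS update: roughly 2n multiplications for n taps."""
    e = d - sum(wi * ui for wi, ui in zip(w, u))
    return [wi + mu * e * ui for wi, ui in zip(w, u)]

def rls_step(w, P, u, d, lam=1.0):
    """One conventional RLS update: roughly n^2 multiplications."""
    n = len(w)
    Pu = [sum(P[i][j] * u[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(u[i] * Pu[i] for i in range(n))
    gain = [v / denom for v in Pu]
    e = d - sum(wi * ui for wi, ui in zip(w, u))
    w = [w[i] + gain[i] * e for i in range(n)]
    P = [[(P[i][j] - gain[i] * Pu[j]) / lam for j in range(n)]
         for i in range(n)]
    return w, P

def identify(num_samples=100, mu=0.01, seed=0):
    """Run both algorithms on the same data; return their coefficient errors."""
    rng = random.Random(seed)
    w_true = [0.8, -0.4, 0.2, 0.1]          # hypothetical unknown system
    n = len(w_true)
    w_lms, w_rls = [0.0] * n, [0.0] * n
    P = [[100.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    u = [0.0] * n                            # tapped delay line of inputs
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0)] + u[:-1]
        d = sum(a * b for a, b in zip(w_true, u))   # noise-free output
        w_lms = lms_step(w_lms, u, d, mu)
        w_rls, P = rls_step(w_rls, P, u, d)

    def err(w):
        return sum((a - b) ** 2 for a, b in zip(w, w_true)) ** 0.5

    return err(w_lms), err(w_rls)

lms_err, rls_err = identify()
print("LMS coefficient error:", lms_err)
print("RLS coefficient error:", rls_err)
```

With these settings the RLS estimate is essentially converged after 100 samples, while LMS with a small step size is still far from the true coefficients. A larger `mu` would speed LMS up at the cost of more sensitivity to noise.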

− | + | Just two or three years ago, another direction of work that we had gotten into is called H Infinity Theory. It came from the control field, and has led us to the following result on the LMS algorithm. That algorithm is very widely used, but it’s regarded as sub-optimum. It’s not the best thing to do. The Least Squares algorithm is technically the better algorithm. The reason is that, following Norbert Wiener, we start by setting up a performance criterion and minimizing that. The thing that minimizes the criterion is the optimal solution. Well, Widrow’s algorithm is sub-optimal for the Least Squares criterion. But it doesn’t matter; everyone uses it anyway. Now as I mentioned, we began recently to study a different criterion called the H Infinity Criterion. One of my students, Babak Hassibi, found that Widrow’s algorithm actually is optimal, but not for the Least Squares criterion; for that it is only approximate. But it is optimal for the H Infinity criterion. The H Infinity criterion says, “Suppose you knew nothing about the disturbances, and in fact that nature was malicious in picking the disturbances, and that you had to protect yourself despite nature trying its best to outwit you, what algorithm would you use?” Well, that’s Widrow’s algorithm. So, it’s pessimistic sometimes, because if you know more, of course, you can do more. But it is very robust, which is also the reason why it has been so successfully used. | |

− | + | '''Goldstein:''' | |

− | + | Whereas the Least Squares is optimal if the noise is random? | |

− | + | '''Kailath:''' | |

− | + | Yes, if you have more statistical knowledge. If it’s random and you know a little bit of the statistics, and so on. The noise can be anything. You’ve heard the buzz word Gaussian? But what if the noise is not Gaussian or white, or even random, but deliberately chosen to frustrate you? An example is interference, like another signal from someone else's conversation. If you want to combat that kind of situation, it turns out that LMS is optimal. So, the theory has finally caught up with Widrow’s algorithm! That happens a lot with theories! We’ve kept working on adaptive filtering on and off. One of my students, [[John M. Cioffi|John Cioffi]], is now on the faculty, and he did some more on adaptive filtering. In fact, he has a company called Amati that didn't use exactly those ideas, but it grew out of them. Amati developed the ADSL, or Asymmetric Digital Subscriber Loop technology. It sends data at several megabits per second over ordinary phone lines, but only in one direction. In other words, a movie company can send you their movie very fast. However, it's asymmetric. You cannot send your stuff back to them very fast. But all you’re sending them is a credit card number or something, so speed is not important. But the point is that you don’t need fiber. You can use ordinary copper lines. | |

− | + | '''Goldstein:''' | |

− | + | And, it’s coming out of this? | |

− | + | '''Kailath:''' | |

− | + | It’s not directly related, but some of it is. John Cioffi did a thesis on adaptive filtering and communications, and then he went on into ADSL. So many, many things get intertwined. | |

− | + | '''Goldstein:''' | |

− | + | I’m wondering why Widrow’s algorithm was used if it was sub-optimal, before people realized it was optimal under the H-infinity. | |

− | + | '''Kailath:''' | |

− | + | But that was only recently. Unconsciously, it was used for two reasons. One, it was a very simple algorithm, so it was easier to implement. That has a big appeal, because the solution is simpler and so already more robust. It is always true of model-based methods that they work very well when the model is reasonably accurate. But, they can be disastrous otherwise. I think people found that Widrow’s was simple, and seemed to work better in a larger variety of situations. The big success of the model-based method is the [[Rudolf E. Kalman|Kalman]] filter, which is largely used for aircraft and satellites. They obey Newton’s law, by and large, so you know a lot about those systems. The aerospace industry uses model-based algorithms and the Kalman filter, because they have models. But what’s a model for communications involving the ionosphere? Or the telephone line? It’s too complicated. So model-based techniques have not been widely used so far, but now results are emerging in that direction, too. | |

− | + | ==== Model-based problems ==== | |

− | + | '''Goldstein:''' | |

− | That’s interesting. What models do the aerospace industry use? | + | That’s interesting. What models do the aerospace industry use? |

− | + | '''Kailath:''' | |

− | The dynamics of the satellite or rocket or a ship or automobile. | + | The dynamics of the satellite or rocket or a ship or automobile. |

− | + | '''Goldstein:''' | |

− | So what problem are they solving when they have these models? | + | So what problem are they solving when they have these models? |

− | + | '''Kailath:''' | |

− | Guidance, for example. | + | Guidance, for example. |

− | + | '''Goldstein:''' | |

− | But that’s a signal processing problem? | + | But that’s a signal processing problem? |

− | + | '''Kailath:''' | |

− | I interpret signal processing more broadly, as including things like estimation and guidance For example, when a satellite is launched to the moon, a trajectory is chosen, but that’s based on some calculations with models of a spherical earth and a spherical moon, and knowing exactly the distance, being in free space, and of the satellite being spherical, et cetera. These are all approximations. The earth isn’t a sphere, and there’s atmospheric drag. So, if you just launched the satellite and never looked at it again, it would never hit the moon. But, here’s the control theory. Control works on feedback. You look at the ship's trajectory, and then you measure where the thing is. You use the difference between where it is actually measured to be and where you think it should be, and use a so-called feedback error signal by which you change the direction of the ship. But all you’re getting to do all this is some radar measurements of its position, and this is noisy. Because there are disturbances in the propagation part and in your receivers. So you’ve got to extract the information about where the ship actually is, that's the estimation problem. | + | I interpret signal processing more broadly, as including things like estimation and guidance. For example, when a satellite is launched to the moon, a trajectory is chosen, but that’s based on some calculations with models of a spherical earth and a spherical moon, and knowing exactly the distance, being in free space, and of the satellite being spherical, et cetera. These are all approximations. The earth isn’t a sphere, and there’s atmospheric drag. So, if you just launched the satellite and never looked at it again, it would never hit the moon. But, here’s the control theory. Control works on feedback. You look at the ship's trajectory, and then you measure where the thing is. 
You use the difference between where it is actually measured to be and where you think it should be, and use a so-called feedback error signal by which you change the direction of the ship. But all you’re getting to do all this is some radar measurements of its position, and this is noisy. Because there are disturbances in the propagation part and in your receivers. So you’ve got to extract the information about where the ship actually is, that's the estimation problem. |
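The estimation step described here, recovering where the vehicle actually is from noisy position measurements plus a model of its motion, is the setting of the Kalman filter named earlier in the interview. The following is a toy sketch under stated assumptions, not an aerospace implementation: a one-dimensional constant-velocity model, a crude diagonal process-noise term, and invented values for the noise level, true speed, and initial uncertainty.

```python
import random

def kalman_track(measurements, dt=1.0, meas_var=25.0, proc_var=1e-4):
    """Track position from noisy position readings with a [position, velocity] model."""
    x, v = measurements[0], 0.0                  # initial state guess
    p00, p01, p10, p11 = meas_var, 0.0, 0.0, 100.0   # initial covariance
    estimates = [x]
    for z in measurements[1:]:
        # Predict: move the state forward using the motion model.
        x = x + dt * v
        p00, p01, p10, p11 = (
            p00 + dt * (p01 + p10) + dt * dt * p11 + proc_var,
            p01 + dt * p11,
            p10 + dt * p11,
            p11 + proc_var,
        )
        # Update: correct the prediction with the measurement error signal.
        s = p00 + meas_var
        k0, k1 = p00 / s, p10 / s
        resid = z - x
        x, v = x + k0 * resid, v + k1 * resid
        p00, p01, p10, p11 = (
            (1.0 - k0) * p00,
            (1.0 - k0) * p01,
            p10 - k1 * p00,
            p11 - k1 * p01,
        )
        estimates.append(x)
    return estimates

def demo(num_samples=200, speed=2.0, noise_std=5.0, seed=1):
    """Compare raw and filtered RMS position error over the second half of the run."""
    rng = random.Random(seed)
    truth = [speed * t for t in range(num_samples)]
    meas = [p + rng.gauss(0.0, noise_std) for p in truth]
    est = kalman_track(meas, meas_var=noise_std ** 2)
    half = num_samples // 2

    def rms(values):
        return (sum((a - b) ** 2 for a, b in zip(values[half:], truth[half:]))
                / (num_samples - half)) ** 0.5

    return rms(meas), rms(est)

raw_rms, filtered_rms = demo()
print("raw measurement RMS error:", raw_rms)
print("filtered RMS error:", filtered_rms)
```

The filter does well here because the motion model matches the simulated trajectory. As the interview notes, model-based methods can perform badly when the model is wrong.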

− | + | '''Goldstein:''' | |

− | I see. Well, the thing about it being noisy, that’s a communications problem, right? | + | I see. Well, the thing about it being noisy, that’s a communications problem, right? |

− | + | '''Kailath:''' | |

− | No, not necessarily. Noise arises everywhere. The origins of estimation theory, at least for engineers, go back to a military application, anti-aircraft fire control. Take, for example, an enemy aircraft. If you aim your gun where you see it, it’s gone by the time the shell reaches that point. So you’ve got to figure out where the plane is going to be at some time in the future based on the speed of your rocket and the speed of the aircraft. So you’ve got to estimate, or predict, the future position of the aircraft, given noisy measurements of the past trajectory. I call that signal processing, because you are extracting information from a random signal. It’s not a deterministic signal. It’s not music, or speech, thought they can usefully be modeled as random, too! That's information theory for you! It’s a random signal. You’re extracting information by processing the measurements, which are related to the useful signals. I say that modern signal processing started with Wiener actually. He formulated the problem of tracking aircraft in a statistical way. | + | No, not necessarily. Noise arises everywhere. The origins of estimation theory, at least for engineers, go back to a military application, anti-aircraft fire control. Take, for example, an enemy aircraft. If you aim your gun where you see it, it’s gone by the time the shell reaches that point. So you’ve got to figure out where the plane is going to be at some time in the future based on the speed of your rocket and the speed of the aircraft. So you’ve got to estimate, or predict, the future position of the aircraft, given noisy measurements of the past trajectory. I call that signal processing, because you are extracting information from a random signal. It’s not a deterministic signal. It’s not music, or speech, though they can usefully be modeled as random, too! That's information theory for you! It’s a random signal. 
You’re extracting information by processing the measurements, which are related to the useful signals. I say that modern signal processing started with Wiener actually. He formulated the problem of tracking aircraft in a statistical way. |

− | + | But we can encounter such problems everywhere. Here is an example of signal processing in our semiconductor work. We heat a wafer to 1,000 degrees very fast in a chamber with hot halogen lamps. How do you measure the temperature? You can’t touch the wafer with a probe, because it pollutes the wafer. You’ve got to be indirect. So what you say is there’s radiation coming off the wafer. If you can count the number of photons, then that’s related to the temperature by Planck's law, which says how many photons are emitted at a given temperature. However, the number of photons is random, so you get a count that’s random. You’re interested in something else, which is the temperature of that wafer as it changes in form. You’ve got to infer knowledge about one signal, which is not observable, from another signal, which is observable. Both of these are random (in the latter case, because of the errors in measurement), but there’s some dependence between them, some statistical dependence. That’s signal processing in its fundamental or generic form. It’s extracting information from signals for other purposes. Communications uses signal processing. Control uses signal processing. Some people call it estimation theory. I interpret signal processing quite broadly as processing, working with data. | |

− | + | It could also be binary data as in algebraic decoding theory. We used some of our signal processing work to decode in a new way, what are called Reed Solomon Codes, which were invented in the early days of information theory at MIT around the time I was a student there. It took about thirty years to get them implemented, but every compact disk has them. But here you’re working with ones and zeros. However, much of the mathematics is similar to real numbers. So, I call that signal processing, too. Fourier transforms can be used for such problems too, and in a very nice way, actually. This was shown by [[Richard E. Blahut|Dick Blahut]] of IBM, and others. | |

− | + | '''Goldstein:''' | |

− | + | You suggested a very interesting perspective before, about dividing up signal processing problems into model-based problems and ones that aren’t. Can you talk about the history of signal processing from that perspective? | |

− | + | '''Kailath:''' | |

− | + | Yes. I’m not a scholar of these things, but let me give you my perspective. You see, some of the early signal processing problems and where Wiener got interested was with the subject called time series analysis, which is non-parametric signal processing. An early example was sunspot data. That’s somewhat periodic; every eleven years or so there’s an eruption of solar activity. A man called Yule made a statistical model for trying to fit the sunspot data and predict when the next cycle will be. These are called auto-regressive models. | |

− | + | '''Goldstein:''' | |

− | + | When was that? | |

− | + | '''Kailath:''' | |

− | + | In the 1920s. Yule's work was one of the things that spurred an increase of activity in the analysis of random phenomena. How do you study them? You study them through their statistics. But a useful statistic, or a useful function is the so-called power spectrum. The spectrum is the Fourier transform of the process, and the power spectrum is basically the square of it. But, for random processes, it’s a little fancier, and you’ve got to be careful in how you define it. You can’t just take the Fourier transform and square it, because the result fluctuates wildly with the samples you take. Well, there’s a theorem from the 1930s, known as the Wiener/Khinchin theorem, which says that the power spectrum can be found as the Fourier transform of the covariance or correlation function. But there is a fascinating story here. | |
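In symbols, the statement of the theorem can be sketched as follows for a zero-mean, wide-sense stationary, discrete-time process. The notation (x, r_x, S_x, N) is an assumption introduced here for illustration and does not come from the interview.

```latex
% Correlation function of a zero-mean stationary process x(t)
r_x(k) = \mathbb{E}\,[\,x(t)\,x(t+k)\,]

% Wiener-Khinchin: the power spectrum is the Fourier transform of r_x
S_x(\omega) = \sum_{k=-\infty}^{\infty} r_x(k)\, e^{-j\omega k}

% The naive estimate from N samples (the periodogram)
\hat{S}_N(\omega) = \frac{1}{N}\,\Bigl|\sum_{t=0}^{N-1} x(t)\, e^{-j\omega t}\Bigr|^{2}
```

The periodogram in the last line is the "take the Fourier transform and square it" estimate. Its mean approaches S_x as N grows, but for typical processes its variance does not shrink, which is why some averaging or smoothing is needed.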

− | + | In 1914, a meteorologist friend of Einstein’s asked him a question along the following lines, “I have this weather data, and I want to detect some periodicities in it.” Einstein wrote a short, two-page paper, giving some suggestions on how to process this random data. It’s a remarkable note. It has many ideas in it, including the famous Wiener/Khinchin theorem, which was first presented in the 1930s. Einstein didn’t even know all the math required to prove this theorem, because random processes hadn’t been formally defined in 1914. Einstein somehow got the key idea, as a way of computing the power spectrum. | |

− | + | I used to visit the Soviet Union frequently to attend the conferences on Information Theory, and during one of my visits I was told that someone had discovered this paper of Einstein’s and it actually had the Wiener/Khinchin theorem in it. Later, there was a whole Signal Processing magazine issue devoted to a reprint and discussion of Einstein’s letter. That sort of signal processing and the emphasis on power spectra was the initial definition or emphasis of the field of signal processing. Power spectra were being computed in lots of fields, and then came the FFT. I guess you’ve heard all the stories about the origin of the FFT, so I won’t go into it here. | |

− | + | However, the fact is that the rediscovery (it goes back to Gauss) of the FFT in 1965 gave a new impulse to the signal processing field. | |

− | + | A group at MIT led by [[Oral-History:Alan Oppenheim|Al Oppenheim]] and at Lincoln Lab by [[Oral-History:Ben Gold|Ben Gold]] and [[Oral-History:Charles Rader|Charlie Rader]] picked up the FFT and combined it with the ideas of sampled-data or discrete-time systems and defined this as Digital Filtering and Digital Signal Processing. Those activities I would call non-parametric signal processing, in contrast to parametric or model-based signal processing. | |

− | + | Yule’s work in 1927 was, in fact, model-based, because he fitted a model to the sunspot data; he just didn’t take the Fourier transform. He tried to fit a simple (all-pole) model with a smaller number of parameters than the amount of data. LPC was based on the same idea. But I think the systematic and widespread use of model-based ideas was in the controls area, beginning with the work of Bellman, and especially [[Rudolf E. Kalman|R.E. Kalman]], in the late 1950s. | |

− | + | A lot of the “classical” signal processing doesn’t deal with random things, except for (nonparametric) calculation of the power spectrum. The name "statistical signal processing" emerged because there’s more to the signal processing problem than just estimating signals as such. Take [[Radar|radar]], for example. A lot of signal processing arises there too. In radar, the question is whether the received data shows if a target is present. Your answer is binary, yes or no. But you process analog data to make this decision. So, we include the study of the radar problem in the field of statistical signal processing. But the newer processing people don’t see it that way. I should emphasize that statistical, model-based signal processing is not necessarily only linear. The antenna array work I mentioned earlier is model-based with non-linear models. The design of special purpose VLSI digital filters is another area which brings in tools from graph theory to solve circuit optimization problems. Nowadays there are many people working on these things. For example, Jerry Mendel at USC got interested in this type of signal processing. He was a control person, and he’s moved to signal processing. Alan Willsky at MIT is another wonderful example. And there are many others. | |

− | + | '''Goldstein:''' | |

− | + | What is the impact of the model-based approach? Does it open up new problems that might not have been solvable? Or does it provide an alternative approach to an existing set of problems? | |

− | + | '''Kailath:''' | |

− | + | The answer to the latter two questions is yes to both of them. As for the first, it’s not easy to always quantify. Certainly state-space estimation is very widely used. However, one should note that very few theories get used in a pure form in practice, because the theory deals with an ideal world and the real world is different. However, the theory is the foundation for realistic real-world solutions. The fields of radar, sonar, wireless communications, image compression, et cetera, are examples of where this transfer of theoretical results to practice is accelerating. | |

− | + | Here is a recent example from our work in [[Semiconductors|semiconductor]] manufacturing where we are also doing signal processing. It’s in the design of phase shifting masks. We have analytical model-based methods for doing that. Industry is just beginning to use these ideas to extend the useful life of the very expensive (approximately one billion) microlithographic equipment and to break the so-called 0.1-micron barrier. My students have just started a company to translate the theory to useful software. I think the industry is showing a reasonable amount of interest in it, because things that would take them a day or two to do, we can do in minutes. | |

− | + | '''Goldstein:''' | |

− | + | You mean in terms of controlling the manufacturing process? | |

− | + | '''Kailath:''' | |

− | + | No, making a design to achieve a certain goal. Here’s an example. You have a wafer on which you want to imprint a certain pattern of connections of wires and transistors and all that. But for that, you put photoresist on the wafer and you shine light through a mask, and expose a desired pattern on the wafer. Then you etch away the exposed photoresist. But, suppose you want a certain pattern on the wafer. If you had a perfect source, a pure sine wave of light at one frequency, and a perfect lens to exactly focus your light, then you could just build a mask that was the same as the pattern on your wafer, shine the light through it, and you’d have the same pattern on the wafer. But the light is not perfect and the lens is not perfect. If you make a mask with a certain pattern of lines, what you’ll get at the wafer, because of diffraction and other imperfections, is another pattern. So two lines may be touching because of the defects of the optics, whereas you want them separate. To avoid this, you have to move them sufficiently apart, whereas you want to get them even closer, e.g. from the 0.25 µm we have now to say below 0.10 µm. To do this, you have to solve an inverse problem. You want a certain intensity pattern at the wafer. You’ve got to design something that has intensity and phase, which is the mask, to be different from the intensity pattern that you want on the wafer, so that when the light goes through the imperfect optical system and then through your distorted mask, it gives the correct pattern at the wafer. Well, that’s all signal processing. | |

− | + | '''Goldstein:''' | |

− | + | Right. You can think of the imperfections as a filter, and so what you have to do is work backwards and come up with a signal that accounts for the imperfections. | |

− | + | '''Kailath:''' | |

− | + | Right. The point is that this is a difficult nonlinear inverse problem. We have some approximate analytical solutions to it, whereas what’s done in industry has a little theory and a whole lot of trial and error. But model-based signal processing allows us to get very fast solutions. | |

− | + | '''Goldstein:''' | |

− | + | In tracking the rise of model-based signal processing, you’re saying it’s the mid-’70s? | |

− | + | '''Kailath:''' | |

− | + | Yes, in the sense that more papers (especially on array processing) began to appear in the traditional signal processing journals, especially the <i>IEEE Transactions</i>. But as I said earlier, the actual origins could be traced back to Norbert Wiener’s work in the 1940s, to the WWII work in [[Radar|radar]], et cetera. | |

− | + | '''Goldstein:''' | |

− | + | Let me go back to the [[Milestones:Speak & Spell, the First Use of a Digital Signal Processing IC for Speech Generation, 1978|Speak & Spell]] toy, which I understand had a big impact. | |

− | + | ==== The Speak & Spell toy ==== | |

− | + | '''Kailath:''' | |

− | + | Yes. But, you know, that’s trivial stuff now, compared to the things that are happening, and that are going to be happening. | |

− | + | '''Goldstein:''' | |

− | + | What was so important about its impact? Did it help to close the gap between what was understood at the university level and what you actually saw on shelves? | |

− | + | '''Kailath:''' | |

− | + | I think some other people like [[Oral-History:Ben Gold|Ben Gold]] can tell you much better. But I think the point was processing speech and synthesizing speech was very, very expensive. The [[Vocoders and Voders|vocoder]] was studied by the people at [[Bell Labs|Bell Labs]]; a lot of research was done on it and on related problems. But you needed very expensive systems to work with speech. Now TI used to have (and perhaps still does) an innovation fund, and you could apply for $25,000 to do special projects. With early [[Integrated Circuits|ICs]] and with the theoretical idea of LPC, some TI engineers suggested that you could synthesize a lot of different speech sounds by using a certain structure with ten to twelve numbers that you could set differently for different sounds. These are the so-called reflection coefficients of a lattice filter structure. So they were able to synthesize decent quality speech very cheaply, at least for a child’s toy. But the impact was the discontinuity in the cost of these speech synthesis devices compared to what the existing technology was. It was done by taking a little theory, which the guys knew, and the technology that was available at the time, and mixing the two. So I would say that was an early success of model-based signal processing. | |

− | + | ==== Accuracy and theory in model-based research ==== | |

− | + | '''Goldstein:''' | |

− | + | The thing about the model-based approach is that you need accurate models. Has that been a problem? | |

− | + | '''Kailath:''' | |

− | + | Yes, this has always been a barrier. A very wide range of tools has been used in model-based signal processing, as a glance at the last 15-20 years of the literature (<i>IEEE Transactions</i>) will show. The other issue is getting good models. What is good is hard to define, and identifying good models from data is not necessarily easy. However, the point is that “simple” models often work remarkably well, especially when statistics (noise) is used to cover up high-frequency details. It has been proved countless times over (in physics, chemistry, and engineering) that very simple models, which can be quite inaccurate, turn out to have tremendous predictive power. So you don’t really need very fancy models. Simple models will do. Often it’s not actually the model and the solution, it’s the ideas that you get from that way of analyzing a problem, which you later somehow approximate, that give you the final results. But it can take time to achieve all this. When I went to MIT in 1957 there was a big research project there that I had read about in <i>Popular Science</i> in 1949 or 1950. They wrote about how transmitting TV pictures was a very wasteful process because almost every pixel is the same in nearby pictures. Why not just send the changes? The real information is not all the masses of dots that you’re getting, it’s much less than that. So Information Theory people said, “Yes, the so-called entropy (or information) content of the TV signal is much less than the nominal information content.” To analyze this you need the statistics of images, of how people move about, et cetera. It was a big project at MIT, trying to get the statistics of TV signals, or the statistics of music, so that those statistics could be used to compress the data. They gave up by the mid-sixties or so because it was too complicated to do this.
Gradually, however, the theory evolved and they found that you didn’t need such accurate models; you could adaptively improve them, so you could start with simple models. Now there are very effective techniques for compressing data, which are used in all computers. |

− | + | Every time you click on a file you use something called the [[Lempel-Ziv Compression Algorithm|Ziv-Lempel]] (or LZ, or LMZ) algorithm. Ziv was a classmate of mine at MIT. LZ is a data compression algorithm that doesn’t need any statistical models of the data. However, you would never have thought of Ziv’s algorithm if you didn’t start with a model-based approach based on knowing all the statistics. So, the point of model-based signal processing is that you may be rescued by further development of the theory. But the theory can’t go to the most general problem immediately. You must go through a sequence of steps, gradually reducing the amount of model information that is needed. | |

− | + | '''Goldstein:''' | |

− | + | Were there any physical systems that resisted modeling? You said before "the ionosphere, who can model that?" It sounds like now you’re saying that’s not really a problem. Because you create this simple model. | |

− | + | '''Kailath:''' | |

− | + | There are many processes in nature. There are models at the molecular level, which are very complex; they use all the material flows and connections. Such models can be very useful for simulating the processes. Then there are models for control, which are much simpler. They are so-called black-box models. In other words, control engineers model aircraft using second- and third-order systems, while from a physical point of view an aircraft is a millionth-order system. But in control one defines certain limited objectives, and to achieve those, simple models suffice. Of course, there are problems where modeling hasn’t reached an adequate level of sophistication, and a lot of manufacturing processes are like that. But as we look further into them, we find that we can make useful simple models of some of these things. | |

− | + | The trouble is there are culture gaps between different people. We found, for example, that when we went to really work with the people in integrated circuits, their culture is very, very different. I have some slides that show that. On one it says "Electrical Engineering," and there’s a big line through the middle of it. On one side is the math- or systems-based approach, which deals with circuits, signal processing, control theory and information theory. On the other, physics-based side are lasers, integrated circuits, solid state theory, and so forth. We found that, at least in integrated circuits manufacturing, they don’t seem to make much use of modeling. Physics is used to invent a processing device. Then they collect masses of data. They have what they call response surfaces and they figure out operating points by trial and error. I am, of course, grossly over-simplifying. But it takes them a few hours to get those settings by trial and error, and you waste a lot of wafers in the process of getting to them. Whereas in that problem, we made some models, and using them we could get to these settings in ten or fifteen minutes with five wafers, as opposed to a hundred wafers and five hours. | |

− | + | '''Goldstein:''' | |

− | + | I wonder what people would say is the driving force behind creating these models? It sounds like it’s not quite something that would come out of the university or academics, because it’s very applied, very focused on specific problems. I don’t know if that’s an accurate characterization. | |

− | + | '''Kailath:''' | |

− | + | No, it’s not. I think what happens is that the work that is done in the universities just assumes models of different kinds, often without knowing how they arise from physical problems. In control theory, one has equations with certain matrices, A, B, C. For academics it's just letters that go into these matrices. In industry it has to be numbers. You’ve got to work hard to find these numbers. So a lot of model identification from data is actually done in industry. But many of the theoretical techniques for doing that came from research in the universities. | |

− | + | '''Goldstein:''' | |

− | + | That’s why it’s not so strange you got into model identification. | |

− | + | '''Kailath:''' | |

− | + | Right. Now, we did some theory on that too. We used that theory for these semiconductor applications. We received an IEEE prize from the Transactions on Semiconductor Manufacturing, which is one of the prizes I am proudest of, because I’ve got all the awards from the [[IEEE Signal Processing Society History|Signal Processing Society]]. To have a different group recognize outsiders was important to me, particularly given that it was an applied paper. We used signal processing theory, and control theory, to solve their problem. And they appreciated it. | |

− | + | ==== Student interest in signal processing ==== | |

− | + | '''Goldstein:''' | |

− | + | Another question I wanted to get to had to do with your students. I’m hoping that it will be a measure of where signal processing is to evaluate what it is that the students know about and what they are interested in. Looking back to the '70s, when you first started taking students in this area, can you recall what their training was in, in signal processing, and what they wanted to work on? | |

− | + | '''Kailath:''' | |

− | + | Our lab was called the Information Systems Lab. And my colleagues called our students "ISL types." These are students whose interest is using mathematical techniques to solve electrical engineering problems. Though, as you know, electrical engineering is very broad. People do physiology and call it electrical engineering. Basically, we take ISL types and we look for a problem. | |

− | + | '''Goldstein:''' | |

− | + | You are reading from the RLE special issue, right? | |

− | + | '''Kailath:''' | |

− | + | Yes. I said there, “The process of research is always a challenge. For example, how does one enter a new field, such as VLSI design and semi-conductor manufacturing, from scratch? It’s certainly a test of what we profess to teach. It’s also challenging to take an inexperienced student and have him or her cross the threshold as an independent investigator in four or five years.” The students come as raw material. They are willing to work on anything. First of all, you must encourage that confidence, and say that you’ve got to learn a lot of new things, but of course you can never learn anything completely. You must learn the process of asking questions and doing your own investigations. I think ISL students have a certain inclination to doing this mathematically. That’s the difference from other branches of EE. Such students gravitate to this lab, and we take them and work on these different projects. They don't have any more particular knowledge. | |

− | + | I’m very proud of my students at Stanford. They are very good that way. I’ve had seventy Ph.D. students, and over thirty post-docs. This book, published on the occasion of my sixtieth birthday, has a list of them. You can look and see where they are, and what they are doing. That will give you an indication. At the beginning many of them went to universities, or to IBM or to Bell Labs. But now a lot of them are off in industries, and some in their own companies. | |

− | + | ==== Kalman filtering ==== | |

− | + | '''Goldstein:''' | |

− | + | The other thing I wanted to ask you about was the [[Rudolf E. Kalman|Kalman]] filtering, which you mentioned before, but I don’t have a good picture of when that enters the scene, and what impact it has, as it’s absorbed. | |

− | + | '''Kailath:''' | |

− | + | I’m a big student of Kalman filtering. That came on the scene in the 1960s. Theory and research lead you to explore certain problems. Kalman, as a student at MIT, began to use this different way of studying control systems, called state-space methods, which were opposed to the then universal frequency domain methods. However, state-space methods are not that new, because mathematicians had been using these techniques, and in fact, the Russian engineers had been using them. But Kalman’s mission was to systematically use state-space methods in many fields of engineering. | |

− | + | He told me, for example, that Wiener filtering, which is sort of non-parametric, was running into certain roadblocks and could not solve certain classes of problems. Kalman wondered what would happen if you formulated these problems in state-space form. He got a nice result. The solution also had a state-space form, and that became the Kalman filter. He did this in 1958-59, with no application in mind. But 1958-59 was when [[Sputnik|Sputnik]] appeared and suddenly there arose a big interest in tracking spacecraft. They started with Wiener filtering techniques, which were not appropriate for these problems, because the problems were non-stationary and the classical Wiener methods are not applicable. | |

− | + | So, lo and behold here was a theory which appeared on the scene just when some people needed it. Some people at the NASA Ames Lab in Mountain View had to solve the problem of tracking satellites. The Kalman theory came at the right time, and it was just right. For spacecraft, one can get reasonable models, inaccurate but adequate. | |

− | + | I got into Kalman filtering theory through feedback communications. In such problems, one needs recursive updating of estimates. For example, in satellite tracking, California sees a satellite for ninety minutes, then nobody sees it until it crosses somewhere in Australia, and then you get another ninety minutes, and then someone in South Africa gets another ninety minutes. So you’re constantly updating your location estimates. After a few passes around the globe they know pretty accurately where it is. These are the techniques of Kalman filtering theory. But, its main contribution on the theory side is that it introduces state-space structure into estimation theory. Structure is the most important thing in mathematics. Mathematics is the study of structure. Kalman filtering is a great example of what you can do when you use state-space structure. For example, Kalman showed that not only could he solve estimation problems nicely, he could also solve certain circuit problems. We find we can solve certain signal processing problems. I have introduced a different kind of structure called Displacement Structure, which can co-exist with state-space structure. | |

− | + | ==== Displacement structure ==== | |

− | + | '''Goldstein:''' | |

− | + | You had just mentioned your work with displacement structure. Could you tell me about that? How did that get started? | |

− | + | '''Kailath:''' | |

− | + | Yes. I’ve been saying from time to time that there is non-model-based signal processing, which is largely the application of Fourier theory. Then there is what I call model-based, which is largely the state-space and Kalman type theory. The displacement structure fits in between. It’s related to the LPC kinds of structure. It came up from two different points of view. One was studying certain nonlinear Riccati differential equations, which arose in the Kalman filter theory. But those differential equations are related to some linear equations called Wiener-Hopf equations, which I knew from Wiener’s theory of prediction. The Wiener-Hopf equation itself came up in astrophysics. So by putting some of those things together, we isolated a concept of displacement structure in the Kalman filter, and then also for matrices, which came up in the LPC discussion as I said earlier. In LPC you have to solve equations involving a matrix with a particular kind of structure called a Toeplitz structure. | |

− | + | That structure occurs a lot, and when it’s present, you can use it to great effect and simplify the solution. But if you take two structured matrices and multiply them together, it turns out the product loses that structure. Now, you need to do such things in applications: you have to multiply matrices or you’ve got to take inverses of matrices and so on. Briefly, many things one does in the course of massaging so-called data destroy any original underlying structure. For example, take the “covariance” method of speech processing, for which we found a fast algorithm in our first paper (in 1977) in the Signal Processing Transactions. The Toeplitz structure that people loved was destroyed in the covariance method. We pointed out that the product of two structured matrices may not look structured in the original way, but it is certainly not arbitrary; it must have some structure. Another example is taking the inverse of a matrix: even though the inverse looks unstructured, it is structured in some sense. And that sense is displacement structure. I’ve been working, in addition to several other things, on that theory for a long time. In fact, I’m going to Israel next week to give a set of lectures to the math department on displacement structure theory. Displacement structure is something that’s grown, and we have found it a very powerful tool for speeding up algorithms in control, communications, signal processing, and mathematics. I like the interplay of these ideas from different fields, and I tend to work in many of these fields still, to varying degrees. It’s my secret weapon! | |
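The point about products of Toeplitz matrices can be checked numerically. The following is a minimal sketch, not taken from the interview: it assumes the common definition of the displacement of a matrix M as M − Z·M·Zᵀ, with Z the lower shift matrix, and the helper names (`toeplitz`, `displacement`, `rank`) and the example matrices are hypothetical illustrations. Under that assumption, a Toeplitz matrix has displacement rank at most 2, and the product of two Toeplitz matrices, although no longer Toeplitz, has displacement rank at most 4.

```python
from fractions import Fraction


def toeplitz(first_col, first_row):
    """Build a square Toeplitz matrix (constant along each diagonal)."""
    n = len(first_col)
    return [[Fraction(first_col[i - j] if i >= j else first_row[j - i])
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_toeplitz(m):
    """True if every entry equals its upper-left diagonal neighbour."""
    n = len(m)
    return all(m[i][j] == m[i - 1][j - 1]
               for i in range(1, n) for j in range(1, n))


def displacement(m):
    """Return m - Z m Z^T, where Z is the lower shift matrix.

    (Z m Z^T)[i][j] is m[i-1][j-1], with zeros in the first row and column.
    """
    n = len(m)
    return [[m[i][j] - (m[i - 1][j - 1] if i > 0 and j > 0 else 0)
             for j in range(n)] for i in range(n)]


def rank(m):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [r[:] for r in m]
    rk = 0
    for c in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][c] / rows[rk][c]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


# Two arbitrary 6x6 Toeplitz matrices (example values only).
t1 = toeplitz([1, 2, 3, 4, 5, 6], [1, 7, 8, 9, 10, 11])
t2 = toeplitz([2, 0, 1, 3, 1, 4], [2, 1, 4, 0, 5, 3])
product = matmul(t1, t2)

print("product is Toeplitz:", is_toeplitz(product))
print("displacement rank of t1:", rank(displacement(t1)))
print("displacement rank of product:", rank(displacement(product)))
```

Exact rational arithmetic is used so that the rank is not affected by rounding. Fast algorithms exploit the low displacement rank by working with a few generator vectors instead of all n² entries; that step is not shown here.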

− | + | '''Goldstein:''' | |

− | + | The thing that is interesting is that while it’s certainly apparent that the interplay matters, it almost seems accidental. The way you’re describing some of these things, it’s almost accidental which ideas will be picked up, and cross-fertilized. | |

− | + | '''Kailath:''' | |

− | + | Yes, there is luck involved. In all research there is a lot of luck and accident. I feel I’ve been a very lucky person in many of these things. As I said, I never worried about tomorrow. But I’ve also not made discontinuous jumps. Some people live all their lives and do only non-parametric signal processing, and some people only do control, and they are quite happy. I feel differently; I welcome change. Some people may regard it as discontinuous, but I don’t. The control I did had connections with communications. The communications I did later had connections with control. And so on. So I regard all of them as having some threads in common. I’ve always enjoyed making connections between different fields. | |

− | + | === IEEE and publication === | |

− | + | '''Goldstein:''' | |

− | + | Does one need to be careful in choosing where to publish a paper, particularly since there is so much overlap in fields? | |

− | + | '''Kailath:''' | |

− | + | Yes. It can make a difference to the reception of a paper. I have tended to publish largely in IEEE journals. They are all carefully refereed, which is an advantage. It slows down the publication process, but everyone profits, both the writers and readers. Sometimes we have had papers whose impact may have been a little less, because, say, we published them in the [[IEEE Information Theory Society History|Information Theory]] transactions rather than in [[IEEE Communications Society History|Communications]] or in [[IEEE Signal Processing Society History|Signal Processing]]. But it’s hard to say. | |

− | + | As a former president of the [[IEEE Information Theory Society History|Information Theory Society]], and a member of the administrative committee of the [[IEEE Control Systems Society History|Control Society]], I’m well aware that the most rapid growth has been in [[IEEE Signal Processing Society History|Signal Processing]]. Today, [[IEEE Information Theory Society History|Information Theory]] is one of the smallest groups, with 5,000 members. In the mid-’70s, [[IEEE Signal Processing Society History|Signal Processing]] only had about 5,000 members. But now it has over 20,000. It has grown very rapidly. But I think you will find that there are people, you know, especially the model-based people, who belong to the [[IEEE Control Systems Society History|Control Society]] as well as the [[IEEE Information Theory Society History|Information Theory]] and the [[IEEE Signal Processing Society History|Signal Processing Societies]]. But, that number is small. Much smaller than 20,000. But, you know, we don’t want to spread the word too much, working across fields is our secret weapon. | |

− | + | '''Goldstein:''' | |

+ | Can you give some sense of what goes through your mind when you’re weighing publishing in the Transactions of the [[IEEE Signal Processing Society History|Signal Processing Society]], versus one of the others? | ||

− | + | '''Kailath:''' | |

− | + | I’m not sure that I can answer that. Largely I think it’s the potential audience. The Transactions is undoubtedly the most prestigious signal-processing journal. Some of the stuff we publish in math journals, because the subject is oriented to math. We try to write them more for mathematicians. The percentage breakdown of my papers has been about 15 percent each for Information Theory, Automatic Control, Signal Processing and Math. Then there are a few others, especially in recent years, as we have worked in [[Semiconductors|semiconductor]] manufacturing. | |

− | + | '''Goldstein:''' | |

− | + | Thank you very much. | |

− | + | [[Category:People and organizations|Kailath]] [[Category:Universities|Kailath]] [[Category:Signals|Kailath]] [[Category:Signal processing|Kailath]] [[Category:Corporations|Kailath]] [[Category:Digital signal processing|Kailath]] [[Category:Filters|Kailath]] [[Category:Digital filters|Kailath]] [[Category:Components, circuits, devices & systems|Kailath]] [[Category:Integrated circuits|Kailath]] [[Category:Large scale integration|Kailath]] [[Category:News|Kailath]] |

## Revision as of 15:29, 7 November 2013

## About Thomas Kailath

Thomas Kailath was born in Poona, India, in 1935. He received the BE degree in telecommunications engineering from the University of Poona in 1956 and MS and Sc.D. degrees in EE from MIT in 1959 and 1961, respectively. He claims to be the first student from India to receive a doctorate in EE from MIT. His master's thesis explored "Sampling Models for Time-Variant Filters," and his doctoral dissertation "Communication via Randomly Varying Channels." From 1961 to 1962 he worked at JPL in Pasadena, California, and concurrently taught part-time at the California Institute of Technology. From 1963 to the present he has been a professor in the EE department at Stanford University. He was involved in the development of Stanford's Information Systems Laboratory from 1971 to 1981. In 1981, Kailath became a co-founder of Integrated Systems, Inc., a small company specializing in the development and licensing of high-level CAE (Computer-Aided Engineering) software and hardware products for the analysis, design and implementation of control systems in a variety of applications.

His research has focused on statistical data processing in communications and control. Control theory was his original field, but it led him by the mid-seventies to work on what is, in today's terminology, model-based signal processing, including applications such as Linear Predictive Coding (LPC) and the Speak & Spell. During the 1980s, his research focused on algorithms related to speech, VLSI, and antenna array processing. Among his awards are the Outstanding Paper Prize for 1965-1966 of the IEEE Information Theory Group; the Outstanding Paper Prize for 1983 of the IEEE Acoustics, Speech and Signal Processing Society; the 1989 Technical Achievement Award of the IEEE Acoustics, Speech and Signal Processing Society; the Society Award of the IEEE Signal Processing Society, May 1991; the IEEE Circuits & Systems Society Education Award, May 1993; and the Education Medal of the IEEE, 1995. He is a fellow of the IEEE (1970) [fellow award for "inspired teaching of and contributions to information, communication, and control theory"] and of the Institute of Mathematical Statistics (1975), and served as President of the IEEE Information Theory Group in 1975.

The interview says little of Kailath’s early years or indeed of his schooling. It primarily consists of Kailath discussing the various contributions he and his peers have made to the signal processing field. In addition, Kailath presents a relatively detailed historical perspective of the field of signal processing in general. His research has focused on statistical data processing in communications and control. However, more recently, he has worked in model-based signal processing, including applications such as Linear Predictive Coding (LPC) and Texas Instruments’ “Speak & Spell.” During the 1980s, his research focused on algorithms related to speech, VLSI, and antenna array processing. Kailath focuses the interview on his perceptions of model-based and non-model-based research, stating that the gap between theoretical research and industrial application is now closing at a faster rate than in the past.

## About the Interview

Thomas Kailath: An Interview Conducted by Andrew Goldstein, Center for the History of Electrical Engineering, 13 March 1997

Interview #328 for the Center for the History of Electrical Engineering, The Institute of Electrical and Electronics Engineers, Inc.

## Copyright Statement

This manuscript is being made available for research purposes only. All literary rights in the manuscript, including the right to publish, are reserved to the IEEE History Center. No part of the manuscript may be quoted for publication without the written permission of the Director of IEEE History Center.

Request for permission to quote for publication should be addressed to the IEEE History Center Oral History Program, 39 Union Street, New Brunswick, NJ 08901-8538 USA. It should include identification of the specific passages to be quoted, anticipated use of the passages, and identification of the user.

It is recommended that this oral history be cited as follows:

Thomas Kailath, an oral history conducted in 1997 by Andrew Goldstein, IEEE History Center, New Brunswick, NJ, USA.

## Interview

Interview: Thomas Kailath

Interviewer: Andrew Goldstein

Date: 13 March 1997

Place: Stanford University, Palo Alto, California

### Education

**Goldstein:**

Can you tell me something about your education? How did you become involved with signal processing technologies?

**Kailath:**

I was an undergraduate in India in telecommunications engineering. It was a pretty standard education, not very mathematical, but I always had some mathematical interests. I learned about Information Theory from an article in *Popular Science Magazine*. It said something about Shannon and Wiener. We didn’t have Shannon’s book in our college library, but Wiener’s *Cybernetics* and his book on filtering theory were available when I was an undergraduate in the mid-'50s. That fascinated me. I especially enjoyed Wiener’s introductory chapters. That was my first exposure to the mathematical side of electrical engineering. I was very fortunate to have a good professor in India who encouraged me to go beyond the usual boundaries in academics and in my personal life.

A friend of my father’s, Dr. G. S. Krishnayya, pushed me to apply to study abroad. Given our economic status at that time, that was an inconceivable thing to do. Normally one just sought to work for the government for a moderate but secure income. When I graduated in 1956 as a radio engineer, I had a job offer to work for All India Radio. But by then I had already applied to Harvard and MIT, and fortunately I got offers from both. By then I think I’d already begun to read about information theory. I wrote to Shannon at Bell Labs, and he replied that he was going to be at MIT. So I went there. My interest in signal processing really began at MIT where I studied information theory, Wiener filtering, and communication through radio channels. I’ve always felt that my main interest was "signal processing," but I interpreted the term in a broader sense than I think the early IEEE Audio group initially had.

### Entry into signal processing

**Kailath:**

The Audio group really got a new lease on life with the re-discovery of the FFT in the mid-1960s, and the adoption by the MIT group of the FFT as a tool for signal analysis. Actually, that part of the subject began at MIT earlier with Bill Linvill and Sampled-Data System Theory, which was the precursor to Digital Filtering theory. That subject began to take off after Jim Kaiser left MIT and went to Bell Labs.

I worked on statistical signals, which come up in Wiener’s theory of estimation and Shannon’s communication theory, both in analog (continuous-time) and digital (discrete-time) form. My publications at first were mostly in signal detection theory and information theory and dealt largely with random processes. In the late 1960s, my work moved towards control theory, and dealt with what are called state-space models of systems. This was a very interesting development. Next, in the mid-1970s, a number of developments led me to work on some mathematical topics that turned out to be closely related to Linear Predictive Coding, which was used in TI's very successful Speak & Spell toy. So I began to publish in the Transactions on Signal Processing, first in 1977 and then more extensively after 1982.

The view we took of signal processing is what would today be called model-based signal processing. The other distinction is parametric signal processing versus non-parametric signal processing. The FFT-based approach was largely non-parametric. It took, for example, a thousand pieces of data, and just regarded it as a collection of numbers and processed it efficiently. I learned from Control Theory that often these thousand pieces were really determined by a smaller number, say ten, of so-called state variables whose evolution could be kept track of much more easily.
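The idea that a long record may be determined by a handful of state variables can be illustrated with a small sketch. It is not from the interview: it assumes a hypothetical two-state linear model x[k+1] = A·x[k], y[k] = C·x[k], with A chosen as a slightly damped rotation, so that a thousand output samples follow from just two initial state values and a few model parameters.

```python
import math


def simulate(a, c, x0, steps):
    """Run x[k+1] = a x[k], y[k] = c . x[k] for a 2-state model."""
    x = list(x0)
    outputs = []
    for _ in range(steps):
        outputs.append(c[0] * x[0] + c[1] * x[1])
        x = [a[0][0] * x[0] + a[0][1] * x[1],
             a[1][0] * x[0] + a[1][1] * x[1]]
    return outputs


# Hypothetical model: rotation by THETA per step with decay factor R.
THETA = 2 * math.pi / 50
R = 0.999
A = [[R * math.cos(THETA), -R * math.sin(THETA)],
     [R * math.sin(THETA), R * math.cos(THETA)]]
C = [1.0, 0.0]

# One thousand samples, all generated from a two-element state.
samples = simulate(A, C, [1.0, 0.0], 1000)
print(len(samples), samples[:3])
```

A non-parametric method would treat `samples` as a thousand unrelated numbers; a model-based method would instead try to track the two-element state.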

This model-based approach can be seen in many fields of science, but in control the systematic use and development of models was really the contribution of Rudy Kalman. I had gotten interested in the famous Kalman filter through my work in feedback communications, and I spent some time on it, so I was well-attuned to the power of that point of view. In the 1960s, I started writing a book on detection theory, but I never finished it because I got interested in control theory and state-space theory. I recently wrote a textbook on that subject.

**Goldstein:**

Is that the one on linear systems?

**Kailath:**

Yes. It has a very long preface, which discusses some of my educational philosophy. So, that was my entry into signal processing. I still publish in information theory and communications and control theory, but more and more, the large majority of my papers is in signal processing.

#### Control systems background

**Goldstein:**

Was your exposure to control systems unique? Were there other people in your position who saw signal processing problems in a similar way?

**Kailath:**

I cannot fully say. To some extent the LPC people, including Manfred Schroeder, Bishnu Atal, and Itakura were doing that. But they, or at least Itakura, were using a so-called lattice filter model with a special kind of parametrization. It was sort of in between parametric and non-parametric. My students and I were among the early ones who introduced the state-space point of view, which is much more model-based. The only other major name that comes to mind is Alan Willsky of MIT, but he's much younger than I am, and started a few years later.

**Goldstein:**

The way you told the story before made it sound like that development was due to the peculiar history of your training in control systems.

**Kailath:**

In the book it says exactly that. No, I wasn’t trained in control systems. In fact, at MIT I never took a course in control; MIT was the center of information theory in communications. One of the first Ph.D. theses I supervised at Stanford was in the area of feedback communications.

Very briefly, the idea is that information is sent to a satellite that collects data and sends it back. But a satellite has limited power, so it tends to be noisy data. However, you have the possibility of having a lot of power on the ground, so you can send back clean questions to the satellite. You can say, for example, “This data was poor. Re-send it.” This leads to what may be called a recursive scheme. As we get more data, we have to keep updating what we think the satellite is sending.

Communications and signal processing people at that time were not interested in this data updating process. They usually had a fixed piece of data, like a speech waveform. They analyzed it via fast Fourier transforms. If you had one more piece of data, you had to take the whole Fourier transform again. But from the control people at Stanford, and actually through my students who took their courses, I learned about the Kalman filter algorithms for updating. The concept of state that I mentioned is critical to that. So through a Ph.D. student, I began to learn about this method of processing data in order to solve this communications problem. Then I got interested in it for its own sake, and found more problems to study. That’s how I got into control.
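The contrast drawn above, between reprocessing a whole record and updating an estimate as each new measurement arrives, can be sketched in a few lines. This is only an illustration under simple assumptions, not the scheme from the thesis mentioned: a constant unknown value is observed in noise, and the names `batch_estimate`, `RecursiveEstimator`, and `true_value` are hypothetical. The recursive form is the simplest special case of the kind of update a Kalman filter performs, with the "state" reduced to the current estimate and a sample count.

```python
import random


def batch_estimate(samples):
    """Reprocess the whole record from scratch; cost grows with its length."""
    return sum(samples) / len(samples)


class RecursiveEstimator:
    """Keeps a small state (current estimate and sample count) and updates
    it in constant time for each new measurement."""

    def __init__(self):
        self.n = 0
        self.estimate = 0.0

    def update(self, measurement):
        self.n += 1
        gain = 1.0 / self.n  # weight given to the new information
        # Correct the old estimate by a fraction of the prediction error.
        self.estimate += gain * (measurement - self.estimate)
        return self.estimate


random.seed(0)
true_value = 3.0  # hypothetical quantity being sent, assumed constant
samples = [true_value + random.gauss(0.0, 1.0) for _ in range(1000)]

recursive = RecursiveEstimator()
for y in samples:
    recursive.update(y)

batch = batch_estimate(samples)
print(recursive.estimate, batch)
```

Both routes arrive at the same number, but the recursive one never has to revisit old data, which is the property that matters when measurements keep arriving.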

### Signal processing research

#### Integrated circuits

**Kailath:**

In the 1960s I had worked mostly in communication and information theory in random processes. In the 1970s I worked mostly in control. In the 1980s I worked mostly in signal processing. That was the evolution. As I mentioned earlier, the analysis of LPC turned out to be close to the mathematics that we had hit upon through our work in control. That was called fast algorithms. At about that time, Stanford, which was very strong in VLSI, was putting together one of the largest university centers for research in VLSI integrated circuits. My challenge was in finding uses for the computational power of integrated circuits.

Now the Speak & Spell chip had implemented the LPC algorithm in a certain non-traditional way for signal processors. It was built in a cascade modular structure, a series of relatively simple and uniform blocks. This was much easier to build with integrated circuits than a random connection of logic. Our math theories had been leading us to studying signal processing in terms of these cascades, and so we now began to look at problems in VLSI signal processing. A DARPA project, jointly awarded to Jim Meindl of our IC Lab, Forrest Baskett of the Computer Lab, and me, helped build a group to do this. I think it was one of my former students, S. Y. Kung, who took the initiative in this, but I was actually one of the founding members of the signal processing technical committee on VLSI.

#### Committees

**Goldstein:**

That was a committee here at Stanford?

**Kailath:**

No. The Signal Processing Society has various technical sub-committees for different areas. Neural Networks is one that has been created recently. Array Processing and Image Processing are others. The VLSI committee was started some time in the mid-1980s.

So by the 1980s I was already heavily into signal processing, working on three kinds of problems. One was fast algorithms for problems in speech and seismic signal processing, the other was VLSI design, and the third was antenna array processing.

#### Sensor Array Processing

**Kailath:**

You have signals coming from different directions, which you have to determine by using an antenna array. The traditional methods for doing that are really equivalent to taking the DFT, or Discrete Fourier Transform of the data. It’s a spatial DFT rather than a temporal DFT, but it’s similar. Incidentally, the FFT was also rediscovered by an antenna engineer, W. Butler, as a way of speeding up the antenna processing. If you have sinusoidal waves in noise, you would tend to get peaks of the DFT where the sine waves are. Similarly, when you take the spatial DFT of signals coming from different directions, you tend to get peaks at the directions where these signals are coming from. But this is a non-parametric method: it doesn’t take account of the fact, for example, that these are plane waves coming in, or that there are only two or three of them, and they may have a temporal structure.

The DFT doesn’t care about those things. You have some numbers, you take the DFT. Then, a Stanford Ph.D. student, Ralph Schmidt, said, “Well, even if you have no noise, that method can’t solve the problem exactly. But, if you properly use the information that there are only, say, three plane waves, then we can get an exact solution without noise.” To do this required nonlinear calculations--finding the eigenvalues and eigenvectors of a matrix, et cetera. So that launched a new field of model-based Sensor Array Processing. I supervised six or seven Ph.D. theses in that area. It was timely because those were the days of SDI, and that was one of the funding sources for this because they were interested in tracking missile directions.
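The traditional, non-parametric approach described above can be sketched as a spatial Fourier scan: correlate the array data with the response expected from each candidate direction and look for peaks. This sketch illustrates only that baseline, not Schmidt's model-based method. The uniform line array with half-wavelength spacing, the two source angles, the noise level, and all function names are assumptions made for the example.

```python
import cmath
import math
import random


def steering(num_sensors, angle_deg):
    """Response of a uniform line array (half-wavelength spacing, an
    assumption) to a unit plane wave arriving from angle_deg."""
    s = math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * math.pi * m * s) for m in range(num_sensors)]


def snapshot(num_sensors, angles_deg, noise_std, rng):
    """One noisy array measurement: a sum of plane waves with random phases."""
    x = [0j] * num_sensors
    for angle in angles_deg:
        phase = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        for m, a_m in enumerate(steering(num_sensors, angle)):
            x[m] += phase * a_m
    return [x_m + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            for x_m in x]


def spatial_spectrum(snapshots, scan_deg):
    """Average power of the spatial Fourier sum at each scan angle."""
    num_sensors = len(snapshots[0])
    power = []
    for angle in scan_deg:
        a = steering(num_sensors, angle)
        total = 0.0
        for x in snapshots:
            y = sum(a_m.conjugate() * x_m for a_m, x_m in zip(a, x))
            total += abs(y) ** 2
        power.append(total / len(snapshots))
    return power


rng = random.Random(0)
true_angles = [-20.0, 30.0]  # hypothetical source directions in degrees
snapshots = [snapshot(16, true_angles, 0.1, rng) for _ in range(50)]
scan = list(range(-90, 91))  # one-degree scan grid
power = spatial_spectrum(snapshots, scan)

# Keep the two strongest local maxima of the scanned spectrum.
peaks = [i for i in range(1, len(scan) - 1)
         if power[i] > power[i - 1] and power[i] >= power[i + 1]]
peaks.sort(key=lambda i: power[i], reverse=True)
estimated = sorted(scan[i] for i in peaks[:2])
print(estimated)
```

Nothing in this scan uses the knowledge that exactly two plane waves are present. Its resolution is set by the array length, which is the limitation the model-based methods were meant to overcome.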

**Goldstein:**

Was that work conducted for a specific project? Was there a specific large-scale project for the feedback communication system you were talking about?

**Kailath:**

No. That was just a Ph.D. thesis that was done in 1966, actually by my first Ph.D. student, Piet Schalkwijk. We were looking for interesting problems and this came up. Communication satellites were already well established by then. I have done a lot of theoretical things, but I like them to be related to problems that other people are interested in. So this seemed to be a nice challenge, and in fact our results got a lot of attention from information theorists.

#### Funding and collaboration with students

**Goldstein:**

How would you become aware of those kinds of problems? What’s the source of your inspiration and funding?

**Kailath:**

Well, that’s a good question. I would say that by and large these different areas I mentioned were first identified as interesting areas to work on given the challenges of the time. Then we got funding for them from basic research agencies, which by and large gave us a lot of freedom to explore what we thought best. The times were easier then for funding.

Another factor was that I had changed from the MIT model I started with. A professor would have a few students working with him, each on a different topic. The professor did his own research and the students talked to each other, but not so much about their own work. That’s how I was in my first years at Stanford. But, for example, I realized I had to learn about control theory. I went and sat in some courses, but it was easier for me to learn one-on-one. I began to find that graduate students would teach me one-on-one, and then in a few years, we began to follow the pattern that I guess I learned from my solid-state colleagues at Stanford. They often had, as experimentalists, teams of students working together. I began also to realize that it was better to work with students rather than in parallel, as I did until I became full professor. When there was a team of students, and we all worked together in a big area, we got a lot more done, and over time we were able to build up a body of knowledge in different problem areas.

At Stanford there were a lot of very good students and I’m sort of a sucker for good students. I needed to find funding and problems for them. That was one of the driving forces that inspired our coming up with research topics that had some relevance in the outside world, so we could get funding. For example, there was funding for VLSI and SDI. Our ideas seemed relevant, so we got money for them. There never was enough money to cover expenses, so it was always a scramble. The 1980s were a very busy time. At one point, I had a group of twenty-four people: two secretaries, fifteen students, four or five post-docs, visitors, et cetera. Also, our own computer network, named after the Little Rascals! We got lots of things done. In fact, some companies came out of it. The area we were working on, sensor array processing, has become hot again. As we were winding down on the SDI side, we began to develop its potential for cellular and mobile communications.

#### Commercial applications

**Goldstein:**

This is interesting. I’m always interested in cases where the academic research spawns a commercial side. These were companies that came out of the antenna array work?

**Kailath:**

Yes. There was one of them, which was called ArrayComm. One of my students who was here for many years, Dick Roy, started that along with another associate, Paulraj. I had actually started a company in a control-related field many years earlier that was fairly successful. It's now a public company, Integrated Systems, Inc., doing software for embedded systems.

That took us to about the end of the 1980s. Then, via a circuitous route, DARPA asked us to study for the first time the application of advanced control and signal processing techniques in manufacturing. Manufacturing tends to be sophisticated, but empirical, whereas a model-based approach is more mathematical. A model-based approach means you make a mathematical model of some phenomenon. It is always inaccurate because the world is very complicated, but the models have to be simple. The power of mathematics and statistics is that even simple models allow you to predict things, and you can correct deficiencies in the model by using feedback, which is another key principle of control engineering. We successfully applied that way of thinking to semiconductor manufacturing.

**Goldstein:**

When did DARPA come to you?

**Kailath:**

In 1989.

**Goldstein:**

Let me look back for just a second. You said that at the beginning of the 1980s, Stanford had this VLSI laboratory, and you had gotten involved with them in a signal processing project. Can you tell me how that worked?

**Kailath:**

I was one of five faculty, one from computers, and three from semiconductors: Linvill, Meindl, and Gibbons. Their interest was largely in technology, and it was very expensive to fabricate these VLSI circuits. The issue was, what do you do with all this computer power? What you can do is signal and image processing. So, they called this new laboratory, built with a lot of industry support, the Center for Integrated Systems. I believed in the concept, and we started a successful research group. In fact, one of my colleagues who joined our first project in this area has already been involved in founding three companies related to VLSI. A couple of my students later formed a company, Silicon Design Experts, Inc., which they have just sold, which came out of our theoretical work. But my early interactions with the CIS were very minimal at that time. The people there were more interested in the technology, and we were more interested in the theory. But then when DARPA asked us to work on control and signal processing in manufacturing, any manufacturing, the CIS was the obvious place. They already had a project with TI, which we got involved with, and we worked quite closely with them. So about ten years after my work on the start of the CIS, we began to do some work with them.

**Goldstein:**

I see. Do you know how the work that they did at CIS has spun out into the commercial sector?

**Kailath:**

It happened in many ways. There is a company that was in the news a couple of weeks ago that came out of CIS. They have a very high-speed architecture for chips. Intel has just licensed it, and is going to pay them a royalty of one percent for every chip they sell. So, that’s pretty high impact. But that's only one of several.

**Goldstein:**

Well, that’s really what I’m getting at. Is the general model that the development might have been patented by the faculty member, and then perhaps that faculty member then has the choice to either commercialize it him or herself, or license it?

**Kailath:**

Yes. Stanford has a mechanism for this. Some of these companies are just people going out, and it’s not necessarily patents. We have a few patents, one or two, which are licensed. But Stanford has an Office of Technology Licensing. It was a pioneering office, actually, that took these patents and attempted to interest outside companies in purchasing them. After a few years when more faculty got interested in starting companies, they would license the inventions back to the faculty companies, and under reasonably generous terms. But the revenues from the licenses are shared by the University, by the Department, and by the investigator.

**Goldstein:**

So, how does it benefit TI to help to sponsor the Center?

**Kailath:**

That’s a very touchy issue, because “Intellectual Property Rights” is a really big issue. I don’t know the details of this, but I think Stanford owns the rights to everything that Stanford faculty and students do. The main benefit that companies get is that they can have their own people in the Center. So they have more knowledge of what’s going on, and will have a few months’ lead over somebody else because of their involvement in the Center. CIS has annual review visits back and forth.

**Goldstein:**

You said that you became involved in signal processing maybe towards the end of the 1970s.

**Kailath:**

Yes.

#### Displacement theory

**Goldstein:**

Can you tell me the state of the art then? What were the interesting theoretical frontiers or applications? How has it changed since that time?

**Kailath:**

Let me first make a separate remark on the pace of technology that I’ve made on other occasions. When I came to this country in 1957, there were no jet planes and it was cheaper to come by sea. So we took a ship to England. We had to spend a few days in London and then sail to New York. I visited Imperial College because a senior classmate of mine was there. I met Professor Denis Gabor. This was before he got a Nobel Prize for inventing holography, but he was a famous guy there. He was working on adaptive filtering, and very proud of it. People were much more accessible in those days. He showed me his scheme for it, which included three or four racks of equipment. He said, “It looks like a lot, but it’s just strings and sealing wax. We don’t have as much money as the Americans. Norbert Wiener and MIT are doing much more fancy things. But look, I’m making these things work.” Then the technology moved from racks to boxes. Then from boxes to cards. Now to a collection of chips on a board and perhaps even to a single chip. It’s amazing. All in less than forty years. It's quite an evolution, and it’s accelerating.

**Goldstein:**

That tracks along with the development of manufacturing technology, or solid-state technology. It’s independent from the signal processing technology.

**Kailath:**

That’s right. But the biggest uses of these chips are going to be for signal processing, and image processing in particular, because of the massive amounts of data that have to be crunched. There’s another side. There are databases to be organized, which means more software development. However, to finally return to your original question.

My first paper in the *Transactions on Signal Processing* was in October 1977 on LPC, linear prediction. Some of our mathematics was related to that. There was an unsolved problem related to LPC. Atal and Schroeder posed the problem so as to introduce Toeplitz equations, which are a particular kind of structure in matrix equations; using this structure helped speed up the solution. But you could get a better physical model by using certain non-Toeplitz equations. People didn’t use the better model, because they thought these non-Toeplitz equations don’t have the particular nice structure, and so you can’t get the nice implementation of LPC. However, based on some of our math, we said, “No, this other method, the so-called covariance method, has structure too, except it’s not so explicit.”

**Goldstein:**

The covariance method is one of these non-Toeplitz types?

**Kailath:**

It has structures that are not explicit. In fact, we slowly invented a name for it: it’s called displacement theory. I published it first in 1979 in a mathematics journal, but I had been working on the idea for about ten years. Anyway, the October 1977 paper was the first paper that showed that this apparently non-structured matrix could also be solved in a fast way.

#### Speech signal processing

**Goldstein:**

Could you give me an example of what system a non-structured matrix can model better?

**Kailath:**

Speech, for example. It turns out that in speech signal processing, the boundaries are a headache, because you only observe things over a finite interval. To make the mathematics come out right, i.e. simple, you can’t have that sort of discontinuity, so people artificially extended the data out beyond the boundaries. That was done in order to get these structured matrices. They knew this was an artificial extension and that the facts were being distorted. So, what could you do if you just took the data that you had and didn’t prejudice it with assumptions about the missing data? The new model has structure too, but it’s a less simple or less explicit structure than with the first more artificial assumptions. The new model is closer to the physical situation.

This might come up in extending LPC coding to be useful for nasal sounds. If you know the terminology, the systems in LPC have only poles and no zeros. Zeros arise when you have two interfering signals. For example, air through the nose and through the mouth interfere and make nulls in the spectrum. LPC can’t model that so well. So our analysis could enable better models of nasal sounds, perhaps. This is one potential application, though I don't know if it has been seriously pursued.

In the mid-1970s, I wasn’t that oriented to the actual applications. We were interested in the math. The next papers in the Transactions were in 1981 and 1982, on adaptive filtering. Then soon after that I began to publish papers for the VLSI and antenna array processing communities, which have special Technical Activity Committees of their own. We were the leaders in both those areas.

#### Adaptive filtering

**Goldstein:**

Can you give me some of the background on adaptive filtering? You said that you first encountered it I guess in 1956, at Imperial College.

**Kailath:**

Yes, as I said, in Gabor's laboratory there. He mentioned Wiener at MIT. I went to MIT, but I wasn’t interested in that field. One of the big names in adaptive filtering was my colleague Bernie Widrow. He invented a famous algorithm called the LMS algorithm. He had studied and worked at MIT, but was just leaving MIT when I first met him. He came to Stanford and he invented this LMS adaptive algorithm, which is a very simple way of massaging data. He will tell you more about it. It allows you to deal with data as it comes, without having a lot of models for it. It’s very successful and widely used. It has a certain mathematical structure to it. Professor Widrow is less of a mathematician, much more of an engineer, so he didn’t pursue those directions. But some of the math that we got into through LPC was related, because similar kinds of equations are solved in adaptive filtering. I said to a visitor I had here, Professor Reddy from India, “You know, there’s all this talk about adaptive filtering. Why don’t you see what these guys are doing there, and see whether our techniques can help?” Reddy didn’t know much about adaptive filtering either, but he started to look into it. We had some discussions, and we found that in fact our techniques were applicable. My intuition was right. Our ideas could be used to speed up adaptive filtering algorithms.

We worked on what is called the RLS, Recursive Least Squares algorithm. It’s a competitor to LMS, except that it is a little more complicated. The LMS algorithm is a very simple algorithm. However, there is a price for this. Suppose it’s used to identify an unknown system. LMS takes a long time to identify that system; it converges slowly. But it had been known from the time of Gauss that there’s a much faster way using what is called Recursive Least Squares. It required a greater complexity. If you had n pieces of data, Widrow's algorithm used about 2n operations, whereas this used n squared. Well, when we came into this field, we showed that you could implement the least squares with, say, seven to ten n rather than n squared. That’s a big difference. Of course, it’s still not 2n, but it’s in the ballpark.
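The trade-off described above can be sketched by running both algorithms on the same system-identification problem. This is a minimal illustration under stated assumptions, not the fast least-squares algorithms referred to above: it uses the textbook LMS update and the conventional n-squared RLS update. The unknown system `w_true`, the correlated input, the step size `mu`, and the initial value of `P` are all hypothetical choices made for the example.

```python
import random


def lms_step(w, x, d, mu):
    """One LMS update: roughly 2n multiplications for n taps."""
    e = d - sum(w_i * x_i for w_i, x_i in zip(w, x))
    for i in range(len(w)):
        w[i] += mu * e * x[i]
    return e


def rls_step(w, P, x, d):
    """One conventional RLS update: on the order of n squared operations,
    because the n-by-n matrix P is updated at every step."""
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    e = d - sum(w_i * x_i for w_i, x_i in zip(w, x))
    for i in range(n):
        w[i] += gain[i] * e
        for j in range(n):
            P[i][j] -= gain[i] * Px[j]  # P stays symmetric
    return e


def distance(w, w_ref):
    """Euclidean distance between two weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(w, w_ref)) ** 0.5


rng = random.Random(1)
w_true = [0.5, -0.3, 0.2, 0.1]  # hypothetical unknown system (4 taps)
n = len(w_true)
w_lms = [0.0] * n
w_rls = [0.0] * n
P = [[1000.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

u_prev = 0.0
taps = [0.0] * n
for _ in range(500):
    u = 0.9 * u_prev + rng.gauss(0.0, 1.0)  # correlated input signal
    u_prev = u
    taps = [u] + taps[:-1]                  # tap-delay line
    d = sum(a * b for a, b in zip(w_true, taps)) + rng.gauss(0.0, 0.01)
    lms_step(w_lms, taps, d, mu=0.01)
    rls_step(w_rls, P, taps, d)

lms_err = distance(w_lms, w_true)
rls_err = distance(w_rls, w_true)
print(lms_err, rls_err)
```

How large the gap is depends on the step size and on how correlated the input is, so the numbers printed here say nothing general about either algorithm. The point of the sketch is the structure: LMS touches only the weight vector, while RLS also carries and updates a full matrix.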

Just two or three years ago, another direction of work that we had gotten into is called H Infinity Theory. It came from the control field, and has led us to the following result on the LMS algorithm. That algorithm is very widely used, but it’s regarded as sub-optimum. It’s not the best thing to do. The Least Squares algorithm is technically the better algorithm. The reason is that, following Norbert Wiener, we start by setting up a performance criterion and minimizing that. The thing that minimizes the criterion is the optimal solution. Well, Widrow’s algorithm is sub-optimal for the Least Squares criterion. But it doesn’t matter; everyone uses it anyway. Now as I mentioned, we began recently to study a different criterion called the H Infinity Criterion. One of my students, Babak Hassibi, found that Widrow’s algorithm actually is optimal, but not for the Least Squares criterion; for that it is only approximate. But it is optimal for the H Infinity criterion. The H Infinity criterion says, “Suppose you knew nothing about the disturbances, and in fact that nature was malicious in picking the disturbances, and that you had to protect yourself despite nature trying its best to outwit you, what algorithm would you use?” Well, that’s Widrow’s algorithm. So, it’s pessimistic sometimes, because if you know more, of course, you can do more. But it is very robust, which is also the reason why it has been so successfully used.

**Goldstein:**

Whereas the Least Squares is optimal if the noise is random?

**Kailath:**

Yes, if you have more statistical knowledge. If it’s random and you know a little bit of the statistics, and so on. The noise can be anything. You’ve heard the buzz word Gaussian? But what if the noise is not Gaussian or white, or even random, but deliberately chosen to frustrate you? An example is interference, like another signal from someone else's conversation. If you want to combat that kind of situation, it turns out that LMS is optimal. So, the theory has finally caught up with Widrow’s algorithm! That happens a lot with theories! We’ve kept working on adaptive filtering on and off. One of my students, John Cioffi, is now on the faculty, and he did some more on adaptive filtering. In fact, he has a company called Amati that didn't use exactly those ideas, but it grew out of them. Amati developed the ADSL, or Asymmetric Digital Subscriber Loop technology. It sends data at several megabits per second over ordinary phone lines, but only in one direction. In other words, a movie company can send you their movie very fast. However, it's asymmetric. You cannot send your stuff back to them very fast. But all you’re sending them is a credit card number or something, so speed is not important. But the point is that you don’t need fiber. You can use ordinary copper lines.

**Goldstein:**

And, it’s coming out of this?

**Kailath:**

It’s not directly related, but some of it is. John Cioffi did a thesis on adaptive filtering and communications, and then he went on into ADSL. So many, many things get intertwined.

**Goldstein:**

I’m wondering why Widrow’s algorithm was used if it was sub-optimal, before people realized it was optimal under the H-infinity.

**Kailath:**

But that was only recently. Unconsciously, it was used for two reasons. One, it was a very simple algorithm, so it was easier to implement. That has a big appeal, because the solution is simpler and so already more robust. It is always true of model-based methods that they work very well when the model is reasonably accurate. But, they can be disastrous otherwise. I think people found that Widrow’s was simple, and seemed to work better in a larger variety of situations. The big success of the model-based method is the Kalman filter, which is largely used for aircraft and satellites. They obey Newton’s law, by and large, so you know a lot about those systems. The aerospace industry uses model-based algorithms and the Kalman filter, because they have models. But what’s a model for communications involving the ionosphere? Or the telephone line? It’s too complicated. So model-based techniques have not been widely used so far, but now results are emerging in that direction, too.

#### Model-based problems

**Goldstein:**

That’s interesting. What models do the aerospace industry use?

**Kailath:**

The dynamics of the satellite or rocket or a ship or automobile.

**Goldstein:**

So what problem are they solving when they have these models?

**Kailath:**

Guidance, for example.

**Goldstein:**

But that’s a signal processing problem?

**Kailath:**

I interpret signal processing more broadly, as including things like estimation and guidance. For example, when a satellite is launched to the moon, a trajectory is chosen, but that’s based on some calculations with models of a spherical earth and a spherical moon, and knowing exactly the distance, being in free space, and of the satellite being spherical, et cetera. These are all approximations. The earth isn’t a sphere, and there’s atmospheric drag. So, if you just launched the satellite and never looked at it again, it would never hit the moon. But, here’s the control theory. Control works on feedback. You look at the ship's trajectory, and then you measure where the thing is. You use the difference between where it is actually measured to be and where you think it should be, and use a so-called feedback error signal by which you change the direction of the ship. But all you have to do all this with is some radar measurements of its position, and these are noisy, because there are disturbances in the propagation part and in your receivers. So you’ve got to extract the information about where the ship actually is; that's the estimation problem.

**Goldstein:**

I see. Well, the thing about it being noisy, that’s a communications problem, right?

**Kailath:**

No, not necessarily. Noise arises everywhere. The origins of estimation theory, at least for engineers, go back to a military application, anti-aircraft fire control. Take, for example, an enemy aircraft. If you aim your gun where you see it, it’s gone by the time the shell reaches that point. So you’ve got to figure out where the plane is going to be at some time in the future based on the speed of your rocket and the speed of the aircraft. So you’ve got to estimate, or predict, the future position of the aircraft, given noisy measurements of the past trajectory. I call that signal processing, because you are extracting information from a random signal. It’s not a deterministic signal. It’s not music, or speech, though they can usefully be modeled as random, too! That's information theory for you! It’s a random signal. You’re extracting information by processing the measurements, which are related to the useful signals. I say that modern signal processing started with Wiener actually. He formulated the problem of tracking aircraft in a statistical way.

But we can encounter such problems everywhere. Here is an example of signal processing in our semiconductor work. We heat a wafer to 1,000 degrees very fast in a chamber with hot halogen lamps. How do you measure the temperature? You can’t touch the wafer with a probe, because it pollutes the wafer. You’ve got to be indirect. So what you say is there’s radiation coming off the wafer. If you can count the number of photons, then that’s related to the temperature by Planck's law, which says how many photons are emitted at a given temperature. However, the number of photons is random, so you get a count that’s random. You’re interested in something else, which is the temperature of that wafer as it changes in form. You’ve got to infer knowledge about one signal, which is not observable, from another signal, which is observable. Both of these are random (in the latter case, because of the errors in measurement), but there’s some dependence between them, some statistical dependence. That’s signal processing in its fundamental or generic form. It’s extracting information from signals for other purposes. Communications uses signal processing. Control uses signal processing. Some people call it estimation theory. I interpret signal processing quite broadly as processing, working with data.

It could also be binary data as in algebraic decoding theory. We used some of our signal processing work to decode in a new way, what are called Reed Solomon Codes, which were invented in the early days of information theory at MIT around the time I was a student there. It took about thirty years to get them implemented, but every compact disk has them. But here you’re working with ones and zeros. However, much of the mathematics is similar to real numbers. So, I call that signal processing, too. Fourier transforms can be used for such problems too, and in a very nice way, actually. This was shown by Dick Blahut of IBM, and others.

**Goldstein:**

You suggested a very interesting perspective before, about dividing up signal processing problems into model-based problems and ones that aren’t. Can you talk about the history of signal processing from that perspective?

**Kailath:**

Yes. I’m not a scholar of these things, but let me give you my perspective. You see, some of the early signal processing problems and where Wiener got interested was with the subject called time series analysis, which is non-parametric signal processing. An early example was sunspot data. That’s somewhat periodic; every eleven years or so there’s an eruption of solar activity. A man called Yule made a statistical model for trying to fit the sunspot data and predict when the next cycle will be. These are called auto-regressive models.

**Goldstein:**

When was that?

**Kailath:**

In the 1920s. Yule's work was one of the things that spurred an increase of activity in the analysis of random phenomena. How do you study them? You study them through their statistics. But a useful statistic, or a useful function is the so-called power spectrum. The spectrum is the Fourier transform of the process, and the power spectrum is basically the square of it. But, for random processes, it’s a little fancier, and you’ve got to be careful in how you define it. You can’t just take the Fourier transform and square it, because the result fluctuates wildly with the samples you take. Well, there’s a theorem from the 1930s, known as the Wiener/Khinchin theorem, which says that the power spectrum can be found as the Fourier transform of the covariance or correlation function. But there is a fascinating story here.

In 1914, a meteorologist friend of Einstein’s asked him a question along the following lines, “I have this weather data, and I want to detect some periodicities in it.” Einstein wrote a short, two-page paper, giving some suggestions on how to process this random data. It’s a remarkable note. It has many ideas in it, including the famous Wiener/Khinchin theorem, which was first presented in the 1930s. Einstein didn’t even know all the math required to prove this theorem, because random processes hadn’t been formally defined in 1914. Einstein somehow got the key idea, as a way of computing the power spectrum.
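The relation at the center of this story, that the power spectrum is the Fourier transform of the covariance function, can be checked on a small worked case. The sketch below assumes a first-order autoregressive model of the kind associated with Yule, x(n) = a·x(n-1) + w(n) with unit-variance white noise w. The coefficient value, the lag cutoff, and the function names are choices made for this illustration. For that model the covariance is a^|k| / (1 - a²) and the power spectrum has the closed form 1 / |1 - a·e^{-jω}|².

```python
import cmath
import math

a = 0.5  # assumed AR(1) coefficient, |a| < 1


def covariance(k):
    """Covariance r(k) of x(n) = a*x(n-1) + w(n) with unit-variance white w."""
    return a ** abs(k) / (1.0 - a * a)


def spectrum_from_covariance(omega, max_lag=60):
    """Fourier transform of the covariance sequence, truncated at max_lag."""
    total = sum(covariance(k) * cmath.exp(-1j * omega * k)
                for k in range(-max_lag, max_lag + 1))
    return total.real  # the imaginary parts cancel because r(k) = r(-k)


def spectrum_closed_form(omega):
    """Power spectrum of the same AR(1) model, written down directly."""
    return 1.0 / abs(1.0 - a * cmath.exp(-1j * omega)) ** 2


omegas = [math.pi * i / 8 for i in range(9)]  # frequencies from 0 to pi
max_gap = max(abs(spectrum_from_covariance(w) - spectrum_closed_form(w))
              for w in omegas)
print(max_gap)
```

The two computations agree up to the truncation of the lag sum. In practice the covariance has to be estimated from data, which is where the care about defining the power spectrum of a random process comes in.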

I used to visit the Soviet Union frequently to attend the conferences on Information Theory, and during one of my visits I was told that someone had discovered this paper of Einstein’s and it actually had the Wiener/Khinchin theorem in it. Later, there was a whole Signal Processing magazine issue devoted to a reprint and discussion of Einstein’s letter. That sort of signal processing and the emphasis on power spectra was the initial definition or emphasis of the field of signal processing. Power spectra were being computed in lots of fields, and then came the FFT. I guess you’ve heard all the stories about the origin of the FFT, so I won’t go into it here.

However, the fact is that the rediscovery (it goes back to Gauss) of the FFT in 1965 gave a new impulse to the signal processing field.

A group at MIT led by Al Oppenheim and at Lincoln Lab by Ben Gold and Charlie Rader picked up the FFT and combined it with the ideas of sampled-data or discrete-time systems and defined this as Digital Filtering and Digital Signal Processing. Those activities I would call non-parametric signal processing, in contrast to parametric or model-based signal processing.

Yule’s work in 1927 was, in fact, model-based, because he fitted a model to the sunspot data; he just didn’t take the Fourier transform. He tried to fit a simple (all-pole) model with a smaller number of parameters than the amount of data. LPC was based on the same idea. But I think the systematic and widespread use of model-based ideas was in the controls area, beginning with the work of Bellman, and especially R.E. Kalman, in the late 1950s.
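To make the idea of fitting a small all-pole model concrete, here is a minimal sketch that fits a second-order autoregressive model to data by solving the two Yule-Walker equations. The data are simulated rather than sunspot numbers, and the coefficient values (1.3 and -0.6), the seed, and the function names are assumptions chosen only for illustration.

```python
import random


def simulate_ar2(a1, a2, n, seed=0):
    """Generate x[t] = a1*x[t-1] + a2*x[t-2] + white noise (illustrative data only)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


def autocovariance(x, lag):
    """Biased sample autocovariance at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def fit_ar2(x):
    """Solve the 2x2 Yule-Walker system [[r0, r1], [r1, r0]] [a1, a2] = [r1, r2]."""
    r0, r1, r2 = (autocovariance(x, k) for k in range(3))
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    return a1, a2


if __name__ == "__main__":
    series = simulate_ar2(1.3, -0.6, 20000)
    print(fit_ar2(series))  # two numbers summarize 20000 samples
```

The point of the sketch is the compression of a long record into two parameters; with enough data the estimates land close to the values used to generate the series.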

A lot of the “classical” signal processing doesn’t deal with random things, except for (nonparametric) calculation of the power spectrum. The name "statistical signal processing" emerged because there’s more to the signal processing problem than just estimating signals as such. Take radar, for example. A lot of signal processing arises there too. In radar, the question is whether the received data shows if a target is present. Your answer is binary, yes or no. But you process analog data to make this decision. So, we include the study of the radar problem in the field of statistical signal processing. But the newer processing people don’t see it that way. I should emphasize that statistical, model-based signal processing is not necessarily only linear. The antenna array work I mentioned earlier is model-based with non-linear models. The design of special purpose VLSI digital filters is another area which brings in tools from graph theory to solve circuit optimization problems. Nowadays there are many people working on these things. For example, Jerry Mendel at USC got interested in this type of signal processing. He was a control person, and he’s moved to signal processing. Alan Willsky at MIT is another wonderful example. And there are many others.

**Goldstein:**

What is the impact of the model-based approach? Does it open up new problems that might not have been solvable? Or does it provide an alternative approach to an existing set of problems?

**Kailath:**

The answer to the latter two questions is yes to both of them. As for the first, it’s not always easy to quantify. Certainly state-space estimation is very widely used. However, one should note that very few theories get used in a pure form in practice, because the theory deals with an ideal world and the real world is different. However, the theory is the foundation for realistic real-world solutions. The fields of radar, sonar, wireless communications, image compression, et cetera, are examples of where this transfer of theoretical results to practice is accelerating.

Here is a recent example from our work in semiconductor manufacturing where we are also doing signal processing. It’s in the design of phase shifting masks. We have analytical model-based methods for doing that. Industry is just beginning to use these ideas to extend the useful life of the very expensive (approximately one billion) microlithographic equipment and to break the so-called 0.1-micron barrier. My students have just started a company to translate the theory to useful software. I think the industry is showing a reasonable amount of interest in it, because things that would take them a day or two to do, we can do in minutes.

**Goldstein:**

You mean in terms of controlling the manufacturing process?

**Kailath:**

No, making a design to achieve a certain goal. Here’s an example. You have a wafer on which you want to imprint a certain pattern of connections of wires and transistors and all that. For that, you put photoresist on the wafer and you shine light through a mask, and expose a desired pattern on the wafer. Then you etch away the exposed photoresist. But, suppose you want a certain pattern on the wafer. If you had a perfect source, a pure sine wave of light at one frequency, and a perfect lens to exactly focus your light, then you could just build a mask that was the same as the pattern on your wafer, shine the light through it, and you’d have the same pattern on the wafer. But the light is not perfect and the lens is not perfect. If you make a mask with a certain pattern of lines, what you’ll get at the wafer, because of diffraction and other imperfections, is another pattern. So two lines may be touching because of the defects of the optics, whereas you want them separate. To avoid this, you have to move them sufficiently apart, whereas you want to get them even closer, e.g. from the 0.25 μm we have now to say below 0.10 μm. To do this, you have to solve an inverse problem. You want a certain intensity pattern at the wafer. You’ve got to design something that has intensity and phase, which is the mask, to be different from the intensity pattern that you want on the wafer, so that when the light goes through the imperfect optical system and then through your distorted mask, it gives the correct pattern at the wafer. Well, that’s all signal processing.

**Goldstein:**

Right. You can think of the imperfections as a filter, and so what you have to do is work backwards and come up with a signal that accounts for the imperfections.

**Kailath:**

Right. The point is that this is a difficult nonlinear inverse problem. We have some approximate analytical solutions to it, whereas what’s done in industry has a little theory and a whole lot of trial and error. But model-based signal processing allows us to get very fast solutions.

**Goldstein:**

In tracking the rise of model-based signal processing, you’re saying it’s the mid-’70s?

**Kailath:**

Yes, in the sense that more papers (especially on array processing) began to appear in the traditional signal processing journals, especially the *IEEE Transactions*. But as I said earlier, the actual origins could be traced back to Norbert Wiener’s work in the 1940s, to the WWII work in radar, et cetera.

**Goldstein:**

Let me go back to the Speak & Spell toy, which I understand had a big impact.

#### The Speak & Spell toy

**Kailath:**

Yes. But, you know, that’s trivial stuff now, compared to the things that are happening, and that are going to be happening.

**Goldstein:**

What was so important about its impact? Did it help to close the gap between what was understood at the university level and what you actually saw on shelves?

**Kailath:**

I think some other people like Ben Gold can tell you much better. But I think the point was processing speech and synthesizing speech was very, very expensive. The vocoder was studied by the people at Bell Labs; a lot of research was done on it and on related problems. But you needed very expensive systems to work with speech. Now TI used to have (and perhaps still does) an innovation fund, and you could apply for $25,000 to do special projects. With early ICs and with the theoretical idea of LPC, some TI engineers suggested that you could synthesize a lot of different speech sounds by using a certain structure with ten to twelve numbers that you could set differently for different sounds. These are the so-called reflection coefficients of a lattice filter structure. So they were able to synthesize decent quality speech very cheaply, at least for a child’s toy. But the impact was the discontinuity in the cost of these speech synthesis devices compared to what the existing technology was. It was done by taking a little theory, which the guys knew, and the technology that was available at the time, and mixing the two. So I would say that was an early success of model-based signal processing.
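The handful of numbers mentioned here, the reflection coefficients, can be computed from a signal's autocorrelation with the Levinson-Durbin recursion. The sketch below is one textbook way to do that and is not a description of the TI implementation; the sign convention (predictor written as a weighted sum of past samples) and the function name are assumptions.

```python
def levinson_durbin(r, order):
    """Return (predictor coefficients, reflection coefficients, prediction error).

    r is an autocorrelation sequence r[0..order]; the predictor is
    x_hat[t] = a[0]*x[t-1] + a[1]*x[t-2] + ...  (sign convention is an assumption).
    """
    a = []          # predictor coefficients of the current order
    ks = []         # reflection coefficients, one per order
    err = r[0]      # prediction error power, shrinks as the order grows
    for m in range(1, order + 1):
        # Correlation left unexplained by the order-(m-1) predictor
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err
        # Order update: combine the old predictor with its reversed copy
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
        ks.append(k)
    return a, ks, err


if __name__ == "__main__":
    # Autocorrelation of a first-order process with correlation 0.5 per lag
    a, ks, err = levinson_durbin([1.0, 0.5, 0.25, 0.125], order=3)
    print(a, ks, err)  # [0.5, 0.0, 0.0] [0.5, 0.0, 0.0] 0.75
```

In this example only the first reflection coefficient is nonzero, which reflects the fact that a first-order model already explains the correlation sequence. A speech synthesizer of the kind described would use ten to twelve such coefficients per sound.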

#### Accuracy and theory in model-based research

**Goldstein:**

The thing about the model-based approach is that you need accurate models. Has that been a problem?

**Kailath:**

Yes, this has always been a barrier. A very wide range of tools has been used in model-based signal processing, as a glance at the last 15-20 years of the literature (*IEEE Transactions*) will show. The other issue is getting good models. What is good is hard to define and identifying good models from data is not necessarily easy. However, the point is that “simple” models often work remarkably well, especially when statistics (noise) is used to cover up high frequency details. It has been proved countless times over (in physics, chemistry, and engineering) that very simple models, which can be quite inaccurate, turn out to have tremendous predictive power. So you don’t really need very fancy models. Simple models will do. Often it’s not actually the model and the solution, it’s the ideas that you get from that way of analyzing a problem, which you later somehow approximate that gives you the final results. But it can take time to achieve all this. When I went to MIT in 1957 there was a big research project there that I had read about in *Popular Science* in 1949, or 1950. They wrote about how transmitting TV pictures was a very wasteful process because almost every pixel is the same in nearby pictures. Why not just send the changes? The real information is not all the masses of dots that you’re getting, it’s much less than that. So Information Theory people said, “Yes, the so-called entropy (or information) content of the TV signal is much less than the nominal information content.” To analyze this you need the statistics of images of how people move about, et cetera. It was a big project at MIT, trying to get the statistics of TV signals, or the statistics of music, so that those statistics could be used to compress the data. They gave up by the mid-sixties or so because it was too complicated to do this.

Gradually, however, the theory evolved and they found that you didn’t need such accurate models; you could adaptively improve them, so you could start with simple models. Now there are very effective techniques for compressing data, which are used in all computers.

Every time you click on a file you use something called the Ziv-Lempel (or the LZ, or LMZ) algorithm. Ziv was a classmate of mine at MIT. LZ is a data compression algorithm that doesn’t need any statistical models of the data. However, you would never have thought of Ziv’s algorithm, if you didn’t start with a model-based approach, and based on knowing all the statistics. So, the point of model-based signal processing is that you may be rescued by further development of the theory. But the theory can’t go to the most general problem immediately. You must go through a sequence of steps, gradually reducing the amount of model information that is needed.
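The claim that the algorithm needs no statistical model of the data can be seen in a small sketch. What follows is a simplified LZ78-style coder, one member of the Lempel-Ziv family and not necessarily the variant used in any particular file format; the function names are hypothetical. The coder builds its dictionary of phrases from the data as it goes, so nothing about the source has to be known in advance.

```python
def lz78_encode(text):
    """Encode text as (dictionary index, next character) pairs, learning phrases on the fly."""
    dictionary = {"": 0}
    tokens = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending a known phrase
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Leftover phrase is already known; emit it as (prefix index, last character)
        tokens.append((dictionary[phrase[:-1]], phrase[-1]))
    return tokens


def lz78_decode(tokens):
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    pieces = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)


if __name__ == "__main__":
    message = "abababababab"
    encoded = lz78_encode(message)
    print(len(message), "characters ->", len(encoded), "tokens")
    print(lz78_decode(encoded) == message)
```

Repetitive input yields fewer tokens than characters because longer and longer phrases are recognized as the dictionary grows, which is the adaptive behavior described in the interview.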

**Goldstein:**

Were there any physical systems that resisted modeling? You said before "the ionosphere, who can model that?" It sounds like now you’re saying that’s not really a problem. Because you create this simple model.

**Kailath:**

There are many processes in nature. There are models of the molecular level, which are very complex; they use all the material flows and connections. Such models can be very useful for simulating the processes. Then there are models for control which are much simpler. They are so-called black-box models. In other words, control engineers model aircraft using second and third order systems, while from a physical point of view an aircraft is a millionth order system. But in control one defines certain limited objectives, and to achieve those, simple models suffice. Of course, there are problems where modeling hasn’t reached an adequate level of sophistication and a lot of manufacturing processes are like that. But as we look further into them, we find that we can make useful simple models of some of these things.

The trouble is there are culture gaps between different people. We found, for example, that when we went to really work with the people in integrated circuits, their culture is very, very different. I have some slides that show that. On one it says "Electrical Engineering," and there’s a big line through the middle of it. On one side is the math- or systems-based approach, which deals with circuits, signal processing, control theory and information theory. On the other, physics-based side are lasers, integrated circuits, solid state theory, and so forth. We found, at least in integrated circuits manufacturing, they don’t seem to make much use of modeling. Physics is used to invent a processing device. Then they collect masses of data. They have what they call response surfaces and they figure out operating points by trial and error. I am, of course, grossly over-simplifying. But it takes them a few hours to get those settings, by trial and error, and you waste a lot of wafers in the process of getting to them. Whereas in that problem, we made some models, and using them we could get to these settings in ten or fifteen minutes with five wafers, as opposed to a hundred wafers and five hours.

**Goldstein:**

I wonder what people would say is the driving force behind creating these models? It sounds like it’s not quite something that would come out of the university or academics, because it’s very applied, very focused on specific problems. I don’t know if that’s an accurate characterization.

**Kailath:**

No, it’s not. I think what happens is that the work that is done in the universities just assumes models of different kinds, often without knowing how they arise from physical problems. In control theory, one has equations with certain matrices, A, B, C. For academics it's just letters that go into these matrices. In industry it has to be numbers. You’ve got to work hard to find these numbers. So a lot of model identification from data is actually done in industry. But, many of the theoretical techniques for doing that came from research in the universities.

**Goldstein:**

That’s why it’s not so strange you got into model identification.

**Kailath:**

Right. Now, we did some theory on that too. We used that theory for these semiconductor applications. We received an IEEE prize from the Transactions on Semiconductor Manufacturing, which is one of the prizes I am proudest of, because I’ve got all the awards from the Signal Processing Society. To have a different group recognize outsiders was important to me, particularly given that it was an applied paper. We used signal processing theory, and control theory, to solve their problem. And they appreciated it.

#### Student interest in signal processing

**Goldstein:**

Another question I wanted to get to had to do with your students. I’m hoping that it will be a measure of where signal processing is to evaluate what it is that the students know about and what they are interested in. Looking back to the '70s, when you first started taking students in this area, can you recall what their training was in, in signal processing, and what they wanted to work on?

**Kailath:**

Our lab was called the Information Systems Lab. And my colleagues called our students "ISL types." These are students whose interest is using mathematical techniques to solve electrical engineering problems. Though, as you know, electrical engineering is very broad. People do physiology and call it electrical engineering. Basically, we take ISL types and we look for a problem.

**Goldstein:**

You are reading from the RLE special issue, right?

**Kailath:**

Yes. I said there, “The process of research is always a challenge. For example, how does one enter a new field, such as VLSI design and semi-conductor manufacturing, from scratch? It’s certainly a test of what we profess to teach. It’s also challenging to take an inexperienced student and have him or her cross the threshold as an independent investigator in four or five years.” The students come as raw material. They are willing to work on anything. First of all, you must encourage that confidence, and say that you’ve got to learn a lot of new things, but of course you can never learn anything completely. You must learn the process of asking questions and doing your own investigations. I think ISL students have a certain inclination to doing this mathematically. That’s the difference from other branches of EE. Such students gravitate to this lab, and we take them and work on these different projects. They don't have any more particular knowledge.

I’m very proud of my students at Stanford. They are very good that way. I’ve had seventy Ph.D. students, and over thirty post-docs. This book, published on the occasion of my sixtieth birthday, has a list of them. You can look and see where they are, and what they are doing. That will give you an indication. At the beginning many of them went to universities, or to IBM or to Bell Labs. But now a lot of them are off in industries, and some in their own companies.

#### Kalman filtering

**Goldstein:**

The other thing I wanted to ask you about was the Kalman filtering, which you mentioned before, but I don’t have a good picture of when that enters the scene, and what impact it has, as it’s absorbed.

**Kailath:**

I’m a big student of Kalman filtering. That came on the scene in the 1960s. Theory and research lead you to explore certain problems. Kalman, as a student at MIT, began to use this different way of studying control systems, called state-space methods, which were opposed to the then universal frequency domain methods. However, state-space methods are not that new, because mathematicians had been using these techniques, and in fact, the Russian engineers had been using them. But Kalman’s mission was to systematically use state-space methods in many fields of engineering.

He told me, for example, that Wiener filtering, which is sort of non-parametric, was running into certain roadblocks, and could not solve certain classes of problems. Kalman wondered what would happen if you formulated these problems in state-space form. He got a nice result. The solution also had a state-space form, and that was the Kalman filter. He did this in 1958-59, with no application in mind. But 1958-59 was when Sputnik appeared and suddenly there arose a big interest in tracking spacecraft. They started with Wiener filtering techniques, which were not appropriate for these problems, because they were non-stationary and the classical Wiener methods are not applicable.

So, lo and behold here was a theory which appeared on the scene just when some people needed it. Some people at the NASA Ames Lab in Mountain View had to solve the problem of tracking satellites. The Kalman theory came at the right time, and it was just right. For spacecraft, one can get reasonable models, inaccurate but adequate.

I got into Kalman filtering theory through feedback communications. In such problems, one needs recursive updating of estimates. For example, in satellite tracking, California sees a satellite for ninety minutes, then nobody sees it until it crosses somewhere in Australia, and then you get another ninety minutes, and then someone in South Africa gets another ninety minutes. So you’re constantly updating your location estimates. After a few passes around the globe they know pretty accurately where it is. These are the techniques of Kalman filtering theory. But, its main contribution on the theory side is that it introduces state-space structure into estimation theory. Structure is the most important thing in mathematics. Mathematics is the study of structure. Kalman filtering is a great example of what you can do when you use state-space structure. For example, Kalman showed that not only could he solve estimation problems nicely, he could also solve certain circuit problems. We find we can solve certain signal processing problems. I have introduced a different kind of structure called Displacement Structure, which can co-exist with state-space structure.
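The recursive updating described above can be sketched in its simplest scalar form. This is a minimal illustration under strong assumptions: a single unknown quantity that stays constant (process variance zero by default), noisy direct measurements of it, and made-up numbers. It is not a spacecraft tracker, and the function names are hypothetical.

```python
def kalman_update(estimate, variance, measurement, process_var, measurement_var):
    """One predict-and-correct step for a scalar state observed directly."""
    predicted_var = variance + process_var              # uncertainty grows between looks
    gain = predicted_var / (predicted_var + measurement_var)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * predicted_var         # uncertainty shrinks after a look
    return new_estimate, new_variance


def track(measurements, initial_estimate=0.0, initial_variance=1000.0,
          process_var=0.0, measurement_var=1.0):
    """Fold the measurements in one at a time, keeping only the current estimate."""
    estimate, variance = initial_estimate, initial_variance
    history = []
    for z in measurements:
        estimate, variance = kalman_update(
            estimate, variance, z, process_var, measurement_var
        )
        history.append((estimate, variance))
    return history


if __name__ == "__main__":
    # Four noisy looks at a quantity whose true value is assumed to be 1.0
    for estimate, variance in track([1.1, 0.9, 1.05, 0.95]):
        print(f"estimate={estimate:.4f} variance={variance:.4f}")
```

Each pass needs only the previous estimate and its variance, not the whole measurement history, which is why the recursive form suits problems where data arrive in separate batches.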

#### Displacement structure

**Goldstein:**

You had just mentioned your work with displacement structure. Could you tell me about that? How did that get started?

**Kailath:**

Yes. I’ve been saying from time to time that there is non-model-based signal processing, which is largely the application of Fourier theory. Then there is what I call model-based, which is largely the state-space and Kalman type theory. The displacement structure fits in-between. It’s related to the LPC kinds of structure. It came up from two different points of view. One was studying certain nonlinear Riccati differential equations, which arose in the Kalman filter theory. But, those differential equations are related to some linear equations called Wiener/Hopf equations, which I knew from Wiener’s theory of prediction. They are related. The Wiener/Hopf equation itself came up in astrophysics. So by putting some of those things together, we isolated a concept in the Kalman filter of displacement structure, and then also for matrices, which came up in the LPC discussion as I said earlier. In LPC you have to solve equations involving a matrix with a particular kind of structure called a Toeplitz structure.

That structure occurs a lot, and when it’s present, you can use it to great effect and simplify the solution. But suppose you take two structured matrices and multiply them together; it turns out they lose that structure. Now, you need to do such things in applications: you have to multiply matrices or you’ve got to take inverses of matrices and so on. Briefly, many things one does in the course of massaging so-called data destroy any original underlying structure. For example, the “covariance” method of speech processing, for which we found a fast algorithm in our first paper (in 1977) in the Signal Processing Transactions. The Toeplitz structure that people loved was destroyed in the covariance method. We pointed out that the product of two structured matrices may not look structured in the original way. But it is certainly not arbitrary; it must have some structure. Another example is taking the inverse of a matrix, but after all, even though it looks unstructured when you take its inverse, it’s structured in some sense. And that sense is displacement structure. I’ve been working, in addition to several other things, on that theory, for a long time. In fact, I’m going to Israel next week to give a set of lectures to the math department on displacement structure theory. Displacement structure is something that’s grown, and we have found it a very powerful tool for speeding up algorithms in control, communications, signal processing, and mathematics. I like the interplay of these ideas from different fields, and I tend to work in many of these fields still, to varying degrees. It’s my secret weapon!
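The sense in which a product of Toeplitz matrices is still structured can be checked on a small example. The sketch below assumes one common definition of displacement, the matrix minus a copy of itself shifted one step down the diagonal, and uses arbitrary illustrative entries; the function names are hypothetical. A Toeplitz matrix then has displacement rank at most 2, and the product of two Toeplitz matrices, although no longer Toeplitz, has displacement rank at most 4.

```python
from fractions import Fraction


def toeplitz(col, row):
    """Square Toeplitz matrix from its first column and first row (col[0] is the diagonal)."""
    n = len(col)
    return [[Fraction(col[i - j]) if i >= j else Fraction(row[j - i])
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def displacement(m):
    """m minus its copy shifted one step down the diagonal (one common definition)."""
    n = len(m)
    return [[m[i][j] - (m[i - 1][j - 1] if i > 0 and j > 0 else 0)
             for j in range(n)] for i in range(n)]


def rank(m):
    """Exact rank by Gaussian elimination over rational numbers."""
    rows = [r[:] for r in m]
    rk = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][c] / rows[rk][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def is_toeplitz(m):
    n = len(m)
    return all(m[i][j] == m[i - 1][j - 1] for i in range(1, n) for j in range(1, n))


if __name__ == "__main__":
    a = toeplitz([2, 1, 0, 0, 0], [2, 1, 0, 0, 0])
    b = toeplitz([1, 3, 0, 2, 1], [1, 4, 1, 0, 2])
    p = matmul(a, b)
    print(is_toeplitz(a), rank(displacement(a)))
    print(is_toeplitz(p), rank(displacement(p)))
```

Fast algorithms of the kind mentioned in the interview exploit this low rank; the sketch only demonstrates that the rank stays small, not how to use it.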

**Goldstein:**

The thing that is interesting is that while it’s certainly apparent that the interplay matters, it almost seems accidental. The way you’re describing some of these things, it’s almost accidental which ideas will be picked up, and cross-fertilized.

**Kailath:**

Yes, there is luck involved. In all research there is a lot of luck and accident. I feel I’ve been a very lucky person in many of these things. As I said, I never worried about tomorrow. But I’ve also not made discontinuous jumps. Some people live all their lives and do only non-parametric signal processing, and some people only do control, and they are quite happy. I feel differently; I welcome change. Some people may regard it as discontinuous, but I don’t. The control I did had connections with communications. The communications I did later had connections with control. And so on. So I regard all of them as having some threads in common. I’ve always enjoyed making connections between different fields.

### IEEE and publication

**Goldstein:**

Does one need to be careful in choosing where to publish a paper, particularly since there is so much overlap in fields?

**Kailath:**

Yes. It can make a difference to the reception of a paper. I have tended to publish largely in IEEE journals. They are all carefully refereed, which is an advantage. It slows down the publication process, but everyone profits, both the writers and readers. Sometimes we have had papers whose impact may have been a little less, because, say, we published them in the Information Theory transactions rather than in Communications or in Signal Processing. But it’s hard to say.

As a former president of the Information Theory Society, and a member of the administrative committee of the Control Society, I’m well aware that the most rapid growth has been in Signal Processing. Today, Information Theory is one of the smallest groups, with 5,000 members. In the mid-’70s, Signal Processing only had about 5,000 members. But now it has over 20,000. It has grown very rapidly. But I think you will find that there are people, you know, especially the model-based people, who belong to the Control Society as well as the Information Theory and the Signal Processing Societies. But, that number is small. Much smaller than 20,000. But, you know, we don’t want to spread the word too much, working across fields is our secret weapon.

**Goldstein:**

Can you give some sense of what goes through your mind when you’re weighing publishing in the Transactions of the Signal Processing Society, versus one of the others?

**Kailath:**

I’m not sure that I can answer that. Largely I think it’s the potential audience. The Transactions is undoubtedly the most prestigious signal-processing journal. Some of the stuff we publish in math journals, because the subject is oriented to math. We try to write them more for mathematicians. The percentage breakdown of my papers has been about 15 percent each for Information Theory, Automatic Control, Signal Processing and Math. Then there are a few others, especially in recent years, as we have worked in semiconductor manufacturing.

**Goldstein:**

Thank you very much.
