Oral-History:Karen Spärck Jones
About Karen Spärck Jones
Karen Spärck Jones was born on August 26, 1935 and grew up in Huddersfield, England. She attended the University of Cambridge, earning a B.A. degree and a Ph.D. degree. Spärck Jones joined the Cambridge Language Research Unit, where she received her introduction to computing, in 1957. Moving to the University of Cambridge Computer Laboratory after a few years, Spärck Jones continued to work on natural language processing and information retrieval for the next five decades. Among her most important contributions is the concept of inverse document frequency (IDF) weighting in information retrieval. Introduced in a 1972 paper, this technique is now a standard feature in Web search engines. Although Spärck Jones retired as Professor of Computers and Information in 2002, she continued to work in the Computer Laboratory until shortly before her death on April 4, 2007.
Apart from her own work, Spärck Jones consistently promoted research in her field, both within the United Kingdom, such as in her Alvey Coordinator role in the 1980s, and internationally, most notably as President of the Association for Computational Linguistics (ACL) in 1994. Moreover, she was a Fellow of the AAAI, ECCAI and the British Academy, of which she was Vice President from 2000 to 2002. Spärck Jones also received many awards for her work, including the ACL Lifetime Achievement Award, the Lovelace Medal of the British Computer Society, the ACM SIGIR Salton Award, the American Society for Information Science and Technology’s Award of Merit, the ACM-AAAI Allen Newell Award and the ACM Women’s Group Athena Award.
In this interview, Spärck Jones talks about her distinguished research career. After briefly outlining her educational history, Spärck Jones discusses her introduction to computing at the Cambridge Language Research Unit. She explains how she became interested in natural language processing and information retrieval while working there, as well as how these research interests developed over the next four decades. Here she describes each of her major research projects from the 1960s, 1970s, 1980s and 1990s. In addition, Spärck Jones discusses her nontraditional career path, including her lack of formal training in computer science. She talks about being at the Computer Laboratory's margins for much of her career, describing both the disadvantages and advantages of her position. In reflecting upon her own career, Spärck Jones also comments on more general topics such as the status of women in academia and the evolution of the computing field.
About the Interview
KAREN SPÄRCK JONES: An Interview Conducted by Janet Abbate for the IEEE History Center, 10 April 2001.
Interview #593 for the IEEE History Center, The Institute of Electrical and Electronics Engineers, Inc.
Copyright Statement
This manuscript is being made available for research purposes only. All literary rights in the manuscript, including the right to publish, are reserved to the IEEE History Center. No part of the manuscript may be quoted for publication without the written permission of the Director of IEEE History Center.
Request for permission to quote for publication should be addressed to the IEEE History Center Oral History Program, IEEE History Center, 445 Hoes Lane, Piscataway, NJ 08854 USA or ieee-history@ieee.org. It should include identification of the specific passages to be quoted, anticipated use of the passages, and identification of the user.
It is recommended that this oral history be cited as follows:
Karen Spärck Jones, an oral history conducted in 2001 by Janet Abbate, IEEE History Center, Piscataway, NJ, USA.
Interview
INTERVIEW: Karen Spärck Jones
INTERVIEWER: Janet Abbate
DATE: 10 April 2001
PLACE: Spärck Jones's office at the University of Cambridge, Cambridge, United Kingdom
[Notes courtesy of interviewer Janet Abbate]
Background and Education
Abbate:
It’s April 10th, 2001. I’m speaking with Karen Spärck Jones. To start out, can you tell me when and where you were born?
Spärck Jones:
I was born in Huddersfield, in Yorkshire, which was a textile manufacturing town, in 1935.
Abbate:
You grew up there as well?
Spärck Jones:
Yes, I spent nearly all my childhood there. I went and lived somewhere else for two or three years during the war, because my mother was Norwegian, and she went and worked for the Norwegian government-in-exile in London; and my father was a Lecturer at a technical college and he was also doing evening work, and he couldn’t look after me; so I went and lived with a family in the country. So that was not completely standard; but basically I lived there all my childhood.
Abbate:
What did your father do?
Spärck Jones:
He was a chemist. He’d worked for quite a long time in the technical college as a chemistry Lecturer, and he’d worked in industry, before, as a chemist.
Abbate:
Did he encourage you to have an interest in maths or science?
Spärck Jones:
He was very concerned that I get a good education. My mother was, too, although she’d had a rather uninspired sort of education they gave girls in Norway, in Bergen, in the beginning of the twentieth century; it was not a very advanced education, and she never went to university. They both believed in education, and my father was very keen that I should go to university. I liked school; I was interested in academic life and things like that. I wanted to go to university, and he was very keen on it.
I did a whole range of subjects at school—I mean in science. I was on the science side, because you had to choose in U.K. schools, then; you were either on the arts side or you were on the science side, so I chose the science side. But I got a good range of humanities subjects as well. I wasn’t terribly good at maths, because it wasn’t very well taught. That’s been a regret to me.
Abbate:
What sort of secondary school did you go to?
Spärck Jones:
It was what’s called a grammar school. It was a select, competitive-entry secondary school. That was from age 11 to age 18, so I was there for seven years. We had major exams at the end of five years, and then two years in the so-called sixth form, in which you were doing advanced specialist work, so I was only doing three subjects then.
Abbate:
Was that all girls, or was it mixed?
Spärck Jones:
It was girls. It was a completely classical type of school that had been set up by legislation at the beginning of the century—and they were good schools: they taught well, and they had a good style of doing things, and so it was okay; I liked it! As I say, my parents were very keen that I should have a proper education, and encouraged me in thinking about things like wanting to go to university. I was an only child. My father was older, because his first wife had died, and so he’d married mother as his second wife. He was very pleased: you had to take a competitive examination to come to Oxford or Cambridge then, and I think he was very excited that I actually got in to Cambridge—because my school, unlike some of the private schools, which trained people to come to university, didn’t train people to go to Oxford or Cambridge. People went to university, but mostly on the basis of the ordinary exams that they sat at the end of their sixth-form life at school: A Levels, the main state exam. But Oxford and Cambridge then had their own specialized entrance exams, so you had to do them. I’d said I wanted to come to Cambridge from about the age of 12, and I did!
Abbate:
What made you want to come to Cambridge?
Spärck Jones:
I don’t know; I just said, “I want to read history.” My father, I think, was disappointed I didn’t want to be a scientist. But I said, “I want to read history at Cambridge,” and I just set about doing it!
Abbate:
So your degree was in history?
Spärck Jones:
That’s right. The degree subject you had here then was extremely specialized; I did virtually nothing but history for three years. That was the way you did it. In the sixth form at school I’d read history, English, and French as the three main subjects, and then, [at Cambridge] I read history—with little bits of things like political thought round the edges, but essentially it was a one-subject honors degree.
Abbate:
When did you graduate?
Spärck Jones:
I graduated in 1956. The women’s colleges in Oxford and Cambridge then were single-sex colleges, and the university was almost entirely male. The women’s colleges had been set up in the nineteenth century as part of the movement for higher education for women, and when I was here there were two women’s colleges, Girton and Newnham, and the ratio of female students to male students was about one in nine. It basically was a masculine university with two women’s colleges sort of stuck there. The women’s colleges, although they’d started in the nineteenth century, had only become fully autonomous, independent colleges—able to entirely govern themselves and get full degrees and everything like that—in 1948, I think it was. They’d been around for a long time. Initially they weren’t even allowed to take examinations or anything; then they slowly crept forward, and finally they were recognized as fully autonomous. But there’s been huge changes since then, because in the ‘60s they founded graduate colleges, which were mixed from the start; and then the undergraduate colleges decided that they were going to go mixed, too; and so now virtually all the colleges are mixed, so the actual balance of the sexes is much better in the university.
But I have to say, I think there was something to be said for being in an all-girls’ school and being at an all-girls’ college. You were very serious about your education. It was a privilege to come to Cambridge; you didn’t mess around. This was something that you were grateful that you’d actually succeeded in doing, so you took it very seriously.
Abbate:
Were there many women doing science degrees at the time?
Spärck Jones:
Oh yes, there were. I had some friends among the scientists. As far as I know, there were probably as many people doing sciences as humanities, but I don’t really know the numbers. There certainly were a lot of scientists, and I knew some of them.
Introduction to Computing: The Cambridge Language Research Unit
Spärck Jones:
Essentially I got into computing completely by accident. After I’d finished my bachelor’s degree I got interested in philosophy, and so I asked to stay on to read the third year course of the philosophy tripos. (We call it a tripos here; that’s what your degree subject is: your tripos.) I wanted to read what was then called the “moral sciences tripos,” which means philosophy. Originally, you see, there was a contrast: there was natural sciences and moral sciences. Natural sciences were what you might think, and the moral sciences were all the other things. Anyway, things like history and such broke away, and so what was left of moral sciences was philosophy. Anyway, so I read philosophy for a year. I enjoyed it very much; I don’t know that I was very good at it, but I really enjoyed it.
And then I went away and was a school teacher—because there wasn’t very much that women could do back then. I’m envious of younger people now, because I think they’ve got far more opportunities. When I was looking for a job, you could be a school teacher, or if you wanted to go into industry—except if you were technically qualified, in which case you might work in a lab—you could be a secretary or you’d go into personnel. There was no route to management. The idea that you might run anything as a woman was completely alien; it really was. Industry was totally anti-women. Quite a lot of people tried to get into the Civil Service; that offered quite a good career opportunity. Of course, some people went into the law, but not many. When I was in my fourth year, there were two women law students amongst hundreds of men, and they were very isolated. And basically, a lot of people went into teaching. Many of my friends went into teaching, because that’s what you did. I didn’t think I was particularly suited to do teaching, but I thought, “Well, I’ve got to earn a living!” Because my mother was a widow; my father had died in my first year, so my mother was a widow and not at all well-off, and she was living abroad, and so I had to find something to do. You know, you have to earn your living. So my director of studies in college said—I think it was completely irresponsible, actually!—she said, “Well, [there’s] a school that one of my former pupils was in; she’s just left the school, she’s going to have a baby or something, and they’re looking for a teacher. Would you like to go along?” It was a school like the sort that I’d been educated in myself. I think it was completely bad to send somebody into school without any proper training or anything, just because she’d come from the right college! So, I think that was actually not very good. I didn’t like the teaching very much. It wasn’t something I wanted to spend my life at.
And then I was just lucky—I’ve had some things that were lucky in my life. I already knew Roger Needham, whom I later married; he read maths here and then philosophy. In fact, I’d known him right from my first year, just as part of a group of friends. He’d begun to interact with a little research unit that had started up in Cambridge, which was completely outside the university, because the university didn’t like attached research units. It didn’t believe in things like that; it didn’t want to have things clustered around it; I think it thought it might be invited to take them over, and didn’t want to commit itself to doing this. So there was this little body called the Cambridge Language Research Unit, which had been started by Margaret Masterman, who was the wife of Richard Braithwaite, who about that time was elected one of the philosophy Professors. He was really an analytical and logical philosopher. She’d had a rather exotic career, and she’d got interested in language. Really, it’s slightly surprising: she’d formed some ideas about the basis upon which you might do translation—because in the ‘50s, work had begun on machine translation. (A very interesting book has been published recently, edited by John Hutchins, of memoirs of these pioneers at the time.) She’d got interested in translation and in the idea of doing automatic translation. She and her colleagues were initially in a kind of discussion group—it was an ad hoc discussion group, with people who were actually earning their main living in some other way, but they came together to talk about the problems of language translation, and the representation of meaning, and things like that. Some very well-known people were involved, like Michael Halliday, who is a well-known linguist. Anyway, they put in some papers for a conference on machine translation in the U.S. in ‘56, and they were accepted, and they wowed people with their ideas, and so the National Science Foundation gave them a grant—because the NSF could give grants externally then. So then, of course, they could start actually employing some people to work on machine translation.
Roger was reading—We had a diploma in computing; it was called the Diploma in Numerical Analysis and Automatic Computing, at that stage. We started it in ‘53 here in the lab, and as far as we know it was the first course on computing that there was in the world. It was very early, and it’s still running; it’s a very successful one-year course. Roger had got interested in computing, and so he took that course. He took that course one year, from ‘56 to ‘57, while I was doing Part Two Moral Sciences. I would come to Cambridge at the weekends when I was teaching, and I’d go in to find out what the Language Unit was doing, because I thought that they were rather interesting; and Margaret—who was a very strange and interesting woman, rather exotic—said, “Why don’t you come work for me?” So, I said, “All right! Why not?” This was Margaret Masterman; she would always refer to herself professionally as Margaret Masterman, even though she was actually Margaret Braithwaite.
Abbate:
So when did you actually first use a computer?
Spärck Jones:
Well, I didn’t use it directly, you see. I began to work at the Language Unit, and we began to work on how to do machine translation, and also we got interested in information retrieval; because we said that a thesaurus is a way of representing concepts, which can be used both for translation purposes, as a kind of interlingua, and also as a kind of classification structure for information retrieval—because people use thesauri for information retrieval. And they were built manually, and we said, “We think they could be built automatically.” So we were working on these kinds of things. I was in particular interested in the idea of building a thesaurus automatically—primarily, at that stage, in connection with machine translation.
So anyway, what happened was, we did computational experiments, but we didn’t have our own computer. This Language Unit had this funny little establishment out in the suburbs of Cambridge that didn’t actually have computers; it had punch-card machines, redundant punch-card machines that were given them by British Computers and Tabulators. So we were doing computing, but in this rather primitive mode.
Roger had already begun to work on his doctorate, and he was using the University’s EDSAC II to do experiments in information retrieval: automatic classification to do things like constructing thesauri automatically, for retrieval purposes; and he did some other things as well for his thesis. He got his Ph.D. in the early ‘60s. And then I was working at the Unit, and I did all kinds of things for the Unit, but at the same time I was registered as a Research Student, and I began to work on my own work. Actually, my first contact with computing was via Roger, because he ran the experiments that I wanted to do. We used his programs—my data was an application for his programs, you see. So I was able to exploit my husband [laughs] to actually run the experiments for me. He was doing experiments himself; he did experiments in automatic classification of things like medical data, as well as in information retrieval, and some other things like that. And I got interested in the particular way that you might characterize linguistic data as this kind of raw input to a clustering program, and then he would actually run the classification experiments for me.
So that was the way I was doing computing, in some sense slightly indirectly, for my thesis, which I finished in ‘63. I got [the degree] in ‘64.
Abbate:
So you would structure the problem, and someone else would code it?
Spärck Jones:
I was interacting with him and things like that, but I was using the ideas about automatic classification that he’d developed, and that he’d got programs for. What I was doing was thinking about how to provide the initial data that captured what I wanted to capture about basic facts about language—about the way words are used, in a very fine-grained way—so that you could then push that into this clustering program, to build clusters of words that behaved in the same way. I’d find a few little bottom-level facts about words, and then, on the basis of that little fine-grained low-level information about words, we demonstrated that you could build thesauric-like classes of words that tended to behave in the same sort of way, and so therefore had similar meaning. Essentially, what I was trying to demonstrate with the thesis was that there was a well-founded procedure for deriving a thesaurus—namely a way of classifying groups according to the concepts that they represent—automatically. So that was what all that was about.
Abbate:
Interesting.
Working on Information Retrieval
Spärck Jones:
But then I decided I really did want to . . . I mean, I used to come down to the lab and see what happened with the programs and stuff like that, and I decided I actually wanted to learn how to program. I didn’t think I could do the diploma, because there was quite a lot which was actually more mathematical with the content, and I couldn’t have done that because I’d stopped doing maths at 15, so I hadn’t really got the qualifications to do that. So what I did was, I essentially learned how to program myself, with assistance from Roger.
After I’d done my thesis, I began to work on some problems in information retrieval. There was quite a lot of interest in information retrieval. It was more easy to get money for it; you could get grants for it in a way that you couldn’t for language work.
Abbate:
Because of the technical aspect?
Spärck Jones:
I don’t know what it was. A little bit later on, the whole of language work was very badly hit in the U.S. by the ALPAC Report; this was this report that said everything that was done on machine translation was mistaken and ill-conceived. But it seemed that it was easier to get money to work on retrieval.
So what I began to do was to think about doing some experiments in classification for retrieval. It was the same kind of idea that I’d had in my thesis. My thesis supervisor was Braithwaite, but he was a rather remote supervisor; I was to go see him once a term, and I’d tell him what I was doing, and he’d sort of struggle to understand it, and he’d say, “Yes, that’s all right; perhaps you ought to read so-and-so.” But essentially my thesis was a self-driven thing; I had to really do it myself. It was quite hard work, because I was actually working as a research assistant at the Language Unit as well, and doing whatever I had to do there, which was doing some other things, you see. So it was quite complicated.
But I decided I wanted to learn how to program, because I was going to start doing these experiments, which were associated with the Unit, on retrieval thesauri—an indexing thesaurus: how you could categorize documents, you see, so you could retrieve them.
So I learnt to program, and I was learning to program for the new machine that came up in ‘64, which was the Titan, so I learned essentially to program in a language which was much like assembler. There wasn’t a higher-level language to program in. And I think it’s fair to say that the first program I wrote, which was an enormous data-processing program to take some data supplied from somewhere else and get it in [the computer] in order to work on it for these retrieval experiments we were going to do: it was probably the largest program I’ve ever written! The data came along in gigantic paper tapes that broke when you read them in, because they were so enormous, you know; they were too heavy. It was a real saga!
I went to a summer school in Oxford, which was where they had lectures on computing. They were rather high-level lectures on computing—not very practical lectures, but rather interesting lectures by some very good people—but basically, I was just learning on the job, and I became a user. This was ‘64. Then in ‘65 I got a research fellowship at Newnham, which gave me more independence; I didn’t have to work full-time for the Unit. I still had a connection with the Language Unit, but the Unit was a little bit difficult, because Margaret Masterman was a very interesting woman and very stimulating, but she was quite difficult to work with, and I wanted to have some more independence. So what I did was, I worked as a Research Fellow at Newnham and, at the same time, was responsible for running a retrieval project that was funded at the Language Unit—so I was doing both of these things. I was trying to pursue my more theoretical interest in semantic classification—synonymy and semantic classification, which had been my thesis work—and at the same time, I was doing this more practically oriented experimental work on “Can I build an automatic thesaurus for retrieval purposes?”
So I was doing those in parallel.
The research fellowship was marvelous. The women’s colleges—well, in fact, the men’s colleges, too—have research fellowships. They’re essentially for post-doctoral level, and they really are something that’s great to have, because you’re very unconstrained. You know, you write maybe half a page once a year to say what you’ve been doing. They’re very competitive; there’s an application competition, just like these sort of things are nowadays. But it was a wonderful opportunity, because it gave me a freedom and a status to do what I wanted that was very nice.
Then I was a practical computing person. The project I had for retrieval had a research assistant to do quite a lot of the heavy stuff, because some of the heavy stuff I couldn’t have done, because it was more mathematical than I could do, and it just required [someone] properly trained as a programmer—somebody who’d done a whole year’s course, and knew really how to do it. I mean, I like programming; I enjoy writing programs; but I’ve not been trained to do it, so it was slow. But I enjoyed doing it, and I thought it was useful that I could do it, so we shared the work, and I did quite a lot of it. And the way I used to work was, I had a room in Newnham—like Virginia Woolf, you know, the room of your own—and then in the lab I had a desk. There was a kind of communal user area for users who were very intensive users and hadn’t really got anywhere else much to do their computing work—because of course we had to come to the machine then; there was nothing in your office or anything like that—so I sort of colonized this desk there, along with a motley collection of other people, and had my little stacks of computer printout, and I’d be doing submission of jobs and things like that. By that stage you mostly weren’t submitting them yourself, as with EDSAC; you were putting your paper tapes into little plastic bags with a job card—and you had different sorts of job cards according to whether it was a little job, or a big job, or a very, very big job—and then you hung these plastic bags on a little rack with hooks on, and the operators would come collect them and schedule them. So that was the way it worked.
The fellowship was three years, and the project I had was three years, too, and then at the end of that I got a research fellowship, a Royal Society research fellowship. They had some fellowships, the so-called Scientific Information Research Fellowships. That was in ‘68. That brought me to the lab full-time, because if you got a Royal Society fellowship like that, you had to be accepted as an inmate of some department—you know, you had to have somewhere to work. These things are quite standard: if you have these fairly prestigious kind of fellowships, you go roost in some department, and usually the people in the department (if they’ve got space) like it, because such fellows are good to have. So I stopped my connection with the Language Unit—partly because they were working on things that I didn’t find so interesting at all by then; I think they’d moved in directions I didn’t find so interesting. I moved to the lab, and I became full-time based in the lab.
I was working on my research, and that funded project must have finished around ‘69. I wrote my book then; I wrote a book about some of the retrieval experiments I’d done. I did work by myself, and I had odd people doing bits of work for me because they were interested. I had some work done by somebody who was working as an operator at the C.A.D. [computer-aided design] center, who was completely underworked there and was interested in retrieval. He said, “I’m interested to do some experiments for you, because they really don’t give me enough to do.”
Abbate:
Wow!
Spärck Jones:
So he did this. It was quite useful; he did things for me.
The fellowship was five years, which was nice, and I continued to be interested in language, but there was simply no funding of any sort for work on what we then would have called “computational linguistics”—natural language processing. I tried to maintain an interest in it, and I went to a few meetings, and during the mid-’60s I had some papers and things, but essentially ALPAC had really done a lot of damage to that whole machine translation environment. A lot of the projects were cut. In ‘66 I went for six months to the States: Roger went to work at Rand Corporation, and I went to System Development Corporation, so I was there for six months. That was when I was working mainly on the language stuff. But basically, in the late ‘60s it got very difficult, because a lot of projects were finished. AI was starting up, and they were getting interested in language processing, but in a very different point of view; the whole field looked very different. So if you wanted to do some research, and do things that actually were quite interesting, on the whole it had to be in retrieval. There was some funding in the U.K. for work on retrieval. British Library had a Research and Development department, which funded some good projects. They were sort of manna in the wilderness to everybody else—because another problem was that the U.S. agencies had got to a point where they wouldn’t fund outside the U.S., so that was difficult. The Language Unit had had money from the [U.S.] Air Force, and from the Office of Naval Research, and all kinds of things, but it got more difficult.
Abbate:
What did you do at SDC?
Spärck Jones:
Basically, it was like a sabbatical leave. I talked to the people there about all the things they were doing; there were various people who were doing various things in language. I was actually trying to write up some of the stuff—develop some of the ideas I had in my thesis, and I was interested in applications: thinking about what semantic classification ought to be like and how you might use it for language processing and things like that. It’s hard to find a big thing that came out of it, but I was just thinking about it, you see what I mean? It turned out, in the late ‘60s, to be a rather difficult line to pursue, because that was the period in language when everybody was dead-keen on syntax; it was all the Chomsky period, you see.
Abbate:
Oh, yes.
Spärck Jones:
And so if you weren’t terribly focused on syntax, and you didn’t think that Chomsky was the greatest thing since sliced bread, nobody would really take any notice of you; so it was very difficult. If you wanted to do semantics—meaning, semantic classification, the lexicon—they just didn’t want to know. It was like a religion! [laughs]
I remember going to the Summer Institute of Linguistics, or whatever it was called, at UCLA in 1966, and Chomsky was giving lectures at it. It was just when Aspects[1] had been published, and people went round with it under their arms, and if you didn’t—if you dared to criticize or raise any questions about Aspects, people looked as if there were something wrong with you! [laughs] So there was very much a feeling that syntax was where it was.
I was actually doing a lot of things. There were a lot of miscellaneous things I did in the ‘60s. I gave some lectures, and had some interaction with people in the linguistics department, and did some classification in things like archaeology and stuff like that; there were some interesting questions there. I did a variety of things. But I didn’t really—which I was a bit sorry about—I didn’t really develop the ideas I had in my thesis and make a book out of them. I was very isolated, actually. There wasn’t anybody much to talk to. I mean, there was no . . . Linguistics in the University, at that stage, had a Lecturer in phonetics, and they had a Lecturer in general linguistics, and that was all. There wasn’t any environment; there was nothing much to interact with. Now, of course, I might have been a bit more determined and pressed ahead with writing this book, but it’s actually quite hard, in the absence of any kind of interactive environment.
So I tended to focus on retrieval, and there were some very interesting problems in there, and so all during the ‘70s, I had projects: one in the first part of the ‘70s and one in the second part, with research assistants, doing a lot of experiments on how to do information retrieval. During that period I published some stuff, with colleagues and by myself, that has really stood the test of time. There are some papers which are very much cited. That was something that went okay. And the experimental work we did: we were trying to do systematic experiments, not just have an idea and try it out in a sort of two-bit way, but actually try to do some controlled and systematic [experiments], rather like you’d do in the natural sciences. Do some proper experiments: design it properly, do the comparisons properly, do the performance measurement properly, and do all those kinds of things.
Abbate:
So you were comparing different algorithms?
Spärck Jones:
Yes, that’s right. Supposing you say, “Well, we think we’ve got index terms, and we think we might weight them according to different weighting criteria.” That’s the sort of thing, you see. And it’s actually quite demanding, because what you’re doing in retrieval experiments is: You’ve got a collection of documents, and you’ve got some users. I never dealt directly with users; this data had been gathered by other people. There was a very famous project in the ‘60s in the U.K. called the Cranfield Project, which had gathered some primary data. So users had put queries, and then somebody had retrieved some documents. In fact, in principle, if you had a small file, they would look at every document in the file and say whether it was relevant to the query or not—to the user’s initial request; relevant to the need in their head, if you see what I mean. Then what you had was: you’d got the document file; you’d got these user needs, as expressed in natural-language (English) requests; and you’d got their judgments of which documents were relevant to their needs. So what you’re then trying to do with the retrieval system is design a system that will deliver the documents that are relevant to those requests. And you choose different indexing strategies, different whatever-it-is. That’s what retrieval’s all about. That’s what all these Web search engines are about, you know: How do you index? How do you search? How do you do all of these things?—in order to deliver the documents that will satisfy the need that’s in the user’s head, and which he’s expressed extremely badly with a one-word query! [laughs]
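The experimental setup described here (a document file, user requests, relevance judgments, and competing weighting criteria for index terms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four documents, the query, and the relevance judgments are invented toy data, not Cranfield data, and the function names are hypothetical. The weighting shown is one common form of inverse document frequency, log(N / n), compared against plain unweighted term matching.

```python
import math
from collections import Counter

# Toy test collection (invented for illustration; not Cranfield data).
documents = {
    "d1": "boundary layer flow over a flat plate",
    "d2": "heat transfer in laminar flow",
    "d3": "flow around a supersonic wing",
    "d4": "supersonic wing design",
}
query = "supersonic flow"
relevant = {"d3", "d4"}  # assumed relevance judgments for this query


def tokenize(text):
    """Split text into lowercase index terms."""
    return text.lower().split()


def idf_weights(docs):
    """Weight each term by log(N / n): N documents, n containing the term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(tokenize(text)))
    return {term: math.log(n_docs / n) for term, n in doc_freq.items()}


def rank(docs, query, weights=None):
    """Rank documents by the summed weight of matching query terms.

    With weights=None every matching term counts 1 (unweighted matching).
    Ties are broken by document id so the ordering is deterministic.
    """
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, text in docs.items():
        matched = query_terms & set(tokenize(text))
        scores[doc_id] = sum(
            1.0 if weights is None else weights[term] for term in matched
        )
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


unweighted = rank(documents, query)
weighted = rank(documents, query, idf_weights(documents))

# Compare the two strategies on the same judgments: relevant hits in the top 2.
for name, ranking in [("unweighted", unweighted), ("idf", weighted)]:
    hits = len(set(ranking[:2]) & relevant)
    print(name, ranking, "relevant in top 2:", hits)
```

In this toy collection the rarer term "supersonic" outweighs the common term "flow", so the weighted ranking places both relevant documents first, whereas unweighted matching lets a document that matches only "flow" tie with one that matches only "supersonic". A real experiment would repeat such a comparison over many requests and several collections.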
Abbate:
Those search engines don’t work all that well.
Spärck Jones:
Well, it’s not surprising. The fact is, when people say they don’t work very well, in part it’s not surprising, because the web is full. It’s got 1.3 billion pages, and an awful lot of it’s dreck, and if you put in a one-word query, it typically is not going to work very well. It’s just a fact of life. I mean, if I put in the query, “weather,” or even “Clinton”— even a name—there’s tons of material there. I do searches for my I.R. [information retrieval] lectures; I do some practice searches so I can explain to students what happens when you do this sort of thing, illustrate the problems; and I was doing searches on the development of logic and George Boole: and even with four terms, relatively concrete words, you get an awful lot of stuff that really isn’t any good. But that’s just what life is like! [laughs]
You can get precision, which means most of what you get is okay, but at the cost of recall—namely, that you miss most of what you ought to be getting. That’s a kind of fact of life.
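The trade-off between precision and recall can be made concrete with a small sketch. The document identifiers and the two result sets below are invented for illustration, and `precision_recall` is a hypothetical helper name; the definitions themselves are the standard ones (precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that were retrieved).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's retrieved set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Ten documents are assumed relevant to the user's need.
relevant = {f"rel{i}" for i in range(10)}

# A narrow search: only two documents come back, both relevant.
narrow = ["rel0", "rel1"]

# A broad search: eight relevant documents plus twelve irrelevant ones.
broad = [f"rel{i}" for i in range(8)] + [f"junk{i}" for i in range(12)]

narrow_p, narrow_r = precision_recall(narrow, relevant)
broad_p, broad_r = precision_recall(broad, relevant)

print(f"narrow: precision={narrow_p:.2f} recall={narrow_r:.2f}")
print(f"broad:  precision={broad_p:.2f} recall={broad_r:.2f}")
```

The narrow search returns nothing but relevant material yet misses most of what exists; the broad search finds most of it but buries it in irrelevant results.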
Anyway, I was doing all these experiments, and that was a very interesting time, in the ‘70s, in many ways. We thought about how to do experiments, and I had some ideas—I did a collaboration with Steve Robertson on “relevance weighting”—which worked quite well. On the whole, I was still doing programming, but I had these two research assistants, Graham Bates and Chris Webster, and they’d been properly trained in computing. We hadn’t got a full three-year degree course, but they’d done the diploma or they’d done the third-year computer science specialist option, and they’d got maths background. So they were much more able; and also, that was their job full time—I had to do other things as well, but they were doing it full-time; and it’s much more sensible to have people like that programming for you than to do it in a slightly amateur way yourself. But I always thought of myself as a computer-motivated scientist. You know, I’m doing things automatically, because I think that’s interesting. It’s not only practically interesting to do things computationally—you know, it would be nice to have all the data retrieval systems and that sort of thing—but there are really interesting intellectual challenges about “Can you capture this process in such a way that it can be automated?” That’s why computational language processing—natural-language processing, as it’s often called nowadays—is really interesting.
Abbate:
It’s a discipline.
Spärck Jones:
Yes! It’s a way of thinking about it. I mean, it really is. You can make a generalized statement: All words in a natural language are ambiguous; they have multiple senses. How do you find out which sense they’ve got in any particular use? Well, obviously it’s got to be something to do with context; we can all see that. The question is, how does that work? That’s the thing: you want to say, “What is it? How do we characterize the properties of the words in the lexicon and the syntax of the language?” And also we have a process, which enables us to actually process this string of words in some way, so that we’ll get its syntactic structure; we’ll get its semantic structure. In the process of that, we’ll find out what are the meanings the words have. And that’s what the challenge is; it’s thinking about how you actually capture a process. It’s not merely that you’ve got to describe things in the right way in a static sort of way, but this is a process—language understanding is a process. Summarizing—and I’m very interested in summarizing, automatic summarizing: this is an amazing challenge. I mean it’s really very, very difficult. It’s very difficult—and it’s fascinating!
So I was working on these, and I worked on various things. I worked on word relations. I was always interested in the language side as well as the I.R. side; I’ve always had this connection in my mind. One of the things that was interesting was trying to figure out why things weren’t working. You know, you said, “X ought to work like this”; “You ought to be able to do retrieval in this sort of way”—and then you’d find it wouldn’t work, and “Why didn’t it work?” is a very interesting question.
I’m trying to think whether there’s anything else that was worth mentioning about the ‘70s. . . .
On Being without a Proper University Job
Abbate:
Were you in the—I don’t know if it was called the Computation Lab at that point . . .?
Spärck Jones:
It was called . . . We changed it: at some stage it became the Computer Laboratory as opposed to the Mathematical Laboratory. It started out as the Mathematical Laboratory.
Yes: what happened after my research fellowship was finished [was that] I was paid off these grants. The grants actually had a nominal principal investigator who was somebody else—the sort of front person—so then I was allowed to be paid off them, and I had the status of what’s called a Senior Research Associate. I was not a member of the faculty in the full sense—an established university officer.
Abbate:
Did you have to write the grants yourself?
Spärck Jones:
Oh, yes. I mean, it was all me; the front person was really a front person, if you see what I mean. It was just the mechanism you had to have.
I sort of became accepted here. It was partly because Roger was here; he’d had a university appointment from about ‘62 onwards. I mean, Professor Wilkes would have had to boot me out if there hadn’t been enough space; but I think he thought the work I was doing was all right, and there was enough space, so he could make a home for me. But it was a difficult position in some ways. In some sense it was useful, because Roger was working here. During the ‘70s he was made a Reader, which is a promotion from a Lecturer on the basis of personal research. I can’t remember the exact dates when it happened, but he was a member of the lab’s not-very-large staff, and he was doing pretty good work and all that kind of thing. But I did suffer from the problem that the lab was very small. If you read the history in the yellow book[2], you’ll see that they had not very many established faculty. It was expanding its teaching gradually, and things like that. But Cambridge was, in many ways, not user-friendly, in the sense of women-friendly. It was a masculine university. There were more women in the humanities subjects, but still not a lot. There were still people around who just thought that women couldn’t do it, you know. And of course, it would have been very difficult for me to compete for any actual jobs that there were, because I hadn’t got a proper background in computing. I was not a professional computer scientist in that sense; I was in a strange sort of related area.
So I was in a sort of fringe position, but because Roger was here, it was difficult for me to think, “Well, I could go and get a better and a proper position somewhere else!” It’s the usual sort of stuff. I mean, we didn’t have any children, so we did at intervals think about, “Should we move to somewhere else where we could both get a job?” We thought about it quite hard at various stages. It probably would have meant going to the U.S. Roger had been consulting there right from the beginning of the ‘70s, so we had a lot of connections there, and that’s where we infallibly would have gone; and we did look at it at intervals. But it got a bit more complicated, because of my mother, who was getting quite elderly; in the sixties she came and lived beside us, and so it all got rather difficult. On the whole, Roger was doing okay; he always had some good projects; and I was doing okay, in the sense that—I had no job security, but I was able to get these research projects that I was interested in doing. All those ones in the ‘70s were funded by the British Library R&D department. They were very enlightened. They were okay.
So on the whole, looking back on it, I think that it was difficult that one didn’t have the security and the status of having a proper university job. Nevertheless, one was able to get by. It had one upside, which was that you could do the research and didn’t have to do anything else, so if you didn’t want to teach, you didn’t have to.
Abbate:
Did you do any teaching?
Spärck Jones:
I gave ad hoc courses—lectures—intended for graduate students. It was like four or six lectures, you know, about language processing, things like that. In fact, I did them at intervals all through the ‘70s, I guess, and I began to have research students in the ‘70s. What was interesting was, I had my first research student in ‘75, and by the end of the ‘70s I’d got three or four, and they were all in language processing, because I didn’t think I.R. was a good subject for doctoral research. I had some very good students. That was fortunate.
Returning to Language Research
Spärck Jones:
But then at the end of the ‘70s I said, “You know, I’m really fed up with doing I.R., because I’ve done a whole lot of incredibly taxing experiments.” Systematic, comparative experiments are hard work, you know! So I said, “I don’t want to do any more of this. I want to get more into the language area.” Because language was coming back into the foreground again, after having been sort of disappeared; it was rising up again in the ‘70s, and people were figuring out how to do things, and it was getting kind of interesting. So I said, “Well, I want to do this; I want to do some more language research.” I’d edited a book on information retrieval experiments, which I thought had some pretty good stuff in it, so that was sort of, “Okay, that’s my lot for I.R. for a bit; I’ll do some language work.”
So I got a project. By that stage, the U.K. Research Councils were sort of vaguely recognizing that they might fund some stuff in this area, so I had a project, which was funded by the Science and Engineering Research Council, on natural language front-ends to databases, which was a popular subject then. A lot of people were working with it. They thought it was a sort of tractable application of natural language processing. In fact, it turned out to be much more difficult than we thought—but anyway, that’s what we did! [laughs] My first research student, Bran Boguraev, who now is at IBM in Yorktown Heights—he was my research assistant on that, and we had a three-year project for that.
Again, that was a thing which was fronted, and so I was able to be paid for it, too. That was an interesting project! We had a lot of fun with that.
I was continuing to have research students, and then quite a lot of things began to happen in the ‘80s. The U.K. government— You know, everybody was frightfully steamed-up about the Japanese Fifth-Generation Project . . .
Abbate:
Oh, yes.
Working on the Alvey Program
Spärck Jones:
. . . and so people said, “My God, we’d better do something.” So in ‘82, ‘83, the U.K. decided it was going to have an initiative—you know, the ground swell came up and said, “We’d better have an initiative.” It turned into what was called the Alvey Program; it was named after John Alvey, who was the chairman of it. I got involved in this because the Science and Engineering Research Council had been thinking independently, just before this, of having a so-called “specially promoted program.” It would be funding in a particular area—they wanted to have a particular push in it—and it was going to be in “intelligent knowledge-based systems,” which would cover language and a whole lot of AI-type stuff as well. They didn’t want to call it “artificial intelligence,” because there was a lot of hangover from artificial intelligence and intelligent machines and all these things being sort of bad vibes, so they called it “knowledge-based systems.” It was meant to be things that industry might use, but actually it was going to fund a lot of perfectly good theoretical research—basic research.
Abbate:
Were there a lot of people doing AI here?
Spärck Jones:
Yes, Edinburgh had quite a lot of activity, and there were other people scattered around. Not in Cambridge; we didn’t have AI activity in Cambridge; the lab was still quite small then; but, there was quite a lot of activity scattered round in the U.K.
Anyway, so there was a proposal to have this specially promoted program, and I got involved in a committee that was thinking about how this should be designed, and then that sort of got rolled into Alvey. What they wanted: Alvey had a number of sectors—very large scale integration, and various other things like that, and then they were going to have IKBS [intelligent knowledge-based systems]—and so I got taken on as a short-term contract to essentially do the dog-work to develop the background for what this intelligent knowledge-based systems sector should be. You know, what sort of thing it should be working on, and what the problems were, and all this sort of stuff. So I did a lot of groundwork on this three- or six-month project. I was kind of seconded from the other things.
That was quite interesting, and that was the beginning of a period of about ten years in which I did what I describe as heavy-duty public service. Because I was then on the Alvey IKBS Committee, which had a lot of money to spend, you see, and so we were looking at project proposals and giving grants and stuff like that; and then as well as that, within that, we had “fields,” and one of them was natural language processing, and I was the gauleiter for the natural language processing thing, which meant that I had budgets for having workshops and for generally encouraging people to do NLP, and we had newsletters and all this sort of stuff. So basically I was pushing along all the time, trying to get natural language processing work, and encourage them [to get] off the ground and come to workshops and put in grant applications and all of these other sort of things. It was quite exciting, but it was very hard work. The Alvey Program lasted for five years—’83 to ‘88—and they didn’t have a proper follow-up, but there was some follow-up program which was essentially oriented towards promoting collaborative work between industry and academia, and I was involved in some of the committee work for that. So that was running until about ‘92, I guess; ‘91 or ‘92.
Abbate:
Was the Alvey Commission paying you to do this?
Spärck Jones:
No. Well, they paid me to do that particular sort of preparation job for the original program. No, no, no! All of this stuff is just public service; it’s the way you do it. It’s like being on NSF committees or something; you get your expenses paid. You just do it.
Abbate:
So you were just doing that in your spare time?
Spärck Jones:
That’s right, yes. You think it’s a worthwhile thing to do, and so you do it. And looking back on it, I think, “My god, I really worked hard on some of that stuff!” But it was kind of fun, because there was actually quite a lot of money around, and it was like manna in the wilderness for people who were already being battered by Mrs. Thatcher’s attacks on the universities, and no funding, and things like that. And here was something where you could actually . . . And people from different areas, and the whole of AI and language processing—they got together, and there were these joint workshops, as well as just NL ones. It was actually quite a positive time. I interacted with quite a lot of people in AI that I’ve not seen very much of in more recent years, and things like that.
But the NL: we had some grants for getting natural language processing tools. We said, “Look, it’s idiotic that everybody should build their own parser, and build their own grammar; you don’t do it very well on a small grant.” So we had some tools projects, which were for the community. We got a morphological analyzer, and a parser, and a grammar together, and a lexicon environment, and things like that; and they were tools that the whole community could use! They were good.
There’s another aspect of this which was very nice, which was that the lab was able to expand—in fact, other universities were, too—because they decided to have some earmarked money to boost the staffing of U.K. universities. Roger had become head of department here; Maurice [Wilkes] retired in ‘80, and Roger had become head of department then, and the lab had got, I don’t know, eight or ten staff—and he got five new posts! Five! Five posts to take people on board. It was really great! It was a wonderful time. It’s mentioned in the history of the lab, you know. It was a wonderful time, because you could do things, and you weren’t having somebody looking over your shoulder all the time to see whether you were A) doing the right thing and B) doing it in a cost-effective way. [laughs] So, it was a very exciting time.
But then what we decided to do was, we thought that the time was right for . . . This was an initiative, actually, originally taken by Frank Fallside, who was a Professor in Engineering; he’s dead now, unfortunately. They do speech work in Engineering—they’d always done speech processing; I think it just grows out of signal processing—and he said, “Isn’t it about time we got together and had a master’s course in computer speech and language processing?” Because the Research Council was also interested in funding courses. You know, “Let’s improve the manpower supply issue, so let’s have courses”—all this sort of game. And so we got together a joint course on computer speech and language processing. They couldn’t fund it the first year, because they said “It’s not a conversion course (or something); but we like it so much that we’ll try and fix things so that you can have it.” So the next year, we got it, and that was very good, because we were able to get posts—both in engineering and here—dedicated to natural language processing. So we hired Steve Pulman here (Steve is now a Professor at Oxford) and Steve Young in the Engineering Lab. The course began in 1985; we had the first intake of students in 1985. It’s a one-year, full-time master’s course in computer speech and language processing, and it’s always gone down very well. We have very good students—they supply us with R.A.s and research students—and it’s always had a very good press. It has people from abroad; it’s not only the U.K. It’s really good.
A Range of Projects in the 1980s
Spärck Jones:
So that was the first time I began to do any significant amount of teaching. I was doing some AI courses jointly with some other people in the lab in the early ‘80s, because we thought we ought to have a bit more AI, and so I actually managed to get together a lecture on robotics, and those sort of things. I quite enjoyed that; it was fun, you know! But then, with the M. Phil., I was lecturing on it—we had Steve, and I was lecturing on it, and I had to teach properly for that: examine, and admit students, and all of those sort of regular things.
But at the same time, during that period—I still hadn’t got a faculty position—from ‘83 to ‘88 I had a fellowship funded by GEC. GEC is General Electric Company; they used to be a humungous company; they were a mega-company. They’ve become less “mega”—in fact, they’ve disappeared now; they’ve broken up. But basically, they were a bit like—they were the U.K.’s General Electric. And I’d got to know the director of research, and he seemed to think that I was a sufficiently good thing that when I hadn’t got anything to live on he came up with a fellowship for me. That was quite handy. So I was doing research, and then I had some projects that were follow-up to the database front-end thing.
[TAPE 1, SIDE 2]
Spärck Jones:
So I had a project also which was doing some rather interesting things with language processing, actually in connection with I.R.; it was a sort of exploratory project. I had quite a range of activities in the 1980s. I was also editing the Morgan Kaufmann readings with Bonnie Webber and Barbara Grosz on language processing[3], and teaching on this course, and doing all this public service work. So there was quite a lot of things going on; I managed to maintain a variety of activities. I had a project which was technically a service project: I said the only way to boost activities in departments is to not only have specific projects but [also] have infrastructure support, so in some places you might actually support a programmer. And they bought this idea, and so I was able to get a programmer! [laughs] But that was very good, because I’d never had any women; up until that time I had only men students. Well, I had Ann Copestake, who’s now a Lecturer here. She went to the States, and then she came back here as a Lecturer, and she was my R.A. on this project. We were doing things like inference in support of database access, and we were interested in knowledge representation for this and inference for this kind of purpose. So that was quite interesting.
Abbate:
She was the first research assistant you’d had who was a woman?
Spärck Jones:
Yes. I’d had men before that.
As I say, there were all these things going on, and it was quite an interesting time, and I got interested in a lot of things. Then I got interested in user modeling. It’s a bit opportunistic: people come along with these interesting challenges, or they ask you something, or something arrives, and you say, “Hey! That’s interesting!” In fact, I got interested in front-ends to expert systems because I knew some other people working on natural language front ends to expert systems, which was a bit more tricky in principle—because it’s like inquiry or advice, you see; it’s not just database access. And there were very interesting problems raised in that. If somebody asks a question, you can only answer it if you know why they’re asking the question; so can you infer why they’re asking it from the way they phrase the question? So it’s all this issue of user modeling. So I had a big spasm of interest in user modeling. I didn’t have a funded project on it, but I had some interest in user modeling in the late ‘80s, which led to some papers and things like that.
And then the lab became a home for Julia Galliers, who got a post-doc research fellowship, I think it was—EPSRC post-doc research fellowship. She’d done some very interesting work on modeling changes of belief. If we have a dialogue with one another, all the time you’re changing your beliefs: about what it is the other person’s saying; what the state of the world is; all of these kind of things. And what you want to do is to model what beliefs people have got and how they change, and what you’re particularly interested in is which ones they change, and why they change them. Supposing somebody says something to me, and it can be interpreted in different ways, and some of these ways will be compatible with one lot of beliefs, and the others will be compatible with another lot of beliefs. Which ones do I choose, and what are the general criteria I apply for choosing one rather than the other? So this was the kind of thing, and she’d done some rather interesting work for her thesis—elsewhere, not in Cambridge—on that. So we thought we’d have a project which was going to put these ideas to work in the context of information retrieval, because if you have a dialogue with a librarian: [Let’s say] I come along and I say, “Have you got any books on Michelangelo?” and the librarian says, “The books on sculpture are over there.” Well, you see she’s made some assumption, on the basis of what she knows and what she believes I’m interested in, that I’m interested in Michelangelo’s sculpture. In fact, I might be interested in the “Last Judgment”; so we’d go through a whole lot of this dialogue. And we thought it would be a rather interesting test context for this, because you can constrain it a little bit; also, there were some people at City University who’d done some rather interesting work on the sorts of dialogues that people have with librarians—task analysis, you know, and stuff like that. Anyway, then she got this research fellowship, so she joined me as P.I. on this project, and three of the Research Councils had had a joint initiative on supporting research in cognitive science, so this was funded under this cognitive science initiative; this was at the beginning of the ‘90s. It was quite interesting; it was a rather difficult project, [but] it was rather interesting to do. Julia eventually left and went away and did something else, but it was kind of interesting at the time.
On Getting a Proper University Job
Spärck Jones:
Now in the meantime, I’d become respectable, because in 1988, people around the lab said, “You know, it really is a disgrace that Karen hasn’t got a proper job!” [laughs] People sort of stirred slightly; and of course Roger was in a very difficult position. It was actually quite difficult while he was head of department; we had to have kind of protocols about how to behave, you see—arm’s length, you know—and we had a lot of rules about not talking about things that it was improper to talk about, and stuff like that. We had to be very careful about it.
Abbate:
Is that why you used your maiden name?
Spärck Jones:
I always think it’s a good thing to do anyway; it maintains a permanent existence of your own, if you see what I mean. So I always did use my maiden name for publication.
Abbate:
Did you use it in personal life?
Spärck Jones:
No, I don’t—because at the time, when I was young, you just assumed that when you got married, you took your husband’s name; I mean, that was the way it was. It has actually been a nuisance to me, because it meant that when I did my Ph.D. I was registered as “Needham” at the university. The university was sort of rather old-fashioned, you see, and it keeps logs of you and follows through all your status. And when I was at Newnham for my research fellowship, they also thought of me as [Needham]. I wasn’t really alert enough about the ramifications of this. It was very difficult to have a professional existence. You could publish under a name, but the idea of having a complete professional existence under some name other than what you might describe as your legal name . . . And you don’t have to take your husband’s name when you’re married in this country, but everybody thinks you have, you see. Like my passport name, and all of that sort of stuff: that was “Needham.” And so I had that for my personal [life], even in official environments, and I only published under the old name. But, it was only sometime in the later ‘60s, I think, that the university first appointed a woman under a professional name. They said, “We appoint as Reader Polly Hill—bracket [‘Mrs. Blah’] . . .” [laughs] And now they’re used to it, but it was . . . And I regret at the time I wasn’t more tough about it, but it would have been very difficult at the time; you know, the assumptions were quite different.
Anyway, so in the ‘80s, people said, “It’s about time you did something about Karen,” and so they actually made me a so-called Assistant Director of Research—which was just a research-oriented position; it doesn’t mean I’m the Assistant Director of Research. So that was okay. So then I became, as it were, completely respectable in the university’s eyes, because I had a proper established . . . Well, it wasn’t a tenured job—it only lasted for five years; it was still not quite as good as a Lectureship—but anyway, it was okay. So then I was sort of part of the faculty, and that sort of went ahead fine. I was doing this research on the belief revision; the M. Phil. was going well; I stopped doing public service.
Serving as President of the ACL
Spärck Jones:
I guess the next thing that was significant was that I became President of the Association for Computational Linguistics. That’s the international body; most of its members at that stage, which was in ‘94, were in the U.S. It turned out to be rather an energetic thing, because the long-term Secretary-Treasurer had died, and so I had to take over the whole administration—not all of it, but quite a lot of the significant stuff—while they found a new one, which turned out to take quite a long time. So that was rather a scene.
I’d always had lots of connections with the U.S.; I know lots of people there. I go there a lot, and go to conferences, and take part in meetings, and that sort of thing.
Abbate:
Did you have a particular agenda as President, something you wanted to do? Or just keep it running?
Spärck Jones:
The agenda was to make sure the whole thing . . . There were a lot of uncertainties about the finances. I mean, there were subordinate agendas, like broadening its outlook in various ways, but basically the agenda was to make sure that the show went ahead in good shape. So it was very important to get a proper Secretary-Treasurer, and of course we had commitments to conferences and a whole lot of things like that. And also, we were changing, because there’d been an assumption that Don Walker, who was the Secretary-Treasurer, was going to stop doing quite as much of it, and his wife was going to stop assisting him, and we were going to have a full-time office manager. But I was doing all the negotiations with the incoming office manager; she couldn’t start for a while, you see, and so it was a very difficult transitional period. And it was very unclear what the state of the money was; I mean, we had no idea. I was also trying to set up quite a lot of new practices, because Don was kind of—he kept a lot of it in his head, you know, and I said, “We’ve got to have a proper basis for the finances.” You know, cost centers and stuff like that; I did lots of documents on things like how to do a proper cost center analysis of the ACL. Oh, it was terrible: they had somebody who wasn’t very senior running the conference—and again, I had to monitor, because I was anxious about the money, because I really didn’t know where the money was—and it was my nadir, I think, when she sent me a signal saying, “Somebody says, ‘Can they bring their child to the conference banquet, and is it all right for them to pay half price?’” And I thought, “My life has got better things to do than this!” [laughs] Anyway, we did it.
Working on Automatic Summarizing
Spärck Jones:
But I had lots of other things going on. Barbara came here as a short-term visitor in ‘92; Barbara Grosz. And then we had the belief revision project, and then we got interested in . . . It’s still been quite difficult to have a purely natural language project. That’s not quite true; I did have a project on automatic summarizing, which did some interesting work. We weren’t able to do automatic summarizing; we were doing more foundational work. You know: What is it about the structure of a discourse, or a mono-text—what structure has it got that you can exploit to summarize it?
Abbate:
So, this is sort of abstracting?
Spärck Jones:
Yes: abstracting or summarizing. I mean, a text—a long text—has got a structure. You use the structure; the structure does signal to you what’s important about it. So there are many different candidates for what sort of structure you could have: it can be a structure about things in the world; it can be a structure of the intentions of the writer; it can be a sort of linguistic structure—things like that. Things like listing something, saying “first, second, third”: that’s a presentational structure, you see; a rhetorical structure. It was an interesting project; I wanted to follow it up—I still want to follow it up. And so what we were doing was about: If you analyzed things—and you couldn’t do it automatically at that stage—if you analyzed for different sorts of structure, how you might use that structure to tell you what was the important information for summarizing purposes. So it was a rather basic research exploratory project. I had it jointly with Steve Pulman. We had difficulty in getting research assistants for it, so it was a bit interrupted, but it was quite an interesting project, and I’d like to do more on it. Steve had a lot of other things to think about at the same time, because he was running the SRI Lab in Cambridge (an offshoot of SRI in Menlo Park, CA).
Working on Spoken Document Retrieval
Spärck Jones:
So I had that, and then . . . I always had research; I like doing research—it’s fun! Then I got together with the engineering people. I think the stimulus came actually from Andy Hopper, who was running what’s now the AT&T Lab. He’d had an interesting project of video interaction and video mail, and he’d been recording all the speech; he’d been recording it all, and of course he got all the speech. So then you say, “How can I retrieve something? I know that somewhere there’s a video message about how to do cost centers for the ACL (or something): How can I retrieve that in the speech?” This was something which nobody had really looked at; there were one or two little beginning things in the early ‘90s, thinking about how to retrieve from recorded speech. Basically, people were getting interested in it because there was much more recording than there’d been before, and speech recognition was improving, so you could begin to get a reasonable quality of transcripts—since you were doing retrieval on transcripts, you see.
Abbate:
Ah, I see.
Spärck Jones:
Because of all the things like the DARPA initiatives, and also lots and lots of improvements in processing power and storage capacity and things like that, people were beginning to make some progress with speech recognition. I mean, this is not speech understanding; this is simply recognition in transcription.
So I got together with Steve Young in Engineering, and we’ve had two successive projects now—three-year projects—on spoken document retrieval, which has been very interesting. They’ve been leading-edge projects, and it’s been a very interesting thing to do. On the whole, you can do pretty well. If you’ve got good quality of transcription, then you can get retrieval performance which is almost as good as from original text. You would calibrate it by doing—For example, you have human transcripts of the material, and then if you say, “Well, we’ll retrieve from all that as the true text; and then we’ll have our own transcriptions, which we’ll get from our automatic speech recognizer, and we’ll retrieve on them; and if we can retrieve with as good retrieval performance on them as on the human transcript—okay!” That’s fine, isn’t it? Even though the transcriptions we produce are not as good as the human ones. Retrieval’s quite a coarse task, so it’s all right.
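[Editorial illustration, not part of the interview: the calibration described above can be sketched as running the same request against human transcripts and against automatic transcripts, then comparing a retrieval score. Everything in the sketch is an assumption made for illustration: the toy messages, the word-overlap ranking in `rank`, and the `precision_at_k` measure are hypothetical and are not the project's actual data or methods.]

```python
def rank(query, docs):
    """Rank document ids by how many query words each transcript contains."""
    query_words = set(query.lower().split())
    scores = {
        doc_id: len(query_words & set(text.lower().split()))
        for doc_id, text in docs.items()
    }
    # Highest score first; ties are broken by document id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def precision_at_k(ranking, relevant, k):
    """Fraction of the top k ranked documents that were judged relevant."""
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


# Made-up human transcripts of three recorded messages.
human = {
    "m1": "how to do cost centers for the acl",
    "m2": "lunch is at one today",
    "m3": "the acl conference banquet costs",
}

# Made-up recognizer output for the same messages, with recognition errors.
automatic = {
    "m1": "how to do coast centers for the a c l",
    "m2": "lunch is at one to day",
    "m3": "the a c l conference banquet costs",
}

query = "cost centers acl"
relevant = {"m1"}  # hypothetical human relevance judgment

human_score = precision_at_k(rank(query, human), relevant, k=1)
automatic_score = precision_at_k(rank(query, automatic), relevant, k=1)
```

[In this toy case the automatic transcript of m1 loses two of the three query words, yet m1 still ranks first, which is one way to picture retrieval being a coarse task.]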
Participating in the DARPA/NIST Text Retrieval Conferences
Spärck Jones:
So we’ve been doing that. Well, I also got involved, about ‘92, in the DARPA/NIST[4] text-retrieval conferences. These are one of these huge research initiatives that they—NIST has put a large amount in it; it’s not led from DARPA, it’s led from NIST. And Donna Harman at NIST has done an absolutely fantastic job in running this whole program. It’s the most exciting thing in retrieval there is; it’s a huge research program in information retrieval, which we’ve been running for ten years. We have annual cycles with evaluations. Basically, it’s a set of I.R. tasks. Originally it was just one or two tasks, and then gradually we’ve branched out into other tasks, which generate supporting tracks, as they’re called. The main task for about eight years was just doing mainstream ad hoc retrieval—you know, “Give me documents on whatever-it-is.” But the main thing about it was, it was being done on a much larger scale than anybody had been able to do before, because you have this central funding to do this business of getting your requests and doing relevance assessments. Now, you can’t examine 350,000 documents to see if each one is relevant to your request. You have to have humans to say, “Is this document relevant to this request?” And you can imagine, on that scale: we have something like 50 requests and two thousand documents are assessed per request, so that’s quite a lot of human work somewhere in there.
Basically, there’s the whole program design, which is led from NIST—first Donna Harman and then Ellen Voorhees; and there’s all the data provision; and we have a conference every year; and there’s a competition. Essentially what you do is, you have training data, which is usually last year’s stuff—you get some training material, and then you get documents to work on, and things like that. Then you get the new set of test requests, and you run them, and you send your results in. They tell you later how well you did! [both laugh]
Abbate:
Now, if there’s all these different groups working on this huge project, have you developed some sort of standard methods or software? It seems like there would need to be in the community some sort of . . .
Spärck Jones:
Well, essentially what’s happened is that there are various . . . Some of the DARPA competitions are very, very heavily competitive, because the people are funded to take part in them, and so they want to justify their funding by doing better than everybody else; but this has been much more open, because a lot of people aren’t funded—they’re not funded by NIST or DARPA. NIST just essentially provides the central infrastructure, and then a few people may be funded independently by DARPA to take part, but that’s just independent. So basically people are taking part because they can get some funding from somewhere and they’re interested in doing it. So it’s been a very large community. Over a hundred teams have registered for TREC [Text Retrieval Conference] this time: a hundred teams. They don’t all finish, but you could have something like 60 or 70 teams taking part in TREC. And the technology has advanced a lot, because if you have that number of people trying out ideas, what works and what doesn’t work becomes fairly obvious. What you see after a while—It’s the same as with speech recognition: You get a convergence on a particular technology after a while, because you find it really works quite well. Now, I.R. is a more complex thing than speech transcription, so there’s not so much convergence, but there are some technologies of which people have said, “These work pretty well. They’re not necessarily the best you can do, but they’re easy to do, and they work pretty well.”
Abbate:
Interesting . . .
Spärck Jones:
So the technology, the ideas, get adopted. Somebody says, “Well, so-and-so team: for two years running they’ve been doing that, and it seemed to work pretty well! Maybe we should take it on board.” And so people do that. On the other hand, because so many people participate . . . And it’s not just “you’ve got to win the competition”: we have rules. You’re not allowed to go around boasting about winning and things like that. In fact, we had a lot of fracas once, because somebody actually engaged in some sort of publicity that said, “We won the TREC competition,” which A) wasn’t true and B) you’re not allowed to do it.
But the other thing is that if you have a lot of people taking part, a lot of different things will be tried. I mean, an individual—like the experiments that I was doing: I could only try as many things as me and one research assistant could try. Well, here we’ve got fifty teams with five people each and a whole lot of machine resources, all trying things at once. You can get a lot more [techniques] compared. The comparisons are controlled, because there are standard measures: you are all using the same data, and there are standard performance measures, so you see you really get a picture about what’s actually happening. And there’s a lot of sharing; people will form collaborations between the teams. There’s [also] some software which is actually public domain; we can all use it (or it’s quasi-public domain, in the sense that you could have it for the cost of a CD-ROM or something). [For example], there’s a standard retrieval engine you can get from NIST, and it’s quite a good retrieval engine. If you haven’t got one, you can use it—that sort of thing. People do quite a lot of shipping around of things; they’ll share resources. They’ll say, “We’d like to try this” and “Joe, will you lend me your morphological analyzer?” or something like that. Of course, there’s competition there, but there’s a lot of community cooperation. It’s been a very rewarding experience, and we’ve learned a lot.
You see, it’s full text retrieval. Earlier retrieval experiments were really doing things like working on abstracts; not much on full text. Well, this is full-text stuff. I used some of it; I did some experiments in the last few years, with Steve Robertson, to write a major paper on our ideas, and testing on TREC; and we were working with, you know, hundreds of thousands of documents and a hundred and fifty test requests, and doing this great series of systematically controlled experiments. So it’s been very interesting. It has an international program committee, which I’m on, and it’s just fun to be part of it! I didn’t take part in it directly in the early years, myself; I was an advisor to the City University team, which was via my colleague Steve Robertson, and that was interesting. And then in the last few years they’ve had a spoken document retrieval track, which we’ve taken part in as a part of our project. So I’ve been involved in a lot of the design work for these things.
Abbate:
For the competitions, you mean?
Spärck Jones:
Yes, that’s right. I’m on the program committee, and then I’ve also been involved in some of the individual initiatives, like the spoken document [retrieval]. How do you specify what should be done? I mean, the design and evaluation is very difficult. I’m interested in system evaluation; I’ve written books about evaluating NLP systems. It’s a very, very interesting question, an interesting area; and so thinking about how you can do something which is a feasible test . . . How do you evaluate a summarizing system? We’re all grappling with this at the moment, because now I’m on the Advisory Committee for the DARPA TIDES program (Translingual Information Detection, Extraction, and Summarizing). It’s very interesting. There’s very exciting research going on in summarizing.
Abbate:
That’s summarizing and translating at the same time?
Spärck Jones:
Translingual: no. Well, it could be translating, [though] there’s not very much translation work at the moment. You could perfectly well think of having a system that would summarize documents in different languages and then translate the summaries. TREC has already had stuff on cross-lingual information retrieval, because the model is—for example, in Europe, where people know many languages, they might not want to formulate their query in a lot of languages, so you put it in in one language, translate it into other languages, and get back documents in other languages. I can read a French document, even if I can’t be bothered to think about how to formulate the query in French. That’s the model. You see, it’s quite an interesting idea. They’ve got, also, in TREC and TIDES, these things about question-answering. Not document retrieval: question-answering! You know, “Where is the Taj Mahal?” There’s lots of really exciting stuff going on there; it’s great! I’d like to have more research projects on it.
On Becoming a Professor
Spärck Jones:
And I’ve been very interested in summarizing. I did a lot of stuff following up the summarizing project. I actually began writing a major work on it, and then it got pended because of all these ACL things; and then I had put together a lot of readings while we were with Peter Willett, readings on I.R.; and then I was doing EDSAC 99—I was in charge of EDSAC 99, and that was a lot of work. But I’m on leave at the moment; these two terms I’ve been on leave, and I’ve been trying to get together summarizing again.
Abbate:
A book?
Spärck Jones:
Yes. I mean, I’ve been mostly reading, because—you see, in some sense it’s very frustrating, because it’s such an exciting area that people are working in it like mad, so there’s something new to read about every ten seconds! [laughs] Oh, and then some other things came along. I was made a Reader in ‘94, and that was quite a nice personal promotion; and then eighteen months ago I was made a Personal Professor, which was quite nice.
Abbate:
What is that?
Spärck Jones:
A Personal Professor. Professors in Cambridge basically come in two sorts: they come in ones which are established—you apply for, and you are elected to it—or it’s a promotion, a personal promotion. That’s to say, there is no election; you are just thought to be worthy of being [promoted]. It’s a full professorship, just the same as the others, but it’s just that it’s for you, personally. The professors of Cambridge: they’re all, in principle, of the same type; they’re a full Professor, and they’re tenured, and all those sort of things. But in fact they come in two models, from a practical point of view, namely that some of them are long-term established, either from endowments or because the university’s committed to funding them, and when there’s a vacancy they advertise it; but then they may also promote people individually to full Professor, because they think that they’re worthy of it. So I’m a full Professor. But I have to say—thinking about women in computer science and all that—that only about five percent of the Professors in Cambridge are women, which is depressing.
Abbate:
In all disciplines?
Spärck Jones:
Yes. Something like that.
Serving as Vice President of the British Academy
Spärck Jones:
I’ve also become involved, more recently, in the British Academy. We have two major academies in the U.K.: the Royal Society and the British Academy. The British Academy is humanities and social sciences, and I’m there, I think, because it does cover language work. I’m a Vice President of that currently, and it’s quite interesting.
Abbate:
What does it involve?
Spärck Jones:
Well, it’s mostly a formal position, but I do some committee work, too. They are the U.K.’s National Academy in the humanities and social sciences, and it gives research grants, and does a whole lot of other various kinds of things—public relations-type stuff, also running symposia, and it has publications of various sorts, and supports various kinds of research projects: individual grants, things like that.
But it’s interesting: I’d been thinking about myself as not really a computer scientist—because I’m not really a proper computer scientist—but what’s happened is that things have moved. When I began, a) I’d never been trained as a computer scientist, and b) this stuff about information retrieval, and even language processing, was sort of way out over there. And in some sense it still is an applications area; but as computers have got more powerful—and particularly with the Web, with all this stuff around—thinking about how you handle information, including information as expressed in natural language, seems to be more important to computing. I mean, it’s still in some sense an area of application, but it’s an area of application which is so pervasive now that you can’t separate it from computer science. It’s woven into the fabric of computing. Okay, it’s not like formal theory of computing, or semantics of programming languages, or all these things which I think of as core computer science; but handling information expressed in language has got a more central role in many things that humans do—and use computers for. And I’ve actually also thought that some of the techniques that are used for natural language processing can be applied to computational objects like programs—because they’re written in languages, you see. They’re expressing information in languages, and they express it redundantly, even though computer programming languages are not as redundant as natural language, so some of the same processing techniques can be used. I wrote a paper for an artificial intelligence journal in which I said: information retrieval basically extracts information—that is to say, ‘meaning’; it doesn’t directly say what meaning it is, but you grab things that are meaningful indirectly—and it’s able to do this because of redundancy. And there’s a lot of things that are in the area of interest of artificial intelligence that have got exactly the same properties. 
I mean, visual processing has got large amounts of redundancy; things like that. So some of the “shallow” knowledge-processing techniques that were used in information retrieval do have a style about them that makes them applicable to other areas too. A very simple example is, if you say, “This word is used unusually frequently in this document”: well, you don’t know what it means, but it’s clearly an important word for the document, and you can use that fact. That’s so simple—you might say it’s obvious—but you can still use it; you can do things with it.
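[Editorial illustration, not part of the interview: the “unusually frequent word” observation can be sketched by comparing a word's share of one document with its share of the whole collection. The function name `frequency_ratios`, the toy collection, and the choice of a simple ratio are assumptions for illustration; the sketch also assumes the document is itself part of the collection, so every word has a nonzero collection count.]

```python
from collections import Counter


def frequency_ratios(doc, collection):
    """For each word in doc, divide its relative frequency in doc by its
    relative frequency in the whole collection. Values above 1 mean the
    word is more frequent in this document than in the collection overall."""
    doc_counts = Counter(doc.lower().split())
    coll_counts = Counter(
        word for text in collection for word in text.lower().split()
    )
    doc_total = sum(doc_counts.values())
    coll_total = sum(coll_counts.values())
    return {
        word: (count / doc_total) / (coll_counts[word] / coll_total)
        for word, count in doc_counts.items()
    }


# Made-up three-document collection.
collection = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "retrieval of the speech and the retrieval of the text",
]

ratios = frequency_ratios(collection[2], collection)
```

[Here “retrieval” scores above 1 and “the” scores below 1, without the program knowing what either word means.]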
On Getting More Women into Computer Science
Spärck Jones:
Another thing I’ve been doing a little bit in recent years is, I’ve been trying to think a little bit—but it’s very dispiriting!—about how to try to get more women into computer science. On the whole, everybody who thinks about this is depressed, because we’re going backwards rather than forwards. I took part in Grace Hopper; that was fun. Grace Hopper in ‘94, the first Grace Hopper Conference. It was interesting, actually. I enjoyed that. I enjoyed meeting the other people, the other speakers, and meeting all these younger women. It was successful, because it brought a lot of younger women in. The audience had a lot of younger people at it. I’ve not been to any of the subsequent conferences, because I haven’t had the opportunity, but I thought that was a good initiative.
But we had some . . . For example, in the lab here, we don’t have very many women students. And as far as I can tell, when I talk to people in the U.S., it’s the same: that the proportion of women doing computing is going down. We can think of all kinds of explanatory reasons as to why this is happening; this is very depressing! [laughs] I think in some ways they think computing is too nerdy. You know: who wants to compete with what they see as their brothers and fathers, sort of hunched over, playing games all evening? You know, this is not attractive.
Abbate:
Do you think computer jobs have become less open to women than they used to be? Or less appealing?
Spärck Jones:
It’s curious, actually, because in the early years, it was very open, you know. There was no required background, you see. I mean, you couldn’t say, “You’ve got to have a computer science degree,” because there weren’t any computer science degrees. And also, people got into it at different ages, too; some people got into it when they were a little bit older, because they just were doing it. So, it was, in that sense, up for grabs. You could just get with it; go on a training course, if you could find one of them. There wasn’t, also, an established social perception about the sort of person who’s [in computing]. If you say, “Who are people who work in I.T. or computing?” “Oh, they’re those sort of people.” You know, there was no social image of a computing person. I mean, there might be a sort of social image of a few strange boffins, you know, but basically there wasn’t an image.
But now I think that what’s happened is: I think there are a mixture of factors. I think one of them is that there’s this image of a lot of computing being boring, because it means war games, and being hunched over the computer screen, impervious to real life! [laughs] The “nerdy” image, you see. And the nerdy image is that people are somehow socially deprived—they’re voluntarily socially deprived people. They’ve got something wrong with their heads which causes them to go and hunch over this thing all the time and only interact with other people at a distance, on computer bulletin boards or something like that. The other one is that all of these people are interested in things that aren’t worth being interested in, like all these frightfully boring details about—well, you know the sort of jargon that people speak. We get it among our students sometimes. There was a charming girl from Oxford who came on our diploma course one year, and I asked how she was getting on, and she said, “Oh, God! These male students, they’re so intimidating. They say, ‘Oh, well, if you set the three-wire bit to do so-and-so, and then you flip it over, and you write it in that little bootstrap routine, and you pull it up, and you do a backwards-forwards-sideways collect, you know, and that’ll give you the whatsit!” I said to her, “Forget it! Most of what they know isn’t worth knowing.” [laughs] But they are like that, and that’s a different sort of image—a sort of terrifying competence about something that seems inaccessible to other people—and what you have to say, all the time, to the women, is: It may be a terrifying competence, but you shouldn’t be terrified, because it isn’t a competence that really matters most of the time.
I think the other thing is that computing has become invisible. A lot of high-powered women in things like finance, or City companies, or things like that: they can use computing, in the sense that they can drive a spreadsheet program, or they have other people working for them; so they can use it if they want it. I remember talking to somebody who worked in a bank; I don’t think she did a lot of it herself, but she knew perfectly well what it would do for her, so she could exploit it. Another reason is that there are more attractive things for high-powered women, and they think, “I can run a company; or I can go to management now”—which, of course, twenty or thirty years ago wasn’t open to them. They can set their business up, or go into management, or have a career. The money today: okay, there still is some money [in computing] if you have your own start-up—[though] there aren’t many women in start-ups—but basically, they can see that if you want to make money, be successful in the media, or be successful in the City, or things like that, and you’ll get paid far more than other ways. I mean, if you do a start-up, you might make a lot of money in the end, but it doesn’t always happen—and also, you’ll work every hour that God gave.
Abbate:
So computing’s not well paid here?
Spärck Jones:
Quite a lot of computing, I think, isn’t terribly well paid. It’s probably better paid now, but there are an awful lot of what you might describe as routine IT appointments. (Academia isn’t very well paid, but that’s different.) If you look at the job ads—you know, IT manager for some local council—it’s 60K or something like that. I mean, it’s perfectly okay, but it might not look terribly attractive, that sort of job. That’s not an exciting job.
So I think there are many different causes. I mean, I really don’t . . .
Maybe—you see, again: they’ve put computers in schools, and sometimes the girls are pushed off the machines by the boys. All of this rather boring stuff is very . . .
The other thing is, I think that the spread of word processing to secretaries has meant that people think that computing is about using a word processor. And I think teaching IT skills in schools, when it only means knowing how to use a word processor, is not actually a very—I mean, from a practical point of view, they’re quite useful, but it’s not a good way to cause people to think that computing is exciting. When I give talks on this, I try to tell them, “You know, it’s really terribly intellectually [exciting]. There’s lots of intellectually exciting things that computing’s all about, including what you’re applying it to.” Again, you can appeal to women by pointing out the many applications which are actually interesting from a social point of view. You know, they matter to people—which is what a lot of girls like: jobs that matter to people. But it’s difficult for them to get exposed to that when they’re at school—early enough, you see. For two years running, we ran a sort of weekend meeting for school teachers—not the kids; we thought what we need to do really is to get at the teachers. So we carefully got the teachers from the top-ranking girls’ schools—the ones that had very good exam results, that in principle ought to be supplying us with students—and we got them in, and we told them about what computing is really all about: how exciting it was, how socially relevant it was, what interesting and flexible and well-paid and movable and every-other-kind-of-good-thing jobs there were—and quite a lot of them had really not much idea about it. I mean, it was quite interesting. Even those who were IT teachers really didn’t have a very good idea about how interesting computing could be. But the point is, you’ve got to convey that enthusiasm to the kids. You’ve got to get them hooked by the age of 13, and then you’ve got to keep them hooked, all the way through to the age of 18—and that’s hard. That’s what they said was really hard.
Get ‘em hooked early on, and then keep ‘em hooked.
It was interesting, because we’ve just had a little book produced, which my husband got because he’s in it; it’s called Cambridge Entrepreneurs or something, and it’s a bunch of interviews with Cambridge entrepreneurs. You know, we’ve got this “Cambridge Phenomenon”: high-tech and every other thing like that. It was a bunch of interviews with people who started up these tremendously successful companies. It’s interviews with about 50 people, and 45 of them are with established people: I think that might include one woman. (They’re not all in computing; some of them are in biotech and things like that, but it’s the Cambridge Phenomenon–type stuff.) And then they had some interviews with 5 “possibly winners of the future,” and I was interested in that because they’d actually got three women there—three out of five or six. But, not one of them was a mainstream Brit: one of them was Canadian; one of them was Australian; one of them had a British father but some foreign mother and had been born abroad.
Abbate:
Interesting.
Spärck Jones:
And I thought, “There you go!” [laughs]
On the Commercial Applications of Research: IDF Weighting
Abbate:
I hadn’t heard of the “Cambridge Phenomenon.” What is that?
Spärck Jones:
Oh! For heaven’s sake.
Abbate:
Or I hadn’t heard that phrase. This is a high-tech hotbed, is that the meaning?
Spärck Jones:
Yes. We are surrounded by an enormous . . . I mean, Cambridge is not only the university: the university, of course, is a key part of it, but we have a lot of other things. We have a huge medical center here, with the Medical Research Council Molecular Biology Lab and all these things like that. We’ve got world-famous research here. And, it’s been hiving off—for a long time now; twenty or thirty years, I guess—things like start-ups, and also high-tech companies opening research labs here, or having branches here, or things like that. It’s been going since the ‘60s, actually.
Abbate:
So kind of like Stanford in Silicon Valley?
Spärck Jones:
That’s right, yes. Or Route 128 at MIT.
For example, in computing, an awful lot of the computing companies are spin-offs of the University. There were some people called Segal Quince Wicksteed, a company; they actually did an analysis of it in the ‘80s, doing kind of family trees, you know. So you would get somebody from the lab, and he has an idea, and he goes and does a start-up; and then after a while he gets big and he’s successful, and then some of the people working for him get bored with being in a big company, and they go and do a start-up! So you have a sort of proliferation, you know, like rockets—you send up one, and a whole lot more come all the time! [laughs] And there’s a science park; Trinity devoted a lot of land to having a big science park, and St. John’s have an Innovation Center, and there’s just tons of stuff going on here! We have some very big companies. ARM [Advanced RISC Machines] is a Cambridge company: that’s a big company now. Autonomy is a Cambridge company. There’s lots of others. I was just reading about Muscat recently; Muscat was started up by—it originally stemmed from my first research student! I mean, in a rather indirect way. But there’s lots of stuff going on here.
Abbate:
There must have been commercial spin-offs of some of your work . . . ?
Spärck Jones:
Not really. No, it’s not. I’m not a money maker. I’ve never thought of making money. I mean, I was brought up as a woman; I didn’t know how to make money! [laughs]
Abbate:
But nobody has applied that to commercial products?
Spärck Jones:
Oh, yes! The ideas, yes. Most of the Web engines: if you talk about index-term weighting, I guess anything that does index-term weighting using any kind of statistical information will be using a weighting function that I published in 1972. It was just public domain stuff. Quite a lot of people use—I mean, whether they use these things, you can’t tell—but certainly the stuff I’ve done in retrieval; those ideas have, one way or another . . . People may not know that they originally came from me, because if you’ve had a good idea which becomes established, then it gets into textbooks, and people cite the textbook.
But, no, I’ve never spun off a company myself. I’ve never thought of having a company myself; it has just not seemed to be my scene. I’ve always been brought up on a kind of academic publishing model, if you see what I mean.
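[Editorial illustration, not part of the interview: the index-term weighting idea mentioned above is usually taught as inverse document frequency, where a term that occurs in few documents gets a higher weight than a term that occurs in many. The sketch uses the common textbook form log(N / n), with N the number of documents and n the number containing the term; it is an assumption that this form is close enough for illustration, and it does not reproduce the exact formulation of the 1972 paper. The function name `idf` and the toy documents are hypothetical.]

```python
import math


def idf(term, docs):
    """Textbook inverse document frequency: log(N / n), where N is the
    number of documents and n is the number of documents containing term."""
    total = len(docs)
    containing = sum(1 for text in docs if term in text.lower().split())
    if containing == 0:
        raise ValueError(f"term {term!r} does not occur in the collection")
    return math.log(total / containing)


# Made-up four-document collection.
docs = [
    "the acl conference banquet",
    "the acl finances",
    "the speech retrieval project",
    "the summarizing project",
]

weight_the = idf("the", docs)          # occurs in every document
weight_acl = idf("acl", docs)          # occurs in two documents
weight_banquet = idf("banquet", docs)  # occurs in one document
```

[A word that appears in every document gets weight zero, so it contributes nothing when matching a request, while the rarest word gets the largest weight.]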
Abbate:
Did the British Library use your work? Did they have a system based on that?
Spärck Jones:
No, they didn’t. No, they were just . . . The British Library: it was their R&D department, but it wasn’t only for them; they were just supporting R&D in the field. It was a slight accident that it was part of British Library; it was a slight anomaly that they were part of British Library. It wasn’t the British Library doing it for itself. In fact, they’re mostly concerned with mainstream cataloguing, and stuff like that, or how to digitize. They don’t run a retrieval service, in that sense. You can do catalog searches, that’s the model. You know, it’s like the Library of Congress.
I think it’s true that some of the approaches to doing retrieval that I worked on for a long time have become very well established. I mean, a large part of that is due to the success of Steve Robertson and his team, doing the City University stuff in TREC and things like that. We published a paper back in ‘76, and that’s very widely cited.[6] But that’s for doing relevance feedback—and in fact, people do do that kind of operation. I don’t know how many search engines do feedback like that—the kind of research feedback strategies we did. As far as I know, the only search engine on the web which uses something which is very distinctive, which you couldn’t have done in the way that they do it before the Web, is Google: because they use hyperlinks. (I.e., they look at how many pages are linked to a particular page as a measure of its relevance.) You can do citation-linking, if you see what I mean. It’s all based, as far as I know, on stuff that was published in the open literature about how to find key nodes in a network. Part of it’s the algorithms, you see. I mean, it’s not a particularly novel idea that if you have a paper that everybody cites, it’s a good paper, and various other notions like this. The question is, if you’ve got a thing the size of the Web, how do you actually find the nodes? So it’s algorithms: it’s graph algorithms; it’s all that.
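[Editorial illustration, not part of the interview: the parenthetical above, counting how many pages link to a page, can be sketched as a plain in-link count over a small graph. This is the simplest reading of that remark and an assumption for illustration only; it is not a description of how any real search engine computes its rankings. The function name `inlink_counts` and the three-page graph are hypothetical.]

```python
from collections import Counter


def inlink_counts(links):
    """Count, for each page, how many other pages link to it.
    links maps a page to the list of pages it links out to."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):   # ignore repeated links from one page
            if target != source:      # ignore a page linking to itself
                counts[target] += 1
    return counts


# Made-up graph: pages "a" and "b" both link to "c"; "b" also links to "a".
links = {
    "a": ["c"],
    "b": ["c", "a"],
    "c": [],
}

counts = inlink_counts(links)
most_linked = max(counts, key=counts.get)
```

[On a graph the size of the Web the hard part is the scale, which is the point made above about graph algorithms; this sketch only shows the counting idea.]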
Abbate:
Determining relevance, or weighting relevance, according to how many links there are to it?
Spärck Jones:
That’s right. You see, they have a whole underpinning; as I say, the idea of hyperlinks, in the perfectly classical form, is citation links, and people did work—in fact, I’ve done it myself—on how you might do retrieval using citation information; but the form which it takes is distinctive to the Web. Whereas many of the other things that are applied to the Web are simply classical ideas about how to index documents and how to retrieve. There’s nothing wholly new. Whether they’re derived ultimately from old-fashioned libraries, in some cases—you know, Yahoo, and some of these things with subject categories, is old-fashioned librarianship. They’d die rather than admit it, but that’s what it actually is! Or it’s using more modern ideas, as developed in research by people like me and Steve and Gerry Salton in the last three decades. All the statistical weighting: that comes from research in the last two or three decades. But the Google stuff, with the hyperlinks: you could only do that with the Web, because that sort of hyperlink didn’t exist before. It exists implicitly, because—I mean, if I have a section in a paper in which I’m discussing somebody else’s work, there’s more than just a citation reference: there’s also the discussion in the text, but nobody can interpret that automatically to figure out exactly how to use it.
So, a lot of wheel-reinventing is going on. [laughs]
On Her Nontraditional Career Path
Spärck Jones:
Well, I’ve talked! Have I covered all your questions?
Abbate:
I have some broad questions . . .
Spärck Jones:
I did feel in some sense I was sort of slightly out of the computer science mainstream. But then, everybody—other women in the field of my generation: It’s hard to find someone who is a real proper, kosher, fully trained computer scientist, like the younger people would be.
Abbate:
Well, that’s true of a lot of the men, too; it’s just that period.
Spärck Jones:
That’s right; that’s right. But the other thing is, everyone talks about personal careers and things like that. In some ways, I was able to sort of make my way, but there wasn’t any very obvious path. You had to just take advantage of whatever there was, and do it. Certainly it was the case that, both for women in general and also in Cambridge, the job opportunities in earlier years were very much more limited. I look at the younger women and I’m terribly envious. There’s not many have got a “can-do” mentality, but they’ve got the opportunity to be can-do, which we really didn’t have then. So I’m quite envious!
Abbate:
Did you have any mentors or role models when you were starting out?
Spärck Jones:
No, not really. I think it’s fair to say that . . . I mean, Margaret Masterman never had a proper position; she just ran this little unit herself. She did convey the importance of tackling intellectual research interests, and doing it, but she wasn’t in any sense a mentor or a role model like that. She was just such a wowsky individual, if you see what I mean.
Abbate:
I’m not sure what that means.
Spärck Jones:
Well, sort of eccentric. I think the point is, you couldn’t say she was a mentor—she had a very strong personality—and you couldn’t say she was quite a role model in many of the ways she did things, but I think the thing she did do was that she was willing to employ people if she thought they could do something that was worthwhile, without bothering too much about whether they’d got the right formal qualifications. She was kind of opportunistic in that way, and she did provide a guide in the fact that she said, “If you want to work on something, it’s worth working on it.” So in that sense . . . But it was really much more—I mean, if you say, “Who was a support to me?”: well, it has to be my husband Roger. He was very supportive. He said I could do things and it was okay. And he always took seriously the fact that if we were going to move anywhere, I had to have a job, and not just him. I mean, he wouldn’t have had any difficulty; he had lots of good jobs, you see; but I just didn’t want to be touting around in some second-rate non-tenured position while he would have some prestigious top-of-the-range! [laughs] So, so he’s been very . . . We actually collaborate on things. We write papers together occasionally, and stuff like that.
Now he’s working at Microsoft. I don’t know so much in detail about what he did, but he went there when he gave up being head of the lab and Pro Vice Chancellor and those sort of things, and he’s running the Microsoft Cambridge Research Lab. It’s a new life for him, and it’s all fun!
On Discrimination against Women in Academia
Abbate:
Did you encounter a lot of overt discrimination, in terms of attitudes or salaries?
Spärck Jones:
No . . . It’s hard to tell, because in some ways, the kind of training that you had [at Cambridge] . . . It’s an interesting point. I mean, people used to comment about the self-confidence that Mrs. Thatcher had, although she’d come from a not-terribly-inspired background; but for a woman, one of the things that going to an Oxbridge college gives you is self-confidence, because you know you had to really work to get there. You know, you’ve got to have something going for you. I don’t want to sound arrogant, but you really had to—particularly if you hadn’t come from a flash school where they train the kids to come. So you’ve got something going for you, and it did give you [confidence]. And then also, Oxford and Cambridge itself, it’s perfectly true: it truly does give you self-confidence, because they’re good universities and they have good people, and if you do well there, that shows that you’ve got something. So it does instill self-confidence, whether justifiable or not: it’s simply a proven fact it instills self-confidence!
But I don’t think I encountered any . . . I mean, after all, when I applied for things, on the whole, I got them. I think the discrimination is more that it didn’t seem surprising that I should be living on soft money for so long. I didn’t mind it so much, because I was making the research projects myself. I mean, I wasn’t just an employed research assistant; I was the person who was formulating the work, guiding the work, doing it. I wasn’t just like a hack research assistant—even though they may do large parts of the work and make the major contribution. I mean, there’s a terrible culture here of contract research workers, which is—because of financial problems the universities have—getting worse all the time; people are having worse contracts. But it’s true: women are not as well paid, you know, on the whole. The job payments formally are the same. I mean, if you’re appointed to “X” position in the Lecturer scale, at age so-and-so, if you’re a man or a woman you get paid the same. It’s egalitarian in that sense. But it is the case that in general women are not paid as much, if you look at it, as instantiated by the fact that more of the Professors are men . . . And in many walks . . . In the U.K., women’s pay is about 70% of men’s or something.
There have been around—I didn’t encounter them very much, but there were people around who thought it was perfectly okay to say, “Oh, well, higher education isn’t really a sort of thing you should let women have.” You know, people were still saying that! My friend Mary Hesse, when she came here from London in the 60s, went to dinner in some college, and there were these sort of old-fashioned dons who used to think it was terribly amusing to make remarks like, “Oh, well, you know, women shouldn’t really be in higher education.” And this is talking to a young woman who is a distinguished worker in her field, and they think it’s okay to say things like that. I think it’s appalling! But it’s going out.
Part of the problem is whether you can make employment-friendly conditions, and whether the women can have the self-confidence and . . .
[TAPE 2, SIDE 1]
Spärck Jones:
What I was saying was: The critical time, for academic or research life, is post-doc. It’s more competitive now; everything is happening so fast. In my own field, I just look at the—more conferences, more journals, more papers, stuff on the Web; if you haven’t pulled it off the Web the minute it’s been put there, you’re behind. And post-doc is a very difficult time. Quite a lot of women are research students; that’s not the problem (though we don’t have as many female research students in our lab as we would like, and they’re mostly foreigners). But basically, the time when you consolidate, if you want to get an academic or research-type position, is publishing the stuff that was in your thesis—particularly in the U.K., where typically you’re only taking three or at most four years to get your Ph.D. It’s much more concentrated than in the U.S. So you want to consolidate: you want to get some good publications in leading journals; you want to get to the conferences; you want to get a research fellowship, or start up a research project, or something like that. That’s the time when it’s really difficult. Then you suddenly take three years out to—when you’re being distracted by having infants or stuff; or you’re married to some guy like yourself, and he’s got a job offer over there—that’s the time when it’s very, very difficult. And yet if you fall behind for two years, and don’t publish much, people just say, “Oh, well . . .!”
And we reckon the lab: Roger said he was very proud of the fact that whenever a woman applied for a job in the lab, she actually got it! There haven’t been many women applicants, but they got them. And in fact, currently we have three women staff, out of something like 25. But it’s hard, and I think in the whole of academic research laboratories, which is where I’ve spent all my life, it is hard, and it’s getting harder, because you can’t take so much time on things now.
Yes, but discrimination: as I say, I think it’s—I mean, it is a sexist place. I’ve never been involved with undergraduate colleges; I’ve been in graduate colleges, which is different—they’re much more civilized. But the undergraduate colleges: I think their fuddies are going, but they’re still there. And they set the tone of the place, you know; there’s a sort of drag. The tones of colleges are set by people forty years ago, at least—if not four hundred years ago! [laughs] Something like that.
I think part of it is that you just . . . I remember once, when in Alvey, I was invited to come and talk about the Alvey program to some weekend of some company. They’d got some company people or something together, and the chap who was my host was saying to me—we were having dinner that evening, and he said, “Don’t you feel it’s odd to be one woman in a whole room full of men?” I said, “No, I’m used to it!” Because I was! I mean, that’s what any of us do, right from the time I came to Cambridge. It’s just that I was used to it, so it didn’t faze me.
Abbate:
Interesting.
Spärck Jones:
But you have to think about it like that; you just have to say, “Well, that’s the way it is.” You’ve just got to assume that if you’re dealing with colleagues, if you’re interacting with colleagues, well, you’re just a colleague. You’ve got to do it that way, and not be fazed by it. I wouldn’t say I never think about these things, but basically you’re thinking about the work, or about how you get the job done, or how you interact with the people as people; never mind about whether they’re men or women.
On the Rewards of a Career in Computing
Abbate:
What have you found to be the most satisfying aspects of working with computers?
Spärck Jones:
I think it’s the challenge. It really is the challenge they present, to think about how . . . . Well, there’s two things. First of all, thinking about how you model something: if there is some process which is currently done by humans, how you can model doing that same thing. Not necessarily in the way that they do it; we don’t have much idea about how humans produce abstracts, for example, but nevertheless, they do produce abstracts. So what is interesting is if you say, “Can I build a program which will produce abstracts?” We’ll have to model the essential features of what abstracting involves—because otherwise it wouldn’t deliver an abstract—but it doesn’t follow that the mechanics are going to be the same as the way that humans do it. So there is something very satisfying about the fact that you can take something which is a difficult thing to do and automate it. And nearly everything about the way we use language is enormously rich and complex, and really not very well understood—everything about how you use language to communicate content, to understand content, to communicate with people, to do all of these things. You know, being able to say, “I’ve got a program that will do that, that will index a document as well as a human can index it . . .” Indexing’s a simple task by comparison with things like translation, or some of these other things like summarizing. But nevertheless, I can index a document automatically as well as a human can do it, if we measure effectiveness in terms of whether the right documents are retrieved when some human asks for them. 
What you do is, you have these test environments, and humans come along with a question and say, “I want documents on Michelangelo’s prowess as a sculptor” or something; and I can deliver by my automated indexing—which is using the kinds of techniques for indexing which humans will not use, because they use statistical data and mathematical models of importance—I can deliver documents to that guy just as good as the human who’d sat there and done whatever he’s done to index the documents. So there is a nice feeling about that. And similarly, if you can answer database questions, or if you can do summarizing, or things like that; if you can understand about . . .
So part of it’s this business of capturing something which is difficult. I mean, what is discourse structure? I mean, linguists write lots and lots of books about discourse structure, but you can’t just pick them up and apply them. It’s actually quite difficult. You’ve got to turn this into something very concrete, to be able to capture it.
Abbate:
Do you understand a linguistic process better after you’ve made a computer do it?
Spärck Jones:
Yes. In intellectual terms, it’s a kind of feedback process. So one thing that’s exciting is being able to capture something—particularly things that humans do which are difficult, like summarizing; there’s something very nice about that—being able just to say, “I’ve got a model of this process.” Because in order to write a program, you have to have a model. You can’t just put one order after another, you know; you’ve got to have some idea of “What is it? What is the essence of this thing?” So that’s nice.
And the other thing is actually sometimes building systems to do things that nobody has ever done before. In that sense, you’re being creative. The idea that computing is creative is not something that many people realize—but in a way, you are doing something that no one’s ever done before. I mean, of course, many of the things are done by humans, but there are also things that are [novel]. I mean, the details are novel; as I say, if you do it in a way that humans don’t do it, then it’s new in that [sense]. But I think it’s really that you’re building a system: that’s the novelty: You’re actually building something. It’s like, we might have ideas about bridges, but nevertheless we’re actually making this thing—it’s there; it’s doing things. That’s very exciting, that you can actually build a system to do something. It’s like engineering. But then people say, “Oh, engineering’s boring.” You know: “Compared with pure science, engineering is a very dull thing.” But engineering is creative, because you are making something.
Think about building bridges. Of course, nature builds some bridges; you know, a tree falls down, or humans might say, “Well, let’s do it this sort of way; we’ll cut down a tree and we’ll put it over the stream.” But much of what’s done in bridge building is new; it’s never been done before! Nature doesn’t do it that way. You may say, “Well, of course nature does it that way, because you’re exploiting the natural properties of materials, like elasticity, or tension, or blah blah blah.” But that’s like saying, “Everything is physics, really”—that’s not very interesting! [laughs] It’s more interesting to say, “Look, I made a bridge, and I made it by catenary suspension (or whatever it is), and that’s not been done before; this is new.” And that’s what computer systems are. That’s what’s really exciting about them; you’re actually making something which wasn’t there before (although they may have analogs, and you may in some sense be trying to model things that humans do). I mean, think about the Web. It’s completely new! It didn’t exist before; it’s new. So that’s exciting.
I don’t want to say that the world should be taken over by computers. I get very, very depressed sometimes by the feeling that you will wire up your entire house, which will do. . . These people come along and say, “Wouldn’t you love it if your house, the minute it heard you stir in bed, it said, ‘Ah, she must be getting up. Now I will turn on her favorite music.’” It makes me feel absolutely ill! [laughs] I can’t bear it, you know. And the notion that life will be wonderful if we sit in front of our screen in glorious Technicolor: I think this is dehumanizing and horrible. So that’s not what I think at all; but there are many things that computers are good for, and it’s interesting that you can do these things—as well as, of course, doing some chores.
On Changes in Computing
Abbate:
How do you think computing has changed since you began in the field?
Spärck Jones:
How has it changed? Well, I wasn’t a direct user, of course, of the EDSAC I or the EDSAC II. Roger did these experiments for me on the EDSAC II, and you’ve been to talk to some of the people who did actually use them, like Margaret [Marrs] or Lucy [Slater].
Abbate:
But even as an end user . . .
Spärck Jones:
Well, I think now—and again, remember, I don’t write my own programs now, and my R.A.s write them in languages that I don’t know; they can hack up a Perl script in ten seconds flat, which I can’t do at all! I think basically the change now has been that there is so much power there in the computer that you can think of doing things that you could never imagine doing before. Of course, there are two aspects to this. First of all, you can search Web pages—you may not be doing it in real time, because what you’re doing is actually looking at the results of the other guys who’ve indexed them—but essentially, you’re searching 1.3 billion documents. That’s unbelievable! 1.3 billion documents on the net, or 1.4 billion now: that’s a lot. That’s really something. So there’s all that power there, and that’s really changed from what I was doing. I mean, when I was first programming, I used to spend a lot of time thinking about efficiency. You know, “How do I do this to cut the time?” It’s not just writing reliable code, but it’s actually cutting the machine time, and saving store, and all that sort of stuff. Of course, nowadays the programming languages you would use, or the tools you’d use, may be much more profligate, so you may still have to think a little bit about efficiency. It’s interesting with speech processing, because we found with the speech processing projects that we were dealing with something like 50,000 short documents, and processing the speech for that is a big deal, even now. We had to start to think carefully about what we were doing, because we couldn’t afford to re-run it, because that would be taking another month or whatever it was; whatever number of weeks, or hours, or days that it took. So one is still constrained; but there’s so much more power there, there’s so much more storage capacity, that it really makes a difference.
It also enables you to do your own things faster. It’s not really that the languages are better, because in some ways they’re not better; I mean, C++ is not a very inspired language. I think it’s that you can afford to try things in a quick and dirty way now, because you don’t have to worry so much about efficiency. You see, in the old days, you practically couldn’t try anything without thinking a little bit about code efficiency, because otherwise you would just never do it; but now you can run something up very quickly.
I was looking at some language processing stuff recently, some conference papers, and what struck me about it was how, once a field like computing gets well-established, you get a whole lot of resources which are available to people. So somebody could do a whole trial system for something that was in the summarizing area, and they could just do it. They could say, “Well, I’ve downloaded X’s parser, and Y’s morphological analyzer, and Z’s classifying thing; and okay, I wrote some stuff myself, I wrote a box here”—but essentially they could pull all this stuff down, glue it together, and they could do a trial. They could do an initial experiment, which would tell them whether what they were thinking about doing was even faintly a sensible thing to do. Of course, if they wanted to do it properly, then they would have to pay much more attention; but you could try something out, and that you couldn’t do in the old days. I mean, of course we had library routines; you didn’t have to write that thing which read an integer. (We used to have programs where we were trying to read integers, you know!) [laughs] So I think there’s just so much resource there, in the sense of the machine capability; and also there’s beginning to be so many of these other sorts of resources—let’s call them utilities—that enable you to, to try things out. So it just feels completely different. But of course, I don’t do the programming myself now.
Advice for Young Women Who Are Considering a Career in Computing
Abbate:
Do you have any advice for young women contemplating a career with computers?
Spärck Jones:
Yes! I mean, I think my advice would be what I said a little bit earlier: Don’t be put off by nerdy images, or the notion that computers are used in inhumane ways. You know, the inhumanity notion: “They’re used for nasty things like warfare and stuff like that.” It’s an area of real intellectual interest, both in terms of what computers themselves can do . . . You can operate at two levels. One of them is thinking about how to build better computer systems—and not too much hardware, but software, operating systems; and nowadays, of course, everybody thinks about security: how you build security into distributed systems. This is computer science research, and those are real intellectual challenges; they’re really interesting. There’s lots of interesting things to do. At the same time, there’s lots of interesting things to do in applications areas. Think about systems for managing health data: they can be bad, or they can be good. Why not write good ones? Think how valuable they will be!
So I think my view is: Don’t be put off by the surface. There’s lots of opportunities in computing itself. I don’t just mean using it as some black box that you use as part of your job, because many people do that now; like working in finance: they’ll have something that will deliver some numbers to them. But actually thinking about how and what the computing itself is doing—how you should write it and what it should do. We’ve got a lot of research effort in security, and this is a very good example, because it’s absolutely intertwined with all of our daily lives. You know, think about the security of your personal data, about your financial data, about all these kind of things. How do you design good systems which will let that stuff be seen by the right people, will let it travel round with you in your life, will stop the wrong people getting at it, and will stop it getting corrupted? I mean, there’s just endless stuff there, which is both socially important and intellectually extremely challenging. The technical challenges of security are very great indeed; it isn’t just cryptography. People think security is cryptography, you know, and that it’s for people who can whiz out sums: that’s a very small part of it. The real part is how you design good systems, and systems design is a fascinating thing.
So the advice to the young women is: There’s a lot to do there, and don’t be put off by the bad side of it! And what’s more, if you are put off by the bad side, it will be taken over by people who are complete nerds, and the world will be a worse place rather than a better place! [laughs] Because I think the women actually are capable of looking at some wider aspects of computing, and not just getting hooked entirely on the technology. So don’t let the men take over the technology and make the world a worse place! [both laugh] That’s my advice.
Abbate:
Well, thank you very much!
Notes
1. Noam Chomsky, Aspects of the Theory of Syntax. MIT Press, 1965.
2. EDSAC 99. Commemorative book (with a yellow cover) produced by Cambridge University Computer Laboratory in 1999.
3. Barbara Grosz, Karen Spärck Jones and Bonnie Webber (eds.). Readings in Natural Language Processing. Morgan Kaufmann, 1986.
4. U.S. Defense Advanced Research Projects Agency and National Institute of Standards and Technology.
5. Karen Spärck Jones and Peter Willett (eds.). Readings in Information Retrieval. Morgan Kaufmann, 1997.
6. Stephen E. Robertson and Karen Spärck Jones. “Relevance weighting of search terms.” Journal of the American Society for Information Science 27, 129-46 (1976).