Milestones:DIALOG Online Search System, 1966
- Date Dedicated
- 2019/05/23
- Dedication #
- 195
- Location
- Mountain View, CA
- IEEE Regions
- 6
- IEEE sections
- Santa Clara Valley
- Achievement date range
- 1966
Title
DIALOG Online Search System, 1966
Citation
DIALOG was the first interactive, online search system addressing large databases while allowing iterative refinement of results. DIALOG was developed at Lockheed Palo Alto Research Laboratory in 1966, extended through contracts with NASA, and offered commercially in 1972. Its speed, ease of use, and wide range of data content attracted professional users worldwide including scientists, attorneys, educators and librarians. DIALOG preceded major Internet search tools by more than two decades.
Street address(es) and GPS coordinates of the Milestone Plaque Sites
- Site 1: Computer History Museum, 1401 N. Shoreline Blvd., Mountain View, CA 94043 US (GPS: 37.414757, -122.077679)
- Site 2: Lockheed Martin Advanced Technology Center, 3251 Hanover St., Bldg. 245, Palo Alto, CA 94304 US (GPS: 37.410542, -122.144100)
Details of the physical location of the plaque
- Site 1: On the inside face of the front patio brick wall, near the Main Entrance.
- Site 2: On a wall in the front lobby.
How the plaque site is protected/secured
- Site 1: Building security; 24/7 access.
- Site 2: Building security; viewing by invitation only.
Historical significance of the work
Until about 1969, most scientists, engineers, attorneys, educators, librarians and others who wanted to research what was known and published in a particular discipline were required to physically locate and then visually search materials published in books, journals and other printed materials. This was a time-consuming and imperfect process. See the video at slide 10 in [Ref1: Berg IEEE Presentation].
When DIALOG became available in 1966, it was able to automate research work for scientists and engineers at NASA, and later at other government agencies. By 1972, its commercial introduction extended these capabilities to all professions by allowing online access to large collections of digitized materials by way of a command language that allowed the searcher to iteratively refine results.
DIALOG's Ability to Allow Iterative Refinement of Results
DIALOG's major technical innovation is reflected in its name: it enables a conversation between the searcher and the computer. Thus, DIALOG has an interactive user interface that allows for iterative searches to be performed in what is in effect a dialogue between a user and the DIALOG system.
"Search at its best is a conversation ... an iterative, interactive process where we find we learn." (Search Patterns, 2010, Peter Morville, p. 9.)
DIALOG's language allows for this interactivity, as shown in the following excerpt from Roger Summit's 1967 ACM paper [Ref2: ACM Paper]:
"There are five important characteristics of the DIALOG language:
• The search question is constructed at search time (rather than at index time as is the case with a manual system).
• DIALOG is designed for nonspecialists; i.e., the users themselves, and thus avoids one communication barrier.
• The command language is independent of the particular data it searches.
• As an on-line system, it allows continual redefinition of the search question, based on examination of intermediate results.
• Control of the process lies with the user; the computer merely serves as a data-processing extension of the user."
Overview of DIALOG's Language
The following overview is distilled from Roger Summit's 1967 ACM paper [Ref2: ACM Paper]:
The DIALOG system provides a number of commands with which the searcher interacts with the computer. A search consists of (1) identifying and (2) selecting terms and phrases that reflect the user's interest, (3) combining descriptors into search expressions, and (4) examining retrieved citations and modifying search expressions. Each of these functions is accomplished with a particular command. The four principal commands are EXPAND, SELECT, COMBINE and DISPLAY.
EXPAND with a term provides a list of synonyms, related terms and similar but misspelled terms found in the database, allowing the searcher to home in on the exact combination of words defining his or her interest. The listed terms are numbered to allow the searcher to SELECT a list of terms (term a, term b, term c), a range of terms (term a - term d), or a list of ranges and terms. The result of a SELECT command with a list of terms is a numbered Set representing the subset of the database documents containing these terms. COMBINE with a Boolean expression of Set numbers provides a numbered Set of database documents corresponding to the Boolean specification. DISPLAY with a Set number calls up the documents contained in the resulting Set and lets the searcher display them successively to judge the success of the search so far. Based on this feedback from the database, the searcher may continue to develop additional Sets recursively or simply print out the desired documents.
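The SELECT, COMBINE and DISPLAY steps described above can be sketched in a few lines of Python. This is only an illustration of numbered Sets and Boolean combination, not DIALOG's actual implementation: the `Session` class, the `DOCS` corpus and the method names are hypothetical, and the sketch assumes that SELECT with several terms returns documents containing any of them.

```python
# Hypothetical toy corpus: accession number -> index terms of that document.
DOCS = {
    1: {"lunar", "landing", "guidance"},
    2: {"lunar", "geology"},
    3: {"guidance", "computer"},
    4: {"lunar", "guidance", "computer"},
}


class Session:
    """Toy search session; an illustration, not DIALOG's actual code."""

    def __init__(self, docs):
        self.docs = docs
        self.sets = {}  # Set number -> accession numbers in that Set

    def _store(self, accessions):
        number = len(self.sets) + 1  # Sets are numbered 1, 2, 3, ...
        self.sets[number] = set(accessions)
        return number

    def select(self, *terms):
        """SELECT: new Set of documents containing any listed term (assumed)."""
        wanted = set(terms)
        hits = {acc for acc, words in self.docs.items() if words & wanted}
        return self._store(hits)

    def combine(self, left, op, right):
        """COMBINE: new Set from a Boolean operation on two Set numbers."""
        a, b = self.sets[left], self.sets[right]
        if op == "AND":
            result = a & b
        elif op == "OR":
            result = a | b
        elif op == "NOT":
            result = a - b
        else:
            raise ValueError(f"unknown operator: {op}")
        return self._store(result)

    def display(self, number):
        """DISPLAY: accession numbers in a Set, in ascending order."""
        return sorted(self.sets[number])
```

With this toy corpus, selecting "lunar" and "guidance" and combining the two resulting Sets with AND leaves documents 1 and 4. Because every step yields a new numbered Set, that result can itself be combined again, for instance with NOT against a Set selected on "computer", which leaves document 1.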
The Origins and Early History of DIALOG
The paragraphs in this section are from pp. 72-74 of Elliot King's Free for All: The Internet's Transformation of Journalism, 2010 [Ref3: King Book]:
"The idea that companies could put their computer expertise to work for others had many ramifications. One possibility that presented itself was that efficient, centralized computers could manage access to and retrieval of information from vast storehouses of information. In 1960, Roger Summit, a doctoral student at Stanford University, took a summer job at Lockheed Information Sciences Laboratory, where he was assigned to work on problems of information retrieval under the supervision of E.K. Fisher, the director of information processing. The central issue was how to locate and retrieve stored information in a cost-efficient, timely manner. At the time, according to Summit, the feeling was that it was often easier to redo scientific research than it was to determine if it had been done before.
In the course of his assignment, Summit encountered the work of H. Peter Luhn, a researcher at IBM who had invented two significant schemes for the large-scale management of information — Key Word In Context (KWIC) indexing and Selective Dissemination of Information (SDI). In 1964, at Summit’s urging, Lockheed established a laboratory to study the application of these technologies. A project team of six led by Summit set out to create a technology that could facilitate efficient information retrieval. Among the criteria he established were that the system had to be usable by end users without the intervention of computing staff and it had to be interactive and recursive so that searchers could immediately see their results and modify their queries accordingly. Finally, researchers wanted to include an alphabetical list of searchable terms near a desired term and the number of items in the database containing that term.
By 1965, the team developed the prototype of what became the DIALOG Information Service. To test the system, Summit submitted an unsolicited proposal to apply DIALOG to NASA’s Scientific and Technical Aerospace Reports (STAR) ..., a database with around 250,000 citations ... that was in great demand. NASA had been established by the Space Act of 1958 to spearhead America’s drive into space, and part of its mandate was to disseminate information about its activities and findings as widely as possible. From its inception, the agency aggressively indexed books, reports, and research concerning aerospace, and in 1962, NASA's staff, working with a contractor, started entering the bibliographic citations into a computer.
When Summit discovered a contract had already been awarded to a competitor, he proposed a smaller, less expensive parallel project as backup if the competitor failed. For the test, Summit leased a data line from the Lockheed offices in Palo Alto, California, to the NASA Ames Research Center. The test was conducted in January 1967. The turnaround time for a query was cut from fourteen hours when conducted at NASA headquarters to just a few minutes using the DIALOG system.
Based on that success, Lockheed won a [competitively bid] $180,000 contract from NASA to build what was called the Remote Console Information Retrieval system, or NASA RECON. This was followed by contracts to install DIALOG at the Atomic Energy Commission (AEC) and the European Space Research Organization (ESRO) and, in 1969, a contract to provide the U.S. Office of Education (USoE) with a retrieval service on the Educational Resources Information Center (ERIC) database."
Much of the above is also discussed in [Ref1: Berg IEEE Presentation] (including the NASA-RECON video in slide 14), [Ref4: Summit Thesis], [Ref5: World Encyclopedia], [Ref6: AIIP Newsletter] and various of the Roger Summit documents cited by Google Scholar [Ref7: Summit Citations].
In addition, transcontinental use of DIALOG was first possible in June 1970 by way of a satellite link that connected Paris and Oak Ridge National Laboratory in Tennessee with Lockheed in Palo Alto, CA, for access to the AEC and NASA databases. The plan for a satellite link is noted in the ESRO video at slide 16 in [Ref1: Berg IEEE Presentation].
DIALOG's commercial availability in 1972
In 1972, Lockheed launched the world’s first commercial online service as DIALOG Information Retrieval Service, named after its language. Because interactive access to bibliographic databases of scientific and technical information is of great value to many organizations, the initial service provided users in Europe and the U.S. with access to the ERIC (Educational Resources Information Center) and NTIS (National Technical Information Service) databases, as well as the PANDEX science citation index. At its launch, DIALOG had six customers. See [Ref3: King Book], [Ref5: World Encyclopedia], [Ref8: History and Heritage] and [Ref9: History of Info Science].
DIALOG became the most comprehensive online information service in the world by 1985
• “By 1985 DIALOG had become the most comprehensive online information service in the world, with more than 200 separate databases in business and economics, chemical, patent and trademark information, science and technology, medicine and the biosciences, news and current events, education, directories, energy and the environment, law and government, computer science and microcomputers, books, the social sciences, and the humanities.” [Ref5: World Encyclopedia]
• “By 1985 DIALOG Information Services, Inc., with Summit as president, offered more than 100 million records on many subjects from more than 200 different databases to its many customers in several countries (Camp, 1985).” [Ref9: History of Info Science]
DIALOG's Use in Libraries
DIALOG dramatically expanded the research capabilities of libraries; it changed the outlook, careers, and perspective of the library and information professionals who used the service; and it provided expert searches for their constituents. As its popularity grew, the world's significant libraries (including the National Library of Medicine, or NLM) were among the first to integrate DIALOG into their research and reference offerings:
“West Coast research centers such as the Rand Corporation, the System Development Corporation, and Lockheed Missiles and Space Corporation, as well as some universities, entered the mainstream of online retrieval through their research projects and by providing leadership to the national establishments such as NASA and the NLM. Two of our featured specialists come to mind: Roger K. Summit of Lockheed’s DIALOG, who was instrumental in applying Lockheed’s techniques to the NASA-RECON online bibliographic retrieval system; and Carlos A. Cuadra of the SDC, who administered the ORBIT II in the NLM’s AIM/TWX experiment.” [Ref9: History of Info Science]. See also the NASA-RECON video in slide 14 of [Ref1: Berg IEEE Presentation].
DIALOG's Use Throughout the World
• Virtually all business segments have used and continue to use DIALOG for searching, including libraries, investment banks, consumer companies, chemical, pharmaceutical, medical, engineering, biology, social sciences, humanities and aerospace companies, government agencies, patent offices such as the European Patent Office (EPO), the Japanese Patent Office (JPO) and the US Patent and Trademark Office (USPTO), and academia. Researchers, executives and professionals of all types were exposed to the speed, precision, and depth of DIALOG searching.
• The vast amount of data from various sources led to unique database formatting to accommodate bibliographic, directory, and specialized intellectual property searching. As many of these database producers organized their tools via sophisticated controlled vocabularies, DIALOG was a pioneer in creating a value-added online approach to controlled vocabularies (metadata/taxonomies/ontologies). Some of the most important vocabulary schemes were loaded on the DIALOG system, including the CAS Registry, the Medical Subject Headings (MESH) of the National Library of Medicine, plus the controlled vocabularies of tools varying from education (Educational Resources Information Clearing House or ERIC) to technology (the INSPEC database) to mention a few.
• The voluminous repository of data available for searching as of the late 1990s is shown by the many hundreds of database names in a pair of documents that organize these names both alphabetically and by subject area. [Ref10: DIALOG Databases Late 1990s]
• DIALOG, operating as ProQuest Dialog as of 2018, is part of the US Patent Examiner's Toolkit. [Ref11: USPTO Databases]
• The databases supported by ProQuest Dialog as of 2018 remain extensive, and include the patents of nearly 30 countries. [Ref12: ProQuest Dialog Databases]
DIALOG and the Advent of Internet Search Engines
DIALOG retained its usefulness even with the widespread availability of free internet search engines such as Lycos, Infoseek, AltaVista, Yahoo! and Google starting in 1993, as shown in these 1998 observations by Roger Summit as excerpted from [Ref8: History and Heritage]:
“With the rapid growth of the Web, some have been predicting the demise of traditional online services. I don’t agree. Recently, I was doing some research in preparation for a speech I presented in Stockholm. I determined that DIALOG contains more than twenty times the total amount of information accessible through the Web. Furthermore, the two have grown at roughly the same rate over the past year, based on AltaVista statistics.
In addition to comparing the quantity of information on DIALOG and the Web, I compared the quality of search results for several topics using DIALOG and the AltaVista search engine. I’m sure it will come as no surprise that the DIALOG results were highly relevant, while the AltaVista results were, to be generous, somewhat encyclopedic in nature. I found that it was difficult and often impossible to do a comprehensive and in-depth review of a particular topic on the Web.
It’s somewhat ironic that with the phenomenal growth of the Web and concomitant advances in interface design, Web search engines lack even the most rudimentary features that were basic in the first online retrieval system we designed thirty years ago—such features as field specification, display of index terms, or options to allow one to refine a search.”
DIALOG has changed ownership over the years, but it remains an important research tool
Over the years, DIALOG has undergone changes in ownership and name. The following chronology is shown in slides 29-36 of [Ref1: Berg IEEE Presentation]:
• Lockheed spun off DIALOG in 1981 as the wholly-owned subsidiary Dialog Information Services, Inc., with Roger Summit as President and CEO until 1992.
• Knight-Ridder acquired DIALOG in 1987 for $353 million via a Goldman-Sachs auction, and it was operated as Knight-Ridder Information until 1997.
• London-based M.A.I.D., LLC purchased DIALOG in 1997, and it was operated as The Dialog Corp. until 2000.
• Thomson acquired DIALOG in 2000, and it was operated as Thomson Dialog until 2008.
• ProQuest acquired DIALOG from Thomson Reuters in 2008. The service has been marketed as ProQuest Dialog™ since that time. [Ref13: ProQuest Dialog Brochure]
Obstacles (technical, political, geographic) that needed to be overcome
An early obstacle was the usual reluctance of scientists, engineers, attorneys, educators, librarians and others to trust the results of DIALOG searches. That reluctance faded quickly with time and experience.
Other obstacles included difficulty in obtaining necessary administrative support as well as obtaining access to the necessary computer hardware in the days when such resources were expensive, in particular for access by multiple users and within a government entity.
In addition, users were not used to interacting with a computer. As described above, the DIALOG language and system allowed for this interactivity, and thereby provided the means for the "iterative refinement of results." This interactivity and the ability to iteratively refine search results is of course why internet search is so widespread today.
A case study on the use of DIALOG to search the Educational Resources Information Center (ERIC) "document file" (database) was performed at Stanford University in 1969 [Ref14: Stanford Study]. DIALOG was used "to see if individuals could sit down at a terminal and, with little preliminary instruction, use such a system to locate relevant educational research documents" (Stanford Study at PDF p. 5). Upon first using the system, a Stanford professor stated, "... to have right there at your fingertips all the volumes of Research in Education, rather than having first to find the right volume and then the right number -- just the physical juggling of those cumbersome volumes is obviated by this, so you can work a lot faster" (Stanford Study at PDF p. 3).
How DIALOG allowed NASA to overcome the problem of finding documents within its huge data catalog is described in an August 26, 1969 newspaper article titled "Computer tells NASA scientists where to find space data" which states that DIALOG "helps the scientist refine his query as he makes the search." [Ref15: NASA 1969 Story] Note that this story was published about five weeks after the first lunar landing by Apollo 11.
Programming Challenges
DIALOG was first developed using one of the first-produced IBM 360/30 computers. There were five major programming challenges:
- Designing user commands (these are discussed briefly below, and are detailed in [Ref2: ACM Paper])
- Designing file structures (these are described below, and are detailed in [Ref16: Large Databases])
- Writing batch-processing programs to convert databases of various formats as received from producers into the DIALOG internal file structure (described below)
- Writing real-time programs for the user to interface and operate on the internal files to accomplish a search
- Writing telecommunications programs to drive display devices and the server computer via the telecommunications controller
The Most Commonly Used Commands
The following is from [Ref6: AIIP Newsletter]:
By 1965, the team had developed a small, working prototype of DIALOG incorporating the design priorities into the following simple commands:
- BEGIN (file number/s) - specifies the file/s to be searched.
- EXPAND (term) - provides a display of alphabetically near terms to the term entered.
- SELECT (term; set) - creates a Boolean-defined subset of the search file(s) corresponding to the terms and/or sets specified.
- TYPE (Set number) - outputs an item or range of items from the set indicated.
Searching DIALOG is as simple in concept as remembering: B E S T. This conceptual design was a model for several later systems such as the IBM STAIRS system and the American Chemical Society STN system.
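The EXPAND behavior listed above, a display of alphabetically near terms, can be illustrated with a short Python sketch. It is a toy under stated assumptions rather than DIALOG's code: the `INDEX` contents, the `expand` function and its `window` parameter are hypothetical, and the posting count shown beside each term follows the design goal of listing the number of items containing that term.

```python
from bisect import bisect_left

# Hypothetical inverted index: term -> accession numbers of documents using it.
INDEX = {
    "orbit": [2, 5, 9],
    "orbital": [5],
    "orbiter": [1, 9],
    "ordnance": [7],
    "oscillator": [3, 4],
}


def expand(index, term, window=2):
    """Return (term, item count) pairs alphabetically near `term`.

    `term` need not be in the index; its alphabetical position is used,
    so a misspelled entry still lands among its neighbors.
    """
    terms = sorted(index)
    pos = bisect_left(terms, term)  # where `term` is, or would be inserted
    lo = max(0, pos - window)
    hi = min(len(terms), pos + window + 1)
    return [(t, len(index[t])) for t in terms[lo:hi]]
```

Calling `expand(INDEX, "orbiter", window=1)` returns the entries for "orbital", "orbiter" and "ordnance" with their item counts, so the searcher can pick the intended terms before issuing SELECT.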
An Internal File Structure to Facilitate Searching
DIALOG's design and language requirements were met by a file system consisting of major files, each of which had a covering index:
- Linear Index and Linear File (LX and LF)
  - LX contains a list of accession numbers, each with a pointer to its master record location.
  - LF contains the master file of documents to be searched, each identified by an “accession number”.
- Inverted Index and Inverted File (IX and IF)
  - IX contains an alphabetically sorted list of all the terms found in the master file, with a pointer to the associated IF entry.
  - IF contains the term-associated lists of accession numbers for each term in the IX.
- User Set Index and User Set File (UX and UF)
  - UX contains a numerically sorted list of Set Numbers resulting from user searches, with a pointer to the associated UF entry.
  - UF contains the set-associated lists of accession numbers for each set in the UX.
File organization of the inverted and linear index used an indexed sequential access method (ISAM) arrangement, and the inverted file itself utilized a basic sequential access method (BSAM) organization, as is also used in the linear file.
Recursive searching was possible because IF and UF and their associated indices were in the same format and could therefore be intermixed and used interchangeably.
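The reason a shared format makes recursive searching possible can be sketched in Python. This is a minimal illustration under stated assumptions, not the original file layout: dictionaries stand in for the ISAM indexes (LX, IX, UX), sorted lists of accession numbers stand in for the IF and UF entries, and the names `LF`, `IF`, `UF`, `postings`, `select` and `type_records` are chosen here for illustration.

```python
# LF: master records keyed by accession number (the dict plays the role of LX).
LF = {
    101: "lunar guidance computers",
    102: "lunar geology survey",
    103: "guidance software",
}


def build_inverted_file(lf):
    """Build term -> sorted accession-number list (the dict plays the role of IX)."""
    inverted = {}
    for accession, text in sorted(lf.items()):
        for term in set(text.split()):
            inverted.setdefault(term, []).append(accession)
    return inverted


IF = build_inverted_file(LF)
UF = {}  # Set number -> sorted accession-number list (the dict plays the role of UX)


def postings(operand):
    """Resolve a term (str) or a Set number (int) to an accession-number list.

    Both files hold the same kind of list, so callers need not care which
    file an operand came from.
    """
    if isinstance(operand, int):
        return UF[operand]
    return IF.get(operand, [])


def select(op, *operands):
    """Create a new Set from any mix of terms and earlier Set numbers."""
    if not operands:
        raise ValueError("at least one term or Set number is required")
    lists = [set(postings(o)) for o in operands]
    result = set.intersection(*lists) if op == "AND" else set.union(*lists)
    number = len(UF) + 1
    UF[number] = sorted(result)
    return number


def type_records(number):
    """Fetch the master records for a Set via their accession numbers."""
    return [LF[accession] for accession in UF[number]]
```

A Set created by one `select` call can be passed straight back in as an operand of the next, alongside ordinary terms. In this sketch, that interchangeability is what lets a searcher refine a result step by step.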
Features that set this work apart from similar achievements
DIALOG was the pioneer in online database literature searching and retrieval. It provided the first valuable tool and experience for scientists, engineers, attorneys and research librarians to examine literature quickly and at minimal cost. DIALOG contributed to the key technical structure and feature aspects of online information retrieval, including Boolean and proximity searching, field structure and field searching, specialized inverted and linear indexes, large-scale telecommunication front-ends, multiple state-of-the-art processors, and massive storage. [Ref2: ACM Paper] and [Ref3: King Book]
DIALOG's Predecessors
Several computer-based search activities preceded DIALOG, but none of these went substantially beyond an experimental phase. The following excerpts from [Ref5: World Encyclopedia] describe several representative experiments:
• 1951-1954: “Charles Bourne observed that ‘an investigation of online bibliographic searching was first made by Bagley in 1951’ with the development of a program for a computer at the Massachusetts Institute of Technology ‘to search encoded abstracts.’ Bourne noted that ‘application of the computer to bibliographic searching was first demonstrated in 1954 in the form of batch searching.’ “
• 1954-1964: “Over the next 10 years, many research and development efforts culminated in the development of ‘batch’ searches of bibliographic databases offered by a limited number of special libraries. Search analysts coded requests sent to them for literature searches. Several searches were then batched, or run consecutively, to make the most efficient use of the computer’s time. Several weeks generally passed before the requestor received any result. One batch retrospective search service, the Medical Literature Analysis and Retrieval System (MEDLARS) of the National Library of Medicine (NLM), was made available to the general public in 1964.”
• 1960: “Systems Development Corporation (SDC) demonstrated the first interactive online system, Protosynthex, developed by Robert Simons and John Olney, in 1960. Using a terminal wired directly to the computer, Protosynthex allowed access to the full text of the Golden Book Encyclopedia with the ability to search for the occurrence of terms in proximity with each other and to search for truncated forms of words, but not to combine terms with the use of Boolean logic.”
• 1964: “Another online retrieval system was developed at SDC in late 1964 by Harold Borko, H. P. Burnaugh, and W. H. Moore. The system, Bibliographic Organization for Library Display (BOLD), was developed for browsing literature citations on magnetic tapes. It was first publicly demonstrated about a year later and was one of the first systems capable of displaying an online thesaurus. In November 1964 SDC first demonstrated an online system that nearly achieved the interactive capability today’s users enjoy, Language Used to Communicate Information System Design (LUCID), developed for SDC by E. Franks and P. A. DeSimone.”
• 1965: “ ‘The first demonstration of an online retrieval network, on a national scale,’ according to Bourne, ‘was probably made in 1965 by SDC in an experiment ... to provide 13 organizations with access to some 200,000 bibliographic records on foreign technology.’ This work was done by SDC-Dayton for the Foreign Technology Division of Wright-Patterson Air Force Base, Ohio.”
Developed in parallel with DIALOG were the following non-commercial and non-interactive services:
• 1967-1972: “SDC was instrumental in the development of NLM’s online information service, MEDLINE (MEDLARS ON-LINE). In late 1967 NLM experimented with SDC’s Online Retrieval of Bibliographic Information Timeshared (ORBIT) retrieval language to search NLM’s database of 10,000 citations on neurology. In May 1970 SDC began operating the Abridged Index Medicus (AIM)/TWX online information system on behalf of NLM. In October 1970 NLM introduced MEDLINE as a free service on its own computer facilities with a database of more than 400,000 citations while allowing the AIM/TWX service to continue with SDC. In February 1972 NLM utilized TYMNET, the first public telecommunication network, for access to MEDLINE.” [Ref5: World Encyclopedia]
Significant references
Lockheed submitted an unsolicited proposal to NASA in June 1965 [Ref17: Lockheed Proposal], and subsequently received the NASA STAR database of around 250,000 citations. This database was used for much of the testing of DIALOG, which became fully operational with this database in November, 1966. NASA awarded a contract to Lockheed based on its proposal in 1966, and that led to NASA's first remote use of DIALOG to access the STAR database over a leased data line in January 1967.
1. [Ref14: Stanford Study] at PDF p. 14: "Completed at the end of 1966, the DIALOG language was implemented on an IBM 360/30 computer [and later on a 360/40]."
2. [Ref6: AIIP Newsletter] at PDF pp. 5-6: In 1965, NASA created the Scientific and Technical Aerospace Reports (STAR) database of around 250,000 citations. Lockheed used this database as part of the development of DIALOG. "We were awarded a contract from NASA in 1966 and were operational in January of 1967."
3. [Ref17: Lockheed Proposal]: Cover of Lockheed's proposal to NASA for remote search of STAR database
4. [Ref2: ACM Paper]: submitted in 1966, and published in 1967
5. [Ref3: King Book] at pp. 73-74: "When Summit discovered a contract had already been awarded to a competitor, he proposed a smaller, less expensive parallel project as backup if the competitor failed. For the test, Summit leased a data line from the Lockheed offices in Palo Alto, California, to the NASA Ames Research Center. The test was conducted in January 1967. The turnaround time for a query was cut from fourteen hours when conducted at NASA headquarters to just a few minutes using the DIALOG system."
6. [Ref9: History of Info Science] at p. 89: "By 1967, Lockheed was well established in the area of interactive retrieval from their then few databases."
Supporting materials
[Ref1: Berg IEEE Presentation]
[Ref2: ACM Paper]
[Ref3: King Book]
[Ref4: Summit Thesis]
[Ref5: World Encyclopedia]
[Ref6: AIIP Newsletter]
[Ref7: Summit Citations]
[Ref8: History and Heritage]
[Ref9: History of Info Science]
[Ref10: DIALOG Databases Late 1990s]
[Ref11: USPTO Databases]
[Ref12: ProQuest Dialog Databases]
[Ref13: ProQuest Dialog Brochure]
[Ref14: Stanford Study]
[Ref15: NASA 1969 Story]
[Ref16: Large Databases]
[Ref17: Lockheed Proposal]
Dedication Ceremony
Dedication Event Slide Presentation