├── README.md
└── announce.txt


/README.md:
--------------------------------------------------------------------------------
# History of Open-Source IR Systems

**January 26, 2025**

Cleaning out personal files, I discovered this gem.
Thought it'd be neat to share with the world.
The exchange below was hidden away in an email thread with Chris Buckley, Ellen Voorhees, and Donna Harman, in the context of the [Lucene for Information Access and Retrieval Research (LIARR)](https://liarr2017.github.io/) workshop at SIGIR 2017.

The exchange is posted here with the permission of all participants.

---

**From Me:**

I was wondering if you could provide us with a history lesson? From what I understand, SMART was the first open-source IR system made widely available for researchers? When was this, and is there a citation I can use (not of SMART in general, but of sharing the system as a toolkit)?

---

**From Ellen:**

Ed Fox, Cornell CS Technical Report 83-09 "Some Considerations for Implementing the SMART Information Retrieval System Under UNIX" https://ecommons.cornell.edu/handle/1813/6400

Chris Buckley, Cornell CS Technical Report 85-05 "Implementation of the SMART Information Retrieval System" https://ecommons.cornell.edu/handle/1813/6526

My belief is that the first SMART system to be shared widely was the later of these two systems (Chris' 1985 version). But Ed's version was the version that moved SMART to UNIX which made the sharing possible (or, at least, vastly easier). But I have only very fuzzy knowledge of SMART pre-1980. Donna, did the earlier versions of SMART itself get shared outside of Cornell?

---

**From Me:**

> Chris Buckley, Cornell CS Technical Report 85-05 "Implementation of the
> SMART Information Retrieval System"
> https://ecommons.cornell.edu/handle/1813/6526

Yes, I dug this open and skimmed it - however, it says nothing about people being able to download it...

There are various places with the following FTP site:

ftp://ftp.cs.cornell.edu/pub/smart/

Which, of course, does not exist anymore...

Some more digging - searching Google Scholar for the URL above, the earliest reference I could find was from 1995:

http://dl.acm.org/citation.cfm?id=215327

> "we used the unmodified SMART v11.0 software (available from Cornell at ftp://ftp.cs.cornell.edu/pub/smart)"

---

**From Donna:**

In 1967 the SMART system was spread between Harvard and Cornell, with Harvard still producing the indexing and Cornell the searching. But the system was not shared outside, nor was it shared as late as 1973 so I think that Ed’s move to UNIX was the beginning as Ellen says.

---

**From Chris:**

I attach the [original announcement](announce.txt) for the release of the 1985 version of SMART. The file date is May 24, 1985 so that's probably about the right time for the initial release. I don't have a citation for it - it got announced via usenet (netnews) as well as other places, but the usenet archives are incomplete for that era and I haven't been able to find an archived copy. I think there were roughly 120 sites that got SMART version 8 directly from us (the final copy of the official mailing list had 113 names on it), and a good number that got it indirectly.

The history of SMART releases as I remember it:

The IBM version of SMART at Cornell was worked on in the 1960s-1970s. There was at least one attempt elsewhere to get it running; I think it was a southern US utility company (Louisiana?) that didn't work out. Around 1979-1980, Ed Fox started to re-implement it on the CS department mini-computers and I came along a year later. We decided to use the INGRES database system as our database store - that got things working reasonably quickly, but wasn't a good long-term solution since there was too much overhead getting data in and out of the relations. Ed started the process of taking critical procedures out of Ingres, and I continued the process after Ed graduated. There weren't any public releases in that era, though a couple of sites were running it (Ed took a copy with him, obviously). I finally ditched Ingres altogether, and put together the SMART Version 8 release in 1985.

SMART was never officially open source. A couple of years after Version 8 was out, we were contacted by a company called Individual, which included ex-Salton student Harry Wu, that wanted to use SMART. Salton and I told them they would eventually have to contact Cornell University to get approval. Cornell didn't have any idea what they were doing, and entirely on their own, negotiated a license with Individual - I never had any contact with Cornell at all! The first I knew anything about it, Salton forwarded a copy of the signed license that Cornell had sent him. That gave Individual an exclusive right to use SMART for routing (SDI) for 10 years (and rights to use it thereafter)! I was not happy.

I had many long arguments with Cornell Research Foundation over the years trying to get them to release it or at least have a standardised flat fee license for commercial use. Their position was always that if people wanted to use SMART, there must be money in it for Cornell, and they always wanted a percentage of income (they apparently had no other non-patent protected software that had actually made them money.)

So throughout the years, officially SMART had to be gotten from us after a license was signed saying it was for research purposes only. Sometime after SMART Version 11 (large changes to get SMART to work on TREC sized collections) was available, I put up SMART Version 8 for ftp, but I didn't think I could do more than that. Aside from the first couple of years of SMART Version 8, we never charged for SMART - it was much easier once the Internet could handle distribution instead of having to use tapes! But it was never officially public domain or open source.

---

**From Donna:**

Just to complete the loop, the early Cornell SMART system was documented in the SMART book, but also in ISR-10 which is in the SIGIR museum. Here is the link to it:
https://sigir.org/files/museum/pub-10/I-1.pdf

--------------------------------------------------------------------------------
/announce.txt:
--------------------------------------------------------------------------------
Announcing the availability of the SMART information retrieval system
from Cornell University! The SMART system offers a natural language,
indexed text approach to information retrieval that is suitable for
both experimental and practical applications.

In a typical interactive use of the SMART system, a user fills in a
skeleton query with a natural language statement of the user's
information need. This query is indexed using a collection dependant
dictionary and then compared against a collection of indexed documents.
The system judges which documents are likely to be the most useful to
the user and offers the user a chance to view those on-line documents.
Examples of collections of documents currently in use at Cornell include:
    1. Abstracts of all CACM articles (1958 to 1979)
    2. Unix documentation (Volumes 1 and 2)
    3. Netnews archives (eg. net.bugs)
    4. Archives of large electronic mail-boxes.

The present implementation is the latest in a long line of SMART
implementations dating back to the early 60's. This, however, is
the first that has offered interactive use in addition to being an
extensive test-bed for information retrieval research.

To give an idea of the scope of SMART, the major modules are
    1. Indexing (includes stemming, stop-word removal, a number of
       choices of parsing methods)
    2. Retrieval (includes several types of sequential retrieval,
       inverted-file retrieval, extended Boolean retrieval (although
       the interactive portion of SMART cannot yet make use of this)).
    3. Display (giving a menu of titles for the user to choose among).
    4. Feedback (automatic construction of a new query given the old
       query and judgements of usefulness of previously displayed
       documents).
    5. Evaluation (multiple means of judging the effectiveness of
       retrieval upon EXPERIMENTAL collections).

Who should be interested in SMART? First, anybody who wishes to
experimentally investigate information retrieval and who hasn't
developed their own system. Second, anybody with a serious
medium scale information retrieval problem. (Medium scale here
means roughly from six to a couple hundred megabytes of documents.)

The SMART system is a large system. Source code and binaries are
over 5 megabytes themselves, with umpteen more megabytes taken up by
both the text of the documents and the indexed form of the collection.
During the design of SMART, whenever there was a tradeoff between
time and space, space lost. This allows reasonable response time
to an interactive query (eg., 2 to 3 CPU seconds on a 10,000 document
collection) but means substantial disk space overhead (eg., an
additional 70% of the space taken up by the original documents).
SMART is not for those with small disks!

The SMART system is written in C, and runs under Berkeley 4.1 or 4.2
UNIX. It is known to run on the various Vaxen and on SUNs (at several
sites); it will be ported to Pyramids and Goulds in the next couple of
months. With the exception of the top-level shell scripts (written for
csh), it should port to System V fairly readily. (Any volunteers?)

-------------------------------------------------------------------
To get the SMART information retrieval system

    1. Fill in the form below, make a hard copy of it and sign it.
    2. Make out a check to "Cornell University" for $150 or a
       purchase order for $175.
    3. Mail it and the form to:

            The SMART Project
            c/o Chris Buckley
            Dept of Computer Science, Upson Hall
            Cornell University
            Ithaca, New York 14853


Upon receipt of your forms, we'll send you a tape containing the SMART
source, binaries, sample collections, and documentation. We'll also
send hard copies of the documentation. An electronic mailing list will
be established for bug reports.

---------------------------------------------------------------------

The form to mail to us is:


In exchange for the SMART information retrieval system, I certify the
following:

a. I will not use any of the SMART system in a commercial product without
   obtaining permission from Cornell first.
b. I will keep all copyright notices in the source code,
   and acknowledge the source of the software in any use I make of
   it.
c. I will not redistribute this software to anyone without permission
   from Cornell first.
d. I will keep Cornell informed of any bug fixes.
e. I agree that Cornell offers no warranties or guarantees of any kind
   concerning the use of the SMART system and Cornell will not be liable
   for any direct or indirect damages resulting from the use of the SMART
   system.
f. I am the appropriate person at my site who can make guarantees a-e.

                Your signature, name, position,
                phone number, U.S. and electronic
                mail addresses.

We'd also be interested in anything you want to tell us about your plans
for the system.

---------------------------------------------------------------------
Tapes will be sent in "tar" format at 1600 bpi unless otherwise arranged.
If you have any questions, problems, etc., send mail to me.

Chris Buckley                      (Usenet)   {vax135, ihnp4}!cornell!chrisb
Dept of Computer Science           (Arpanet,
Upson Hall, Cornell University      CSnet)    chrisb@cornell
Ithaca, New York 14853             (Bitnet)   chrisb%crnlcs
--------------------------------------------------------------------------------