├── Citation ├── ALWD Citation Abbreviation Formats.pdf ├── Cite Unseen, 2007, Gallacher.pdf ├── Neutral Citation, Martin.pdf ├── Proposed Course of Action for UniversalCitation.org, Martin.pdf └── aall_citation_formats_committee_report.pdf ├── CourtListener Studies ├── Michael Lissner │ ├── Paper │ │ ├── Citations │ │ │ ├── Paul Hellyer - Assessing the Influences of Computer-Assisted Legal Research, 2005.pdf │ │ │ ├── Robert Berring - Legal Research and Legal Concepts.pdf │ │ │ └── William G. Harrington - A Brief History of Computer-Assisted Legal Research.pdf │ │ ├── Final_Report_Michael_Lissner_2010-05-07pdf │ │ ├── Outline.txt │ │ ├── README.txt │ │ ├── appendixI.tex │ │ ├── bibliography.bib │ │ ├── features.tex │ │ ├── future.tex │ │ ├── future.txt │ │ ├── intro.tex │ │ ├── paper.tex │ │ ├── problemdefine.txt │ │ ├── solutiondesign.tex │ │ ├── solutiondesign.txt │ │ ├── techdecisions.tex │ │ └── techdecisions.txt │ └── Poster │ │ ├── Search.png │ │ ├── email.png │ │ ├── results.png │ │ ├── save-alert.png │ │ └── v1.svg ├── Roywn McDonald and Karen Rustad │ └── mcdonald_rustad_report.pdf └── Sarah Tyler │ └── sarah_tyler_dissertation.pdf ├── Entity Extraction (UIMA) ├── An Interface for Rapid Natural Language Processing Development in UIMA.pdf ├── Automated Concept Extraction to aid Legal eDiscovery Review.pdf ├── Opinion Mining in Legal Blogs.pdf ├── U-Compare Share and Compare Text Mining Tools with UIMA.pdf └── UIMA - An Architectural Approach to Unstructured Information Processing in the Corporate Research Environment.pdf ├── Judicial Situation ├── Lawyer_Demographics, ABA.pdf ├── PACER - Additional Functional Requirements Group Final Report.pdf ├── Rewiring Old Architecture - Why U.S. 
Courts Have Been So Slow and.pdf └── twelve.tables or 7-11, Malamud.pdf ├── Keyword Extraction ├── Domain-Specific Keyphrase Extraction - Frank, Paynter and Witten.pdf └── KEA Practical Automatic Keyphrase Extraction - Witten, Paynter, Et al.pdf ├── Presentations ├── General Presentation, Op-Alert, 2010-01-27.odp ├── General Presentation, Op-Alert, 2010-01-27.pdf ├── General Presentation, Op-Alert, 2010-01-27.ppt ├── LVI Proposal, Michael Lissner, Juriscraper, 2012-03-15.pdf ├── LVI-Presentation-Lissner-Juriscraper.odp ├── LVI-Presentation-Lissner-Juriscraper.pdf ├── PatentLawPic132.jpg ├── carver-presentation-abstract-lvi2012-final.pdf └── lvi-presentation-notes.txt └── README.md /Citation/ALWD Citation Abbreviation Formats.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Citation/ALWD Citation Abbreviation Formats.pdf -------------------------------------------------------------------------------- /Citation/Cite Unseen, 2007, Gallacher.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Citation/Cite Unseen, 2007, Gallacher.pdf -------------------------------------------------------------------------------- /Citation/Neutral Citation, Martin.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Citation/Neutral Citation, Martin.pdf -------------------------------------------------------------------------------- /Citation/Proposed Course of Action for UniversalCitation.org, Martin.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Citation/Proposed Course of Action for UniversalCitation.org, Martin.pdf -------------------------------------------------------------------------------- /Citation/aall_citation_formats_committee_report.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Citation/aall_citation_formats_committee_report.pdf -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/Citations/Paul Hellyer - Assessing the Influences of Computer-Assisted Legal Research, 2005.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/CourtListener Studies/Michael Lissner/Paper/Citations/Paul Hellyer - Assessing the Influences of Computer-Assisted Legal Research, 2005.pdf -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/Citations/Robert Berring - Legal Research and Legal Concepts.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/CourtListener Studies/Michael Lissner/Paper/Citations/Robert Berring - Legal Research and Legal Concepts.pdf -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/Final_Report_Michael_Lissner_2010-05-07pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/CourtListener Studies/Michael 
Lissner/Paper/Final_Report_Michael_Lissner_2010-05-07pdf -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/Outline.txt: -------------------------------------------------------------------------------- 1 | I. Defining the Problem 2 | A. No-cost Current Awareness 3 | B. Audience 4 | C. Almost a side-effect: gradual creation of a corpus 5 | 6 | II. Designing a Solution 7 | A. Interviews with Librarians 8 | B. Interviews with public interest lawyers 9 | 10 | 11 | 12 | III. Technical Decisions 13 | A. Why Python/Django/Sphinx/MySQL/Apache/Beautiful Soup/etc ? 14 | B. Decisions regarding the user interface(s) 15 | C. Anything interesting happen while designing scraper/backend/user 16 | accounts/etc. (optional) 17 | 18 | III. Features and coverage 19 | A. Features 20 | B. Coverage. 21 | 22 | IV. Future Possibilities 23 | A. Expanding coverage (corpus) 24 | B. Expanding features 25 | C. Ideas for monetization consistent with broad no-cost public access? 26 | D. Generalizability: What problems have been encountered and solved in 27 | this context that might arise and be similarly treated in some other 28 | information service on a totally different subject? 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/README.txt: -------------------------------------------------------------------------------- 1 | This paper is created with LaTeX. 2 | 3 | It can be compiled and opened on a Linux machine with: 4 | make veryclean; make pdf; make ; gnome-open paper.pdf 5 | 6 | Some Ubuntu dependencies are: 7 | sudo aptitude install biblatex-dw biblatex gedit-latex-plugin 8 | 9 | It probably has others. 
10 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/appendixI.tex: -------------------------------------------------------------------------------- 1 | \clearpage 2 | \section{Appendix of design sketches} 3 | 4 | \begin{flushleft} 5 | \includegraphics[width=0.85\textwidth]{DSC02180.JPG} 6 | 7 | \emph{Figure 1} --- The first sketch of the home page 8 | \vspace{5mm} 9 | 10 | \includegraphics[width=0.85\textwidth]{DSC02184.JPG} 11 | 12 | \emph{Figure 2} --- The user profile page (/profile/settings/) 13 | \vspace{5mm} 14 | 15 | \includegraphics[width=0.85\textwidth]{DSC02183.JPG} 16 | 17 | \emph{Figure 3} --- The alert creation page (/alert/create/) 18 | \vspace{5mm} 19 | 20 | \includegraphics[width=0.85\textwidth]{DSC02185.JPG} 21 | 22 | \emph{Figure 4} --- The user alerts page (/profile/alerts/) 23 | \vspace{5mm} 24 | \end{flushleft} 25 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/bibliography.bib: -------------------------------------------------------------------------------- 1 |  2 | @article{berring_legal_1987, 3 | title = {Legal Research and Legal Concepts: Where Form Molds Substance}, 4 | volume = {75}, 5 | issn = {00081221}, 6 | shorttitle = {Legal Research and Legal Concepts}, 7 | url = {http://www.jstor.org/stable/3480571}, 8 | number = {1}, 9 | journal = {California Law Review}, 10 | author = {Robert C. 
Berring}, 11 | year = {1987}, 12 | note = {{ArticleType:} primary\_article / Issue Title: {Seventy-Fifth} Anniversary Issue / Full publication date: Jan., 1987 / Copyright © 1987 California Law Review, Inc.}, 13 | pages = {15--27} 14 | }, 15 | 16 | @book{kirby_reports_1788, 17 | title = {Reports of cases adjudged in the Superior court of the state of Connecticut}, 18 | author = {Ephraim Kirby}, 19 | year = {1788} 20 | }, 21 | 22 | @article{harrington_brief_1984, 23 | title = {A Brief History of {Computer-Assisted} Legal Research}, 24 | volume = {77}, 25 | url = {http://heinonline.org/HOL/Page?handle=hein.journals/llj77&id=553&div=&collection=journals}, 26 | journal = {Law Library Journal}, 27 | author = {William G Harrington}, 28 | year = {1984}, 29 | pages = {543} 30 | }, 31 | 32 | @book{clegg_case_1994, 33 | title = {Case Method {Fast-Track:} A {RAD} Approach}, 34 | isbn = {{020162432X}}, 35 | shorttitle = {Case Method {Fast-Track}}, 36 | url = {http://portal.acm.org/citation.cfm?id=561543}, 37 | abstract = {From the Book: In writing this book I tried to achieve two different objectives. The first objective was to provide an overview of fast-track, the techniques it applies, and particularly the management challenges it presents. The second objective was to provide a handbook for project managers running fast-track projects. Putting these together might have endangered the second, and more pragmatic objective. This I did not want to do, so the book is split into two distinct parts. Chapters 1 to 3 deal with fast-track in general, Chapter 4 to 7 deal with the life-cycle of a project in more detail, identifying the tasks and their deliverable, the techniques and the tools that support them. 
These later chapters refer back to the relevant sections of the first part of the book so the descriptions of specific techniques are all gathered together in Chapter 3 and not scattered and duplicated through subsequent chapters. I hope that this structure will mean a casual reader can gain an understanding of the fast-track approach quickly from the first part of the book, and the practitioner can have an organized and compact reference for daily use from the second part. In assembling the second part I frequently found myself adding an extra step or an additional task, for completeness. The result is that, while not all projects will need every step of every task, they are all necessary under some circumstances. Every task is included in a project on the basis of what it produces. If a particular project has no need of the result, for example, documentation of test results, then it should not be produced and the task should be omitted. During the initial planning of a project the template project plan offered in Chapters 4 to 6 should be assessed and tailored for the project, cutting out the unnecessary tasks and steps. Do not be afraid to prune. The questions to ask are: "Do I need this deliverable to manage the project?" and "Does the sponsor value this deliverable above its price?" I have used Oracle tools throughout in examples, and in particular I have recommended an approach that exploits the ability of the Oracle {CASE} tools to define rules once then reuse them many times. 
Some {CASE} tools do not exploit reusability in this way, but the trend in development technology is towards greater reusability so it seems very appropriate here. Dai Clegg, July 1994}, 38 | publisher = {{Addison-Wesley} Longman Publishing Co., Inc.}, 39 | author = {Dai Clegg and Richard Barker}, 40 | year = {1994} 41 | }, 42 | 43 | @misc{open_web_application_security_project_owasp_2010, 44 | title = {{OWASP} Top 10 - 2010: The Ten Most Critical Web Application Security Risks}, 45 | url = {http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project}, 46 | author = {Open Web Application Security Project}, 47 | year = {2010} 48 | } -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/features.tex: -------------------------------------------------------------------------------- 1 | \section{Additional features} 2 | In addition to the technologies that are visible to users of the site, or which have been mentioned elsewhere in this report, the following are also in use: 3 | \begin{description} 4 | \item[Search engine optimization (SEO).] A priority of the site has been that it be accessible and findable by as many people as possible, so it has been optimized for search engine access and crawling. The site has search engine-friendly markup throughout, has an indexed sitemap containing a link to every document in the corpus, and pings all of the major search engines whenever the sitemap has major changes.\footnote{For details about the sitemap specification and its purpose, see http://sitemaps.org.} Additionally, because each document on the website can be accessed via a link to its case name, number or SHA1 hash, a canonical link is provided in the HTML header to inform search engines that these URIs all refer to the same document.\footnote{http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html.} 5 | \item[OpenSearch plugin.] 
A browser search tool is available on the site to allow users to query the database directly from their browser. 6 | \item[Traffic and usage monitoring.] To monitor the traffic and usage of the site, I have installed and configured the Piwik open source web analytics package. It provides tracking similar to that of third-party services, but is self-hosted, and thus more private. In addition, administrators using the site are not tracked, making the generated statistics as accurate as possible. 7 | \item[Privacy policy and implementation.] Although the site collects very little private information, it is configured to delete all logs after 12 weeks, and has a clear privacy policy indicating what information is collected and how long it is kept. 8 | \item[Atom specification.] For the syndication feeds that are provided on the site, the Atom specification has been chosen for its strict XML conformance. Atom feeds are provided for each court, and can also be dynamically created via the search interface. 9 | \item[Caching.] Caching is performed at four basic levels. Memcached caches the pages of the site in their final form, MySQL caches database queries, and Apache caches compiled versions of the Django settings and programs. Finally, the page describing the coverage of the site is generated once nightly and is cached in the database. 10 | \item[User profiles.] User profiles have been set up with complete features, including account deletion, password reset, forgotten password assistance, and profile configuration. 11 | \item[Bug tracking.] Bugs are being tracked on the project page, with 56 beta-targeted bugs closed, and an additional 35 bugs targeted at version 1.0. 
12 | \end{description} 13 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/future.tex: -------------------------------------------------------------------------------- 1 | \section{Future possibilities} 2 | \label{future} 3 | The 35 bugs that are targeted at version 1.0 reveal the future possibilities of the project. Many of the filed bugs are the result of comments made by current users of the system. Several users have commented that the more advanced Boolean connectors are too complicated to use, and that a query builder would be a useful tool. Another suggestion from users is for the corpus to expand horizontally, such that it encompasses more courts. Monitoring the site has also revealed that many users are not creating accounts on the site, and thus are using it primarily as a search tool. Since only registered users can create alerts, an open question is how to convert these visitors into registered users more consistently. 4 | 5 | Another question that remains open is the long-term cost of the project. It is currently a relatively inexpensive operation, but if it becomes popular, it could become very expensive very quickly, and due to its daily aggregation of additional content, it will soon need more hardware to hold the database, PDFs and indexes. Some ideas for sustaining the site have been drawn up, and can be implemented if enough users begin using it. These ideas range from advertising on the site to premium services for heavy users. Keeping the site free is a priority, so implementing these ideas carefully is a must. 6 | 7 | During the next few months I will be analyzing these options for future development, and will be selecting those options that provide the most value to the system. 
As the site grows in popularity and features, it will be necessary to recruit additional developers to expand and maintain the features of the site, but in its current state it provides a much-needed tool to the legal research community, filling a gap that most other systems served inadequately, and that was costly to fill when served well. 8 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/future.txt: -------------------------------------------------------------------------------- 1 | %IV. Future Possibilities 2 | %A. Expanding coverage (corpus) 3 | %B. Expanding features 4 | %C. Ideas for monetization consistent with broad no-cost public access? 5 | %D. Generalizability: What problems have been encountered and solved in 6 | %this context that might arise and be similarly treated in some other 7 | %information service on a totally different subject? 8 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/intro.tex: -------------------------------------------------------------------------------- 1 | \section{Introduction} 2 | \label{intro} 3 | At least as far back as the late thirteenth century, there has been an awareness of the need for legal documents to be organized, cataloged, and archived. The first known English case reports were the Year Books, which were prepared in England from 1292 to 1535.\cite{berring_legal_1987} Since that time, the collection of such information has been inconsistent, and has changed format numerous times. 4 | In America, the cataloging of legal documents likely began with Ephraim Kirby's Reports of Cases,\cite{kirby_reports_1788} in 1785; however, it was not until 1880, when the West Publishing group began its Federal Reporter series, that a consistent, complete and well-organized catalog of Federal cases was created. 
Since that time, West's Federal Reporter has become the de facto source of legal citations, and has become known as a careful and complete source of Federal cases. 5 | 6 | In the 1970s, however, there was a revolution in the ways that lawyers and academics accessed legal documents, as Computer-Assisted Legal Research (CALR) became an increasingly powerful possibility.\footnote{It was not until the 1980s, however, that such systems became commercially viable. Originally, there was a per-search cost of up to \$5,000 for queries that are trivial by today's standards. \cite{harrington_brief_1984}} While there was initially much debate in the law librarian field as to the merits of such research methods, with some expressing outright scorn for the new systems, the debate has largely subsided, with most researchers and lawyers accepting the merits of the new systems. 7 | 8 | With these new systems gaining in popularity, and with the ever-decreasing cost of computer hardware and software, a new niche has emerged for free legal research tools and corpora. There are currently a handful of such tools available on the market, including Google Scholar, Resource.org, FindLaw, Justia, LexisOne, and, until recently, AltLaw.\footnote{As of 3 May 2010, AltLaw has posted a notice on their site stating, ``AltLaw.org has shut down, permanently. We would like to thank everyone for their support.'' A cited reason in their explanation is Google's recent entry into the legal research field.} 9 | While the systems with the largest corpora are not yet free to the public, this is nevertheless a huge development in the law, as, for the first time in history, it lowers the barriers of legal research such that lay people can easily complete much of the same research as professional legal scholars and attorneys. As a result, the legal world is opened widely to the public, aiding in their understanding of the law, and allowing them to research matters that are of interest to them. 
10 | 11 | One function that these tools lack, however, is a method for their users to stay up to date with new cases as they are issued by the courts. This leaves researchers with few options if they want or need to stay up to date with an area of the law or with a series of cases. One option that they have is to subscribe to mailing lists (electronic or otherwise), which aim to keep lawyers up to date with certain areas of the law by sending regular highlights of cases that the list curators feel are relevant.\footnote{During one of the user interviews, this was also discussed as a method of demonstrating awareness of changes in the law, in the event of a malpractice lawsuit.} This can be a free or inexpensive approach for staying up to date, but the choice of material is not in the hands of users, and separating the wheat from the chaff can be time-consuming, at best. Another option that is available for users is to use existing alert systems, such as Google Alerts; however, these can be highly unreliable, as users are subject to the tool's crawl rate, which can mean that new content takes a very long time to be discovered, or that relevant information is omitted altogether. A final option, which can supplement or replace the first two, is to simply visit the court websites on a regular basis, and to check there for any new content of interest. For the most part, this approach works, though it requires a considerable amount of effort, and some courts do not freely publish all of their documents. 12 | 13 | In this paper, I introduce a new service, CourtListener.com, which aims to ease this problem by providing a free and open source platform for the aggregation, organization, search and retrieval of legal documents. 
The aggregation of new court documents is completed by a daemon on a rolling basis, building a huge corpus, and providing the latest cases from the Federal Courts of Appeal within about fifteen minutes, on average, of the moment they are published on the court website. From there, the documents are quickly indexed, and RSS feeds and document listings are updated. Finally, at the close of each day and beginning of each week and month, alerts are emailed to registered users informing them about topics that they have identified as relevant. More details about the creation of the corpus, and the design decisions that went into it, are available in section 3. 14 | 15 | In building this system, I spoke with a number of lawyers and academics to understand their needs, and to get input into the design of the system. I will discuss the findings of these informal interviews in section 2, below. Further, after releasing the beta version of the platform, I have received some feedback from users, which I will discuss in section 5, which is devoted to discussions of the future of the platform. 16 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/paper.tex: -------------------------------------------------------------------------------- 1 | \documentclass[11pt]{article} 2 | 3 | %What I'm running to compile the mo': make veryclean; make pdf; make ; gnome-open paper.pdf 4 | %Dependencies on ubuntu: sudo aptitude install biblatex-dw biblatex gedit-latex-plugin 5 | 6 | % This should fix widow and orphan paragraphs, I think [mlissner] 7 | \clubpenalty = 10000 8 | \widowpenalty = 10000 9 | 10 | % Make our citation numbers superscript. No longer necessary with biblatex code below. [mlissner] 11 | %\usepackage{overcite} 12 | 13 | % Make pdf bookmarks for the mo'. Make them NOT have a crazy red border. 
[mlissner] 14 | \usepackage[bookmarks=true, pdfborder = 0]{hyperref} 15 | 16 | % Make images work 17 | \usepackage[pdftex]{graphicx} 18 | 19 | 20 | % Do citations as footnotes. 21 | \usepackage[style=footnote-dw]{biblatex} 22 | \bibliography{bibliography} 23 | 24 | \begin{document} 25 | 26 | % this title is meh, but sort of works. Might be stronger words than we intend. Might not be. 27 | \title{CourtListener.com: A platform for researching and staying abreast of the latest in the law} 28 | 29 | % in alphabetical order, by last name 30 | \author{Michael Lissner, advised by, and with much debt to Brian Carver} 31 | \date{May 7, 2010} 32 | \maketitle 33 | \begin{quote} 34 | ``Such an undertaking is not only possessed of great intrinsic merit, but, now that it has been fairly inaugurated, It actually appears to present itself in the light of a public necessity.'' 35 | 36 | \hspace*{10mm}\emph{---Peyton Boyle, preface to the first Federal Reporter} 37 | \end{quote} 38 | 39 | 40 | %We'll put our title and TOC on their own page. 41 | %\clearpage 42 | \vspace{20mm} 43 | \tableofcontents 44 | \clearpage 45 | 46 | % external sections 47 | % if we want each section to start on a new page, use \include. figured we didn't (brho) 48 | % Good call. It might look better if we do certain ones. TBD later. [mlissner] 49 | \input{intro} 50 | \input{solutiondesign} 51 | \input{techdecisions} 52 | \input{features} 53 | \input{future} 54 | \input{appendixI} 55 | 56 | \clearpage 57 | %\printbibliography UNCOMMENT THIS TO TURN THE BIBLIOGRAPHY BACK ON 58 | 59 | % No longer necessary with biblatex package. 60 | %\bibliographystyle{plain} 61 | %\bibliography{bibliography} 62 | 63 | \end{document} 64 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/problemdefine.txt: -------------------------------------------------------------------------------- 1 | %I. Defining the Problem 2 | %A. 
No-cost Current Awareness 3 | %B. Audience 4 | %C. Almost a side-effect: gradual creation of a corpus 5 | 6 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/solutiondesign.tex: -------------------------------------------------------------------------------- 1 | \section{Designing a solution} 2 | \label{solutiondesign} 3 | When initially creating the design of this system, I began by seeking out as many legal technologists, lawyers, academics, and librarians as possible. In addition to numerous informal conversations, I, with the assistance of others, completed a total of seven formal 1-on-1 interviews and two group interview sessions. Each of the seven 1-on-1 sessions lasted approximately 45 minutes, and the interviewees had the following professions: 4 | \begin{itemize} 5 | \item Law professor: 1 6 | \item Practicing lawyer: 1 7 | \item Law technologist: 1 8 | \item Law librarians: 3 9 | \item Private legal researcher: 1 10 | \end{itemize} 11 | In addition, group interviews were completed with the legal team at the Electronic Frontier Foundation (EFF) and with the University of California, Berkeley, School of Law legal librarians group. These interviews proved invaluable to my understanding of the problem space, and legal research on the whole. In addition to helping me shape the scope of the project, these interviews allowed me to bounce ideas off of the people that would likely use or recommend the product, and who were experts in the legal research field. The questions I asked during these interviews were aimed at learning about the interviewees' day-to-day work and the motivations behind the processes they follow and the tools they use. 
12 | 13 | At the time of the interviews, user participation was considered as a method of accurately and efficiently categorizing and creating content; however, a significant finding from the interviews was that nearly all of the people we interviewed felt that their time for researching was severely constrained, and that there was little that a website could do that would motivate them to contribute. To a prompt regarding whether people would contribute to a system if it meant creating a public good, one public interest attorney expressed that, ``For people to contribute, it would have to benefit them.'' By this comment, he expressed his opinion that contributing to a public good would not be sufficient motivation for busy lawyers, and that any contribution would have to directly benefit the person making it. This is a common sentiment among users, and creating systems in which users feel that their work towards the public good is also for their own good is indeed challenging. As a result of this finding, however, ideas for user-contributed content were set aside, and the site was designed to be exclusively unidirectional with regard to content production and curation. 14 | 15 | A notable person that I interviewed was a private legal researcher. As a part of her job, each morning she spends an hour or two researching new cases. Generally, she looks for two different items while researching. First, she attempts to identify any new opinions from California courts that could be relevant to the firm where she works, and second, she looks for any ongoing cases that are in her area of the law. I discussed with her the tools she uses, and discovered that for the most part, she relies on curated email lists, as discussed in section 1, and on browsing court websites manually, trawling for hints of relevant cases. 
She was very excited to hear about the planned platform, but disappointed that it would initially only contain records for federal cases since her area of research was state law. 16 | 17 | During the other interviews, I attempted to learn more specific details about the kinds of expectations users will have when approaching a new research tool. Some questions that I aimed to answer were whether users would be comfortable with Boolean searching, what kinds of Boolean connectors they might find valuable, whether they use RSS feeds, and the kinds of document categorization they might expect. These inquiries indicated that the primary users of this tool are highly sophisticated. Most of the people interviewed knew about or used RSS feeds, and all of them were familiar with Boolean connectors. When speaking to the EFF legal team, we were able to determine which connectors people valued.\footnote{Specifically, they mentioned: number of word occurrences, sentence and paragraph containment, and quorum identification (e.g. find two of the eight following words).} Most of their requests are now possible on CourtListener.com. 18 | 19 | As for the kinds of categorization users wanted, the interviews revealed that users felt that more categorization was always better. At the time of the interviews, consideration was given to creating a system that semantically analyzed, and automatically extracted and categorized a court opinion along a range of categories such as the judges, legal domain, precedential nature, plaintiffs, defendants, case name and case number. Ultimately, most of these categories were not implemented; however, the case name, number, date and precedential status are all obtained and placed in the database. 20 | 21 | Other design decisions that were made early on were that a clean and simple interface was a must, and that the site itself must have minimal visual clutter, with as much standards compliance and accessibility as possible. 
These decisions were made in an effort to make the site as useful as possible to as many people as possible, and to minimize visual distractions, making users more efficient. 22 | 23 | From these findings, and the above design considerations, the design of the site proceeded along two paths. First, a so-called ``MoSCoW'' document was drawn up that contained lists of the things the site Must, Should, Could, and Wouldn't do.\cite{clegg_case_1994} This document served the purpose of listing and prioritizing all the ideas that were on the table for the project. The second path that was followed was translating the emerging MoSCoW analysis into a database model, URL design, and interface sketches.\footnote{See appendix I for details.} Once these plans were created, designing and building the site was largely a matter of choosing and implementing appropriate technology solutions. 24 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/solutiondesign.txt: -------------------------------------------------------------------------------- 1 | %II. Designing a Solution 2 | %A. Interviews with Librarians 3 | %B. Interviews with public interest lawyers 4 | 5 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/techdecisions.tex: -------------------------------------------------------------------------------- 1 | \label{techdecisions} 2 | \section{Technical decisions} 3 | In designing the CourtListener.com platform, I made many technical decisions. In this section, I will delve deeply into a few of the more difficult decisions, and will provide an overview of the reasoning behind their final outcomes.\footnote{As was mentioned in the introduction, since this is an open source project, all of the code is available online under the third version of the GNU Affero General Public License. 
This license was chosen because it allows people to copy and use the code for free, but requires that they publicly share any modifications that they make to it. The more-common GNU General Public License (GPL) similarly allows the code to be used at no cost, but only requires that changes to the code be shared with the public if the resulting program itself is distributed to the public. Because this project is server-based, the program is never technically distributed, and so might not have the same protections if covered by the GNU GPL. The Affero General Public License closes this loophole by requiring that modifications be shared even when the program is only offered to users over a network. To browse the code, please see: http://bitbucket.org/mlissner/legal-current-awareness.} 4 | 5 | Going into the creation process, some initial decisions were made simply to limit the possibilities. Because of my previous experience with Python and the Linux, Apache, MySQL stack, I decided early on to use these technologies. Building on this, two major decisions had to be made. First, I had to determine the best search engine, and second, I had to decide on a web framework to provide the object-relational mapper (ORM) and templating engine. 6 | 7 | For the question of which search engine to use, I completed an in-depth review of every open source search engine I could find.\footnote{For details please see the spreadsheet located in the project repository, at http://bitbucket.org/mlissner/legal-current-awareness/raw/b35105d6a233/Documents/Search\%20Engine\%20Analysis,\%202010-02-06.ods} I examined each search engine along 16 dimensions, including community support, documentation, features, license, and code base size, among others. Once I had identified the three open source search engines that appeared the best, I took a close look at their Boolean support, simplicity of design, and features.
Ultimately, I decided on Sphinx Search because it has sophisticated Boolean support,\footnote{For details of the Boolean syntax supported, see https://www.courtlistener.com/search/advanced-techniques.} a relatively small code base size, an active community, and an engaged developer.\footnote{http://sphinxsearch.com/} This decision has worked out well, as it was possible to link Sphinx directly to the MySQL database, and it provides very fast and accurate search results, even for very complicated queries. An unanticipated side-effect of using such a powerful search engine is that it builds a very large search index. Numerous times during the corpus aggregation phase, the index filled the entire hard drive, and a larger plan with the server provider had to be purchased. This problem has largely been solved by disabling some of Sphinx's more powerful search capabilities, such as infix searching, and by implementing a main+delta reindexing scheme. 8 | 9 | The main+delta reindexing scheme creates two indexes that Sphinx searches. The first is the main index, which contains full-text search indexes for about 130,000 legal opinions, and is currently about 4.1GB in size. Recreating this index currently takes the server about an hour to complete, during which time a copy of the index is created, temporarily doubling the disk space it requires. The second index -- the so-called delta index -- contains only the newest documents, is about 20MB in size, and takes about a minute to reindex. Thus, each hour, it is possible for the indexer to add new documents to the delta index, and once every two months, in the middle of the night, the two indexes are merged. 10 | 11 | 12 | The second decision that greatly shaped the development of the project was to use Django as the web framework.\footnote{http://www.djangoproject.com/} This decision was made in part because I had used it in the past, and in part because it supported all of the features that were in the MoSCoW analysis mentioned in section 2.
This decision has worked out well, as many of the more complicated features of the site, such as pagination of search results, syndication, form creation and validation, and security are all built into Django. Not having to worry about data validation or more complicated things such as cross-site request forgeries (CSRF) made building the features of the site more appealing and streamlined.\footnote{The Open Web Application Security Project identifies CSRF as a ``Widely prevalent'' security weakness, and lists it as number five on its top ten list of Critical Web Application Security Risks. \cite{open_web_application_security_project_owasp_2010}} An additional benefit of the Django framework is the admin interface that it provides: on the back end, it is possible to browse and edit all of the data in the system, and creating tie-ins on the front end for content administration is under way, with each document in the corpus having an ``Edit'' link available to administrators in the side navigation panel. 13 | 14 | One of the more complicated features of CourtListener.com is its pluggable court scraper and PDF extractor. This part of the platform has undergone many iterations, starting with a basic scraper that crashed regularly and silently, and ending at its current version as a multi-threaded daemon that is running all the time on the server, and which downloads the latest opinions -- on average -- within 15 minutes of their posting. Designing the scraper to be reliable, efficient and have low bandwidth requirements has been a major challenge. The current implementation can be started in PDF parse and/or scrape mode, has three verbosity levels (debug, chatty and silent), can be told which courts to scrape, and uses the following algorithm: 15 | \begin{enumerate} 16 | \item{Download the HTML of the court website, and generate a digital fingerprint of it. 
Check that fingerprint against the database to see if the site has changed.} 17 | \item{If the site has changed, build a tree out of its HTML, and use XPath to identify the relevant leaves of the tree to analyze. If it has not, move to the next court.} 18 | \item{Begin downloading the first PDF opinion from the site, and generate a digital fingerprint of it. If the fingerprint is already in the database, move to the next PDF. If three PDFs in a row are already in the database, move to the next court.} 19 | \item{If the fingerprint of the PDF is not already in the database, parse the leaves of the tree, and extract and format the relevant information from them. Once all the information has been successfully extracted, place it all in the database, and save the PDF to disk.} 20 | \item{Extract the text from any downloaded PDFs, sleep for a few minutes, then repeat this process for each PDF in each court requested.} 21 | \end{enumerate} 22 | The result of this algorithm is that each court is visited about once every half hour, and changes to the court website are identified at that time. Since PDFs are large files, this minimizes the number of PDFs that are downloaded, and duplicates are eliminated at the source. 23 | 24 | Another major issue that I have encountered has been scaling the site. Since the site now contains almost the entire Supreme Court record, and thousands of documents for other courts, tasks such as a simple lookup of a record in the database have begun to slow down. The solution to this has been to aggressively implement database caching and indexing, and front end caching through memcached for users who are not logged in. This has eliminated much of the latency problem that the platform initially had, but some queries need to be optimized manually. MySQL is currently logging any query that takes too long to finish, and I will be analyzing the results of this log soon.
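The change-detection steps of the scraper algorithm described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the use of SHA-1 as the fingerprint, the `fetch` and `extract_pdf_urls` callables (the latter standing in for the XPath step), and the in-memory `site_hashes` and `pdf_hashes` stores (standing in for the database) are all hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex digest that serves as the 'digital fingerprint' of a page or PDF.

    SHA-1 is an assumption; the paper does not say which hash is used.
    """
    return hashlib.sha1(data).hexdigest()


def scrape_court(court_url, fetch, extract_pdf_urls, site_hashes, pdf_hashes,
                 max_consecutive_dupes=3):
    """Run one pass over a single court and return the newly seen PDFs.

    fetch(url) -> bytes and extract_pdf_urls(html) -> list of URLs are
    hypothetical helpers supplied by the caller. site_hashes (dict) and
    pdf_hashes (set) stand in for the database tables.
    """
    html = fetch(court_url)
    page_hash = fingerprint(html)

    # Steps 1-2: if the court page is unchanged, move on to the next court.
    if site_hashes.get(court_url) == page_hash:
        return []

    new_pdfs = []
    dupes_in_a_row = 0
    for pdf_url in extract_pdf_urls(html):
        pdf_hash = fingerprint(fetch(pdf_url))

        # Step 3: skip known PDFs; after three in a row, give up on this court.
        if pdf_hash in pdf_hashes:
            dupes_in_a_row += 1
            if dupes_in_a_row >= max_consecutive_dupes:
                break
            continue

        # Step 4: a new PDF; record it (metadata parsing and saving omitted).
        dupes_in_a_row = 0
        pdf_hashes.add(pdf_hash)
        new_pdfs.append((pdf_url, pdf_hash))

    # Remember the page fingerprint so an unchanged page is skipped next time.
    site_hashes[court_url] = page_hash
    return new_pdfs
```

Because court pages usually list the newest opinions first, stopping after a short run of already-known PDFs keeps the number of large downloads low, which is the bandwidth saving the text describes.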
25 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Paper/techdecisions.txt: -------------------------------------------------------------------------------- 1 | %III. Technical Decisions 2 | %A. Why Python/Django/Sphinx/MySQL/apache/beautiful soup/etc ? 3 | %B. Decisions regarding the user interface(s) 4 | %C. Anything interesting happen while designing scraper/backend/user 5 | %accounts/etc. (optional) 6 | -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Poster/Search.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/CourtListener Studies/Michael Lissner/Poster/Search.png -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Poster/email.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/CourtListener Studies/Michael Lissner/Poster/email.png -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Poster/results.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/CourtListener Studies/Michael Lissner/Poster/results.png -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Poster/save-alert.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/CourtListener 
Studies/Michael Lissner/Poster/save-alert.png -------------------------------------------------------------------------------- /CourtListener Studies/Michael Lissner/Poster/v1.svg: -------------------------------------------------------------------------------- [SVG poster; drawing markup omitted, recoverable text follows]
Metadata: image/svg+xml; CourtListener.com; May, 2010; Michael Lissner; UC Berkeley School of Information.
Title: CourtListener.com. Using the power of search to deliver custom alerts about Federal Appellate Court opinions. Michael Lissner, advised by Brian Carver, UC Berkeley School of Information.
Crawlers & Extractors: Court websites are crawled every 15 minutes for new documents. Crawlers and parsers visit court websites, download documents, categorize them and store them in the database.
Search Engine: The database is indexed, and users can make queries on the index. Results are displayed to users in reverse chronological order, with newest hits at the top. Users can save the query as an alert in their profile.
Alerts: Courts make new documents that trigger alerts (example text: ``...as was stated in Citizens United...''). Emails are delivered when new results are found by the scrapers.
CourtListener.com aims to ease the difficulty of staying up to date with court documents by providing a free and open source platform for the aggregation, organization, search and retrieval of legal documents. The aggregation of new court documents is completed by a custom-designed web crawler on a rolling basis, building a huge corpus, and providing the latest cases from the Federal Courts of Appeal within – on average – about fifteen minutes from the moment they are published on the court website.
From there, the documents are quickly indexed and Atom feeds and document listings are updated. Finally, at the user's option, alerts are emailed either daily, weekly, or monthly informing them about the documents that matched their alert queries. 2996 | 2997 | -------------------------------------------------------------------------------- /CourtListener Studies/Roywn McDonald and Karen Rustad/mcdonald_rustad_report.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/CourtListener Studies/Roywn McDonald and Karen Rustad/mcdonald_rustad_report.pdf -------------------------------------------------------------------------------- /CourtListener Studies/Sarah Tyler/sarah_tyler_dissertation.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/CourtListener Studies/Sarah Tyler/sarah_tyler_dissertation.pdf -------------------------------------------------------------------------------- /Entity Extraction (UIMA)/An Interface for Rapid Natural Language Processing Development in UIMA.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Entity Extraction (UIMA)/An Interface for Rapid Natural Language Processing Development in UIMA.pdf -------------------------------------------------------------------------------- /Entity Extraction (UIMA)/Automated Concept Extraction to aid Legal eDiscovery Review.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Entity Extraction (UIMA)/Automated Concept Extraction to aid Legal 
eDiscovery Review.pdf -------------------------------------------------------------------------------- /Entity Extraction (UIMA)/Opinion Mining in Legal Blogs.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Entity Extraction (UIMA)/Opinion Mining in Legal Blogs.pdf -------------------------------------------------------------------------------- /Entity Extraction (UIMA)/U-Compare Share and Compare Text Mining Tools with UIMA.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Entity Extraction (UIMA)/U-Compare Share and Compare Text Mining Tools with UIMA.pdf -------------------------------------------------------------------------------- /Entity Extraction (UIMA)/UIMA - An Architectural Approach to Unstructured Information Processing in the Corporate Research Environment.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Entity Extraction (UIMA)/UIMA - An Architectural Approach to Unstructured Information Processing in the Corporate Research Environment.pdf -------------------------------------------------------------------------------- /Judicial Situation/Lawyer_Demographics, ABA.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Judicial Situation/Lawyer_Demographics, ABA.pdf -------------------------------------------------------------------------------- /Judicial Situation/PACER - Additional Functional Requirements Group Final Report.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Judicial Situation/PACER - Additional Functional Requirements Group Final Report.pdf -------------------------------------------------------------------------------- /Judicial Situation/Rewiring Old Architecture - Why U.S. Courts Have Been So Slow and.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Judicial Situation/Rewiring Old Architecture - Why U.S. Courts Have Been So Slow and.pdf -------------------------------------------------------------------------------- /Judicial Situation/twelve.tables or 7-11, Malamud.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Judicial Situation/twelve.tables or 7-11, Malamud.pdf -------------------------------------------------------------------------------- /Keyword Extraction/Domain-Specific Keyphrase Extraction - Frank, Paynter and Witten.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Keyword Extraction/Domain-Specific Keyphrase Extraction - Frank, Paynter and Witten.pdf -------------------------------------------------------------------------------- /Keyword Extraction/KEA Practical Automatic Keyphrase Extraction - Witten, Paynter, Et al.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Keyword Extraction/KEA Practical Automatic Keyphrase Extraction - Witten, 
Paynter, Et al.pdf -------------------------------------------------------------------------------- /Presentations/General Presentation, Op-Alert, 2010-01-27.odp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Presentations/General Presentation, Op-Alert, 2010-01-27.odp -------------------------------------------------------------------------------- /Presentations/General Presentation, Op-Alert, 2010-01-27.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Presentations/General Presentation, Op-Alert, 2010-01-27.pdf -------------------------------------------------------------------------------- /Presentations/General Presentation, Op-Alert, 2010-01-27.ppt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Presentations/General Presentation, Op-Alert, 2010-01-27.ppt -------------------------------------------------------------------------------- /Presentations/LVI Proposal, Michael Lissner, Juriscraper, 2012-03-15.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Presentations/LVI Proposal, Michael Lissner, Juriscraper, 2012-03-15.pdf -------------------------------------------------------------------------------- /Presentations/LVI-Presentation-Lissner-Juriscraper.odp: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Presentations/LVI-Presentation-Lissner-Juriscraper.odp -------------------------------------------------------------------------------- /Presentations/LVI-Presentation-Lissner-Juriscraper.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Presentations/LVI-Presentation-Lissner-Juriscraper.pdf -------------------------------------------------------------------------------- /Presentations/PatentLawPic132.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Presentations/PatentLawPic132.jpg -------------------------------------------------------------------------------- /Presentations/carver-presentation-abstract-lvi2012-final.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/freelawproject/related-literature/9a349b5e1bae3320494359ba67fb9aa969579451/Presentations/carver-presentation-abstract-lvi2012-final.pdf -------------------------------------------------------------------------------- /Presentations/lvi-presentation-notes.txt: -------------------------------------------------------------------------------- 1 | Question: 2 | - How long? 30 min + 15 min Q&A 3 | - What day? Monday, 1pm 4 | - When due? Sept. 15 5 | - What due? 6 | 7 | Process: 8 | - Figure out above questions 9 | - Read proposal 10 | - Read Brian's proposal 11 | - Write outline 12 | - Make slides 13 | - Prep remarks 14 | - Verify quality of the scrapers (are they still working?) 
15 | - Make schedule of what I want to see (and share it) 16 | 17 | - CL background (3) 18 | - Bio (2) 19 | - Not lawyer 20 | - Not CS 21 | - But I am xyz (michaeljaylissner.com) 22 | - Technical: 23 | - Features 24 | - Juriscraper (10) 25 | - Extensibility (for Video, oral arguments, etc) 26 | - Federal Appeals Courts (and some state courts) 27 | - Varied (possibly i18n) geographies 28 | - The code: 29 | - DRY 30 | - OO 31 | - Small 32 | - Python (PEP8) & XPath 33 | - lxml 34 | - requests 35 | - chardet 36 | - No repetition 37 | - Many 38 | - Automated character detection and conversion to utf8 39 | - Simple installation 40 | - Friendly and transparent to court websites 41 | - Harmonization of words (USA, vs.) and dates 42 | - Title casing of case names 43 | - Sanity checking and (rudimentary) alerts 44 | - Hard failure design 45 | - Caller (with state store) (8min) 46 | - Duplicate detection/elimination 47 | - Minimization of impact on court websites 48 | - Mimetype detection 49 | - OCR 50 | - Special font work (http://www.michaeljaylissner.com/blog/and-the-winning-font-in-court-documents-is) 51 | - "Decryption" 52 | - LXML tester tool (2 minutes) 53 | - Future (4 minutes) 54 | - Better alerts 55 | - Rate throttling per court 56 | - HTML Tidying 57 | - API refactoring 58 | - More courts! 59 | - More contributors 60 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Literature 2 | ========== 3 | 4 | In this repository we keep the many documents, presentations and other 5 | materials that have influenced our thinking or that we have produced 6 | ourselves. Have a look around and see if anything strikes your fancy. 7 | 8 | If you find something that you think we might find interesting, give a whistle 9 | with a pull request. We'll likely be happy to add it!
10 | 11 | If you want to talk to us about this kind of thing, [contact us here](https://free.law/contact/). 12 | --------------------------------------------------------------------------------