CS 501 - Software Engineering
Project for the Cornell Legal Information Institute
PDF versions of the United States Code
Software Requirements Document
Version 1.2 (revised - links fixed)
August 27 2004
Project Homesite
Key Stakeholders / Organizations:
Developer Team: Tsung-Yueh Chiou, Tsee Yuan Lee, Kohsuke Kawaguchi, Soyeon Kim, Justin Tung
Project sponsors: Professor Thomas Bruce, Professor William Arms
TA Consultant: Matthew Harris
LII Associates: Sylvia Kwakye, Patrice Crooks, Peter Martin
User Input and Acceptance: Robert Green
Revision History
Date | Version | Description |
2/14/02 | 0.1 | Initial Draft |
3/5/02 | 0.5 | Preview |
3/8/02 | 1.0 | Deliverable |
1. Introduction
2. Operating and Development Environments
2.1 Hardware Platform
2.2 Software Platform
2.3 Additional Software for Project
2.4 Timetable
2.5 Reference books
3. PDF Conversion Requirements
3.1 Interfaces
3.2 Functionality
3.3 Usability
3.4 Reliability
3.5 Performance
3.6 Capacity and Resource Utilization
3.7 Structure Specification
3.8 Acceptable Bugs
4. Document Selection and Order Placement Requirements
4.1 Document Selection
4.2 Shopping Cart
4.3 Usability
4.4 Performance
4.5 Acceptable Bugs
5. Order Fulfillment Requirements
5.1 Functionality
5.2 Usability
5.3 Performance
5.4 Logging
5.5 Invalid Information
5.6 Acceptable Bugs
6. Quality Assurance Requirements
7. Documentation Requirements
7.1 Program Design Document
7.2 Website Users
7.3 LII Personnel and Future CS 501 Teams
7.4 CS 501 Staff
8. Future Considerations
9. Legal Requirements
9.1 Licensing Requirements
9.1.1 Software Development Contract
9.2 Legal, Copyright, and Other Notices
10. References
1. Introduction
The United States Code is released to the general public by the U.S. House of Representatives in ASCII form. The Legal Information Institute (LII) adds value to this text and publishes the improved version on its website.
An earlier CS 501 team (Legal Data Markup Software Project) and a later student project developed programs for the LII that convert the raw ASCII output of the House of Representatives to XML for subsequent reuse in various settings. XSLT scripts then convert this XML to HTML, producing the current website. The existing web interface supports a search engine, the legal code in HTML, cross-referencing, and notes on each section of the Code.
The U.S. Code pages currently receive about half a million hits a day.
Figure 1: LII Front Page
Figure 2: Search or browse the U.S. Code.
Figure 3: A section of the Code.
1.1 Goals
The new project, overseen by Thomas Bruce, Co-Director of the LII, is to create PDF versions of the Code from the existing XML. On the current website, users navigate through HTML links to reach the leaves of the legal code tree, which are called 'sections'. Our goal is to implement a system that lets users print more than one section at a time, in a format that preserves the meaning of the Code.
Users can pick sections and larger divisions (called 'chapter', 'part', and other similar names) via a shopping cart system. The new PDF format adds printability and portability to the legal code, which was hard to achieve with HTML.
The cart functionality will be integrated into the existing website and will be present on each HTML page of the legal code. The user will be charged for the service, which includes delivery of the PDF documents. Customers will also receive additional information (e.g., table of contents, notes, cross-references) relevant to their selections. Charging for PDF conversion discourages overuse of the system and reminds users not to waste paper, since most saved documents are printed.
The purpose of this document is to define all the requirements of the PDF delivery system needed to:
1.2 Glossary
Since the products are not intended for public consumption, most of the terminology used in this document should be understandable to the technical personnel involved in the project. This glossary serves to provide a common understanding among the parties involved in the project.
Term | Definition |
Legal | |
Appendices | Contains important additional materials pertinent to the Code body |
Catchline | Short headers in the U.S. Code that specify the current position in the organizational hierarchy by including a short description of the ensuing text. |
Chapter, Part | These are sub-divisions that appear within each title. Some titles use all of these levels, and others use only a few of them. Others have finer divisions, such as sub-chapters and sub-parts. The logical levels to which these correspond vary from title to title. Elsewhere in the project we have begun to refer to these generically as "supersections." |
Division | Along with sub-division, this is a somewhat informal term that refers to a logical block of the Code, such as a section, sub-part, or title. |
Notes | Each sub-division in the Code may have notes, which contain information relevant to that division. |
Table of Contents | Used to present an overview of the various titles and their sub-divisions. A.k.a. TOC. |
Section | Like chapters and parts, this is a sub-division, and is the lowest division or leaf node of the U.S. Code. |
Title | The U.S. Code is currently divided into 50 titles, each on a different topic. Examples: Title 1 is "General Provisions" and Title 26 is the "Internal Revenue Code," otherwise known as the Tax Code. |
U.S. Code | "A consolidation and codification by subject matter of the general and permanent laws of the United States" prepared by the Office of the Law Revision Counsel of the U.S. House of Representatives. Sometimes shortened to the Code. |
Technical | |
DDD | DTD Design Document |
DTD | Document Type Definition |
LDMS | Legal Data Markup Software, a previous CS 501 project that provides tools to convert the U.S. Code (in ASCII format from Congress) to the XML format. It also converts XML into HTML for web users. |
Lula | Name of server running development and application environment (lula.law.cornell.edu) |
LII | Legal Information Institute |
PDD | Program Design Document |
XML | Extensible Markup Language |
XSL | Extensible Stylesheet Language |
XSL-FO | Extensible Stylesheet Language Formatting Objects |
XSLT | Extensible Stylesheet Language Transformations |
Xalan | Open-source XSLT engine developed at Apache Software Foundation |
Xerces | Open-source XML parser developed at Apache Software Foundation |
FOP | Open-source XSL-FO engine developed at Apache Software Foundation |
Miscellaneous | |
Website Users | People who access the LII website to obtain the U.S. Code ranging from lawyers, legal/criminal experts, to the general public. |
We expect to add, during development, a section of typographic terms and other specialized vocabulary related to the print version.
1.3 Knowledge Constraints
At present, neither the client nor the developers have the typographical expertise needed to express the requirements for typesetting the PDF files properly. As a result, Version 1.0 of this document does not include the vocabulary or the requirements for this portion of the project. Instead, the developers will meet with the client and other quality assurance personnel to develop these requirements. At those meetings, the client will provide various samples, and the developers will present the outputs that are possible. Differences will then be discussed until a consensus is reached on the proper handling.
2. Operating and Development Environments
2.1 Hardware Platform
The client requests that both development work and the production system be hosted on Lula, a P-III 800 MHz machine with 2 GB of SDRAM and a RAID array of two 40 GB hard drives. Lula may be replaced in the future. Until then, performance and resource requirements are described in terms of Lula, and in terms of users' Internet connection speed where applicable.
2.2 Software Platform
Lula runs Red Hat Linux 7.1. The web server is Apache, and the Java version is 1.3.1.02. The Apache XML tools (Xalan, Xerces, the FOP processor, and others) are also hosted there; this software was pre-installed by the client. In addition, a PDG Shopping Cart system is available. It currently provides software purchasing capabilities.
2.3 Additional Software for Project
Partly because this project is mostly funded by Red Hat Linux, and partly at the client's insistence, it is strongly encouraged that only free software be used for production and development. The developers do not currently foresee any need for proprietary software.
The developers should know or learn the following technologies in order to complete the project:
2.4 Timetable
Deadline | Goals | Progress |
2/14/02 | Feasibility Report regarding PDF conversion | Completed |
3/5/02 | Requirement Report (presentation & documentation) | Preview |
c. 4/3/02 | Design (presentation & documentation) | Initialized |
c. 4/10/02 | Milestone 1 (PDF converter beta) | Planned |
c. 4/17/02 | Milestone 2 (Shopping cart testing) | Planned |
c. 4/24/02 | Milestone 3 (Project beta) | Planned |
c. 5/3/02 | Final Delivery (presentation & documentation) | Planned |
2.5 Reference books
Prof. Bruce has offered to purchase any books necessary in implementing this project. No book is required at this time.
3. PDF Conversion Requirements
This section discusses both the functional and non-functional requirements for producing PDF files from the XML files of the LDMS project.
3.1 Interfaces
It is assumed that the XML source will be provided at a specified location, supplied to the converter via a command-line argument or a settings file. The output will be stored in a specified directory or its subfolders if pre-generation is used, or in a cache directory if on-the-fly conversion is adopted. The folders may be organized as a tree, such as one folder for each title.
A wrapper should be provided to convert one title at a time, to be invoked as necessary. It should be modifiable so that, in the future, more than one title can be converted per run.
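The wrapper described above might look like the following Python sketch. Everything in it is an assumption for illustration: the settings file format (an INI file with a `[converter]` section), the one-folder-per-title layout of the XML source, and the `convert_file` callback, which stands in for the actual XSL-FO/FOP step. Taking a list of titles keeps the wrapper open to converting several titles per run later.

```python
import configparser
from pathlib import Path

# Hypothetical settings file; section and key names are assumptions.
SAMPLE_SETTINGS = """
[converter]
xml_source = /data/uscode/xml
output_dir = /data/uscode/pdf
"""


def load_settings(text):
    """Read the XML source and PDF output locations from settings text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    conf = parser["converter"]
    return Path(conf["xml_source"]), Path(conf["output_dir"])


def title_output_dir(output_dir, title):
    """One output folder per title, e.g. <output_dir>/title26."""
    return output_dir / f"title{title}"


def convert_titles(titles, xml_source, output_dir, convert_file):
    """Convert every XML file of each requested title.

    convert_file(xml_path, pdf_path) is a placeholder for the real
    conversion step. Today the caller passes a single title; passing
    several later requires no change to this function.
    """
    produced = []
    for title in titles:
        out_dir = title_output_dir(output_dir, title)
        out_dir.mkdir(parents=True, exist_ok=True)
        # Assumed layout: <xml_source>/title<N>/*.xml
        for xml_path in sorted((xml_source / f"title{title}").glob("*.xml")):
            pdf_path = out_dir / (xml_path.stem + ".pdf")
            convert_file(xml_path, pdf_path)
            produced.append(pdf_path)
    return produced
```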
3.2 Functionality
The foremost objective is to faithfully re-create, in PDF, the print edition published by the Government Printing Office. Lawyers rely heavily on the visual structure of the legal code, and any alteration to the visual format, such as changed indentation or extra whitespace, can significantly change the meaning of the Code. Due to resource constraints, we must rely on the XML conversion provided by the previous CS 501 team in LDMS, although efforts will be made to correct bugs where they are found. The tagging in the XML source is therefore assumed to be accurate.
The print edition of the U.S. Code therefore serves as the reference for formatting, such as headings and typefaces. Sample sections will be provided later, since neither the developers nor the client currently has the knowledge of typesetting terminology or constraints needed to specify these requirements.
There are two ways to convert XML into PDF. Pre-generation means doing the conversion at the time updates are received. On-the-fly means converting at time of order fulfillment - after a user places an order for a document. Most of the requirements are independent of the method of conversion, unless otherwise noted.
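The difference between the two methods can be sketched as follows. This is a minimal illustration, not a design decision: documents are identified by a hypothetical `doc_id` string, PDFs are stored as `<doc_id>.pdf`, and `convert` stands in for the real converter. With on-the-fly conversion, a cached PDF must be discarded when an update for its XML arrives, which is what `invalidate` does.

```python
from pathlib import Path


def pregenerate(updated_docs, convert, pdf_dir: Path):
    """Pre-generation: convert every updated document as soon as the update arrives."""
    for doc_id in updated_docs:
        convert(doc_id, pdf_dir / f"{doc_id}.pdf")


def fetch_on_the_fly(doc_id, convert, cache_dir: Path) -> Path:
    """On-the-fly: convert only when an order needs the document, then reuse the cached PDF."""
    pdf_path = cache_dir / f"{doc_id}.pdf"
    if not pdf_path.exists():
        convert(doc_id, pdf_path)
    return pdf_path


def invalidate(updated_docs, cache_dir: Path):
    """Drop cached PDFs whose XML has changed so the next order reconverts them."""
    for doc_id in updated_docs:
        (cache_dir / f"{doc_id}.pdf").unlink(missing_ok=True)
```

Pre-generation spends conversion time on every update whether or not anyone orders the result; on-the-fly spends it only on ordered documents but shifts the delay onto the customer.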
3.3 Usability
The PDF conversion tool will be invoked at specified intervals or upon a stimulus, such as a change in the XML, if pre-generation is used, or automatically, such as when a user requests a document, if on-the-fly conversion is used. In either case, the documentation should let maintenance personnel quickly find out how, if at all, to operate the conversion.
System administrators and future developers will need approximately 5-7 days to familiarize themselves with the tools and code well enough to troubleshoot, fix bugs, and deliver new features.
3.4 Reliability
Due to the large number of users, reliability is a top priority and should take precedence even over functionality or performance. Downtime should be limited to one hour per year. The current site and services should remain available during feature additions or bug fixes, and must not be affected by any work on this project.
3.5 Performance
Currently, a bot checks the Congressional website for U.S. Code updates once a day. Each update usually consists of one title or a series of consecutive titles. PDF conversion for each title should finish within 12 hours. This applies to both methods: with pre-generation, the 12 hours start when the XML input becomes available; with on-the-fly conversion, they start when an order for that part of the Code is placed.
3.6 Capacity and Resource Utilization
The developers know of no efficient way to cap the resources used for conversion, such as preventing the PDF converter from occupying more than half of the CPU cycles. Instead, the process should run at a lower priority than normal tasks so that it does not consume excessive system resources.
Storage usage is not expected to be significant. Prototype tests show a PDF-to-HTML file size ratio of no more than 3 to 1 in the worst case.
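One way to run the converter at reduced priority on Lula is the standard Unix `nice` utility, shown here in a small Python sketch. The niceness value of 10 and the example converter command are assumptions. Note that niceness only lowers scheduling priority relative to other processes; it is not a hard cap on CPU share, which matches the limitation stated above.

```python
import subprocess


def low_priority_command(cmd, niceness=10):
    """Prefix a command with `nice -n <niceness>`.

    A positive niceness lowers the process's scheduling priority, so the
    converter yields the CPU to the web server when both are busy.
    """
    return ["nice", "-n", str(niceness), *cmd]


def run_low_priority(cmd, niceness=10):
    """Run the command at lowered priority; raises CalledProcessError on failure."""
    return subprocess.run(low_priority_command(cmd, niceness), check=True)
```

For example, `run_low_priority(["java", "-jar", "converter.jar"])` would launch a (hypothetical) converter JAR at niceness 10.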
3.7 Structure Specification
The following requirements concern the format of PDFs delivered to users:
Each section of the U.S. Code shall begin on a new page.
For divisions higher in the hierarchy, multiple sections are presented in the same file. To save space (and paper), the developers will investigate the possibility of placing multiple sections on a page when they do not overflow.
Each section (or chapter, part, etc.) shall be delivered in a separate document. Preferably, an optimization will allow the user to receive the immediate supersection if many divisions on the same level are requested.
Each division that is delivered as a document should include extra information relevant to that division, including notes and cross-references.
Each page shall have a header showing the levels of the hierarchy needed to identify the page contents: the title number and, where applicable, the chapter number, part number, and so on. A footer on each page shows the page number, which should correspond to the information in the Table of Contents. A TOC shall be created for each delivered division. Appendices (without restructuring, due to inconsistent organizational structures) and catchlines will also be included where available. Structural navigation tags should be used to create cross-links that make the documents easier to navigate.
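The header and TOC requirements reduce to two small computations, sketched below in Python as an illustration only (in the real system they would likely live in the XSL-FO stylesheet). The sketch assumes the hierarchy arrives as a list of `(level, designation)` pairs and that each section starts on a new page, per the first requirement in this list; if multiple sections are later packed onto one page, the page arithmetic would have to come from the formatter instead.

```python
def running_header(path):
    """Build a page header such as 'Title 26 - Chapter 1 - Subchapter A'.

    path lists (level_name, designation) pairs from the title down to the
    current division; levels a title does not use are simply absent.
    """
    return " - ".join(f"{level} {designation}" for level, designation in path)


def toc_entries(divisions, first_page=1):
    """Assign each division the page on which it starts.

    divisions: (label, page_count) pairs in document order. Because each
    section starts on a new page, a division starts right after the pages
    used by everything before it, which keeps the TOC consistent with the
    page numbers printed in the footers.
    """
    entries = []
    page = first_page
    for label, page_count in divisions:
        entries.append((label, page))
        page += page_count
    return entries
```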
For this project, one level above section, and the individual sections, are desired for conversion. This should be controlled by a line in the settings file so future improvements can be made as resources become adequate.
3.8 Acceptable Bugs
Defects that do not directly contradict the above PDF conversion requirements are deemed acceptable. Examples include long-term memory leaks, spurious error messages, and minor differences between output and print editions.
4. Document Selection and Order Placement Requirements
These relate to what the user should be able to do on the website.
4.1 Document Selection
A user can add a section or its immediate supersection to the shopping cart for delivery in PDF format. The Code will be presented hierarchically, as it is now. The user can browse by following links that lead to lower levels, such as chapters and sections, or arrive at a division via the search engine. A link to add a particular sub-division to the cart should be placed next to the links to such divisions in the hierarchy, and on the page containing that section itself.
4.2 Shopping Cart
Every page will include a link to the shopping cart, where the user can review orders, remove items, or confirm the order. He or she then enters payment or credit card information, which will be passed on to Verisign for verification.
The client shall be responsible for determining the fee structure (charging scheme). The developers shall be responsible for creating a system that supports different charges for differing amounts of material (e.g., a section can be charged differently than a chapter).
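Keeping the fee structure as data separate from the code lets the client change prices without developer involvement. The Python sketch below shows one way to do this. The division levels and the prices in `DEFAULT_FEES` are hypothetical placeholders, not the client's charging scheme; `Decimal` is used so that currency amounts are not subject to binary floating-point rounding.

```python
from decimal import Decimal

# Hypothetical fee table; the actual charges are set by the client.
DEFAULT_FEES = {
    "section": Decimal("1.00"),
    "chapter": Decimal("5.00"),
}


def order_total(items, fees=DEFAULT_FEES):
    """Sum the charges for cart items given as (division_level, division_id) pairs."""
    total = Decimal("0")
    for level, division_id in items:
        if level not in fees:
            raise ValueError(f"no fee defined for {level!r} ({division_id})")
        total += fees[level]
    return total
```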
4.3 Usability
Since the service is accessed through web pages, UI complexity should be kept to a minimum. Most users are casual computer users; they should be able to use the site intuitively, or learn it in no more than 3 minutes from easy-to-read help documents. E-commerce sites are useful references: most people find sites such as Amazon.com easy and intuitive.
4.4 Performance
The website should remain fast, since most of its content is text (or rather, HTML). Response times should not differ significantly from current performance levels.
4.5 Acceptable Bugs
Defects that do not directly contradict selection or placement requirements are deemed acceptable. Examples include bugs in the supplied shopping cart system that do not prevent orders from being placed or fulfilled.
5. Order Fulfillment Requirements
These relate to the experience after an order is confirmed.
5.1 Functionality
Order confirmation should be emailed to the user. Each email shall include the identity of the purchaser, the time of purchase and cost, and the identity of the document.
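A confirmation message carrying the required fields can be composed with Python's standard `email.message.EmailMessage`, as in the sketch below. The sender address, subject line, and body layout are assumptions; actually sending the message (for example through the server's local mail transfer agent) is left out.

```python
from email.message import EmailMessage


def confirmation_email(purchaser, purchased_at, cost, documents, sender):
    """Compose an order confirmation with purchaser, time, cost, and documents.

    purchased_at is a datetime; cost is a Decimal; documents are the
    identifiers of the ordered divisions, one per line in the body.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = purchaser
    msg["Subject"] = "Your U.S. Code PDF order"
    lines = [
        f"Purchaser: {purchaser}",
        f"Time of purchase: {purchased_at.isoformat(timespec='minutes')}",
        f"Cost: ${cost}",
        "Documents:",
        *(f"  {doc}" for doc in documents),
    ]
    msg.set_content("\n".join(lines))
    return msg
```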
The AOL email limit, 5 MB, may well be insufficient for certain long sections, and other services may have even lower limits on attachment size. Therefore, documents will not be sent as attachments. Instead, links to the generated PDF files will be emailed to the user once they become available. Each section or immediate supersection shall be contained in one file, but the actual file path will be hidden. An unguessable URL will be generated for each document in every order, and the system shall redirect file requests to the correct file. In this way, users cannot guess or obtain the paths to files they did not order, yet can download their documents at a convenient time.
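The unguessable-URL scheme can be sketched with Python's `secrets` module, which draws from the operating system's cryptographically secure random source. In this illustration the base URL is hypothetical and the token-to-path mapping is held in memory; a production system would persist it (for example alongside the order log) so links survive a server restart.

```python
import secrets


class DownloadRegistry:
    """Map random tokens to PDF paths so real file paths are never exposed."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._paths = {}

    def issue(self, pdf_path):
        """Create a fresh link for one document in one order."""
        token = secrets.token_urlsafe(32)  # 32 random bytes, about 256 bits
        self._paths[token] = pdf_path
        return f"{self.base_url}/{token}"

    def resolve(self, token):
        """Return the file for a token, or None so the server can answer 404."""
        return self._paths.get(token)
```

Because a new token is issued per document per order, two customers ordering the same section receive different links, and neither link reveals the file's location.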
5.2 Usability
The system sends links to PDF files to the user by email. It is assumed that the user knows how to download files and use them. A short tutorial will be provided on the site for reference for those who do not.
5.3 Performance
An order confirmation shall be provided by email within 5 minutes after the order is confirmed. The PDF conversion time has already been discussed in the conversion requirements. Since links are provided to users, the files are downloaded directly by users, so that the time to receive the documents will be constrained by the user's Internet connection.
5.4 Logging
Each order should be logged. In the future, this log can be expanded in functionality and will be used for system analysis, usage pattern recognition, and in cases where customers have questions or problems with orders.
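A plain CSV log is one simple format that is easy to append to now and to analyze later. The Python sketch below writes one row per order; the field set is a hypothetical choice based on the information already required in the confirmation email, plus a status column for rejected or failed orders. When writing to a real file, open it with `open(path, "a", newline="")` as the `csv` module documentation recommends.

```python
import csv

# Hypothetical column layout for the order log.
LOG_FIELDS = ["order_id", "timestamp", "purchaser", "documents", "cost", "status"]


def log_order(stream, order_id, timestamp, purchaser, documents, cost, status):
    """Append one order as a CSV row; documents are joined with ';'."""
    writer = csv.writer(stream)
    writer.writerow([
        order_id,
        timestamp.isoformat(),
        purchaser,
        ";".join(documents),
        cost,
        status,
    ])
```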
5.5 Invalid Information
If a credit card is rejected, the order shall be rejected, and no confirmation should be sent. The user shall be responsible for mistyped email addresses, although efforts will be made to confirm the address more than once. If on-the-fly conversion is used, and document conversion fails, the user will receive an email notice. The system will not support refunds at this time.
5.6 Acceptable Bugs
Defects that do not directly contradict the order fulfillment requirements are deemed acceptable. Examples include long-term memory leaks.
6. Quality Assurance Requirements
The development team shall be responsible for all testing that is feasible during the Spring 2002 semester.
6.1 Testing
Testing shall be done throughout the implementation phase, with special emphasis on structural soundness in PDF files and on large numbers of simultaneous requests. The development team will meet with the client and other quality assurance personnel at least once a week to resolve problems. Sample outputs will be provided to ensure that the structure does not deviate materially from the print edition of the U.S. Code.
6.2 Bug Reports
Bug reports filed by the client or website users before the end of the project shall be handled by the developers as time permits (not all last-minute bugs may be resolved).
7. Documentation Requirements
Documentation is divided into three parts: for LII personnel and future CS 501 teams, for CS 501 staff, and for website users. In addition, a program design document is required.
7.1 Program Design Document
Development of the PDF delivery system shall be documented by a program design document (PDD) outlining the implementation. It shall be the central reference for developers responsible for understanding, maintaining, and extending the system. The PDD shall contain a high-level view of the processing engine, detail the individual processing components, and describe all interfaces, both internal and external to the system. To aid in supporting the system, no development diverging from the requirements shall occur without peer approval and corresponding updates to the requirements and the PDD.
7.2 Website Users
A simple walk-through will be provided in case a website user is unable to order his or her desired documents. It will link to customer support information. Prominent help should be placed next to ordering links and lead users to walk-throughs and detailed instructions as well as contact information for personal technical support.
7.3 LII Personnel and Future CS 501 Teams
Documentation will be provided so that both LII personnel and future CS 501 teams who work with LII can understand the design and implementation of the PDF conversion, shopping cart system, and PDF delivery system. Scripts modified or created will have helpful documentation in the source code. Other documents will address possible failure modes and suggest remedies.
PDF filenames will be inherited from the corresponding XML files. For instance, Title1Chpt1Sec1.xml will be converted into a PDF file named Title1Chpt1Sec1.pdf. Files for higher nodes in the hierarchy, such as Title1Chpt1, will be created as well, aggregating all lower levels in that portion of the U.S. Code.
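The naming rule and the aggregation of lower levels can be sketched in Python as below. The regular expression assumes the `TitleNChptNSecN` pattern from the example above, with purely numeric designations; real section numbers that contain letters would need a broader pattern. Matching parsed components rather than raw string prefixes matters: a naive prefix test would wrongly place Title1Chpt10Sec1.xml under Title1Chpt1, and a plain string sort would put Sec10 before Sec2.

```python
import re
from pathlib import PurePath

# Assumed naming pattern, based on the example Title1Chpt1Sec1.xml.
COMPONENT = re.compile(r"(Title|Chpt|Sec)(\d+)")


def pdf_name(xml_name):
    """Title1Chpt1Sec1.xml -> Title1Chpt1Sec1.pdf"""
    return PurePath(xml_name).with_suffix(".pdf").name


def components(stem):
    """Split a name into (level, number) pairs, e.g. [('Title', '1'), ('Chpt', '1')]."""
    return COMPONENT.findall(stem)


def members(node, xml_names):
    """Return, in document order, the XML files belonging under a node such as Title1Chpt1."""
    prefix = components(node)
    selected = [
        name for name in xml_names
        if components(PurePath(name).stem)[:len(prefix)] == prefix
    ]
    # Sort numerically so Sec2 comes before Sec10.
    return sorted(selected, key=lambda n: [int(num) for _, num in components(PurePath(n).stem)])
```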
7.4 CS 501 Staff
This requirements specification, as well as other documentation in the future, will be designed to allow Professor Arms and the TA consultant to understand the progression of the project.
8. Future Considerations
On-the-fly PDF generation is more flexible in allowing more presentation formats. The current project will attempt to allow such a feature to be "plugged in" in the future. Additional plugins may include queueing, so that system administrators can manually assign priorities to user requests (e.g., shorter jobs receive higher priority), and statistics collection to monitor the use of the system and allow informed updates and improvements. On the user side, an improved user interface with radio buttons, tree navigation of the Code, and an optimized ordering system that incorporates data gathering by the LII may be implemented. In addition, PDF files that contain arbitrary levels of the hierarchy may be created, along with the ability to convert more than one title at once. Logging may be extended to support more in-depth data mining of usage patterns.
9. Legal Requirements
9.1 Licensing Requirements
The final product should be extendable at the source level by the client. Additionally, the issue of possible revenues generated by such extensions of the product must be addressed. Because much of the code may be derived from freely available sources, care must be taken to ensure that use of such code does not entail legal duties that are inconsistent with possible future commercial use of the product. Therefore, a contract has been drawn up to address these issues.
9.1.1 Software Development Contract
Both Project Sponsors and Developers agree to the following:
1. That all code, documentation and other copyright-protected material produced in the course of this CS501 project (PROJECT MATERIAL) shall be understood by all to be the work of joint authors and not a work made for hire;
2. That the joint authors shall include all the undersigned, the CS501 students working on the project and Thomas R. Bruce;
3. That despite joint authorship there will be no duty on the part of the student authors, individually or as a group, to account for any return on subsequent commercial use or development of the PROJECT MATERIAL;
4. That, in contrast, should Thomas R. Bruce or the Legal Information Institute realize royalties or other direct financial return from licensing any of the PROJECT MATERIAL there will be a duty to account to the other joint authors for any such revenue net of costs; and
5. That the developers will use care to assure that the PROJECT MATERIAL does not incorporate code covered by copyright and licensed on terms that are inconsistent with unlimited noncommercial distribution.
9.2 Legal, Copyright and Other Notices
At this juncture, the final product shall be distributed without any warranty, express or implied, not even the implied warranties of merchantability or fitness for a particular purpose. The developers will make every effort to ensure that the product fulfills the requirements listed above; however, there will be no legal duty to ensure that any of them are fulfilled.
10. References
Other sources are documented on the developers' website.