CS 501 - Software Engineering
Feasibility Report
Project for the Cornell Legal Information Institute
PDF versions of the United States Code
Version 1.1 (revised)
November 3, 2003
Project Homesite
Key stakeholders/organizations:
Developer Team:
Tsung-Yueh Chiou, Tsee Yuan Lee, Kohsuke Kawaguchi, Soyeon Kim, Justin Tung
Project sponsors: Thomas Bruce, Professor William Arms
Table of Contents
I. Introduction
II. Software Development Plan
I. Introduction
1. Project Outline
The United States Code is released to the general public by the US House of
Representatives on its Web site at
http://uscode.house.gov/download.htm.
The House releases the code as plain ASCII text, to which the Legal Information
Institute (http://lii.law.cornell.edu/) adds value. The LII version is at:
http://www4.law.cornell.edu/uscode/.
An earlier CS 501 team (Legal Data Markup Software Project) and a later student project developed programs for the Legal Information Institute (LII) that convert the raw ASCII output of the House of Representatives to XML for reuse in various settings. XSLT scripts then convert this XML to HTML, producing the current website. The existing web interface provides a search engine, the legal code in HTML, cross-references, and notes on each section of the code. The US Code site currently receives about half a million hits daily.
The new project, overseen by Thomas Bruce, Co-Director of the Legal Information Institute, is to create PDF versions of the code from the existing XML. On the current website, users navigate through HTML links to reach the leaves of the legal code tree, which are called 'sections'. Our goal is to implement a system in which users can pick sections and larger divisions (called 'chapters' or 'parts') via a shopping cart system. The PDF format adds printability and portability to the legal code, both of which were hard to achieve with the HTML version. The cart functionality will be integrated into the existing website and will be present on each HTML page of legal code. The idea is then to possibly charge the user for the service, followed by on-the-fly XML-to-PDF conversion using XSL. The generated PDF documents, containing the sections of code requested by the user along with additional information (e.g. table of contents, notes, cross-references), will then be delivered to the user.
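The selection step described above can be sketched as follows. This is only an illustration of how a cart holding both sections and larger divisions might be flattened into the list of sections to render. The `Node` class, the `expand_cart` function, and the sample labels are hypothetical and do not come from the existing LII system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the code tree; a node with no children is a section."""
    label: str
    children: list[Node] = field(default_factory=list)

    def sections(self) -> list[str]:
        """Return the labels of all leaf sections under this node, in tree order."""
        if not self.children:
            return [self.label]
        found = []
        for child in self.children:
            found.extend(child.sections())
        return found


def expand_cart(cart: list[Node]) -> list[str]:
    """Flatten the cart into section labels, dropping duplicates.

    Sections keep the order in which they were first selected.
    """
    seen = set()
    ordered = []
    for item in cart:
        for label in item.sections():
            if label not in seen:
                seen.add(label)
                ordered.append(label)
    return ordered


# Hypothetical fragment of a title: one chapter holding two sections.
sec_101 = Node("Sec. 101")
sec_102 = Node("Sec. 102")
chapter_1 = Node("Chapter 1", [sec_101, sec_102])

# The user adds a single section and then the chapter that contains it.
print(expand_cart([sec_102, chapter_1]))  # ['Sec. 102', 'Sec. 101']
```

Removing duplicates at this stage means a section picked both directly and through its chapter would appear only once in the delivered PDF. Whether that is the desired behavior is a question for the requirements analysis.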
2. Scope
Required
- Pre-generate PDFs for the leaf elements (sections) of the legal code tree
- Pre-generate PDFs for larger divisions
- Modify the existing XSLT scripts to incorporate PDF functionality in the existing website
Desired
- PDF generation of US legal code by one of two methods: on-the-fly generation, or an extensible caching-queuing framework
- Shopping cart payment system and charging scheme for PDFs
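To illustrate the caching idea among the desired items, here is a minimal sketch of a cache that generates a PDF only the first time a given selection is requested. It leaves out the queuing half of the framework entirely. The `PdfCache` class and the `fake_render` placeholder are hypothetical. The sketch also assumes that the order in which sections were selected does not matter, which may not hold for the real service.

```python
import hashlib


class PdfCache:
    """Keep generated PDFs keyed by the set of sections they contain."""

    def __init__(self, render):
        self._render = render  # callable: list of section labels -> PDF bytes
        self._store = {}       # cache key -> PDF bytes
        self.misses = 0        # number of times the renderer had to run

    @staticmethod
    def key(sections):
        # Sort and deduplicate so the same selection in a different order
        # maps to the same cached document.
        canonical = "\n".join(sorted(set(sections)))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, sections):
        k = self.key(sections)
        if k not in self._store:
            self.misses += 1
            self._store[k] = self._render(sorted(set(sections)))
        return self._store[k]


def fake_render(sections):
    """Placeholder for the real XSL-FO step; the bytes only mimic a PDF."""
    return ("%PDF-placeholder\n" + "\n".join(sections)).encode("utf-8")


cache = PdfCache(fake_render)
first = cache.get(["Sec. 102", "Sec. 101"])
second = cache.get(["Sec. 101", "Sec. 102"])  # same selection, different order
print(cache.misses)  # 1: the second request was served from the cache
```

An in-memory dictionary is used only to keep the sketch short. A deployed version would more likely store the files on disk and would need a policy for discarding entries when the underlying code changes.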
Optional
- Database to support queuing and to track user statistics
3. Project Relations
Another team at LII is working on improving the XML format of the US Code, but its work does not directly affect our project.
4. Benefits
- Convenience of having the legal code available to the public in PDF format, as opposed to the website format only
- An organized PDF layout of the legal code suitable for printing
- Shopping cart payment system for LII
II. Software Development Plan
1. Constraints
The project is funded by Red Hat, so the client expressed a preference for relying solely on open-source software. Given the number of visitors to the website (approx. 9 million per week), the system must be able to handle a large volume of requests. In addition, the project team must learn new programming languages and environments.
2. Obstacles and Risk
i. Scalability
At this moment we are not sure whether the currently deployed shopping cart
system can handle the large number of code sections that we will need to offer.
Also, it is not clear how many users will use this PDF service. Given the number
of visitors to the website, the load on the system could be huge.
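As a rough aid for thinking about this risk, the visitor figure quoted under Constraints can be turned into an average request rate. The share of visitors who would request a PDF is unknown, so the values tried below are purely hypothetical. The calculation also assumes one PDF per requesting visitor and load spread evenly over the week, and peak load would be higher than this average.

```python
VISITORS_PER_WEEK = 9_000_000        # approximate figure quoted under Constraints
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def pdf_requests_per_second(pdf_share):
    """Average PDF requests per second if `pdf_share` of visitors request one PDF."""
    return VISITORS_PER_WEEK * pdf_share / SECONDS_PER_WEEK


# Hypothetical shares of visitors using the PDF service.
for share in (0.01, 0.05, 0.10):
    rate = pdf_requests_per_second(share)
    print(f"{share:.0%} of visitors -> {rate:.2f} PDF requests per second")
```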
ii. Lack of Experience
The developers need to learn several systems and technologies. The production
system runs on Linux, but none of the team members has a strong Linux
background. The project needs to use various XML-related technologies and CVS
fairly extensively, but most of the team members still need to learn them. In
addition, none of the team members has experience working with the shopping cart
system and the payment system currently deployed. Finally, the existing system
that produces HTML from the US Code has to be changed, so the team must spend
time becoming familiar with it.
iii. Integration with existing systems
The new system might need to coordinate with the currently deployed shopping
cart system. However, at this moment we are not sure if this system provides
such a coordination mechanism.
iv. US Legal Code Complexities
The code contains many irregularities in structure and data organization.
Also, the DTD does not capture the precise semantics of the code.
3. Deliverables
- Feasibility Report
- Requirements Analysis
- Prototype
- System and Program Design
- Modified website
- PDF Conversion Engine
4. Project Process
Following the feasibility and requirements analysis, we plan to produce an initial prototype of the system to show to the client and confirm the developers' understanding of the project. After the prototype, we will move on to the system and program design and coding phases. Unfortunately, due to the nature of the legal data, there is no comprehensive test plan. At the end, we will complete documentation and debugging and conduct acceptance testing.
5. Resources
Software:
- XSLT Engine
- PDG Shopping cart
- XML
- Verisign Pay Flow
- XSL/FO Processor (PDF Generation)
- Apache server
- CVS
- Linux
Hardware:
- 1 server (lula.law.cornell.edu) used for project development and as a server for the concurrent version control system (CVS).
Facilities and Tools
- Regular meetings Mondays 5-7 in Upson Hall or the Law School
- Yahoo groups: cs501 (http://groups.yahoo.com/group/cs501/) egroup for message board and limited file sharing capabilities
- CVS Repository on lula.law.cornell.edu will be used for development
ARTEMIS | resources/cs501feasibility.html by justin tung generated using Apache Software Foundation's Xalan-J version 2.7.2