Free the Data: Accessibility, Format, and Transformation Issues Related to XML-Based Humanities Resources
My paper and presentation will discuss the importance of delivering XML-based digital humanities texts both in their native XML format and in a variety of other useful formats, and will further discuss and demonstrate the use of XSL and other tools to generate many of these delivery formats dynamically.
A number of recent technologies, in particular XML-related technologies such as XSL, XPath, and XQuery, provide, for both content providers and content consumers, a wealth of new ways to manipulate and query XML-based digital humanities texts and other XML-based data. One significant problem with many current electronic text delivery models is that SGML- or XML-encoded data is hidden behind a proprietary system or Web interface that provides a limited number of formats to the user. The SGML or XML is searched on the server but delivered to the user as HTML or in some other format, such as PDF or page images in various graphic file formats. The XML data never reaches the user, and the formats available to the user lack the often rich markup and detailed structure of the original SGML or XML data.
In the case of proprietary or Web interfaces, the content provider also makes decisions about how the text may be accessed and searched, and these decisions may or may not coincide with the specialized needs of the scholars and researchers using the data. A Web form is often used to construct a query, and these Web forms typically limit the structure and complexity of the query. The advantage of these systems is that they offer a relatively familiar and user-friendly Web interface that gives users quick and simple access to the data. Content providers should continue the development of these interfaces, always striving of course to provide increased flexibility and a better user experience to the data consumer.
However, in addition to developing new and improved Web interfaces to e-text projects, I would argue that, whenever possible given copyright and other restrictions, content providers should provide easy access to the raw XML data as well. Given access to the raw XML documents, users will be free to manipulate the data in any way they see fit and to apply to the data the full arsenal of XML manipulation and query tools in arbitrarily complex ways. With technologies like XPath and XQuery and the availability of free implementations of these technologies, scholars, researchers, and general-purpose users of our collections can manipulate and transform the data themselves in a greater variety of discipline- and interest-specific ways than could likely be accommodated by a user-friendly, Web-based interface. By providing the raw XML data to users, content providers may also benefit the scholarly community by encouraging and facilitating the acquisition of greater technical skills among humanities scholars.
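As a rough illustration of the kind of query a user could run once the raw XML is in hand, the following minimal sketch uses Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The sample document, its element and attribute names, and the two helper functions are hypothetical and stand in for whatever encoding a real collection uses; a full XPath or XQuery processor would allow considerably more complex queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like sample; element and attribute names are illustrative only.
SAMPLE = """<text>
  <div type="poem" n="1">
    <head>Morning</head>
    <l>The <name type="place">Thames</name> runs softly</l>
    <l>past <name type="person">Spenser</name> and his song</l>
  </div>
  <div type="letter" n="2">
    <head>To a Friend</head>
    <p>I write from <name type="place">Oxford</name>.</p>
  </div>
</text>"""


def names_of_type(root, name_type):
    """Return the text of every <name> element with the given type attribute."""
    return [el.text for el in root.findall(f".//name[@type='{name_type}']")]


def heads_of_divs_mentioning(root, name_type):
    """Return the <head> text of each <div> containing a <name> of the given type."""
    return [
        div.findtext("head")
        for div in root.findall(".//div")
        if div.find(f".//name[@type='{name_type}']") is not None
    ]


root = ET.fromstring(SAMPLE)
places = names_of_type(root, "place")
print(places)
print(heads_of_divs_mentioning(root, "person"))
```

Neither query depends on a server-side search form: the first collects every place name in the text, and the second combines structural and semantic markup to find the divisions that mention a person. This is the sort of discipline-specific question that a fixed Web interface is unlikely to anticipate.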