"We will be witnessing the birth of the artificial, or in-silico, scientist." (J. D. Wren)
The field of bioinformatics has blossomed in the last ten years. As a result, a large and increasing number of researchers are generating computational tools for solving problems relevant to biology. Because the number of these artifacts has grown so greatly, many bioinformatics researchers cannot keep track of the tools, databases, and methods in the field, or perhaps even within their own specialty area. More critically, biologists who use these resources and scientists approaching the field have no comprehensive index of bioinformatics algorithms, databases, and literature annotated with information about their context and appropriate use. We suggest that the full set of bioinformatics resources, the resourceome, should be explicitly characterized and organized. A hierarchical and machine-understandable organization of the field, along with rich cross-links (an ontology!), would be a useful start. A distributed development approach would likely be required, so that those with focused expertise can classify resources in their own area and supply the metadata that makes useful existing resources easier to find.
The growth of bioinformatics can be quantified in many ways. The Intelligent Systems for Molecular Biology (ISMB) meeting began in 1993, and numerous other meetings have since been established. The International Society for Computational Biology (ISCB) was formed in 1995, and its membership has recently reached 2,000. The field has gone from having one or two journals to having more than a dozen if one counts -omics journals (i.e., those covering high-throughput functional genomics, where computation plays a central role) and those in the emerging field of systems biology. Because bioinformatics has a strong engineering element, the creation and maintenance of tools provide value only insofar as the tools are used. These tools may be databases that hold biological data, or they may be algorithms that act on these data to draw inferences. Access to these artifacts is currently uneven. Of course, the published literature is the archival resting place for the initial description of these innovations, but it contains only a snapshot of most tools early in their lifetime. The literature uses no standard classification system to describe tools, so searches for tools with specific functions generally have low sensitivity. Indeed, the bibliome itself is idiosyncratically organized, and finding the right article is often like searching for a needle in a haystack. Finally, the published literature does not contain reliable references to the location and availability of most bioinformatics resources. One could also argue that Google provides adequate access to tools through keyword searching. However, the lack of standard terms makes sensitive and specific searches difficult. In addition, most searches return hits that conflate papers, Web sites, tools, departments, and people, which makes extracting useful information very difficult.
Recognizing this limitation, several grassroots efforts have tried to organize the bioinformatics resourceome. Among the most famous are the archaeological Pedro's List, a list of computer tools for molecular biologists, and the Expasy Life Sciences Directory, formerly known as Amos's WWW links page. The Bioinformatics Links Directory today contains more than 700 curated links to bioinformatics resources, organized into eleven main categories and including all the databases and Web servers listed each year in the dedicated Nucleic Acids Research special issues. The National Center for Biotechnology Information has tried to make access to its suite of tools transparent, with moderate success. Many Web sites list useful resources, especially on specialized or narrow topics (e.g., microarrays, text mining, and gene regulation). But all of these ef