Collaboratory

A collaboratory, as defined by William Wulf in 1989, is a “center without walls, in which the nation’s researchers can perform their research without regard to physical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, [and] accessing information in digital libraries” (Wulf, 1989).

Bly (1998) refines the definition to “a system which combines the interests of the scientific community at large with those of the computer science and engineering community to create integrated, tool-oriented computing and communication systems to support scientific collaboration” (Bly, 1998, p. 31).

Rosenberg (1991) considers a collaboratory as being an experimental and empirical research environment in which scientists work and communicate with each other to design systems, participate in collaborative science, and conduct experiments to evaluate and improve systems.

A simplified form of these definitions would describe the collaboratory as being an environment where participants make use of computing and communication technologies to access shared instruments and data, as well as to communicate with others.

A more wide-ranging definition is provided by Cogburn (2003), who states that “a collaboratory is more than an elaborate collection of information and communications technologies; it is a new networked organizational form that also includes social processes; collaboration techniques; formal and informal communication; and agreement on norms, principles, values, and rules” (Cogburn, 2003, p. 86).

This concept has a lot in common with the notions of Interlock research, Information Routing Group and Interlock diagrams introduced in 1984.

Background
Problems of geographic separation are especially present in large research projects. The time and cost for traveling, the difficulties in keeping contact with other scientists, the control of experimental apparatus, the distribution of information, and the large number of participants in a research project are just a few of the issues researchers are faced with.

Collaboratories have been put into operation in response to these concerns and restrictions. However, their development and implementation have proved expensive. From 1992 to 2000, budgets for scientific research and development of collaboratories ranged from US$447,000 to US$10,890,000, and total use ranged from 17 to 215 users per collaboratory (Sonnenwald, 2003). Costs were particularly high when software packages were not available for purchase and direct integration into the collaboratory, or when requirements and expectations were not met.

Chin and Lansing (2004) state that the research and development of scientific collaboratories had, thus far, taken a tool-centric approach. The main goal was to provide tools for shared access and manipulation of specific software systems or scientific instruments. Such an emphasis on tools was necessary in the early development years of scientific collaboratories due to the lack of basic collaboration tools (e.g. text chat, synchronous audio or videoconferencing) to support rudimentary levels of communication and interaction. Today, however, such tools are available in off-the-shelf software packages such as Microsoft NetMeeting, IBM Lotus Sametime, and Mbone Videoconferencing (Chin & Lansing, 2004). The design of collaboratories may therefore now move beyond developing general communication mechanisms to evaluating and supporting the very nature of collaboration in the scientific context (Chin & Lansing, 2004).

Characteristics and considerations
A distinctive characteristic of collaboratories is that they focus on data collection and analysis, hence the interest in applying collaborative technologies to support data sharing as opposed to tool sharing. Chin and Lansing (2004) explore the shift of collaboratory development from traditional tool-centric approaches to more data-centric ones that effectively support data sharing. This means more than just providing a common repository for storing and retrieving shared data sets. Collaboration, Chin and Lansing (2004) state, is driven both by the need to share data and by the need to share knowledge about data. Shared data is only useful if sufficient context is provided about the data so that collaborators can comprehend and effectively apply it. It is therefore imperative, according to Chin and Lansing (2004), to know and understand how data sets relate to aspects of the overall data space, applications, experiments, projects, and the scientific community. The critical features or properties include:


 * General data set properties (owner, creation date, size, format);
 * Experimental properties (conditions of the scientific experiment that generated that data);
 * Data provenance (relationship with previous versions);
 * Integration (relationship of data subsets within the full data set);
 * Analysis and interpretation (notes, experiences, interpretations, and knowledge produced);
 * Scientific organization (scientific classification or hierarchy);
 * Task (research task that generated or applies the data set);
 * Experimental process (relationship of data and tasks to the overall process);
 * User community (application of data set to different users).
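A few of the properties listed above can be sketched as a single record attached to each shared data set. The following is only a minimal illustration: the class name `DataSetRecord`, its field names, and the `missing_context` helper are all hypothetical and are not taken from Chin and Lansing (2004).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DataSetRecord:
    """Hypothetical record pairing a shared data set with its context."""

    # General data set properties
    name: str
    owner: str
    created: date
    size_bytes: int
    data_format: str
    # Experimental properties: conditions of the experiment that produced the data
    experimental_conditions: Dict[str, str] = field(default_factory=dict)
    # Data provenance: name of the version this data set was derived from, if any
    derived_from: Optional[str] = None
    # Analysis and interpretation: free-form notes from collaborators
    notes: List[str] = field(default_factory=list)
    # Task: research task that generated or applies the data set
    task: Optional[str] = None

    def missing_context(self) -> List[str]:
        """Return the names of contextual fields that are still empty."""
        missing = []
        if not self.experimental_conditions:
            missing.append("experimental_conditions")
        if not self.notes:
            missing.append("notes")
        if self.task is None:
            missing.append("task")
        return missing


record = DataSetRecord(
    name="run-42",
    owner="alice",
    created=date(2004, 1, 15),
    size_bytes=2048,
    data_format="csv",
)
print(record.missing_context())
```

A check such as `missing_context` reflects the point that a data set without its context is of limited use to collaborators; which fields count as required is a design choice made here for illustration.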

Henline (1998) argues that communication about experimental data is another important characteristic of a collaboratory. By focusing attention on the dynamics of information exchange, the study of the Zebrafish Information Network Project (Henline, 1998) concluded that the key challenges in creating a collaboratory may be social rather than technical. “A successful system must respect existing social conventions while encouraging the development of analogous mechanisms within the new electronic forum” (Henline, 1998, p. 69). Similar observations were made in the Computer-supported collaborative learning (CSCL) case study (Cogburn, 2003). Cogburn investigated a collaboratory established for researchers in education and other related domains from the United States and southern Africa. The main finding was that there were important intellectual contributions on both sides, even though the collaboration paired a developed country with a developing one and faced social as well as cultural barriers. He further argues that a successful CSCL would need to draw on the best lessons learned on both sides in computer-mediated communication (CMC) and computer-supported cooperative work (CSCW).

Sonnenwald (2003) conducted seventeen interviews with scientists and revealed important considerations. Scientists expect a collaboratory to “support their strategic plans; facilitate management of the scientific process; have a positive or neutral impact on scientific outcomes; provide advantages and disadvantages for scientific task execution; and provide personal conveniences when collaborating across distances” (Sonnenwald, 2003, p. 68). Many scientists looked at the collaboratory as a means to achieve strategic goals that were organizational and personal in nature. Other scientists anticipated that the scientific process would speed up when they had access to the collaboratory.

Design philosophy
Finholt (1995), based on the case studies of the Upper Atmospheric Research Collaboratory (UARC) and the Medical Collaboratory, establishes a design philosophy: a collaboratory project must be dedicated to a user-centered design (UCD) approach. This means a commitment to developing software in programming environments that allow rapid prototyping and rapid development cycles (Finholt, 1995). A consequence of user-centered design in the collaboratory is that the system developers must be able to distinguish when a particular system or modification has a positive impact on users’ work practices. An important part of obtaining this understanding is producing an accurate picture of how work is done prior to the introduction of technology. Finholt (1995) explains that behavioral scientists had the task of understanding the actual work settings for which new information technologies were developed. The goal of a user-centered design effort was to inject those observations back into the design process to provide a baseline for evaluating future changes and to illuminate productive directions for prototype development (Finholt, 1995).

A similar viewpoint is expressed by Cogburn (2003), who relates the collaboratory to globally distributed knowledge work, stating that human-computer interaction (HCI) and user-centered design (UCD) principles are critical for organizations to take advantage of the opportunities of globalization and the emergence of an Information society. He refers to distributed knowledge work as a set of “economic activities that produce intangible goods and services […], capable of being both developed and distributed around the world using the global information and communication networks” (Cogburn, 2003, p. 81). Through the use of these global information and communications networks, organizations are able to take part in globally disarticulated production, which means they can locate their research and development facilities almost anywhere in the world, and engineers can collaborate across time zones, institutions and national boundaries.

Evaluation
Meeting expectations is a factor that influences adoption of innovations, including scientific collaboratories. Some of the collaboratories implemented thus far have not been entirely successful. The Waterfall Glen collaboratory of the Mathematics and Computer Science Division of Argonne National Laboratory (Henline, 1998) is an illustrative example. This collaboratory had its share of problems. There were occasional technical and social disasters, but most importantly it did not meet all of the collaboration and interaction requirements.

The vast majority of the evaluations performed thus far concentrate mainly on usage statistics (e.g. total number of members, hours of use, amount of data communicated) or on the immediate role in the production of traditional scientific outcomes (e.g. publications and patents). Sonnenwald (2003), however, argues that we should rather look for longer-term and intangible measures, such as new and continued relationships among scientists and the subsequent, longer-term creation of new knowledge.

Regardless of the criteria used for evaluation, we must focus on understanding the expectations and requirements defined for a collaboratory. Without such understanding a collaboratory runs the risk of not being adopted.

Success factors
Olson, Teasley, Bietz, and Cogburn (2002) identify some of the success factors of a collaboratory: collaboration readiness, collaboration infrastructure readiness, and collaboration technology readiness.

Collaboration readiness is the most basic pre-requisite for an effective collaboratory, according to Olson, Teasley, Bietz, and Cogburn (2002). Often the critical component to collaboration readiness is based on the concept of “working together in order to achieve a science goal” (Olson, Teasley, Bietz, & Cogburn, 2002, p. 46). Incentives to collaborate, shared principles of collaboration, and experience with the elements of collaboration are also crucial. Successful interaction between users requires a certain amount of common ground. Interactions require a high degree of trust or negotiation, especially when they involve areas where there is a cultural difference. “Ethical norms tend to be culturally specific, and negotiations about ethical issues require high levels of trust” (Olson, Teasley, Bietz, & Cogburn, 2002, p. 49).

When analyzing collaboration infrastructure readiness, Olson, Teasley, Bietz, and Cogburn (2002) state that modern collaboration tools require adequate infrastructure to operate properly. Many off-the-shelf applications will run effectively only on state-of-the-art workstations. An important piece of the infrastructure is the technical support necessary to ensure version control, to get participants registered, and to recover in case of disaster. Communications cost is another element that can be critical for collaboration infrastructure readiness (Olson, Teasley, Bietz, & Cogburn, 2002). Pricing structures for network connectivity can affect the choices that users make and therefore affect the collaboratory’s final design and implementation.

Collaboration technology readiness, according to Olson, Teasley, Bietz, and Cogburn (2002), refers to the fact that collaboration does not involve only technology and infrastructure, but also requires a considerable investment in training. Thus, it is essential to assess the state of technology readiness in the community to ensure success. If the level is too low, more training is required to bring the users’ knowledge up to date.

Biological Sciences Collaboratory
A comprehensively described example is the Biological Sciences Collaboratory (BSC) at the Pacific Northwest National Laboratory (Chin & Lansing, 2004). This collaboratory enables the sharing and analysis of biological data through metadata capture, electronic laboratory notebooks, data organization views, data provenance tracking, analysis notes, task management, and scientific workflow management. BSC supports various data formats, has data translation capabilities, and is able to interact and exchange data with other sources (e.g. external databases). As part of its security package, it offers subscription capabilities (to allow certain individuals to access data), verifies identities, establishes and manages permissions and privileges, and encrypts data (to ensure secure data transmission).

BSC also provides a data provenance tool and a data organization tool. These tools display the historical lineage of a data set as a hierarchical tree. From this tree view the scientist may select a particular node (or an entire branch) to access a specific version of the data set (Chin & Lansing, 2004).
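The lineage tree described here can be illustrated with a small sketch. It assumes each version of a data set is a node whose children are the versions derived from it; the `VersionNode` class and its methods are hypothetical and do not reflect how BSC is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class VersionNode:
    """One version of a data set; children are versions derived from it."""

    version: str
    children: List["VersionNode"] = field(default_factory=list)

    def derive(self, version: str) -> "VersionNode":
        """Record a new version derived from this one and return it."""
        child = VersionNode(version)
        self.children.append(child)
        return child

    def branch(self) -> Iterator["VersionNode"]:
        """Yield this node and all of its descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.branch()

    def find(self, version: str) -> Optional["VersionNode"]:
        """Select a single node by version label, or None if absent."""
        for node in self.branch():
            if node.version == version:
                return node
        return None

    def render(self, depth: int = 0) -> str:
        """Return an indented text view of the lineage tree."""
        lines = ["  " * depth + self.version]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


root = VersionNode("raw")
cleaned = root.derive("cleaned")
cleaned.derive("normalized")
root.derive("subset-a")

print(root.render())
print([node.version for node in cleaned.branch()])
```

Selecting a node corresponds to `find`, and selecting an entire branch corresponds to iterating over `branch()` from the chosen node.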

The task management provided by BSC allows users to define and track tasks related to a specific experiment or project. Tasks can have deadlines, levels of priority, and dependencies assigned. Tasks can also be queried and various reports produced. Related to task management, BSC provides workflow management to capture, manage, and supply standard paths of analyses. Scientific workflows may be viewed as process templates that capture and semi-automate the steps of an analysis process and its encompassing data sets and tools (Chin & Lansing, 2004).
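Tasks with deadlines, priorities, and dependencies can be modeled in a few lines. The sketch below is an illustration under stated assumptions, not BSC's design: the `Task` fields, the convention that a lower priority number is more urgent, and the two helper functions are all hypothetical. Dependency ordering uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from dataclasses import dataclass, field
from datetime import date
from graphlib import TopologicalSorter
from typing import Dict, List, Set


@dataclass
class Task:
    name: str
    deadline: date
    priority: int  # assumption: lower number means more urgent
    depends_on: Set[str] = field(default_factory=set)


def execution_order(tasks: List[Task]) -> List[str]:
    """Order task names so that every task follows its dependencies."""
    graph: Dict[str, Set[str]] = {t.name: t.depends_on for t in tasks}
    # static_order() raises graphlib.CycleError if dependencies are circular
    return list(TopologicalSorter(graph).static_order())


def due_before(tasks: List[Task], cutoff: date) -> List[str]:
    """Simple report: tasks due before the cutoff, most urgent first."""
    due = [t for t in tasks if t.deadline < cutoff]
    due.sort(key=lambda t: (t.priority, t.deadline))
    return [t.name for t in due]


tasks = [
    Task("analyze", date(2004, 3, 20), priority=1, depends_on={"run_assay"}),
    Task("run_assay", date(2004, 3, 10), priority=2, depends_on={"prepare_samples"}),
    Task("prepare_samples", date(2004, 3, 1), priority=1),
]

print(execution_order(tasks))
print(due_before(tasks, date(2004, 3, 15)))
```

A workflow template in the sense described above could then be thought of as a reusable set of such tasks, although the source does not specify how BSC represents its workflows.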

BSC provides project collaboration by allowing scientists to define and manage members of their group. Security and authentication mechanisms are applied to limit access to project data and applications. A monitoring capability allows members to identify other members who are online and working on the project (Chin & Lansing, 2004).

BSC offers community collaboration capabilities: scientists may publish their data sets to a larger community through the data portal. Scientists interested in a particular data set can register for notifications; when that data is modified, they are notified by email (Chin & Lansing, 2004).
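The notification behavior can be sketched as a simple subscription registry. This is a hypothetical illustration only: the `DataPortal` class and its methods are invented for the example, and messages are collected in an in-memory `outbox` list rather than sent as real email.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class DataPortal:
    """Hypothetical portal that notifies subscribers when a data set changes."""

    def __init__(self) -> None:
        # Maps a data set name to the email addresses subscribed to it
        self._subscribers: DefaultDict[str, Set[str]] = defaultdict(set)
        # Stand-in for an email gateway: (address, message) pairs
        self.outbox: List[Tuple[str, str]] = []

    def subscribe(self, dataset: str, email: str) -> None:
        self._subscribers[dataset].add(email)

    def modify(self, dataset: str, summary: str) -> None:
        """Record a modification and queue one message per subscriber."""
        # .get() avoids creating an empty entry for unknown data sets
        for email in sorted(self._subscribers.get(dataset, ())):
            self.outbox.append((email, f"{dataset} changed: {summary}"))


portal = DataPortal()
portal.subscribe("proteome-2004", "alice@example.org")
portal.subscribe("proteome-2004", "bob@example.org")
portal.modify("proteome-2004", "added replicate 3")
portal.modify("unwatched-set", "no subscribers, no messages")

for address, message in portal.outbox:
    print(address, "<-", message)
```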

Diesel Combustion Collaboratory
Pancerella, Rahn, and Yang (1999) analyzed the Diesel Combustion Collaboratory (DCC) which was a problem-solving environment for combustion research. The main goal of DCC was to make the information exchange for the combustion researchers more efficient. Researchers would collaborate over the Internet using various DCC tools. These tools included “a distributed execution management system for running combustion models on widely distributed computers (distributed computing), including supercomputers; web accessible data archiving capabilities for sharing graphical experimental or modeling data; electronic notebooks and shared workspaces for facilitating collaboration; visualization of combustion data; and videoconferencing and data conferencing among researchers at remote sites” (Pancerella, Rahn, & Yang, 1999, p. 1).

The collaboratory design team defined the requirements to be (Pancerella, Rahn, & Yang, 1999):


 * Ability to share graphical data easily;
 * Ability to discuss modeling strategies and exchange model descriptions;
 * Archiving collaborative information;
 * Ability to run combustion models at widely separated locations;
 * Ability to analyze experimental data and modeling results in a web-accessible format;
 * Videoconference and group meetings capabilities.

Each of these requirements had to be met securely and efficiently across the Internet. Resource availability was a major concern because many of the chemistry simulations could run for hours or even days on high-end workstations and produce data sets ranging from kilobytes to megabytes. These data sets had to be visualized using simultaneous 2-D plots of multiple variables (Pancerella, Rahn, & Yang, 1999).

The deployment of the DCC was done in a phased approach. The first phase was based on iterative development, testing, and deployment of individual collaboratory tools. Once collaboratory team members had adequately tested each new tool, it was deployed to combustion researchers. The deployment of the infrastructure (videoconferencing tools, multicast routing capabilities, and data archives) was done in parallel (Pancerella, Rahn, & Yang, 1999). The next phase was to implement full security in the collaboratory. The primary focus was on two-way synchronous and multi-way asynchronous collaborations (Pancerella, Rahn, & Yang, 1999). The challenge was to balance the increased access to data that was needed with the security requirements. The final phase was the broadening of the target research to multiple projects including a broader range of collaborators.

The collaboratory team found that the highest impact was perceived by the geographically separated scientists who truly depended on each other to achieve their goals. One of the team’s major challenges was to overcome the technological and social barriers in order to meet all of the objectives (Pancerella, Rahn, & Yang, 1999). Collaboratories that combine openness to users with low-maintenance security are hard to achieve; user feedback and evaluation are therefore constantly required.

Other collaboratories
Other collaboratories that have been implemented and can be further investigated are:

 * Biological Collaborative Research Environment (BioCoRE) developed at University of Illinois at Urbana–Champaign – a collaboration tool for biologists (Chin & Lansing, 2004);
 * Molecular Interactive Collaborative Environment (MICE) developed at the San Diego Supercomputer Center – provides collaborative access and manipulation of complex, three-dimensional molecular models as captured in various scientific visualization programs (Chin & Lansing, 2004);
 * Molecular Modeling Collaboratory (MMC) developed at University of California, San Francisco – allows remote biologists to share and interactively manipulate three-dimensional molecular models in applications such as drug design and protein engineering (Chin & Lansing, 2004);
 * Collaboratory for Microscopic Digital Anatomy (CMDA) – a computational environment to provide biomedical scientists remote access to a specialized research electron microscope (Henline, 1998);
 * The Collaboratory for Strategic Partnerships and Applied Research at Messiah College – an organization of Christian students, educators, and professionals affiliated with Messiah College, aspiring to fulfill Biblical mandates to foster justice, empower the poor, reconcile adversaries, and care for the earth, in the context of academic engagement;
 * Waterfall Glen – a multi-user object-oriented (MOO) collaboratory at Argonne National Laboratory (Henline, 1998);
 * The International Personality Item Pool (IPIP) – a scientific collaboratory for the development of advanced measures of personality and other individual differences (Henline, 1998);
 * TANGO – a set of collaborative applications for education and distance learning, command and control, health care, and computer steering (Henline, 1998);
 * Collaborative architecture and Interactive architecture, the work of Adam Somlai-Fischer and Usman Haque.

Special consideration should be given to TANGO (Henline, 1998) because it is a step forward in implementing collaboratories, with distance learning and health care as its main domains of operation. Henline (1998) mentions that the collaboratory has been successfully used to implement applications for distance learning, a command and control center, a telemedical bridge, and a remote consulting tool suite.

Summary
To date, collaboratories have been applied largely in scientific research projects, with various degrees of success and failure. Recently, however, collaboratory models have been applied to additional areas of scientific research in both national and international contexts. As a result, a substantial knowledge base has emerged that helps us understand their development and application in science and industry (Cogburn, 2003). Extending the collaboratory concept to include both social and behavioral research, as well as more scientists from the developing world, could potentially strengthen the concept and provide opportunities to learn more about the social and technical factors that support a distributed knowledge network (Cogburn, 2003).

The use of collaborative technologies to support geographically distributed scientific research is gaining wide acceptance in many parts of the world. Such collaboratories hold great promise for international cooperation in critical areas of scientific research and beyond. As the frontiers of knowledge are pushed back, the problems become more and more difficult, often requiring large multidisciplinary teams to make progress. The collaboratory is emerging as a viable solution, using communication and computing technologies to relax the constraints of distance and time, creating an instance of a virtual organization. The collaboratory is both an opportunity with very useful properties and a challenge to human organizational practices (Olson, 2002).