Open knowledge management for the substitution of hazardous chemicals
A working paper. By Akos Kokai. Last updated 02016-04-20.
It is increasingly clear that managing toxic substances, to prevent harm to health and environment, requires action beyond existing regulatory programs and market structures.  Through technological change we can envision redesigning systems of industrial production to use resources sustainably and to substitute toxic substances with benign alternatives.   The question of how to enact such a transition is riddled with social complexity, in ways that I don’t even touch upon here. Rather, I discuss the domain of knowledge that informs technology-driven efforts to solve the problem of toxics, and I sketch some features of an ‘open’ knowledge system that might be helpful for mobilizing that knowledge. My discussion is patterned on the analytic framework of organizing systems.  I am viewing knowledge as resources, and asking how these resources could be intentionally arranged to support valuable interactions. Organizing knowledge involves specifying principles or rules for the collection, arrangement, and description of knowledge resources.
To protect human and ecological health, technological design and production should be informed by scientific knowledge of the environmental health effects of chemicals and materials. An array of people and organizations participate in the design and governance of these technologies, including chemical and downstream product manufacturers, consultants, government agencies, and non-governmental organizations. It follows that each of these actors should, at some level, be able to access scientific knowledge relevant to their needs. Indeed, these actors increasingly use specialized practices to evaluate chemical hazards and compare safer alternatives. I will focus on two such practices, chemical hazard assessment (CHA) and alternatives assessment (AA). Practitioners use scientific knowledge as a resource: knowledge of chemical properties and how chemicals are used, toxicological data, and so on. At the same time, they produce new forms of knowledge (such as standardized assessments and metrics of chemical safety, or analyses of safer technology options) that could in turn be used as resources by others.
Efforts to substitute toxic chemicals—and the practices of CHA and AA in particular—face difficult challenges centered on the availability and management of scientific and technical knowledge.
- Availability and accessibility of knowledge are limited. There are large gaps in basic scientific data about the toxicity or safety of chemicals. Existing knowledge is often not publicly available, instead locked within ‘silos’ belonging to firms, organizations, or scientific publishers. Knowledge that is free and publicly available is generally not optimally organized, and accessing it involves the ‘friction’ of time-consuming searches through scientific literature and arcane government databases.
- There is inadequate flow of chemical-related information throughout networks where decisions about chemicals are made. Industrial supply systems and value chains involving chemicals are complex and globally distributed, yet the communication of chemical hazard and risk information in these networks is limited.   Information asymmetries, exacerbated by trade secrecy, impede downstream users of chemicals from using scientific knowledge to select safer substances. 
- Interpreting uncertain and contested scientific knowledge can be difficult. Scientific findings on environmental issues are often unavoidably uncertain. This can lead to technical and political conflicts around the interpretation of evidence.   The health risks of many chemicals are contested among competing interest groups, such as the chemical industry and environmental advocates. Product manufacturers and other downstream users of chemicals are left to sort through scientific data themselves (if they can find it), or to choose among conflicting interpretations put forth by different stakeholders. These decision-makers lack access to trusted, credible sources of scientific expertise. 
Taken together, these are enormous challenges. The organizing systems perspective lets us focus on a sub-problem: how to improve knowledge interactions among key stakeholders, particularly those whose goals and interests are already aligned toward the substitution of hazardous substances. Improving systems of knowledge management could potentially address the problems of accessibility and information flow.
What follows is a description of a hypothetical organizing system conceived from the perspective of chemical assessment practitioners who work with a mixture of public and private (confidential or restrictively licensed) data. This is not a proposal for a unified or centralized information infrastructure: such a system could be constituted by a number of interoperable projects. The discussion is meant to be agnostic of implementation as much as possible. The goals of the system would be to provide a shared collection of publicly accessible knowledge resources, and to allow a community of practice to collectively manage, produce, and enrich those resources. Contributors could publish or curate data sets and a variety of work products in structured formats that are semantically linked to other relevant resources. These goals are in accord with growing interest in the collaborative development of openly shared information resources in this domain, and with broad demands for transparency in evidence-based decision making.
What resources are being organized?
The domain of knowledge pertaining to chemical substitution includes some kinds of knowledge resources that are unique to those practices, but also overlaps with the domains of chemistry, toxicology, chemical and ecological informatics, and supply-chain chemicals management. Below I have tried to categorize major types of knowledge resources in a simple way (other researchers have made comparable but different categorizations). This is not a comprehensive analysis, but the following types of knowledge resources emerge as most relevant. I also outline some of the relationships among them in Figure 1.
The first type is chemical substances or, rather, representations and descriptions of chemical substances. To be represented and described, chemical substances must be identified in an effective and agreed-upon manner. Perhaps surprisingly, this is an unsolved vocabulary problem throughout the domain. Certain kinds of relationships among substances (e.g. isomerism, multiple physical forms, synthesis and degradation pathways) and certain described properties (such as their technical functions) are highly relevant to the system goals, and need to be modeled in the organizing system.
Chemicals can be seen as the ingredients that make up materials, and materials in turn are what products are made of.  Organizing knowledge about the environmental impacts of materials and products is parallel to the goals of this organizing system. Even if the focus of this system is chemicals, the relationships among those entities should still be considered.
Chemical inventories and lists are resources that enumerate chemicals for a particular purpose. Inventories are used to keep track of chemicals that are produced, used, sold, regulated, and so on. This category also includes lists of the chemical ingredients that make up products or materials.
Hazard properties and EHS information
Alternatives assessment practitioners evaluate chemicals across many dimensions, including different kinds of hazard. For example: cancer, aquatic toxicity, and flammability are distinct hazard properties, and assessing each of these for any given chemical requires a different kind of data. Data relevant to environmental health and safety (EHS data) encompasses toxicological studies, physicochemical properties, and other types of information. Hazard properties are conceptually distinct in a subtle way: they are interpretations of EHS data made according to particular conventions: see assessment frameworks.
The organizing system should semantically link all information about chemical hazards to the appropriate substances. But this information also comes with references to other resources, such as scientific literature, databases, or particular test methods or assessment criteria. The organizing system should be able to keep track of these references too, to provide contextual links necessary for fully understanding the hazard data.
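The linking described above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names, and the `subst:` identifier scheme are hypothetical, and the values are placeholders rather than real data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    """A pointer to a supporting resource (hypothetical structure)."""
    kind: str      # e.g. "literature", "database", "test_method", "criteria"
    citation: str  # free-text citation or identifier


@dataclass
class HazardDatum:
    """One piece of EHS information, linked to exactly one substance."""
    substance_id: str  # hypothetical internal identifier
    endpoint: str      # e.g. "aquatic toxicity"
    value: str         # reported result, kept as text in this sketch
    references: list[Reference] = field(default_factory=list)


def data_for(substance_id: str, data: list[HazardDatum]) -> list[HazardDatum]:
    """Return every datum linked to the given substance."""
    return [d for d in data if d.substance_id == substance_id]


# Placeholder records, not real measurements.
data = [
    HazardDatum("subst:0001", "aquatic toxicity", "placeholder result",
                [Reference("test_method", "placeholder test guideline")]),
    HazardDatum("subst:0002", "flammability", "placeholder result"),
]
hits = data_for("subst:0001", data)
print(hits[0].endpoint, [r.kind for r in hits[0].references])
```

Keeping references as first-class records, rather than free text inside the datum, is what would let a reader follow a hazard claim back to its source.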
A variety of frameworks and guidelines exist to facilitate chemical assessment practices. These include hazard classification systems, criteria for interpreting data as evidence, protocols for weighing multiple sources of scientific evidence, and overarching decision-making frameworks for evaluating multiple technological alternatives. By specifying what is important and how to evaluate it, frameworks significantly shape the production of assessments.
Each framework can potentially reference other frameworks or other kinds of standards. An example of a hazard assessment framework is the freely available, peer-reviewed, and widely used GreenScreen for Safer Chemicals. GreenScreen provides a taxonomy of hazard types, a method for interpreting scientific evidence relating to each hazard (much of this, in turn, based on the GHS), and a method for combining evidence into weighted overall scores. GreenScreen is invoked as a hazard assessment sub-framework in certain alternatives assessment frameworks.
The system should organize the knowledge produced by its community of practice, which includes chemical hazard assessment reports and alternatives assessment case studies. Assessments, in general, synthesize EHS data, frameworks, standards, and other intellectual resources, and they are resources that are designed to be useful in decision-making. Assessment is a kind of knowledge production that is instrumental for making evidence-based improvements in the environmental performance of technologies.
Software tools may come to play important roles in chemical assessment practices.
Figure 1. A partial ontology of chemical hazard information.
Why are the resources being organized?
The goals of organizing these resources are to make the assessment of chemical and material hazards easier and less costly, and also more transparent and scientifically defensible. This organizing system would be situated within the community of practice of chemical hazard assessment and alternatives analysis. The focus is more on specialized, technical contexts—such as the design and engineering of products and processes—than on consumer or public awareness contexts.
Facilitating assessment work by organizing knowledge is a matter of supporting sophisticated interactions with the knowledge, which enable effective strategies for finding and obtaining the right piece of information. User requirements research has not been done yet, so I will discuss some postulated requirements. First, users should be able to easily discover all publicly available data and assessment work that has been done on any particular chemicals of interest. Second, the organizing system must enable navigating and finding knowledge resources on the basis of a variety of properties and relationships. For example, users should be able to navigate or select chemicals according to their industrial uses and functions (e.g. in order to identify functional alternatives to a chemical of concern). Most databases containing hazard information do not describe or classify chemical uses in a standardized way; doing so is a clear requirement for our organizing system.
An organizing system like this will help to deepen the impact of past investments in producing knowledge about chemicals and their environmental health impacts. Much of the information that exists has been produced because it was required or funded by government programs; large data collections are published by government agencies. Similarly, governments worldwide collect various kinds of information about the amounts of chemicals that are produced, imported, used, and released. However, government agencies typically lack the mandate and the resources to engage in the broad organization of knowledge that goes beyond the immediate topical, legal, or jurisdictional focus of their information collection activities. Realizing this organizing system as a government service would entail breaking down boundaries between government information “silos.”
This organizing system is also intended to promote collaboration and transparency in chemical assessment, in a way that strengthens the validity and authority of the assessments being produced. Beyond providing free access to organized information resources, the system would allow contributors to publish assessments in structured formats that transclude, link, or otherwise expose the data and methods used to create them. Readers of these studies could easily determine on the basis of what data and what frameworks the assessment was made; by whom, when, and by what provenance; in what industrial use context; in comparison to what substances; and so on. Some of these properties could also be used to find, navigate, or retrieve published assessment work.
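To make the idea of a structured, queryable assessment concrete, here is a minimal sketch. The record layout, the field names, and every identifier in it are assumptions made for illustration; they do not describe an existing format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assessment:
    """Hypothetical structured record of one published assessment."""
    assessment_id: str
    subject: str                  # substance assessed
    compared_to: tuple[str, ...]  # alternatives it was compared against
    framework: str                # framework the assessment followed
    data_ids: tuple[str, ...]     # data the assessment relied on
    author: str
    published: date
    use_context: str              # industrial use context


def find(assessments, **criteria):
    """Return assessments whose fields equal every given criterion."""
    return [a for a in assessments
            if all(getattr(a, key) == value for key, value in criteria.items())]


def relying_on(assessments, data_id):
    """Return assessments that cite a particular piece of data."""
    return [a for a in assessments if data_id in a.data_ids]


# Placeholder records, not real assessments.
records = [
    Assessment("assessment:0001", "subst:A", ("subst:B", "subst:C"), "framework:X",
               ("data:1", "data:2"), "assessor-1", date(2016, 1, 15), "coating removal"),
    Assessment("assessment:0002", "subst:A", (), "framework:Y",
               ("data:1",), "assessor-2", date(2016, 3, 2), "degreasing"),
]
print([a.assessment_id for a in find(records, subject="subst:A", framework="framework:X")])
print([a.assessment_id for a in relying_on(records, "data:2")])
```

Because provenance is stored as fields rather than prose, the same properties that document an assessment also serve as retrieval keys.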
How much are the resources organized?
The extent of resource organization will depend not only on direct user requirements but also on existing standards, conventions, and related organizing systems, which are effectively indispensable for accomplishing the goals of this system (in other words, maintaining “sideways compatibility”).
Information about chemical substances needs to be organized enough for users to determine which substance in the system corresponds to which substance in reality. Unfortunately, there are diverse understandings and vocabularies of chemical identity, which vary between communities like chemists, manufacturers, health scientists, and regulators. It would be impossible to completely resolve these semantic and lexical inconsistencies. Rather, the most practical approach is to create a deliberately principled, well-documented, and internally consistent system of identification, and to implement the system in a way that maximizes the potential for interoperability with other systems. The uniqueness of a substance in this organizing system will be determined by whatever structural characteristics are most intrinsically linked to its toxicological characteristics. Each substance would have an alphanumeric identifier (or URI), associated with a structural formula  wherever possible, and any number of synonyms. Unlimited aliasing is necessary, because each chemical can have many systematic and commercial names.
Among chemical assessment practitioners, the dominant authority-controlled identification system for chemicals is the Chemical Abstracts Service (CAS) Registry; to the extent possible, CAS identifiers should be included as synonyms, but they are unsuitable as globally unique identifiers because they are proprietary. Although CASRN are used widely in industry, government, and public web resources, there is no way to authoritatively look up or check a CASRN without paid and legally restrictive access to the CAS Registry. Furthermore, the CAS system is clumsy for organizing chemical hazard information. It is rigidly implemented with a specific ontological perspective (of synthetic chemistry), and forcing flexible understandings of CAS identifiers further obscures and corrupts the system. If identifiers from other organizing systems are added as aliases (e.g. the corresponding PubChem compound identifiers, established using automated methods), then the automated retrieval of data could make additional resources available to users.
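The identification scheme outlined in this section (one internal identifier per substance, a structural representation where possible, and unlimited aliases) might look roughly like the following sketch. The `SubstanceRegistry` class, the `subst:` identifier scheme, and the alias prefixes are hypothetical. Rejecting an alias that already points to another substance is a simplification chosen for this example, since real names can be ambiguous.

```python
class SubstanceRegistry:
    """Maps many names and external identifiers onto one internal identifier."""

    def __init__(self):
        self._structures = {}  # internal id -> structural identifier (or None)
        self._aliases = {}     # internal id -> set of aliases as entered
        self._index = {}       # normalized alias -> internal id

    @staticmethod
    def _normalize(alias):
        # Case- and whitespace-insensitive matching: a deliberate simplification.
        return " ".join(alias.lower().split())

    def add_substance(self, substance_id, structure=None):
        if substance_id in self._aliases:
            raise ValueError(f"{substance_id} is already registered")
        self._structures[substance_id] = structure
        self._aliases[substance_id] = set()

    def add_alias(self, substance_id, alias):
        if substance_id not in self._aliases:
            raise KeyError(substance_id)
        key = self._normalize(alias)
        owner = self._index.get(key)
        if owner is not None and owner != substance_id:
            # This sketch refuses ambiguous names instead of modeling them.
            raise ValueError(f"{alias!r} already refers to {owner}")
        self._index[key] = substance_id
        self._aliases[substance_id].add(alias)

    def resolve(self, alias):
        """Return the internal identifier for a name, or None if unknown."""
        return self._index.get(self._normalize(alias))


registry = SubstanceRegistry()
# Ethanol, with its standard InChI as the structural representation.
registry.add_substance("subst:0001", structure="InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
for name in ("ethanol", "Ethyl alcohol", "CAS:64-17-5", "pubchem-cid:702"):
    registry.add_alias("subst:0001", name)

print(registry.resolve("ETHYL  ALCOHOL"))  # subst:0001
```

In this arrangement a CAS number is just one alias among others, so the registry does not depend on the CAS Registry for uniqueness.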
Functional use of chemicals
Descriptions of chemical functions should enable, as much as possible, the identification of safer alternatives to problematic chemicals and material technologies. Functional use is valuable as an organizing principle for chemical selection, assessment, and substitution. For instance, identifying technically feasible alternatives to replace a chemical of concern is much easier if candidate substances can be filtered by functional properties.
However, this is a complex and challenging requirement because of the many meanings of “function” and “use.” In recent work, Tickner et al. have described three levels of functional substitution: chemical function (at the level of molecular properties), end-use function (relating to the role of a substance within a product or process), and function as service (relating to the higher, system-level goal served by a substance).  They argue that chemical assessment practices should involve “functional substitution” that takes into account these three different scopes for design and decision-making related to function. Existing classification systems for chemical use, however, have not been developed with this perspective, making further work necessary to arrive at a good way of systematically describing functional use.
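As a rough illustration of filtering by function, the sketch below records a description at each of the three levels named by Tickner et al. and selects candidates by any one of them. The field names, the substance identifiers, and the function labels are invented for this example and are not drawn from any existing classification system.

```python
from dataclasses import dataclass

# The three levels of function described by Tickner et al.
LEVELS = ("chemical_function", "end_use_function", "service")


@dataclass(frozen=True)
class FunctionalUse:
    """Hypothetical description of one use of one substance."""
    substance_id: str
    chemical_function: str  # molecular-level property
    end_use_function: str   # role within a product or process
    service: str            # system-level goal served


def find_alternatives(uses, level, value, exclude=()):
    """List substances sharing a function at the chosen level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    matches = {u.substance_id for u in uses
               if getattr(u, level) == value and u.substance_id not in exclude}
    return sorted(matches)


# Illustrative labels only.
uses = [
    FunctionalUse("subst:A", "solvent", "coating removal", "surface renewal"),
    FunctionalUse("subst:B", "solvent", "coating removal", "surface renewal"),
    FunctionalUse("subst:C", "abrasive", "coating removal", "surface renewal"),
    FunctionalUse("subst:D", "solvent", "degreasing", "clean parts"),
]
print(find_alternatives(uses, "chemical_function", "solvent", exclude=("subst:A",)))
# ['subst:B', 'subst:D']
print(find_alternatives(uses, "end_use_function", "coating removal", exclude=("subst:A",)))
# ['subst:B', 'subst:C']
```

The two queries return different candidate sets for the same chemical of concern, which is why the choice of functional level matters for substitution.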
Resources must be organized enough to associate data with the hazard properties that are relevant to practitioners. There is no single, universally agreed-upon classification system for hazards; rather, the ontological basis for the description and classification of data resources is provided by hazard assessment frameworks. How many different hazard classification systems the organizing system should support (if more than one) depends on user requirements and implementation constraints. See further discussion of this point under frameworks and tools.
Frameworks and tools
Assessment frameworks and tools should be organized and described enough for their roles in the production of assessments to be transparent. A recent review of 20 alternatives assessment frameworks by Jacobs et al. revealed core differences between frameworks in their scope, depth, priorities, technical features (such as criteria), and conceptual models.  For example, different frameworks may use different hazard categories, metrics, and criteria, or may assign different roles to hazard, exposure, and risk in decision logics. Different frameworks could be used to assess the same substances using identical bodies of supporting data, and may produce diverging or incommensurate results (although Jacobs et al. did not find any studies comparing the results of applying assessment frameworks). Therefore, frameworks should be represented and described in the organizing system to distinguish among potentially contradictory but internally consistent uses of other resources.
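The point about diverging results can be illustrated with a toy example in which two frameworks interpret the same datum using different cutoffs. Both frameworks and all of their thresholds are invented for this sketch. They do not reproduce GreenScreen, the GHS, or any other real classification scheme.

```python
def classify(value, cutoffs, default):
    """Return the label of the first cutoff that the value does not exceed."""
    for upper_bound, label in cutoffs:
        if value <= upper_bound:
            return label
    return default


# Invented cutoffs for a toxicity value where lower means more hazardous.
FRAMEWORKS = {
    "framework_a": ([(1.0, "high"), (100.0, "moderate")], "low"),
    "framework_b": ([(10.0, "high")], "low"),
}


def assess(value):
    """Interpret one datum under every registered framework."""
    return {name: classify(value, cutoffs, default)
            for name, (cutoffs, default) in FRAMEWORKS.items()}


print(assess(5.0))
# {'framework_a': 'moderate', 'framework_b': 'high'}
```

Recording which framework produced each classification is what keeps the two results distinguishable, since neither is wrong on its own terms.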
As with hazard properties, it is an open question how much flexibility is possible and desirable with respect to assessment frameworks. Organizing resources in a way that is agnostic to the content of assessment frameworks trades greater flexibility for increased heterogeneity and complexity. On the other hand, assuming prescribed limits on the possible arrangements (and interpretations) of resources makes organizing and retrieving information much simpler, but limits the range of possible work (and viewpoints) that the system can facilitate.
The system should clearly express the relationships of assessments to other resources: to underlying data, as well as to overarching methods and frameworks as discussed above.
When are the resources organized?
Organizing would take place during the creation of resources, during the integration of datasets, and afterwards in an ongoing process of curation. The system would need to have standard protocols for adding new chemical substances. Datasets and assessment reports would likewise need to have protocols for creation, modification, and description of authorship and provenance. Lastly, there would need to be some form of version control; datasets, reports, and even hazard assessment frameworks are expected to change over time, and these changes should be tracked systematically.
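One simple way to track such changes is an append-only revision history, sketched below. The class and field names are hypothetical. The content hash is used here only to detect that nothing changed between two commits, and a real system might instead build on an existing version control tool.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    number: int
    author: str
    note: str
    digest: str   # SHA-256 of the canonical JSON snapshot
    content: str  # the snapshot itself, stored as JSON text


class VersionedResource:
    """Append-only history for a dataset, report, or framework (a sketch)."""

    def __init__(self, resource_id):
        self.resource_id = resource_id
        self.revisions = []

    def commit(self, content, author, note):
        # Sorting keys makes the snapshot independent of dictionary order.
        snapshot = json.dumps(content, sort_keys=True)
        digest = hashlib.sha256(snapshot.encode("utf-8")).hexdigest()
        if self.revisions and self.revisions[-1].digest == digest:
            return self.revisions[-1]  # identical content: no new revision
        revision = Revision(len(self.revisions) + 1, author, note, digest, snapshot)
        self.revisions.append(revision)
        return revision

    def content_at(self, number):
        """Return the content as it stood at a given revision number."""
        return json.loads(self.revisions[number - 1].content)


report = VersionedResource("assessment:0001")
report.commit({"subject": "subst:0001", "status": "draft"}, "assessor-1", "initial draft")
report.commit({"status": "draft", "subject": "subst:0001"}, "assessor-1", "no change")
report.commit({"subject": "subst:0001", "status": "final"}, "assessor-2", "reviewed")
print(len(report.revisions))           # 2
print(report.content_at(1)["status"])  # draft
```

Timestamps are left out to keep the sketch deterministic. A real protocol would also record when each revision was made.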
How, by whom, by what processes?
This deals with questions of governance of the organizing system (a “meta” level) and infrastructure (a “whatever is the opposite of meta” level).
Agency and authority
An important remaining question is: who does the organizing? In an ideal system, organizing would be done collaboratively by a subset of the users who belong to the community of interest, and who act as accountable agents participating in a commons-based project. Due to the potential conflicts of interest surrounding chemical toxicity data, membership in the active peer-group would need to be consciously managed. The organizing system would benefit from governance under the auspices of a formally constituted organization, such as an NGO, research institute, university, or agency. Users of the organizing system would be situated in the for-profit private sector as well as the public sector, but managing the system itself would consume rather than generate money. Institutional support in the form of infrastructure, staff, and recognized credibility would be a bonus.
Given the uncertainty and ignorance that plague the development of environmentally benign technologies, it is reasonable to expect that this system would at least aspire to grow in scale over time (the more data, the better), but this would probably only be possible if it accumulates a “critical mass” of users, organizational robustness, and financial backing. Design and implementation would have to involve planning for future growth in the scale and user population of the system.
In order to have the most impact on technological transitions to safer chemicals and materials, the boundaries of the organizing system should be managed in a way that facilitates interoperation with other organizing systems in the domains of chemistry and sustainable production. One approach is to expose resources and resource descriptions, as much as possible, in standardized machine-readable formats. This way, information can be provided to other systems—for example, systems geared toward the assessment of materials and products that contain chemicals.
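As a minimal sketch of exposing a resource in machine-readable form, the example below serializes a substance record to JSON and reads it back as a receiving system would. The field names and the record layout are assumptions. An actual implementation would more likely adopt a shared vocabulary agreed with the systems it needs to interoperate with.

```python
import json


def export_substance(substance_id, structure, aliases, hazard_data):
    """Serialize one substance record to JSON text (hypothetical layout)."""
    record = {
        "id": substance_id,
        "structure": structure,
        "aliases": sorted(aliases),  # sorted so the output is reproducible
        "hazard_data": hazard_data,
    }
    return json.dumps(record, sort_keys=True, indent=2, ensure_ascii=False)


document = export_substance(
    "subst:0001",
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",  # ethanol
    {"ethanol", "ethyl alcohol"},
    [{"endpoint": "flammability",
      "value": "placeholder result",          # not a real measurement
      "references": ["placeholder citation"]}],
)

# A receiving system only needs a JSON parser to consume the record.
received = json.loads(document)
print(received["aliases"])  # ['ethanol', 'ethyl alcohol']
```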
This approach follows one of the central principles of open data. Open data efforts have been successful in biological and ecological informatics, and a case has been made for semantic-web-style implementations in chemical informatics. 
Whatever the particular technological implementation, there should be open access to knowledge resources through this system, and those resources should be provided with licenses that enable re-use. These are the remaining principles of open data. The title of this working paper begins with “open,” but rather than simply accepting these principles as dogma, I argue that they make sense for this domain of knowledge.
Overall, the lack of dissemination of health-relevant chemical information and ‘know-how’ hinders the green chemistry innovation system and slows the adoption of safer technologies.  For example, the privatization and protection of knowledge keeps the costs of chemical assessment high, by requiring companies to re-produce knowledge that could have been simply re-used. The immediate beneficiaries of this information asymmetry are the companies whose hazardous products remain competitive in a flawed market. Intellectual property rights in this context seem more like an unfair advantage: for example, United States chemicals policy has structural and implementation-level biases that overwhelmingly favor the protection of information claimed confidential by the chemical industry over the public disclosure of information relevant to chemical safety.  
As to broader societal goals, open data isn’t a panacea and doesn’t necessarily create justice or empowerment. But under the social, political, and technical circumstances, constraining the circulation of chemical knowledge almost certainly disempowers civil society—which, after all, bears the health risks of toxic substances.
Geiser, K. (2015). Chemicals without harm: policies for a sustainable world. Cambridge, Massachusetts: The MIT Press.
McDonough, W., & Braungart, M. (2002). Cradle to cradle: remaking the way we make things. New York: North Point Press.
Anastas, P. T., & Warner, J. C. (1998). Green chemistry: theory and practice. Oxford; New York: Oxford University Press.
Glushko, R. J. (2013). The discipline of organizing. Cambridge, Mass.: The MIT Press.
National Research Council (US). (2014). A framework to guide selection of chemical alternatives. Washington, D.C.: The National Academies Press.
Organisation for Economic Co-operation and Development. (n.d.). OECD Substitution and Alternatives Assessment Toolbox.
Judson, R., Richard, A., Dix, D. J., Houck, K., Martin, M., Kavlock, R., … Smith, E. (2009). The Toxicity Data Landscape for Environmental Chemicals. Environmental Health Perspectives, 117(5), 685–695. doi:10.1289/ehp.0800168
Wilson, M. P., & Schwarzman, M. R. (2009). Toward a New U.S. Chemicals Policy: Rebuilding the Foundation to Advance New Science, Green Chemistry and Environmental Health. Environmental Health Perspectives, 117(8), 1202–1209. doi:10.1289/ehp.0800404
Massey, R. (2008). Sharing knowledge about chemicals: policy options for facilitating information flow. In J. A. Tickner, Y. Torrie, M. Coffin, & M. L. Dunn (Eds.), Options for state chemicals policy reform: a resource guide (pp. 69–95). Lowell, MA: Lowell Center for Sustainable Production.
Fransson, K., & Molander, S. (2013). Handling chemical risk information in international textile supply chains. Journal of Environmental Planning and Management, 56(3), 345–361. doi:10.1080/09640568.2012.681032
Scruggs, C. E., Ortolano, L., Schwarzman, M. R., & Wilson, M. P. (2014). The role of chemical policy in improving supply chain knowledge and product safety. Journal of Environmental Studies and Sciences, 4(2), 132–141. doi:10.1007/s13412-013-0158-4
Jasanoff, S. (1990). The fifth branch: science advisers as policymakers. Cambridge, Mass.: Harvard University Press.
Sarewitz, D. (2004). How science makes environmental controversies worse. Environmental Science & Policy, 7(5), 385–403. doi:10.1016/j.envsci.2004.06.001
Scruggs, C. E., & Ortolano, L. (2011). Creating safer consumer products: the information challenges companies face. Environmental Science & Policy, 14(6), 605–614. doi:10.1016/j.envsci.2011.05.010
Geiser, K. (2014, January 13). The New Chemicals in Products Information Transparency.
Rossi, M., Tickner, J., & Geiser, K. (2006). Alternatives Assessment Framework of the Lowell Center for Sustainable Production. Lowell, MA: Lowell Center for Sustainable Production.
Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D., & Pletnev, I. (2013). InChI - the worldwide chemical structure identifier standard. Journal of Cheminformatics, 5(1), 7. doi:10.1186/1758-2946-5-7
Tickner, J. A., Schifano, J. N., Blake, A., Rudisill, C., & Mulvihill, M. J. (2015). Advancing Safer Alternatives Through Functional Substitution. Environmental Science & Technology, 49(2), 742–749. doi:10.1021/es503328m
Jacobs, M. M., Malloy, T. F., Tickner, J. A., & Edwards, S. (2016). Alternatives Assessment Frameworks: Research Needs for the Informed Substitution of Hazardous Chemicals. Environmental Health Perspectives, 124(3). doi:10.1289/ehp.1409581
Murray-Rust, P., Adams, S. E., Downing, J., Townsend, J. A., & Zhang, Y. (2011). The semantic architecture of the World-Wide Molecular Matrix (WWMM). Journal of Cheminformatics, 3(1), 42. doi:10.1186/1758-2946-3-42
Matus, K. J. M., Clark, W. C., Anastas, P. T., & Zimmerman, J. B. (2012). Barriers to the Implementation of Green Chemistry in the United States. Environmental Science & Technology, 46(20), 10892–10899. doi:10.1021/es3021777
Denison, R. A. (2010, February 10). Worse than we thought: Decades of out-of-control CBI claims under TSCA.