+++
Assessing Toxicity of Chemicals—Introduction
++
To assess the toxicity of chemicals, information from four types of studies is used: structure–activity relationships (SAR), in vitro or short-term studies, in vivo animal bioassays, and human epidemiologic studies. In many cases, toxicity information for chemicals is limited. For example, in 1998, the EPA evaluated high production volume (HPV) chemicals (those produced in excess of one million lb per year) to ascertain the availability of chemical hazard data. The study found that for 43% of these HPV chemicals, there were no publicly available studies for any of the basic toxicity end points (US EPA, 1998a). In response, EPA established a voluntary program called the HPV Challenge Program (US EPA, 2011c). Industry participants in this program committed to filling the data gaps for HPV chemicals, and since its inception over 2200 chemicals have been sponsored for additional testing. International efforts such as the Organization for Economic Cooperation and Development’s screening information data set (OECD/SIDS) program are also addressing data needs for HPV chemicals, highlighted by the publication of the Manual for Investigation of HPV Chemicals (OECD, 2011).
++
Data requirements for specific chemicals can vary greatly by compound type and by the applicable regulatory statutes. The European Union’s regulatory framework for the Registration, Evaluation and Authorisation of Chemicals (REACH), introduced in 2003 and approved in 2006 (REACH, 2011), requires that, since 2007, all stakeholders submit dossiers that include physical, chemical, and toxicological data as well as risk assessment studies for all chemicals in use in Europe. These chemical safety assessments are submitted prior to approval, an approach similar to the premanufacturing notices (PMNs) submitted to the US EPA in the United States (US EPA, 2011d). Table 4-3 shows requirements and costs for one example class of agents, pesticides (Stevens, 1997; US EPA, 1998b; EPA, 2000), in the United States (40 CFR 158.340). It also illustrates current international efforts to align these testing guidelines by listing examples of the harmonized 870 test guidelines (US EPA, 2010a). The emphasis in REACH is on alternatives to in vivo animal tests. REACH has also brought about new international labeling laws for hazard identification.
++
+++
Assessing Toxicity of Chemicals—Approaches
+++
Structure–Activity Relationships
++
Given the cost of two to four million dollars and the three to five years required to test a single chemical in a lifetime rodent carcinogenicity bioassay, initial decisions on whether to continue development of a chemical, submit a PMN, or require additional testing may be based largely on results from SARs and limited short-term assays.
++
A chemical’s structure, solubility, stability, pH sensitivity, electrophilicity, volatility, and chemical reactivity can be important information for hazard identification. Historically, certain key molecular structures have provided regulators with some of the most readily available information on which to assess hazard potential. For example, 8 of the first 14 occupational carcinogens were regulated together by the Occupational Safety and Health Administration (OSHA) as belonging to the aromatic amine chemical class. The EPA Office of Toxic Substances relies on SARs to meet deadlines for responding to PMNs for new chemical manufacture under the Toxic Substances Control Act (TSCA). Structural alerts such as N-nitroso or aromatic amine groups, amino azo dye structures, or phenanthrene nuclei are clues used to prioritize chemicals for additional evaluation as potential carcinogens. Developing SAR information for specific noncancer health end points can be challenging. The database of known developmental toxicants limits SARs to a few chemical classes, including chemicals with structures related to those of valproic acid, retinoic acid, phthalate esters, and glycol ethers (NRC, 2000). More recently, omics technologies have been used to supplement SAR databases, as seen with the creation of the National Cancer Institute’s (NCI) gene expression database (http://dtp.nci.nih.gov/) and the US EPA Computational Toxicology Program Screening Database (ToxCast™) (US EPA ACToR, 2011).
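++
As a rough illustration of how structural alerts might be screened computationally, the sketch below uses the open-source RDKit cheminformatics package to flag a few substructures in a SMILES string. The alert names and SMARTS patterns are illustrative examples chosen for this sketch, not a validated regulatory alert set.

```python
# Sketch: flag candidate structural alerts in a SMILES string with RDKit.
# The alert list below is illustrative only, not a validated regulatory set.
from rdkit import Chem

ALERTS = {
    "N-nitroso group": "[NX3][NX2]=O",
    "primary aromatic amine": "c[NX3;H2]",
    "aromatic azo linkage": "c[NX2]=[NX2]c",
}

def flag_structural_alerts(smiles):
    """Return the names of alert substructures found in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Could not parse SMILES: %s" % smiles)
    hits = []
    for name, smarts in ALERTS.items():
        pattern = Chem.MolFromSmarts(smarts)
        if pattern is not None and mol.HasSubstructMatch(pattern):
            hits.append(name)
    return hits

# 4-Aminobiphenyl, a classic aromatic amine carcinogen
print(flag_structural_alerts("Nc1ccc(-c2ccccc2)cc1"))  # ['primary aromatic amine']
```

In practice, a hit on such a pattern would only flag a chemical for further evaluation, in keeping with the prioritization role of structural alerts described above.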
++
SARs have also been used for assessment of complex mixtures of structurally related compounds. A prominent application has been the assessment of risks associated with 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and related chlorinated and brominated dibenzo-p-dioxins, dibenzofurans, and planar biphenyls, chemicals generally present as mixtures in the environment. Toxicity equivalence factors (TEFs) are used to evaluate health risks associated with closely related chemicals; for the TCDD class, this is based on a common mechanism of action mediated by the aryl hydrocarbon (Ah) receptor (US EPA, 1994a). The estimated toxicity of environmental mixtures containing these chemicals is calculated as the sum of the products of each chemical’s concentration and its TEF. The World Health Organization has organized efforts to reach international consensus on the TEFs used for polychlorinated biphenyls (PCBs), polychlorinated dibenzo-p-dioxins (PCDDs), and polychlorinated dibenzofurans (PCDFs) for both humans and wildlife, and has updated its values and published the supporting database (Van den Berg et al., 1998, 2006; Haws et al., 2006). Under the auspices of WHO, the dioxin-like PCB congeners have been assigned TEFs reflecting their toxicity relative to TCDD, which itself is assigned a TEF of 1.0.
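++
A minimal sketch of this toxicity equivalence (TEQ) calculation is shown below. The TEF values are the WHO 2005 consensus values for the listed congeners (Van den Berg et al., 2006); the mixture concentrations are hypothetical and purely illustrative.

```python
# Sketch: TEQ of a mixture = sum over congeners of (concentration x TEF).
WHO_2005_TEFS = {
    "2,3,7,8-TCDD": 1.0,
    "1,2,3,7,8-PeCDD": 1.0,
    "2,3,7,8-TCDF": 0.1,
    "PCB-126": 0.1,
}

def total_teq(concentrations, tefs):
    """Toxicity equivalents, in the same units as the supplied concentrations."""
    return sum(conc * tefs[congener] for congener, conc in concentrations.items())

# Hypothetical mixture concentrations (pg/g)
mixture = {"2,3,7,8-TCDD": 0.5, "2,3,7,8-TCDF": 3.0, "PCB-126": 10.0}
print(round(total_teq(mixture, WHO_2005_TEFS), 3))  # 0.5*1.0 + 3.0*0.1 + 10.0*0.1 = 1.8
```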
++
Computerized SAR methods have, in general, given disappointing results in the National Toxicology Program (NTP) rodent carcinogenicity prediction challenges (Ashby and Tennant, 1994; Omenn et al., 1995; Benigni and Zito, 2004). Greater success has been achieved by pharmaceutical companies using combinatorial chemistry and 3-dimensional (3D) molecular modeling approaches to design ligands (new drugs) that can sterically fit into the “receptors of interest.” However, for environmental pollutants, where selective binding to specific receptors is rare, these applications of SAR have had limited success within risk assessment. A renewed interest in quantitative SAR (QSAR) approaches has resulted from the need to evaluate nano-engineered materials, where the avalanche of novel, untested products has highlighted the necessity of QSAR-based screening (Maynard et al., 2006; Liu et al., 2011).
++
Efforts within REACH have also emphasized the potential for use of SAR as similar chemicals are collectively evaluated using a concept of “read-across.” Substances whose physicochemical, toxicological, and ecotoxicological properties are similar can be grouped as a “category” of substances when they have a common functional group, common precursor or breakdown product, or a common pattern of potency.
+++
In Vitro and Short-Term Tests
++
The next level of biological information obtained within the hazard identification process includes assessment of the test chemical in in vitro or short-term tests, ranging from bacterial mutation assays performed entirely in vitro to more elaborate short-term tests, such as skin painting studies in mice or altered rat liver foci assays conducted in vivo. For example, EPA mutagenicity guidelines call for assessment of reverse mutations using the Ames Salmonella typhimurium assay; forward mutations using mammalian cells, mouse lymphoma L5178Y, Chinese hamster ovary, or Chinese hamster lung fibroblasts; and in vivo cytogenetics assessment (bone marrow metaphase analysis or micronucleus tests) (US EPA, 2005). Chap. 8 discusses uses of these assays for identifying chemical carcinogens and Chap. 9 describes in detail various assays of genetic and mutagenic end points. Other assays evaluate specific health end points such as developmental toxicity (Faustman, 1988; Whittaker and Faustman, 1994; Brown et al., 1995; Lewandowski et al., 2000; NRC, 2000; Spielmann et al., 2006), reproductive toxicity (Gray, 1988; Harris et al., 1992; Shelby et al., 1993; Yu et al., 2009), neurotoxicity (Atterwill et al., 1992; Costa, 2000), and immunotoxicity (ICCVAM, 1999, 2011a) (Chap. 12). Less information is available on the extrapolation of these test results for noncancer risk assessment than for the mutagenicity or carcinogenicity end points; however, mechanistic information obtained in these systems has been applied to risk assessment (Abbott et al., 1992; US EPA, 1994b; Leroux et al., 1996; NRC, 2000).
++
Overall, progress in developing and validating new in vitro assays has been slow and frustrating. The Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) of NTP, established in response to Public Law 103-43, reinvigorated the validation process in the United States and coordinates cross-agency issues relating to development, validation, acceptance, and national/international harmonization of toxicological test methods for use in risk assessments. The committee has put forth recommendations for over 40 alternative safety testing methods using various short-term/in vitro assays, such as the cell-free corrosivity test and the mouse local lymph node assay for assessing a chemical’s potential to elicit allergic reactions (ICCVAM, 1999, 2011a; NIEHS, 1999a,b). In 2006 the committee released a document that extensively reviews in vitro acute toxicity methods (National Toxicology Program Center for the Evaluation of Alternative Toxicological Methods (US) and Interagency Coordinating Committee on the Validation of Alternative Methods (US), 2006). The European Centre for the Validation of Alternative Methods (ECVAM) has been very active given the visibility of animal rights issues in the European Union; the Center was originally formed to “support the development, validation, and acceptance of methods that could reduce, refine, or replace [3 Rs] the use of laboratory animals” (http://ihcp.jrc.ec.europa.eu/our_labs/eurl-ecvam). Early successes of ECVAM are described in Hartung et al. (2003). However, a recent report for the Center for Alternatives to Animal Testing highlights the slow progress toward full replacement of animal testing, especially for repeated-dose toxicity, carcinogenicity, and reproductive toxicity testing (Adler et al., 2011; Hartung et al., 2011), although the report does note numerous successes in evaluating sensitization and toxicokinetics (TK). International efforts to reduce the number of animals required for chemical safety testing have been established among ICCVAM, Korea, Japan, Canada, and ECVAM (http://ntp.niehs.nih.gov/pubhealth/evalatm/iccvam/international-partnerships/index.html).
++
The validation and application of short-term assays are particularly important to risk assessment because such assays can be designed to provide information about mechanisms of effects and, moreover, are fast and inexpensive compared with lifetime bioassays (McGregor et al., 1999). Validation of in vitro assays, like other kinds of tests, requires determination of their sensitivity (eg, ability to identify true carcinogens), specificity (eg, ability to recognize noncarcinogens as noncarcinogens), and predictive value for the toxic end point under evaluation. The societal costs of relying on such tests, with false positives (noncarcinogens classified as carcinogens) and false negatives (true carcinogens not detected), are the subject of a value-of-information model for testing in risk assessment and risk management (Lave and Omenn, 1986; Omenn and Lampen, 1988).
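++
To make these validation metrics concrete, the sketch below computes sensitivity, specificity, and predictive values from a 2x2 comparison of short-term test calls against reference bioassay classifications. The counts are hypothetical and purely illustrative.

```python
# Sketch: validation metrics for a short-term assay versus reference calls.
def validation_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and positive/negative predictive values."""
    sensitivity = tp / (tp + fn)  # fraction of true carcinogens detected
    specificity = tn / (tn + fp)  # fraction of noncarcinogens correctly called
    ppv = tp / (tp + fp)          # probability a positive call is a true carcinogen
    npv = tn / (tn + fn)          # probability a negative call is a true noncarcinogen
    return sensitivity, specificity, ppv, npv

# Hypothetical validation set: 60 reference carcinogens, 40 noncarcinogens
print(validation_metrics(tp=48, fp=10, fn=12, tn=30))
# approximately (0.80, 0.75, 0.83, 0.71)
```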
++
Efforts to improve our ability to utilize short-term tests for carcinogenicity prediction include increased attention to improving the mechanistic basis of short-term testing. Examples of this approach include the development and application of several knockout and transgenic mouse models as shorter-term in vivo assays to identify carcinogens (Nebert and Duffy, 1997; Tennant et al., 1999). Specific assays for evaluating mechanisms, such as the function of the estrogen receptor, are an example where in vitro test methods are used to predict endocrine disruptor actions (ICCVAM, 2011b, 2012). The primary use of short-term tests in the United States continues to be for mechanistic evaluations, with the hope of providing chemical-specific information or identifying an overall mode of action (MOA). In that context, results from short-term assays have influenced risk assessments. For example, evidence of nonmutagenicity in both in vitro and in vivo short-term assays plays an essential role, allowing regulators to consider nonlinear cancer risk assessment paradigms for nongenotoxic carcinogens (US EPA, 1999a). Mechanistic information from short-term in vitro assays can also be used to extend the range of biological observations available for dose–response assessment. In addition, for developmental toxicity assessment, assay methods that acknowledge the highly conserved nature of developmental pathways across species have accelerated the use of a broader range of model organisms and assay approaches for noncancer risk assessments (NRC, 2000). Toxicity Testing in the 21st Century and Toxicity Pathway-Based Risk Assessment emphasized such approaches for using model organisms and tiered toxicity testing strategies in risk assessment (NRC, 2007a, 2010).
+++
Animal Bioassays
++
Animal bioassays are a key component of the hazard identification process. A basic premise of risk assessment is that chemicals that cause tumors in animals can cause tumors in humans. All human carcinogens that have been adequately tested in animals produce positive results in at least one animal model. Thus, “although this association cannot establish that all chemicals and mixtures that cause cancer in experimental animals also cause cancer in humans, nevertheless, in the absence of adequate data on humans, it is biologically plausible and prudent to regard chemicals and mixtures for which there is sufficient evidence of carcinogenicity in experimental animals as if they presented a carcinogenic risk to humans” (IARC, 2000)—a reflection of the “precautionary principle.” The US EPA cancer guidelines (US EPA, 2005) also assume relevance of animal bioassays unless lack of relevance for human assessment is specifically determined. In general, the most appropriate rodent bioassays are those that test exposure and biological pathways of most relevance to predicted or known human exposure pathways. Bioassays for reproductive and developmental toxicity and other noncancer end points have a similar rationale. The NTP serves as a resource for designing, conducting, and evaluating bioassays for cancer as well as noncancer evaluation, and its Office of Health Assessment and Translation serves as a resource to the public and regulatory agencies regarding the interpretation and assessment of adverse effects of chemicals (National Toxicology Program, 2011b). The WHO International Agency for Research on Cancer (IARC) has evaluated over 900 agents, “of which more than 400 have been identified as carcinogenic, probably carcinogenic or possibly carcinogenic to humans” (WHO, 2011b).
++
Consistent features in the design of standard cancer bioassays include testing in two species and both sexes, with 50 animals per dose group and near-lifetime exposure. Important choices include the strains of rats and mice, the number of doses and the dose levels (typically 90%, 50%, and 10%–25% of the maximally tolerated dose [MTD]), and details of the required histopathology (number of organs to be examined, choice of interim sacrifice pathology, etc). The NTP Web site lists details on study designs and protocols (National Toxicology Program, 2011a). Positive evidence of chemical carcinogenicity can include increases in the number of tumors at a particular organ site, induction of rare tumors, earlier induction (shorter latency) of commonly observed tumors, and/or increases in the total number of observed tumors. Recently, NTP has added an in utero exposure period to the start of its cancer bioassays to more directly evaluate the significance of early life exposures for cancer incidence.
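++
As a rough illustration of how an increase in site-specific tumor incidence in a 50-animal dose group might be screened against concurrent controls, the sketch below applies a one-sided Fisher’s exact test from SciPy. The counts are hypothetical; actual NTP evaluations rely on more elaborate trend tests and historical control data.

```python
# Sketch: screen a tumor-incidence increase against concurrent controls with
# a one-sided Fisher's exact test (hypothetical counts, 50 animals per group).
from scipy.stats import fisher_exact

tumors_treated, n_treated = 10, 50   # high-dose group
tumors_control, n_control = 2, 50    # concurrent controls

table = [[tumors_treated, n_treated - tumors_treated],
         [tumors_control, n_control - tumors_control]]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print("one-sided p = %.3f" % p_value)  # about 0.014 for these hypothetical counts
```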
++
The cancer bioassay, originally designed for hazard identification, is frequently used to evaluate dose–response. The relatively limited number of evaluated doses and the use of high doses have caused issues for low-dose extrapolations and have limited the use of cancer bioassays as a “gold standard” for prediction of human carcinogenicity risk (McClain, 1994; Cohen, 1995; Risk Commission, 1997; Rodericks et al., 1997; Capen et al., 1999; Rice et al., 1999). First, tumors may be increased only at the highest dose tested, which is usually at or near a dose that causes systemic toxicity (Ames and Gold, 1990). Second, even without toxicity, the high dose may trigger different events than low-dose exposures do, and high doses can saturate important metabolism and elimination pathways.
++
Rats and mice give concordant positive or negative results in approximately 70% of bioassays, so it is unlikely that rodent/human concordance would be higher (Lave et al., 1988). Haseman and Lockhart (1993) concluded that most target sites in cancer bioassays showed a strong correlation (65%) between males and females—especially for forestomach, liver, and thyroid tumors—so they suggested, for efficiency, that bioassays could rely on a combination of male rats and female mice. Even when concordant positive results are observed, there can still be large differences in potency, as observed in aflatoxin-induced tumors in rats and mice. In this example, an almost 100,000-fold difference in susceptibility to aflatoxin B1 (AFB1)–induced liver tumors is seen between the sensitive rat and trout species and the more resistant mouse strains. Genetic differences in the expression of cytochrome P450 and glutathione S-transferases explain most of these species differences and suggest that humans may be as sensitive to AFB1-induced liver tumors as rats (Eaton and Gallagher, 1994; Eaton et al., 1995, 2001). These species differences have been supported by research results (Groopman and Kensler, 1999; Kensler et al., 2011) and have been extended within epidemiologic studies to demonstrate the interaction of hepatitis B infection with AFB1 exposure in explaining elevated human liver cancer risks.
++
Lifetime bioassays have been enhanced with the collection of additional mechanistic data and with the assessment of multiple noncancer end points. It is feasible and desirable to integrate such information together with data from mechanistically oriented short-term tests and biomarker and genetic studies in epidemiology (Perera and Weinstein, 2000). In the example of AFB1-induced liver tumors, AFB1–DNA adducts have proved to be an extremely useful biomarker. A highly linear relationship was observed between liver tumor incidence (in rats, mice, and trout) and AFB1–DNA adduct formation over a dose range of 5 orders of magnitude (Eaton and Gallagher, 1994). Such approaches may allow for an extension of biologically observable phenomena to doses lower than those leading to frank tumor development and help to address the issues of extrapolation over multiple orders of magnitude to predict response at environmentally relevant doses.
++
Table 4-4 presents some mechanistic details about rodent tumor responses that are no longer thought to be predictive of cancer risk for humans. This table lists examples of both qualitative and quantitative considerations useful for determining relevance of rodent tumor responses for human risk evaluations. An example of qualitative considerations is the male rat kidney tumors observed following exposure to chemicals that bind to α2u-globulin (eg, unleaded gasoline, 1,4-dichlorobenzene, d-limonene). α2u-Globulin is a male-rat-specific low-molecular-weight protein not found in female rats, humans, or other species, including mice and monkeys (McClain, 1994; Neumann and Olin, 1995; Oberdorster, 1995; Omenn et al., 1995; Risk Commission, 1997; Rodericks et al., 1997).
++
Table 4-4 also illustrates quantitative considerations important for determining human relevance of animal bioassay information. For example, doses of compounds high enough to exceed solubility in the urinary tract outflow can cause crystal precipitation, local irritation, and hyperplasia, leading to tumors of the urinary bladder in male rats. Such precipitates are known to occur following saccharin or nitrilotriacetic acid exposure (Cohen et al., 2000). The decision to exclude saccharin from the NTP list of suspected human carcinogens reaffirms the nonrelevance of such high-dose responses for likely human exposure considerations (Neumann and Olin, 1995; National Toxicology Program, 2005). A gross overloading of the particle clearance mechanism of rat lungs via directly administered particles, as was seen in titanium dioxide (TDO) exposures, resulted in EPA’s delisting of TDO as a reportable toxicant for the Toxics Release Inventory (US EPA, 1988; Oberdorster, 1995).
++
Other rodent responses not likely to be predictive for humans include localized forestomach tumors after gavage. Ethyl acrylate, which produces such tumors, was delisted on the basis of extensive mechanistic studies (National Toxicology Program, 2005). In general, for risk assessment, it is desirable to use the same route of administration as the likely exposure pathway in humans to avoid such extrapolation issues. Despite the example of forestomach tumors, tumors in unusual sites—such as the pituitary gland, the eighth cranial nerve, or the Zymbal gland—should not be immediately dismissed as irrelevant, since organ-to-organ correlation is often lacking (NRC, 1994). The EPA cancer guidelines provide a good list of considerations for evaluating relevance in the sections on evaluating weight of evidence (US EPA, 2005).
++
In an attempt to improve the prediction of cancer risk to humans, transgenic mouse models have been developed as possible alternatives to the standard 2-year cancer bioassay. These models use knockout or transgenic mice that eliminate or incorporate a gene that has been linked to human cancer. NTP evaluated some of these models and found the p53-deficient (p53+/− heterozygous) and Tg.AC (v-Ha-ras transgene) models to be particularly useful in identifying carcinogens and mechanisms of action (Bucher, 1998; Chhabra et al., 2003). The use of transgenic models has the power to improve the characterization of key cellular events and MOAs underlying toxicological responses (Mendoza et al., 2002; Gribble et al., 2005). However, these studies have been used primarily for mechanistic characterization rather than for hazard identification. Transgenic models have been shown to reduce cost and time as compared with the standard 2-year assay, but they have also been shown to be somewhat limited in their sensitivity (Cohen, 2001). As stated in the current EPA cancer guidelines, transgenic models should not be used to replace the standard 2-year assay, but they can be used in conjunction with other types of data to assist in the interpretation of additional toxicological and mechanistic evidence (US EPA, 2005). A series of genetically defined mice (fully sequenced genomes, 20× coverage), referred to as the Collaborative Cross, has been established to investigate genetic and environmental influences on toxicological response in mice (Chesler et al., 2008), and these strains should improve our understanding of mammalian genes that predispose to cancer (Koturbash et al., 2011).
+++
Use of Epidemiologic Data in Risk Assessment
++
The most convincing lines of evidence for human risk are well-conducted epidemiologic studies in which a positive association between exposure and disease has been observed (NRC, 1983). Environmental and occupational epidemiologic studies are frequently opportunistic. Studies begin either with known or presumed exposures, comparing exposed with nonexposed individuals, or with known cases, comparing them with persons lacking the particular diagnosis.
++
Table 4-5 shows examples of epidemiologic study designs and provides clues on the types of outcomes and exposures evaluated. Although such studies can be convincing, important limitations are inherent in epidemiologic studies. Robust exposure estimates are often difficult to obtain because they are frequently reconstructed retrospectively (eg, from job history records). Also, because many important health effects have long latency before clinical manifestations appear, identifying the relevant populations for follow-up can be challenging. Another challenge for interpretation is that there are often exposures to multiple chemicals, especially when a lifetime exposure period is considered. There is frequently a trade-off between detailed information on relatively few persons and very limited information on large numbers of persons. Contributions from lifestyle factors, such as smoking and diet, are important to assess because they can have a significant impact on cancer development. Human epidemiologic studies can provide both very useful information for hazard assessment and quantitative information for dose–response characterization. Good illustrations of epidemiologic studies and their interpretation for toxicological evaluation are available (Gill et al., 2011; Regalado et al., 2006).
++
Three types of epidemiologic study designs—cross-sectional studies, cohort studies, and case–control studies—are detailed in Table 4-5. Cross-sectional studies survey groups of humans to identify risk factors (exposure) and disease but are not useful for establishing cause and effect. Cohort studies evaluate individuals selected on the basis of their exposure to the chemical under study; based on exposure status, these individuals are monitored for development of disease. Such prospective studies follow over time individuals who initially are disease-free to determine the rates at which they develop disease. In case–control studies, subjects are selected on the basis of disease status: disease cases and matched disease-free controls. Exposure histories of the 2 groups are compared to identify key consistent features. All case–control studies are retrospective studies.
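++
The contrast between these analytic designs can be illustrated by the measures of association they typically yield: a cohort study supports a direct estimate of relative risk, whereas a case–control study yields an exposure odds ratio. The sketch below uses hypothetical counts purely for illustration.

```python
# Sketch: measures of association from the two main analytic designs.
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk ratio from a cohort study (incidence compared by exposure status)."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Exposure odds ratio from a case-control study."""
    return (cases_exposed / cases_unexposed) / (controls_exposed / controls_unexposed)

# Hypothetical cohort: 30/1000 exposed vs 10/1000 unexposed develop disease
print(round(relative_risk(30, 1000, 10, 1000), 2))   # 3.0
# Hypothetical case-control: 60 of 100 cases exposed vs 30 of 100 controls exposed
print(round(odds_ratio(60, 40, 30, 70), 2))          # 3.5
```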
++
In risk assessment, epidemiologic findings are judged by the following criteria: strength of association, consistency of observations (reproducibility in time and space), specificity (uniqueness in quality or quantity of response), appropriateness of temporal relationship (did the exposure precede responses?), dose–responsiveness, biological plausibility and coherence, verification, and analogy (biological extrapolation) (Hill, 1965; Faustman et al., 1997; World Health Organization and International Programme on Chemical Safety, 1999; Adami et al., 2011). These same criteria have been used for evaluating MOAs, where human and animal studies are considered in an integrated manner.
++
Epidemiologic study designs should also be evaluated for their power of detection, appropriateness of outcomes, verification of exposure assessments, completeness of assessment of confounding factors, and general applicability of the outcomes to other populations at risk. Power of detection is calculated using study size, variability, accepted detection limits for the end points under study, and a specified significance level (Healey, 1987; EGRET, 1994; Dean et al., 1995). Meta-analysis is used to combine results from different epidemiologic studies, weighting the results to account for differences in sample size across studies. The importance of human studies for risk assessment is illustrated by evaluations of arsenic and dioxin (US EPA, 2010b).
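++
A simple power-of-detection calculation along these lines is sketched below, using a normal approximation for a one-sided comparison of two proportions; the incidence rates and group size are hypothetical inputs chosen only to illustrate how study size and significance level enter the calculation.

```python
# Sketch: approximate power of a one-sided two-proportion comparison,
# given group size, baseline and elevated rates, and a significance level.
from statistics import NormalDist

def power_two_proportions(p0, p1, n_per_group, alpha=0.05):
    z_alpha = NormalDist().inv_cdf(1 - alpha)               # one-sided critical value
    p_bar = (p0 + p1) / 2                                    # pooled rate under H0
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5
    se_alt = (p0 * (1 - p0) / n_per_group + p1 * (1 - p1) / n_per_group) ** 0.5
    z = (abs(p1 - p0) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z)

# Power to detect a rise from 5% to 15% incidence with 100 subjects per group
print(round(power_two_proportions(0.05, 0.15, 100), 2))  # ~0.76
```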
++
Advances from the Human Genome Project have increased the sophistication of molecular biomarkers and have improved the mechanistic bases for epidemiologic hypotheses. This has allowed epidemiologists to get within the “black box” of statistical associations and advance our understanding of biological plausibility and clinical relevance. “Molecular epidemiology,” the integration of molecular biology into traditional epidemiologic research, is an important focus of human studies where improved molecular biomarkers of exposure, effect, and susceptibility have allowed investigators to more effectively link molecular events in the causal disease pathway. Epidemiologists can now include the contribution of potential genetic factors along with environmental risk factors in determining the etiology, distribution, and prevention of disease. Highlighting the potential power of adding genetic information to epidemiologic studies, the Human Genome Epidemiology (HuGE) Network was launched in 1998, providing a literature database of published, population-based epidemiologic studies of human genes (Khoury, 1999).
++
With the advance of genomics, the range of biomarkers has grown dramatically and includes identification of single-nucleotide polymorphisms (SNPs), genomic profiling, transcriptome analysis, and proteomic analysis (Simon and Wang, 2006). Implications of these improvements for risk assessment are tremendous, as they provide an improved biological basis for extrapolation across the diversity of human populations and allow for improved cross-species comparisons with rodent bioassay information because of evolutionarily conserved response pathways (NRC, 2000). In addition, genomics allows for “systems-based” understanding of disease and response, moving risk assessment away from a linear, single-event-based concept and improving the biological plausibility of epidemiologic associations (Toscano and Oehlke, 2005; NRC, 2010).
+++
Integrating Qualitative Aspects of Risk Assessment
++
Qualitative assessment of hazard information should include a consideration of the consistency and concordance of findings, including a determination of the consistency of the toxicological findings across species and target organs, an evaluation of consistency across duplicate experimental conditions, and a determination of the adequacy of the experiments to consistently detect the adverse end points of interest.
++
Qualitative assessment of animal or human evidence is done by many agencies, including the EPA and IARC. Similar evidence classifications have been used for both the animal and human evidence categories by both agencies. These evidence classifications have included levels of “sufficient, limited, inadequate, and no evidence” (US EPA, 1994b, 2005) or “evidence suggesting lack of carcinogenicity” (IARC, 2000). For both agencies, these classifications are used for an overall weight-of-evidence approach for carcinogenicity classification.
++
Weight of evidence is an integrative step used by the EPA to “characterize the extent to which the available data support the hypothesis that an agent causes cancer in humans” (US EPA, 2005). It is the process of “weighing” all of the evidence to reach a conclusion about carcinogenicity. With this approach, the likelihood of a human carcinogenic effect is evaluated along with the conditions under which such effects may be expressed. Weight of evidence can consider both the quality and quantity of data as well as any underlying assumptions. The evidence includes data from all of the hazard assessment and characterization studies, such as SAR data, in vivo and/or in vitro studies, and epidemiologic data. Using this type of information and weight-of-evidence approach, the EPA applies hazard descriptors to define carcinogenic potential and to provide a measure of clarity and consistency in the characterization narrative: “carcinogenic to humans,” “likely to be carcinogenic to humans,” “suggestive evidence of carcinogenic potential,” “inadequate information to assess carcinogenic potential,” and “not likely to be carcinogenic to humans.” In this section, approaches for evaluating cancer end points have been discussed; similar weight-of-evidence approaches have been proposed for reproductive risk assessment (refer to the sufficient and insufficient evidence categories in EPA’s guidelines for reproductive risk [US EPA, 1996a] and considerations by NTP).
++
The Institute for Evaluating Health Risks defined an “evaluation process” by which reproductive and developmental toxicity data can be consistently evaluated and integrated to ascertain their relevance for human health risk assessment (Moore et al., 1995; Faustman et al., 2011). This evaluation process has served as the basis for US EPA’s guidelines for developmental toxicity risk assessment (US EPA, 1991) and for NTP’s Office of Health Assessment and Translation (formerly CERHR) (National Toxicology Program, 2011b). Application of such carefully deliberated approaches for assessing noncancer end points has helped avoid the tendency to classify chemicals simply as yes or no (positive or negative) without human relevancy information.
++
The EPA has emphasized in their revised cancer guidelines the importance of using “weight of evidence” to arrive at insights to possible “MOA” (US EPA, 2005). MOA information describes key events and processes leading to molecular and functional effects that would in general explain the overall process of cancer development. In many cases these could be plausible hypothesized MOAs for specific toxicity end points, but the detailed mechanistic nuances of the pathway might not yet be fully known. EPA is using such MOA information to suggest non-default approaches for cancer risk assessments and for evaluating toxicity of compounds with common MOAs in cumulative risk assessments (US EPA, 1996a, 1998a).
++
Within the EPA’s carcinogenic risk assessment guidelines, the MOA framework considers evidence from animal studies, relevance to humans, and life stage or population susceptibility (US EPA, 2005). Chemical-specific adjustment factors for interspecies differences and human variability have been proposed and build upon guidance developed by the WHO’s International Programme on Chemical Safety Harmonization Project (WHO, 2000). Critical to the MOA development is the use of “criteria of causality” considerations, which build upon Hill criteria used in epidemiology (Hill, 1965; Faustman et al., 1996; US EPA, 1999a; Klaunig et al., 2003), and consider dose–response relationships and temporal associations, as well as the biological plausibility, coherence, strength, consistency, and specificity of the postulated MOA.