Two main principles underlie all descriptive animal toxicity testing. The first is that the effects produced by a compound in laboratory animals, when properly qualified, are applicable to humans. This premise applies to all of experimental biology and medicine. Most, if not all, known chemical carcinogens in humans are carcinogenic in some species, but not necessarily in all species of laboratory animals. It has become increasingly evident that the converse—that all chemicals identified as carcinogenic in laboratory animals are also carcinogenic in humans—is not true (Dybing and Sanner, 1999; Grisham, 1997; Hengstler et al., 1999). However, for regulatory and risk assessment purposes, positive carcinogenicity tests in animals are usually interpreted as indicative of potential human carcinogenicity. If a clear understanding of the mechanism of action of the carcinogen indicates that a positive response in animals is not relevant to humans, a positive animal bioassay may be considered irrelevant for human risk assessment (see Chap. 4). This species variation in carcinogenic response appears to be due in many instances to differences in biotransformation of the procarcinogen to the ultimate carcinogen (see Chap. 6).
The second principle is that exposure of experimental animals to chemicals in high doses is a necessary and valid method of discovering possible hazards in humans. This principle is based on the quantal dose–response concept that the incidence of an effect in a population is greater as the dose or exposure increases. Practical considerations in the design of experimental model systems require that the number of animals used in toxicology experiments always be small compared with the size of human populations at risk. Obtaining statistically valid results from such small groups of animals requires the use of relatively large doses so that the effect will occur frequently enough to be detected. However, the use of high doses can create problems in interpretation if the response(s) obtained at high doses does not occur at low doses. Thus, for example, it has been shown that bladder tumors observed in rats fed very high doses of saccharin will not occur at the much lower doses of saccharin encountered in the human diet. At the high concentrations fed to rats, saccharin forms an insoluble precipitate in the bladder that subsequently results in chronic irritation of bladder epithelium, enhanced cell proliferation, and ultimately bladder tumors (Cohen, 1998, 1999). In vitro studies have shown that precipitation of saccharin in human urine will not occur at the concentrations that could be obtained from even extraordinary consumption of this artificial sweetener. As noted above and shown in Fig. 2-8, even for mutagenic chemicals that form DNA adducts, the response at high doses, as seen for DBC, may not be linear at low doses, although for another DNA-reactive carcinogen, AFB1, the high-dose data were reflective of low-dose response in an approximately linear fashion. 
Examples such as these illustrate the importance of considering the molecular, biochemical, and cellular mechanisms responsible for toxicological responses when extrapolating from high to low dose and across species.
Toxicity tests are not designed to demonstrate that a chemical is safe but to characterize the toxic effects a chemical can produce. Although there are no set toxicology tests that have to be performed on every chemical intended for commerce, a tiered approach typical of many hazard assessment programs is illustrated in Fig. 2-13. Depending on the eventual use of the chemical, the toxic effects produced by structural analogs of the chemical, as well as the toxic effects produced by the chemical itself, contribute to the determination of the toxicology tests that should be performed. The FDA, EPA, and Organization for Economic Cooperation and Development (OECD) have written good laboratory practice (GLP) standards and other guidance that stipulate that procedures must be defined and accountability documented. These guidelines are expected to be followed when toxicity tests are conducted in support of the introduction of a chemical to the market.
Figure 2-13. Typical tiered testing scheme for the toxicological evaluation of new chemicals. (From Wilson et al., 2008, Fig. 19-1, p. 918.)
The following sections provide an overview of basic toxicity testing procedures in use today. For a detailed description of these tests, the reader is referred to several authoritative texts on this subject (Barile, 2010; Hayes, 2008; Jacobson-Kram and Keller, 2006; Eaton and Gallagher, 2010).
Although different countries have often had different testing requirements for toxicity testing/product safety evaluation, efforts to “harmonize” such testing protocols have resulted in more standardized approaches. The International Conference on Harmonization (ICH) of Technical Requirements for Registration of Pharmaceuticals for Human Use includes regulatory authorities from Europe, Japan, and the United States (primarily the FDA), as well as experts from the pharmaceutical industry in the 3 regions, who worked together to develop internationally recognized scientific and technical approaches to pharmaceutical product registration. ICH has adopted guidelines for most areas of toxicity testing (Table 2-3). In addition to safety assessment (ICH guidelines designated with an “S”), ICH has also established guidelines on quality (Q), efficacy (E), and multidisciplinary (M) topics. (See http://www.ich.org/products/guidelines.html for a description of current ICH guidelines, and the reviews by Pugsley et al. (2008, 2011) for a detailed discussion of in vitro and in vivo approaches to safety pharmacology that have been informed by the ICH regulatory guidance documents for preclinical safety testing of drugs.)
Table 2-3 International Conference on Harmonization (ICH) Codification of “Safety” Protocols

Carcinogenicity studies
  S1A      Need for Carcinogenicity Studies of Pharmaceuticals
  S1B      Testing for Carcinogenicity of Pharmaceuticals
  S1C(R1)  Dose Selection for Carcinogenicity Studies of Pharmaceuticals & Limit Dose
Genotoxicity studies
  S2A      Guidance on Specific Aspects of Regulatory Genotoxicity Tests for Pharmaceuticals
  S2B      Genotoxicity: A Standard Battery for Genotoxicity Testing of Pharmaceuticals
Toxicokinetics and pharmacokinetics
  S3A      Note for Guidance on Toxicokinetics: The Assessment of Systemic Exposure in Toxicity Studies
  S3B      Pharmacokinetics: Guidance for Repeated Dose Tissue Distribution Studies
Toxicity testing
           Single Dose Toxicity Tests
  S4       Duration of Chronic Toxicity Testing in Animals (Rodent and Non Rodent Toxicity Testing)
Reproductive toxicology
  S5(R2)   Detection of Toxicity to Reproduction for Medicinal Products & Toxicity to Male Fertility
Biotechnological products
  S6       Preclinical Safety Evaluation of Biotechnology-Derived Pharmaceuticals
Pharmacology studies
  S7A      Safety Pharmacology Studies for Human Pharmaceuticals
  S7B      The Non-Clinical Evaluation of the Potential for Delayed Ventricular Repolarization (QT Interval Prolongation) by Human Pharmaceuticals
Immunotoxicology studies
  S8       Immunotoxicity Studies for Human Pharmaceuticals
Joint safety/efficacy (multidisciplinary) topic
  M3(R1)   Non-Clinical Safety Studies for the Conduct of Human Clinical Trials for Pharmaceuticals
Typically, a tiered approach is used, with subsequent tests dependent on the results of initial studies. A general framework for how new chemicals are evaluated for toxicity is shown in Fig. 2-13. Early studies require careful chemical evaluation of the compound or mixture to assess purity, stability, solubility, and other physicochemical factors that could impact the ability of the test compound to be delivered effectively to animals. Once this information is obtained, the chemical structure of the test compound is compared with similar chemicals for which toxicological information is already available. Structure–activity relationships may be derived from a review of the existing toxicological literature and can provide additional guidance on the design of acute and repeated-dose experiments and on which specialized tests need to be completed. Once such basic information has been compiled and evaluated, the test compound is then administered to animals in acute and repeated-dose studies.
Because of increased societal pressure to reduce or eliminate the use of animals in toxicity testing, while also ensuring that new chemicals do not represent unreasonable risks to human health or the environment, regulatory agencies have been encouraging new approaches to descriptive toxicity tests that do not rely on laboratory animals. For example, the European Union (EU) promulgated an important regulatory initiative for the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH). The implementation of REACH “will have significant impact on applied toxicology and exposure assessment by stimulating innovation in sampling and analysis, toxicology testing, exposure modeling, alternative toxicity testing, and risk assessment practices” (Williams et al., 2009b). Alternative, in vitro approaches to toxicity assessment are likely to transform the way that product safety evaluation is done in the future, although the standard approaches to hazard evaluation described in this section are likely to remain the mainstay of toxicity evaluation for the next decade, even as some areas, such as acute toxicity testing and eye irritation, are largely replaced by in vitro tests (Ukelis et al., 2008).
The development of new “omics” technologies (discussed later in this section) may have profound implications for toxicity testing in the future (NAS/NRC, 2007). The recognition that many of the existing chemicals in commercial use today, as well as new chemicals being introduced into commerce, have little toxicological information about them has prompted calls for new “high-throughput” approaches to toxicity testing that will allow at least basic hazard characterization for the thousands of untested chemicals currently in the marketplace, as well as the many new chemicals that are introduced each year. A report from the National Academy of Sciences/National Research Council in 2007 called for a “paradigm shift” in how toxicity testing is done (NAS/NRC, 2007). A key component of this new vision on toxicity testing is the use of an extensive battery of in vitro tests to evaluate “pathways” of toxicity (NAS/NRC, 2010). The hope is that new technologies in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics (discussed later in this chapter) can be combined with automated high-throughput technologies to create a tiered structure for toxicity testing. The approach to using biochemical and molecular pathway-based analyses, rather than apical end points (eg, target organ damage, mutagenesis, carcinogenesis, reproductive and developmental effects), to identify potentially problematic chemicals early in their development is particularly attractive from a time frame and economic perspective (NAS/NRC, 2010). However, it is also recognized that validation of such tests is critically important to the reliable use of such screening technologies, and that the traditional in vivo studies described in the following section will continue to serve an important role in hazard evaluations for years to come, especially as a means of validating new high-throughput screening approaches.
Acute Toxicity Testing
Generally, the first toxicity test performed on a new chemical is acute toxicity, determined from the administration of a single exposure. The objectives of acute toxicity testing are to: (1) provide an estimate of the intrinsic toxicity of the substance, often expressed as an approximate lethal dose (eg, the LD50), (2) provide information on target organs and other clinical manifestations of toxicity, (3) identify species differences and susceptible species, (4) establish the reversibility of the toxic response, and (5) provide information that will assist in the design and dose selection for longer-term (subchronic, chronic) studies. It should be noted that the ICH recommended in 1991 (D’Arcy and Harron, 1992) the elimination of LD50 determinations for pharmaceuticals, although other regulatory requirements, for example, pesticide registration, may still require determinations of LD50s.
The LD50 and other acute toxic effects are determined after 1 or more routes of administration (1 route being oral or the intended route of exposure) in 1 or more species. The species most often used are the mouse and rat. Studies are performed in both adult male and female animals. Food is often withheld the night before dosing. The number of animals that die in a 14-day period after a single dosage is tabulated. In addition to mortality and weight, daily examination of test animals should be conducted for signs of intoxication, lethargy, behavioral modifications, morbidity, food consumption, and so on.
Determination of the LD50 has become a public issue because of increasing concern for the welfare and protection of laboratory animals. The LD50 is not a biological constant. Many factors influence toxicity and thus may alter the estimation of the LD50 in any particular study. Factors such as animal strain, age, and weight, type of feed, caging, pretrial fasting time, method of administration, volume and type of suspension medium, and duration of observation have all been shown to influence adverse responses to toxic substances. These and other factors have been discussed in detail in earlier editions of this textbook (Doull, 1980). Because of this inherent variability in LD50 estimates, it is now recognized that for most purposes it is only necessary to characterize the LD50 within an order of magnitude range such as 5 to 50 mg/kg, 50 to 500 mg/kg, and so on.
There are several traditional approaches to determining the LD50 and its 95% confidence limit as well as the slope of the probit line. The reader is referred to the classic works of Litchfield and Wilcoxon (1949), Bliss (1957), and Finney (1971) for a description of the mechanics of these procedures. Other statistical techniques that require fewer animals, such as the “moving averages” method of Thompson and Weil (Weil, 1952), are available but do not provide confidence limits for the LD50 and the slope of the probit line. Finney (1985) has succinctly summarized the advantages and deficiencies of many of the traditional methods. For most circumstances, an adequate estimate of the LD50 and an approximation of the 95% confidence intervals can be obtained with as few as 6 to 9 animals, using the “up-and-down” method as modified by Bruce (1985). When this method was compared with traditional methods that typically utilize 40 to 50 animals, excellent agreement was obtained for all 10 compounds tested (Bruce, 1987). In mice and rats the LD50 is usually determined as described above, but in the larger species only an approximation of the LD50 is obtained by increasing the dose in the same animal until serious toxic effects are evident.
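As a rough sketch of how an LD50 and a probit slope fall out of quantal mortality data, the function below regresses probit-transformed mortality on log10(dose) by ordinary least squares. This is a simplification for illustration only: Finney's probit analysis uses iteratively reweighted maximum likelihood, and groups with 0% or 100% mortality are simply skipped here. The function name and example data are hypothetical.

```python
import math
from statistics import NormalDist

def probit_ld50(doses, n_dead, n_dosed):
    """Estimate (LD50, probit slope) from quantal mortality data by
    unweighted least squares on probit-transformed response rates.
    Simplified sketch of probit analysis (Finney, 1971); the real
    procedure uses iteratively reweighted maximum likelihood.
    Groups with 0% or 100% mortality are skipped (probit undefined)."""
    nd = NormalDist()
    xs, ys = [], []
    for dose, dead, n in zip(doses, n_dead, n_dosed):
        if 0 < dead < n:
            xs.append(math.log10(dose))
            ys.append(nd.inv_cdf(dead / n))
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    # The LD50 is the dose at which the fitted probit crosses 0 (50% mortality)
    return 10 ** (-intercept / slope), slope

# Hypothetical 10-animals-per-group study: 20%, 50%, 80% mortality
ld50, slope = probit_ld50([10, 100, 1000], [2, 5, 8], [10, 10, 10])
```

With the symmetric mortality pattern above, the fitted LD50 falls at the middle dose, as expected.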
Alternative in vitro approaches to estimating the LD50 have been proposed. For example, the Registry of Cytotoxicity (RC), originally published in German in 1998 (Halle, 2003), was developed by linear regression analysis of the mean IC50 values determined in mammalian cells in culture and the LD50 values reported in the literature from various laboratory species. Using this approach, the authors predicted (within a reasonable dose range) the acute oral LD50 for 252 of 347 xenobiotics, and the intravenous LD50 for rats and/or mice for 117 of 150 xenobiotics (Halle, 2003). Of course, such in vitro approaches do not fully account for dispositional effects that could result in large species differences in acute toxicity, but do provide a rapid first approximation of acute toxicity without the use of experimental animals.
If there is a reasonable likelihood of substantial exposure to the material by dermal or inhalation exposure, acute dermal and acute inhalation studies are performed. When animals are exposed acutely to chemicals in the air they breathe or the water they (fish) live in, the dose the animals receive is usually not known. For these situations, the lethal concentration 50 (LC50) is usually determined, that is, the concentration of chemical in the air or water that causes death to 50% of the animals. In reporting an LC50, it is imperative that the time of exposure be indicated. The acute dermal toxicity test is usually performed in rabbits. The site of application is shaved. The test substance is kept in contact with the skin for 24 hours by wrapping the skin with an impervious plastic material. At the end of the exposure period, the wrapping is removed and the skin is wiped to remove any test substance still remaining. Animals are observed at various intervals for 14 days, and the LD50 is calculated. If no toxicity is evident at 2 g/kg, further acute dermal toxicity testing is usually not performed. Acute inhalation studies are performed similarly to other acute toxicity studies, except that the route of exposure is inhalation. Most often, the length of exposure is 4 hours.
By themselves LD50 and LC50 values are of limited significance given the growing sophistication of target organ toxicity end points and mechanistic analysis. The most meaningful scientific information derived from acute toxicity tests comes from clinical observations and post-mortem examination of animals rather than from the specific LD50 value.
The ability of a chemical to irritate the skin and eye after an acute exposure is usually determined in rabbits. For the dermal irritation test (Draize test), rabbits are prepared by removal of fur on a section of the back by electric clippers. The chemical is applied to the skin (0.5 mL of liquid or 0.5 g of solid) under 4 covered gauze patches (1 in square; 1 intact and 2 abraded skin sites on each animal) and usually kept in contact for 4 hours. The nature of the covering patches depends on whether occlusive, semiocclusive, or nonocclusive tests are desired. For occlusive testing, the test material is covered with an impervious plastic sheet; for semiocclusive tests, a gauze dressing may be used. Occasionally, studies may require that the material be applied to abraded skin. The degree of skin irritation is scored for erythema (redness), eschar (scab), and edema (swelling) formation, and corrosive action. These dermal irritation observations are repeated at various intervals after the covered patch has been removed. To determine the degree of ocular irritation, the chemical is instilled into 1 eye (0.1 mL of liquid or 100 mg of solid) of each test rabbit. The contralateral eye is used as the control. The eyes of the rabbits are then examined at various times after application.
Controversy over this test has led to the development of alternative in vitro models for evaluating cutaneous and ocular toxicity of substances. The various in vitro methods that have been evaluated for this purpose include epidermal keratinocyte and corneal epithelial cell culture models. Several commercially available “reconstructed human epidermis” models have been developed explicitly for the purposes of in vitro skin irritation and corrosion tests (Netzlaff et al., 2005).
Information about the potential of a chemical to sensitize skin is needed in addition to irritation testing for all materials that may repeatedly come into contact with the skin. Numerous procedures have been developed to determine the potential of substances to induce a sensitization reaction in humans (delayed hypersensitivity reaction), including the Draize test, the open epicutaneous test, the Buehler test, Freund’s complete adjuvant test, the optimization test, the split adjuvant test, and the guinea pig maximization test (Hayes et al., 2008; Rush et al., 1995). Although they differ with regard to route, frequency, and duration of administration, they all utilize the guinea pig as the preferred test species. In general, the test chemical is administered to the shaved skin topically, intradermally, or both and may include the use of adjuvant to enhance the sensitivity of the assay. Multiple administrations of the test substance are generally given over a period of 2 to 4 weeks. Depending on the specific protocol, the treated area may be occluded. Approximately 2 to 3 weeks after the last treatment, the animals are challenged with a nonirritating concentration of the test substance and the development of erythema is evaluated.
Subacute (Repeated-Dose Study)
Subacute toxicity tests are performed to obtain information on the toxicity of a chemical after repeated administration and as an aid to establish doses for subchronic studies. A typical protocol is to give 3 to 4 different dosages of the chemical to the animals by mixing it in their feed. For rats, 10 animals per sex per dose are often used; for dogs, 3 dosages and 3 to 4 animals per sex are used. Clinical chemistry and histopathology are performed after either 14 or 28 days of exposure, as described in the section “Subchronic.”
Subchronic
The toxicity of a chemical after subchronic exposure is then determined. Subchronic exposure can last for different periods of time, but 90 days is the most common test duration. The principal goals of the subchronic study are to establish a NOAEL and to further identify and characterize the specific organ or organs affected by the test compound after repeated administration. One may also obtain a “lowest observed adverse effect level” (LOAEL) as well as the NOAEL for the species tested. The numbers obtained for NOAEL and LOAEL will depend on how closely the dosages are spaced and the number of animals examined. Determinations of NOAELs and LOAELs have numerous regulatory implications. For example, the EPA utilizes the NOAEL to calculate the reference dose (RfD), which may be used to establish regulatory values for “acceptable” pollutant levels (Barnes and Dourson, 1988) (see Chap. 4). An alternative to the NOAEL approach referred to as the benchmark dose uses all the experimental data to fit 1 or more dose–response curves (Crump, 1984). These curves are then used to estimate a benchmark dose that is defined as “the statistical lower bound on a dose corresponding to a specified level of risk” (Allen et al., 1994a). Although subchronic studies are frequently the primary or sole source of experimental data to determine both the NOAEL and the benchmark dose, these concepts can be applied to other types of toxicity testing protocols, such as that for chronic toxicity or developmental toxicity (Allen et al., 1994a,b; Faustman et al., 1994) (see also Chap. 4 for a complete discussion of the derivation and use of NOAELs, RfDs, and benchmark doses). If chronic studies have been completed, these data are generally used for NOAEL and LOAEL estimates in preference to data from subchronic studies.
A subchronic study is usually conducted in 2 species (usually rat and dog for FDA, and mouse for EPA) by the route of intended exposure (usually oral). At least 3 doses are employed (a high dose that produces toxicity but does not cause more than 10% fatalities, a low dose that produces no apparent toxic effects, and an intermediate dose) with 10 to 20 rodents and 4 to 6 dogs of each sex per dose. Each animal should be uniquely identified with permanent markings such as ear tags, tattoos, or electronically coded microchip implants. Only healthy animals should be used, and each animal should be housed individually in an adequately controlled environment. When the test compound is administered in the diet over a prolonged period of time (subchronic and chronic studies), the concentration in the diet should be adjusted periodically (weekly for the first 12–14 weeks) to maintain a constant intake of material based on food consumption and rate of change in body weight (Wilson et al., 2008). Animals should be observed once or twice daily for signs of toxicity, including changes in body weight, diet consumption, changes in fur color or texture, respiratory or cardiovascular distress, motor and behavioral abnormalities, and palpable masses. All premature deaths should be recorded and necropsied as soon as possible. Severely moribund animals should be terminated immediately to preserve tissues and reduce unnecessary suffering. At the end of the 90-day study, all the remaining animals should be terminated and blood and tissues should be collected for further analysis. The gross and microscopic condition of the organs and tissues (about 15–20) and the weight of the major organs (about 12) are recorded and evaluated. Hematology and blood chemistry measurements are usually done before, in the middle of, and at the termination of exposure. 
Hematology measurements usually include hemoglobin concentration, hematocrit, erythrocyte counts, total and differential leukocyte counts, platelet count, clotting time, and prothrombin time. Clinical chemistry determinations commonly made include glucose, calcium, potassium, blood urea nitrogen, serum alanine aminotransferase (ALT), serum aspartate aminotransferase (AST), gamma-glutamyl transpeptidase (GGT), sorbitol dehydrogenase, lactate dehydrogenase, alkaline phosphatase, creatinine, bilirubin, triglycerides, cholesterol, albumin, globulin, and total protein. Urinalysis is usually performed in the middle of and at the termination of the testing period and often includes determination of specific gravity or osmolarity, pH, proteins, glucose, ketones, bilirubin, and urobilinogen as well as microscopic examination of formed elements. If humans are likely to have significant exposure to the chemical by dermal contact or inhalation, subchronic dermal and/or inhalation experiments may also be required. Subchronic toxicity studies not only characterize the dose–response relationship of a test substance after repeated administration but also provide data for a more reasonable prediction of appropriate doses for chronic exposure studies.
For chemicals that are to be registered as drugs, acute and subchronic studies (and potentially additional special tests if a chemical has unusual toxic effects or therapeutic purposes) must be completed before the company can file an Investigational New Drug (IND) application with the FDA. If the application is approved, clinical trials can commence. At the same time phase I, phase II, and phase III clinical trials are performed, chronic exposure of the animals to the test compound can be carried out in laboratory animals, along with additional specialized tests.
Chronic
Long-term or chronic exposure studies are performed similarly to subchronic studies except that the period of exposure is longer than 3 months. In rodents, chronic exposures are usually for 6 months to 2 years. Chronic studies in nonrodent species are usually for 1 year but may be longer. The length of exposure is somewhat dependent on the intended period of exposure in humans. For example, for pharmaceuticals, the ICH S4 guidance calls for studies of 6 months in duration in rodents, and 9 months in nonrodents. However, if the chemical is a food additive with the potential for lifetime exposure in humans, a chronic study up to 2 years in duration is likely to be required.
Dose selection is critical in these studies to ensure that premature mortality from chronic toxicity does not limit the number of animals that survive to a normal life expectancy. Most regulatory guidelines require that the highest dose administered be the estimated maximum tolerable dose (MTD, also commonly referred to as the “minimally toxic dose”). This is generally derived from subchronic studies, but additional longer studies (eg, 6 months) may be necessary if delayed effects or extensive cumulative toxicity are indicated in the 90-day subchronic study. The MTD has had various definitions (Haseman, 1985). It has been defined by some regulatory agencies as the dose that suppresses body weight gain slightly (ie, 10%) in a 90-day subchronic study (Reno, 1997). However, regulatory agencies may also consider the use of parameters other than weight gain, such as physiological and pharmacokinetic considerations and urinary metabolite profiles, as indicators of an appropriate MTD (Reno, 1997). Generally, 1 or 2 additional doses, usually fractions of the MTD (eg, one-half and one-quarter MTD), and a control group are tested.
Chronic toxicity tests may include a consideration of the carcinogenic potential of chemicals so that a separate lifetime feeding study that addresses carcinogenicity does not have to be performed. However, specific chronic studies designed to assess the carcinogenic potential of a substance may be required (see below).
Developmental and Reproductive Toxicity
The effects of chemicals on reproduction and development also need to be determined. Developmental toxicology is the study of adverse effects on the developing organism occurring anytime during the life span of the organism that may result from exposure to chemical or physical agents before conception (either parent), during prenatal development, or postnatally until the time of puberty. Teratology is the study of defects induced during development between conception and birth (see Chap. 10). Reproductive toxicology is the study of the occurrence of adverse effects on the male or female reproductive system that may result from exposure to chemical or physical agents (see Chap. 20).
Several types of animal tests are utilized to examine the potential of an agent to alter development and reproduction. (For a detailed description of reproductive and developmental toxicity testing procedures, see Christian, 2008.) General fertility and reproductive performance (segment I) tests are usually performed in rats with 2 or 3 doses (20 rats per sex per dose) of the test chemical (none of which produces maternal toxicity). Males are given the chemical for 60 days and females for 14 days before mating. The animals are given the chemical throughout gestation and lactation. Typical observations made include the percentage of females that become pregnant, the number of stillborn and live offspring, and the weight, growth, survival, and general condition of the offspring during the first 3 weeks of life.
The potential of chemicals to disrupt normal embryonic and/or fetal development (teratogenic effects) is also determined in laboratory animals. Current guidelines for these segment II studies call for the use of 2 species, including 1 nonrodent species (usually rabbits). Teratogens are most effective when administered during the first trimester, the period of organogenesis. Thus, the animals (usually 12 rabbits and 24 rats or mice per group) are usually exposed to 1 of 3 dosages during organogenesis (days 7-17 in rodents and 7-19 in rabbits), and the fetuses are removed by cesarean section a day before the estimated time of delivery (gestational days 29 for rabbit, 20 for rat, and 18 for mouse). The uterus is excised and weighed and then examined for the number of live, dead, and resorbed fetuses. Live fetuses are weighed; half of each litter is examined for skeletal abnormalities and the remaining half for soft tissue anomalies.
The perinatal and postnatal toxicities of chemicals also are often examined (segment III). This test is performed by administering the test compound to rats from the 15th day of gestation throughout delivery and lactation and determining its effect on the birth weight, survival, and growth of the offspring during the first 3 weeks of life.
In some instances a multigenerational study may be chosen, often in place of segment III studies, to determine the effects of chemicals on the reproductive system. At least 3 dosage levels are given to groups of 25 female and 25 male rats shortly after weaning (30–40 days of age). These rats are referred to as the F0 generation. Dosing continues throughout breeding (about 140 days of age), gestation, and lactation. The offspring (F1 generation) have thus been exposed to the chemical in utero, via lactation, and in the feed thereafter. When the F1 generation is about 140 days old, about 25 females and 25 males are bred to produce the F2 generation, and administration of the chemical is continued. The F2 generation is thus also exposed to the chemical in utero and via lactation. The F1 and F2 litters are examined as soon as possible after delivery. The percentage of F0 and F1 females that get pregnant, the number of pregnancies that go to full term, the litter size, the number of stillborn, and the number of live births are recorded. Viability counts and pup weights are recorded at birth and at 4, 7, 14, and 21 days of age. The fertility index (percentage of mating resulting in pregnancy), gestation index (percentage of pregnancies resulting in live litters), viability index (percentage of animals that survive 4 days or longer), and lactation index (percentage of animals alive at 4 days that survived the 21-day lactation period) are then calculated. Gross necropsy and histopathology are performed on some of the parents (F0 and F1), with the greatest attention being paid to the reproductive organs, and gross necropsy is performed on all weanlings.
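The four indices defined above are simple percentages and can be computed directly from the recorded counts. The function below follows the definitions in the text; the example counts are hypothetical.

```python
def reproductive_indices(matings, pregnancies, live_litters,
                         pups_born_alive, pups_alive_day4, pups_alive_day21):
    """The four indices defined in the text, each as a percentage:
    fertility = matings resulting in pregnancy, gestation = pregnancies
    resulting in live litters, viability = pups surviving 4 days or
    longer, lactation = pups alive at day 4 surviving to day 21."""
    return {
        "fertility index": 100 * pregnancies / matings,
        "gestation index": 100 * live_litters / pregnancies,
        "viability index": 100 * pups_alive_day4 / pups_born_alive,
        "lactation index": 100 * pups_alive_day21 / pups_alive_day4,
    }

# Hypothetical F0 results: 25 matings, 20 pregnancies, 18 live litters,
# 160 pups born alive, 150 alive at day 4, 140 surviving lactation to day 21
indices = reproductive_indices(25, 20, 18, 160, 150, 140)
```

With these counts the fertility index is 80% and the gestation index 90%, which would be compared across the F0 and F1 generations and against controls.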
The International Conference on Harmonization (ICH) guidelines provide for flexible guidelines that address 6 “ICH stages” of development: premating and conception (stage A), conception to implantation (stage B), implantation to closure of the hard palate (stage C), closure of the hard palate to end of pregnancy (stage D), birth and weaning (stage E), and weaning to sexual maturity (stage F). All of these stages are covered in the segment I to segment III studies described above (Christian, 2008).
Numerous short-term tests for teratogenicity have been developed (Faustman, 1988). These tests utilize whole-embryo culture, organ culture, and primary and established cell cultures to examine developmental processes and estimate the potential teratogenic risks of chemicals. Many of these in vitro test systems are under evaluation for use in screening new chemicals for teratogenic effects. These systems vary in their ability to identify specific teratogenic events and alterations in cell growth and differentiation. In general, the available assays cannot identify functional or behavioral teratogens (Faustman, 1988).
Mutagenesis is the ability of chemicals to cause changes in the genetic material in the nucleus of cells in ways that allow the changes to be transmitted during cell division. Mutations can occur in either of 2 cell types, with substantially different consequences. Germinal mutations damage DNA in sperm and ova, which can undergo meiotic division and therefore have the potential for transmission of the mutations to future generations. If mutations are present at the time of fertilization in either the egg or the sperm, the resulting combination of genetic material may not be viable, and death may occur in the early stages of embryonic cell division. Alternatively, the mutation in the genetic material may not affect early embryogenesis but may result in the death of the fetus at a later developmental period, resulting in abortion. Congenital abnormalities may also result from mutations. Somatic mutations refer to mutations in all other cell types and are not heritable but may result in cell death or transmission of a genetic defect to other cells in the same tissue through mitotic division. Because the initiating event of chemical carcinogenesis is thought to be a mutagenic one, mutagenicity tests are often used to screen for potential carcinogens.
Numerous in vivo and in vitro procedures have been devised to test chemicals for their ability to cause mutations. Some genetic alterations are visible with the light microscope. In this case, cytogenetic analysis of bone marrow smears is used after the animals have been exposed to the test agent. Because some mutations are incompatible with normal development, the mutagenic potential of a chemical can also be evaluated by the dominant lethal test. This test is usually performed in rodents. The male is exposed to a single dose of the test compound and then is mated with 2 untreated females weekly for 8 weeks. The females are killed before term, and the number of live embryos and the number of corpora lutea are determined.
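The endpoint of the dominant lethal test is the gap between ovulations (corpora lutea) and surviving live embryos in treated versus control matings. A minimal sketch of that comparison, using hypothetical counts (the function name and numbers are illustrative):

```python
def conceptus_loss_percent(corpora_lutea, live_embryos):
    """Percentage of ovulated ova that did not yield a live embryo.
    An exposure-related increase over the control value suggests
    dominant lethal mutations induced in the treated males."""
    return 100.0 * (corpora_lutea - live_embryos) / corpora_lutea

# Hypothetical pooled counts for one weekly mating interval
treated = conceptus_loss_percent(corpora_lutea=140, live_embryos=98)
control = conceptus_loss_percent(corpora_lutea=138, live_embryos=125)
print(round(treated, 1), round(control, 1))
```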
The test for mutagens that has received the widest attention is the Salmonella/microsome test developed by Ames et al. (1975). This test uses several mutant strains of Salmonella typhimurium that lack the enzyme phosphoribosyl ATP synthetase, which is required for histidine synthesis. These strains are unable to grow in a histidine-deficient medium unless a reverse or back mutation to the wild type has occurred. Other mutations in these bacteria have been introduced to enhance the sensitivity of the strains to mutagenesis. The 2 most significant additional mutations enhance penetration of substances into the bacteria and decrease the ability of the bacteria to repair DNA damage. Because many chemicals are not mutagenic or carcinogenic unless they are biotransformed to a toxic product by enzymes in the endoplasmic reticulum (microsomes), rat liver microsomes are usually added to the medium containing the mutant strain and the test chemical. The number of reverse mutations is then quantified by the number of bacterial colonies that grow in a histidine-deficient medium.
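Scoring an Ames plate-count result comes down to comparing revertant colonies on treated plates with the spontaneous (control) count. The sketch below uses hypothetical triplicate counts; the twofold rule noted in the comment is a common convention, not a universal standard:

```python
def ames_fold_increase(treated_counts, control_counts):
    """Mean revertant colonies on treated plates divided by the mean
    spontaneous revertant count on control plates."""
    mean_treated = sum(treated_counts) / len(treated_counts)
    mean_control = sum(control_counts) / len(control_counts)
    return mean_treated / mean_control

# Hypothetical triplicate plates at one dose of a test chemical (+S9 microsomes)
fold = ames_fold_increase(treated_counts=[182, 190, 176],
                          control_counts=[28, 31, 30])
# A roughly twofold or greater increase, with a dose-related trend,
# is often taken as evidence of mutagenicity
print(round(fold, 1))
```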
Strains of yeast have recently been developed that detect genetic alterations arising during cell division after exposure to nongenotoxic carcinogens as well as mutations that arise directly from genotoxic carcinogens. This test identifies deletions of genetic material that occur during recombination events in cell division that may result from oxidative damage to DNA, direct mutagenic effects, alterations in fidelity of DNA repair, and/or changes in cell cycle regulation (Galli and Schiestl, 1999). Mutagenicity is discussed in detail in Chap. 9.
With the advent of techniques that readily allow manipulation of the mouse genome, transgenic animals have been developed that allow for in vivo assessment of the mutagenicity of compounds. For example, 2 commercially available mouse strains, the “MutaMouse” and “Big Blue,” contain the lac operon of E. coli inserted into genomic DNA using a lambda phage to produce a recoverable shuttle vector. Stable, homozygous strains of these transgenic animals (both mice and rats have been engineered) can be exposed to potential mutagenic agents. Following in vivo exposure, the target lac genes can be recovered from virtually any cell type or organ and analyzed for mutations (Brusick et al., 2008).
Oncogenicity studies are both time consuming and expensive, and are usually done only when there is reason to suspect that a chemical may be carcinogenic, or when there may be widespread, long-term human exposure (eg, widely used food additives, drinking water contaminants, or pharmaceuticals that are likely to be administered repeatedly for long periods of time). Chemicals that test positive in several mutagenicity assays are likely to be carcinogenic, and thus are frequent candidates for oncogenicity bioassay assessment. In the United States, the National Toxicology Program (NTP) has the primary responsibility for evaluating nondrug chemicals for carcinogenic potential. For pharmaceuticals, the FDA may require the manufacturer to conduct oncogenicity studies as part of the preclinical assessment, depending on the intended use of the drug and the results of mutagenicity assays and other toxicological data.
Studies to evaluate the oncogenic (carcinogenic) potential of chemicals are usually performed in rats and mice and extend over the average lifetime of the species (18 months to 2 years for mice, 2–2.5 years for rats). To ensure that 30 rats per dose survive the 2-year study, 60 rats per group per sex are often started in the study. Both gross and microscopic pathological examinations are made not only on animals that survive the chronic exposure but also on those that die prematurely. The use of the MTD in carcinogenicity bioassays has been the subject of controversy. The premise that high doses are necessary for testing the carcinogenic potential of chemicals is derived from the statistical and experimental design limitations of chronic bioassays. Consider that a 0.5% increase in cancer incidence in the United States would result in over 1 million additional cancer deaths each year—clearly an unacceptably high risk. However, identifying with statistical confidence a 0.5% incidence of cancer in a group of experimental animals would require a minimum of 1000 test animals, and this assumes that no tumors were present in the absence of exposure (zero background incidence).
Fig. 2-14 shows the statistical relationship between minimum detectable tumor incidence and the number of test animals per group. This curve shows that in a chronic bioassay with 50 animals per test group, a true tumor incidence of about 8% could go undetected even though no animals in the test group developed tumors. This example assumes that there are no tumors in the control group. These statistical considerations illustrate why animals are tested at doses higher than those that occur in human exposure. Because it is impractical to use the large number of animals that would be required to test the potential carcinogenicity of a chemical at the doses usually encountered by people, the alternative is to assume that there is a relationship between the administered dose and the tumorigenic response and give animals doses of the chemical that are high enough to produce a measurable tumor response in a test group of reasonable size, such as 40 to 50 animals per dose. The limitations of this approach are discussed in Chap. 4. For nonmutagenic pharmaceutical agents, ICH S1C provides the following guidance on dose selection for oncogenicity studies: “The doses selected for rodent bioassays for non-genotoxic pharmaceuticals should provide an exposure to the agent that (1) allow an adequate margin of safety over the human therapeutic exposure, (2) are tolerated without significant chronic physiological dysfunction and are compatible with good survival, (3) are guided by a comprehensive set of animal and human data that focus broadly on the properties of the agent and the suitability of the animal (4) and permit data interpretation in the context of clinical use.”
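This limitation can be restated as a simple binomial bound: if no tumors are observed in n animals, the largest true incidence p still consistent with that outcome at confidence level 1 − α satisfies (1 − p)^n = α. A short sketch of that arithmetic (the 99% confidence level is an assumption chosen to match the approximately 8% figure cited in the text):

```python
def max_undetected_incidence(n_animals, alpha=0.01):
    """Largest true tumor incidence consistent with observing zero tumors
    in n_animals, at confidence level 1 - alpha: solve (1 - p)**n = alpha."""
    return 1.0 - alpha ** (1.0 / n_animals)

# With 50 animals per group, a true incidence of nearly 9% could still
# produce zero tumors; with 1000 animals the bound drops below 0.5%
print(round(100 * max_undetected_incidence(50), 1))
print(round(100 * max_undetected_incidence(1000), 2))
```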
Figure 2-14. Statistical limitations in the power of experimental animal studies to detect tumorigenic effects.
Another approach for establishing maximum doses for use in chronic animal toxicity testing of drugs is often used for substances for which basic human pharmacokinetic data are available (eg, new pharmaceutical agents that have completed phase I clinical trials). For chronic animal studies performed on drugs where single-dose human pharmacokinetic data are available, a daily dose that would provide an area under the curve (AUC) in laboratory animals equivalent to 25 times the AUC in humans given the highest (single) daily dose to be used therapeutically may be used, rather than the MTD. Based on a series of assumptions regarding allometric scaling between rodents and humans (Table 2-2), the ICH noted that it may not be necessary to exceed a dose of 1500 mg/kg per day where there is no evidence of genotoxicity, and where the maximum recommended human dose does not exceed 500 mg per day.
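The AUC-based criterion above is just a ratio of systemic exposures; the sketch below uses hypothetical AUC values (the 25-fold threshold is the one stated in the text for nongenotoxic pharmaceuticals):

```python
def meets_auc_criterion(animal_auc, human_auc, required_ratio=25.0):
    """True if the animal AUC at the candidate bioassay dose is at least
    required_ratio times the human AUC at the maximum therapeutic dose."""
    return animal_auc / human_auc >= required_ratio

# Hypothetical exposures: rodent AUC 750 ug*h/mL at the candidate high dose,
# human AUC 30 ug*h/mL at the maximum recommended therapeutic dose
print(meets_auc_criterion(animal_auc=750.0, human_auc=30.0))
```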
Most regulatory guidelines require that both benign and malignant tumors be reported in oncogenicity bioassays. Statistical increases above the control incidence of tumors (either all tumors or specific tumor types) in the treatment groups are considered indicative of carcinogenic potential of the chemical unless there are qualifying factors that suggest otherwise (lack of a dose response, unusually low incidence of tumors in the control group compared with “historic” controls, etc; Huff, 1999). Thus, the conclusion as to whether a given chronic bioassay is positive or negative for carcinogenic potential of the test substance requires careful consideration of background tumor incidence. Properly designed chronic oncogenicity studies require that a concurrent control group matched for variables such as age, diet, and housing conditions be used. For some tumor types, the “background” incidence of tumors is surprisingly high. Fig. 2-15 shows the background tumor incidence for various tumors in male and female F344 rats used in 27 NTP 2-year rodent carcinogenicity studies. The data shown represent the percent of animals in control (nonexposed) groups that developed the specified tumor type by the end of the 2-year study. These studies involved more than 1300 rats of each sex. Fig. 2-16 shows similar data for control (nonexposed) male and female B6C3F1 mice from 30 recent NTP 2-year carcinogenicity studies and includes data from over 1400 mice of each sex. There are several key points that can be derived from these summary data:
Tumors, both benign and malignant, are not uncommon events in animals even in the absence of exposure to any known carcinogen.
There are numerous different tumor types that develop “spontaneously” in both sexes of both rats and mice, but at different rates.
Background tumors that are common in 1 species may be uncommon in another (eg, testicular interstitial cell adenomas are very common in male rats but rare in male mice; liver adenomas/carcinomas are about 10 times more prevalent in male mice than in male rats).
Even within the same species and strain, large gender differences in background tumor incidence are sometimes observed (eg, adrenal gland pheochromocytomas are about 7 times more prevalent in male F344 rats than in female F344 rats; lung and liver tumors are twice as prevalent in male B6C3F1 mice as in female B6C3F1 mice).
Even when the general protocols, diets, environment, strain and source of animals, and other variables are relatively constant, background tumor incidence can vary widely, as shown by the relatively large SDs for some tumor types in the NTP bioassay program. For example, the range in liver adenoma/carcinoma incidence in 30 different groups of unexposed (control) male B6C3F1 mice went from a low of 10% to a high of 68%. Pituitary gland adenomas/carcinomas ranged from 12% to 60% and 30% to 76% in unexposed male and female F344 rats, respectively, and from 0% to 36% in unexposed female B6C3F1 mice.
Figure 2-15. Most frequently occurring tumors in untreated control rats from recent NTP 2-year rodent carcinogenicity studies. The values shown represent the mean ± SD of the percentage of animals developing the specified tumor type at the end of the 2-year study. The values were obtained from 27 different studies involving a combined total of between 1319 and 1353 animals per tumor type.
Figure 2-16. Most frequently occurring tumors in untreated control mice from recent NTP 2-year rodent carcinogenicity studies. The values shown represent the mean ± SD of the percentage of animals developing the specified tumor type at the end of the 2-year study. The values were obtained from 30 different studies involving a total of between 1447 and 1474 animals per tumor type.
Taken together, these data demonstrate the importance of including concurrent control animals in such studies. In addition, comparisons of the concurrent control results to “historic” controls accumulated over years of study may be important in identifying potentially spurious “false-positive” results. The relatively high variability in background tumor incidence among groups of healthy, highly inbred strains of animals maintained on nutritionally balanced and consistent diets in rather sterile environments highlights the dilemma in interpreting the significance of both positive and negative results in regard to the human population, which is genetically diverse, has tremendous variability in diet, nutritional status, and overall health, and lives in an environment full of potentially carcinogenic substances, both natural and human-made.
Finally, it should be noted that both inbred and outbred strains have distinct background tumor patterns, and the NTP and most other testing programs select strains based on the particular needs of the agent under study. For example, the NTP used the Wistar rat for chemicals that may have the testis as a target organ, based on acute, subchronic, or other bioassay results. Similarly, the NTP used the Sprague–Dawley strain of rat in studies of estrogenic agents such as genistein because its mammary tumors are responsive to estrogenic stimulation, as are human mammary tumors.
Neurotoxicity or a neurotoxic effect is defined as an adverse change in the chemistry, structure, or function of the nervous system following exposure to a chemical or physical agent. The structure, function, and development of the nervous system and its vulnerability to chemicals are examined in Chap. 16. When evaluating the potential neurological effects of a compound, one must consider effects on both the central and peripheral nervous systems, as well as whether the exposure occurred during development or in adulthood. The developing nervous system is particularly sensitive to chemical exposures (see Chap. 10).
In vitro systems, often based on cell culture techniques, are a rapidly developing area of neurotoxicity assessment. Specific cell lines are available to examine effects on neuronal or glial cells, such as proliferation, migration, apoptosis, synaptogenesis, and other end points. In vitro assays have a number of potential advantages, including reduced animal use, lower cost, and adaptability to high-throughput screening. It is also possible to use an in vitro model to examine the interaction of chemicals, such as food additives, on neuronal cells (Lau et al., 2006). The principles and challenges of in vitro neurotoxicity testing are well described (Claudio, 1992; Tiffany-Castiglioni, 2004).
Procedures for the neurobehavioral evaluation of animals were initially developed as part of the scientific investigation of behavioral motivation. Some of these procedures were then used to evaluate the neuropharmacological properties of new drugs. Now animals are commonly used to evaluate the neurotoxic properties of chemicals. A wide range of adult and developmental animal tests are used to assess neurobehavioral function. In addition, neuropathological assessment is an important part of the neurotoxicity evaluation, and best practices have been developed for developmental neurotoxicity (DNT) (Bolon et al., 2006). Irwin developed a basic screen for behavioral function in mice (Irwin, 1968), which was subsequently refined to the functional observational battery (FOB) (Moser, 2000). The FOB can also be used in the evaluation of drug safety (Redfern et al., 2005).
The US EPA established a protocol for the evaluation of DNT in laboratory animals (US EPA 870.6300 and OECD 426) (EPA, 1998; OECD, 2004). These protocols include tests of neurobehavioral function, such as auditory startle, learning and memory function, and changes in motor activity, as well as neuropathological examination and morphometric analysis. Methods and procedures for DNT evaluation are well established (Claudio et al., 2000; Cory-Slechta et al., 2001; Dorman et al., 2001; Garman et al., 2001; Mileson and Ferenc, 2001). Recent studies examine the neurotoxicity of multiple chemical exposures in animals (Moser et al., 2006). Methods are also available to examine cognitive measures in weanling rodents in DNT studies (Ehman and Moser, 2006). Nonhuman primates have been invaluable in evaluating the effects of neurotoxicants and in the risk assessment process (Burbacher and Grant, 2000). Sophisticated assessment of operant behavior, and learning and memory assessment of rodents, has been used to evaluate the effects of lead (Cory-Slechta, 1995, 1996, 2003). Monkeys can also be used to evaluate the low-level effects of neurotoxicants such as mercury on vision, auditory function, and vibration sensitivity (Burbacher et al., 2005; Rice and Gilbert, 1982, 1992, 1995). There is remarkable concordance between human and animal neurotoxicity assessment, for example, for lead, mercury, and PCBs (Rice, 1995).
Human testing for the neurological effects of occupational exposures to chemicals (Anger, 2003; Farahat et al., 2003; Kamel et al., 2003; McCauley et al., 2006), and even the neurotoxic effects of war (Binder et al., 1999, 2001), is advancing rapidly. These methods have also been applied to Hispanic workers (Rohlman et al., 2001b) and populations with limited education or literacy (Rohlman et al., 2003). The WHO has also recommended a test battery for humans (Anger et al., 2000). There are also neurobehavioral test batteries for assessing children (Rohlman et al., 2001a). Evaluation of the childhood neurological effects of lead (Lanphear et al., 2005; Needleman and Bellinger, 1991) and mercury (Myers et al., 2000) has added enormously to our understanding of the health effects of these chemicals and to the methodology of human neurobehavioral testing.
In summary, the neurotoxicological evaluation is an important aspect of developing risk assessments for environmental chemicals and drugs.
Under normal conditions, the immune system is responsible for host defense against pathogenic infections and certain cancers. However, environmental exposures can alter immune system development and/or function and lead to hypersensitivity, autoimmunity, or immunosuppression, the outcome of which may be expressed as a pathology in most any organ or tissue (see Chap. 12). Our understanding of the biological processes underlying immune system dysfunction remains incomplete. However, advances in molecular biology (including use of transgenic/knockout mice), analytic methods (including gene expression arrays and multiparameter flow cytometry), animal models (including adoptive transfers in immunocompromised mice and host resistance to viral, bacterial, or tumor cell challenge), and other methods are greatly advancing our knowledge.
From a toxicologist’s perspective, evaluation of immune system toxicity represents special challenges. Development of hypersensitivity can take various forms, depending on the mechanism underlying the associated immune response, and standard assumptions regarding dose–response relationships may not necessarily apply. For example, a single or incidental exposure to beryllium has been associated with chronic beryllium disease in some individuals. We are only just beginning to understand the biological basis underlying such individual susceptibility. In the case of chronic beryllium disease, a genetic polymorphism in a gene involved in antigen recognition may be associated with increased susceptibility (see Bartell et al., 2000). Although our ability to predict immunogenicity remains poor, research efforts are continuing to identify aspects of the chemical and the individual that confer immunogenicity and underlie hypersensitivity. For example, the increasing incidence of allergic asthma among preschool-age children in the United States since the 1980s may be associated with exposure to allergens (eg, dust mites, molds, and animal dander), genetic factors, and other factors in the in utero and postnatal environment (see Donovan and Finn, 1999; Armstrong et al., 2005).
Immunosuppression is another form of immune system toxicity, which can result in a failure to respond to pathogenic infection, a prolonged infection period, or expression of a latent infection or cancer. Broad-spectrum and targeted immunosuppressive chemicals are designed and used therapeutically to reduce organ transplant rejection or suppress inflammation. However, a large number of other chemicals have been associated with immunosuppression, including organochlorine pesticides, diethylstilbestrol, lead, and halogenated aromatic hydrocarbons (including TCDD), and exposures that occur during critical stages of development may present special risk (Holladay, 2005).
Autoimmunity is a specific immune system disorder in which components of the immune system attack normal (self) tissues. Cases of autoimmunity have been reported for a wide range of chemicals including therapeutic drugs, metals, pesticides, and solvents. As with other forms of immune system toxicity, autoimmunity can present in most any tissue.
Finally, new forms of immunotoxicity are appearing based on novel forms of clinical therapy and immunomodulation. These include the variously classified “tumor lysis syndromes” and “cytokine storms” that arise from massive cytokine dysregulation. A recent example involved 6 healthy volunteers enrolled in a phase 1 clinical trial in the United Kingdom who developed a severe cytokine response to an anti-CD28 monoclonal antibody, leading to systemic organ failures (Bhogal and Combes, 2006). Such cases are stark reminders of the challenges we face in understanding how the immune system is regulated, developing reliable test systems for identifying such risks prior to human use, and developing safe means for testing these agents in humans.
As described in Chap. 12, current practice for evaluating potential toxic effects of xenobiotic exposures on the immune system involves a tiered approach to immunotoxicity screening (Luster et al., 2003). This tiered approach is generally accepted worldwide in the registration of novel chemical and therapeutic products. Most recently, final guidance to the pharmaceutical industry was published in April 2006 by the International Conference on Harmonization (ICH) of Technical Requirements for Registration of Pharmaceuticals for Human Use (Table 2-3). This guidance, which applies to the nonclinical (animal) testing of human pharmaceuticals, is the accepted standard in the United States, EU, and Japan, and demonstrates the continued commitment by these regulatory bodies to understand the potential risks posed by novel therapeutics.
Tiered testing relies on the concept that standard toxicity studies can provide good evidence for immunotoxicity when considered with known biological properties of the chemical, including structural similarities to known immunomodulators, disposition, and other clinical information, such as increased occurrence of infections or tumors. Evaluation of hematological changes, including differential effects on white blood cells and immunoglobulin changes, and alterations in lymphoid organ weights or histology, can provide strong evidence of potential effects to the immune system. Should such evaluations indicate a potential effect on immune system function, more detailed evaluations may be considered, including the evaluation of functional effects (eg, T-cell-dependent antibody response or natural killer cell activity), flow cytometric immunophenotyping, or host resistance studies. Thus, as with other areas of toxicology, the evaluation of immune system toxicity requires the toxicologist to be vigilant in observing early indications from a variety of sources in developing a weight-of-evidence assessment regarding potential injury/dysfunction.
Other Descriptive Toxicity Tests
Most of the tests described above will be included in a “standard” toxicity testing protocol because they are required by the various regulatory agencies. Additional tests may be required or included in the protocol to provide information relating to a special route of exposure, such as inhalation. Inhalation toxicity tests in animals usually are carried out in a dynamic (flowing) chamber rather than in static chambers to avoid complications from particulate settling and exhaled gases. Such studies usually require special dispersing and analytic methodologies, depending on whether the agent to be tested is a gas, vapor, or aerosol; additional information on methods, concepts, and problems associated with inhalation toxicology is provided in Chaps. 15 and 28. The duration of exposure for inhalation toxicity tests can be acute, subchronic, or chronic, but acute studies are more common in inhalation toxicology. Other special types of animal toxicity tests include toxicokinetics (absorption, distribution, biotransformation, and excretion), the development of appropriate antidotes and treatment regimens for poisoning, and the development of analytic techniques to detect residues of chemicals in tissues and other biological materials.