Are Estimates of Early Education Programs Too Pessimistic? Evidence from a Large-Scale Field Experiment that Causally Measures Neighbor Effects

We estimate the direct and spillover effects of a large-scale early childhood intervention on the educational attainment of over 2,000 disadvantaged children in the United States. We show that failing to account for spillover effects results in a severe underestimation of the impact. The intervention induced positive direct effects on test scores of children assigned to the treatment groups. We document large spillover effects on both treatment and control children who live near treated children. On average, spillover effects increase a child's non-cognitive (cognitive) scores by about 1.2 (0.6 to 0.7) standard deviations. The spillover effects are localized, decreasing with the spatial distance to treated neighbors. Our evidence suggests the spillover effects on non-cognitive scores are likely to operate through the child's social network. Alternatively, parental investment is an important channel through which cognitive spillover effects operate. We view our results as speaking to several literatures, perhaps most importantly the role of public programs and neighborhoods on human capital formation at an early age.

"... I will emphasize again and again: that human capital accumulation is a social activity, involving groups of people in a way that has no counterpart in the accumulation of physical capital..." Lucas (1988) 1 Introduction Evaluations of early childhood programs have played an important role in shaping policy debates on early education. For instance, the Head Start Impact Study (HSIS), a recent randomized control trial of Head Start, reported small effect sizes that fade considerably over a few years (Puma et al., 2010(Puma et al., , 2012. These findings have heightened debate among academics over the cost effectiveness of Head Start (e.g. Barnett, 2011;Gibbs, Ludwig, and Miller, 2013; Kline and Walters, 2016) and have been frequently cited by critics who argue Head Start is ineffective in achieving its mission and should be abandoned or seriously reformed. 1 Given the policy impact of the findings from early education interventions, and more broadly any social intervention, accurate evaluation of the total effect of these programs is crucial.
The standard approach in evaluating social programs is to randomly assign subjects to treatment and control groups. From there, many analysts simply difference the mean outcomes and report the monetized treatment effect within a benefit-cost framework. This approach is based on the assumption that a person's potential outcomes are independent of other participants' treatment assignment; that is, no spillover effects occur. However, if the assignment of treatment to one subject alters the outcomes of control subjects, the difference between the control and treatment means would not reflect the true impact of the program. Specifically, for early education programs, if the intervention generates positive (negative) externalities benefiting (harming) the control children, ignoring spillover effects would result in an understatement (overstatement) of treatment effects and send an overly pessimistic (optimistic) signal of program quality. Accounting for spillover effects becomes even more important when a program is taken to scale. The level of social interactions, which are often a key channel through which spillover effects operate, increases at scale, resulting in an even larger bias in impact estimates if spillover effects are ignored (Al-Ubaydli et al., 2017a; Deaton and Cartwright, 2018).
In this paper, we provide the first empirical evidence on spillover effects from a large-scale early education intervention by causally estimating neighbor effects. We find that ignoring these effects results in severe underestimation of the total impact. The backdrop is that between 2010 and 2014, a series of early childhood programs were delivered to low-income families with young children in the Chicago Heights Early Childhood Center (CHECC). CHECC was located in Chicago Heights, IL, a neighborhood on Chicago's South Side with characteristics similar to many other low-performing urban school districts. The goals of the intervention were to examine how investing in cognitive and non-cognitive skills of low-income children aged 3 to 4 affects their short- and long-term outcomes, and to evaluate the effectiveness of investing directly in the child's education versus indirectly through the parents. To that end, families of over 2,000 disadvantaged children were randomized into (i) an incentivized parent-education program (Parent Academy), (ii) a high-quality preschool program (Pre-K), or (iii) a control group. The children's cognitive and non-cognitive skills were assessed on a regular basis, starting before the randomization and continuing into the middle and end of the programs. Follow-up assessments were also conducted on a yearly basis.
[...] were randomized into the control and treatment groups. These studies reported modest yet positive treatment effects on cognitive and non-cognitive test scores. Both of these studies estimated treatment effects under the assumption that potential outcomes are independent of other children's treatment assignment. Our goal is to relax this assumption and explore the spatial spillover effects on children's cognitive and non-cognitive performance. In particular, we study how control and treated children indirectly benefit from this intervention through their "treated" neighbors.
Furthermore, we shed light on the underlying mechanisms through which these spillover effects operate. Finally, we estimate the total impact of the intervention, accounting for spillovers, and illustrate how ignoring these indirect effects would bias our estimates of the program impacts.
Treatment effects from the programs offered at CHECC can spill over through two main channels.
The first channel is the direct social interactions between children who were randomized during the intervention. Our analysis includes observations from early childhood (3 to 4 years of age), when peer influence at the neighborhood level starts, to middle childhood (8 to 9 years of age), when social interactions within neighborhoods increase dramatically as children enter school (Leventhal and Brooks-Gunn, 2000). Therefore, direct exposure to treated children who live in the same neighborhood is a likely mechanism that can generate spatial spillover effects. The second channel is parental interaction. Observational studies have shown that neighborhoods can influence parental behavior and child-rearing practices (Leventhal and Brooks-Gunn, 2000), which play critical roles in early development (Cunha and Heckman, 2007; Waldfogel and Washbrook, 2011; Kautz et al., 2014; Fryer, Levitt, and List, 2015; Kalil, 2015). Because CHECC also offered education programs to parents, treatment effects can spill over through information and preference externalities generated by parental social interactions.
We identify the spillover effects from CHECC by exploiting certain unique features of our data.
First, conditional on the total number of neighbors who signed up to participate, the number of neighbors who were subsequently assigned to treatment is determined exogenously through the randomization process. We leverage this experimental variation in spatial exposure to treatments across children to estimate spillover effects. Second, our main identification strategy also exploits the panel nature of our data and the within-individual variation in exposure to treated neighbors induced by delivery of programs over multiple years. Specifically, by including individual-specific fixed effects, our estimates control for any time-invariant individual, family, and neighborhood unobserved characteristics that might be correlated with spatial exposure to treatments. We also estimate the effects under a second model that relaxes the assumption of time-invariant omitted variables by controlling for the lagged dependent variables (LDV) and dispensing with the fixed effects. Whereas our main identification strategy uses within-individual variations to estimate the spillover effects, the LDV specification estimates the effects by exploiting both within-individual and between-individual variations in spatial exposure to treatments. Our findings, presented below, are robust to using this alternative specification.
We document large and significant spillover effects on both cognitive and non-cognitive skills and find the non-cognitive spillover effects are about two times larger than the cognitive spillover effects. Our estimates suggest that, on average, each additional treated neighbor residing within a three-kilometer radius of a child's home increases that child's cognitive score by 0.0033 to 0.0042 standard deviations (σ), whereas it increases her non-cognitive score by 0.0069σ to 0.0070σ. Given that an average child in our sample has 178 treated neighbors residing within a three-kilometer radius of her home, and making a (strong) assumption of linearity, we infer that, on average, a child gains between 0.6σ and 0.7σ in cognitive test scores and about 1.2σ in non-cognitive test scores in spillover effects from her treated neighbors. Interestingly, we find that the spillover effects are localized and fall rapidly as the distance to a treated neighbor increases.
Fryer, Levitt, and List (2015) reported heterogeneity in treatment effects from the parental-education component of CHECC along the lines of gender and race. For example, through comparing outcomes between treatment and control children, they find the Parent Academy increases test scores for Hispanics and Whites, but does not improve outcomes of African American children.
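The linear extrapolation behind the aggregate spillover figures can be reproduced with a short calculation. The sketch below only multiplies the per-neighbor estimates quoted above by the average number of treated neighbors; the function and variable names are illustrative assumptions, and the linear scaling is the strong assumption already flagged in the text.

```python
# Back-of-the-envelope check of the aggregate spillover figures.
# Inputs are the per-neighbor estimates and neighbor count quoted in the text;
# names below are illustrative, not taken from the paper.

AVG_TREATED_NEIGHBORS = 178  # average treated neighbors within 3 km

PER_NEIGHBOR_SD = {
    "cognitive": (0.0033, 0.0042),      # lower and upper estimates, in SD
    "non_cognitive": (0.0069, 0.0070),
}


def implied_total(per_neighbor_effect: float, n_neighbors: int) -> float:
    """Scale a per-neighbor effect linearly by the number of neighbors."""
    return per_neighbor_effect * n_neighbors


implied = {
    skill: tuple(
        round(implied_total(effect, AVG_TREATED_NEIGHBORS), 2) for effect in bounds
    )
    for skill, bounds in PER_NEIGHBOR_SD.items()
}

for skill, (low, high) in implied.items():
    print(f"{skill}: {low:.2f} to {high:.2f} SD")
```

The products come out at roughly 0.59 to 0.75σ for cognitive scores and 1.23 to 1.25σ for non-cognitive scores, which is in line with the rounded ranges reported in the text.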
These findings prompt us to explore whether such heterogeneities also exist in spillover effects. Our estimates suggest that non-cognitive spillover effects are significantly larger for African Americans than Hispanics. According to our fixed-effects estimates, an additional treated neighbor within a three-kilometer radius increases the non-cognitive test score of an African American child by 0.0100σ, whereas it increases the non-cognitive score of a Hispanic child by only 0.0045σ. We find no significant racial differences in cognitive spillover effects. Focusing on gender, our estimates suggest boys tend to benefit more than girls from cognitive and non-cognitive spillovers, although these gender differences are not significant at the conventional levels. This observation is consistent with previous empirical evidence on neighborhood effects, which tend to be larger for boys (Entwisle, Alexander, and Olson, 1994).
To shed light on the mechanisms through which spillover effects operate, we start by comparing the effects from neighbors who were assigned to the parental-education programs with the effects from neighbors who were assigned to the preschool programs. Because, unlike in the Pre-K treatments, the focus of Parent Academies was on educating parents rather than children, if spillover effects are driven by interactions between parents, we might expect Parent Academy neighbors to generate larger effects than Pre-K neighbors. Alternatively, larger spillovers from Pre-K neighbors than from Parent Academy neighbors could imply the peer-influence channel plays an important role in generating the effects. Our estimates suggest non-cognitive spillovers are more likely to operate through preschool neighbors.
According to our fixed-effects estimates, whereas an additional Parent Academy neighbor within three kilometers of a child's home induces a 0.0045σ increase in her non-cognitive score, an additional Pre-K neighbor living within the same distance increases her non-cognitive score by 0.0099σ. This finding suggests non-cognitive spillover effects are more likely to operate through children's rather than parents' social networks. We do not find any significant differences in cognitive spillover effects from Parent Academy and Pre-K neighbors. Given our evidence suggesting peer influence at the neighborhood level is a key mechanism in generating non-cognitive spillover effects, we hypothesize that the racial differences in non-cognitive spillovers might be at least partially driven by differences in social interactions within neighborhoods. We explore this idea using data from the National Longitudinal Study of Adolescent Health Survey (Add Health). Our analysis confirms that African American adolescents are significantly more likely than Hispanics to (i) know most people in their neighborhoods, (ii) stop on the street and talk to someone from the neighborhood, and (iii) use recreation facilities in the neighborhood.
Although these results cannot be interpreted as causal evidence, they are consistent with our previous finding that social interactions with peers within neighborhoods are a key channel in generating non-cognitive spillover effects.
Our estimates also suggest older children benefit more from non-cognitive spillovers. Indeed, we find the spillover effect on the non-cognitive test score of a 6-year-old child to be about 0.13σ to 0.14σ larger than the corresponding effect on a 5-year-old child. Because social interactions with peers in the neighborhood increase as a child gets older and enters middle childhood (Higgins and Parsons, 1983; Cook and Cook, 2009), this observation is consistent with the hypothesis that peer influence is an important force in generating non-cognitive spillover effects.
Finally, our evidence suggests cognitive spillover effects are likely to operate, at least partially, through influencing the parents' decision to enroll their child in a (non-CHECC) preschool program.
Using survey data, we show that families with more treated neighbors are significantly more likely to enroll their child in a preschool program (other than the ones offered at CHECC). Our evidence also suggests children whose parents reported enrolling them in an alternative preschool program perform significantly better in cognitive assessments. Therefore, we conclude that influencing parental investment decisions-as measured by the choice to enroll one's child in a preschool program-is a channel through which spillover effects on cognitive test scores operate.
We conclude our analysis by measuring the total impact of the intervention on children's cognitive and non-cognitive performance, accounting for the spillover effects. Our estimates suggest that, on average, the intervention increased a treatment child's cognitive (non-cognitive) test score by 0.82σ (1.32σ). Spillover effects make up a large portion of this total impact: whereas the average direct effect of the intervention on a treatment child's cognitive (non-cognitive) score is 0.11σ (0.05σ), the corresponding indirect effect is 0.71σ (1.27σ). Control children also gain considerably as a result of the intervention: on average, the intervention increased a control child's cognitive (non-cognitive) test score by 0.75σ (1.25σ). If we were to disregard the spillover effects on the control group and had simply based our estimates of the total impact on the outcome differences between the treatment and control children, we would have severely understated the total impact. Specifically, this approach would have indicated that the intervention only improved the cognitive (non-cognitive) test scores of a treatment child by 0.06σ (0.07σ). Ignoring spillover effects would have also led us to underestimate the effects for African American children. Accounting for spillover effects enables us to document a significant and large impact on non-cognitive performance that is significantly larger for African Americans than Hispanics.
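The decomposition in this paragraph can be checked with simple arithmetic. The sketch below uses only the figures quoted above; the dictionary and function names are illustrative assumptions, and the "naive contrast" is computed here simply as the treated total minus the control total.

```python
# Arithmetic check of the total-impact decomposition (all values in SD units,
# as quoted in the text). Names are illustrative, not taken from the paper.

DIRECT = {"cognitive": 0.11, "non_cognitive": 0.05}
SPILLOVER_ON_TREATED = {"cognitive": 0.71, "non_cognitive": 1.27}
TOTAL_ON_CONTROL = {"cognitive": 0.75, "non_cognitive": 1.25}


def total_on_treated(skill: str) -> float:
    """Total effect on a treated child: direct effect plus spillover effect."""
    return round(DIRECT[skill] + SPILLOVER_ON_TREATED[skill], 2)


def naive_contrast(skill: str) -> float:
    """Treatment-minus-control gap, which ignores spillovers onto controls."""
    return round(total_on_treated(skill) - TOTAL_ON_CONTROL[skill], 2)


for skill in DIRECT:
    print(skill, total_on_treated(skill), naive_contrast(skill))
```

The totals reproduce the reported 0.82σ and 1.32σ. The naive contrast computed from these rounded figures is 0.07σ for both outcomes, whereas the text reports 0.06σ for cognitive scores; the small gap presumably reflects rounding in the reported components.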
We view our results as speaking to three strands of literature. First, we contribute to the growing number of studies that measure spillover effects from programs and policy changes designed to improve behaviors and outcomes in various domains, such as the labor market [...] (Duflo and Saez, 2003), and consumption (Angelucci and De Giorgi, 2009). We contribute to this literature by providing the first evidence on spillover effects from a large-scale early education intervention, shedding light on mechanisms, and estimating the total program impact when accounting for spillover effects.
Second, we contribute to the literature that studies the role of neighborhoods in shaping children's short- and long-term outcomes. The empirical evidence on how neighborhoods affect children comes from observational studies that document correlations between neighborhood characteristics and children's outcomes, as well as studies that use experimental and quasi-experimental data to disentangle the causal effects of neighborhood from selection effects. We contribute to this literature in two important ways. Our first contribution is to provide causal evidence on neighborhood effects by exploiting a unique form of exogeneity, which was induced by our intervention. The existing experimental and quasi-experimental evidence on how neighborhoods shape children's outcomes identifies neighborhood effects using data from residential movers (e.g. Katz and Hendren, 2018a and 2018b). The identification of neighborhood effects in this literature relies on instruments such as randomly assigned housing vouchers, quasi-random assignment of immigrants to different neighborhoods, or public housing demolitions as sources of exogenous changes in neighborhood quality. We take a different approach in providing evidence on how neighborhoods influence children. Our identification strategy takes advantage of a large-scale intervention, delivered to a randomly chosen group of families, and exploits the experimentally induced variations, both within and between individuals, in spatial exposure to treated families to provide causal evidence on how neighbors influence children's outcomes. Our second contribution to this literature is to provide insights on the role of neighbors in generating neighborhood effects. Neighborhoods have multiple attributes, which can each influence a child's outcomes, such as school quality, crime rate, neighbors, and so on.
Unlike previous estimates on neighborhood effects, we are able to isolate and estimate the effect of neighbors' quality as one of the many channels through which neighborhoods can influence children's development. Specifically, our estimates suggest social interactions with other children in the neighborhood play an important role in the development of children's non-cognitive skills.
Finally, our findings provide important insights for academics interested in modeling the formation of human capital. A growing body of literature develops dynamic models of skill formation to explore the role of various inputs in the production of cognitive and non-cognitive skills. Through structurally estimating such models, this literature has found inputs such as parental ability, home environment, and parental investments to be important determinants in the formation of future skills (e.g. Todd and Wolpin, 2007; Cunha and Heckman, 2007; Cunha, Heckman, and Schennach, 2010; Attanasio, Meghir, and Nix, 2015; Attanasio et al., 2018). A recent study by Agostinelli (2018) develops a model of skill formation in which peer quality is an input into the production of skills and estimates this model using survey data from adolescents. Whereas Agostinelli exploits variation in cohort composition within schools to deal with the inherent endogeneity in the formation of peer groups in his data, our experimentally induced variation in the spatial exposure to treatments enables us to bypass such identification challenges. Agostinelli's estimates suggest peer quality is an important input in the production of skills (as measured by grades and receptive vocabulary skills) for teenagers. Our results complement Agostinelli's by providing empirical evidence for the role of peer influence at even younger ages. Our estimates suggest peer quality plays an especially important role in producing non-cognitive skills.
The remainder of the paper is structured as follows. Section 2 summarizes key features of our intervention, randomization, and assessments. Section 3 describes our data and presents our estimation strategy. We present our main findings in section 4, where we report our fixed-effects estimates of spillover effects on cognitive and non-cognitive test scores, and explore heterogeneities by race and gender. Section 5 presents the lagged dependent variable (LDV) estimates of the spillover effects and discusses the robustness of our findings to using this alternative identification strategy. We discuss the mechanisms in section 6. In section 7, we estimate the total impacts of CHECC, break down these estimates into direct and indirect effects, and discuss how ignoring indirect effects would bias our estimates. We discuss policy implications and conclude in section 8.

The main goals of this large-scale intervention were (i) to examine how investing in cognitive and non-cognitive skills of low-income children 3 to 4 years of age affects their long-term outcomes, and (ii) to evaluate the effectiveness of investing directly in children's education versus indirectly through their parents. To that end, families of over 2,000 children were randomized into either one of the four preschool programs (henceforth "Pre-K") or one of the two parental-education programs (henceforth "Parent Academy") or a control group.
The Parent Academy was designed to teach parents to help their child with cognitive skills, such as counting and spelling, as well as non-cognitive skills, such as working memory and self-control.
The curriculum for Parent Academy was adapted from two effective preschool curricula: Tools of the Mind, which focuses on fostering non-cognitive skills, and Literacy Express, which focuses on improving cognitive skills. 5 The curriculum was delivered to parents in eighteen 90-minute sessions, which were held every two weeks over a nine-month period. Parent Academy families had the opportunity to earn up to $7,000 per year and could participate until their child entered kindergarten. Earnings were based on parents' attendance, their performance on homework, and their child's performance on the interim and end-of-year assessments. The two Parent Academy treatments differed only in how they administered incentives. Payments to families in the "Cash" treatment were made via cash/direct deposits, whereas payments to families in the "College" treatment were deposited into an account that could only be accessed once the child was enrolled in a full-time post-secondary institution.
Besides the Parent Academy, CHECC delivered four preschool programs in which children were treated directly. We refer to these programs as Pre-K treatments. These four treatments differed in their curricula, as well as the duration and intensity of delivery. "Tools," "Literacy," and "Preschool Plus" were nine-month full-day programs delivered during the school year, whereas "Kinderprep" was a two-month half-day program delivered during the summer before a child entered kindergarten. 6 The curriculum for "Tools" was designed based on Tools of the Mind, which focused on improving non-cognitive skills, whereas "Literacy" was based on Literacy Express, which focused on fostering cognitive skills. A new curriculum called "Cog-X" was developed for "Preschool Plus" and "Kinderprep", which emphasized both cognitive and non-cognitive skills. 7

Randomization
Between 2010 and 2013, 2,185 children from low-income families in South Side, Chicago were recruited and randomized into either one of the six treatments or the control group. 8 The randomization took place once per year, at the beginning of each academic year. 9 Some children were randomized during more than one year, mainly to encourage families who were initially placed in the control group to stay engaged with CHECC for assessments, by offering them a chance to participate in future years. 10 The yearly randomization schedule created four cohorts of children we refer to by their year of randomization. 11 Table 1 summarizes the randomization schedule for each year of the program.
Notes (Table 1): The number of children randomized into each treatment group in each year of the intervention is reported. The bottom row presents the number of unique children in each group, over the course of four years.
5 See Fryer, Levitt, and List (2015) for more information on curriculum selection.
6 Preschool Plus and Kinderprep also offered a parental component that was much less extensive than the Parent Academies, both in terms of education time and incentives. Parent Academy parents could earn up to $7,000 based on their attendance, their performance on homework, and their child's performance on assessments, whereas Preschool Plus and Kinderprep parents could only earn up to $900 and $200, based merely on their attendance at parental workshops. Preschool Plus and Kinderprep treatments also offered less instruction time to parents. Whereas Parent Academy parents could spend 27 hours in parental workshops, Preschool Plus and Kinderprep parents were offered a maximum of 21 and 6 hours of parental education, respectively. The intensity of the preschool component of Preschool Plus was similar to that of "Tools" and "Literacy Express" in terms of instruction time.
7 Fryer et al. (2018) evaluated Preschool Plus and Kinderprep under the assumption that the programs did not affect the outcomes of the control group. Through comparing the performance of treatment and control children, the authors found "Cog-X" treatments significantly improved cognitive scores (by about one quarter of a standard deviation), but failed to find any significant effects on non-cognitive scores. For more information on Pre-K programs, see Fryer et al. (2018).
8 See Appendix A for maps of residential addresses.
9 The exceptions were years three and four of the intervention during which randomization took place twice per year: In the first randomizations, children were randomized into either the nine-month preschool program, the summer Kinderprep program, or the control group; and in the second, a smaller group of families were recruited and randomized into either the summer kindergarten preparation program or the control group. Table 1 combines the two randomizations.
10 As a result, some children who were in the control group in an earlier year were randomized into a treatment group in later randomizations. In a few cases, a child who was randomized into a treatment group in an earlier year was assigned to a different (or the same) treatment in later randomizations. Overall, 1,675 children were randomized only once, 509 were randomized twice, and one child was randomized in three years.
11 Those children who were randomized in multiple years also appear in multiple cohorts.

Assessments
Our key outcome measures are children's performances in cognitive and non-cognitive assessments, which were used to evaluate the programs. These assessments consist of a pre-assessment administered to all incoming students prior to randomization, a mid-assessment between January and February, a post-assessment, which occurred in May, immediately after the school year ended, and a summer assessment at the end of the summer. Besides the assessments that took place during the program year, graduated children were also assessed annually every April, starting the year after they finished the program. These assessments are referred to as age-out assessments. Appendix B presents the assessment schedule for all four cohorts.
Assessments included both cognitive and non-cognitive components and were administered by a team of trained assessors who were blind to the child's treatment. The cognitive component used a series of nationally normed tests, measuring general intellectual ability and specific cognitive abilities such as reading, writing, and mathematics. The non-cognitive component included a combination of subtests measuring executive functions such as working memory, inhibitory control, and attention shifting, as well as a questionnaire completed by assessors, which measured self-regulation in emotional, attentional, and behavioral domains.
3 Data and the Econometric Model
3.1 Data

Construction of outcome variables
Our outcome measures are indices generated from standardized test scores on cognitive and non-cognitive assessments. 12 The cognitive assessment included the Peabody Picture Vocabulary Test [...] working memory (Operation Span), and attention shifting (Same Game) and the Preschool Self-Regulation Assessment (PSRA), which is designed to assess self-regulation in emotional, attentional, and behavioral domains. 13 A cognitive index was made up of averaged percentile scores on each cognitive subtest, and a non-cognitive index was made up of average percent-correct scores on each non-cognitive subtest. The two indices were then standardized by the type of assessment (pre-assessment, mid-assessment, etc.), including the entire study population (treated and control) who took that assessment, to obtain a zero mean and standard deviation of one.
To explore the spatial spillovers on both treatment and control children, we construct three samples: a pooled sample, including observations from both treatment and control groups; a control sample, including data from control children; and a treatment sample, including observations from treated children. Our treatment sample pools observations from children who were randomized into any of the programs. We include observations from the baseline to the fourth age-out assessment. Our final control, treatment, and pooled samples include 2,442, 3,074, and 5,208 observations, respectively. 14 Appendix C presents the details regarding how we construct these three samples. Table 2 provides summary statistics on the baseline demographic variables for our pooled sample. Note the majority (90%) of the children are either African American or Hispanic, and 53% live in families with an annual household income under $35,000.
Notes (Table 2): Summary statistics for baseline demographic variables are presented. For education levels, Some high school but not diploma includes parents with a GED or high school attendance without a diploma, College degree includes associate's, bachelor's and master's degrees, Less than high school includes an education level below 9th grade or no formal schooling, and Other includes vocational/technical or other unclassified programs. Standard deviations are reported in parentheses.
12 These indices were constructed by Fryer et al. (2015, 2018) for the original evaluations of the programs.
13 Because Blair and Willoughby tests are designed for preschool, a new test was added for assessments that were administered to older children (age-out assessments). For children in kindergarten or older, the Same Game test of Blair and Willoughby was replaced with a variant of Wisconsin Card Sort game, which measures attention shifting for children of that age.
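The index construction described in this subsection (average the subtest scores, then standardize within each assessment type over everyone who took that assessment) can be sketched as follows. This is a minimal illustration rather than the authors' code: the record layout, the function names, the toy scores, and the use of the population standard deviation are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_index(subtest_scores):
    """Average a child's subtest scores into a single raw index."""
    return mean(subtest_scores)


def standardize_by_assessment(records):
    """Z-score the raw index within each assessment type.

    `records` is a list of dicts with keys 'child', 'assessment', 'index'
    (a hypothetical layout). Treated and control children are pooled.
    Assumption: population SD is used; the source does not say which.
    """
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec["assessment"]].append(rec["index"])
    stats = {a: (mean(v), pstdev(v)) for a, v in by_type.items()}

    standardized = []
    for rec in records:
        mu, sd = stats[rec["assessment"]]
        standardized.append({**rec, "z": (rec["index"] - mu) / sd})
    return standardized


# Toy data: three children, two assessment types, made-up subtest scores.
raw = [
    {"child": "a", "assessment": "pre", "index": build_index([35, 45])},
    {"child": "b", "assessment": "pre", "index": build_index([50, 50])},
    {"child": "c", "assessment": "pre", "index": build_index([55, 65])},
    {"child": "a", "assessment": "mid", "index": build_index([50, 60])},
    {"child": "b", "assessment": "mid", "index": build_index([60, 70])},
    {"child": "c", "assessment": "mid", "index": build_index([70, 80])},
]
scored = standardize_by_assessment(raw)
```

Standardizing within assessment type means a score of zero always corresponds to the study-population average on that particular assessment, so scores are comparable across assessment waves only in relative terms.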

Addresses and neighbor counts
To estimate the spatial spillover effects from the intervention, we calculate the number of treated neighbors of a child at a given time and use it as a measure of spatial exposure to treatments.
To do so, we start by calculating commuting distances between the home locations of all pairs of children who were randomized during the intervention. 15 Commuting distances are calculated by considering the street network structure and its restrictions (e.g., one-way roads and U-turns) and finding the shortest driving distance between each pair. The average travel distance between a pair of children in our sample is 8.52 kilometers (std. dev. = 8.07), and 99.8% of the sample resides within 60 kilometers of each other. Figure 1 presents a histogram of travel distances between home locations of all children who were randomized during the intervention.

14 Note that the number of observations in the pooled sample is smaller than the sum of the number of observations in our control and treatment samples. The reason is that in a few cases, when a child was first randomized into the control group and was placed into a treatment group in later randomizations, the pooled sample only includes observations that took place after the child was randomized into treatments. See Appendix C for more information.
We define a pair as neighbors if the commuting distance between the two is less than $r$ kilometers, and we call $r$ the neighborhood radius. We conduct our analysis for various values of the neighborhood radius. We then calculate the number of treated ($N^{\text{treated}}_{i,t|r}$) and control ($N^{\text{control}}_{i,t|r}$) neighbors of each child $i$ at the time of her assessment $t$, and define the total number of CHECC neighbors of $i$ as $N^{\text{total}}_{i,t|r} = N^{\text{treated}}_{i,t|r} + N^{\text{control}}_{i,t|r}$. Note that as more children are randomized into treatment and control groups over the four years of the intervention, the numbers of treated and control neighbors vary over time. 16

Table 3

15 Distances were calculated using the ArcGIS OD Cost Matrix Analysis tool.
16 Because more children were receiving the treatments over the four-year span of the intervention, and a neighbor who was in the control group in an earlier randomization might be assigned into a treatment group in later years, $N^{\text{control}}_{i,t|r}$ can either increase or decrease over time. However, $N^{\text{treated}}_{i,t|r}$ can only increase over time, because no child who was already treated could be assigned into the control group in later years.
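The neighbor counts defined above can be illustrated with a minimal sketch. It assumes a precomputed matrix of pairwise commuting distances and a status vector observed at a single assessment date; the function name, the argument names, and the coding of the status vector are hypothetical and are not taken from the study's own code.

```python
import numpy as np


def neighbor_counts(dist_km, status, radius_km):
    """Count treated, control, and total neighbors within radius_km.

    dist_km   : (n, n) array of pairwise commuting distances in kilometers.
    status    : length-n array with a hypothetical coding:
                1 = treated, 0 = control, -1 = not yet randomized
                at the assessment date.
    radius_km : neighborhood radius r.
    """
    dist_km = np.asarray(dist_km, dtype=float)
    status = np.asarray(status)
    # A pair counts as neighbors when the distance is strictly below r.
    within = dist_km < radius_km
    # A child is never counted as her own neighbor.
    np.fill_diagonal(within, False)
    n_treated = (within & (status == 1)[None, :]).sum(axis=1)
    n_control = (within & (status == 0)[None, :]).sum(axis=1)
    # Total CHECC neighbors are the sum of treated and control neighbors.
    return n_treated, n_control, n_treated + n_control
```

Calling the function once per assessment date, with the status vector updated after each randomization, would reproduce the time variation in the counts described in the text.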

Econometric model
We exploit three unique features of our data to estimate the spillover effects. Although the experimentally induced variation in our exposure measure serves as an important feature, which we exploit for identification, our estimation strategy does not rely on it exclusively. Given our limited sample size and the fact that the intervention was not designed to measure spatial spillovers, our exposure measure could be correlated with individual- or neighborhood-level unobservable characteristics. Therefore, we exploit the panel nature of our data to provide clean estimates of the spillover effects. The above three properties allow us to estimate the spillover effects using within-individual variation in our exposure measure ($N^{\text{treated}}_{i,t|r}$) through a fixed-effects specification. This technique uses the variation in spatial exposure over time and controls for any unobserved time-invariant individual-, family-, or neighborhood-level characteristics that might be correlated with our exposure measure.

We estimate spatial spillover effects from CHECC using the individual fixed-effects specification in equation (1), where $Y_{i,t}$ is the standardized cognitive or non-cognitive test score of a child $i$ on test $t$, $N^{\text{treated}}_{i,t|r}$ represents the number of treated neighbors of $i$ at time $t$ as previously defined, and $N^{\text{total}}_{i,t|r}$ represents the total number of CHECC neighbors of $i$ at time $t$.

(a) r = 1 km

[...] both within-individual and between-individual variation in spatial exposure to treated neighbors to estimate spillover effects. As we will further discuss in Section 5, our findings are robust to using this alternative specification.
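A fixed-effects specification consistent with the definitions above can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' exact equation: the intercept $\beta_0$, the coefficient label $\lambda$ on the total-neighbor count, and the error term $\varepsilon_{i,t}$ are placeholders, while $\gamma_i$ and $\delta_t$ follow the individual and time fixed-effect symbols that the paper uses in later sections.

```latex
\[
Y_{i,t} = \beta_0
        + \beta_1 \, N^{\text{treated}}_{i,t|r}
        + \lambda \, N^{\text{total}}_{i,t|r}
        + \gamma_i + \delta_t + \varepsilon_{i,t}
\]
```

Because $N^{\text{total}}_{i,t|r}$ is held constant, $\beta_1$ in a form like this would capture the effect of replacing a control neighbor with a treated one, which matches the interpretation the paper later gives to the program-specific coefficients.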

Main findings
We estimate spillover effects for neighborhood radii of 3, 5, and 7 kilometers. As Figure 2 suggests, when the neighborhood is defined too narrowly, the variation in $N^{\text{treated}}_{i,t|r}$ becomes too small, limiting our power to estimate the effects. Therefore, we consider neighborhood radii of 3 kilometers and larger, which provide us with enough variation in $N^{\text{treated}}_{i,t|r}$ to estimate the effects.
Arguably, these choices of neighborhood radii are economically relevant. According to the National Household Travel Survey, the average commuting distance to school for a 6- to 12-year-old child is about 6 kilometers (3.6 miles). 18 The average travel time from home to work for a Chicago Heights resident is estimated to be 26.1 minutes (US Census Bureau statistics), which translates to about 21 kilometers at a speed of 30 miles per hour. 19 Because schools and workplaces provide natural interaction spaces for children and their parents, we can reasonably assume our choices of neighborhood radii are relevant distances within which social interactions can generate spillovers.

Table 4 presents estimated $\beta_1$'s from equation (1) [...]

Columns (2) and (3) parse the effects on cognitive scores by treatment assignment. These estimates reveal that both treatment and control children benefit from living close to treated families. While the control group benefits slightly more than the treatment group from cognitive spillover effects, the difference is not significant at conventional levels. 21 Columns (5) and (6) report the spillover effects on non-cognitive scores by treatment assignment. These estimates illustrate that the treatment and control children both benefit from non-cognitive spillovers. The estimated spillover effects on non-cognitive scores for the control and treatment groups are very similar and are not significantly different across the two groups. 22 These findings are robust to the choice of neighborhood radius, $r$. 23 In sum, we document significant positive spillover effects on both cognitive and non-cognitive test scores and find the effects are significantly larger for non-cognitive scores. 24

Notes: Spillover effects from each additional treated neighbor ($\beta_1$) estimated from equation (1) are presented. Columns 1-3 (4-6) represent the average spillover effects from an additional treated neighbor on a child's standardized cognitive (non-cognitive) score. Robust standard errors, clustered at the census-block-group level, are in parentheses; *** p<0.01, ** p<0.05, * p<0.1
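The within-individual estimation idea behind these estimates can be sketched in a few lines. This is a simplified illustration and not the authors' estimation code: it removes each child's own mean from the outcome and the regressors and then runs ordinary least squares, and it leaves out the time fixed effects and the clustered standard errors reported in the table. The function and argument names are hypothetical.

```python
import numpy as np


def _demean(values, group_index, group_sizes):
    """Subtract each group's mean from a 1-D array of values."""
    group_means = np.bincount(group_index, weights=values) / group_sizes
    return values - group_means[group_index]


def within_estimator(y, X, ids):
    """Fixed-effects (within) estimator via individual demeaning.

    y   : length-n outcome (e.g. a standardized test score).
    X   : (n, k) regressors (e.g. treated and total neighbor counts).
    ids : length-n child identifiers; each child appears in several rows.
    Returns the length-k vector of slope coefficients.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    _, group_index = np.unique(np.asarray(ids), return_inverse=True)
    group_index = group_index.ravel()
    group_sizes = np.bincount(group_index)
    # Demeaning removes any time-invariant child-level component.
    y_dm = _demean(y, group_index, group_sizes)
    X_dm = np.column_stack(
        [_demean(X[:, k], group_index, group_sizes) for k in range(X.shape[1])]
    )
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta
```

With the treated-neighbor count in the first column of `X` and the total-neighbor count in the second, the first element of the returned vector would play the role of $\beta_1$. Only children whose exposure changes between assessments contribute to that estimate, which is why very small radii leave little variation to work with.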

Spatial fade-out
A closer look at the estimated $\beta_1$'s reported in Table 4 suggests an important spatial pattern: the spillover effect from an additional treated neighbor becomes smaller as we broaden the neighborhood radius from 3 to 7 kilometers. To further explore this pattern and shed light on the relationship between spillover effects and distance, we provide Figure 3, which shows the estimated $\beta_1$'s for a broader range of $r$'s. 25 Note that the effects on both cognitive and non-cognitive scores operate very locally. As we increase the neighborhood radius, the marginal spillover effects from an additional treated neighbor monotonically decrease. Because a larger neighborhood radius corresponds to a longer average distance to neighbors, the negative relationship between $\hat{\beta}_1$ and $r$ implies that, as the distance between a child and her treated neighbor grows, the spillover effect on both cognitive and non-cognitive scores weakens. Specifically, the average spillover effect on a child's non-cognitive score from a treated neighbor within a 3-kilometer radius is about twice as large as the effect from a treated neighbor who resides within a 7-kilometer radius (0.0069σ vs. 0.0033σ). Similarly, the average effect on the cognitive score from an additional treated neighbor who lives within 3 kilometers of a child is about twice as large as the effect from a treated neighbor who resides within a 7-kilometer radius (0.0033σ vs. 0.0018σ). In summary, we find that the spillover effects on both cognitive and non-cognitive scores are localized and decrease as the distance to a treated neighbor's home increases.

22 The p-values from the Wald test of equal $\beta_1^{\text{ncog}}$ for the treatment and control groups for neighborhood radii of 3, 5, and 7 kilometers are 0.80, 0.84, and 0.78.
23 Appendix D breaks down these effects by subtests and explores which components of the cognitive/non-cognitive index generate the effects.
24 In Appendix E, we explore the potential role of sorting by estimating the effects using a subsample of children who attended the majority of assessments. Our evidence suggests that selection is not an important factor in generating our results.
25 The point estimates are reported in Appendix F.
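The "about twice as large" comparisons can be checked directly from the point estimates quoted in the text; the ratios below use only those four numbers.

```latex
\[
\frac{\hat{\beta}_1^{\text{ncog}}(r = 3)}{\hat{\beta}_1^{\text{ncog}}(r = 7)}
  = \frac{0.0069}{0.0033} \approx 2.1,
\qquad
\frac{\hat{\beta}_1^{\text{cog}}(r = 3)}{\hat{\beta}_1^{\text{cog}}(r = 7)}
  = \frac{0.0033}{0.0018} \approx 1.8
\]
```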

Figure 3: The spillover effect from having an additional treated neighbor on a child's standardized cognitive and non-cognitive scores, as functions of neighborhood radius. Estimates are shown for the Pooled, Control, and Treated samples.

Heterogeneous effects
Fryer, Levitt, and List (2015) evaluated the parent-education component of CHECC (Parent Academy) by comparing the outcomes of treatment and control children, under the assumption that treatments did not induce externalities to the control group. They found that the assignment to Parent Academies increases a child's non-cognitive scores by 0.203σ, but does not significantly impact cognitive scores. Moreover, the authors reported positive treatment effects on cognitive and non-cognitive scores for Hispanics, but did not find any significant treatment effects on African American children. Parent Academy was also reported to have slightly larger effects on girls than boys, although the gender differences were not significant. Motivated by the heterogeneity in treatment effects from the Parent Academy component of the intervention reported in Fryer, Levitt, and List (2015), we investigate whether children of different races (or gender) benefit differently from spillover effects. We do so by estimating equation (1), separately by race and gender.
Since African Americans and Hispanics make up over 90% of our sample, our analysis of heterogeneity along race focuses on these two groups. Panel (a) of Table 5 and Figure 4 present $\hat{\beta}_1$'s separately for African American and Hispanic children. Comparing the effects across races, we find no significant differences in cognitive spillover effects between Hispanics and African Americans. In contrast to the effects on cognitive scores, spillovers on non-cognitive scores are significantly larger for African Americans than Hispanics. 26 On average, an additional treated neighbor increases the non-cognitive scores of an African American child by about two to three times as much as a Hispanic child. For instance, an additional treated neighbor within a 3-kilometer radius increases the non-cognitive score of a Hispanic child by 0.0045σ, whereas it increases an African American child's non-cognitive score by 0.0100σ.
Panel (b) of Table 5 and Figure 4 present the estimated effects by gender. Overall, boys benefit more than girls from both cognitive and non-cognitive spillovers. However, these gender differences are not statistically significant at conventional levels. 27 In Section 6, we shed light on the mechanism through which spillover effects operate, and discuss a potential source of heterogeneity in spillover effects along race and gender lines.
Note that our estimates cannot be directly compared with the heterogeneity in treatment effects reported by Fryer, Levitt, and List (2015) due to two main differences between our samples. First, unlike Fryer, Levitt, and List (2015), who report heterogeneous effects from the Parent Academy, our analysis uses data from all CHECC programs and considers heterogeneity in spillover effects on [...]

26 The p-values from Wald tests of the null of equal $\beta_1$ for Hispanic and African American children's cognitive (non-cognitive) test scores (in the pooled sample) are 0.15 (0.07), 0.10 (0.04), and 0.17 (0.07) for neighborhood radii of 3, 5, and 7 kilometers, respectively.
27 The p-values from Wald tests of the null of equal $\beta_1$ for cognitive (non-cognitive) test scores for boys and girls (in the pooled sample) are 0.12 (0.47), 0.13 (0.50), and 0.10 (0.47) for neighborhood radii of 3, 5, and 7 kilometers, respectively.

[...] results. We formulate this alternative specification as equation (2), where $Y_{i,t}$, $N^{\text{treated}}_{i,t|r}$, and $N^{\text{total}}_{i,t|r}$ are defined as previously. We control for the lagged cognitive and non-cognitive test scores through $Y_{i,t-1}$, and include census-block-group fixed effects ($\sigma_b$) as well as time and cohort fixed effects ($\delta_t$ and $\mu_c$). $X_i$ represents a vector of time-invariant characteristics including gender, race, and age at the time of the baseline assessment. Under this specification, $\beta_1$ reflects the average spillover effect from an additional treated neighbor who lives within radius $r$ on a child's standardized cognitive or non-cognitive test score. Table 6 presents $\beta_1$'s, estimated from equation (2), for three different neighborhood radii: 3, 5, and 7 kilometers. The standard errors, reported in parentheses, are clustered at the census-block-group level to allow for common error components within geographical units.
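A lagged-dependent-variable (LDV) specification consistent with the terms described above might look as follows. This is a sketch under stated assumptions: the coefficient labels $\lambda$, $\rho$, and $\phi$, the intercept, and the error term are placeholders, and $Y_{i,t-1}$ may stand for a vector containing both lagged scores, as the text suggests.

```latex
\[
Y_{i,t} = \beta_0
        + \beta_1 \, N^{\text{treated}}_{i,t|r}
        + \lambda \, N^{\text{total}}_{i,t|r}
        + \rho \, Y_{i,t-1}
        + X_i' \phi
        + \sigma_b + \delta_t + \mu_c
        + \varepsilon_{i,t}
\]
```

Unlike the fixed-effects form, a specification of this kind has no child-specific intercept, so it draws on differences in exposure between children as well as changes in exposure over time.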
Consistent with the results presented in Section 4.1, we find significant spillover effects on both cognitive and non-cognitive test scores, with larger effects on non-cognitive scores. 28 The comparison between the effects on treated and control children leads to a conclusion similar to the one we reported under the fixed-effects specification: we do not find any significant differences in spillover effects between children who were assigned to the treatment and control groups. 29 The estimates from the two models are also similar in magnitude: an additional treated neighbor within 3 kilometers of a child is estimated to increase that child's cognitive score by 0.0033σ under the fixed-effects specification and by 0.0042σ under the LDV model. Similarly, an additional treated neighbor residing within 3 kilometers is estimated to increase the child's non-cognitive score by 0.0069σ and 0.0070σ under the fixed-effects and LDV specifications, respectively.

Notes: Spillover effects from each additional treated neighbor ($\beta_1$) estimated from equation (2) are presented. Columns 1-3 (4-6) represent the effects from an additional treated neighbor on a child's standardized cognitive (non-cognitive) score. Robust standard errors, clustered at the census-block-group level, are in parentheses; *** p<0.01, ** p<0.05, * p<0.1

Heterogeneous effects

Table 7 reports the spillover effects by race and gender estimated from the LDV specification (equation (2)). Focusing on race (panel (a)), we find no significant differences in spillovers on cognitive scores between African Americans and Hispanics. 30 Similar to the fixed-effects estimates, the LDV estimates of non-cognitive spillover effects reveal a large and significant racial gap, with African Americans benefiting about three to four times as much as Hispanics from their treated neighbors. 31 Table 7 also reports the effects from the LDV specification by gender, and reveals that, in general, boys tend to benefit slightly more than girls from spillover effects. However, the gender differences are not significant at conventional levels. 32

30 The p-values from Wald tests of the null hypothesis of equal spillover effects on cognitive scores ($\beta_1$) for Hispanics and African Americans in the pooled sample are 0.59, 0.52, and 0.50 for neighborhood radii of 3, 5, and 7 kilometers, respectively.
31 The p-values from Wald tests of the null hypothesis of equal spillover effects on non-cognitive scores ($\beta_1$) for Hispanics and African Americans in the pooled sample are 0.001, 0.005, and 0.020 for neighborhood radii of 3, 5, and 7 kilometers, respectively.

Notes: Spillover effects from an additional treated neighbor on cognitive and non-cognitive test scores, estimated from the LDV specification (equation (2)). Panel (a) presents the effects separately for African American and Hispanic children. Panel (b) reports the effects separately for boys and girls. Robust standard errors, clustered at the census-block-group level, are in parentheses; *** p<0.01, ** p<0.05, * p<0.1

6 Exploring the Mechanisms

Parent Academy versus Pre-K
Since the intervention offered education programs for both children and parents, one might expect the spillover effects to operate through the social network of children via direct interactions between them, or indirectly through their parents' social networks. To shed light on which channel generates stronger effects, we start by comparing spillovers from treated neighbors who were assigned to the parent-education programs (Parent Academy neighbors) with the effect from those who were assigned to the preschool programs (Pre-K neighbors). If spillovers mainly operate through parents' social networks, then we might expect larger effects from Parent Academy neighbors than from Pre-K neighbors, because Parent Academy treatments involved parents more directly and more intensely.

To compare spillovers from Parent Academy and Pre-K neighbors, we estimate fixed-effects and LDV specifications, equations (3) and (4), respectively. 33 In these specifications, $N^{\text{Parent}}_{i,t,r}$ and $N^{\text{PreK}}_{i,t,r}$ represent the number of Parent Academy and Pre-K neighbors of a child $i$ who reside within a distance $r$ from $i$ at the time of her assessment $t$, and $N^{\text{total}}_{i,t|r}$, $Y_{i,t}$, $Y_{i,t-1}$, $X_i$, $\gamma_i$, $\sigma_b$, $\mu_c$, and $\delta_t$ are defined as previously. To simplify the analysis and retain statistical power, we construct $N^{\text{PreK}}_{i,t,r}$ and $N^{\text{Parent}}_{i,t,r}$ by pooling neighbors who were assigned to any of the preschool programs as Pre-K neighbors, and pooling those who were assigned to either of the two parent-education programs as Parent Academy neighbors. Under the above specifications, $\beta_p$ reflects the average spillover effect from an additional Parent Academy neighbor, holding $N^{\text{PreK}}_{i,t,r}$ and $N^{\text{total}}_{i,t|r}$ constant. In other words, $\beta_p$ represents the average effect of substituting a control neighbor with a Parent Academy neighbor. Similarly, $\beta_c$ represents the average spillover effect from an additional Pre-K neighbor on a child's test scores.
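The two program-specific specifications can be sketched from the terms listed above. As before, this is a reconstruction under assumptions: the intercept, the labels $\lambda$, $\rho$, and $\phi$, and the error term are placeholders, while $\beta_p$, $\beta_c$, $\gamma_i$, $\sigma_b$, $\mu_c$, and $\delta_t$ are the symbols named in the text.

```latex
\begin{align}
\text{Fixed effects:}\quad
Y_{i,t} &= \beta_0
         + \beta_p \, N^{\text{Parent}}_{i,t,r}
         + \beta_c \, N^{\text{PreK}}_{i,t,r}
         + \lambda \, N^{\text{total}}_{i,t|r}
         + \gamma_i + \delta_t + \varepsilon_{i,t} \\
\text{LDV:}\quad
Y_{i,t} &= \beta_0
         + \beta_p \, N^{\text{Parent}}_{i,t,r}
         + \beta_c \, N^{\text{PreK}}_{i,t,r}
         + \lambda \, N^{\text{total}}_{i,t|r}
         + \rho \, Y_{i,t-1} + X_i' \phi
         + \sigma_b + \delta_t + \mu_c + \varepsilon_{i,t}
\end{align}
```

Since Parent Academy, Pre-K, and control neighbors add up to $N^{\text{total}}_{i,t|r}$, control neighbors are the omitted category in forms like these, which is what allows $\beta_p$ and $\beta_c$ to be read as substitution effects relative to a control neighbor.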
Note that a child $i$ may benefit from a Parent Academy neighbor $k$ through two channels. The first channel is the parents' social networks: $k$'s parents may influence the behavior and decisions of $i$'s parents, which may in turn shape $i$'s development. Such effects can occur through information externalities (i.e., $k$'s parents share their acquired knowledge from Parent Academy with $i$'s parents) or preference externalities between parents. The second channel is peer influence: if Parent Academy improves $k$'s outcomes, then child $i$ might benefit from direct interactions with child $k$. The benefits from a Pre-K neighbor, however, are likely to spill over mainly through direct interactions between children (peer influence) because parents are not the main target of the Pre-K treatments. 34 Thus, although $\hat{\beta}_p$ might reflect spillovers through both the parents' and the child's social networks, $\hat{\beta}_c$ is more likely to reflect an effect that is mainly driven by direct interactions between children. Table 8 reports estimated $\beta_p$ and $\beta_c$ for neighborhood radii of 3, 5, and 7 kilometers from the fixed-effects and LDV specifications. Focusing on non-cognitive scores, the estimates from both models suggest larger spillover effects from Pre-K neighbors than Parent Academy neighbors ($\hat{\beta}_p < \hat{\beta}_c$). 35 According to the fixed-effects estimates, an additional Pre-K neighbor within a 3-kilometer radius of a child increases her non-cognitive score by 0.0099σ, whereas an additional Parent Academy neighbor only induces a 0.0045σ increase in the non-cognitive test score. Similarly, the LDV estimates suggest an additional Pre-K neighbor within a 3-kilometer radius of a child increases her non-cognitive test score by 0.0108σ, whereas an extra Parent Academy neighbor within the same radius induces only a 0.0017σ increase in her non-cognitive test score. The larger spillovers from Pre-K neighbors than from Parent Academy neighbors suggest that direct social interactions between children (rather than between parents) play an important role in generating the non-cognitive effects.
Unlike the effects on non-cognitive scores, we find no significant differences in spillover effects on cognitive scores from Parent Academy and Pre-K neighbors. 36

33 In a few cases in which a treated neighbor $k$ was first assigned to a Parent Academy (Pre-K) treatment and assigned to Pre-K (Parent Academy) in later years, $k$ is counted as a Parent Academy (Pre-K) neighbor for the observations prior to the second randomization, and as a Pre-K (Parent Academy) neighbor for the observations following the second randomization.
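The comparisons between $\beta_p$ and $\beta_c$ rest on Wald tests of equal coefficients. A minimal sketch of such a test for a single restriction is given below. It assumes the two estimates and their sampling variances and covariance (in practice taken from the cluster-robust covariance matrix) are already available; the function name and arguments are hypothetical, and the numbers in the usage note are illustrative rather than taken from the paper.

```python
import math


def wald_equal_coefs(b1, b2, var1, var2, cov12=0.0):
    """Wald test of H0: b1 == b2 against H1: b1 != b2.

    var1, var2 : estimated variances of b1 and b2.
    cov12      : estimated covariance between b1 and b2.
    Returns (W, p), where W is compared with a chi-square(1) distribution.
    """
    diff_var = var1 + var2 - 2.0 * cov12
    if diff_var <= 0.0:
        raise ValueError("variance of the difference must be positive")
    w_stat = (b1 - b2) ** 2 / diff_var
    # For one degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w_stat / 2.0))
    return w_stat, p_value
```

For instance, `wald_equal_coefs(0.010, 0.004, 4e-6, 4e-6)` uses made-up variances to show the mechanics: the statistic is the squared difference divided by the variance of the difference. When the two estimates come from the same regression, the covariance term generally matters and should not be left at zero.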

Heterogeneity in neighborhood-level social interactions
Our estimates presented in Sections 4 and 5 suggested that non-cognitive spillover effects are significantly larger for African Americans than Hispanics. We also found spillover effects to be larger for boys than girls, although the gender difference was not significant at conventional levels. Given our evidence suggesting direct interactions with peers at the neighborhood level might be an important mechanism in generating non-cognitive spillover effects, one might hypothesize that the racial and gender differences in non-cognitive spillover effects originate from racial and gender differences in social interactions within neighborhoods.

We explore this idea using data from the National Longitudinal Study of Adolescent Health survey (Add Health), 37 which includes measures of social interactions within neighborhoods. Add Health is a longitudinal study of a nationally representative sample of adolescents in grades 7 to 12 in the United States. Using the public-use data from Add Health, we explore racial and gender differences in variables that reflect children's social interactions within neighborhoods. Table 9 presents our findings. Consistent with our hypothesis, we find African American children are significantly (at p < 0.001) more likely than Hispanics, and boys are significantly (at p < 0.001) more likely than girls, to (i) know most people in their neighborhood, (ii) stop on the street to talk to someone from the neighborhood, and (iii) use recreation facilities in the neighborhood. Although these data speak to a slightly older age group than our subjects, we believe that these findings provide suggestive evidence that the higher level of neighborhood-level social interactions for African Americans (and for boys) is a possible cause of the larger non-cognitive spillover effects on these subgroups. More research is necessary, however, as we view this result as suggestive.

34 As described previously in Section 2, two out of the four Pre-K treatments (Preschool Plus and Kinderprep) had parental-education components that were not incentivized as heavily, and were much less intensive than the one offered to Parent Academy families. Appendix G examines the spillover from Pre-K neighbors who were randomized to preschool programs with and without the parental component and shows that our conclusions are not sensitive to pooling these two treatments together.
35 The p-values for the fixed-effects estimates from Wald tests of the null hypothesis $\beta_p = \beta_c$ against $\beta_p \neq \beta_c$ for r = 3 km, r = 5 km, and r = 7 km equal 0.004, 0.006, and 0.016, respectively. The corresponding p-values from the LDV estimates are 0.004, 0.010, and 0.03.
36 For the fixed-effects estimates, the p-values from Wald tests of the null hypothesis $\beta_p = \beta_c$ against $\beta_p \neq \beta_c$ for r = 3 km, r = 5 km, and r = 7 km are 0.82, 0.43, and 0.34, respectively. The corresponding p-values from the LDV estimates are 0.76, 0.14, and 0.14.
37 Add Health is a project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Information on how to obtain the Add Health data files is available on the Add Health website (http://www.cpc.unc.edu/addhealth). No direct support was received from grant P01-HD31921 for this analysis.

Notes: Columns 1 and 3 (2 and 4) report the average effect of an additional Parent Academy (Pre-K) neighbor who resides within distance $r$ of a child on her standardized cognitive and non-cognitive scores. Panel (a) presents the estimates from the fixed-effects specification (equation (3)), and panel (b) reports the LDV estimates (equation (4)). Robust standard errors, clustered at the census-block-group level, are in parentheses; *** p<0.01, ** p<0.05, * p<0.1

Spillovers and age
We might expect peer influence to play a more salient role in generating spillover effects for older than for younger children. This is because at younger ages children depend on their parents to put them in contact with peers; thus, their friendship network tends to be limited to children of their parents' acquaintances (Hartup, 1984). As children mature into middle childhood and enter school, their interactions with peers increase dramatically and start growing independently from their caregivers' social network (Higgins and Parsons, 1983). Previous research has documented that children's social contact, in terms of both the number of peers and the time spent with peers, substantially increases between the ages of 6 and 12 (Cook and Cook, 2009). Following this line of reasoning, one might expect the peer-influence channel of the spillover effects to play a smaller role in earlier years, when peer effects at the neighborhood level are more limited, and to become more pronounced as a child grows older. By exploring how the magnitude of cognitive and non-cognitive spillovers varies with age, we can gain deeper insight into the mechanisms generating the effects. We examine this idea by including an interaction term in equations (1) and (2), which yields equations (5) and (6), where $Age_{i,t}$ represents the age of child $i$ (in months) at the time of assessment $t$, and other arguments are defined as previously. Under the above specifications, $\beta_2$ reflects how spillover effects change as a child grows. The fixed-effects and LDV estimates of $\beta_2$ are reported in Table 10. Note that under both specifications, the estimated $\beta_2$'s for non-cognitive outcomes are positive and significant, suggesting older children benefit more than younger children from non-cognitive spillover effects. According to our fixed-effects (LDV) estimates, on average, the spillover effect on the non-cognitive test score of a 6-year-old child from treated neighbors within a 3-kilometer radius is 0.13σ (0.14σ) larger than the corresponding effect on a 5-year-old child. Although the positive relationship between non-cognitive spillovers and age cannot be taken as causal evidence for mechanisms and should be interpreted with caution, it is consistent with the hypothesis that direct peer influence is a likely channel in generating non-cognitive spillover effects. The estimates of $\beta_2$ for cognitive spillovers are negative, which can be regarded as a decline in cognitive spillover effects with age. However, these point estimates are generally not significant at conventional levels.

Empirical results presented in Sections 6.1, 6.2, and 6.3 provide suggestive evidence that peer influence at the neighborhood level serves as an important mechanism in generating non-cognitive spillover effects.

Notes: Estimated $\hat{\beta}_2$'s from equations (5) and (6) are presented. The point estimates represent how spillover effects from an additional treated neighbor change with a child's age (in months). Robust standard errors, clustered at the census-block-group level, are reported in parentheses; *** p<0.01, ** p<0.05, * p<0.1
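The age-interaction idea can be made concrete with a sketch of the fixed-effects version. The form below is an assumption built from the description in the text: $\beta_2$ multiplies the interaction between the treated-neighbor count and age in months, the labels $\beta_0$ and $\lambda$ and the error term are placeholders, and whether age also enters on its own is not stated in the text and is left out here.

```latex
\begin{align}
Y_{i,t} &= \beta_0
         + \beta_1 \, N^{\text{treated}}_{i,t|r}
         + \beta_2 \left( N^{\text{treated}}_{i,t|r} \times Age_{i,t} \right)
         + \lambda \, N^{\text{total}}_{i,t|r}
         + \gamma_i + \delta_t + \varepsilon_{i,t} \\
\frac{\partial Y_{i,t}}{\partial N^{\text{treated}}_{i,t|r}}
        &= \beta_1 + \beta_2 \, Age_{i,t}
\end{align}
```

Under a form like this, the spillover from one additional treated neighbor differs by $12\,\beta_2$ between two children who are one year (12 months) apart in age, so a positive $\beta_2$ corresponds to spillovers that grow as children get older.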

Parental Investment
Spillover effects can also operate through influencing parental decisions, which affect children's development. Parents might learn from their neighbors about returns to investments and the most productive forms of investments in their children and adjust their choices accordingly. Our data include a self-reported measure of investment concerning parents' decision to enroll their child in preschool programs (other than CHECC). This variable was collected through a survey completed by parents, which was administered at the end of each program year (at the time of the children's post-assessment). 38

38 For the last cohort of families who were randomized in the program, this information was also collected from parents at the time of pre-assessment (in addition to the post-assessment).
We start by exploring whether this self-reported measure of parental investment has any predictive power regarding children's cognitive and non-cognitive performance, that is, whether enrolling one's child in a preschool program (other than CHECC) is associated with the child's cognitive or non-cognitive development. We do so by estimating the LDV model in equation (7), 39 where $Z_{i,t}$ is a binary variable indicating whether parents of a child $i$ reported enrolling $i$ in a preschool program during the school year prior to $t$. 40 All other terms are defined as previously. Under this specification, $\kappa$ reflects whether a parent's decision to enroll her child in a preschool program is correlated with the child's skill development ($Y_{i,t}$). 41 Table 11 presents the estimated $\kappa$ from equation (7). The estimates suggest a parent's decision to enroll her child in other (non-CHECC) preschool programs is significantly correlated with cognitive development and is associated with a 0.134σ higher cognitive test score, whereas the association with non-cognitive test scores is not significant at conventional levels.

In the next step, we explore whether spillover effects occur on a parent's decision to enroll her child in other programs. Our analysis of the spatial spillover effects on cognitive and non-cognitive test scores exploited the panel nature of our outcome variables, allowing us to estimate the causal spillover effects using individual fixed-effects or lagged-dependent-variable specifications. Unfortunately, for a large majority (over 90%) of our sample, the data on parental decisions to enroll children in non-CHECC programs were collected only once (at the time of children's post-assessment). Therefore, we cannot use the previous identification strategies to estimate the causal spillover effects on parents' investment decision. Instead, we rely on the OLS specification in equation (8) to explore this channel, where $Z_{i,t}$ represents parents' investment decision, and other variables are defined as previously. Under this specification, $\beta_1$ represents the relationship between having an additional treated neighbor residing within distance $r$ and a parent's decision to enroll her child in a non-CHECC preschool program. Table 12 presents estimated $\beta_1$'s from equation (8). Interestingly, our estimates suggest positive and significant spillover effects on this measure of parents' investment decision. Each additional treated neighbor residing within a 3-kilometer radius of a child's home increases the likelihood of the child's parents enrolling her in a preschool program by 0.55 percentage points.

39 Since for the vast majority (over 90%) of our sample, we observe the investment measure only once, we cannot estimate its effect on test scores using within-individual variation.
40 Controlling for $N^{\text{treated}}_{i,t|r}$ and $N^{\text{total}}_{i,t|r}$ or the child's treatment assignment does not change our point estimates very much.
41 Note that since $Z_{i,t}$ is not assigned randomly, we cannot interpret $\kappa$ as the causal effect of preschool enrollment on children's development without making additional assumptions. The purpose of this exercise is merely to explore whether this measure of parental investment is associated with a child's development rather than to establish a causal relationship.
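The two parental-investment regressions can be sketched from the terms named in the text. Both forms are assumptions: the intercepts, the labels $\lambda$, $\rho$, and $\phi$, and the error terms are placeholders. The controls in the second equation follow the conditioning set the text lists for this regression ($N^{\text{total}}_{i,t|r}$, $X_i$, $\sigma_b$, $\mu_c$, and $\delta_t$).

```latex
\begin{align}
\text{Test scores on enrollment (LDV):}\quad
Y_{i,t} &= \kappa_0 + \kappa \, Z_{i,t}
         + \rho \, Y_{i,t-1} + X_i' \phi
         + \sigma_b + \delta_t + \mu_c + \varepsilon_{i,t} \\
\text{Enrollment on exposure (OLS):}\quad
Z_{i,t} &= \beta_0
         + \beta_1 \, N^{\text{treated}}_{i,t|r}
         + \lambda \, N^{\text{total}}_{i,t|r}
         + X_i' \phi
         + \sigma_b + \delta_t + \mu_c + u_{i,t}
\end{align}
```

Because $Z_{i,t}$ is binary, the second equation is a linear probability model, so $\beta_1$ is read in probability units: a coefficient of 0.0055 corresponds to the 0.55 percentage-point change per additional treated neighbor quoted in the text.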
Given our previous finding that enrolling a child in a non-CHECC program is significantly associated with higher cognitive performance, this result provides suggestive evidence that influencing the parental decision to enroll a child in a preschool program is an important channel through which cognitive spillover effects operate. The above finding should be interpreted with caution for two reasons. First, our measure of parental investment decision is self-reported, and therefore is not an objective measure of the actual choices.
Second, given that the structure of our survey data prevents us from using individual fixed-effects or LDV methods in estimation, interpreting $\beta_1$ as the causal spillover effect on parents' decision would require an additional assumption that (conditional on $N^{\text{total}}_{i,t|r}$, $X_i$, $\sigma_b$, $\mu_c$, and $\delta_t$) our exposure measure, $N^{\text{treated}}_{i,t|r}$, is uncorrelated with unobserved individual-level characteristics. Recognizing these two caveats, we believe our results provide suggestive evidence that cognitive spillover effects might operate, at least partly, through influencing parents' decisions to enroll their child in a preschool program.

Total Impact
Having measured positive spillover effects from the programs delivered at CHECC, we now turn to estimating the total impact of the intervention, accounting for these indirect effects. Beyond estimating the total impact, we also (i) disentangle the direct and indirect (spillover) effects of the intervention, and (ii) estimate the size of the bias that would arise if we were to ignore spillovers.
Before presenting our evaluation strategy, we should emphasize three key features of this exercise.
First, whereas CHECC offered multiple education programs to both parents and children, the aim of this exercise is not to separately evaluate each program. Instead, we provide an overall evaluation of the intervention as a whole by pooling all treatments and accounting for spillovers. Second, given that we estimate spillovers using panel data over multiple years, our total effect is also based on observations over multiple years, starting at the time of randomization and terminating four years after the program ends for the cohort. Therefore, our analysis provides an average estimate of the total impact of CHECC over time, which includes the immediate as well as the longer-run effects. 42 Finally, to fix ideas and simplify the presentation of our results, we set the neighborhood radius to 3 kilometers. Appendix H presents our estimates of the total impact for a broader range of neighborhood radii and discusses the robustness of our findings to varying r.
The total impact of the intervention ($Total$) on a child $i$ who was randomized into one of the treatments (Parent Academies or Pre-K) can be expressed as the sum of the direct treatment effect and the total spillover effect from her treated neighbors.

We evaluate the total impact of the intervention by estimating an LDV model (equation 9) on our pooled sample, which includes observations from both the treated and control children. We focus on the LDV specification for this estimation because, unlike the fixed-effects model, it allows us to exploit between-individual variation to estimate the direct time-invariant treatment effect. In this model, $T_i$ is a treatment indicator, which equals 1 if $i$ was assigned to a treatment group and 0 otherwise, and $N^{treated}_{i,t|r}$, $N^{total}_{i,t|r}$, $X_i$, $\delta_t$, $\mu_c$, and $\sigma_b$ are defined as previously. We include an interaction term ($T_i \times N^{treated}_{i,t|r}$) to allow for different spillover effects on treatment and control children.

Under this specification, $\theta$ represents the average direct effect of the intervention on a treatment child ($Direct$), $\beta_1$ represents the average spillover effect from an additional treated neighbor on a control child ($SC_1$), and $\beta_1 + \lambda$ represents the average spillover effect from an additional treated neighbor on a treatment child ($ST_1$). Assuming linearity, the average total spillover effect from all treated neighbors on a control child ($SC_N$) can be expressed as $\bar{N}^{treated}_r \times \beta_1$, where $\bar{N}^{treated}_r$ represents the average number of treated neighbors who reside within distance $r$ of a child. 43 Likewise, the average total spillover effect from all treated neighbors on a child who was randomized into a treatment ($ST_N$) can be expressed as $\bar{N}^{treated}_r \times (\beta_1 + \lambda)$. Therefore, the total impact of the intervention on a treatment child ($Total$) is the sum of the direct and the spillover effects:

$$Total = \theta + \bar{N}^{treated}_r \times (\beta_1 + \lambda).$$

Table 13 reports the estimated coefficients from equation (9) for r = 3 kilometers. The first two columns present the estimates for the pooled sample, and the last four report estimates separately for African Americans and Hispanics. Focusing on the pooled sample, we find the average direct effects of being randomly assigned to a treatment group on a child's standardized cognitive and non-cognitive scores are 0.11σ and 0.05σ, respectively. The average total spillover effects on a control child's standardized cognitive and non-cognitive scores ($SC_N$) are estimated to be 0.75σ and 1.25σ.

42 For this reason, our estimates of the total impact cannot be directly compared to the ones reported in Fryer, Levitt, and List (2015), which are based on test scores from the assessments administered immediately after the programs ended.
The corresponding spillover effects on a treatment child are 0.71σ and 1.27σ. The total impact of being assigned to treatment (including both the direct and spillover effects) on a child's standardized cognitive and non-cognitive test scores is estimated to be 0.82σ and 1.32σ, respectively. Note that the total spillover effects on both cognitive and non-cognitive scores of treatment children are larger than the direct treatment effects, suggesting a large portion of the total impact is due to the network effects that emerge from interactions with other treated individuals. This finding implies that if one were to treat a single child in isolation, the average cognitive and non-cognitive treatment effects would be about 7 ($\approx 0.82/0.11$) and 26 ($\approx 1.32/0.05$) times smaller than the estimated impacts in the presence of spillovers from other treated children.
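The decomposition above is simple enough to check by hand, and a short Python sketch makes the arithmetic explicit. It uses only the pooled-sample point estimates quoted in the text (direct effects of 0.11σ and 0.05σ, spillovers on a treatment child of 0.71σ and 1.27σ); the class and attribute names are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactDecomposition:
    """Point estimates in standard-deviation units (illustrative container)."""

    direct: float             # theta: average direct effect on a treatment child
    spillover_treated: float  # ST_N: total spillover on a treatment child

    @property
    def total(self) -> float:
        # Total = Direct + ST_N
        return self.direct + self.spillover_treated

    @property
    def total_to_direct_ratio(self) -> float:
        # How many times larger the total impact is than the direct effect alone
        return self.total / self.direct


# Pooled-sample values as quoted in the text for r = 3 km
cognitive = ImpactDecomposition(direct=0.11, spillover_treated=0.71)
non_cognitive = ImpactDecomposition(direct=0.05, spillover_treated=1.27)

for label, d in (("cognitive", cognitive), ("non-cognitive", non_cognitive)):
    print(f"{label}: total = {d.total:.2f} sd, "
          f"total/direct = {d.total_to_direct_ratio:.1f}")
```

The ratios printed here are computed from rounded point estimates, so they should be read as approximate.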
The comparison across the last two rows of Table 13 shows that if we were to ignore the spillover effects, we would severely underestimate the total impact of the intervention. For example, if we were to disregard the spillover effects, we would have wrongly concluded the intervention only induced a 0.06σ and 0.07σ increase in cognitive and non-cognitive test scores, respectively. 44 These estimates on cognitive and non-cognitive scores are considerably smaller than our estimated effects, which account for spillovers to the control group ($Total$).

We report our estimates of impacts on African Americans and Hispanics in columns (3)-(6) of Table 13. 45 The first observation is that the direct effects of being randomized into a treatment group ($Direct$) on both cognitive and non-cognitive test scores are larger for Hispanics than African Americans, although the difference is only significant for cognitive skills. 46 While the racial differences in cognitive spillovers are small and insignificant (p > 0.40), non-cognitive spillovers are significantly larger for African Americans (p < 0.05). Overall, the intervention increases the cognitive scores for African American and Hispanic children who were randomized into treatment by 0.75σ and 1.38σ, which are not significantly different from each other (p = 0.45). By contrast, African American children who were offered the chance to participate in one of the programs gain more than their Hispanic counterparts in non-cognitive skills as a result of the intervention. The average total program impact ($Total$) on the non-cognitive test score of an African American treatment child is 1.99σ, which is significantly larger than 0.93σ, the corresponding effect on a Hispanic treatment child (p = 0.07). Importantly, disregarding spillover effects and evaluating CHECC by naively differencing the outcomes between control and treatment children results in a quite conservative representation of the findings. Our estimates, presented in the last row, suggest that ignoring spillovers would have led us to conclude that whereas Hispanic children benefited as a result of the intervention, African Americans gained nothing. Note that this conclusion was in fact the one Fryer, Levitt, and List (2015) reached in their evaluation of the Parent Academy intervention.

43 Appendix I shows these findings do not change much if we relax the linearity assumption and allow for quadratic and cubic terms.

44 Note these estimates are considerably smaller than the estimated effects of the Parent Academy arm of the intervention previously reported in Fryer, Levitt, and List (2015). Fryer, Levitt, and List (2015), who ignore spillover effects, report that Parent Academies increased cognitive and non-cognitive test scores by 0.119σ and 0.203σ by the end of the program year. Our $Total^{Standard}$ estimates of the whole intervention (which ignore spillover effects to the control group) on cognitive and non-cognitive scores are 1.99 (=0.119/0.06) and 2.9 (=0.203/0.07) times smaller than the effects reported by Fryer, Levitt, and List (2015). Two important factors cause the difference in these two sets of estimates. First, Fryer, Levitt, and List (2015) focus only on one treatment arm of the intervention (Parent Academies), whereas we estimate the impact of CHECC as a whole, and our sample includes observations from all children who were randomized over the four years of the intervention. Second, Fryer, Levitt, and List (2015) focus on the short-term treatment effects and compare test scores from assessments that were administered immediately at the end of a program year. Our analysis, on the other hand, uses data from up to four years after a program ends, meaning we rely on both short- and longer-term outcomes to estimate the impact. In fact, the treatment effects wear off and the differences in test scores between the treatment and control groups become smaller over time, which suggests our estimates of $Total^{Standard}$ should be smaller than the effects reported in Fryer, Levitt, and List (2015).

45 In our sample, on average, an African American (Hispanic) child has 129 (237) treated neighbors within 3 kilometers of her home.

46 The corresponding p-values for cognitive and non-cognitive scores are 0.006 and 0.17.

Conclusions
The traditional approach to evaluate early education programs is to assign children to treatment and control groups randomly and report the difference in mean outcomes. This approach rests on many assumptions, including that the program does not indirectly alter the outcomes of the control group-that is, there are no spillover effects. More broadly, any empirical exercise that evaluates the benefits and costs of a program maintains this general structure.
Using a standard approach from the literature to estimate spillover effects, we explore their empirical relevance in a large-scale early childhood intervention. In doing so, we document spillovers that are economically significant and much larger than we anticipated: on average, the intervention increases cognitive and non-cognitive test scores of a control child by 0.75σ and 1.25σ, respectively.
Beyond the main spillover effects, we observe interesting heterogeneities. For example, we find that non-cognitive spillover effects are larger for African Americans than Hispanics. In addition, our evidence suggests that non-cognitive spillovers are more likely to operate through children's rather than parents' social networks. We also find suggestive evidence that cognitive spillover effects are generated through influencing the parents' decision to enroll their child in an alternative form of treatment-an outside preschool program.
Our findings speak to several literatures. As noted above, we contribute to the growing number of studies that model human capital formation by estimating effects from programs and policy changes. To complete our empirical exercise, we draw from, and advance, the literature on measuring the role of neighborhoods in shaping children's short- and long-term outcomes. Given the importance of non-cognitive skills in children's future labor market and educational outcomes Specifically, our results suggest that interventions that promote social interactions both within participants and between participants and non-participants are likely to generate larger positive externalities on non-cognitive skills.
Our work also speaks to policymakers interested in the science of scaling programs (see, e.g., Al-Ubaydli et al., 2017b). As experimentalists, we have focused almost exclusively on how best to generate data to explore intervention effects and disentangle mechanisms. Yet, what has been lacking is a scientific understanding of how to make an optimal use of the research insights generated.
In particular, how should we use the experimental insights for policy purposes? Our findings suggest that traditional measures of early education impacts, which ignore externalities, are likely to be too pessimistic when such programs are taken to scale. Ignoring spillover effects can thus result in a severe underestimation of the program impact, leading to fewer programs being taken to scale than is optimal. Of course, this need not be a general result: in some cases, those treated may suppress the outcomes of those in the control group. More work is necessary to detail the nature and extent of scale-up effects when moving from scientific insight to policy.

Notes: Superscripts are cohort identifiers (randomization years). Pre = pre-assessment; Mid = mid-assessment; Post = post-assessment; SL = summer loss assessment; AOx = age-out assessment x years after the treatment ended.

C Details on Constructing the Samples
Here we present how we treat observations from children who were randomized in multiple years in constructing our panel data set for the control, treatment, and pooled samples.
We follow two rules in constructing the control sample: (i) For those control children who were randomized into treatments in later years, we only keep the observations that took place before their treatment started; (ii) for those who were randomized into the control group in more than one year, we only keep the observations corresponding to their first randomization.
In construction of our treatment sample, we follow two rules: (i) For those treatment children who were first randomized into the control and later into a treatment group, we only keep the observations after their treatment started; (ii) for those who were randomized twice and both times into a treatment group, we only keep the observations corresponding to the first randomization.

Electronic copy available at: https://ssrn.com/abstract=3385107

Finally, we use the following three rules in constructing our pooled sample: (i) For children who were randomized twice and both times into the control group, we only keep the observations that correspond to the first randomization; (ii) for children who were randomized twice and both times into a treatment group, we only keep the observations corresponding to their first randomization; and (iii) for those who were randomized twice, the first time into the control group and the second time into a treatment group, we only keep the observations corresponding to the second randomization.
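The three pooled-sample rules can be sketched as a small selection function. This is a minimal illustration in Python, not the authors' code: the record layout (`arms`, `observations`, `randomization`) and the function names are hypothetical, and any assignment pattern the rules do not mention (for example, treatment followed by control) is rejected rather than guessed at.

```python
CONTROL = "control"


def pooled_randomization_to_keep(arms):
    """Return the 0-based index of the randomization whose observations are kept.

    `arms` lists a child's assignments in chronological order, e.g.
    ["control", "treatment"]. Any arm other than "control" counts as treatment.
    """
    if not arms:
        raise ValueError("child has no randomization record")
    if len(arms) == 1:
        return 0  # randomized once: nothing to resolve
    if len(arms) == 2:
        first_is_control = arms[0] == CONTROL
        second_is_control = arms[1] == CONTROL
        if first_is_control == second_is_control:
            return 0  # rules (i) and (ii): same group twice, keep the first
        if first_is_control:
            return 1  # rule (iii): control then treatment, keep the second
    raise ValueError(f"pattern {arms!r} is not covered by the stated rules")


def build_pooled_sample(children):
    """Keep, for each child, only observations tied to the retained randomization."""
    pooled = []
    for child in children:
        keep = pooled_randomization_to_keep(child["arms"])
        pooled.extend(
            obs for obs in child["observations"] if obs["randomization"] == keep
        )
    return pooled


# Hypothetical records, for illustration only
children = [
    {"arms": ["control", "treatment"],
     "observations": [{"child": "a", "randomization": 0, "score": 0.1},
                      {"child": "a", "randomization": 1, "score": 0.4}]},
    {"arms": ["control", "control"],
     "observations": [{"child": "b", "randomization": 0, "score": -0.2},
                      {"child": "b", "randomization": 1, "score": 0.0}]},
]
pooled = build_pooled_sample(children)
```

Raising an error for uncovered patterns keeps the sketch from silently inventing a rule the text does not state.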
Three factors can result in a missing observation for a child in our control sample: (i) The child was absent on the assessment day; (ii) the child was moved to a treatment group in a later randomization and thus her outcomes (for the times after she had entered the treatment group) are not included in the sample; or (iii) the child belongs to a later cohort for which the assessment is taking place at a later date (April 2018 or after). Similarly, an observation from the treatment sample would be missing if (i) the child was absent on that assessment day; (ii) the child was previously in the control group and thus her outcomes (for the times before she entered the treatment group) are not included in the sample; or (iii) the corresponding assessment is taking place at a later date (April 2018 or later).

D Sub-tests
To explore which components of the cognitive and non-cognitive measures are more important in generating the spillover effects, we estimate a fixed-effects model in which $Y^k_{i,t}$ is the standardized score of a child $i$ at time $t$ on subtest $k$, and $N^{treated}_{i,t|r}$ and $N^{total}_{i,t|r}$ represent the number of treated neighbors and the total number of neighbors, as previously defined.
We include time and individual fixed effects, and cluster standard errors at the census-block-group level. Under this specification, $\beta_1$ represents the average spillover effect from an additional treated neighbor who resides within a radius $r$ of a child. As point estimates for $\beta_1$ presented in Table D

E Estimated Effects on a Restricted Sample
Sections 4 and 5 present the estimated spillover effects using all observations. One potential concern is the role of sorting and whether selection into taking assessments is the factor driving our findings.
We address this concern by estimating the main effects from equations 1 and 2 for a subset of our sample who attended at least five out of the eight possible assessments. Our data include 1,792 observations from 313 children who attended a minimum of five assessments. Note these children represent under 20% of the total number of children in our pooled sample. The estimates on non-cognitive spillover effects are especially close to the ones presented in Tables 4 and 6.

The fixed-effects estimates of $\beta_1$ on cognitive spillover effects from the whole sample for neighborhood radii of 3, 5, and 7 kilometers are 0.0033σ, 0.0021σ, and 0.0018σ, respectively, whereas the corresponding estimates from the restricted sample are 0.0040σ, 0.0021σ, and 0.0018σ. Likewise, the LDV estimates of $\beta_1$ on cognitive spillover effects from the whole sample for neighborhood radii of 3, 5, and 7 kilometers are 0.0042σ, 0.0033σ, and 0.0027σ. The corresponding estimates for the restricted subsample are 0.0029σ, 0.0017σ, and 0.0010σ.

The fixed-effects estimates on non-cognitive spillover effects from the whole sample for neighborhood radii of 3, 5, and 7 kilometers are 0.0069σ, 0.0043σ, and 0.0033σ, whereas the fixed-effects estimates from the restricted sample are 0.0070σ, 0.0036σ, and 0.0027σ. Similarly, the LDV estimates on non-cognitive spillover effects from the whole sample for neighborhood radii of 3, 5, and 7 kilometers are 0.0070σ, 0.0059σ, and 0.0054σ, whereas the corresponding LDV estimates from the restricted sample are 0.0064σ, 0.0050σ, and 0.0045σ.

Notes: Estimated spillover effects from equations (1) and (2) for a subsample of observations from children who attended a minimum of five out of eight assessments. Robust standard errors, clustered at the census-block-group level, are in parentheses; *** p<0.01, ** p<0.05, * p<0.1

F Spatial Fade-out
Notes: Columns 1-3 (4-6) represent the effect of an additional treated neighbor on a child's standardized cognitive (non-cognitive) score. Robust standard errors, clustered at the census-block-group level, are in parentheses; *** p<0.01, ** p<0.05, * p<0.1

Notes: Columns 1-3 (4-6) represent the effect from an additional treated neighbor on a child's standardized cognitive (non-cognitive) score. Robust standard errors, clustered at the census-block-group level, are in parentheses; *** p<0.01, ** p<0.05, * p<0.1

G Spillovers from Parent Academy, Preschool, and Cog-X Treatments
In section 6, we compared the spillover effects from the Parent Academy treatments, which exclusively offered education programs for parents, to the four Pre-K treatments, which offered preschool programs to children. Two of the four Pre-K treatment groups (Preschool-Plus and Kinderprep) also included parental components, which, compared to Parent Academies, were shorter and not as heavily incentivized.
We refer to these two treatments as Cog-X treatments. In this section, instead of estimating the overall spillover effects from Pre-K treatments, we separately estimate the effects from Cog-X and from the preschool treatments that exclusively targeted children, using the following specification:

$$Y_{i,t} = \beta_0 + \beta_{parent} N^{Parent}_{i,t|r} + \beta_{parent\,child} N^{Cogx}_{i,t|r} + \beta_{child} N^{Preschool}_{i,t|r} + \lambda N^{total}_{i,t|r} + \eta Y_{i,t-1} + X_i \alpha + \sigma_b + \mu_c + \delta_t + \epsilon_{i,t}, \qquad \text{(G.1)}$$

where $N^{Parent}_{i,t|r}$, $N^{Cogx}_{i,t|r}$, and $N^{Preschool}_{i,t|r}$ represent the numbers of neighbors residing within distance $r$ of a child $i$ who were assigned to Parent Academy, Cog-X, and the two preschool treatments that exclusively targeted children, respectively. All other arguments are defined as in section 6. Under the above specification, $\beta_{parent}$, $\beta_{parent\,child}$, and $\beta_{child}$ represent the spillover effects from an additional treated neighbor who was assigned to Parent Academies, Cog-X, or the two preschool treatments with no parental components. Table G.1 presents the estimates of $\beta_{parent}$, $\beta_{parent\,child}$, and $\beta_{child}$.

Notes: Columns 1-3 (4-6) represent the effect of an additional treated neighbor of each type on a child's standardized cognitive (non-cognitive) score. Robust standard errors, clustered at the census-block-group level, are in parentheses; *** p<0.01, ** p<0.05, * p<0.1

Our estimates suggest the programs that included parental-education components are likely to generate larger cognitive spillovers than those that exclusively targeted children. 5 Focusing on non-cognitive spillovers, our estimates confirm our previous findings that programs that directly targeted children (Cog-X and preschool treatments) generate significantly larger non-cognitive spillovers than the ones that exclusively targeted parents (p < 0.10).
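To make the structure of specification (G.1) concrete, the following Python sketch fits a stripped-down version of it by ordinary least squares. It is an illustration under strong simplifying assumptions: the covariates $X_i$ and the fixed effects $\sigma_b$, $\mu_c$, and $\delta_t$ are omitted, the records and coefficients are made up (they are not estimates from the paper), and the outcome is generated without noise so that exact rational arithmetic recovers the coefficients. The helper name `ols_exact` is hypothetical.

```python
from fractions import Fraction


def ols_exact(design, outcome):
    """Solve the normal equations (X'X) b = X'y exactly by Gauss-Jordan elimination."""
    k = len(design[0])
    # Augmented matrix [X'X | X'y], built with exact rational arithmetic.
    aug = [
        [sum(Fraction(row[i]) * Fraction(row[j]) for row in design) for j in range(k)]
        + [sum(Fraction(row[i]) * Fraction(y) for row, y in zip(design, outcome))]
        for i in range(k)
    ]
    for col in range(k):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot_row = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [value / pivot for value in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical rows: (N_parent, N_cogx, N_preschool, N_total, lagged score)
records = [
    (3, 1, 4, 20, 0), (0, 2, 1, 15, 1), (5, 0, 2, 30, -1), (2, 3, 0, 12, 2),
    (1, 1, 1, 9, 0), (4, 2, 3, 40, 1), (0, 0, 5, 25, -2), (6, 4, 2, 33, 3),
]

# Made-up coefficients, in the order
# beta_0, beta_parent, beta_parent_child, beta_child, lambda, eta
true_coefs = [Fraction(1, 10), Fraction(4, 1000), Fraction(5, 1000),
              Fraction(2, 1000), Fraction(-1, 1000), Fraction(1, 2)]

design = [(1,) + rec for rec in records]  # prepend the intercept column
outcome = [sum(c * x for c, x in zip(true_coefs, row)) for row in design]

estimates = ols_exact(design, outcome)
print([float(b) for b in estimates])
```

In applied work one would use a regression library with clustered standard errors and the full set of controls; the exact solver here only serves to show which regressors enter the specification and how the three treatment-specific neighbor counts are separated.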

H Total Impact Evaluation and Neighborhood Radius
In section 7, we estimated the impact of the intervention under the neighborhood radius of 3 kilometers. As we broaden the definition of the neighborhood, the estimated total spillover effect on both the control ($SC_N$) and treatment ($ST_N$) children increases. This finding is intuitive, because broadening the neighborhood would allow for neighbors who live farther away to also impact a child's outcomes. The increases in the total spillover effects result in larger estimates of the total impacts ($Total$) as we increase the neighborhood radius. The average estimated total impact of the intervention ($Total$) on a treatment child's cognitive scores is 0.82σ for r = 3 km and increases to 1.16σ and 1.24σ for r = 5 km and r = 7 km. Likewise, the average estimated total impact of the intervention ($Total$) on the non-cognitive score of a child increases from 1.32σ to 1.97σ and 2.32σ as we increase the radius from 3 to 5 and 7 kilometers. Finally, our estimates for program impacts if we were to ignore spillovers to control children ($Total^{Standard}$) are very similar and not significantly different across various radii. 7

5 The differences in spillover effects are insignificant for r = 3 km, but as we broaden neighborhood radii to 5 and 7 kilometers, the differences in spillovers on cognitive skills from Cog-X and Parent Academy to the ones from the preschool treatments become significant (p < 0.10).

6 for r1=3K and r2=5K; r1=3K and r2=7K; and r1=5K and r2=7K are 0.28, 0.12, and 0.08. The corresponding p-values for non-cognitive scores ($\theta^{Ncog}$) are 0.75, 0.85, and 1.00.

I Exploring Non-linearities in Measuring the Total Spillover Effects
We calculated our estimates of the total spillover effects under the assumption of linearity. In this section, we explore whether and how allowing for nonlinearities affects our estimates.
We explore polynomial specifications of degrees 2 and 3 in the number of treated neighbors (equation I.1). Under the linear specification, the marginal spillover effect from the $j$-th treated neighbor is given by $\alpha_1$ for a control child and by $\alpha_1 + \lambda_1$ for a treated child. The corresponding effects under polynomials of degrees 2 and 3 are given by $\alpha_1 + 2\alpha_2 j$ and $\alpha_1 + 2\alpha_2 j + 3\alpha_3 j^2$ for a control child, and by $\alpha_1 + \lambda_1 + 2(\alpha_2 + \lambda_2) j$ and $\alpha_1 + \lambda_1 + 2(\alpha_2 + \lambda_2) j + 3(\alpha_3 + \lambda_3) j^2$ for a treated child. Therefore, the average spillover effects on a control child from all neighbors, using polynomials of degrees 1, 2, and 3, can be calculated as follows: 8

$$\text{Degree 1:}\quad \sum_{j=1}^{N_{tr}} \alpha_1 = \alpha_1 N_{tr}$$

$$\text{Degree 2:}\quad \sum_{j=1}^{N_{tr}} (\alpha_1 + 2\alpha_2 j) = \alpha_1 N_{tr} + \alpha_2 N_{tr}(N_{tr} + 1)$$

$$\text{Degree 3:}\quad \sum_{j=1}^{N_{tr}} (\alpha_1 + 2\alpha_2 j + 3\alpha_3 j^2) = \alpha_1 N_{tr} + \alpha_2 N_{tr}(N_{tr} + 1) + 0.5\,\alpha_3 N_{tr}(N_{tr} + 1)(2N_{tr} + 1)$$

Note the coefficients of the quadratic and cubic terms are all insignificant, suggesting our linear specification is an appropriate representation. Our estimates of the average cognitive spillover effects from all treated neighbors become slightly smaller as we add quadratic and cubic terms, but the changes are small. The estimated non-cognitive spillover effects become larger as we move away from the linear specification. However, these increases are small. Overall, we find no strong evidence suggesting the spillover effect from an additional treated neighbor has a non-linear relationship with the number of treated neighbors.

Notes: Estimated coefficients from equation (I.1) for neighborhood radius of r = 3 km. Columns 1 and 4 correspond to the linear specification; columns 2 and 5 correspond to polynomials of degree 2; columns 3 and 6 correspond to polynomials of degree 3. Robust standard errors, clustered at the census-block-group level, are in parentheses; *** p<0.01, ** p<0.05, * p<0.1
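As a check on the algebra, the Python sketch below sums the marginal effect of each treated neighbor directly and compares the result with the closed form implied by the standard identities $\sum j = N(N+1)/2$ and $\sum j^2 = N(N+1)(2N+1)/6$. Under those identities the cubic term carries a factor of $N_{tr}$, that is, $0.5\,\alpha_3 N_{tr}(N_{tr}+1)(2N_{tr}+1)$. The coefficient values are arbitrary illustrations rather than estimates from the paper, and the function names are hypothetical.

```python
def marginal_effect(j, alphas):
    """Marginal spillover from the j-th treated neighbor.

    For polynomial coefficients (alpha_1, alpha_2, ...), this is
    sum_k k * alpha_k * j**(k - 1), e.g. alpha_1 + 2*alpha_2*j + 3*alpha_3*j**2.
    """
    return sum(k * a * j ** (k - 1) for k, a in enumerate(alphas, start=1))


def total_spillover_bruteforce(n_tr, alphas):
    """Add up the marginal effects of neighbors j = 1, ..., n_tr."""
    return sum(marginal_effect(j, alphas) for j in range(1, n_tr + 1))


def total_spillover_closed_form(n_tr, alphas):
    """Closed form for polynomials of degree up to 3 (missing terms are zero)."""
    a1, a2, a3 = (list(alphas) + [0, 0, 0])[:3]
    n = n_tr
    return a1 * n + a2 * n * (n + 1) + 0.5 * a3 * n * (n + 1) * (2 * n + 1)


# Arbitrary illustrative coefficients and neighbor count
alphas = (2, 3, 4)
n_treated = 5

brute = total_spillover_bruteforce(n_treated, alphas)
closed = total_spillover_closed_form(n_treated, alphas)
print(brute, closed)
```

Passing a shorter tuple such as `(alpha_1,)` or `(alpha_1, alpha_2)` reproduces the degree-1 and degree-2 cases.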