Compulsory Schooling and the Returns to Education: A Re-examination

This paper re-examines the instrumental variable approach to estimating the effect of compulsory school law on education in the US pioneered by Angrist and Krueger (1991). We show that the approach not only yields empirically inconsistent estimates but is also conceptually confused. The confusion arises from the rejection of the key causal variable as a valid conditional variable. By means of a causally explicit model design assisted by machine learning techniques, we identify the circumstances under which the wrongly rejected variable yields valid inference. Our investigation demonstrates the importance of data-guided model selection over the choice of consistent estimators.


Introduction
Over the past century, compulsory school law (CSL) was introduced in virtually every middle- and high-income country (Goldin 1998; Goldin and Katz 2007). Empirical investigations into the effect of the CSL on educational attainment and income were pioneered by Angrist and Krueger (1991). They use CSL indicators as instrumental variables (IVs) to 'randomize' latent ability across educational attainment groups in order to correct for the assumed bias in the ordinary least squares (OLS) estimator; see also Acemoglu and Angrist (2001). The influence of this seminal paper reaches far beyond the estimation of CSL effects.
The empirical strategy is now common practice in research on the average return to education (ARTE) as well as in program evaluation modelling, e.g. Harmon et al. (2003), Ludwig et al. (2012; 2013), and the two papers have since entered the standard economics curriculum, as is evident from their appearance in two popular textbooks by Angrist and Pischke (2009; 2015).
Despite the far-reaching influence of these studies, the causal interpretation of the estimated CSL effect is confused. This is exemplified by two interlinked developments in the literature: a) a shift in the interpretation of the CSL-instrumentalized returns to schooling coefficient despite identical modelling choices, and b) empirical results which vary significantly with the choice of CSL indicators. Angrist and Krueger (1991), who approximate CSL with quarter of birth dummies, interpret their results as unbiased estimates of the ARTE, and find that the IV estimates are not statistically different from estimates obtained via OLS. 1 Acemoglu and Angrist (2001) construct alternative CSL indicators based on labor law that produce IV estimates which, although significantly different from OLS estimates, are insignificant or negative. 2 Acemoglu and Angrist (2001) no longer interpret the IV estimates as unbiased returns to schooling, but as the causal effect of CSL on earnings via schooling. These CSL indicators are further refined by Stephens Jr. and Yang (2014). Their results verify the findings by Acemoglu and Angrist (2001), and Stephens Jr. and Yang (2014: 1789) conclude that there is 'no evidence of benefits to additional schooling' due to CSL.
This paper draws upon concepts and techniques of causal modelling and model selection in statistical machine learning to probe into the confusion over the CSL-instrumentalized ARTE estimator. By careful re-examination of two data sets from a causal modelling angle, one from Angrist and Krueger (1991) (AK hereafter) and the other from Stephens Jr. and Yang (2014) (SY hereafter), 3 we expose the methodological obfuscations underlying the confusion.
1 E.g. column (5) versus (6) in Table 4, (7) versus (8) in Table 5, (1) versus (2) and (5) versus (6) in Table 6 in Angrist and Krueger (1991). More evidence in Hoogerheide and van Dijk (2006, Table 5) and in Harmon et al. (2003, Section 5).
2 See Angrist and Pischke (2015, Table 6.3) for a summary.
We start from the root of the confusion: causal model specification. Close inspection of the causal links between the explanatory variables of interest with the help of causal graphs in Section 2 reveals two key insights. First, the use of IV estimators amounts to making, albeit implicitly, an a priori assertion of the education variable being an invalid conditional variable, thus undermining its causal status. We reason that the ensuing replacement of the education variable by an instrumentalized alternative amounts to a change in model specification.
Once the IV approach is conceptualized as model choice, the implicit assertion underlying the use of IV estimators can be explicitly specified as a testable hypothesis. Second, two types of [...] (2001) study as 'a failed research design' and ascribe the failure to the choice of inappropriate CSL indicators that fail to meet identifying assumptions; see also Stock et al. (2002), Kolesár et al. (2015). In contrast, our analysis reveals the root of the failure to be the choice of the IV approach, not the choice of instruments. We argue that the IV approach is a detour into deadlock, as it effectively denies direct translation of causal postulates of interest into data-consistent conditional relationships; an argument that applies to the use of the IV approach as a panacea for endogeneity bias in general; see Qin (2019). Our re-examination of the CSL case clearly shows the importance of empirical model design via data learning over estimator choice for untested theoretical models.

Model Specification of Schooling Effects Under CSL Treatment: An Anatomy
To disentangle the conflicting causal interpretations presented in AK and SY, let us start from their basic model design. Denote education by s and outcome by y; the ARTE is represented by $\beta_{ys}$ in (1):
(1) $y = \alpha + \beta_{ys} s + \eta$.
AK and SY assume $\mathrm{cov}(s, \eta) \neq 0$, based on the argument that η contains omitted variables which are not directly observable but collinear with s, such as innate ability. Under this assumption, OLS is an inconsistent estimator for (1) and should be replaced by consistent IV estimators. 4 AK and SY use the CSL as a key instrument. The resulting IV estimator amounts to appending (1) by:
(2) $s = \gamma_0 + V'\gamma + \nu$,
where $V$ represents the CSL and other IVs. Equation (2) can be seen as the first stage of the two-stage least squares (2SLS) estimation. Combining it with (1) and taking into consideration the shift in the interpretation of the schooling coefficient from the ARTE free of omitted variable bias (OVB) to the CSL effect via schooling, we see that the IV approach actually implies a rejection of (1) in favor of:
(3) $y = \alpha^L + \beta_{ys^L} s^L + \eta^L$,
where $s^L$ denotes the fitted s from (2) and superscript L represents the CSL. The expected $s^L \neq s$ entails $\beta_{ys^L} \not\approx \beta_{ys}$, which leads to the shifted interpretation. Hence, the essence of the IV approach is to assert $s^L$, instead of s, as the valid conditional variable: an implicit change in model specification. In other words, the choice between OLS and IV estimators is essentially a choice of a valid conditional variable of interest among $s$ and $s^L$, and consequently a valid conditional model among (1) and (3); see Qin (2015; 2019). Motivated by this insight, we proceed with an anatomy of model specification that goes beyond the narrow estimation perspective.
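The reading of the IV route as a swap of the conditioning variable can be illustrated with a minimal Python sketch that runs the two stages by hand. Everything in it is an assumption made for illustration: the simulated data, the single binary indicator `L`, the parameter values and the helper names are not taken from AK or SY.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divisor n) of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols(x, y):
    """Simple regression of y on x; returns (intercept, slope)."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

random.seed(0)
n = 5000
L = [random.choice([0, 1]) for _ in range(n)]            # hypothetical binary CSL indicator
s = [10 + 1.0 * l + random.gauss(0, 2) for l in L]       # schooling (assumed values)
y = [1 + 0.08 * si + random.gauss(0, 0.5) for si in s]   # log earnings (assumed values)

_, beta_ols = ols(s, y)              # model (1): y conditioned on observed s
a2, b2 = ols(L, s)                   # first stage (2): s on the instrument
s_hat = [a2 + b2 * l for l in L]     # fitted schooling
_, beta_2sls = ols(s_hat, y)         # model (3): y conditioned on fitted s
beta_wald = cov(L, y) / cov(L, s)    # textbook IV ratio, for comparison
```

In this single-instrument setting the second-stage slope coincides with the IV ratio, which makes explicit that the IV estimate is an OLS estimate of a different model, one conditioned on the fitted schooling variable rather than on s.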
In order to understand what has caused the assertion of s being an invalid conditional variable and the shift in interpretation of $\beta_{ys}$, we further scrutinize schooling effects under the CSL treatment. Since the CSL effect on income via schooling is a sequential event, an intuitive way to represent the event is a directed acyclic graph (DAG), a popular tool for assisting causal modelling; see Cox and Wermuth (1996), Cox (2011) and Pearl (2009).
The DAG in the left panel of Figure 1 shows us that, when the effect of L forms the focal causal interest, s takes the role of an intermediate variable, or a mediator, but when s is the causal variable of interest, L takes the role of a moderator exclusive for s. Whatever the causal variable of interest, $f(y, s, L)$, the joint density of the three variables, can be recursively factorized and reduced as below:
(4) $f(y, s, L) = f(y \mid s, L)\, f(s \mid L)\, f(L) = f(y \mid s, L)\, f(s \mid L)$,
since $f(L) = 1$ when retrospective cross-section data samples are used. From Pearl (2009), it can be seen that the CSL treatment implies a rule of intervention, 5 namely $y \perp L \mid s$, so that we can further factorize the conditional density in (4) as:
(5) $f(y \mid s, L)\, f(s \mid L) = f(y \mid s)\, f(s \mid L)$.
The sequential nature of the ATE of L on y via s is expressed by the conditional expectation of (5), $E(y, s \mid L) = E(y \mid s)\, E(s \mid L)$. In a linear model setting, the ATE, denoted by $\beta_{yL}$, can be derived from a chain of two simple regressions corresponding to $E(y \mid s)$ and $E(s \mid L)$ respectively; see Cox and Wermuth (2004): 6
(6) $y = \alpha_y + \beta_{ys} s + \varepsilon_y$, $\quad s = \alpha_s + \beta_{sL} L + \varepsilon_s$ $\;\Longrightarrow\;$ $\beta_{yL} = \beta_{ys}\beta_{sL}$.
Model (6) tells us that $\beta_{yL} \neq \beta_{ys}$ holds in general unless $\beta_{sL} = 1$ can be verified, which is highly unlikely in view of available findings, e.g. see Goldin and Katz (2011). Hence, we should expect that $\beta_{yL} \ll \beta_{ys}$.
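A small simulation may help to fix ideas about the chain in (6). The sketch below is only illustrative: the data-generating values (a schooling effect of 0.08 and a CSL effect on schooling of 0.5) and the function names are assumptions, and y is constructed to depend on L only through s, so that the rule of intervention holds by design.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
n = 20000
L = [random.choice([0, 1]) for _ in range(n)]            # treatment, observed here by assumption
s = [10 + 0.5 * l + random.gauss(0, 2) for l in L]       # assumed effect of L on s: 0.5
y = [1 + 0.08 * si + random.gauss(0, 0.3) for si in s]   # assumed effect of s on y: 0.08

beta_ys = slope(s, y)                  # first equation of the chain
beta_sL = slope(L, s)                  # second equation of the chain
beta_yL_chain = beta_ys * beta_sL      # ATE via the chain product
beta_yL_direct = slope(L, y)           # ATE from a direct regression of y on L
```

With an effect of L on s below one, the chain product is smaller than the schooling coefficient itself, and the direct and chain estimates agree up to sampling error as long as y is independent of L given s.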
If $\beta_{yL}$ forms the only parameter of substantive interest, the chain route appears a long way round to estimate it, because the parameter can be estimated directly from:
(7) $y = \alpha + \beta_{yL} L + \varepsilon$.
However, L is latent in the cross-section samples used by AK and SY and is approximated by various observable indicators, ℒ, i.e.:
(7') $y = \alpha + \beta_{y\mathcal{L}} \mathcal{L} + \varepsilon$.
This direct route may result in $\beta_{y\mathcal{L}} \neq \beta_{ys}\beta_{s\mathcal{L}}$. The defectiveness of CSL indicators due to entanglement with regional factors and other controls leading to indirect effects has already been identified by SY. In other words, ℒ may fail the rule of intervention such that $\beta_{y\mathcal{L}.s} \neq 0$ from the following regression:
(8) $y = \alpha + \beta_{ys.\mathcal{L}} s + \beta_{y\mathcal{L}.s} \mathcal{L} + \varepsilon$.
This situation is illustrated in the modified DAG in the right panel of Figure 1. Two consequences follow. First, the chain route of (6) is more reliable than (7') for estimating the ATE of the CSL via schooling. Second, a test of $\beta_{y\mathcal{L}.s} = 0$ using (8) can be exploited as an additional criterion for CSL indicator selection; see Zhang et al. (2017) for the implications of measurement error for the estimation of causal chains.
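The indicator check based on (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the procedure applied to the AK and SY data: the simulated regional factor, the construction of the 'entangled' indicator and all names are hypothetical, and the partial coefficient is obtained by partialling out s (the Frisch-Waugh result) instead of a full multiple regression routine.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def residuals(x, y):
    """Residuals of a simple regression of y on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]

def partial_coef(y, indicator, s):
    """Coefficient on the indicator in a regression of y on s and the indicator."""
    return slope(residuals(s, indicator), residuals(s, y))

random.seed(2)
n = 10000
L = [random.choice([0, 1]) for _ in range(n)]         # latent treatment (known only in simulation)
region = [random.choice([0, 1]) for _ in range(n)]    # hypothetical regional factor
s = [10 + 0.5 * l + random.gauss(0, 2) for l in L]
y = [1 + 0.08 * si + 0.3 * r + random.gauss(0, 0.3) for si, r in zip(s, region)]

clean = L                                             # indicator that tracks L only
entangled = [max(l, r) for l, r in zip(L, region)]    # indicator mixed with the regional factor

coef_clean = partial_coef(y, clean, s)
coef_entangled = partial_coef(y, entangled, s)
```

In this set-up the clean indicator carries no information on y once s is conditioned upon, whereas the entangled indicator picks up the regional effect and so fails the rule of intervention.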
The advantage of the chain route becomes even more evident when the presence of control variables, denoted by Z, is taken into consideration. Although Z is chosen primarily from consideration of $\mathrm{cov}(s, \eta) \neq 0$, some of the control variables are likely to correlate with CSL indicators, such as age and regional dummies in the two data sets by AK and SY. The DAGs with Z included are shown in Figure 2. The potential correlation between Z and CSL indicators complicates the estimation of the CSL effect. Extend (6) by Z:  Model (9) differs from model (3) in two essential aspects. First, s is taken as a valid conditional variable in the first equation of (9). Second, the lower equations are part of the chain representation rather than an instrumental step. The two income effects can thus be estimated separately.
The separation of the two effects enables us to disentangle the potential problems arising in their estimation. The OVB-based argument underlying the IV-treatment applies to . , in (9) because of the inclusion of Z. or defines OVB or part of it if viewed from of (7) when estimating the CSL effect. Hence, the question is, whether . is consistent when controlling for a set of covariates Z.
Further, concerns over measurement error are far more relevant to CSL indicators than to s in the present context. 7 However, a concern over selection bias remains pertinent to the estimation of the schooling effect. Specifically, the CSL treatment could alter the population composition of educated workers, as compared to that of the pre-treatment population, e.g. through a diluted concentration level of 'aptitude' (see Angrist and Pischke 2009: Chapter 4), so much so that the post-treatment schooling effect becomes significantly different from the pre-treatment one.
The IV remedy for this selection-bias induced effect is to maintain $s^L \neq s$ from the perspective of retrospective data. As pointed out earlier, the IV approach amounts to rejecting the causal validity of s and substituting it with $s^L$. Figure 3 illustrates the situation in two DAGs. Comparison of Figure 3 with Figure 1 reveals a key difference between models (3) and (8) as different alternatives to (1). While (8) extends (1) via a causal chain model specification as illustrated in the left panel of Figure 1, the IV route by (2) confines L to an instrumental role in producing a new variable $s^L$, thereby blocking the route to a chain model extension. 8 Consequently, the IV approach is unsuitable for tackling the question of whether the CSL treatment has indeed resulted in a compositional shift to such an extent that it has caused a parametric shift in the ARTE, since $\beta_{ys}$ of (1) is already rejected a priori as an inconsistent parameter. Under the chain model specification, the only feasible way to tackle this question is to carefully divide the available samples into two parts, an L-treated part versus a CSL-unaffected part, so as to investigate whether there exists a parametric difference: $\beta_{ys^L.Z} \neq \beta_{y\tilde{s}.Z}$, where $s^L$ denotes schooling of the L-treated part, and $\tilde{s}$ that of the treatment-unaffected part.
It should be noted that even if the inequality is supported by data, the evidence alone is insufficient for rejecting s as a valid conditional variable for y, e.g. see Engle et al. (1983).
Conversely, evidence against such a parametric shift does not imply = 0 unless = 0 can be verified.
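The sample-split comparison described above can be sketched as a Chow-type F test, the kind of break-point test reported later in the paper. The Python code below is a simplified illustration only: it uses a simple regression without the controls Z, the simulated slopes (0.08 and 0.12) are assumed values, and the function names are hypothetical.

```python
import random

def fit_rss(x, y):
    """Residual sum of squares from a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def chow_f(x1, y1, x2, y2, k=2):
    """Chow F statistic for equal intercept and slope across two sub-samples."""
    rss_pooled = fit_rss(x1 + x2, y1 + y2)
    rss_split = fit_rss(x1, y1) + fit_rss(x2, y2)
    dof = len(x1) + len(x2) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / dof)

def simulate(n, beta, rng):
    """Hypothetical sub-sample with schooling effect `beta`."""
    s = [rng.gauss(12, 2) for _ in range(n)]
    y = [1 + beta * si + rng.gauss(0, 0.3) for si in s]
    return s, y

rng = random.Random(3)
s_t, y_t = simulate(2000, 0.08, rng)   # L-treated part
s_u, y_u = simulate(2000, 0.08, rng)   # unaffected part, same schooling effect
s_v, y_v = simulate(2000, 0.12, rng)   # unaffected part, shifted schooling effect

f_no_shift = chow_f(s_t, y_t, s_u, y_u)
f_shift = chow_f(s_t, y_t, s_v, y_v)
```

The statistic is compared with an F(k, n1 + n2 - 2k) critical value; a large value indicates a parametric difference between the two parts, which, as noted above, is by itself insufficient for rejecting s as a valid conditional variable.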

Where Does the Research Design Fail?
Section 2 has shown that the IV approach amounts to a model re-specification by replacement of s with $s^L$ as the valid conditional variable. This re-specification is based on the premise of inconsistency of the OLS model specification relative to its IV counterpart. Once the IV approach is exposed as a model re-specification, the choice of the key conditional variable, and therefore the choice between the IV and OLS model, becomes a question of model selection which is testable. According to statistical machine learning theory, model selection is based on empirical risk minimization over a given hypothesis space that spans the competing model specifications. A model is selected against its alternatives based on the interlinked criteria of generalizability (or predictivity), stability and consistency; Mukherjee et al. (2006) show that stability is equivalent to empirical consistency. One common tool for model selection is cross-validation (CV), which provides a formal assessment of predictivity and consistency by splitting the sample into a training and a testing component, e.g. see Arlot and Celisse (2010). The tool is not new to the impact evaluation literature, e.g. Athey and Imbens (2015).
In the following, we first replicate and then, by use of CV, reassess the results presented by AK and SY against the above listed criteria. Since the CSL is latent, it is approximated by observable indicators, ℒ. Quarterly birth dummies are chosen by AK ($\mathcal{L}^{AK}$). 9 SY, with reference to Acemoglu and Angrist (2001), propose two alternative indicators based on state school and labor law. These indicators capture required years of schooling ($\mathcal{L}^{SY1}$) and compulsory attendance ($\mathcal{L}^{SY2}$). 10 Let us inspect the replication of SY's results first (see Figure 4). The IV-based model specifications appear to lack empirical consistency and robustness relative to their OLS counterpart. The IV estimate fails to show convergence and its standard errors remain large as the sample size increases. Although these findings are common in the literature, their implications are rarely discussed; see Deaton and Cartwright (2016).
9 The indicator choice is based on the insight that the CSL requires a minimum age which must be reached before students can drop out of school. Those born in the first quarters of the year reach this age sooner than those born in later quarters and hence are less constrained by the law than their peers. Accordingly, AK define three birth dummies for those born in the first ($\mathcal{L}^{AK}_1$), second ($\mathcal{L}^{AK}_2$) and third ($\mathcal{L}^{AK}_3$) quarter of the year; see also Angrist and Krueger (1992).
10 As in AK, the indicators are composed of three dummies. $\mathcal{L}^{SY1}_1$, $\mathcal{L}^{SY1}_2$, and $\mathcal{L}^{SY1}_3$ capture those with a minimum of 7 or below, 8, and 9 or above required years of schooling and $\mathcal{L}^{SY2}_1$, $\mathcal{L}^{SY2}_2$, and $\mathcal{L}^{SY2}_3$ [...]

FIGURE 4. OLS AND IV ESTIMATOR CONSISTENCY
Notes: IV1 and OLS1 are IV and OLS estimates without regional control variables; IV2 and OLS2 are IV and OLS estimates with regional control variables included. The x-axis provides the sample size and the y-axes the coefficient values (left axis for IV estimates and right axis for OLS estimates).
Column (1) in T1B of our Table 1 is the only exception, with no rejection of Sargan's null of valid overidentifying restrictions and rejection of Hausman's null of OLS estimator consistency relative to IV. Although the validity of instruments is not rejected for column (2) in T1A of Table 1, the IV estimates remain insignificant.
In contrast to the OLS estimate, the IV estimate seems to strongly correlate with covariates such as interaction terms that allow for regional differences in year of birth effects. The inclusion of these interaction terms leads to large changes in the IV estimate whereas the OLS estimate remains virtually invariant; see Figure 4 and Table 1, columns (2) and (4) of T1A and T1B. At the same time, the inclusion of interaction terms invalidates the claim of endogeneity if using $\mathcal{L}^{SY2}$ indicators and leads to an insignificant estimate if using $\mathcal{L}^{SY1}$ indicators. The sensitivity of IV estimates to regional factors is pointed out by SY and reiterated by Hoogerheide and van Dijk (2006: Table 5). 11 This raises the question of whether ℒ solely represents the CSL treatment; a potential case of measurement error.
11 CSL indicators based on quarter of birth dummies face similar problems; Bound and Jaeger (2000) and Carneiro and Heckman (2002) show an entanglement of indicators with social status.
The insignificance and empirical inconsistency of the IV estimate, identified in Figure 4 and Table 1, could also be caused by a negligible share of 'compliers' in the full sample; a point made by Oreopoulos (2006a) in the context of the CSL effect when using minimum years of schooling indicators. Since most people remain in school beyond the required years, the great majority of the sample belongs to a sub-population for which the ATE of the CSL is expected to be zero.
In other words, the CSL is potentially binding only for school leavers, but by and large not for those who have continued education beyond the compulsory years of schooling. Using $\mathcal{L}^{SY1}$ indicators, roughly 4.11 per cent of the 1930-39 born cohort complies 12 with the law. The share of compliers is even smaller for the later born cohort, at 2.31 per cent. Using $\mathcal{L}^{SY2}$ indicators instead, the share of compliers is similarly small, at 4.18 and 2.53 per cent in the 1930s and 1940s birth cohorts respectively (see Table 1A, Appendix). Our rough estimates of complier shares are slightly lower than those in Bolzern and Huber (2017), who report a complier share of 6-12 per cent for European countries based on a comparison of mean potential outcomes using binary treatment and instrument variables.
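A rough share of this kind can be computed along the following lines. The Python sketch is hypothetical: it assumes that the law-bound group is approximated by those whose completed schooling does not exceed the required minimum in their state of birth, and that a required minimum of zero means no law applies; both are simplifying assumptions, and the toy records are invented for illustration.

```python
def complier_share(schooling, required):
    """Share of individuals whose completed schooling does not exceed the
    required minimum years (assumed proxy for those bound by the CSL)."""
    bound = sum(1 for s, r in zip(schooling, required) if r > 0 and s <= r)
    return bound / len(schooling)

# Hypothetical toy records: years of schooling and required years in the state of birth.
schooling = [8, 12, 16, 9, 12, 7, 14, 12, 10, 18]
required = [8, 8, 9, 9, 7, 0, 8, 9, 9, 8]

share = complier_share(schooling, required)   # 2 of 10 records are law-bound
```

Applied per birth cohort and per indicator, a calculation of this type shows how small the sub-population is on which an IV estimate can draw.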
Following from the above, the difference in the IV estimates for the ARTE when using different CSL indicators is commonly explained as the result of localized treatment, i.e. treatment which is confined to specific complier groups; see Angrist et al. (1996) and Angrist and Imbens (1995). The estimates are thus interpreted as 'local average treatment effect' (LATE) instead of ATE, e.g. see Angrist and Pischke (2009: Chapter 4), and also Heckman and Urzua (2009), Deaton (2009), Imbens (2010). A simpler way of examining the localness is to estimate the models using sub-sample data. Therefore, we replicate SY Tables 1 and A2 and AK Tables V and VI, using sub-samples to separate 'always takers' from 'compliers'. 13 Specifically, those who receive 12 or fewer years of schooling are allocated to the School sub-sample and those with more years of education are allocated to the Higher sub-sample. Further, the tails of the two sub-samples, Higher and School, are cut to investigate whether dissimilarities between the sub-samples arise due to outliers (cf. Figure 1A, Appendix). The aim of this experiment is to examine the empirical consistency of the two models under the localized treatment condition.
The results of the above experiments show a notable lack of consistency in the IV case. To formally compare the two model specifications, we draw on the aforementioned insight from machine learning theory; see Mukherjee et al. (2006) and Shalev-Shwartz et al. (2010). CV techniques separate the sample into k folds, with k-1 folds being used to train the model and the k-th fold to test it. The competing model specifications can hence be evaluated by comparison of the relative MSE in a k-fold CV, e.g. see Arlot and Celisse (2010), Zhang and Yang (2015). Figure 5 shows that the OLS-based model clearly outperforms the IV-based one in generalizability, stability and consistency. Remarkably, results for AK are close to or even worse than those for SY, despite the finding of IV estimates close to their OLS counterparts in AK.
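The k-fold comparison can be sketched in Python as below. This is an illustrative set-up, not the experiment behind Figure 5: the data are simulated with a latent 'ability' term, the parameter values are assumed, and the out-of-fold prediction of the IV-based model is taken to be the prediction from fitted schooling, which is an assumption about how such a comparison may be set up.

```python
import random

def fit(x, y):
    """Simple regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

def kfold_mse(L, s, y, k):
    """Out-of-fold MSE of the OLS-based and the IV-based model specification."""
    n = len(y)
    se_ols = se_iv = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        L_tr = [L[i] for i in train]
        s_tr = [s[i] for i in train]
        y_tr = [y[i] for i in train]
        a, b = fit(s_tr, y_tr)                                   # OLS-based model: y on s
        g0, g1 = fit(L_tr, s_tr)                                 # first stage: s on L
        a_iv, b_iv = fit([g0 + g1 * l for l in L_tr], y_tr)      # y on fitted s
        for i in test:
            se_ols += (y[i] - a - b * s[i]) ** 2
            se_iv += (y[i] - a_iv - b_iv * (g0 + g1 * L[i])) ** 2
    return se_ols / n, se_iv / n

random.seed(4)
n = 3000
ability = [random.gauss(0, 1) for _ in range(n)]          # latent, unobserved by the models
L = [random.choice([0, 1]) for _ in range(n)]             # hypothetical CSL indicator
s = [10 + 0.5 * l + a + random.gauss(0, 1.7) for l, a in zip(L, ability)]
y = [1 + 0.08 * si + 0.1 * a + random.gauss(0, 0.3) for si, a in zip(s, ability)]

mse_ols, mse_iv = kfold_mse(L, s, y, k=5)
```

Folds are assigned by index modulo k for brevity; with real data a random permutation of the observations before splitting would be the more usual choice.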
13 The separation remains imperfect, as some always takers will be contained in the complier group.
14 The only exception is the later born cohort with AK's model specification, where all return to education estimators are insignificant or diagnostics reveal problems with the model design.
15 This effect is more pronounced for the later born cohort, potentially due to educational inflation. These data patterns are undetectable by the CSL-based IV method, since instruments narrowly target school goers but not those attaining higher education, and large standard errors hide significant differences across cohorts.
Notes: SY data follow model specification (1) in Tables 1 and A2; AK data follow the model specifications of Tables V and VI, columns (5) and (6). The discrepancy between OLS estimates obtained with the SY and AK data sets is due to differences in the construction of the education variable; see Table 3 and discussion. P-values for Sargan and Hausman tests in (.). a Standard errors cluster adjusted. ** Significant at the 1 percent level. * Significant at the 5 percent level.
As expected, the MSE decreases as the training sample increases, that is, with increasing k, for both models. However, the IV-based model shows no sign of convergence as training samples grow and the OLS-based model outperforms the IV-based one at all k, even though the CV experiment presented here does not adjust for degrees of freedom. 16 When decomposing the MSE into test bias and variance, we find little evidence of asymptotic bias in the OLS estimates (Figure 6). Instead, there are small but discernible decreases in the OLS bias while the IV bias increases with k, suggesting overfitting in the IV-based model. Given the small bias, the large difference in the MSE between the two models clearly stems from a greater variance or instability of the IV model specification, putting the consistency claim of IV into question. While our findings are specific to the CSL case and the chosen instruments, results by Young (2017), who re-evaluates 1,359 published IV regressions, suggest that the conclusions drawn from the CV exercise are the norm rather than an exception.

CSL Treatment in A Multiple Model Framework
Given the overwhelming rejection of $s^L$ in favor of s, we now probe further into the evidence of s as a valid conditional variable by way of the chain model approach presented in Section 2.
The chain model approach involves the estimation of the two outcome effects on earnings: (a) the ARTE effect, $\beta_{ys.Z}$, and (b) the CSL effect or ATE of the CSL via schooling, $\beta_{yL}$. We then investigate the possible presence of an L-treatment induced parametric shift in the schooling effect, $\beta_{ys^L.Z} \neq \beta_{y\tilde{s}.Z}$.
16 The IV approach uses up more degrees of freedom than the OLS counterpart due to the first stage. Therefore, the MSE of the IV model specification understates the error when compared to the OLS counterpart.

Estimating the Schooling Effect: $\beta_{ys.Z}$
The presentation of varying OLS-based ARTE estimates by AK and SY, despite the use of almost identical samples, indicates problems in the choice of appropriate covariates (Table 2).
Therefore, we proceed with the question of how to specify Z in order to find an empirically adequate specification of (9) which is as parsimonious as possible and can also align the ARTE estimates from the AK and SY data. This is achieved through, first, unification of the education variable and, second, a parsimonious model specification.
Towards a unification of the education variable, the AK education variable is capped at 17 years to resemble the SY education variable. The unification is found to play a vital role in aligning the ARTE estimates across the two data sets. As for the experiments reported in Table 2, we rely on AK's division between those born in the 1930s and 1940s respectively using observations from the 1980 census. Towards a more parsimonious model, year of birth dummies included by both AK and SY are replaced with quadratic age (age2). 17 Regional dummies for individual states are replaced by a single variable distinguishing between four regions for SY and nine regions for AK data (region). Considering a possible regional effect on school quality, variables capturing school quality (pupilt, term, reltwage) suggested by Card and Krueger (1992a; 1992b) are used by SY and included in our model as well.
A notable pattern of parameter inconstancy in Table 2 is the variation in the ARTE estimates with the level of education. The variation reflects 'sheepskin effects', which are well documented phenomena in the literature 18 and clearly discernible in the AK and SY data; see Figure 1A and Table 1A, Appendix. A dummy/binary variable (uni) is thus added as a classifier for those who obtained a university degree (15 or more years of schooling).
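The variable construction described above can be summarized in a short Python sketch. The function and key names are hypothetical; only the 17-year cap, the quadratic age term and the 15-year threshold for the uni classifier are taken from the text, and age2 is assumed to denote age squared.

```python
def build_covariates(educ_years, age):
    """Return the unified education variable and the parsimonious controls."""
    educ = min(educ_years, 17)           # cap at 17 years to resemble the SY variable
    return {
        "educ": educ,
        "age2": age ** 2,                # quadratic age in place of year of birth dummies
        "uni": 1 if educ >= 15 else 0,   # classifier for 15 or more years of schooling
    }

row = build_covariates(educ_years=20, age=45)
```

The region and school quality variables (region, pupilt, term, reltwage) come from the source data sets and are therefore not constructed here.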
The key results of this model search are reported in [...] Pischke and von Wachter (2008) and Acemoglu and Angrist (2001), who report estimates of 0.061 and 0.075 respectively.
[Table 3: columns by birth cohort, 1930-1939 and 1940-1949; table body not recoverable]
Notes: The row reports the correlation coefficient between the residuals in (9) and the residuals from the auxiliary regression, with a value close to zero confirming consistency. Non-Gaussianity of the residuals was tested beforehand and is strongly supported by the data. d See SY and AK for variable names. ** Significant at the 1 percent level. * Significant at the 5 percent level.
As discussed earlier and indicated by the findings in Table 3, the risk of OVB for $\beta_{ys.Z}$ comes from an inadequately specified Z. Hence, we evaluate the choice of Z by use of a simple statistical test of consistency developed by Entner et al. (2012). Recalling the DAG in Figure 2, we can immediately see that in the presence of OVB, that is, missing covariates in Z, the residuals in (9) would be statistically dependent on s. Entner et al. (2012) exploit this insight by means of a simple two-step algorithm to test the consistency of $\beta_{ys.Z}$ against the risk of OVB. In a first step, the key conditional variable s is regressed on the set of covariates Z. 19 If the residuals of this auxiliary regression are non-Gaussian (Gaussian residuals are a rarity in large cross-sectional data sets), statistical independence between the residuals from (9) and the error term of the auxiliary regression is tested in a second step. If independence is confirmed, $\beta_{ys.Z}$ is consistent with regard to the choice of covariates Z. The test results are reported in the last row of Table 3. In all cases, consistency is strongly supported by the data, rejecting the presupposition of OVB underlying the IV treatment.
19 The auxiliary regression takes the form $s = \alpha_s + Z'\beta_{sZ} + \varepsilon_s$. If $\varepsilon_s$ is non-Gaussian, statistical independence between $\varepsilon_s$ and the residuals of (9) confirms consistency of $\beta_{ys.Z}$ in (9).

Estimating the ATE of the CSL via Schooling:
Given the potential measurement error in CSL indicators identified by SY and briefly discussed in Section 3, we conduct two simple experiments to further test the appropriateness of the indicator choice before continuing with the estimation of the ATE. Since the CSL is only binding for school leavers, we would expect the ATE to be insignificant or at least smaller for those with higher education than for those without. Following this reasoning, we estimate the middle equation of (9) using sub-sample groups by educational attainment as in Table 2, with the expectation that $\hat{\beta}_{s\mathcal{L}} \neq 0$ for School and $\hat{\beta}_{s\mathcal{L}} = 0$ for Higher.
It is shown in Table 4 that, although $\hat{\beta}_{s\mathcal{L}}$ tends to be larger for the School sub-sample than for the Higher sub-sample, none of the indicators confirms the hypothesis of $\hat{\beta}_{s\mathcal{L}} = 0$ for Higher.
Noticeably, the size of the non-zero estimates in the first cohort is almost double that in the second cohort in the case of SY indicators. This shift appears to reflect a general shift towards more years of education. As seen from Table 1A (Appendix), the share of those attaining no more than the minimum years of schooling is halved in the later cohort.
Notes: 1980 census, data for SY white male with positive weekly earnings, data for AK male with positive weekly earnings. a Robust cluster adjusted standard errors. b Robust standard errors. ** Significant at the 1 percent level. * Significant at the 5 percent level.
In a second step, we test whether the rule of intervention $\beta_{y\mathcal{L}.s} = 0$ holds for the different CSL indicators by estimation of (8) with additional controls Z. In reference to earlier experiments, we conduct the test for the School sub-sample in addition to the full sample estimation. It is shown in Table 5 that the condition $\beta_{y\mathcal{L}.s} = 0$ is validated for SY's $\mathcal{L}^{SY1}$ indicator across cohorts and also for AK's $\mathcal{L}^{AK}$ indicator for the early born cohort. But it is violated without exception if using $\mathcal{L}^{SY2}$ as the CSL indicator. Where conditional independence is rejected in Table 5, we have also failed to confirm $\hat{\beta}_{s\mathcal{L}} = 0$ for the Higher sub-sample (see Table 4) and rejected instrument validity (see Table 2). In cases like this, we should be cautious with the estimate of the ATE via the chain representation of (10).
Notes: 1980 census, data for SY white male with positive weekly earnings, data for AK male with positive weekly earnings. Z as specified in 'Alternative' in Table 4. Standard errors reported in (.). a Robust cluster adjusted standard errors. b Robust standard errors. ** Significant at the 1 percent level. * Significant at the 5 percent level.
Table 6 provides the ATE estimated via (10). Where conditional independence was verified, the chain approximation yields significant ATE estimates that confirm our expectation of a larger effect of schooling on earnings than that of the CSL, i.e. $\beta_{ys.Z} \gg \beta_{y\mathcal{L}}$. Direct ATE estimates $\hat{\beta}_{y\mathcal{L}}$ obtained via (7') exceed estimates obtained via chain approximation for the later born cohort (see Table 2A, Appendix). The effect is indicative of positive indirect CSL effects through control variables Z in later years. Further, chain approximations using SY indicators are much more varied across cohorts than across sub-samples, due to the varying estimates of $\beta_{s\mathcal{L}}$ in Table 4. The estimated ATE almost doubles for the later born cohort from 1-3 to 3-5 per cent using $\mathcal{L}^{SY1}$ indicators. The ATE estimates using AK indicators are relatively constant across both sub-samples and cohorts.
It should be noted that the negative sign here actually implies a positive ATE because people born in the first three quarters, $\mathcal{L}^{AK}_1$, $\mathcal{L}^{AK}_2$ and $\mathcal{L}^{AK}_3$, are associated with fewer years of schooling as compared to those born in the fourth quarter. The CSL effect is strongest for those born in the first quarter and weakens successively for those born in the second and third quarters.
Our ATE estimates are in rough agreement with findings reported in the literature. For instance, Pischke and von Wachter (2008) report the ATE of CSL in Germany to be between 0.012 and 0.017 for different datasets. Results presented by Oreopoulos (2006b) for Canada suggest a larger ATE between 0.031 and 0.107, which is similar to our estimates using $\mathcal{L}^{SY1}$. Acemoglu and Angrist (2001) report the ATE of CSL to be between 0.008 and 0.009. A moderate positive impact of CSL on schooling was further confirmed by Lleras-Muney (2002), Oreopoulos (2006a) and Goldin and Katz (2011).
Notes: 1980 census, data for SY white male with positive weekly earnings, data for AK male with positive weekly earnings. See Tables 3 and 4 for the $\beta_{ys.Z}$ and $\beta_{s\mathcal{L}}$ estimates respectively. Significance of $\beta_{y\mathcal{L}}$ based on $\chi^2$ statistics estimated following Weesie (1999), reported in [.]. a Robust cluster adjusted standard errors. b Robust standard errors. ** Significant at the 1 percent level. * Significant at the 5 percent level.

Testing for a CSL Induced Shift in the Schooling Effect:
. ≠̃.
If the introduction of the CSL has altered the ability composition of workers, the post-treatment schooling effect might differ from the schooling effect before treatment. For the AK law indicators, the treated group is defined as those born in the first and second quarters of the year, while the untreated group is defined as those born in the remaining quarters. Since the CSL is binding for only a minority of the treated group, as evident from the negligible share of compliers (Table 1A, Appendix), the change in ability composition is unlikely to produce a significant parameter shift in the full sample. As before, sub-sample division is used to minimize defectors and always takers in the treated group of the School sub-sample.
A parametric shift should hence be discernible for the School but not the Higher sub-sample.
As can be seen from Table 8, using ℒ 1 indicators, the break-point Chow tests provide no evidence for a treatment induced parametric shift in the ARTE parameter at the 1 per cent significance level, and some evidence at the 5 per cent significance level for the School Binding sub-sample. Using ℒ 2 as indicator, there is some evidence for a parametric effect for the later born cohort. The effect is absent from the Higher sub-sample, as expected, but detectable at the 5 per cent level for the Full sample and the School sub-sample. However, given the measurement errors identified for the ℒ 2 indicators in Table 4, the evidence is too weak to conclude that a parametric shift has occurred. Overall, the evidence for an L-treatment induced parametric shift is weak and riddled with deficiencies in the CSL indicators. Moreover, if the substantive interest lies in a data-consistent ARTE effect, data patterns originating from sheepskin effects and educational inflation are of far greater concern than an L-treatment induced parametric shift (see Tables 2 and 3).
Notes: 1980 census, white male with positive weekly earnings. P-values in (.). a The treated sub-sample is defined as those born in a state with some school law in place, that is, minimum years of schooling not equal to zero. b The untreated sub-sample is defined as those born in a state with no school law in place, that is, minimum years of schooling equal to zero. c The definitions of treated and untreated change for this experiment: the treated sub-sample is defined as those who drop out after the minimum years of schooling, and the untreated sub-sample comprises the remaining observations.
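The break-point Chow test referred to above can be sketched as follows. This is a minimal illustration on simulated data, not the specification behind Table 8: it uses the classic Chow statistic, which tests all coefficients jointly, whereas the paper's interest is in the schooling coefficient; the earnings equation, the parameter values and the treated/untreated split are hypothetical. Only the F statistic is computed; a p-value would additionally require the F distribution with (k, n - 2k) degrees of freedom.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit (normal equations, Gauss-Jordan)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

def chow_f(X, y, treated):
    """Chow F statistic for equal coefficients in treated and untreated groups."""
    k, n = len(X[0]), len(y)
    X1 = [r for r, t in zip(X, treated) if t]
    y1 = [v for v, t in zip(y, treated) if t]
    X0 = [r for r, t in zip(X, treated) if not t]
    y0 = [v for v, t in zip(y, treated) if not t]
    rss_pooled = ols_rss(X, y)                   # one regression for everyone
    rss_split = ols_rss(X1, y1) + ols_rss(X0, y0)  # separate regressions
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

def simulate(beta_treated, seed, n=8000):
    """Hypothetical earnings data; the untreated schooling effect is 0.08."""
    rng = random.Random(seed)
    X, y, treated = [], [], []
    for _ in range(n):
        t = rng.random() < 0.5
        s = rng.gauss(12.0, 2.0)
        b = beta_treated if t else 0.08
        X.append([1.0, s])
        y.append(1.0 + b * s + rng.gauss(0, 0.3))
        treated.append(t)
    return X, y, treated

f_none = chow_f(*simulate(0.08, seed=1))   # no shift in the schooling effect
f_shift = chow_f(*simulate(0.12, seed=1))  # schooling effect shifts when treated
print(f"F without shift = {f_none:.2f}, F with shift = {f_shift:.2f}")
```

With no shift the statistic stays small, while a shift in the schooling coefficient for the treated group inflates it sharply. Weak or borderline statistics of the kind reported in Table 8 sit between these two stylized cases.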

What have we learnt?
Angrist and Pischke (2015: 227) dismiss the Acemoglu and Angrist (2001) study as 'a failed research design' and ascribe the failure to inappropriate CSL indicators, while maintaining that the IV approach is appropriate. Our analysis shows that the failure lies in model design rather than in the choice of IVs, and that it stems from overlooking the fact that the IV approach alters the causal model. A re-assessment of the approach in Section 2, focusing on its effects on causal model design, leads us to identify several inconsistencies. The IV treatment was motivated by the desire for consistent estimation of model (1), out of concerns over the presence of OVB, measurement error and/or self-selection bias linked to innate ability when (1) is applied to data.
However, an extension of model (1), by taking into careful consideration the causal implications of these concerns via DAGs in Section 2, shows that the relevance of these concerns crucially depends on the specified parameter of interest subsequent to the choice of the key conditional variable.
If the parameter of interest is the ATE of the CSL via schooling, concerns regarding selection bias and measurement error in the schooling variable are irrelevant, since s merely acts as a mediator for L. As evident from Tables 4-6, however, measurement error in the CSL indicators is a major concern. The CSL indicators fail to target school goers consistently, fail the rule of intervention and result in non-robust ATE estimates. These weaknesses call for more careful indicator selection, a concern raised repeatedly in the literature. If the parameter of interest is the ARTE, the effect of L is irrelevant unless its moderating effect on s induces a parametric shift. We find little evidence of such a shift using the CSL indicators of AK and SY (Tables 7 and 8).
The IV approach, in contrast, tries apparently to uphold model (1). The CV experiments help us deselect model (3) in favour of (1); see Arlot and Celisse (2010). Moreover, they also teach us the primary importance of building and selecting data-consistent models to minimize empirical risk. It is a problematic strategy to go for consistent estimators before model selection is accomplished; see an insightful discussion on learnability versus consistency by Shalev-Shwartz and Ben-David (2014: Chapter 7).
FIGURE 1A. YEARS FOR SCHOOLING DENSITY. Panels: a. 1930-1939 SY; b. 1930-1939 AK; c. 1940-1949 SY; d. 1940-1949.
Notes: AK education variable is capped at 17 years of schooling for comparability between the AK and SY datasets.
Notes: 'N' is sample size, 'equal' is share of those with years of education equal to school law, and 'less' years of education and 'more' years of education respectively for those treated by the respective law. For the untreated group, share of those with less than 7, 8, and 9 years of education among untreated is given. 'Total' compares 'Untreated' against total of the sample in the equal to the law, less than the law and more than the law of schooling categories.
Notes: 1980 census, data for SY white male with positive weekly earnings, data for AK male with positive weekly earnings. T-statistics reported in (.). a Robust cluster adjusted standard errors. b Robust standard errors. ** Significant at the 1 percent level. * Significant at the 5 percent level.