Estimating the Marginal Abatement Cost Curve of CO2 Emissions in China: Provincial Panel Data Analysis

This paper estimates the Marginal Abatement Cost Curve (MACC) of CO2 emissions in China based on a provincial panel for the period of 2001-2010. The provincial marginal abatement cost (MAC) of CO2 emissions is estimated using a parameterized directional output distance function. Four types of model specifications are applied to fit the MAC-carbon intensity pairs. The optimal specification controlling for various covariates is identified econometrically. A scenario simulation of China’s 40-45 percent carbon intensity reduction based on our MACC is illustrated. Our simulation results show that China would incur a 559-623 Yuan/ton (roughly 51-57 percent) increase in marginal abatement cost to achieve a corresponding 40-45 percent reduction in carbon intensity compared to its 2005 level.

The Marginal Abatement Cost Curve (MACC) has recently attracted extensive attention and has been increasingly applied in climate change policy. Its growing popularity is mainly due to its simplified representation of the complex relationship between emissions abatement effort and the marginal cost of cutting one unit of CO2 emissions. For policy-makers, scholars and stakeholders in climate negotiations, the MACC serves four purposes: it provides an illustrative guide to the benefits of an emissions trading system; it guides the estimation of permit prices and carbon taxes; it helps to determine the most cost-effective way of meeting an emissions constraint target; and it helps in assessing the cost-effectiveness of various policy regimes (Ellerman and Decaux, 1998; Kesicki and Ekins, 2011; Klepper and Peterson, 2006).
The MACC has been widely used for global and country-specific scenario analyses, and several recent studies have investigated it. However, provincial-level studies are rare. Since China's government is a federal system and provinces are the major units responding to policy (Qian and Weingast, 1997), a province-level MACC analysis can help to inform policy at the national level. In particular, it can help policy-makers to identify the Marginal Abatement Cost (MAC) gap among regions and to design a burden-sharing strategy. This paper attempts to fill this gap in the literature. We adopt a new strategy to develop a MACC for China's provinces by identifying the optimal modeling specification from among a set of competing specifications. The estimation method we use is empirically tractable and easy to solve.
The main contribution of this paper is threefold. First, individual MACs for each province across the different years are estimated for a given production-based technology and then a MACC is estimated based on these individual MACs. Second, four types of commonly used MACC specifications are compared and the optimal one is chosen where the choice criteria are based on the model's in-sample and out-of-sample performance. Third, we apply this newly proposed MACC estimation method to simulate the cost of China's carbon reductions for the year 2020, which has important policy implications for China's low carbon strategy.
The remainder of the paper is organized as follows. Section 2 discusses previous studies and summarizes the advantages and weaknesses of each approach. Section 3 presents four types of empirical specifications. Section 4 introduces the data and variables. Section 5 presents and compares the empirical results for all specifications. Section 6 simulates the economic cost of achieving China's carbon reduction target. The last section concludes.

Literature Review
The previous research on MACC falls into three broad categories in terms of modeling approach (De Cara and Jayet, 2011;Kesicki and Strachan, 2011).

Expert-based MACC
The first category is the expert-based MACC, which is also called the technology cost curve. It is an engineering bottom-up approach that assesses the emissions reduction potential and corresponding cost of each single technical option based on assumptions developed by experts. Then the technical options are ranked from least to most expensive to represent the costs of achieving incremental levels of emissions reductions.
The earlier MACC serves as a supply curve. For instance, Jackson (1991) constructs a least-cost supply curve for GHG abatement and uses this methodology to evaluate the cost-effectiveness of 17 technical options. His analysis shows that the main determinants of cost savings are energy efficiency and whether the option is a renewable energy source. The best-known case is the global MACC developed by McKinsey & Company. In the latest version, the authors conduct an in-depth evaluation of the reduction potential and corresponding cost for more than 200 GHG abatement opportunities across 10 sectors and 21 countries/regions in 2030 (Nauclér and Enkvist, 2009). The expert-based MACC can also be applied to specific sectors. To investigate the abatement potential in the agricultural sector, Moran et al. (2011) develop a MACC for crop and soil pollution in the UK by examining a range of specific abatement technologies/options in terms of their cost-effectiveness and mitigation potential.
Although it is easy to understand, the traditional expert-based MACC has also been heavily criticized (Kesicki and Strachan, 2011). Firstly, it treats each measure individually and neglects interactions between technical options and the associated co-benefit/co-cost. Secondly, it assesses only the technological cost and ignores the associated transaction cost. Thirdly, it only evaluates single-way impacts while neglecting the institutional and behavioral contexts. Finally, it works in a static way and neglects issues of inter-temporal dynamics and inertia.
In a recent study, Vogt-Schilb and Hallegatte (2011) attempt to improve the traditional expert-based MACC by incorporating a "cost in time" dimension. They argue that the more expensive options should be implemented before the potential of the cheapest ones has been exhausted if their potential is high and their inertia is significant. They suggest that policy-makers should consider these dynamic and inertia effects in determining the optimal implementation time of various GHG abatement measures.

Model-derived MACC
The second category is the model-derived MACC, which is generated from partial or general equilibrium models. For this branch of models, the most common way to generate a MAC curve is either to run the model under a series of increasingly strict emission limits and derive the corresponding CO2 prices, or to run the model under a series of CO2 prices and calculate the corresponding CO2 emission levels. The resulting price-emission pairs are then used to form a MAC curve.
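The price-sweep procedure can be sketched in a few lines of Python. This is only an illustration under strong assumptions: `toy_emissions` is a hypothetical stand-in for a full equilibrium model, with a linear marginal abatement cost and made-up parameter values, and is not taken from any of the studies cited here.

```python
def toy_emissions(price, baseline=100.0, mac_slope=2.0):
    """Hypothetical stand-in for an equilibrium model (not from the paper).

    Assumes a linear marginal abatement cost, MAC(a) = mac_slope * a, so the
    cost-minimizing abatement at a given CO2 price is a = price / mac_slope,
    capped at the baseline emissions level.
    """
    abatement = min(price / mac_slope, baseline)
    return baseline - abatement


def trace_mac_curve(prices, model=toy_emissions):
    """Run the model once per CO2 price and collect (price, emissions) pairs."""
    return [(p, model(p)) for p in prices]


pairs = trace_mac_curve([0, 20, 40, 60])
for price, emissions in pairs:
    print(f"price={price:>3}  emissions={emissions:.1f}")
```

In an actual application, the `model` argument would be replaced by a call to the energy system or CGE model, which is where the sensitivity to modeling assumptions enters.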
The model-derived MACC can be further divided into two types. One is based on engineering-oriented bottom-up models, such as energy system models. The other is based on economy-oriented top-down models, such as Computable General Equilibrium (CGE) models (De Cara and Jayet, 2011; Klepper and Peterson, 2006). Both types of model simulate the equilibrium by either minimizing system costs or maximizing consumer and producer surplus. However, the bottom-up models are usually partial equilibrium models which only cover the energy sectors, while the top-down models are usually general equilibrium models which cover endogenous economic responses in the whole economy. Ellerman and Decaux (1998) [...] baseline year, China's marginal abatement cost is expected to reach 12-216 $/t, which corresponds to a reduction rate ranging from 5 to 45%. Klepper and Peterson (2006) use the Dynamic Applied Regional Trade (DART) model to simulate how the abatement level affects the MACC through energy prices. They find that the MACC is determined by the initial energy price, the energy supply structure and the low-carbon potential. Morris et al. (2012) argue that the MAC is affected by policies abroad (i.e., the policies adopted in other countries), by historic policy efforts and by the coverage of tradable GHG. Fischer and Morgenstern (2006) adopt a meta-analysis strategy to explore the factors that contribute to the vast discrepancies in the MAC across studies. They show that several modeling assumptions may alter the estimates of the MAC. For example, the Armington trade elasticity assumption may lead to underestimation of the MAC, while the perfectly mobile capital assumption may bias the estimates upward.
Although model-derived MACCs have the advantage of taking into account interactions between mitigation measures as well as inter-temporal interactions, they also have disadvantages. Firstly, the estimated results of the system models diverge significantly. They are quite sensitive to the choice of model and to the model's underlying assumptions. Secondly, system models are usually large, complicated black boxes. Most of the time, readers are made aware of a particular result but have no understanding of how this result is obtained or whether the parameters used in the model are properly set.

Supply-side/Production-based MACC
The third approach derives the MAC from production theory. The production possibility set is determined by a set of detailed technical and economic constraints. Given that the production process generates both desirable output and undesirable by-products, the production unit has to sacrifice some profit by reallocating its productive resources to abatement activities in order to cut emissions at the margin. This constraint-induced marginal cost can be interpreted as an opportunity cost (De Cara and Jayet, 2011; Klepper and Peterson, 2006).
There are two strategies for implementing the empirical analysis. The first is to specify a total cost function and obtain the marginal cost function as its first-order derivative, or to specify and estimate a marginal cost function directly. Related studies include Hartman et al. (1997) on the US, De Cara and Jayet (2011) on the EU, and Dasgupta et al. (2001), Wei and Rose (2009) and Zhou et al. (2013) on China.³ The main problem with this method is obtaining reliable cost information, which is usually confidential.
Another major strand of the literature employs distance function frameworks to model the environmental production technology. Both the Shephard distance function and the directional distance function have been widely used (Chung et al., 1997; Shephard et al., 1970). The main advantage of the distance function method is that it only requires data on inputs and outputs, which are much easier to obtain than cost information. Chambers et al. (1998), Färe et al. (1993) and Färe et al. (2005) have undertaken much pioneering work in this field.
Some empirical studies apply the non-parametric Data Envelopment Analysis (DEA) approach. DEA estimation is based on linear programming and aims to construct a piecewise linear combination of all observed inputs and outputs. Examples include Boyd et al. (2002), Kaneko et al. (2010), Lee et al. (2002), Maradan and Vassiliev (2005) and Choi et al. (2012). The major advantage of the DEA approach is that it does not need to impose a specific functional form on the underlying technology (Zhang and Choi, 2014).⁴
The shadow price can also be estimated parametrically. The main advantage of the parametric approach is that the estimated frontier is everywhere differentiable. In previous studies, the Shephard distance function is usually specified with a translog functional form, while the directional distance function is commonly represented by a quadratic functional form. Both the Shephard/translog and the directional/quadratic settings can be estimated using the Linear Programming (LP) method. Related studies include Coggins and Swinton (1996), Marklund and Samakovlis (2007) and Wei et al. (2013). The advantage of Stochastic Frontier Analysis (SFA) relative to the LP method is that the former takes statistical noise into account.⁵
⁴ A more detailed review of the DEA approach in energy and environment analysis can be found in Song et al. (2012), Zhang and Choi (2014) and Zhou et al. (2008).
⁵ Recently, increased efforts have been put into improving carbon performance modeling and shadow price estimation techniques. The first research strand uses non-radial directional distance functions to incorporate slacks into the efficiency measure (Barros et al., 2012; Färe and Grosskopf, 2010; Zhang and Choi, 2013b; Zhou et al., 2012). The second strand uses the meta-frontier technique to incorporate group heteroskedasticity into the analysis (Battese et al., 2004; Oh, 2010; Zhang et al., 2013). The third branch uses the bootstrapping method to provide estimation errors and confidence intervals for the non-parametric DEA and the parametric linear programming methods respectively (Simar and Wilson, 1999; Zhang and Choi, 2014; Zhou et al., 2010).
In most of the above studies, the discrete estimates of the MAC for different geographic units and time periods are not combined to form a continuous curve. This prevents researchers from conducting a cost-benefit analysis under various abatement scenarios. This paper adopts the supply-side/production-based MACC strategy and aims to overcome this deficiency. We first derive the MAC under the directional distance function framework using panel data at the provincial level.⁶ In the second step, various empirical specifications are used to fit our estimated MAC, and the optimal parameterized MACC model is identified according to different econometric selection criteria.
⁶ An anonymous reviewer has pointed out that it would be interesting for future research to conduct comparative studies on the MACC internationally, e.g. to conduct a comparison among China, Korea, Japan and other East Asian countries. Such a comparison might provide more comprehensive policy implications (Zhang and Choi, 2013a).
Compared to the expert-based MACC and the model-derived MACC, the supply-side/production-based MACC is solidly based in production theory and its interpretation is straightforward. It is also relatively transparent, so that readers can easily appreciate the model in its entirety. Another attractive feature of our approach is that our estimations are based on provincial panel data, so that we are able to capture regional characteristics and a time trend. Moreover, we consider a series of functional forms for the MAC curve and choose the optimal one by using both in-sample fitting criteria and out-of-sample criteria, while previous studies usually only provide one option. In these respects, our method represents an improvement on the approach used in previous studies.

Empirical Specifications
Assume the marginal CO2 abatement cost curve is as follows:
$$y = f(x, Z) \tag{1}$$
where $y$ is the marginal CO2 abatement cost, $x$ is the carbon intensity (CO2/GDP), $Z$ is the vector of covariates, and $f(\cdot)$ is the function relating these variables.⁷
It is worth noting that our definition of the marginal abatement cost curve differs from the traditional one: on the right-hand side of Eq. (1) we use carbon intensity rather than the absolute quantity of CO2 abatement. CO2 emissions in China increased persistently during the sample period, but this increase accompanied China's economic growth over the same period. Therefore, the level of CO2 emissions alone makes it difficult to pinpoint how much abatement is taking place. To capture CO2 abatement more accurately, Zhou et al. (2013) calculate the amount of CO2 emissions reduction by multiplying GDP for the current year by the change in carbon intensity between the current and previous year.⁸ Following the same idea, we deal with this problem in a more convenient way, i.e. by using carbon intensity as a proxy for the quantity of emissions reduction and by investigating the relationship between marginal abatement cost and carbon intensity.
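The intensity-based calculation attributed to Zhou et al. (2013) can be illustrated with a short Python sketch. The function name, the sign convention (positive values mean abatement) and the numbers are assumptions made for illustration, not values from the paper.

```python
def implied_abatement(gdp, emissions):
    """Abatement implied by the change in carbon intensity.

    For each year t >= 1, returns GDP_t * (CI_{t-1} - CI_t), where
    CI = emissions / GDP. The sign convention (positive = abatement)
    is an assumption of this sketch.
    """
    intensity = [e / g for e, g in zip(emissions, gdp)]
    return [gdp[t] * (intensity[t - 1] - intensity[t])
            for t in range(1, len(gdp))]


# Hypothetical two-year example: emissions rise from 300 to 308 while
# GDP rises from 100 to 110, so carbon intensity falls from 3.0 to 2.8.
reduction = implied_abatement([100.0, 110.0], [300.0, 308.0])
print(reduction)
```

The example shows why emission levels alone are uninformative here: emissions rise in absolute terms, yet the fall in intensity implies positive abatement relative to what the previous year's intensity would have produced.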
It is reasonable to use carbon intensity as a measure of the quantity of emissions reduction. First, the measure of marginal abatement cost used in this paper is closely related to GDP and CO2 emissions (this will become clear in the next section). Second, adopting the carbon intensity proxy does not affect our policy conclusions. In fact, it aligns our analysis with the practice of policy-makers in China: the carbon reduction policies in China's 12th FYP (2011-2015) and other documents are mostly based on carbon intensity.
It is helpful to take a first look at the relationship between marginal abatement cost and carbon intensity using a nonparametric method, since the functional form is unknown. Specifically, we use the Locally Weighted Scatterplot Smoother (LOWESS) with a bandwidth of 0.8 and a tricube weighting function.
⁷ We are grateful to an anonymous reviewer for noting that our estimations capture the composite relationship of the shadow price and the other variables. Our analysis reports average effects, holding other things constant. We should note that access to more disaggregated sectoral information would open up interesting possibilities to researchers focusing on potential within-sector interactions among the covariates. Additionally, access to such data would allow researchers to relax the conventional assumption that the covariates exercise an equivalent impact across the different sectors.
⁸ Wei and Rose (2009) use a similar method when investigating the marginal cost curve of energy efficiency improvement.
[Figure 1 is here]
Figure 1 plots the LOWESS estimates of the relationship between marginal CO2 abatement cost and carbon intensity. From Figure 1, we can observe an unambiguous nonlinear relationship between these two variables. The downward-sloping curve means that it is more costly to reduce an additional unit of CO2 emissions for provinces in China with lower carbon intensities.
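As a rough illustration of the LOWESS step (bandwidth 0.8, tricube weights), the following pure-Python sketch fits a weighted local linear regression at each observed point. It is a simplified single-pass version without robustness iterations, the data points are hypothetical, and it is not claimed to reproduce the exact routine used for Figure 1.

```python
import math


def lowess(x, y, frac=0.8):
    """Single-pass LOWESS: local linear fits with tricube weights.

    frac is the share of observations used in each local window.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points per local window
    fitted = []
    for x0 in x:
        # Window half-width: distance to the k-th nearest observation.
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        # Tricube weights; points at or beyond distance h get weight 0.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical (carbon intensity, MAC) pairs, for illustration only.
intensity = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0]
mac = [2200.0, 1700.0, 1400.0, 1150.0, 1000.0, 800.0, 700.0]
smoothed = lowess(intensity, mac, frac=0.8)
```

A large bandwidth such as 0.8 uses most of the sample in every local fit, which yields a smooth curve suited to revealing the overall shape rather than local detail.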
To estimate the relationship between the marginal CO2 abatement cost and carbon intensity parametrically, we consider four different functional forms for the function $f(\cdot)$ that are widely used in previous studies, among them the quadratic and logarithmic forms (Chen, 2005; Criqui et al., 1999; Ellerman and Decaux, 1998; Morris et al., 2012; Nordhaus, 1991; Zhou et al., 2013). We first estimate the MACC for all four functional forms and then choose the optimal specification from among them.

Data and Descriptive Statistics
We use provincial-level aggregate data covering 30 provinces in China.⁹ We restrict our analysis to the period of the 10th and 11th FYPs, i.e. the years from 2001 to 2010, because policy was relatively stable during this period.¹⁰
⁹ Tibet is excluded because of the problem of data availability.
The most important data needed for this paper is the MAC of CO2 (denoted as MAC). Given that cost data are unavailable, we have to estimate it ourselves. To do this, we use the method of shadow price estimation pioneered by Färe et al. (2005), which is based on the directional output distance function and multi-input multi-output production theory.¹¹
The directional output distance function describes the simultaneous maximum expansion of good outputs and contraction of bad outputs that is feasible for any given production technology.¹² Typically, the directional output distance function can be defined as:
$$\vec{D}_o(x, y, b; g_y, -g_b) = \max\{\beta : (y + \beta g_y,\ b - \beta g_b) \in P(x)\} \tag{2}$$
where $x \in R_+^N$ is the vector of inputs, $y \in R_+^M$ is the vector of good output, $b \in R_+^J$ is the vector of bad output, $P(x)$ is the output set, and $(g_y, g_b) \in R_+^M \times R_+^J$ is the directional vector. The shadow price of the j-th bad output given the market price of the m-th good output is as follows:
$$q_j = -p_m \frac{\partial \vec{D}_o / \partial b_j}{\partial \vec{D}_o / \partial y_m} \tag{3}$$
where $q_j$ is the shadow price of the j-th bad output and $p_m$ is the market price of the m-th good output.
Empirically, we employ the quadratic functional form to parameterize the directional output distance function. Additionally, we set the directional vector $(g_y, g_b) = (1, 1)$ to seek a simultaneous expansion of good output and reduction of bad output, a choice that follows previous studies. The parameters of the quadratic function are estimated by the linear programming method (for a more detailed description of the method, please refer to Appendix 1).¹³
It is worth noting that the choice of direction vector affects the shadow price estimates. Given the direction of good output $g_y$, a larger value of the bad output direction $g_b$ will lead to a higher estimated shadow price (Vardanyan and Noh, 2006; Zhou et al., 2014a).
[...] the accuracy of the estimates.
¹² The shadow price of the pollutant can also be estimated using the Shephard distance function. The difference is that the Shephard distance function expands the good and bad outputs proportionally, whereas the directional output distance function allows for a particular direction in which each output is to be expanded or contracted. The directional distance function is comparatively more flexible than the Shephard distance function. Indeed, the latter represents a special case of the directional distance function (Chung et al., 1997). Moreover, Vardanyan and Noh (2006) find that the quadratic-based directional output distance function is more appropriate for application in shadow-pricing studies than the translog-based Shephard output distance function due to its mapping flexibility.
¹³ The directional output distance function can also be estimated parametrically using the Stochastic Frontier Analysis (SFA) method. The advantage of SFA is that it takes statistical noise into account. However, the SFA method cannot incorporate constraints into the estimation. Previous studies proceed by initially running the SFA estimation and ignoring the constraints. Researchers subsequently examine ex post whether the results meet the constraints, retaining for further analysis only those results that do (Färe et al., 2005; Murty et al., 2007). However, the exclusion of some observations may introduce inconsistency into the estimated parameters because these parameters have been estimated using the overall sample. As a result, the estimated shadow prices may be biased.
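The shadow price of a bad output is the negative of the good output's price times the ratio of the distance function's derivatives with respect to bad and good output. A minimal Python sketch of that ratio follows, using numerical derivatives. The function `toy_distance` is a hypothetical one-good, one-bad-output technology chosen so that the answer can be checked by hand; it ignores inputs and is not the quadratic function estimated in the paper.

```python
def shadow_price(dist, y, b, p_good=1.0, eps=1e-6):
    """Shadow price q = -p * (dD/db) / (dD/dy), via central differences."""
    d_y = (dist(y + eps, b) - dist(y - eps, b)) / (2 * eps)
    d_b = (dist(y, b + eps) - dist(y, b - eps)) / (2 * eps)
    return -p_good * d_b / d_y


def toy_distance(y, b, a=0.5):
    """Hypothetical technology with frontier y = a * b (illustration only).

    With direction (g_y, g_b) = (1, 1), the largest beta such that
    y + beta <= a * (b - beta) is (a * b - y) / (1 + a).
    """
    return (a * b - y) / (1 + a)


q = shadow_price(toy_distance, y=1.0, b=4.0)
print(round(q, 6))
```

In this toy case the shadow price equals `a`: cutting one unit of the bad output along the frontier costs `a` units of good output, which is the opportunity-cost reading of the MAC.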
In order to estimate the provincial shadow price of CO 2 reduction, we consider the case of one good output, annual regional Gross Domestic Product, one bad output, carbon dioxide emissions, and three inputs, labor, capital and energy. The data for GDP is deflated to the 2005 price to net out the effect of inflation. Labor input is measured as the number of employed persons at the end of each year. The data for GDP and labor inputs are both obtained from the China Statistical Yearbooks. Energy consumption is measured in the standard coal equivalent, which is collected from the provincial statistical yearbooks.
The data for capital stock is not directly available from any of the statistical yearbooks. Thus, we estimate it by the perpetual inventory method proposed by Zhang et al. (2004):
$$K_{i,t} = K_{i,t-1}(1 - \delta_{i,t}) + I_{i,t} \tag{4}$$
where $K_{i,t}$ is the capital stock of province $i$ in year $t$, $I_{i,t}$ is investment and $\delta_{i,t}$ is the depreciation rate.
Similarly, we need to estimate the bad output, provincial CO2 emissions. Following IPCC (2006) and Du et al. (2012), we estimate CO2 emissions from the burning of fossil fuels by the following formula:
$$CO_2 = \sum_{i=1}^{7} E_i \times CF_i \times CC_i \times COF_i \times \frac{44}{12} \tag{5}$$
where $i$ represents an index of different types of fossil fuels. We consider the consumption of 7 different primary fuel types, i.e. coal, coke, gasoline, kerosene, diesel, fuel oil and natural gas. The term 44/12 is the ratio of the mass of a CO2 molecule (one carbon atom combined with two oxygen atoms) to the mass of a carbon atom. The variables $E_i$, $CF_i$, $CC_i$ and $COF_i$ represent the total consumption, the relevant transformation factor, the carbon content and the carbon oxidation factor of fuel $i$, respectively. The data for provincial fuel consumption is taken from the regional energy balance tables in the China Energy Statistical Yearbooks.
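Both data-construction steps are simple recursions or sums, as the following Python sketch shows. It assumes the standard perpetual inventory recursion with a constant depreciation rate; all function names and numerical values are hypothetical placeholders, not the factors or rates used in the paper.

```python
def capital_stock(k0, investment, depreciation):
    """Perpetual inventory: K_t = K_{t-1} * (1 - depreciation) + I_t.

    A constant depreciation rate is assumed here for simplicity.
    """
    stocks = [k0]
    for inv in investment:
        stocks.append(stocks[-1] * (1 - depreciation) + inv)
    return stocks


def co2_emissions(fuels):
    """Sum over fuels of E * CF * CC * COF, scaled by 44/12 (carbon to CO2)."""
    carbon = sum(f["E"] * f["CF"] * f["CC"] * f["COF"] for f in fuels)
    return carbon * 44 / 12


# Hypothetical inputs, for illustration only.
stocks = capital_stock(100.0, [10.0, 10.0], depreciation=0.1)
total = co2_emissions([
    {"E": 12.0, "CF": 1.0, "CC": 1.0, "COF": 1.0},  # placeholder fuel
])
```

The factor 44/12 converts a mass of carbon into the corresponding mass of CO2, so 12 units of fully oxidized carbon yield 44 units of CO2.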
It is interesting to briefly examine the estimated shadow prices.¹⁴
[Figure 2 is here]
Once the estimates for CO2 emissions are derived, the provincial carbon intensities (denoted as Cintens) can be calculated. Furthermore, in the regression model, we consider the following covariates to control for provincial characteristics.¹⁶
Composition of Energy Consumption (denoted as ratio_coal). The relative CO2 emissions of fossil fuels vary considerably. Specific CO2 emissions from coal burning are 1.6 times those from natural gas and 1.2 times those from oil (Zhang, 2000). To control for potential province-specific trends in fuel mix, we use the share of coal in total energy consumption as a proxy for the composition of energy consumption. The data on coal and total energy consumption is derived from the China Energy Statistical Yearbooks.
¹⁴ Readers interested in more detailed shadow price estimates are welcome to contact the authors.
¹⁵ The MAC estimates themselves can be applied to emission allowance allocation. Zhou et al. (2014a) show how emission allowances can be allocated to different provinces and periods with efficiency as a criterion, based on several centralized DEA models. Zhou et al. (2015) further argue that a grandfathering allocation plan for initial emission allowances may benefit heavy industries while hurting light industries. They suggest that the MACs of the participants should be used as a supplementary criterion in the initial allocation of allowances in order to establish a fair carbon market.
¹⁶ For our regressions, we used economic theory and previous research to inform our choice of control variables. Thus, we are reasonably confident that omitted variable bias is not a major issue in our analysis. Adding further control variables can certainly reduce the risk of omitting variables and generally reduce the variance as well. However, we have to achieve a good balance between the need for accuracy and parsimony.

Industrial Composition (denoted as ratio_heavy). Heavy industry is usually more energy intensive than light industry and consequently produces more CO2 emissions. Thus, it is helpful to control for the variation in the composition of industry among provinces over time. We measure industrial composition by the share of heavy industry in total industry for each province in terms of the value of gross output. The required data is taken from the statistical yearbooks of each province for the relevant years.
Urban Concentration (denoted as ratio_urban). The effect of urban concentration on energy consumption and CO2 emissions has been investigated in several studies (Karathodorou et al., 2010; Shim et al., 2006). Thus, it is appropriate to control for variation in the provincial urbanization level over time. We use the proportion of the non-agricultural population in the total population as a proxy for the level of urbanization. All data is derived from the China Population Statistics Yearbooks and the China Population and Employment Statistics Yearbooks.
Privately-Owned Vehicles (denoted as private_car). China has experienced tremendous growth in motor vehicles during the past decade. Until the late 1990s, automobiles in China were mainly owned by state-owned enterprises and government officials, but recently the number of privately-owned cars has grown rapidly (Auffhammer and Carson, 2008). The CO2 emissions from motor vehicles have already had a detrimental impact in China (Riley, 2002). Thus, we include privately-owned vehicles in our regressions, measured by the number of privately-owned vehicles per 10 thousand persons. The data for privately-owned vehicles is derived from China's Auto Market Almanac.

Empirical Results
In this section, we will present the regression models and report the estimation results for the four different types of functional form. To search for the optimal specification, we estimate 6 step-wise regressions for each functional form by including different covariates. Then, we use both in-sample fitness criteria and out-of-sample forecasting criteria to choose the optimal regression model.

Quadratic Functional Form
For the quadratic MACC estimation, we consider the following two-way panel regression model:¹⁹
$$y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 x_{it}^2 + Z_{it}'\gamma + \mu_i + \lambda_t + \varepsilon_{it} \tag{6}$$
where $y_{it}$ is the shadow price of province $i$ in year $t$; $x_{it}$ is the carbon intensity; $Z_{it}$ is the vector of covariates; $\mu_i$ is the province-specific effect; $\lambda_t$ is the time effect; $\varepsilon_{it}$ is the error term; and $\beta_0$, $\beta_1$, $\beta_2$ and $\gamma$ are the coefficients to be estimated. As mentioned before, we estimate 6 step-wise regressions. Then, we search for the optimal regression by applying in-sample and out-of-sample criteria.
¹⁷ [...] we should also note that they are not identical. The shadow price reflects the opportunity cost, whereas the market price is heavily determined by the supply and demand of permits. Accordingly, the market price does not necessarily reflect all the abatement costs (Smith et al., 1998; Wei et al., 2013). Indeed, Vardanyan and Noh (2006) similarly find that no single estimation technology produces outcomes that are consistently close to the market prices of allowances.
¹⁸ Matsushita and Yamane (2012) use almost the same methodology to estimate the shadow price of CO2 emissions in Japan and obtain a much lower value than ours. Possible reasons for this discrepancy are as follows. First, Matsushita and Yamane (2012) confine their analysis to Japan's power sector, while we focus on China's overall economy. China's power sector (especially coal-fuelled plants) is recognized as the main emitter of CO2. Thus, it is relatively cheap for the power sector to reduce CO2 emissions compared to other sectors (e.g. the services sector). We consider all sectors (e.g. the agriculture, financial and education sectors), almost all of which are less carbon intensive than the power sector, implying that it is more expensive for these sectors to cut CO2 emissions. In this context, our higher estimated average shadow price is plausible. Second, the discrepancy in results may be due to the use of different estimation techniques. Though both papers employ a quadratic directional functional form, there are slight differences. In our paper, we add both province and time dummies to capture provincial idiosyncratic effects and technology change, while Matsushita and Yamane (2012) only consider time effects.
¹⁹ We are grateful to an anonymous reviewer for pointing out that our model is not focused on intertemporal aspects, i.e. the influence from previous years or expectations about future years.
We test for group-wise heteroskedasticity in all six regressions using the modified Wald statistic (Greene, 2000). The null hypothesis of this test is that there is no group-wise heteroskedasticity. We also implement the Wooldridge test for serial correlation in panel data models for all six regressions (Wooldridge, 2002). The estimation results are reported in Table 2.
[Table 2 is here]
Model 1 includes only carbon intensity and its quadratic term as independent variables. The estimation results show that both coefficients are significant at the 1 percent level. The coefficient on the quadratic term is positive, indicating that the estimated MACC is U-shaped.
Models 2-5 control for industrial composition, the structure of energy consumption, urban concentration and privately-owned vehicles. Model 6 additionally includes the time trend in logarithmic form, in line with Auffhammer and Carson (2008), to control for possible technology change. The estimation results show a relatively stable relationship between the marginal CO2 abatement cost and carbon intensity. All regressions report negative coefficients for carbon intensity and positive coefficients for its quadratic term, all significant at the 1 percent level. Additionally, the coefficients of carbon intensity and its quadratic term in Models 2-6 are very close to those obtained in Model 1.
The last four rows of Table 2 report the in-sample and out-of-sample criteria. Both the AIC and the BIC indicate that Model 6 is the optimal specification in terms of in-sample fit. However, Model 5 is the optimal one in terms of out-of-sample forecasting, since its MAE and RMSFE are lower than the corresponding values for the other regressions.
Because the coefficient of the time trend in Model 6 is insignificant even at the 10 percent level, we consider Model 5 to be the optimal specification for the quadratic MACC estimation.
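The four selection criteria can be illustrated with a minimal sketch. It assumes the Gaussian forms of the AIC and BIC up to an additive constant, which may differ from the exact formulas used for Table 2, and the hold-out values and forecasts below are hypothetical numbers chosen only for illustration.

```python
import math


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k (assumed form)."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n) (assumed form)."""
    return n * math.log(rss / n) + k * math.log(n)


def mae(actual, forecast):
    """Mean absolute error of out-of-sample forecasts."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmsfe(actual, forecast):
    """Root mean squared forecast error of out-of-sample forecasts."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical hold-out shadow prices (Yuan/ton) and forecasts from two candidate models.
actual = [1000.0, 1100.0, 1250.0, 1400.0]
forecast_a = [980.0, 1120.0, 1230.0, 1380.0]
forecast_b = [900.0, 1150.0, 1300.0, 1500.0]

# The candidate with the smaller MAE and RMSFE is preferred out of sample.
for name, forecast in (("A", forecast_a), ("B", forecast_b)):
    print(name, mae(actual, forecast), rmsfe(actual, forecast))
```

For all four criteria, smaller values indicate the preferred specification; the in-sample criteria penalize additional parameters, while the out-of-sample criteria compare forecasts with held-out observations.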
Once the optimal regression model is determined and the coefficients are estimated, it is easy to calculate the axis of symmetry of the parabola, which lies at about 6 tons/10000 Yuan. By comparison, the overall average carbon intensity in China during the sample period is about 2.88 tons/10000 Yuan, and none of the yearly averages exceeds 3.2 tons/10000 Yuan. This means that the observed marginal CO2 abatement costs lie on the downward-sloping (left-hand) part of the U-shaped curve.
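The turning point of a quadratic curve $y = a + b_1 x + b_2 x^2$ is $x^* = -b_1 / (2 b_2)$. The sketch below illustrates the calculation; the coefficient values are hypothetical and were chosen only so that the turning point equals 6, since the estimated coefficients of Table 2 are not reproduced here.

```python
def axis_of_symmetry(b1, b2):
    """Turning point x* = -b1 / (2*b2) of the parabola y = a + b1*x + b2*x**2."""
    if b2 == 0:
        raise ValueError("no turning point when the quadratic coefficient is zero")
    return -b1 / (2.0 * b2)


def slope(x, b1, b2):
    """Derivative dy/dx = b1 + 2*b2*x of the quadratic curve at x."""
    return b1 + 2.0 * b2 * x


# Hypothetical coefficients (not the paper's estimates).
B1, B2 = -600.0, 50.0
x_star = axis_of_symmetry(B1, B2)

# The sample-average carbon intensity reported in the text is 2.88 tons/10000 Yuan.
average_intensity = 2.88
on_downward_branch = average_intensity < x_star and slope(average_intensity, B1, B2) < 0
print(x_star, on_downward_branch)
```

With a negative linear coefficient and a positive quadratic coefficient, any carbon intensity below the turning point lies on the branch where the marginal abatement cost rises as carbon intensity falls.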
[ Figure 3 is here] Figure 3 simulates the estimated quadratic functional form MACC with the covariates set at their average values. The figure shows that, for values of carbon intensity below 6 tons/10000 Yuan, the marginal CO2 abatement cost rises increasingly rapidly as carbon intensity decreases.

Logarithmic Functional Form
The two-way logarithmic functional form MACC regression model, Eq. (7), uses the variables defined in Eq. (6). The difference is that carbon intensity in Eq. (7) is expressed in logarithmic form and its quadratic term is excluded from the regression.
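Based on this description, the logarithmic specification can be sketched as follows. The variable symbols follow the definitions given for Eq. (6); the coefficient symbols $\alpha$, $\beta$ and $\gamma$ are notation assumed here for illustration and may differ from the paper's own.

```latex
y_{it} = \alpha + \beta \ln x_{it} + Z_{it}'\gamma + \mu_i + \lambda_t + \varepsilon_{it}
```

Under this form, a negative $\beta$ gives a curve that is downward sloping and convex in $x_{it}$, since $\partial y_{it} / \partial x_{it} = \beta / x_{it} < 0$ and $\partial^2 y_{it} / \partial x_{it}^2 = -\beta / x_{it}^2 > 0$.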
Similarly, we test for group-wise heteroskedasticity and autocorrelation in all six regressions using the modified Wald statistic and the Wooldridge test respectively.
The results show that all six regressions suffer from heteroskedasticity and autocorrelation. As before, we resort to FGLS for panel data, assuming heteroskedastic errors with no cross-sectional correlation and panel-specific AR(1) autocorrelation within panels. Table 3 reports the estimation results.
[ Table 3 is here] As before, Model 1 simply investigates the relationship between the marginal CO2 abatement cost and carbon intensity without controlling for any other factors.
The estimation results show that the coefficient of carbon intensity is significant at the 1 percent level. The coefficient is negative, indicating that the marginal CO2 abatement cost curve has a downward slope.
Similarly, Models 2-6 additionally include industrial composition, the structure of energy consumption, urban concentration, privately-owned vehicles and a time trend.
The estimation results show that the coefficients of carbon intensity in all these regressions are significantly negative. The estimates are stable across specifications and close to those obtained in Model 1.
The AIC and BIC show that Model 6 is the optimal specification in terms of in-sample fit. However, Model 5 is the optimal specification according to the out-of-sample forecasting criteria, MAE and RMSFE. Since the time trend is insignificant, we consider Model 5 to be the optimal specification for the logarithmic functional form MACC estimation.
[ Figure 4 is here] Figure 4 simulates the marginal CO2 abatement cost curve under the logarithmic functional form. The figure shows that the curve is downward sloping and convex, which means that China has to sacrifice proportionately more to reduce an additional unit of CO2 emissions as carbon intensity decreases further.

Exponential Functional Form
To estimate this exponential functional form ($y = e^{ax+b}$), we first take the logarithm of both sides and then include the covariates and the error term. This gives the two-way exponential functional form MACC regression model, in which the shadow price is expressed in logarithmic form while carbon intensity is expressed in levels. The other variables are the same as in Eq. (6) and Eq. (7).
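Based on this description, the exponential specification can be sketched as follows. The variable symbols follow the definitions given for Eq. (6); the coefficient symbols $\alpha$, $\beta$ and $\gamma$ are notation assumed here for illustration and may differ from the paper's own.

```latex
\ln y_{it} = \alpha + \beta x_{it} + Z_{it}'\gamma + \mu_i + \lambda_t + \varepsilon_{it}
```

In terms of the exponential form, $\beta$ plays the role of $a$ and $\alpha$ the role of $b$, so a negative $\beta$ corresponds to a shadow price that falls as carbon intensity rises.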
Again, we test the group-wise heteroskedasticity and autocorrelation for all six regressions by implementing the modified Wald test and Wooldridge test respectively.
Our results indicate that all six regressions experience problems of heteroskedasticity and autocorrelation. Thus, we employ the method of FGLS once again, assuming that the structure of the error term within groups is heteroskedastic and there is panel specific within-group AR(1) autocorrelation. Table 4 reports the estimation results.
[ Table 4 is here] The estimation results listed in Table 4 show that, regardless of whether we control or not for additional covariates, the coefficients of carbon intensity are negative and significant at the 1 percent level, indicating that the MACC is a downward sloping curve. Compared with the previous two functional forms, the results are similar except for the coefficient of the time trend. The information criteria show that regression 5 is the best one in the sense of its in-sample fitness. However, regression 6 performs best in terms of its out-of-sample forecasting ability. We therefore choose regression 6 as the optimal model for the exponential functional form MACC estimation since our investigation is aimed at providing a basic tool for future policy analysis.
[ Figure 5 is here] Figure 5 simulates the exponential functional form MACC with the covariates set at their average values. The figure again shows that the marginal abatement cost increases as carbon intensity declines. The curve is convex, though the convexity is not as pronounced as in the previous two functional forms.

Power Functional Form
For the power functional form ($y = ax^b$), we again take the logarithm of both sides. This gives a two-way panel regression model in which both the shadow price and carbon intensity are expressed in logarithmic form. The covariates are the same as before.
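Based on this description, the power specification can be sketched as follows. The variable symbols follow the definitions given for Eq. (6); the coefficient symbols $\alpha$, $\beta$ and $\gamma$ are notation assumed here for illustration and may differ from the paper's own.

```latex
\ln y_{it} = \alpha + \beta \ln x_{it} + Z_{it}'\gamma + \mu_i + \lambda_t + \varepsilon_{it}
```

Here $\alpha$ corresponds to $\ln a$ and $\beta$ to the exponent $b$, so $\beta$ can be read as the elasticity of the shadow price with respect to carbon intensity.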
Likewise, the modified Wald test and Wooldridge test are implemented, and the results show significant evidence of group-wise heteroskedasticity and autocorrelation for all six regressions. To avoid these problems, we run the FGLS, assuming that the structure of the error term is heteroskedastic without cross-sectional correlations as well as auto-correlated with panel specific AR(1) specification. Table 5 reports the estimation results.
[ Table 5 is here] From Table 5, we find that the coefficients of carbon intensity are negative and significant at the 1 percent level, indicating a downward sloping curve. On average, a 1 percent decrease in carbon intensity induces a 0.245-0.397 percent increase in the shadow price, ceteris paribus. The coefficient of the time trend in regression 6 is not significant even at the 10 percent level. The AIC and BIC show that regression 5 is the optimal one in terms of in-sample fit, while the MAE and RMSFE reveal that regression 6 is the best in terms of out-of-sample forecasting ability. We choose regression 5 as the optimal specification since the time trend coefficient is insignificant.
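The elasticity reading of the power form can be checked with a short sketch. It uses only the elasticity range quoted above (-0.245 to -0.397) and assumes all covariates are held fixed; the function name is hypothetical. For a small cut in carbon intensity the percentage response is close to the elasticity itself, while for larger cuts the exact ratio $(1 - r)^b$ applies.

```python
def shadow_price_ratio(intensity_cut, elasticity):
    """Ratio y_new / y_old under y = a * x**b when x falls by the fraction intensity_cut."""
    if not 0.0 <= intensity_cut < 1.0:
        raise ValueError("intensity_cut must lie in [0, 1)")
    return (1.0 - intensity_cut) ** elasticity


# Elasticity bounds taken from the range quoted in the text (coefficients are negative).
ELASTICITIES = (-0.245, -0.397)

for b in ELASTICITIES:
    small_pct = (shadow_price_ratio(0.01, b) - 1.0) * 100.0   # response to a 1 percent cut
    large_pct = (shadow_price_ratio(0.40, b) - 1.0) * 100.0   # response to a 40 percent cut
    print(b, round(small_pct, 3), round(large_pct, 1))
```

The large-cut figures describe the power form only and are not comparable with the simulation results, which are based on the quadratic form.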
[ Figure 6 is here] Figure 6 plots the simulated power functional form MACC. From the figure, we find that the MACC clearly represents a downward sloping convex curve. This means that the marginal abatement cost increases more rapidly with additional decreases in carbon intensity.
To choose the optimal functional form, we again rely on the in-sample fit criteria and the out-of-sample forecasting criteria. Our estimation results show that the quadratic and logarithmic functional forms perform much better than the exponential and power functional forms, since both the in-sample and out-of-sample criteria of the former two are much lower. The choice between the quadratic form and the logarithmic form is less clear-cut, but a choice can still be made. Although the BIC of the logarithmic form is lower, its other three criteria, i.e. AIC, MAE and RMSFE, are higher. Thus, we are inclined to take the quadratic functional form as the optimal one. Moreover, the quadratic functional form is more flexible than the logarithmic functional form. We therefore suggest using the quadratic functional form for policy analysis.

Simulating the Cost of China's Carbon Reduction
Having estimated the MACC [20], we are able to conduct some policy analysis. One obvious place to start is to estimate the economic cost of achieving the Chinese government's declared CO2 reduction goal, i.e. to reduce carbon intensity by 40-45 percent by the year 2020, compared with 2005 levels. We simulate the cost of CO2 reduction based on the quadratic MACC since it is the optimal functional form.
[20] Strictly speaking, the national MAC for each year should first be estimated, and then a national MACC can be econometrically derived. However, this procedure requires a considerably long time series. Given our short period (2001-2010), we follow the convention of using the panel dataset of provincial MACs to derive a weighted national MACC. The national MACC is accordingly composed of the average of the provincial MACCs, assuming that all parameters of the provincial MACCs are the same.
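The mechanics of such a simulation can be sketched as follows. All numbers below (the constant, the two slope coefficients and the 2005 baseline intensity) are hypothetical placeholders rather than the paper's estimates; the constant is assumed to absorb the covariates, which are held at their scenario values.

```python
def mac_quadratic(x, const, b1, b2):
    """Marginal abatement cost under the quadratic form: const + b1*x + b2*x**2."""
    return const + b1 * x + b2 * x ** 2


def mac_change(x_base, cut, const, b1, b2):
    """Absolute and percentage change in MAC when carbon intensity falls by `cut`."""
    base = mac_quadratic(x_base, const, b1, b2)
    target = mac_quadratic(x_base * (1.0 - cut), const, b1, b2)
    return target - base, 100.0 * (target - base) / base


# Hypothetical values for illustration only (not taken from Table 2 or Table 6).
CONST, B1, B2 = 2500.0, -600.0, 50.0
X_2005 = 3.0  # hypothetical 2005 carbon intensity, tons/10000 Yuan

results = {cut: mac_change(X_2005, cut, CONST, B1, B2) for cut in (0.40, 0.45)}
for cut, (abs_change, pct_change) in results.items():
    print(cut, round(abs_change, 1), round(pct_change, 1))
```

Because the baseline lies on the downward-sloping branch of the parabola, a deeper cut in carbon intensity yields a larger increase in the marginal abatement cost; changing the scenario only shifts the constant in this sketch.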
To focus on the relationship between the marginal abatement cost and carbon intensity, the covariates in 2020 should be predetermined. We consider three scenarios according to the setting of the covariates: a Business As Usual (BAU) scenario, a Fast Development (FD) scenario and a Slow Development (SD) scenario. Table 6 reports the details of the scenario setting.
[ Table 6 is here] The share of heavy industry. He et al. (2009) [...] (2011-2015) declares that the government aims to reduce the share of coal consumption by about 3 percentage points. Under pressure to undertake further cuts in carbon emissions, it is reasonable to assume that the share of coal consumption in China will decline further over the period 2016-2020. We assume that, for the BAU scenario, it will decrease by 3% over 2011-2015 and by a further 5% over 2016-2020. For the FD scenario, we assume that it will decrease by 5% in each of the 2011-2015 and 2016-2020 periods. For the SD scenario, we assume that it will decrease by 3% in each of the 2011-2015 and 2016-2020 periods.
Considering that the urbanization level increased by 7 percentage points during the period 2001-2010 and that the government stresses the importance of urbanization for China's economic development during the next ten years, we have good reason to expect urban concentration to continue to increase over the next decade. We assume that, for the BAU scenario, it will increase by 4% in each of the 2011-2015 and 2016-2020 periods. For the FD scenario, we assume that it will increase by 5% in each of the 2011-2015 and 2016-2020 periods. For the SD scenario, we assume that it will increase by 4% over 2011-2015 and by 3% over 2016-2020.
Privately-owned car ownership. Huo and Wang (2012) [...]
[ Table 7 is here]

Conclusion
This paper attempts to estimate the MACC of CO2 emissions in China. In a first step, [...] The estimated MACCs are downward sloping and convex when specified in logarithmic, exponential and power functional forms. This means that China will incur increasingly high costs in the process of cutting its CO2 intensity and achieving the ambitious 40-45 percent target it has promised. In the U-shaped quadratic case, the turning point of carbon intensity is around 6 tons/10000 Yuan. For the provinces with CO2 intensities above the turning point [22], CO2 abatement activities are beneficial since the marginal cost decreases as carbon intensity decreases. However, for the other provinces, with carbon intensity below the turning point, it becomes more expensive to control an additional unit of emissions since the MAC increases more rapidly with successive cuts in CO2 intensity.
The simulation of cost changes due to carbon intensity reduction shows that the Chinese government has to bear a 51-57 percent increase in marginal abatement cost to achieve a 40-45 percent reduction in carbon intensity compared with its 2005 level. Fortunately, the decline in carbon intensity (or low-carbonization) does not only entail an economic cost. The accompanying improvement in environmental quality and social welfare is normally treated as a social benefit, although it is hard to measure this benefit in monetary terms.

Our results have important implications for different stakeholders and MACC users.
For empirical researchers, this production-based approach provides, in some respects, a better alternative way to estimate the MACC. Our approach is flexible enough for relevant environmental variables to be integrated. Additionally, it is relatively transparent and easy to apply. At the very least, the estimates from a production-based approach can be used as a benchmark for comparison when different approaches are adopted.
For policy-makers, this MACC offers a strong tool and is sufficiently informative to guide policy design and implementation. It can be used to simulate the cost consequences of various reduction measures, e.g. a cap-and-trade system or a carbon tax. Policy-makers can then choose an optimal option from among the affordable ones, for instance when setting a feasible carbon reduction target, allocating the initial permits for the cap-and-trade market or deciding the carbon tax rate. Relative to a carbon tax, the Chinese government is more interested in carbon emissions trading systems. Indeed, China has already launched seven regional pilot carbon emissions trading schemes in Shenzhen, Shanghai, Beijing, Guangdong, Tianjin, Chongqing and Hubei. As a next step, the Chinese government aims to build a nation-wide carbon emissions trading market to improve the efficiency and fairness of carbon reductions.
[22] There are four provinces with carbon intensity higher than 6 tons/10000 Yuan in specific years: Shanxi (2001-2005), Inner Mongolia (2004, 2006), Guizhou (2001, 2003-2007), Ningxia (2003-2010).
It is worth noting that the method proposed in this paper relies heavily on the

Appendix 1: Estimation of Shadow Price
The shadow price of a pollutant can be estimated within the framework of multi-input multi-output production technology which considers pollutants as byproducts.
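Suppose that a producer employs a vector of inputs $x \in \mathbb{R}_+^N$ to produce a vector of good outputs $y \in \mathbb{R}_+^M$ and a vector of bad outputs $b \in \mathbb{R}_+^J$, and let $P(x)$ denote the set of output bundles $(y, b)$ that are feasible with $x$. The directional output distance function describes the simultaneous maximum expansion of good outputs and contraction of bad outputs that is feasible for any given production technology. Formally, the directional output distance function is defined as $\vec{D}_o(x, y, b; g_y, -g_b) = \max\{\beta : (y + \beta g_y, b - \beta g_b) \in P(x)\}$, where $g = (g_y, -g_b)$, with $g_y \in \mathbb{R}_+^M$ and $g_b \in \mathbb{R}_+^J$, is a directional vector which specifies the direction of the output vector.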
The directional output distance function satisfies the translation property: $\vec{D}_o(x, y + \alpha g_y, b - \alpha g_b; g_y, -g_b) = \vec{D}_o(x, y, b; g_y, -g_b) - \alpha$, where $\alpha$ is a scalar. This property says that if the desirable output is expanded by $\alpha g_y$ and the undesirable output is contracted by $\alpha g_b$ simultaneously, the resulting value of the directional output distance function will be reduced by $\alpha$.
By invoking the duality between the directional output distance function and the revenue function, Färe et al. (2005) are able to derive the shadow price of the j-th bad output given that the market price of the m-th good output is known.
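This duality result is commonly written as below. The symbols $q_j$ (shadow price of the j-th bad output) and $p_m$ (market price of the m-th good output) are notation assumed here for illustration.

```latex
q_j = -p_m \,
\frac{\partial \vec{D}_o(x, y, b; g_y, -g_b) / \partial b_j}
     {\partial \vec{D}_o(x, y, b; g_y, -g_b) / \partial y_m}
```

Since the distance function is non-decreasing in bad outputs and non-increasing in good outputs, the ratio is non-positive and the resulting shadow price is non-negative.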
We consider the case of three inputs, one good output and one bad output. Assume that there are k=1,…,K provinces producing in t=1,...,T years. A quadratic directional output distance function is then specified for province k in year t, and its parameters are estimated subject to the restrictions (i)-(vii). The first set of restrictions (i) ensures that all observations are feasible, which implies that each observation is located either on or below the boundary. Restrictions in (ii) impose the null-jointness property, which means that, for y>0, the output bundle (y, 0) is not technically feasible. Restrictions in (iii) and (iv) are monotonicity assumptions in bad and good outputs respectively, which ensure the correct sign of the calculated shadow prices. Restrictions in (v) impose positive monotonicity constraints on the inputs at the mean level of input usage, which means that, at the mean level of inputs, an increase in input usage holding good and bad outputs constant causes the directional output distance function to increase. The parameter restrictions given by (vi) impose the translation property. Additionally, the symmetry restrictions are imposed in (vii).
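How the translation restrictions and the shadow price calculation fit together can be sketched as follows. The sketch assumes the quadratic parameterization commonly used with the direction $g = (1, -1)$, following Färe et al. (2005), which may differ in detail from the specification estimated in this paper; all parameter names and values are hypothetical and were chosen only to satisfy the translation and symmetry restrictions.

```python
def ddf(x, y, b, p):
    """Quadratic directional output distance function with direction g = (1, -1)."""
    n = len(x)
    first_order = p["a0"] + sum(p["a"][i] * x[i] for i in range(n))
    first_order += p["beta1"] * y + p["gamma1"] * b
    second_order = 0.5 * sum(p["A"][i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    second_order += 0.5 * p["beta2"] * y ** 2 + 0.5 * p["gamma2"] * b ** 2 + p["mu"] * y * b
    cross = sum(p["delta"][i] * x[i] * y + p["eta"][i] * x[i] * b for i in range(n))
    return first_order + second_order + cross


def shadow_price(x, y, b, p, price_good=1.0):
    """Shadow price of the bad output: -price_good * (dD/db) / (dD/dy)."""
    dD_db = p["gamma1"] + p["gamma2"] * b + p["mu"] * y + sum(e * xi for e, xi in zip(p["eta"], x))
    dD_dy = p["beta1"] + p["beta2"] * y + p["mu"] * b + sum(d * xi for d, xi in zip(p["delta"], x))
    return -price_good * dD_db / dD_dy


# Hypothetical parameters satisfying the translation restrictions
# beta1 - gamma1 = -1, beta2 = gamma2 = mu, delta = eta, and a symmetric matrix A.
params = {
    "a0": 0.1,
    "a": [0.3, 0.2, 0.1],
    "A": [[0.02, 0.01, 0.0], [0.01, 0.03, 0.0], [0.0, 0.0, 0.01]],
    "beta1": -0.6, "gamma1": 0.4,
    "beta2": -0.05, "gamma2": -0.05, "mu": -0.05,
    "delta": [0.01, 0.02, -0.01], "eta": [0.01, 0.02, -0.01],
}

x0, y0, b0, alpha = [1.0, 1.0, 1.0], 1.0, 1.0, 0.25
translated = ddf(x0, y0 + alpha, b0 - alpha, params)   # expand good, contract bad output
original = ddf(x0, y0, b0, params)
print(translated - original, shadow_price(x0, y0, b0, params))
```

Under these restrictions, moving the output bundle by alpha along the direction vector lowers the function value by exactly alpha, and the shadow price is positive wherever the monotonicity conditions on the good and bad outputs hold.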
Once the parameters of the directional output distance function have been estimated, we are able to calculate the shadow price of the bad output for each province in each year from the derivatives of the estimated function with respect to the bad and good outputs.
It is preferable to choose the model with the smallest values of RMSFE and MAE. In this paper, we use the first 5 years' observations for the estimation of the coefficients and save the last 5 years' observations for the calculation of RMSFE and MAE.
2) ***, ** and * represent significance at the 1%, 5% and 10% levels respectively.