Determinants of Urbanization

In light of the United Nations' (UN) latest urbanization projections, particularly with respect to India and the People's Republic of China, a good understanding is needed of what drives aggregate urbanization trends. Yet, previous literature has largely neglected the issue in favor of studying urban concentration. Taking advantage of the latest UN World Urbanization Prospects, we use an instrumental variables approach to identify and analyze key urbanization determinants. We estimate the impact of gross domestic product (GDP) growth on urbanization to be large and positive. In answer to Henderson's (2003) finding that urbanization does not seem to cause growth, we argue that the direction of causality runs from growth to urbanization. We also find positive and significant effects of industrialization and education on urbanization, consistent with the existence of localization economies and labor market pooling.


I Introduction
The purpose of this paper is to identify and analyze determinants of the urbanization rate, an issue which has so far received limited attention in the academic literature. Ongoing and future urbanization, particularly in Asia and Africa, presents both opportunities and challenges for many, and a good understanding of the determinants of urbanization is crucial for development planning, business strategy setting, and even allocation of aid flows.
Previous literature has prioritized urban concentration, i.e., the degree to which a country's urban population is concentrated in one or two major cities (such as in Cambodia, Mongolia, or Japan), rather than spread over many smaller cities (such as in India and the People's Republic of China (PRC)).
Labelling urbanization a transitory phenomenon, Henderson (2005) argues that the priority given to urban concentration is appropriate. Indeed, Figure 1 illustrates that developed countries as well as former Eastern European Socialist countries seem to have converged to a steady state level of urbanization around 1980, with little change over the last 30 years. However, the figure also illustrates a historical gap between the developed world, Latin America, and former Soviet countries on one side, and Asia and Africa on the other. Asia and Africa persistently lag behind, but have started to catch up. 1 In light of the United Nations' (UN's) urbanization projections for the upcoming decades, particularly with respect to the PRC and India, urbanization will continue to be high up on the policy agenda of developing countries for a substantial time to come. While it may be a transitory phenomenon, it is an ongoing and essential part of economic development, and as such an interesting subject for academic research.
Looking at the data, the past 50 years have seen a surge in the urban population of many countries around the world, with little indication of slowing down in the near future. The latest revision of the UN World Urbanization Prospects (2011) 2 predicts the world's urban population to increase by 1.4 billion between 2010 and 2030, implying that close to 60% of the world's population (currently 50%) will live in cities by 2030. The PRC alone (which accounts for 270 million of the predicted increase) will have 221 cities with a population of one million or more compared with 35 such cities in Europe today. 3 In addition to the size of current urbanization trends, the speed with which metropolitan areas attract rural residents is unprecedented: A comparison of the time that it took large cities to grow from 1 million to 8 million inhabitants yields a period of 130 years for London, 45 years for Bangkok, 37 years for Dhaka, and 25 years for Seoul. 4 This rural-urban migration has wide-ranging implications along many dimensions, most notably economic performance and efficiency, environment and infrastructure, as well as education and health.
1 Note that there is substantial within-continent heterogeneity, such as between North Africa and Sub-Saharan Africa, or between South Asia and West Asia/the Middle East. One implication is that the large urbanization surges experienced by individual countries (like the PRC or Brazil) may be diluted in the figure due to continent aggregation.
An immediate question to ask is why cities develop and exist. Why is it such an economic law that countries urbanize as they develop? The standard answer suggested by an extensive body of research is that economic development involves the structural transformation from an agriculture-based economy to an industry- and service-based economy. 5
Industrialization in turn is believed to involve urbanization, as externalities of scale in manufacturing and services attract firms and workers into the cities. 6 The literature on scale externalities and knowledge spillovers is enormous, and has served as a basis for explaining the forces of agglomeration that are central to the study of urbanization. The idea of scale externalities goes back to Marshall (1890), who suggested that firms' production costs decrease with the size of their own industry, e.g., through better local infrastructure and within-industry knowledge spillovers. The subsequent literature distinguishes between such localization economies (scale externalities arising from the local concentration of economic activity within an industry, i.e., from local industry size) and urbanization economies (scale economies arising from the agglomeration, and possibly diversity, of economic activity per se, i.e., from city size). As suggested by Jacobs (1969), the latter may be relevant in particular for industries which rely heavily on R&D and marketing. Attempts to model the microfoundations of such externalities are numerous and include discussions on labor market pooling, input sharing, and knowledge spillovers. 7 However, few studies focus on the occurrence of urbanization as such, despite a considerable literature on urban concentration, i.e., the geographical dispersion of a given urban population. Much of the theory literature has focused on equilibrium city sizes, and endogenized the trade-off between scale externalities in production versus rising costs of housing and congestion. 8 Zipf's Law has been promoted as an approximation to the equilibrium distribution of city sizes, whereas Gibrat's Law arguably provides insights into city growth processes. 9
Finally, an important strand in the literature consists of the so-termed core-periphery models, following the influential work of Krugman (1991) on spatial agglomerations. The core-periphery models examine the conditions under which manufacturing and population agglomerations concentrate in one region, rather than spreading over several regions. 10 However, both endogenous models of city sizes and core-periphery models provide few insights into what determines the total urban population of a country, independently of its distribution across cities. What causes people to relocate from rural areas to the cities in the first [...]
6 See e.g., Henderson (1974), Quigley (1998), and Duranton and Puga (2001).
7 See e.g., Rosenthal and Strange (2001) for an examination of the microfoundations of agglomeration economies for United States (US) manufacturing industries.
8 See e.g., Henderson (1974), as well as the core-periphery reversal in Helpman (1998) and Tabuchi (1998).
9 Zipf's Law suggests that the equilibrium distribution of city sizes can be approximated by a Pareto distribution, such that city rank multiplied by city size is a constant. Gibrat's Law alleges that a city's growth rate is independent of city size. For a study on empirical validation, see Ioannides and Overman (2003) or Black and Henderson (2003).
10 For a review of core-periphery models, see .
[...] intervals. An instrumental variables (IV) approach is employed to identify, as well as attempt to quantify, the effect of key drivers of urbanization. Particular consideration is given to GDP growth, education, and industrialization. In addition, we include and review a comprehensive set of controls, including trade, infrastructure, and political factors. Results can be used to infer how much of empirically observed urbanization rates is associated with these key drivers, and provide a first indication regarding which part of urbanization may be due to country-specific factors (such as the Hukou system in the PRC). 11 Our country-level panel data approach constitutes a departure from the often more micro-level studies on the determinants of agglomeration economies. 12 It also departs from empirical studies on urban concentration, which generally use data from cities or metropolitan areas, but do not include rural data. While it is certainly useful to look at factors of urbanization at a micro-level, perhaps focusing on industry-specific scale externalities, a big picture is missing as to which factors drive aggregate urbanization trends. This paper attempts to provide this big picture, asking which factors cause population shifts from rural to urban areas. One advantage of this approach is that the results are more likely to incorporate general equilibrium effects, especially since our data run in 5-year intervals.
From a policy perspective, aggregate changes in urbanization in themselves are highly relevant to many policy debates, as currently in the case of the PRC with its wealth of small cities around 1 million in addition to several megacities, and the resulting policy implications for infrastructure and public services.
Our analysis finds that the well-known and large positive correlation of GDP level with urbanization rate (as measured by percentage of population living in urban areas) disappears as soon as we control for a range of other factors, such as education level, industrialization, and trade. This suggests that urbanization may be better explained with a country's development in a range of economic and human dimensions, rather than just with income per se. As expected, we find a negative conditional correlation of urbanization with GDP growth (faster growing countries are, as yet, less urbanized). However, our instrumental variables estimates suggest that the causal impact of GDP growth on urbanization may be large and positive. Given the inability of previous studies to find a significant effect of urbanization on growth, we argue that the direction of causality runs from GDP growth to urbanization, rather than vice versa. We also find positive and significant effects of industrialization as well as education on the urbanization rate, which is consistent with the existence of localization economies and labor market pooling. We conduct several robustness checks, and find that the effect of growth is somewhat sensitive to specification. In contrast, the effects of education and industrialization on urbanization are robust in both qualitative and quantitative terms.
The paper proceeds as follows: Section II reviews the related literature on determinants of urbanization. Section III outlines the empirical strategy, discusses the data, and presents regression results. Section IV concludes.
11 The latter inference relies on the strong assumption that the drivers considered in our analysis are the only determinants which are relevant in a cross-country setting. This is unlikely, so inference about country-specific residuals can only constitute an upper bound.
12 See the studies on agglomeration in the US and Brazil by Rosenthal and Strange (2001) and Michaels, Rauch, and Redding (2012).
II Related Literature
In spite of the substantial literature on scale externalities and spatial concentration, very few studies focus explicitly on the factors driving urbanization rates. Most research modeling urbanization as such takes as given an exogenous productivity gap between rural and urban areas, with migration limited by migration costs, exogenous skill acquisition, and inefficient labor allocation rules (such as minimum wages). These so-called dual economy models then study the effect of government policies (such as trade protection policies, migration restrictions, and infrastructure investments) on migration flows. 13
An immediate implication of this literature is that rural-urban dynamics are heavily influenced by government favouritism towards the urban sector (or in some cases of former planned economies, by a government bias towards rural areas).
An early empirical study on urbanization is Pandey (1977), who uses Indian state-level census data to regress urbanization rates on population density, industrialization (as measured by non-agricultural employment), cropping intensity (as a proxy for agricultural development), per worker income, literacy rate, and population growth. He finds a significant positive effect of industrialization, a negative effect of cropping intensity, and no effect of average worker income.
As his estimates are based on a simple cross-section OLS, they do not permit causal inference due to endogeneity issues. Similar concerns apply to the study of Chang and Brada (2006), who run a pooled cross-section OLS of urbanization on per capita GDP and apply their results to the Chinese context. Moomaw and Shatter (1996) look at a wider range of determinants (such as per capita GDP, industrialization, export orientation, foreign assistance, and political factors), and study how their link with the urbanization rate compares to their link with metropolitan concentration (percentage of urban population in cities greater than 100,000) and with urban primacy (percentage of urban population in largest city). Given a limited dataset of 3 observations per country, they rely on a pooled cross-section approach with regional and time dummies, which also suffers from endogeneity concerns. A paper worth mentioning specifically with respect to the importance of knowledge accumulation in cities is Black and Henderson (1999), who find that individual city sizes in the US grow with human capital accumulation, as measured by the percentage of college educated workers in the labor force.
To the authors' knowledge, the only paper which attempts to quantitatively examine the causal mechanisms relating urbanization and GDP growth via an IV/GMM approach is . In a cross-country panel setting, he estimates the effect of both urbanization and urban concentration (primacy) on productivity growth (growth of output per worker), using instrumental variables to deal with endogeneity. He finds a significant effect of urban concentration on productivity. His quadratic functional form specification allows him to calculate an optimal level of urban concentration, which turns out to decline with economic development (as measured by output per worker). More importantly for our analysis, his study finds no significant causal effect of urbanization on per worker output. His results suggest that GDP growth is not strongly driven by urbanization rate per se. Considering a raw correlation of 0.85 between urbanization and GDP in his data, an obvious question to ask is whether the causality runs in the opposite direction, i.e., whether GDP growth causes urbanization. This is one of the questions that our paper sets out to answer.
13 One of the most prominent models is Harris and Todaro (1970). Also see Renaud (1981). For a comprehensive review, see .

III.A Empirical Model
The aim of our analysis is to quantify the relationship between urbanization and its key determinants. Given the focus of the theory literature on scale externalities, structural transformation, and knowledge spillovers, we hypothesize these to be growth of per capita GDP, industrialization, and education. To establish basic conditional correlations, we start with a naïve OLS panel estimation of equation 1, where $urban_{it}$ is the urbanization rate of country $i$ in year $t$ (defined by the share of total population living in urban areas), $\mu_i$ is a country fixed effect (for country-specific factors like geography and culture), $\lambda_t$ is a year fixed effect (for country-invariant time shocks or trends), $education_{it}$ is measured in average years of schooling of the adult population, $indus_{it}$ is industrialization, measured as non-agricultural share of GDP, $popdensity_{it}$ is the population per square kilometer of land, $popgrowth_{it}$ is the average annual rate of population growth (in 5-year growth averages), and $trade_{it}$ is the volume of exports plus imports as a percentage of GDP. The interaction $indus * trade_{it}$ serves as a proxy for manufactured exports rather than agricultural exports. $primacy_{it}$ is a measure of urban concentration (population of the largest city as a percentage of the total urban population). $democracy_{it}$ is an index for democratic systems (it is the polity2 indicator from Polity IV), which takes on values between +10 (for a fully democratic system) and -10 (for a fully autocratic one). $instability_{it}$ is a self-constructed dummy for times of political instability, which switches on if there has been a regime change in the last 5 years (where a regime change is defined as a change of three or more points in the democracy index). Finally, $roaddensity_{it}$ (km of roads per square km of land area) is used as a proxy for infrastructure. The results of the OLS estimation are in Table 1.
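For orientation, the variable definitions above are consistent with a specification of roughly the following form. This is a sketch reconstructed from the listed regressors, not the equation as printed: the coefficient labels $\beta_1, \dots, \beta_{12}$, the error term $\varepsilon_{it}$, and the names $\ln y_{it}$ (log per capita GDP) and $growth_{it}$ (per capita GDP growth) are assumptions, as is the ordering of terms.

```latex
\begin{equation*}
\begin{aligned}
\mathit{urban}_{it} = {} & \mu_i + \lambda_t
  + \beta_1 \ln y_{it} + \beta_2\, \mathit{growth}_{it}
  + \beta_3\, \mathit{education}_{it} + \beta_4\, \mathit{indus}_{it} \\
& + \beta_5\, \mathit{popdensity}_{it} + \beta_6\, \mathit{popgrowth}_{it}
  + \beta_7\, \mathit{trade}_{it} + \beta_8\, (\mathit{indus} \times \mathit{trade})_{it} \\
& + \beta_9\, \mathit{primacy}_{it} + \beta_{10}\, \mathit{democracy}_{it}
  + \beta_{11}\, \mathit{instability}_{it} + \beta_{12}\, \mathit{roaddensity}_{it}
  + \varepsilon_{it}
\end{aligned}
\end{equation*}
```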
Econometric issues with this specification are discussed below, and Section III.C presents an instrumental variables regression as well as an estimation in first differences.

Discussion of Regressors
The choice of controls in equation 1 is based on the literature. Per capita GDP has been included in logs rather than in levels, as our data show a clear log-linear relationship between GDP and urbanization rate (see Figure 2). Moomaw and Shatter (1996) suggest that the effect of economic development (as proxied by GDP) on urbanization rates may work through two main channels: Economic development is associated with increasing market size, which leads to more specialization and division of labor. More specialization (as opposed to a subsistence economy) places greater importance on transport costs, as firms rely on inputs from external sources, and distribute their output more widely.
Thus, economic activity may agglomerate in urban areas to minimize cost of transportation. The second channel works through industrialization: Economic development usually entails changes in aggregate demand patterns, with the structure of the economy shifting from agriculture towards industry and services. Given that both localization economies and agglomeration economies (as defined in Section I) are more likely to cause cost advantages in manufactured products than in agricultural goods, structural change may drive urbanization.
Note that these two channels can work independently of each other: Increased division of labor within sectors may lead to higher urbanization even when sectoral composition is held constant. Likewise, industrialization (i.e., a change in the sectoral structure of the economy) may occur without an increase of per capita output. To keep these two influences apart, we account for economic development (as measured by per capita GDP) and industrialization (as measured by non-agricultural share of GDP) separately.
The impact of education on urbanization is likely related to knowledge spillovers: Within-industry spillover effects are a major source of agglomeration, particularly when the level of technological sophistication is high. The existence of high-tech industries presumes an educated workforce. As a result, education and technological sophistication may be complementary in driving urbanization. More generally, knowledge spillovers increase the returns to private human capital, 14 leading competitive firms to pay higher wages to city workers. For instance, Rauch (1993) shows that, controlling for individual education level, a higher local average education level in US cities translates into higher individual earnings. A similar argument can be made for labor market pooling: Economies of scale from labor market pooling are likely to be strong when the workforce is highly skilled and specialized. Finally, education may be a driver of urbanization in its own right if it changes individuals' preferences towards urban environments.
While we focus on the impact of GDP growth, industrialization, and education, we also control for the degree of trade openness (sum of exports and imports as a share of GDP). Trade has been thought to increase urbanization via at least two channels: First, trade increases the importance of transportation hubs, which are usually located in urban environments. Second, the setup and maintenance of trade connections often requires higher levels of marketing and financing compared to domestic sales. 15 Both channels imply that trade may increase the share of economic activity in urban areas. Nevertheless, Elizondo and Krugman (1996) argue that the sign of the trade coefficient should be negative for developing countries, as "the giant Third World metropolis is an unintended by-product of import-substitution policies, and will tend to shrink as developing countries liberalize." 16 Their story is that strong backward and forward linkages in a closed economy lead to excessive city size. In other words, the presence of trade barriers limits firms to the domestic market, and the concentration of demand and inputs in the capital city makes it profitable for new firms to locate there as well. This process reinforces itself, leading to excessive urban concentration (and possibly urbanization). It is reversed with trade liberalization. Therefore, the sign of the trade coefficient in the urbanization equation is ex ante ambiguous.
14 See Black and Henderson (1999), who examine the effect of education level on city size in the US empirically. See Lucas (1988) for a discussion of knowledge spillovers, and Henderson (1988) for the effect of education on urbanization.
Two political factors have been included in the regression: an index of democracy, and a measure of political instability. Both have received attention in the literature to some extent, even though the focus has been more on their impact on urban primacy than on urbanization. The intuition is straightforward: In autocratic regimes, power is generally concentrated in the capital city.
Political representation and access to power of the rural population are virtually nonexistent. Autocratic governments are able to make decisions without consideration of a spatially dispersed wider population. Instead, they rely on the support of small wealthy elites to stay in power. As a consequence, they will tend to strongly favour urban elites in the allocation of public resources. Such urban favouritism has implications both for consumption of public goods (e.g., health and education services) as well as for investment and economic growth (rural areas will receive less investment in infrastructure, which further deters private capital flows and impedes economic growth of these regions). 17
As a result, autocratic regimes create strong incentives to migrate to urban areas. A necessary reservation for autocracies is that the political agenda in former socialist economies may have a rural focus rather than an urban one. Given that these regimes are just as likely to rely on the support of small elites, however, it is not clear whether this will translate into a de facto rural bias. In contrast, democracy grants higher political representation to dispersed rural majorities, thus reducing migration incentives. While the quantitative impact of democracy on urbanization is unknown, Davis and Henderson (2003) find the effect of democracy on urban concentration to be significant and positive.
Independent of the form of government, political instability in itself can cause urbanization. As a regime struggles to stay in power, organized popular resistance in the cities where the ruling elite is located poses a more serious threat than a disorganized and geographically dispersed rural population. As a consequence, the regime is more likely to give in to the demands of the urban population, and divert resources to content the urban population through consumption subsidies, protection from high taxes and the like. 18 Cities may also provide higher safety levels than rural areas in times of political conflict. All of these factors increase the relative attractiveness of living in a city.
The importance of urban concentration, as often measured by primacy (share of urban population living in the largest city), has been widely recognized in the literature, as illustrated in Section I. 19 We control for it in our regression of urbanization to allow for the possibility that a higher concentration of population in a country's largest city is also associated with a higher urbanization rate overall. A measure of population density is included to account for countries with a small area of land relative to their population size, which necessarily leads to more urban agglomeration. Population growth can affect urbanization either directly (via differential growth in urban vs. rural areas), or through an effect on migration. For instance, high rural population growth in areas of subsistence agriculture may trigger grown-up children to move to the cities as family sizes outgrow the economic possibilities of the farm.
Finally, we expect infrastructure to play a significant role in urbanization. Better infrastructure is associated with lower transport costs, which in turn reduces incentives to locate economic activity in overcrowded cities where land prices are high. In contrast, lack of infrastructure gives firms no choice but to locate close to their input markets and consumers, which fuels agglomeration. The role of infrastructure has been prominently featured in the core-periphery literature: Core-periphery models following Krugman (1991) [...]
17 See e.g., the discussion of political factors by Petrakos and Brada (1989).
18 Petrakos and Brada (1989) provide a more detailed explanation of urbanization forces during times of political instability, including possible effects on investment levels.
19 See  as a main reference for the discussion on urban primacy.
[...] Second, standard linear regression assumes independently distributed errors across countries and time. More plausibly, errors will be clustered at country level. For instance, if a shock hits a country in one period, the impact of this shock will often last for several periods, leading to serial correlation in the error structure. To account for this, we cluster errors at country level, which means the estimation is robust to both heteroscedasticity and serial correlation of the error term. In addition, we allow for country fixed effects in order to deal with country heterogeneity in urbanization rates.
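The combination of country fixed effects and country-clustered errors described above can be sketched for a single regressor as follows. This is a minimal illustration under simplifying assumptions, not the estimation code behind Table 1: the function name `within_ols_clustered`, the one-regressor setup, the omission of time effects, and the absence of any small-sample correction are all choices made here for brevity, and the data are made up.

```python
from collections import defaultdict
from math import sqrt


def within_ols_clustered(rows):
    """Fixed-effects (within) slope with country-clustered standard error.

    rows: iterable of (country, x, y) tuples. Returns (beta, clustered_se).
    """
    by_country = defaultdict(list)
    for country, x, y in rows:
        by_country[country].append((x, y))

    # Within transformation: subtracting country means removes country fixed effects.
    demeaned = []
    for country, obs in by_country.items():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        demeaned.extend((country, x - mean_x, y - mean_y) for x, y in obs)

    sxx = sum(x * x for _, x, _ in demeaned)
    if sxx == 0:
        raise ValueError("regressor has no within-country variation")
    beta = sum(x * y for _, x, y in demeaned) / sxx

    # Cluster-robust sandwich variance: sum x * residual within each country
    # first, then square, so arbitrary serial correlation within a country is allowed.
    scores = defaultdict(float)
    for country, x, y in demeaned:
        scores[country] += x * (y - beta * x)
    variance = sum(s * s for s in scores.values()) / (sxx * sxx)
    return beta, sqrt(variance)


# Hypothetical two-country panel, for illustration only.
toy = [("A", 1, 12), ("A", 2, 14), ("A", 3, 17),
       ("B", 2, -1), ("B", 4, 3), ("B", 6, 7)]
beta, se = within_ols_clustered(toy)
print(round(beta, 3), round(se, 3))
```

Summing the scores within a country before squaring is what distinguishes the clustered variance from the usual heteroscedasticity-robust one, which would square each observation's score separately.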
Third, the regression includes nonstationary variables. We would expect the time series on GDP, industrialization, trade, education, population density, primacy, democracy and infrastructure to be integrated of order 1, i.e., to have unit roots. While this constitutes a possible concern, our analysis uses a panel which is between 107 and 118 countries wide (depending on specification), and on average six observations (per country) long. This implies the variation which is used to estimate the coefficients of interest comes to a large extent from cross-country variation, rather than variation over time. Due to the relatively short time series component, for simplicity we stick to the strong assumption of stationarity.
Note that clustering errors at country level accounts for strong serial correlation of the error term, which further mitigates nonstationarity concerns. Finally, we also provide an estimation in first differences, which estimates the change in [...]
We focus our analysis on countries which are still in the process of urbanizing, rather than those that have reached a steady state urbanization rate. To do this, we exclude all country observations with an urbanization rate higher than 80%. This has the effect of excluding present-day observations of many developed countries, but it does include data from the less urbanized past of these countries. An additional effect is that city states like Singapore and Monaco are excluded from the analysis. We also restrict our dataset to countries with a total population larger than 1 million: Given the large number of tiny states [...] below 1 million). 23 Since the resulting data set is unbalanced, we cannot rule out selection bias: Which data points are missing is not a random process, but in itself a function of multiple variables. As a rule, poor countries are more prone to data availability problems, especially in early years. This implies that our coefficient estimates might be driven by rich countries' experience. See Section III.D for robustness checks.
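The two sample restrictions stated above (urbanization rate of at most 80%, total population above 1 million) amount to a simple filter on country-period observations. The sketch below assumes each observation is a dictionary with hypothetical keys `urban` (in percent) and `population`, and that the population threshold is applied observation by observation; the text does not say whether it is applied per observation or per country, and the figures are made up.

```python
def in_estimation_sample(obs, max_urban=80.0, min_population=1_000_000):
    """Return True if a country-period observation passes both restrictions."""
    still_urbanizing = obs["urban"] <= max_urban        # drop observations above 80% urban
    large_enough = obs["population"] > min_population   # drop states of 1 million or fewer
    return still_urbanizing and large_enough


# Made-up observations, for illustration only.
panel = [
    {"country": "A", "year": 1980, "urban": 62.0, "population": 40_000_000},
    {"country": "A", "year": 2010, "urban": 84.5, "population": 55_000_000},
    {"country": "B", "year": 2010, "urban": 45.0, "population": 600_000},
]
sample = [obs for obs in panel if in_estimation_sample(obs)]
print([(o["country"], o["year"]) for o in sample])  # prints [('A', 1980)]
```

Country A illustrates the point made in the text: its highly urbanized present-day observation is dropped, while its less urbanized past remains in the sample.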

Basic OLS Results
The results of our basic OLS specification are in Table 1. The reported standard errors are robust to heteroscedasticity and account for country clusters, which allows for serially correlated errors. Time effects $\lambda_t$ have been included except in column (5). Testing for joint significance of the time dummies yields F(9, 114) = 3.84, with p = 0.00 for the null hypothesis of no time effects.
Clearly, time effects do need to stay in our regression. The case is even more obvious for country fixed effects $\mu_i$ (excluded in column (4)): An F-statistic of 6148 (p = 0.00) suggests that country-specific effects play a strong role in explaining a country's urbanization rate. Note that country fixed effects will also soak up the effect of factors that have not been included in the regression: If our regression does not include all factors determining urbanization (most likely), and the omitted factors are more present in some countries than in others (on a time average), then this will influence country fixed effects. The interpretation of country fixed effects is thus restricted to be the time-averaged part of a country's urbanization rate that cannot be associated with any of the regressors in our analysis. To mitigate country heterogeneity, we keep country fixed effects for the remainder of the analysis, and focus on columns (1) to (3).
Starting from a limited set of regressors and gradually adding in more controls, it is reassuring to see that the coefficients for GDP growth, education and industrialization stay roughly the same in sign and magnitude, even though $indus_{it}$ appears somewhat sensitive to specification. At the same time, they are the only coefficients which are consistently significant, no matter which combination of regressors we tried. In contrast, it seems unexpected that the large unconditional correlation of urbanization with per capita GDP (in our sample, $r_{u,\ln y} = 0.78$ for the log of per capita GDP, and $r_{u,y} = 0.56$ for the level of per capita GDP) vanishes completely as soon as we control for either education or industrialization. We do not find any conditional correlation of GDP with urbanization in any specification (the exception being the one without country fixed effects). While we cannot draw causal inference, it does suggest that urbanization may be associated less with income level per se, but more with the structure of the economy as well as other indicators of human development.
23 Four observations are excluded as outliers because of extreme growth experiences: Liberia 1990-2000 (GDP declined by 90% between 1985-1995, then rose by 241% by 2000) and Tajikistan 1995 (GDP declined by 65% in 5 years).
Looking at the coefficients for growth, education and industrialization, we find a robust negative correlation of urbanization rate (a level variable) with per capita GDP growth. From column (3), a one percentage point increase in a country's per capita GDP growth rate in our data is associated with a roughly 0.14 percentage point lower urbanization rate (note growth is measured as a decimal while urban is in percentage points). This is not surprising: Countries which experienced high income growth in the past decades tend to be developing or middle-income countries. At the same time, developing and middle-income countries are typically at an earlier stage of the urbanization process. This serves as a prime example for the difference between correlation and causation. An IV approach will provide further insights.
The magnitude of the education coefficient is robust to the inclusion of control variables, time and country effects, and centers around 1.6. This suggests that an additional year of schooling in the adult population is associated with a 1.6 percentage point higher urbanization rate. Similarly, our estimates for industrialization (which generally feature the highest significance levels among all regressors) indicate that an additional percentage point in the share of nonagricultural GDP is associated with a roughly 0.25 percentage point higher urbanization rate. Both estimates are consistent with the notion that countries urbanize as a part of their development process, which goes alongside progress in a number of economic, social and human dimensions. We find some significance for other variables, such as population density, trade, democracy and road density. However, these are generally sensitive to specification.

IV Estimation
While the conditional correlations found in the previous section may provide interesting insights, causal inference is invalid due to possible endogeneity of the regressors. In other words, we expect all of the regressors from equation (1) to be correlated with the error term. For instance, we might think that factors like geography or rainfall impact both urbanization rate and GDP growth, biasing the GDP coefficient. An instrumental variables approach will help, but which instruments can be used? For GDP growth, we follow the approach of instrumenting current changes of variables with past levels of these variables, i.e., current growth of per capita GDP is instrumented with GDP(t-2) (note GDP(t-1) cannot be used as it enters GDP growth(t) by construction).
Our first stages show that past income levels are a strong predictor of current changes in income. For education and industrialization, which are both level variables, we instrument with education(t-2) and industrialization(t-2). Only one third lag has strong predictive power, and is thus added to our set of instruments: education(t-3), which strongly predicts industrialization. We do not add third lags of GDP or industrialization, as they have little predictive power and suffer from patchy data quality, implying an unnecessary loss of observations.
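The lag structure described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function name, the row layout, and the definition of growth as a simple proportional change between consecutive 5-year periods are all assumptions, and the GDP figures are made up.

```python
def build_growth_instruments(gdp_levels):
    """Pair current GDP growth with the second lag of the GDP level.

    gdp_levels: per capita GDP for one country, one value per 5-year
    period, in chronological order (hypothetical layout).
    """
    rows = []
    for t in range(2, len(gdp_levels)):
        gdp_t = gdp_levels[t]
        gdp_lag1 = gdp_levels[t - 1]
        gdp_lag2 = gdp_levels[t - 2]
        # Growth(t) is built from GDP(t) and GDP(t-1), so GDP(t-1)
        # cannot serve as an instrument; GDP(t-2) is used instead.
        growth = (gdp_t - gdp_lag1) / gdp_lag1
        rows.append({"t": t, "growth": growth, "gdp_lag2": gdp_lag2})
    return rows


# Made-up series growing 10% per period.
example_rows = build_growth_instruments([100.0, 110.0, 121.0, 133.1])
```

The first two periods drop out because no second lag exists for them, which is the loss of observations that makes adding further lags costly.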
Past levels of these variables predict current levels, which qualifies them as relevant instruments in our regression. But do they satisfy the orthogonality criterion? Orthogonality requires $E[Z'\varepsilon] = 0$, i.e., instruments must not be correlated with the error term (where $Z$ is the matrix of instruments and $\varepsilon$ is the error term). For instance, conditional on the same level of industrialization today, a higher industrialization level in the past should not be able to predict a higher urbanization rate today. This may seem counterintuitive, as we may expect past levels of education and industrialization to belong in the urbanization equation themselves.
Two points are worth noting. The first is that we are using 5-year data, which means we are instrumenting today's industrialization level with that of 10 years ago. The second is that adding country fixed effects (i.e., using the within estimator) effectively means that our dependent variable is $urban_{it} - \overline{urban}_i$, where $\overline{urban}_i$ is the time-averaged urbanization of country $i$. So what we seek to explain are a country's deviations from its own time average. The question becomes: Does a shock to industrialization 10 years ago that may have caused urbanization to deviate from its trend at that time still have an effect on urbanization today, holding constant the level of current industrialization? 24 This question is much less obvious, and we look to the data to answer it. As a test of overidentifying restrictions, we regress the IV residuals in the 2SLS case on the full set of instruments, yielding a Sargan statistic of 0.539 (p = 0.46), so we cannot reject the null hypothesis that our instruments are uncorrelated with the error term. A possible interpretation of this is that the urbanization process adjusts relatively fast to the current environment, and that the impact of past shocks diminishes quickly. We thus proceed with an IV estimation of

$$urban_{it} = \alpha + \mu_i + \lambda_t + \beta_1\, pcGDPgrowth_{it} + \beta_2\, education_{it} + \beta_3\, indus_{it} + \varepsilon_{it} \qquad (2)$$

We focus on these key regressors for the sake of parsimonious modeling: with all control variables being potentially endogenous, we would have to instrument all of them. Note that we also eliminate GDP level as a regressor, and choose to focus on GDP growth instead.
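The within transformation referred to above, in which each country's urbanization rate is expressed as a deviation from its own time average, can be illustrated with a small sketch. The function name, the row layout, and the numbers are hypothetical. Only country effects are removed here, whereas the estimated model also includes time effects.

```python
from collections import defaultdict


def within_transform(rows, unit_key, value_key):
    """Return each observation minus the time average of its unit."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[unit_key]] += row[value_key]
        counts[row[unit_key]] += 1
    means = {unit: totals[unit] / counts[unit] for unit in totals}
    return [row[value_key] - means[row[unit_key]] for row in rows]


# Made-up urbanization rates (percent) for two hypothetical countries.
panel = [
    {"country": "A", "period": 1990, "urban": 20.0},
    {"country": "A", "period": 1995, "urban": 30.0},
    {"country": "B", "period": 1990, "urban": 50.0},
    {"country": "B", "period": 1995, "urban": 70.0},
]
deviations = within_transform(panel, "country", "urban")
```

After the transformation, the large level difference between the two countries disappears and only movements around each country's own mean remain to be explained.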
Our IV first stages are strong: GDP growth is strongly predicted by GDP(t-2) and industrialization(t-2), but not by lags of education. Education is predicted by education(t-2). Both GDP growth and education have strong time effects. Industrialization is predicted by education(t-3) (but not by education(t-2)) and industrialization(t-2). The F-tests for the joint significance of the four instruments in the first stages for GDP growth, education and industrialization are $F_g(4, 106) = 12.01$, $F_e(4, 106) = 49.24$, and $F_i(4, 106) = 17.58$, respectively, with p-values of 0.00 in all cases. Even with strong individual first stages, the model may be underidentified if there is multicollinearity in the common matrix of first stages. To account for this, we run an underidentification test, which tests the relevance condition that the matrix $E[Z'X]$ has full column rank. We find a Kleibergen-Paap LM statistic of 13.13, with a p-value of 0.001, implying the matrix has full column rank and the relevance condition is satisfied.
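The p-values reported for the Sargan statistic (0.539, p = 0.46) and the Kleibergen-Paap LM statistic (13.13, p = 0.001) can be checked against chi-square tail probabilities. The degrees of freedom are not stated in the text. The sketch below assumes 1 for the overidentification test (four instruments minus three endogenous regressors) and 2 for the underidentification test (four minus three plus one), and uses the closed-form tail probabilities that exist for these two cases.

```python
import math


def chi2_upper_tail(stat, df):
    """Upper-tail chi-square probability; closed forms for df 1 and 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


# Statistics quoted in the text; degrees of freedom are assumed.
sargan_p = chi2_upper_tail(0.539, df=1)
kleibergen_paap_p = chi2_upper_tail(13.13, df=2)
```

Under these assumed degrees of freedom both values round to the p-values quoted in the text, which suggests the assumption is at least consistent with the reported results.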
Columns (2) and (3) of Table 2 present our IV estimates. Column (2) includes 2SLS estimates, which are robust to heteroscedasticity and error clustering at the country level. We also present LIML estimates, which in theoretical and [...]

24 Similarly, there could be a lagged effect of industrialization(t-2) on urban(t), without urban(t-2) being affected.

The objective of this complementary analysis is twofold: First, it gives a different angle to the research question. As mentioned in the previous section, the country fixed effects imply that we have explained a country's deviations from its time-averaged urbanization levels using deviations from time-averaged regressors, i.e., we have implicitly estimated [...] techniques. Note that any serial correlation will be accounted for through error clustering at the country level.
We start with a full OLS specification to establish conditional correlations in column (1) of Table 3, analogous to the level specification in Table 1. Significance levels differ markedly from the level specification, and growth of per capita GDP is now positively correlated with changes in urbanization. An interesting correlation emerges between the change in the urbanization rate and political instability: Periods of political regime changes are frequently associated [...]

25 This is an approximation. Given a constant area of land, and $popgrowth_{it} = \Delta pop_{it}/pop_{i,t-1}$, we have $\Delta popdensity_{it} = \Delta pop_{it}/area_i = popgrowth_{it} \cdot (pop_{i,t-1}/area_i)$. For changes in the democracy index, we lose some information by restricting ourselves to the instability dummy, which switches on when democracy changes by 3 or more.
26 Strictly speaking, since we still include country fixed effects, the dependent variable is the change in urbanization since the last period minus the average change in urbanization over all 5-year periods. The interpretation is very similar.
27 We do not conduct unit root tests on our data set as the time series component is too short to allow reliable inference. However, it is a common finding in the empirical literature that macroeconomic time series such as GDP and industrialization tend to be integrated of order 1, implying that their first difference is stationary.

Columns (3) to (5) of Table 3 report IV estimates, with the set of regressors reduced to the key factors of interest. As before, we use GDP(t-2), education(t-2), education(t-3), and industrialization(t-2) as instruments, but this time they instrument for GDP growth, Δeducation, and Δindustrialization.
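The two constructed variables described in footnote 25 can be illustrated with a short sketch. Function and argument names are hypothetical, the numbers are made up, and reading "changes by 3 or more" as an absolute change in either direction is an assumption.

```python
def delta_popdensity(popgrowth, pop_prev, area):
    """Change in population density implied by growth, holding area fixed."""
    return popgrowth * (pop_prev / area)


def instability_dummy(democracy_prev, democracy_now, threshold=3):
    """1 if the democracy index moved by at least `threshold`, else 0.

    Using the absolute change is an assumption of this sketch.
    """
    return 1 if abs(democracy_now - democracy_prev) >= threshold else 0


# Made-up example: population rises from 1000 to 1100 on 50 units of land.
pop_prev, pop_now, area = 1000.0, 1100.0, 50.0
popgrowth = (pop_now - pop_prev) / pop_prev
via_growth = delta_popdensity(popgrowth, pop_prev, area)
direct = (pop_now - pop_prev) / area
```

Both routes give the same change in density, which is the identity the footnote relies on when land area is constant.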

The individual IV first stages provide further support to the assertion that past levels of variables are good instruments for current changes 28: GDP(t-2) and industrialization(t-2) strongly predict GDP growth (as before), education(t-2) strongly predicts Δeducation, and industrialization(t-2) strongly predicts Δindustrialization. Individual F-tests are $F(4, 106) = 12.08/14.05/13.95$, [...]

28 Henderson (2003) [...]

[...] (2) in Table 2 (which contains our preferred estimates), we get coefficients of 1.99* for education and 0.35*** for industrialization, which is close to the original estimates.
Furthermore, estimates are robust to relaxing the population restriction: We run regressions using the full sample of all (including very small) countries to see how our restriction to countries with a population over one million affects our results. We find coefficients of 83.50** (GDP growth), 2.50** (education) and 0.44*** (industrialization), suggesting that the population restriction has little effect. In contrast, results are moderately sensitive to the urbanization restriction: If we include countries with urbanization rates above 80% (i.e., countries which are more likely to have reached a steady state level of urbanization, as well as city states), the coefficient on GDP growth is reduced to 41.88 and loses its significance. This does not come as a surprise, given that many rich countries have completed their urbanization process but continue to grow economically.
We further test for sample selection effects by excluding observations before 1970 (which means predicted values will start from 1985), resulting in a data set that is more balanced between rich and poor countries. Once again, we find that the effects of education and industrialization are robustly estimated, yielding coefficients of 2.83* and 0.58** (N = 494). As before, the coefficient on GDP growth is reduced to 41.69, suggesting that our estimate of the effect of GDP growth may be influenced disproportionately by the (early) experience of rich countries.
To test the functional form specification, we conduct a Box-Cox transformation of the dependent variable. Box-Cox regressions find maximum likelihood estimates using various transformations of the left-hand side variable, and then select the transformation which maximises the likelihood of observing the data.
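The selection step described above can be sketched as a grid search over the Box-Cox parameter. This is a simplified illustration and not the paper's procedure: it fits a one-regressor OLS model rather than the fixed-effects IV specification, the function names are hypothetical, and the data are synthetic values generated so that the log transformation is appropriate.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a positive value y; the log is the lam = 0 case."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def profile_loglik(ys, xs, lam):
    """Concentrated log-likelihood of an OLS fit on the transformed ys."""
    n = len(ys)
    z = [boxcox(y, lam) for y in ys]
    x_bar = sum(xs) / n
    z_bar = sum(z) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (zi - z_bar) for x, zi in zip(xs, z)) / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((zi - intercept - slope * x) ** 2 for x, zi in zip(xs, z))
    # The Jacobian term makes likelihoods comparable across values of lam.
    jacobian = (lam - 1.0) * sum(math.log(y) for y in ys)
    return -0.5 * n * math.log(rss / n) + jacobian


def best_lambda(ys, xs, grid):
    """Pick the grid value of lam with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(ys, xs, lam))


# Synthetic data: log(y) is linear in x plus a small alternating disturbance.
xs = [float(i) for i in range(1, 11)]
ys = [math.exp(0.2 * x + (0.05 if i % 2 == 0 else -0.05)) for i, x in enumerate(xs)]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
chosen = best_lambda(ys, xs, grid)
```

A chosen value near 1 would favour leaving the dependent variable in levels, while a value near 0 would favour taking logs.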

IV Conclusion
This paper provides new evidence on the impacts of economic growth, education, and industrialization on a country's urbanization rate. In contrast to much of the previous literature, we do not focus on the distribution of a given urban population across cities, but aim to provide a big picture as to how key factors drive aggregate urbanization trends. Addressing the well-known correlation between urbanization and GDP growth, we argue that the direction of causality likely runs from growth to urbanization, rather than vice versa. We base this on our IV estimate of the causal effect of growth, in conjunction with (i) a large number of studies which ascertain the empirical correlation between urbanization and growth, and (ii) the fact that attempts to identify a causal effect of urbanization on growth have so far been unsuccessful (see e.g., Henderson (2003)). Quantitatively, we estimate a 0.9 percentage point increase in urbanization for each 1% increase in growth. However, we observe some sensitivity to specification. We find a significant positive causal effect of education on urbanization rate, suggesting that one year of average schooling increases urbanization by two percentage points. This effect is remarkably robust to changes in specification. Consistent with theoretical work on scale externalities, we also find significant positive effects of industrialization (a 0.4 percentage point increase per one percentage point increase in non-agricultural share of GDP).
Several reservations must be made: As with any IV approach, a causal interpretation of our estimates is conditional on the validity of our instruments. Unfortunately, there is no single test that guarantees exogeneity of instruments. Further research into the dynamic adjustment process of urbanization is needed to verify whether lagged values of covariates provide sensible instruments. A second reservation is that the impacts of growth, education, and industrialization on a country's urbanization process are likely to be heterogeneous (depending on a country's level of economic development, institutional framework, and other factors). With our simple linear framework, we are estimating an average effect for countries that are presumed to be still urbanizing. Finally, our results have to be considered with a view to common data problems, such as the non-uniformity in national measurements of urbanization.