Unit Root Testing with Slowly Varying Trends

A unit root test is proposed for time series with a general nonlinear deterministic trend component. It is shown that asymptotically the pooled OLS estimator of overlapping blocks filters out any trend component that satisfies some Lipschitz condition. Under both fixed-$b$ and small-$b$ block asymptotics, the limiting distribution of the t-statistic for the unit root hypothesis is derived. Nuisance parameter corrections provide heteroskedasticity-robust tests, and serial correlation is accounted for by pre-whitening. A Monte Carlo study that considers slowly varying trends yields both good size and improved power results for the proposed tests when compared to conventional unit root tests.


Introduction
It is widely debated in the time series literature whether macroeconomic variables such as GDP, inflation, and interest rates are I(1) or I(0) around a deterministic trend. Dickey-Fuller-type unit root tests often fail to reject the null hypothesis for these time series. The trend component of a time series $y_t$ is typically treated as known up to some parameter vector. The most commonly applied unit root tests, such as those developed by Dickey and Fuller (1979), Said and Dickey (1984), Phillips (1987), Phillips and Perron (1988), and Elliott et al. (1996), impose either a constant or a linear trend model. If, however, the deterministic trend component is nonlinear, highly persistent trend-stationary processes can be hard to distinguish from unit root processes (see, e.g., Bierens 1997 and Becker et al. 2006).
It is not only a misspecified trend model that may lead to high power losses, as an overparameterized model can also reduce the power of unit root tests. Therefore, many authors have suggested applying trend models that seem more suitable for macro data.
Broken trend models with one-time changes in mean or slope with known breakpoint were first studied by Perron (1989) and Rappoport and Reichlin (1989). Christiano (1992) demonstrated that a broken trend model with an unknown breakpoint is more adequate, and Zivot and Andrews (1992), as well as Banerjee et al. (1992), proposed unit root tests for this framework. Structural changes in innovation variances were studied by Hamori and Tokihisa (1997), Kim et al. (2002), and Cavaliere (2005), while Cavaliere et al. (2011) considered unit root testing under broken trends together with nonstationary volatility. Leybourne et al. (1998), Kapetanios et al. (2003), and Kılıç (2011) allowed for exponential smooth transitions from one trend regime to another. Bierens (1997) approximated a nonlinear mean function with Chebyshev polynomials, and Enders and Lee (2012) proposed a Fourier series approximation of the trend, which are approaches that can be used when the exact form and date of structural changes are unknown. For a comprehensive review on the research on unit root testing see Choi (2015).
Dickey-Fuller-type tests are based on the t-statistic of the first-order autoregressive parameter. In the case of a constant trend, the estimator is derived from a regression of $\Delta y_t$ on $(y_{t-1} - \bar{y})$, where $\bar{y}$ is the sample mean. Schmidt and Phillips (1992) estimated the constant by the initial observation, which results in a regression of $\Delta y_t$ on $(y_{t-1} - y_1)$.
Whereas a constant is often not a good global approximation, in a small block, a smoothly varying trend can be approximated quite closely by a constant. To exploit this fact, we propose a block procedure to filter out the unknown trend component. Blocking was also used in Rooch et al. (2019) to estimate the fractional integration parameter in a similar situation. We divide the series into $T - B$ overlapping blocks of length $B$. As the blocks can be considered as units of a panel, we follow the panel unit root tests proposed by Breitung (2000) and Levin et al. (2002) and consider a pooled regression of $\Delta y_{j+t}$ on $(y_{j+t-1} - y_j)$ for $2 \le t \le B$ and $1 \le j \le T - B$. The deterministic function is approximated locally by a constant. One could also use higher order local approximations of the trend function, but these approximations do not work well in samples of usual size. For this reason, we focus on constant local approximations. Under a general class of piecewise continuous trend functions, the resulting pooled estimator is consistent as $B, T \to \infty$.
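The block procedure described above can be illustrated with a minimal Python sketch. It assumes that the pooled regression runs over $t = 2, \dots, B$ within each of the $T - B$ overlapping blocks and that the first observation of each block serves as the local constant. The function name `pooled_rho` and the simulated random walk are illustrative assumptions, not part of the paper.

```python
import numpy as np

def pooled_rho(y, B):
    """Pooled OLS estimate of rho over overlapping blocks.

    Each block subtracts its first observation as a local constant
    (an illustrative reading of the block procedure)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    if not 2 <= B < T:
        raise ValueError("blocklength must satisfy 2 <= B < T")
    num = den = 0.0
    for j in range(T - B):               # 0-based block index
        block = y[j:j + B + 1]           # y_j, ..., y_{j+B}
        reg = block[1:B] - block[0]      # y_{j+t-1} - y_j for t = 2, ..., B
        dep = np.diff(block)[1:]         # Delta y_{j+t} for t = 2, ..., B
        num += dep @ reg
        den += reg @ reg
    return 1.0 + num / den

# Illustration on a simulated random walk (rho = 1, no trend).
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(500))
B = int(500 ** 0.7)
rho_hat = pooled_rho(x, B)
```

Because every regressor is a within-block difference, adding a constant to the whole series leaves the estimate unchanged, which is the simplest case of the trend filtering property.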
The limiting null distribution of the t-statistic is a functional of a Brownian motion under fixed-b asymptotics. Under small-b asymptotics, a normal distribution is obtained.
The paper is organized as follows: In Section 2 the autoregressive model with independent and heteroskedastic errors is analyzed together with the asymptotic behavior of the pooled least squares estimator in the presence of a general nonlinear trend component.
For both fixed-b and small-b block asymptotics, the limiting distributions are derived under both the unit root hypothesis and under local alternatives. In the presence of heteroskedastic errors, nuisance parameters appear in the limiting distributions, and the estimation of these parameters is discussed. Section 3 considers pseudo t-tests for the unit root hypothesis, and heteroskedasticity-robust test statistics are provided. In Section 4, a pre-whitening procedure is proposed in order to account for short-run dynamics, while Section 5 reports on Monte Carlo simulations. The tests are found to have only minor size distortions in small samples and are sized correctly in larger samples. It is shown that in the presence of slowly varying trends, pooled tests tend to yield higher power than conventional unit root tests. Finally, Section 6 presents the conclusion.
In the following, $W(r)$ denotes a standard Brownian motion and "⇒" stands for weak convergence on the càdlàg space $D[0, 1]$ together with a suitable norm. Θ(·) denotes the exact order Landau symbol, that is, $a_T = \Theta(b_T)$ if and only if $a_T = O(b_T)$ and $b_T = O(a_T)$, as $T \to \infty$. Moreover, $\lfloor \cdot \rfloor$ is the integer part of its argument, and $\Delta y_t$ stands for the differenced series $y_t - y_{t-1}$. Finally, $\xrightarrow{d}$ and $\xrightarrow{p}$ denote convergence in distribution and convergence in probability, respectively.

The pooled estimator
We are interested in inference concerning the autoregressive parameter ρ in the model
$$y_t = d_t + x_t, \qquad x_t = \rho x_{t-1} + u_t, \qquad t = 1, \dots, T, \tag{1}$$
where ρ is close or equal to one. The deterministic trend component $d_t$ is treated as nonstochastic and fixed in repeated samples, where its functional form is nonparametric and unknown.
Assumption 1 (trend component). The trend component is given by $d_t = d(t/T)$, where $d(r)$ is a piecewise Lipschitz continuous function.
Note that any continuously differentiable function is Lipschitz continuous. Lipschitz functions are locally close to a constant value in the sense that there exists some $C < \infty$ such that $|d(r) - d(s)| \le C|r - s|$ for all $r, s \in \mathbb{R}$. The piecewise Lipschitz condition allows for a partition with a finite number of intervals, such that $d(r)$ is Lipschitz continuous on each interval. This includes both smooth changes as well as abrupt breaks in the trend function. For the initial value, it is assumed that $E[x_0^2] < \infty$. We introduce the pooled estimator and the unit root test statistics under the following assumptions on the error term: The function σ(r) is càdlàg, non-stochastic, strictly positive, and bounded.
The principal approach to dealing with a general, slowly varying trend is to approximate the unknown trend locally by a constant. Let $B$ be some blocklength that satisfies $2 \le B < T$. We divide the time series into $T - B$ overlapping blocks of length $B$ and then block-wise estimate ρ via OLS under a constant trend specification. In the fashion of Schmidt and Phillips (1992), as well as Breitung and Meyer (1994), the constant trend is estimated by the first observation in each block, which corresponds to the maximum likelihood estimator under the unit root hypothesis ρ = 1. Thereafter, by pooling the $T - B$ individual block regressions of $\Delta y_{t+j}$ on $(y_{t+j-1} - y_j)$, $t = 2, \dots, B$, we obtain the pooled estimator
$$\hat\rho - 1 = \frac{\sum_{j=1}^{T-B}\sum_{t=2}^{B} \Delta y_{t+j}\,(y_{t+j-1} - y_j)}{\sum_{j=1}^{T-B}\sum_{t=2}^{B} (y_{t+j-1} - y_j)^2}.$$
In the following, we derive the asymptotic properties for the numerator and the denominator separately. The numerator and denominator statistics are defined as
$$Y_{1,T} = \frac{1}{B^{3/2}T^{1/2}}\sum_{j=1}^{T-B}\sum_{t=2}^{B} \Delta y_{t+j}\,(y_{t+j-1} - y_j), \qquad Y_{2,T} = \frac{1}{B^{2}T}\sum_{j=1}^{T-B}\sum_{t=2}^{B} (y_{t+j-1} - y_j)^2.$$
Their counterparts without deterministics, $X_{1,T}$ and $X_{2,T}$, are given by the same expressions with $y_t$ replaced by $x_t$. In what follows, we show that, under the block procedure, the deterministic component can be ignored asymptotically. All asymptotic results are jointly derived for $B, T \to \infty$. While the statistics $X_{1,T}$ and $X_{2,T}$ are infeasible if $d_t$ is unknown, they can be well approximated by $Y_{1,T}$ and $Y_{2,T}$ in the following sense:
Lemma 1. Let $\rho = 1 - c/\sqrt{BT}$ with $c \ge 0$, let $d_t$ satisfy Assumption 1, and let $u_t$ satisfy Assumption 2. Then, as $B, T \to \infty$, $Y_{1,T} - X_{1,T} = O_P(B^{-1/2})$, and $Y_{2,T} - X_{2,T} = O_P(T^{-1/2})$.
Accordingly, we obtain $(Y_{1,T} - X_{1,T}, Y_{2,T} - X_{2,T}) \xrightarrow{p} (0, 0)$ jointly, and the block procedure filters out the trend component in the numerator and the denominator asymptotically.
Hence, applying Slutsky's theorem, we can write
$$\sqrt{BT}\,(\hat\rho - 1) = \frac{Y_{1,T}}{Y_{2,T}} = \frac{X_{1,T}}{X_{2,T}} + o_P(1).$$
This result is valid without any rate restrictions for $B$. In order to obtain the limiting distribution, we formulate some properties for the numerator and denominator statistics.
Lemma 2. Let $\rho = 1 - c/\sqrt{BT}$ with $c \ge 0$, and let $u_t$ satisfy Assumption 2. Then, as $B, T \to \infty$, the following statements hold true: […]
The previous results suggest distinguishing between different rates for $B$, which leads to two fundamentally different types of blocklength asymptotics. The fixed-b approach denotes the case where the relative blocklength $B/T$ converges to some value $b$ with $0 < b < 1$, such that $B$ and $T$ grow at the same rate. In the small-b approach, we consider a relative blocklength that converges to zero, while $B, T \to \infty$. As the blocks are overlapping, the error terms in the pooled regression equation are correlated, but, fortunately, the correlation structure is known by construction. Together with the central limit theorem for martingale difference arrays, the following asymptotic result can be established for the small-b case:
Theorem 1. Let $\rho = 1 - c/\sqrt{BT}$ with $c \ge 0$, let $d_t$ satisfy Assumption 1, and let $u_t$ satisfy Assumption 2. […]
Since $Y_{2,T}$ converges in probability to a constant, we have joint convergence of $(Y_{1,T}, Y_{2,T})$, and the pooled estimator is asymptotically normally distributed under small-b asymptotics.
Under the unit root hypothesis ρ = 1, or, equivalently, if c = 0, it follows that […] The asymptotic variance of $\hat\rho$ involves integrals of the second- and fourth-order powers of the function σ(r), where the factor $\int_0^1 \sigma^4(r)\,dr \,/\, (\int_0^1 \sigma^2(r)\,dr)^2$ is equal to unity in case of homoskedasticity. This factor also appears in the asymptotic variance matrix of the OLS estimator of the autoregressive coefficient under unconditional heteroskedasticity (see Phillips and Xu 2006).
Cavaliere (2005) showed that permanent changes in volatility induce a time-shift in the right-hand-side process of the functional central limit theorem. A variance-transformed Brownian process $W_\eta(r)$ appears in the limiting distributions of Dickey-Fuller-type unit root tests. Given the variance profile η, where $\eta(s) = (\int_0^1 \sigma^2(r)\,dr)^{-1} \int_0^s \sigma^2(r)\,dr$, the transformed process is defined as $W_\eta(r) = W(\eta(r))$, where $W(r)$ is a standard Brownian motion. When imposing fixed-b asymptotics, the numerator and denominator statistics can be represented as a partial sum process of the innovations, which leads to the following limiting result:
Theorem 2. Let $\rho = 1 - c/\sqrt{BT}$ with $c \ge 0$, let $d_t$ satisfy Assumption 1, and let $u_t$ satisfy Assumption 2. Let $0 < b < 1$, and let $B/T \to b$ as $B, T \to \infty$. Then, […]
The limiting distributions are represented as functionals of the process $J_{c,b,\eta}$, which is an Ornstein-Uhlenbeck type process that is driven by a variance-transformed Wiener process. Consequently, the pooled estimator is asymptotically represented as a functional of a standard Brownian motion. If ρ = 1, the continuous mapping theorem and Theorem 2 imply that […] under fixed-b asymptotics. In comparison to the limiting distribution of the ρ-statistic in the Dickey-Fuller framework, the functional includes an additional integral, which results from pooling the block regressions.
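To make the heteroskedasticity factor and the variance profile concrete, the following sketch evaluates both numerically on a midpoint grid. The variance function used for illustration is the step function $\sigma^2(r) = 1 + \lambda \cdot 1_{\{r \le 2/3\}}$ from the simulation section with the assumed value λ = 3; the function name `variance_profile` and the grid size are hypothetical choices.

```python
import numpy as np

def variance_profile(sigma2, n=30_000):
    """Approximate eta(s) and the factor int sigma^4 / (int sigma^2)^2.

    sigma2 is a vectorized function r -> sigma^2(r) on [0, 1];
    eta[i] approximates eta((i + 1) / n)."""
    r = (np.arange(n) + 0.5) / n            # midpoints of the grid cells
    s2 = sigma2(r)
    eta = np.cumsum(s2) / np.sum(s2)        # normalized cumulative variance
    factor = np.mean(s2 ** 2) / np.mean(s2) ** 2
    return eta, factor

lam = 3.0  # illustrative break size
eta, factor = variance_profile(lambda r: 1.0 + lam * (r <= 2 / 3))
```

For this step function, the integrals are $\int_0^1 \sigma^2 = 3$ and $\int_0^1 \sigma^4 = 11$, so the factor equals 11/9, compared with 1 under homoskedasticity.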

Pseudo t-statistics for unit root testing
The principal concept of Dickey-Fuller-type unit root tests is to consider a t-test for the null hypothesis $H_0$: ρ = 1. Following this approach in the pooled regression framework, the usual standard error is given by $s_{\hat\rho} = \hat\sigma \big(\sum_{j=1}^{T-B}\sum_{t=2}^{B} (y_{t+j-1} - y_j)^2\big)^{-1/2} = \hat\sigma\,(Y_{2,T} B^2 T)^{-1/2}$, and the conventional t-statistic is represented as $(\hat\rho - 1)/s_{\hat\rho} = \sqrt{B}\,Y_{1,T} / \sqrt{\hat\sigma^2 Y_{2,T}}$, which diverges in probability under $H_0$. Accordingly, we consider a scaled pseudo t-statistic of the form […] (2)
In what follows, pseudo t-tests are defined for both small-b and fixed-b block asymptotics. In order to get a nuisance-parameter-free limiting distribution under small-b asymptotics, we replace $\hat\sigma$ by $\hat\kappa$ in equation (2). The small-b pseudo t-statistic is given as […] The factor $v_T$ is defined in Lemma 2. Since $v_T \to 2/3$, this term provides a finite-sample correction and scales the asymptotic variance of the t-statistic to unity. Under fixed-b asymptotics, a nuisance term appears in the Gaussian process itself. By means of transforming the data with its inverse variance profile, Cavaliere and Taylor (2007) showed that the time-transformation in the Gaussian limiting processes can be inverted.
The variance profile estimator $\hat\eta(s)$ is strictly increasing and admits the unique inverse function $\hat\eta^{-1}(s)$. Accordingly, we consider the time-transformed series $\tilde y_t = y_{\lfloor \hat\eta^{-1}(t/T)\,T \rfloor}$ for $t = 1, \dots, T$. We replace the original series in the test statistic by $\tilde y_t$ and define […], which yields the fixed-b statistic […] In practice, the time-transformed series $\tilde y_t$ can have duplicate entries in low volatility periods and therefore may not include all information of the original series in high volatility periods. However, we do not need to discard any observations when transforming the data. We may artificially extend the series. An auxiliary sample size $\tilde T \ge T$ can be chosen in such a way that $\hat\eta^{-1}(t/\tilde T) - \hat\eta^{-1}((t-1)/\tilde T) \le T^{-1}$ for all $t = 1, \dots, \tilde T$. Then, the grid of width $1/\tilde T$ is dense enough such that $\tilde y_t = y_{\lfloor \hat\eta^{-1}(t/\tilde T)\,T \rfloor}$, $t = 1, \dots, \tilde T$, includes all sample points of the original series, and the fixed-b statistic may be applied to this auxiliary series. Note that the auxiliary time series is not necessary from a theoretical point of view, but it leads to better test results in small samples.
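The index mapping behind the time transformation can be sketched as follows. The sketch assumes that the estimated variance profile is available on the grid $s = 0, 1/T, \dots, 1$ and inverts it by linear interpolation; both the interpolation and the name `time_transform` are illustrative assumptions, and the estimator of the profile itself is not reproduced here.

```python
import numpy as np

def time_transform(y, eta_hat):
    """Return y_tilde_t = y_{floor(eta_hat^{-1}(t/T) * T)}, t = 1, ..., T.

    y holds y_1, ..., y_T; eta_hat holds the (strictly increasing)
    profile values at s = 0, 1/T, ..., 1 with eta_hat[0] = 0, eta_hat[-1] = 1."""
    y = np.asarray(y, dtype=float)
    eta_hat = np.asarray(eta_hat, dtype=float)
    T = y.size
    s = np.arange(T + 1) / T
    t = np.arange(1, T + 1)
    inv = np.interp(t / T, eta_hat, s)      # interpolated inverse profile
    # small tolerance guards against floating-point error before flooring
    idx = np.floor(inv * T + 1e-9).astype(int)
    idx = np.clip(idx, 1, T)                # keep indices inside the sample
    return y[idx - 1]                       # shift to 0-based indexing
```

With a linear profile (the homoskedastic case), the mapping reduces to the identity, so the transformed series coincides with the original one.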
where $J_{c,b}(r) = \int_0^r e^{-(r-s)c/b}\,dW(s)$ is a standard Ornstein-Uhlenbeck process.
Note: The sample paths of the standard Brownian motions contained in the asymptotic null distribution of τ-FB are simulated by a discretized version of $W(r)$ on a grid of 50,000 equidistant points. The empirical quantiles are obtained from 100,000 Monte Carlo repetitions.
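A discretized path of $J_{c,b}(r)$ can be generated along the same lines as the grid simulation mentioned in the note. The following is a rough sketch using a simple recursive discretization of the stochastic integral; the function name `simulate_ou`, the grid size, and the seed handling are assumptions for illustration, and the grid is much coarser than the 50,000 points used for the reported quantiles.

```python
import numpy as np

def simulate_ou(c, b, n=1000, seed=None):
    """Discretized path of J_{c,b}(r) = int_0^r exp(-(r - s) c / b) dW(s)
    on the grid r = 0, 1/n, ..., 1. Returns the path and the increments."""
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(1.0 / n), size=n)   # Brownian increments
    decay = np.exp(-(c / b) / n)                     # one-step discount factor
    J = np.zeros(n + 1)
    for i in range(n):
        J[i + 1] = decay * J[i] + dW[i]
    return J, dW
```

For c = 0 the discount factor is one, and the recursion returns the cumulated increments, that is, a discretized standard Brownian motion.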
The unit root hypothesis is rejected in favor of stationarity if the test statistic is smaller than the α-quantile of the limiting distribution for the case c = 0, where α is the significance level. For τ-SB we can rely on standard normal quantiles as critical values. Table 1 presents simulated left-tailed quantiles of the limiting null distribution of τ-FB for various relative blocklengths $B/T$ and significance levels.
From the point of view of a practitioner, the τ-SB test has a number of advantages: the distribution is standard normal; thus, there is no need to resort to new tables, and p-values are easy to compute. In fact, the simulations in Section 5 indicate that the standard normal approximation is quite accurate in small samples if $B = \Theta(T^\gamma)$, where $0.5 \le \gamma \le 0.8$. Furthermore, the unit root test is robust to heteroskedasticity without using any data modification method such as those in Cavaliere and Taylor (2007) and Beare (2018) or wild bootstrap implementations (see Cavaliere and Taylor 2008a).
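Since the small-b test is left-tailed with standard normal critical values, the decision rule and the p-value can be computed with the Python standard library alone. This is a minimal sketch; the function name `small_b_decision` is hypothetical, and the value of the statistic is taken as given.

```python
from statistics import NormalDist

def small_b_decision(tau_sb, alpha=0.05):
    """Left-tailed test: reject the unit root if tau_sb < z_alpha.

    Returns (reject, critical_value, p_value)."""
    std_normal = NormalDist()
    critical_value = std_normal.inv_cdf(alpha)   # alpha-quantile of N(0, 1)
    p_value = std_normal.cdf(tau_sb)             # left-tail probability
    return tau_sb < critical_value, critical_value, p_value

reject, crit, p = small_b_decision(-2.0)
```

At the 5% level, the critical value is approximately -1.645, so a statistic of -2.0 leads to a rejection of the unit root hypothesis.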

Testing under short-run dynamics
A more realistic scenario for macroeconomic variables is that error terms are serially correlated. We impose the following assumption on the error process:
Assumption 3 (serially correlated errors). The process $\{u_t\}_{t \in \mathbb{Z}}$ possesses the moving average representation […], where $L$ is the usual lag operator. Moreover, all solutions $z$ of the equation $\psi(z) = 0$ satisfy $|z| > 1$. […] The function σ(r) is càdlàg, non-stochastic, strictly positive, and bounded.
Assumption 3 implies that the moving average representation of $u_t$ is invertible, and we […] In order to correct for the effect of short-run dynamics, we follow Breitung and Das (2005), among others, and consider the pre-whitened series $x^*_t = \theta(L)x_t$. By equation (1), it follows that […], where $\varepsilon_t$ satisfies the same conditions as $u_t$ under Assumption 2. Consequently, if the unit root statistics are defined in terms of the pre-whitened statistics $X^*_{1,T}$ and $X^*_{2,T}$ instead of $X_{1,T}$ and $X_{2,T}$, their limiting distributions coincide with those presented in the previous sections.
Since the autoregressive parameters of the error process are unknown, they need to be estimated. In the fashion of Said and Dickey (1984) and Chang and Park (2002), we fix some lag order $p_T$ and consider the AR($p_T$) error representation […], which is equal to $\sum_{i=1}^{p_T} \theta_i \Delta x_{t-i} + \varepsilon_{p_T,T}$ under the unit root hypothesis. The lag order $p_T$ is allowed to grow with the sample size $T$. In what follows, we show that the differenced deterministic terms are asymptotically negligible, as $p_T \to \infty$ with $p_T = o(B^{1/2})$, and we may replace $\Delta x_{t-i}$ by $\Delta y_{t-i}$ for all $i \ge 0$ in the augmented regression equation.
The estimated pre-whitened series is defined as $\hat y^*_t = y_t - \sum_{i=1}^{p_T} \hat\theta_i y_{t-i}$, and the corresponding numerator and denominator statistics are given by […] The pre-whitened counterparts of the estimators from Lemma 3 are defined as […] Analogously, we consider the time-transformed pre-whitened series $\tilde y^*_t = \hat y^*_{\lfloor \hat\eta^{*-1}(t/T)\,T \rfloor}$ for all $t = 1, \dots, T$, where $\hat\eta^{*-1}(s)$ is the unique inverse of $\hat\eta^*(s)$, and we define […] For any lag order $p_T \ge 0$, the pre-whitened versions of the test statistics are given by […] Note that τ-SB$_0$ = τ-SB and τ-FB$_0$ = τ-FB. To summarize, we obtain the following limiting distributions:
Theorem 4. Let $\rho = 1 - c/\sqrt{BT}$, let $d_t$ satisfy Assumption 1, and let $u_t$ satisfy Assumption […]
The lag order $p_T$ is typically unknown in practice and can be chosen using conventional lag order selection methods, such as the Bayesian information criterion (BIC) or the general-to-specific methodology in the fashion of Ng and Perron (1995). The maximum lag order $p_{\max}$ can be chosen, for instance, by the rule of thumb provided by Schwert (1989). For the special case of a single break in the deterministic component, Demetrescu and Hassler (2016) showed that if $p_T$ is determined by a usual information criterion, the correct lag length is selected asymptotically.
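The pre-whitening step can be sketched in a few lines. The sketch assumes that the coefficients are estimated by an OLS regression of $\Delta y_t$ on its first $p$ lags without intercept, that is, with the unit root imposed; this estimation choice and the function name `prewhiten` are assumptions for illustration, and lag selection by BIC is not included.

```python
import numpy as np

def prewhiten(y, p):
    """Return (y_star, theta_hat) with y_star_t = y_t - sum_i theta_hat_i y_{t-i}.

    theta_hat comes from an OLS regression of Delta y_t on
    Delta y_{t-1}, ..., Delta y_{t-p} (illustrative choice).
    The first p observations are lost."""
    y = np.asarray(y, dtype=float)
    if p == 0:
        return y.copy(), np.empty(0)
    dy = np.diff(y)
    n = dy.size
    # column i holds the i-th lag of the differenced series
    Z = np.column_stack([dy[p - i:n - i] for i in range(1, p + 1)])
    theta_hat, *_ = np.linalg.lstsq(Z, dy[p:], rcond=None)
    y_star = y[p:].copy()
    for i in range(1, p + 1):
        y_star -= theta_hat[i - 1] * y[p - i:y.size - i]
    return y_star, theta_hat
```

For p = 0 the series is returned unchanged, in line with τ-SB$_0$ = τ-SB and τ-FB$_0$ = τ-FB.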
Table 2. Note: The functional forms of the trend functions for the simulations are presented. The parameter λ determines the size of the trend.
Figure 1. Note: The plots of the trend functions from Table 2 are presented. The trend size is λ = 3.

Simulations
In this section, the finite sample performance of the unit root tests is evaluated by means of Monte Carlo simulations. The analysis includes different specifications for both the deterministic part d t and the stochastic part x t .
While the zero-trend d t = 0 is the main benchmark, we consider several other trends including sharp breaks and smooth changes of different shapes. The trend specifications are presented in Table 2 and Figure 1. The parameter λ determines the size of the break.
Similar trend functions are also considered in Jones and Enders (2014) in order to evaluate the performance of the unit root test by Enders and Lee (2012).
The stochastic part $x_t$ is simulated both under the null hypothesis ρ = 1 and the alternative hypothesis ρ = 0.9. For the errors $u_t$, we consider an independent process as well as the AR(1) process $u_t = 0.5u_{t-1} + \varepsilon_t$ with standard normal innovations. Furthermore, results with heteroskedastic innovations using the variance function $\sigma^2(r) = 1 + \lambda \cdot 1_{\{r \le 2/3\}}$ are presented.
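A data-generating process of this kind can be sketched as follows. The function name `simulate_series`, the zero start-up value of the AR(1) errors, and the way heteroskedasticity enters (by scaling the innovations with σ(t/T)) are assumptions made for illustration; the trend is passed in as an arbitrary function of $r = t/T$ because the specifications of Table 2 are not reproduced here.

```python
import numpy as np

def simulate_series(T, rho, trend=None, ar_coef=0.0, lam=0.0,
                    sigma0_sq=0.0, seed=None):
    """Simulate y_t = d(t/T) + x_t with x_t = rho * x_{t-1} + u_t.

    u_t = ar_coef * u_{t-1} + sigma(t/T) * eps_t with
    sigma^2(r) = 1 + lam * 1{r <= 2/3} and x_0 ~ N(0, sigma0_sq).
    Returns (y, u)."""
    rng = np.random.default_rng(seed)
    r = np.arange(1, T + 1) / T
    sigma = np.sqrt(1.0 + lam * (r <= 2 / 3))
    e = sigma * rng.standard_normal(T)      # scaled innovations
    u = np.empty(T)
    prev = 0.0                              # assumed start-up value u_0 = 0
    for t in range(T):
        u[t] = ar_coef * prev + e[t]
        prev = u[t]
    x_prev = np.sqrt(sigma0_sq) * rng.standard_normal()   # initial value x_0
    x = np.empty(T)
    for t in range(T):
        x[t] = rho * x_prev + u[t]
        x_prev = x[t]
    d = np.zeros(T) if trend is None else np.asarray(trend(r), dtype=float)
    return d + x, u

# Hypothetical example: sharp mean break of size 3 at mid-sample, rho = 0.9.
y, u = simulate_series(100, 0.9, trend=lambda r: 3.0 * (r > 0.5),
                       ar_coef=0.5, seed=1)
```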
The small-b tests are implemented using blocklengths of the form $B = T^\gamma$ with parameters […] 6}. For all tests, the lag augmentation order $p_T$ is either fixed or flexibly determined by the BIC with a maximum lag order of $p_{\max} = 5$. All empirical size levels are presented for a significance level of 5%, and the models are simulated with 100,000 repetitions for sample sizes of T = 100 and T = 300. As noted by Müller and Elliott (2003), the power of a unit root test depends on the initial condition, and the initial value is simulated as $x_0 \sim N(0, \sigma_0^2)$ for $\sigma_0^2 \in \{0, 5, 10\}$.
In order to demonstrate the advantage of the fixed-b and small-b unit root tests, their finite sample results are compared to those obtained by conventional unit root tests. As the main benchmark, we consider the augmented Dickey-Fuller test by Said and Dickey (1984) with constant trend specification (ADF henceforth), which is the t-test for the hypothesis […]
Elliott et al. (1996) proposed a feasible point-optimal test with local-to-unity GLS demeaning in the ADF regression. Let the deterministic trend function be given by the vector $z_t$, and let $\alpha^* = 1 - c/T$, where $c \in \mathbb{R}$. Furthermore, let $y_{c,t} = y_t - \alpha^* y_{t-1}$ and $Z_{c,t} = z_t - \alpha^* z_{t-1}$ for $t \ge 2$, and let $y_{c,1} = y_1$ and $Z_{c,1} = z_1$. The Dickey-Fuller GLS test is […], where $y^d_t = y_t - \hat\beta' z_t$ and where $\hat\beta$ is the OLS estimator from a regression of $y_{c,t}$ on $Z_{c,t}$. For the constant trend specification (DF-GLS henceforth), we set $z_t = 1$ and c = 7, and, for the linear trend specification (DF-GLS-trend henceforth), $z_t = (1, t)'$ and c = 13.5 are considered. Note that the point-optimal test with GLS demeaning is asymptotically equivalent to the Dickey-Fuller test for $d_t = 0$ computed using the series with initial value subtraction (see Elliott et al. 1996).
An approach that does not assume a precise model for the trend component is that developed by Enders and Lee (2012) (EL henceforth).
A flexible Fourier form is used to approximate smooth breaks in the trend function. Structural changes can be captured by the low frequency components of a series. In its simplest form, Enders and Lee (2012) considered the parametric trend model $d(r) = \alpha_0 + \gamma r + \alpha_1 \sin(2\pi r) + \beta_1 \cos(2\pi r)$. More frequencies could be included, but doing so could lead to an over-fitting problem. The test works as follows: First, the auxiliary regression $\Delta y_t = \delta_0 + \delta_1 \Delta\sin(2\pi t/T) + \delta_2 \Delta\cos(2\pi t/T) + v_t$ is considered with OLS estimates $\hat\delta_0$, $\hat\delta_1$, and $\hat\delta_2$. Let $D_t = \hat\delta_0 t + \hat\delta_1 \sin(2\pi t/T) + \hat\delta_2 \cos(2\pi t/T)$, which yields the detrended series $S_t = y_t - D_t - (y_1 - D_1)$. Finally, the test statistic is given by the t-statistic for the null hypothesis φ = 0 in the regression $\Delta y_t =$ […]
[…] Leybourne (2005, 2006) showed that, if $x_0 \sim N(0, \sigma_\alpha^2/(1 - \rho^2))$ for $\rho = 1 - c/T$ with $c > 0$ and some $\sigma_\alpha > 0$, the limiting distributions of the ADF and the DF-GLS test depend on the additional nuisance parameter $\sigma_\alpha$. The DF-GLS test is optimal for the zero initial condition $x_0 = 0$, but its power decreases monotonically in $\sigma_\alpha$, while the power of the ADF test increases. Figure 2 indicates that the pooled tests are less sensitive to this effect across different values of $\sigma_\alpha$. Furthermore, there is no test that outperforms the other tests uniformly across $\sigma_\alpha$ for this situation in terms of size-adjusted power.
Figure 2. Note: Size-adjusted power results for different tests are presented. The initial condition is simulated from a normal distribution with mean zero and different values for $\sigma_0^2 = Var[x_0]$, where $\sigma_0$ is shown on the x-axis. The simulation results are reported for a nominal size level of 5%, for 100,000 replications with T = 100, ρ = 0.9, the zero trend specification $d_t = 0$, and independent standard normal innovations $u_t$.
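The detrending steps of the two benchmark procedures described above, local-to-unity GLS detrending for the DF-GLS tests and the Fourier detrending of the EL test, can be sketched as follows. Only the detrending stage is shown; the final test regressions are not reproduced. The function names `gls_detrend` and `fourier_detrend` are hypothetical.

```python
import numpy as np

def gls_detrend(y, z, c):
    """GLS detrending with alpha* = 1 - c/T: quasi-difference y and z,
    estimate beta by OLS, and return y^d_t = y_t - beta' z_t."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float).reshape(y.size, -1)
    a = 1.0 - c / y.size
    y_c = np.concatenate(([y[0]], y[1:] - a * y[:-1]))   # y_{c,1} = y_1
    Z_c = np.vstack((z[:1], z[1:] - a * z[:-1]))         # Z_{c,1} = z_1
    beta, *_ = np.linalg.lstsq(Z_c, y_c, rcond=None)
    return y - z @ beta

def fourier_detrend(y):
    """First steps of the EL procedure: auxiliary regression in differences,
    then S_t = y_t - D_t - (y_1 - D_1)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    t = np.arange(1, T + 1)
    s, co = np.sin(2 * np.pi * t / T), np.cos(2 * np.pi * t / T)
    X = np.column_stack((np.ones(T - 1), np.diff(s), np.diff(co)))
    delta, *_ = np.linalg.lstsq(X, np.diff(y), rcond=None)
    D = delta[0] * t + delta[1] * s + delta[2] * co
    return y - D - (y[0] - D[0])
```

By construction, the first element of the Fourier-detrended series is zero, and a series that consists only of a trend of the assumed form is removed entirely by either step.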
Tables 3-7 present size and actual power results under different model specifications.
For smaller sample sizes, the pooled tests have small size distortions, which become larger as the break gets larger. However, for larger sample sizes, the size distortions decline.
Note: Simulation results are reported for 100,000 replications. The zero-trend $d_t = 0$ is considered for all $t = 1, \dots, T$. The AR(1) process is given by $u_t = 0.5u_{t-1} + \varepsilon_t$. All innovations are simulated independently as standard normal random variables. For the small-b and fixed-b tests, the lag order p refers to the pre-whitening scheme, and, for the conventional tests, p represents the augmentation order. The rejection frequencies are based on the asymptotic critical values for a significance level of 5%.
Overall, the size levels are similar to those obtained from using the conventional unit root tests.
The power of the pooled tests depends on the blocklength. In case of no break, a larger blocklength implies higher power results, which is in line with the theoretical findings that those tests have power in a $1/\sqrt{BT}$ neighborhood of the unit root hypothesis. For blocklengths of $B = T^{0.8}$ in the small-b case and $B = 0.6T$ in the fixed-b case, the power results are similar to those from the ADF test and the Dickey-Fuller GLS test, where the ordering depends on the initial condition (cf. Figure 2). Hence, none of the tests dominates the pooled tests uniformly across these small-sample specifications (although, asymptotically, those tests have power in a 1/T neighborhood of the unit root hypothesis).
Note: Simulation results are reported for 100,000 replications. The errors $u_t$ are simulated from $u_t = 0.5u_{t-1} + \varepsilon_t$ with independent standard normal innovations, and the series are pre-whitened with a lag order p that is determined from the BIC. The rejection frequencies are based on the asymptotic critical values for a significance level of 5%.
Furthermore, smaller blocklengths, such as $T^{0.6}$ in the small-b context and $0.2T$ in the fixed-b context, still yield reasonably high power. In particular, the EL test performs much worse in all cases. The size and power results obtained under the AR(1) error specification with both fixed and flexible lag augmentation for the pre-whitening scheme are similar to those produced by i.i.d. errors.
Note: Simulation results are reported for 100,000 replications. The errors $u_t$ are simulated independently as standard normal random variables, and the series are not pre-whitened (p = 0). The sharp break specification is defined by a break in the variance at 2/3 of the sample. The rejection frequencies are based on the asymptotic critical values for a significance level of 5%.
As the tests are designed to yield higher power in the presence of slowly varying trends and breaks, we compare the size-adjusted powers of the tests under the trend specifications presented in Table 2 and Figure 1. For large break sizes λ, it is shown that the smaller the blocklength, the greater the power results. In most cases, the pooled tests have greater power than the ADF, the DF-GLS, the DF-GLS-trend, and the EL test. Furthermore, the power results of the pooled tests are quite uniform across different trend specifications when compared to those of the conventional tests. Table 6 shows that the pooled tests have reasonable size and power properties in the presence of AR(1) errors and different trend specifications. Furthermore, from Table 7, we can conclude that the tests are sized correctly and have good power properties in the presence of a break in the variance and in the trend function.
The blocklength $B$ is a tuning parameter that needs to be chosen carefully, and any optimality result would depend on the actual trend model. In practice, however, the trend model is unknown, which makes it hard to derive an optimal blocklength. Although theoretical recommendations cannot be formulated based on the current analysis, the small-b tests with $B = T^{0.7}$ and the fixed-b tests with $B = 0.2T$ yield very promising results for all trend functions studied in this paper and are therefore recommended as the default settings.
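For a given sample size, the recommended default blocklengths can be computed as in the following small sketch. Rounding down to the nearest integer is an assumption, since the rounding convention is not stated; the function name `default_blocklengths` is hypothetical.

```python
import math

def default_blocklengths(T):
    """Default blocklengths (small-b, fixed-b): B = T^0.7 and B = 0.2 T,
    rounded down to integers (assumed rounding convention)."""
    small_b = math.floor(T ** 0.7)
    fixed_b = math.floor(0.2 * T)
    return small_b, fixed_b

blocks_100 = default_blocklengths(100)   # (25, 20)
```

For T = 100, this gives a blocklength of 25 for the small-b tests and 20 for the fixed-b tests.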

(b):
Note that by mathematical induction on $n$, the identity $\sum_{t=2}^{n}\sum_{k=1}^{t-1} a_k = \sum_{k=1}^{n-1} (n - k)\,a_k$ holds true for any sequence $(a_t)_{t \in \mathbb{N}}$. The index set $I_j$ can be expressed as […] For $j \in [1, B]$, it follows that […] and, analogously, if $j \in [B + 1, T - B]$, we obtain […] and the first part of (b) has been shown. For the second part, we decompose the denominator statistic into $X_{2,T} = S_5 + S_6 + S_7$, where […] The first term satisfies […] Combining all cases and applying the Gaussian summation formulas yields […] since c = 0. For the denominator, we have $S_6 = S_7 = 0$, since c = 0. Then, […] and the assertion follows with equations (A.5) and (A.6).

A.4 Proof of Theorem 1
From Lemma 2(a), it follows that $E[q_{j,T}^2] = O(T^{-1})$ for any $j \le T$, which implies that $Var[\sum_{j=1}^{T} q_{j,T}] = \sum_{j=B+1}^{T-B} E[q_{j,T}^2] + o(1)$. The identity $\sum_{t=2}^{n}\sum_{k=1}^{t-1} a_k = \sum_{k=1}^{n-1} (n - k)\,a_k$ holds true for any sequence $(a_t)_{t \in \mathbb{N}}$, which follows by induction on $n$. Then, for $B + 1 \le j \le T - B$, […] Since $\{q_{j,T}\}$ is a martingale difference array, we can apply the central limit theorem from Theorem 24.3 in Davidson (1994), which implies that $\sum_{j=1}^{T} q_{j,T} \big/ \sqrt{Var[\sum_{j=1}^{T} q_{j,T}]} \xrightarrow{d} N(0, 1)$, as $T \to \infty$. Furthermore, from Lemma 2, $E[X_{1,T}] = -\frac{c}{2}\int_0^1 \sigma^2(r)\,dr + o(1)$, and the first statement follows from Lemma 1. For the second statement, note that […] Furthermore, from Lemma 2, $Var[X_{2,T}] = o(1)$, and the assertion follows by Chebyshev's inequality together with Lemma 1.
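For completeness, the induction behind the summation identity used above can be sketched as follows. For $n = 2$, both sides equal $a_1$. The induction step from $n$ to $n + 1$ adds the inner sum for $t = n + 1$:

```latex
\begin{align*}
\sum_{t=2}^{n+1}\sum_{k=1}^{t-1} a_k
  &= \sum_{t=2}^{n}\sum_{k=1}^{t-1} a_k + \sum_{k=1}^{n} a_k \\
  &= \sum_{k=1}^{n-1} (n-k)\,a_k + \sum_{k=1}^{n} a_k \\
  &= \sum_{k=1}^{n} (n+1-k)\,a_k .
\end{align*}
```

The last line is the claimed identity with $n$ replaced by $n + 1$, since the term for $k = n$ carries the weight $n + 1 - n = 1$.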

A.5 Proof of Theorem 2
Let $X_T(r) = T^{-1/2}\sum_{k=1}^{\lfloor rT \rfloor} u_k$ and $Y_T(r) = T^{-1/2} x_{\lfloor rT \rfloor}$ for $r \ge 0$. From Lemmas 1 and 2 in Cavaliere (2005), it follows that $X_T \Rightarrow \bar\sigma W_\eta$, where $\bar\sigma^2 = \int_0^1 \sigma^2(r)\,dr$ denotes the average variance. For notational convenience, we set $u_0 = x_0$. Note that a Taylor expansion around 0 yields $e^{-x} = 1 - x + o(x)$, which implies that […] Then, with the continuous mapping theorem, we obtain […] Then, with Lemma 1, […] From $\Delta x_t = u_t$, it follows that […], which implies that […] Furthermore, Lemma 1 yields […] The assertion follows from equation (A.7), together with the continuous mapping theorem.
The consistencies of $\hat\sigma^{*2}$, $\hat\kappa^{*2}$, and $\hat\eta^*(s)$ follow from the fact that […], where the last two equations hold true as $B/T \to 0$, analogously to Lemma 3.
Finally, since the pre-whitened numerator and denominator statistics (X * 1,T , X * 2,T ) under Assumption 3 have the same properties as (X 1,T , X 2,T ) under Assumption 2, the assertion follows with Lemma 5 and the proof of Theorem 3.