Tail Risk in Commercial Property Insurance

We present some new evidence on the tail distribution of commercial property losses based on a recently constructed dataset on large commercial risks. The dataset is based on contributions from Lloyd's of London syndicates, and provides information on over three thousand claims that occurred during the period 2000-2012, including detailed information on exposures. We use occupancy characteristics to compare the tail risk profiles of different commercial property exposures, and find evidence of substantial heterogeneity in tail behavior. The results demonstrate the benefits of aggregating granular information on both claims and exposures from different data sources, and warn against the use of reserving and capital modeling approaches that are not robust to heavy tails.


Introduction
Property and business interruption risks represent roughly a third, 1 or USD 175 billion, of direct insurance premiums written in commercial insurance lines worldwide (see Swiss Re, 2012). The latter also include liability insurance, commercial auto insurance, and specialty lines such as off-shore energy and workers' compensation. 2 Property insurance typically includes fire insurance, which offers protection against fire and lightning, but may also provide additional cover against natural and social perils such as wind, flood, and vandalism. Business interruption is a complementary insurance covering the expenses and losses incurred when business is interrupted and damages are being repaired. The demand for commercial property insurance is dominated by medium and large corporations that need to insure complex, high severity risks. The largest property insurance markets in the world are the US 3 and the UK. 4 Despite their relevance for the corporate sector and (re)insurers operating in commercial insurance lines, large commercial property risks are poorly understood. This is due to the limited public information available on property losses and exposures, the heterogeneity of risk characteristics of insured values, and the complex relation between hazard events and realized losses, as small events may often precipitate major disasters.
These factors make it difficult for insurers to build reliable statistical claims information, whereas companies that have larger insurance portfolios and more sophisticated claims reporting systems have no incentive to disclose information for competitive reasons.
1 According to the estimates reported in Swiss Re (2012), 29% of direct insurance premiums in 2010 were written for property and business interruption risks.
2 In terms of global direct premiums written in 2010, the shares of these lines of business were as follows: 25% (liability insurance), 19% (commercial auto insurance), 17% (specialty lines such as offshore energy), and 11% (workers' compensation); see Swiss Re (2012).
3 With USD 1.1 billion of direct premiums written in 2010; see Swiss Re (2012).
4 Although the domestic UK market is not as large as, for example, the German and French markets, London is the main marketplace for international commercial (re)insurance risks. When counting foreign business, the UK jumps to second place; see Swiss Re (2012).
The literature on the subject is scant. The few methodological contributions available emphasize the challenges of pricing high layers of exposure, and offer insights into the blending of exposure and experience rating (see Riegel, 2010; Desmedt et al., 2012; Buchanan and Angelina, 2014). As a result, the insurance industry is overly reliant on underwriters' judgment and recent claims history, and finds it difficult to properly understand the true risk that it is taking on. On the demand side, the excessive weight placed on reported claims and the variability of pricing schedules across exposures result in a considerable degree of price volatility, which makes it challenging for corporates to budget for insurance purchases on a systematic basis. 5
In this paper, we shed some light on the tail distribution of commercial property risks by using a recently constructed dataset on large commercial risks. This new data source is based on information collected from two leading Lloyd's syndicates writing a total of GBP 2.67bn gross premiums across all lines of business in 2012, a figure representing around 10% of Lloyd's gross premiums that year. As Lloyd's is predominantly a subscription market, the claims information provided by the two syndicates allows us to encompass a substantially larger fraction of business transacted, making the dataset representative of the business written in the London market. 6
To measure the tail risk of commercial property exposures, we use a parsimonious model based on approximating the tail behavior of the claims with a power law. In particular, we estimate the tail index, a parameter describing how fast the tail of a power law decays: the lower the tail index, the greater the probability mass in the tails. To ensure robustness relative to small sample bias, heterogeneity, and dependence of the claims considered, we resort to the log-log rank-size method, as presented by Gabaix and Ibragimov (2011), and discussed in more detail in section 3 below. As a robustness check, we also apply the method of Huisman et al. (2001), which is designed to address small sample issues (see section 3.3 for a review of alternative approaches).
Estimation of the tail index offers immediate insights for pricing, reserving, and capital modeling exercises. In particular, the value of the tail index is in one-to-one correspondence with the maximal order of finite (centered) moments of the risks considered. For example, the skewness only exists for values of the tail index strictly larger than three; the variance only exists for values strictly larger than two; and the mean only exists for values strictly larger than one (e.g., Embrechts et al., 1997; Beirlant et al., 2006). The tail index can be regarded as being infinite for Normal distributions, as the tail decay is faster than exponential, and moments of an arbitrary order are then finite. In our data, we find that commercial property risks are significantly heavy tailed. For several rating factor configurations, the hypothesis of existence of the variance can be rejected at the 5% significance level. For some important classes of risk, even the hypothesis of existence of the mean can be rejected. 7
Existence of a finite variance is essential for the application of standard statistical methods, such as least squares methods. It is also crucial for reserving methods based on risk margins proportional to the standard deviation of the claims, 8 meaning that such methods are inappropriate for liabilities modeled by extrapolating available claims information far into the tails. Existence of a finite mean is important for capital modeling and quantile-based risk measures, such as Value at Risk (VaR) in the Solvency II framework. In particular, coherence of VaR as a risk measure in the sense of Artzner et al. (1999) may be violated, meaning that the diversification benefits on which a subscription market like Lloyd's is based may be limited for some classes of commercial property risks. As an example, denote by the random variable $Z_{(w_1,\ldots,w_n)} := \sum_{i=1}^{n} w_i X_i$ the risk exposure resulting from retaining fractions $w_1, \ldots, w_n$ (with $w_i \in [0, 1]$, $i = 1, \ldots, n$) of i.i.d. risks $X_1, \ldots, X_n$. If the risks belong to the class of stable distributions with tail index $\alpha$, for example, it can be shown that $\mathrm{VaR}_q(Z_{(1,0,\ldots,0)}) < \mathrm{VaR}_q(Z_{(1/n,\ldots,1/n)})$ for tail probability parameter $q \in (0, 1/2)$ and tail index value $\alpha \in (0, 1)$ (see Ibragimov, 2009b,a). Regulators should therefore be aware of the risk concentration incentives that [...]
7 Tail index estimates below one also arise in the context of operational risks and returns from technological innovation; see Nešlehová et al. (2006), Degen et al. (2007), and Silverberg and Verspagen (2007), for example.
8 Commercial property insurance providers in the London market, for example, often quantify reserves by inflating premiums by a factor (say 30%) of the standard deviation of the losses.
[...] popular Hill (1975) estimator, and the weighted-Hill estimator of Huisman et al. (2001), which was specifically designed to address small sample issues. The results allow us to reject the hypothesis of existence of first or second moments at a good significance level in some interesting cases. Finally, we explore the relative contribution of different rating factors to tail risk, by expressing the tail index as a deterministic function of relevant covariates, and adopting a regression approach in line with Beirlant et al. (1999), Beirlant and Goegebeur (2003), and Wang and Tsai (2009).
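The VaR inequality stated above for stable risks with tail index below one can be illustrated with a small simulation. The sketch below is not part of the original analysis. It assumes Lévy-distributed risks (a stable law with tail index 1/2, generated as 1/Z² with Z standard Normal), a pool of n = 10 risks, and an empirical quantile as the VaR estimate; all function names are hypothetical.

```python
import random


def empirical_var(samples, q):
    """Empirical VaR at tail probability q: the loss exceeded with frequency about q."""
    ordered = sorted(samples)
    index = min(int((1.0 - q) * len(ordered)), len(ordered) - 1)
    return ordered[index]


def levy_risk(rng):
    """One draw from a Levy distribution (stable, tail index 1/2): 1/Z^2 with Z ~ N(0, 1)."""
    z = rng.gauss(0.0, 1.0)
    while z == 0.0:  # guard against division by zero
        z = rng.gauss(0.0, 1.0)
    return 1.0 / (z * z)


def simulate_var(n_risks=10, n_sims=20000, q=0.05, seed=1):
    """Compare VaR of one risk retained in full with VaR of equal shares of n risks."""
    rng = random.Random(seed)
    single, pooled = [], []
    for _ in range(n_sims):
        draws = [levy_risk(rng) for _ in range(n_risks)]
        single.append(draws[0])              # Z_(1,0,...,0)
        pooled.append(sum(draws) / n_risks)  # Z_(1/n,...,1/n)
    return empirical_var(single, q), empirical_var(pooled, q)


var_single, var_pooled = simulate_var()
print(f"VaR, single risk:           {var_single:.1f}")
print(f"VaR, equally weighted pool: {var_pooled:.1f}")
```

For Lévy risks, the equally weighted average of n i.i.d. risks has the same distribution as n times a single risk, so the pooled VaR should come out roughly n times larger than the stand-alone VaR: in this setting, pooling increases rather than reduces the quantile.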
The paper is organized as follows. In the next section, we provide details on the dataset. In section 3, we outline the statistical methodologies used. In section 4, we provide tail estimation results for some configurations of exposure characteristics. Section 5 concludes, offering recommendations for future research.

Data
The Imperial-IICI dataset contains claim and exposure information obtained from two leading syndicates of Lloyd's of London. As the latter is a subscription market, the data span business written by a number of other syndicates. Granular information on claims and exposures was obtained from brokers' submissions. These are documents informing the 'lead' underwriter of any claims occurring under a policy; the information is then shared with the market, in order to allocate the losses to each 'follower', depending on the individual retentions of the syndicates that co-insured the risk underwritten by the 'lead'. Brokers' submissions are fundamental in our analysis, as they allow us to determine claims from the ground up (FGU). It is in general very difficult to recover FGU claims from the losses incurred by individual syndicates, due to the complex layering and coinsurance arrangements characterizing large commercial property insurance. All data were anonymized and aggregated by using fictitious claims and policy identifiers.
Internal validation of the data was carried out by looking at individual claims narratives and policy schedules, which are documents listing the asset values insured under a policy. External macro-validation was carried out by using data from fire protection agencies as compiled by ISO Verisk. 10
The Imperial-IICI FGU claims provide aggregate information on indemnities for physical damage and business interruption, as well as claims assessment and settlement fees. Both claims and exposures are expressed in 2012 USD terms; 11 the normalization is obtained by trending claims and exposures at an average rate of 2.5% per annum across the two syndicates. An example of a data record is presented in table 1. The record reports location information, and classifies the risk type according to the Lloyd's risk codes (a selection of these codes is presented in table 2). The claim can be further understood by using occupancy type information, which has three levels of increasing granularity. The first one broadly classifies exposures into commercial (e.g., offices, banks, stores), manufacturing (e.g., utilities, food processors, mines), and residential property (e.g., hotels, hospitals). The second level provides some more detail according to the definitions reported in table 3, allowing one to distinguish, for example, a hotel from a hospital, or metals from food producers. The third occupancy level offers a more granular view of the exposures, distinguishing for example between large vs. small hotels, heavy vs. light fabrication infrastructure, and food & drugs vs. chemicals vs. metal & minerals processing plants. Finally, occupancy information is complemented by the claim narrative, which may also provide some indications on the hazard event (e.g., burst water pipe, electrical failure, fire from hotel restaurant).
10 We are grateful to John Buchanan and Chris Kent at ISO Verisk for making this validation exercise possible. The exercise demonstrates consistent results in terms of average excess severity of losses above USD 1m and 5m across occupancy types (see figure 1). The Imperial-IICI dataset provides larger coverage of manufacturing exposures, which give rise to the largest losses also in the data compiled by ISO Verisk.
11 Claims are available in original currency. For comparability, we converted claims into USD by using Lloyd's year-end currency conversion rates.
< Table 1 about here >
Table 4 gives an idea of the geographical distribution of the losses. Although the dataset has global scope, the largest subsample is represented by North American data. The Worldwide data class is currently being analyzed at a deeper level, and might result in the allocation of claims to more precise locations in the future. In figure 1, we give an idea of the claim counts and average FGU losses in excess of different thresholds.
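The normalization of claims to 2012 USD terms described in this section can be sketched as follows. This is an illustration only: annual compounding of the 2.5% trend rate, the order of the two steps, the function name, and the exchange rate in the example are all assumptions, since the text states only the average trend rate and the use of Lloyd's year-end conversion rates.

```python
def to_2012_usd(amount, fx_to_usd, loss_year, annual_trend=0.025, base_year=2012):
    """Convert an amount in original currency to USD and trend it to base-year terms.

    fx_to_usd is the (hypothetical) year-end rate: USD per unit of original currency.
    Annual compounding of the trend rate is an assumption of this sketch.
    """
    years = base_year - loss_year
    return amount * fx_to_usd * (1.0 + annual_trend) ** years


# Hypothetical claim of GBP 1,000,000 from 2008, with an illustrative rate of 1.6 USD per GBP.
claim_2012_usd = to_2012_usd(1_000_000, fx_to_usd=1.6, loss_year=2008)
print(f"Claim in 2012 USD terms: {claim_2012_usd:,.0f}")
```

Under these assumptions, a claim from 2008 is scaled by four years of trend, while a 2012 claim is only converted into USD.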

Methodology
We use a simple yet effective method to estimate the tail index, which relies on the log-log rank-size (LLRS) OLS regression. In its standard form, the method is severely biased in small samples, and it has often been applied with an incorrect formulation for the standard errors. We therefore apply the optimal bias correction and the correct formula for the standard errors indicated by Gabaix and Ibragimov (2011). Applied with the optimal rank shift, the method is very robust to heterogeneity and dependence of the data, including common factor structures. For comparison, we pair the method with the popular Hill estimator, which suffers from significant bias in small samples, and with the weighted Hill estimator developed by Huisman et al. (2001) to deal with small sample issues. Readers interested in the empirical results may skip the next sections and go directly to section 4.
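
3.1 Hill estimator.
Hill (1975) proposed a simple method for estimating the tail index from a sequence of i.i.d. observations. Let $l(i)$ denote the $i$-th order statistic of losses, such that $l(i) \geq l(i-1)$ for $i = 2, \ldots, n$, with $n$ the size of the sample. Let us focus on the $k$ largest losses, that is, the losses in excess of the order statistic $l(n-k)$, and assume a conditional Pareto distribution for the random loss $L$ in excess of this threshold. We have

$$P\big(L > l \mid L > l(n-k)\big) = C\, l^{-\alpha(k)},$$

where $\alpha(k) := \gamma(k)^{-1}$ is the tail index and $C$ a positive constant. The Hill estimator for the parameter $\gamma(k)$ is the maximum likelihood estimator

$$\hat{\gamma}(k) = \frac{1}{k} \sum_{j=1}^{k} \ln \frac{l(n-j+1)}{l(n-k)}. \qquad (3.1)$$

The standard error for the Hill estimate of the tail index, $\hat{\alpha}(k) := \hat{\gamma}(k)^{-1}$, is s.e. Hill $= \hat{\alpha}(k)/\sqrt{k}$. To understand the bias affecting (3.1) in small samples, consider the class of distribution functions (see Hall, 1990; Dacorogna et al., 1995):

$$F(x) = 1 - a x^{-\alpha}\left(1 + b x^{-\beta}\right),$$

for positive parameters $\alpha$, $\beta$, and real $a$, $b$ (the case $a = 1$ and $b = 0$ corresponding to a Pareto distribution). Dacorogna et al. (1995) showed that the above is the second order expansion of the cumulative distribution function of a vast class of heavy tailed distributions, and provided the following approximations for the expected value and variance of the estimator $\hat{\gamma}(k)$ in (3.1):

$$E\big[\hat{\gamma}(k)\big] \approx \frac{1}{\alpha} - \frac{b\beta}{\alpha(\alpha+\beta)}\, a^{-\beta/\alpha} \left(\frac{k}{n}\right)^{\beta/\alpha}, \qquad \mathrm{Var}\big[\hat{\gamma}(k)\big] \approx \frac{1}{k\alpha^{2}}, \qquad (3.2)$$

for $n$, $k$, and the threshold going to infinity. The above show how the choice of threshold introduces an important trade-off between bias and efficiency: using more tail observations (a larger $k$) lowers the variance of the estimator, but increases the magnitude of the bias.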
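The Hill estimator described above can be sketched in a few lines. This is a minimal illustration rather than the code used for the paper's results: it takes k to be the number of largest losses used (so that the threshold is the next largest loss, which is one reading of the notation above), and the function name and the simulated Pareto sample are assumptions introduced for the example.

```python
import math
import random


def hill_estimator(losses, k):
    """Hill estimate of the tail index alpha from the k largest losses.

    Returns (alpha_hat, standard_error), with standard error alpha_hat / sqrt(k).
    """
    if not 1 <= k < len(losses):
        raise ValueError("k must satisfy 1 <= k < n")
    ordered = sorted(losses)           # ascending: ordered[-1] is the largest loss
    threshold = ordered[-(k + 1)]      # order statistic just below the k largest losses
    gamma_hat = sum(math.log(x / threshold) for x in ordered[-k:]) / k
    alpha_hat = 1.0 / gamma_hat
    return alpha_hat, alpha_hat / math.sqrt(k)


# Simulated (not real) losses from an exact Pareto law with tail index 1.5.
rng = random.Random(0)
sample = [rng.paretovariate(1.5) for _ in range(5000)]
alpha_hat, std_err = hill_estimator(sample, k=500)
print(f"Hill estimate: {alpha_hat:.3f} (s.e. {std_err:.3f})")
```

Repeating the call over a grid of k values gives the kind of threshold-sensitivity plot commonly used to inspect the bias-efficiency trade-off.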

3.2 LLRS regression.
A popular alternative to more complex estimators for the tail index is to run the OLS regression $\ln(i - \eta) = a - b \ln l(i)$, with $i = 1, \ldots, k$, where $i$ here denotes the rank of the loss $l(i)$ among the $k$ largest losses (rank one being assigned to the largest loss). The tail index is given by the OLS estimator $\hat{b}$. The procedure is typically applied with $\eta = 0$, hence the name LLRS. Unfortunately, this approach is strongly biased in small samples, as can be seen from the following asymptotic expansions for the OLS estimator $\hat{b}(\eta, k)$ (see Gabaix and Ibragimov, 2011): [...] where $N(0, 1)$ denotes a standard Normal random variable. When using the LLRS method, one should therefore apply the optimal rank shift $\eta = 1/2$. We will refer to this case as the LLRS-1/2 method. The relevant standard error of the OLS estimator of the slope coefficient $\hat{b}(1/2, k)$ is then given by s.e. LLRS $= \hat{b}(1/2, k)\sqrt{2/k}$; again, we refer to Gabaix and Ibragimov (2011) for a discussion of this result.
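The LLRS-1/2 regression can be sketched as a plain OLS fit of log shifted ranks on log losses. The code below is an illustration under stated assumptions: the function name is hypothetical, rank one is assigned to the largest loss, and the sample is simulated from an exact Pareto law rather than taken from the dataset.

```python
import math
import random


def llrs_estimator(losses, k, eta=0.5):
    """Log-log rank-size estimate of the tail index from the k largest losses.

    Regresses ln(rank - eta) on ln(loss) by OLS and returns (b_hat, standard_error),
    where b_hat is minus the slope and the standard error is b_hat * sqrt(2 / k).
    """
    if not 2 <= k <= len(losses):
        raise ValueError("k must satisfy 2 <= k <= n")
    largest = sorted(losses, reverse=True)[:k]           # rank 1 is the largest loss
    y = [math.log(rank - eta) for rank in range(1, k + 1)]
    x = [math.log(loss) for loss in largest]
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = -sxy / sxx
    return b_hat, b_hat * math.sqrt(2.0 / k)


# Simulated (not real) losses from an exact Pareto law with tail index 1.5.
rng = random.Random(0)
sample = [rng.paretovariate(1.5) for _ in range(5000)]
b_hat, std_err = llrs_estimator(sample, k=500)
print(f"LLRS-1/2 estimate: {b_hat:.3f} (s.e. {std_err:.3f})")
```

Note that the standard error is computed from the formula quoted in the text, not from the usual OLS output, which would understate the uncertainty because ranked observations are not independent.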

3.3 Alternative methodologies.
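There are several alternative methods to estimate the tail index or its inverse. 12 As a robustness check, we consider the method proposed by Huisman et al. (2001) to deal with small samples, and the bias-efficiency trade-off formalized in (3.2). Huisman et al. (2001) note from (3.2) that, for small enough $k$, the bias can be approximated by a linear function, suggesting the use of the regression

$$\gamma(k) = \beta_0 + \beta_1 k + \varepsilon(k).$$

Estimating the regression by weighted least squares on the Hill estimates computed for $k = 1, \ldots, \kappa$ yields an estimate of the intercept $\beta_0$, which serves as a bias-corrected estimate of the inverse tail index and can be written as a weighted average of the Hill estimates,

$$\hat{\gamma}_w(\kappa) = \sum_{k=1}^{\kappa} w(k)\, \hat{\gamma}(k),$$

where we refer to Huisman et al. (2001) for details on the weights $w(k)$.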
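The regression idea behind the weighted Hill estimator can be sketched as follows. This is a simplified illustration, not the exact procedure of Huisman et al. (2001): it assumes least squares weights proportional to k, motivated by the fact that the variance of the Hill estimator falls roughly like 1/k, and the function name, the choice of kappa, and the simulated sample are all assumptions.

```python
import math
import random


def weighted_hill(losses, kappa):
    """Bias-corrected tail index via a weighted regression of Hill estimates on k.

    Computes gamma(k) for k = 1..kappa, fits gamma(k) = beta0 + beta1 * k by
    weighted least squares (weights proportional to k, an assumption of this
    sketch), and returns (1 / beta0, beta0, beta1).
    """
    n = len(losses)
    if not 2 <= kappa < n:
        raise ValueError("kappa must satisfy 2 <= kappa < n")
    logs = [math.log(v) for v in sorted(losses)]     # ascending log losses
    ks, gammas = [], []
    running = 0.0                                    # sum of logs of the k largest losses
    for k in range(1, kappa + 1):
        running += logs[n - k]
        ks.append(k)
        gammas.append(running / k - logs[n - k - 1])  # Hill estimate of gamma for this k
    weights = [float(k) for k in ks]
    sw = sum(weights)
    k_bar = sum(w * k for w, k in zip(weights, ks)) / sw
    g_bar = sum(w * g for w, g in zip(weights, gammas)) / sw
    sxx = sum(w * (k - k_bar) ** 2 for w, k in zip(weights, ks))
    sxy = sum(w * (k - k_bar) * (g - g_bar) for w, k, g in zip(weights, ks, gammas))
    beta1 = sxy / sxx
    beta0 = g_bar - beta1 * k_bar                    # intercept: estimate of gamma
    return 1.0 / beta0, beta0, beta1


# Simulated (not real) losses from an exact Pareto law with tail index 1.5.
rng = random.Random(0)
sample = [rng.paretovariate(1.5) for _ in range(4000)]
alpha_w, beta0_hat, beta1_hat = weighted_hill(sample, kappa=2000)
print(f"Weighted Hill estimate of the tail index: {alpha_w:.3f}")
```

The appeal of the approach is that it replaces the choice of a single threshold with an extrapolation of the Hill estimates towards k = 0, where the linear bias term vanishes.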

Empirical evidence
In this section we provide tail index estimates for the subset of the Imperial-IICI dataset covering commercial property claims and exposures. 13 Let us first provide a comparison of the two estimation methods outlined in sections 3.1-3.2 by looking at property exposures classified as RE (residential) according to Occupancy Information Level 1. These include hotels, condos, and municipal property such as council houses, universities, and colleges. Figure 2 reports tail index estimates and 90% confidence bands for the Hill and LLRS-1/2 methods. Estimates are based on subsamples obtained by considering between 10% and 40% of the largest losses. The results suggest considerable heaviness of the tail distribution: both estimation methods indicate a tail index close to one; the existence of a finite variance can be rejected at the 5% significance level.
< Figure 2 about here >
To demonstrate the importance of occupancy type, we then compare the tail behavior of Occupancy Information Level 1 types RE (residential), CO (commercial), and MA (manufacturing). Figure 3 depicts the results obtained with the LLRS-1/2 method. They indicate considerable variation in tail behavior across occupancy types. Based on our data, in particular, all types have tail indices lower than two at the 95% confidence level; commercial losses have the highest tail index estimates, residential losses the lowest, whereas manufacturing claims are somewhere in the middle. Although losses in our dataset are on average substantially larger for manufacturing than residential exposures, the results suggest that the latter may be more dangerous from a distributional perspective. We must bear in mind, however, that residential exposures are underrepresented in our dataset, as shown for example in figure 1. As a robustness check, in table 6 we report the results obtained by applying the method of Huisman et al. (2001) to the three occupancy types. The results show broad agreement with the LLRS-1/2 method, in particular for MA and CO exposures.
< Table 6 about here >
The use of Occupancy Information Level 3 gives us the opportunity to explore variations in tail behavior within a specific occupancy class. Let us consider RE (residential) exposures, for example, and distinguish between Large Hotels and Dwellings, the latter including condos, housing associations, and institutional housing. Figures 4 and 5 show that the second type of exposures is lighter tailed than Large Hotels. We report results for both the Hill and LLRS-1/2 estimation methods, which provide similar implications.
It is well known that hotel exposures have different risk profiles, depending for example on the presence of a restaurant (increasing the risk of fire), and may be more or less risky than condos and apartment complexes. Although the dataset does not allow us to explore this dimension at the moment (including the presence of sprinklers), our results suggest that the Large Hotels occupancy class may subsume the presence of serious sources of risk (such as restaurants), thus explaining the lighter tail behavior of Dwellings. Finally, in figure 6 we report the tail index estimates for the case of aggregate Occupancy Level 2 types C (chemicals), J (metals), and M (mines). Again, the results show considerable tail heaviness, which is comparable to Dwellings, but lighter than Large Hotels across different thresholds and estimation methods.
Following Beirlant et al. (1999), Beirlant and Goegebeur (2003), and Wang and Tsai (2009), we assume that the tail index can be expressed as a deterministic function of rating factors, which in the examples below are represented by occupancy types, but more generally could include Total Insurable Value (TIV) bands and/or locations. Using the notation of (3.1), we assume

$$P(L > l \mid X) = C\, l^{-\alpha(X; k)}$$

for losses $l$ in excess of the threshold, where $X = (X_1, \ldots, X_p)'$ is a vector of covariates, and the tail index is assumed to take the form $\alpha(X; k) = \exp(\theta' X)$. We estimate the exponential regression coefficient $\theta \in \mathbb{R}^p$ by using the approximate maximum likelihood estimator of Wang and Tsai (2009), as well as their methodology to select the optimal threshold $k$.
The approach can be used to quantify the relative contribution to tail risk of different property characteristics. We provide an example using as covariates dummy variables for Occupancy Level 1 classes RE, CO, and MA. Assuming that the Imperial-IICI dataset is representative of a diversified portfolio of commercial property risks insured in the London market, one can use the results reported in table 7 to suggest that on average occupancy MA provides a positive contribution to portfolio tail risk, while occupancy types RE and CO provide a negative contribution. These results should be interpreted bearing in mind that MA exposures are overrepresented in our sample, and associated with higher excess loss severity (see figure 1).
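When the covariates are mutually exclusive indicators, as in the occupancy example above, the tail regression has a simple structure that can be sketched directly. The code below is an illustration under stated assumptions, not the paper's implementation: it takes the approximate negative log-likelihood to be the sum over exceedances of exp(θ'X) ln(Y/w) − θ'X for a common threshold w, in which case each loading has the closed form θ_g = ln(n_g / Σ ln(Y/w)) over the exceedances in class g. The threshold is fixed by hand rather than selected with the Wang and Tsai (2009) procedure, and the simulated losses and function name are hypothetical.

```python
import math
import random
from collections import defaultdict


def tail_regression_indicators(losses, groups, threshold):
    """Loadings theta_g for the model alpha = exp(theta' X) with one indicator per group.

    With mutually exclusive indicators, the approximate likelihood separates by
    group, so each loading is ln(number of exceedances / sum of log excesses).
    Groups with no loss above the threshold are omitted from the result.
    """
    log_excess = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        if loss > threshold:
            log_excess[group] += math.log(loss / threshold)
            counts[group] += 1
    return {g: math.log(counts[g] / log_excess[g]) for g in counts}


# Simulated (not real) losses: three classes with different Pareto tail indices.
rng = random.Random(0)
true_alpha = {"RE": 1.0, "CO": 2.0, "MA": 1.5}
groups = [g for g in true_alpha for _ in range(2000)]
losses = [rng.paretovariate(true_alpha[g]) for g in groups]

theta_hat = tail_regression_indicators(losses, groups, threshold=2.0)
for g, theta in sorted(theta_hat.items()):
    print(f"{g}: theta = {theta:.3f}, alpha = exp(theta) = {math.exp(theta):.3f}")
```

In this special case the fitted tail index for each class coincides with a Hill-type estimate computed on that class with the common threshold; with continuous covariates such as TIV, a numerical optimizer would be needed instead.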
Figure caption fragments: (residential); (residential); (Large Hotels); primary layer property, USA, excluding binders.
Table 7: Tail regression results for the exponential model $\alpha = \exp(\theta' X)$. The vector of covariates, $X = (X_1, X_2, X_3)'$, includes Occupancy Level 1 indicators for classes RE (residential), CO (commercial), and MA (manufacturing). The corresponding estimated loadings are indicated by $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3)'$. Following Wang and Tsai (2009), the optimal sample fraction corresponds to 10% of the largest losses in the entire dataset. We set $\hat{\alpha}_i := \exp(\hat{\theta}_i)$ for the tail index estimate resulting from considering only the loading estimate $\hat{\theta}_i$ pertaining to each individual occupancy type.
Figure 1 panels: FGU claims in excess of USD 1m; FGU claims in excess of USD 5m.