CRIX an index for cryptocurrencies

The cryptocurrency market is unique on many levels: it is very volatile, its market structure changes frequently, and cryptocurrencies emerge and vanish on a daily basis. With the success of cryptocurrencies (CCs) other than Bitcoin, following the development of this market has become a difficult task. For fiat currency markets, the IMF offers the SDR index and, prior to the EUR, the ECU existed, an index representing the development of European currencies. Index providers decide on a fixed number of index constituents to represent a market segment. Fixing this number and developing rules for the constituents in view of market changes is a challenge, and in the frequently changing CC market this challenge is even more severe. We propose a method relying on the AIC that reacts quickly to market changes and therefore enables us to create an index, referred to as CRIX, for the cryptocurrency market. CRIX is chosen by model selection such that it represents the market well, enabling any interested party to study economic questions in this market and to invest in it. The diversified nature of the CC market makes the inclusion of altcoins in the index product critical for tracking performance. We show that assigning optimal weights to altcoins helps to reduce the tracking errors of a CC portfolio, even though their market capitalization is much smaller than that of Bitcoin. The codes used here are available via www.quantlet.de.


Introduction
More and more companies have started offering digital payment systems. Smartphones have evolved into digital wallets, and telephone companies offer banking-related services: a clear signal that we are about to enter the era of digital finance. In fact, we are already acting inside a digital economy. The market for e-x (x = "finance", "money", "book", you name it ...) has not only picked up enormous momentum but has become a standard driver of innovative activity in the global economy. A few clicks at y and a payment at z bring our purchase to location w. Currencies of its own for the digital market were therefore just a matter of time. Due to organizational difficulties, the idea of Nobel laureate Hayek, see hayek_denationalization_1990, of letting companies offer competing currencies long seemed scarcely feasible, but the invention of the Blockchain has made it possible to bring his vision to life. Cryptocurrencies (CCs) have surfaced and opened up a path towards this new level of economic interaction. Since the appearance of Bitcoin, several new CCs have spread through the Web and offered new ways of proliferation. Even states accept them as a legal payment method or as part of economic interaction. E.g., the USA classifies CCs as commodities, see kawa_bitcoin_2015, and Japan recently announced that it accepts them as a legal currency, see econotimes_japans_2016. The crypto market is thus fanning out and shows clear signs of acceptance and deepening liquidity, so a closer look at its general moves and dynamics is called for.
The transaction graph of Bitcoin (BTC), the Blockchain, has received much attention, see e.g. ron_quantitative_2013 and reid_analysis_2013. Even the economics of BTC has been studied, e.g. bolt_value_2016 and kristoufek_what_2015. To the best of our knowledge, the development of the entire CC market has not been studied so far; only subsamples have been taken into account. wang_buzz_2017 studied the variations of 5 CCs.
elendner_cross-section_2017 analyzed the top 10 CCs by market capitalization and found that their returns are weakly correlated with each other. Furthermore, a Principal Component (PC) Analysis carried out in the same reference showed that 7 out of 10 PCs were necessary to explain more than 90% of the variance. These findings indicate that the price evolutions of CCs differ considerably from each other. We conclude that BTC, even though it dominates the market in terms of market capitalization, cannot by itself determine the direction of the market; the movements of other CCs matter too when one analyzes the market. A closer look at the different CCs reveals that they have different missions and technical designs. Bitcoin pioneered as the token of the first decentralised, distributed ledger, giving rise to multiple interpretations of its nature and purpose: a new type of currency, a commodity (like gold), an alternative asset or an innovative technology. The currently second most important CC by market capitalization, Ethereum, was created with a particular goal in mind: to power the blockchain-based Ethereum platform for building decentralized autonomous organizations (DAOs) and implementing smart contracts. This idea triggered unprecedented interest, as it allowed companies to enter the field without creating their own blockchain ecosystem. Newcomers could benefit from the existing supporters of the respective platform, which allowed faster entry, adoption and operation. Other CCs, like Ripple (XRP), are intended to fuel a transaction network bridging traditional markets (banks) and the crypto ecosystem. Ripple also became one of the first successful cases of a pre-emitted CC, abandoning the idea of decentralisation. Since the appearance of BTC, many technological advances have taken place. Some CCs are designed for faster (or even immediate) transactions, like Litecoin (LTC); others are more energy-efficient, like DASH.
Many embraced different hashing algorithms, altering the mining process, like Monero. The long ASIC domination of mining is being disrupted, Proof-of-Work is being replaced by Proof-of-Stake, and new ways to motivate those providing computational power are being introduced. Regardless of the type of CC, one witnesses a new kind of transaction network with a different approach to fees and to handling trust issues. The intended and actual usage can be interpreted as the business model of the different CCs, and participation in one CC can give advantages over others, see white_market_2014. In the first month of 2017, CCs other than BTC (altcoins) showed a strong gain in market capitalization, reducing the dominance of BTC in the market. The very different movements of CCs and the stronger position of altcoins in the market imply the need for a market index that tracks the movements of the CC market.
Comparing CCs against a market index answers economic questions such as which business model is more successful than another, which CC has recently gained relative to others, which drives the success of the market, and which is more established. Comparing a CC market index against other market indices answers economic and financial questions such as which market proxy is more volatile, has more tail risk, or attracts more investment. We construct CRIX, a market index (benchmark) that enables any interested party to study these economic questions and the performance of the CC market as a whole or of single CCs. Studying the stochastic dynamics of CRIX will ultimately allow the creation of ETFs or contingent claims.
Many index providers construct their indices with a fixed number of constituents, see e.g. ftse_ftse_2016, s&p_index_2014 and deutsche_boerse_ag_guide_2013. If the respective index is intended to be a proxy for the performance of a market, this requires great trust from economists and investors in the index provider's choice of constituents. On the other hand, the CRSP index family for the US market, crsp_crsp_2015, has no fixed bound on the number of index constituents: the number of constituents is reviewed daily and adjusted until the index members cover a predefined share of the market capitalization. Such a dynamic methodology is important in the CC market, since the number of CCs changes daily. Additionally, the market values of individual CCs change frequently, which increases market volatility and hence the need to reconsider which CCs represent the market. Our intention is to extend the idea behind the CRSP indices. Our first goal is to construct a methodology for CRIX that relies on model selection criteria to obtain a proxy for the market, replacing the reliance on trust with a statistical methodology. The resulting methodology is dynamic in the number of index constituents, like the CRSP indices. With this method, only CCs that add informative value to the index are considered, which makes it representative. If CCs other than BTC are necessary to fulfill this requirement, they are added. However, we are concerned about the dominance of BTC in an index relying solely on market capitalization. Thus we introduce a second weighting scheme based on trading volume. Due to the use of trading volume, the respective index reflects the trading focus of the market: if market participants focus more on altcoins than on BTC, the altcoins receive a higher weight. Conversely, if the market focus is truly on BTC, it will receive a high weight in either index.
Our second goal, constructing an investable index, is fulfilled by the methodology itself, since it yields a sparse index consisting only of actively traded CCs in a market with low transaction costs.
Note that, due to the low transaction costs in the CC market, a dynamic methodology creates only small additional costs. In addition to the methodology itself ensuring an investable index, the proposed trading volume weighting scheme further supports this goal.
Investing in an ETF composed of the constituents of CRIX implies some differences compared to traditional index investing. In the traditional setting, only the constituents are reviewed and, if necessary, replaced on the review date according to the index rules. In dynamic index investing, the number of constituents is reviewed as well. This requires the fund manager to buy and sell more assets on the review date, which is more costly in a market with high transaction costs. Since transaction costs in the CC market are very low, this is of minor concern here.
To compute CRIX, the differences between the log returns of the market and those of a selection of candidate indices are evaluated. The results show that the AIC works well for evaluating these differences.
It penalizes the index for the number of constituents. For the calculation of the respective likelihoods, a nonparametric approach using the Epanechnikov kernel, see epanechnikov_non-parametric_1969, is applied. We prove that the impact of an asset on the AIC depends on its value relative to the market; thus a top-down approach is applied to select the assets for the candidate benchmarks, where the sorting depends on either market capitalization or trading volume. The number of constituents is recalculated quarterly to ensure an up-to-date fit to the current market situation. With CRIX one may study contingent claims and the stochastic nature of this index, see chen_econometric_2017, or study the characteristics of the CC market against traditional markets, see hardle_crix_2015. This paper is structured as follows. Section 2 introduces the topic and reviews the basics of index construction. Section 3 describes the method for dynamic index construction for CRIX, and Section 4 introduces the remaining rules for CRIX. Section 5 describes further variants that create a CRIX family, whose performance is tested in Section 6. In Sections 7 and 8 the new method is applied to the German and Mexican stock markets to check the performance of the methodology against existing indices. The codes used to obtain the results in this paper are available via www.quantlet.de.

Index construction
The basic idea of any price index is to weight the prices of its constituent goods by the quantities of the goods purchased or consumed. The Laspeyres index takes the value of a basket of k assets and compares it against a base period:

$$\text{Laspeyres}(k)_t = \frac{\sum_{i=1}^{k} P_{it} Q_{i0}}{\sum_{i=1}^{k} P_{i0} Q_{i0}} \qquad (1)$$

with $P_{it}$ the price of asset i at time t and $Q_{i0}$ the quantity of asset i at time 0 (the base period). For market indices such as CRSP, S&P500 or DAX, the quantity $Q_{i0}$ is the number of shares of asset i in the base period. Multiplying it by the corresponding price yields the market capitalization; hence the constituents of the index are weighted by their market capitalizations. These indices are often referred to as benchmarks for their respective markets.
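As a minimal illustration of the Laspeyres formula (1), the sketch below values a basket with fixed base-period quantities at current and at base-period prices. The function name `laspeyres_index` and the two-asset numbers are hypothetical and chosen only for the example.

```python
def laspeyres_index(prices_t, prices_0, quantities_0):
    """Laspeyres index (1): value of the base-period basket at current prices
    divided by its value at base-period prices."""
    if not (len(prices_t) == len(prices_0) == len(quantities_0)):
        raise ValueError("all inputs need one entry per asset")
    current_value = sum(p * q for p, q in zip(prices_t, quantities_0))
    base_value = sum(p * q for p, q in zip(prices_0, quantities_0))
    return current_value / base_value


# Hypothetical two-asset basket: base value 10*3 + 20*1 = 50,
# current value 12*3 + 18*1 = 54, so the index is 54 / 50.
print(laspeyres_index([12.0, 18.0], [10.0, 20.0], [3.0, 1.0]))  # 1.08
```

Because the quantities stay at their base-period values, the index moves only with prices, which is what makes it suitable as a price index.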
We define the term benchmark: … Formula (1) cannot fully handle changes in the index composition, because a change of constituents results in a change in the index value that is not due to price changes. Therefore, established price indices like DAX or S&P500, see deutsche_boerse_ag_guide_2013 and s&p_index_2014 respectively, and the newly founded index CRIX(k), a CRyptocurrency IndeX, thecrix.de, use the adjusted Laspeyres formula

$$\text{CRIX}(k,\beta)_t = \frac{\sum_{i=1}^{k} \beta_{i,t_l^-} P_{it} Q_{i,t_l^-}}{\text{Divisor}(k)_{t_l^-}} \qquad (2)$$

with P, Q and i defined as before, $\beta_{i,t_l^-}$ the adjustment factor of asset i found at time point $t_l^-$, where l indicates that this is the l-th adjustment factor, and $t_l^-$ the last time point when $Q_{i,t_l^-}$, $\text{Divisor}(k)_{t_l^-}$ and $\beta_{i,t_l^-}$ were updated. For CRIX, $\beta_{i,t_l^-} = 1$ for all i and l. Some indices, however, use $\beta_{i,t_l^-}$ to enforce maximum weighting rules, e.g. deutsche_boerse_ag_guide_2013 and mexbol_prices_2013.

The Divisor ensures that CRIX takes a predefined value on the starting date $t_0$. It is defined as

$$\text{Divisor}(k)_{t_0} = \frac{\sum_{i=1}^{k} P_{i t_0} Q_{i t_0}}{\text{starting value}}$$

The starting value could be any number, commonly 100, 1000 or 10000; it ensures that a positive or negative development from the base period is revealed. Whenever the structure of CRIX changes, the Divisor is adjusted in such a way that only price changes are reflected by the index. Defining $k_1$ and $k_2$ as the numbers of constituents before and after the change, this results in

$$\frac{\sum_{i=1}^{k_1} \beta_{i,t_{l-1}^-} P_{i t_l^-} Q_{i,t_{l-1}^-}}{\text{Divisor}(k_1)_{t_{l-1}^-}} = \frac{\sum_{i=1}^{k_2} \beta_{i,t_l^-} P_{i t_l^-} Q_{i,t_l^-}}{\text{Divisor}(k_2)_{t_l^-}}$$

In indices like FTSE, S&P500 or DAX, the number of index members is fixed, $k_1 = k_2$, see ftse_ftse_2016, s&p_index_2014 and deutsche_boerse_ag_guide_2013. As long as the goal behind these indices is to reflect the price development of the selected assets, this is a straightforward approach. But the DAX, e.g., is also meant to be an indicator for the development of the market as a whole, see jansen_deutsche_1992. This automatically raises the question of whether the included assets and the weighting scheme represent the market. Since the constituents are chosen using a top-down approach, meaning that the biggest companies by market capitalization are included, the intuitive answer is yes. But one may suspect that additional assets would describe the market more appropriately. Furthermore, different weighting schemes provide different views of the market. One may object by referring to total market indices like the Wilshire 5000, the S&P Total Market Index or the CRSP U.S. Total Market Index, see wilshire_associates_wilshire_2015, s&p_dow_2015 and crsp_crsp_2015, which provide a full description. But financial practice has shown that smaller indices like DAX30 and S&P500 receive more attention in evaluating the movements of their corresponding markets, probably because they are easier to invest in due to their smaller number of constituents.
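The divisor logic can be sketched as follows, assuming all adjustment factors β equal 1 unless given. The helper names (`market_value`, `initial_divisor`, `index_value`, `adjusted_divisor`) and the numbers are hypothetical; the sketch only shows how the divisor is chosen so that a change of constituents leaves the index level unchanged at the switching date.

```python
def market_value(prices, quantities, betas=None):
    """Sum of beta_i * P_i * Q_i over the constituents (beta_i = 1 by default)."""
    if betas is None:
        betas = [1.0] * len(prices)
    return sum(b * p * q for b, p, q in zip(betas, prices, quantities))


def initial_divisor(prices, quantities, start_value=1000.0):
    """Divisor that makes the index equal start_value on the starting date."""
    return market_value(prices, quantities) / start_value


def index_value(prices, quantities, divisor, betas=None):
    """Adjusted Laspeyres index level in the spirit of (2)."""
    return market_value(prices, quantities, betas) / divisor


def adjusted_divisor(level_before, prices, new_quantities, new_betas=None):
    """New divisor after a change of constituents: the new basket, valued at
    the same prices, must reproduce the index level before the change."""
    return market_value(prices, new_quantities, new_betas) / level_before


# Start: two hypothetical assets, market value 100*10 + 50*20 = 2000 -> divisor 2.
d0 = initial_divisor([100.0, 50.0], [10.0, 20.0])
level = index_value([110.0, 40.0], [10.0, 20.0], d0)  # (1100 + 800) / 2 = 950
# A third asset (price 5, quantity 100) joins; the level must not jump.
d1 = adjusted_divisor(level, [110.0, 40.0, 5.0], [10.0, 20.0, 100.0])
print(level, index_value([110.0, 40.0, 5.0], [10.0, 20.0, 100.0], d1))
```

After the switch, the index again moves only with prices, now of three assets.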
It is therefore appealing to know which assets are representative of a market, since a smaller number of index constituents eases the handling of a tracking portfolio. Additionally, one may be concerned that an index includes illiquid and non-investable assets, which makes the management of a tracking portfolio even more difficult. Figure 1 shows that this is indeed a problem in the CC market.
Some CCs have a fairly high market capitalization while their trading volume is very low. This is problematic, because an asset that is not frequently traded cannot add enough information to a market index to reflect market changes, and it is difficult to trade for an investor. Hence, one goal behind constructing CRIX is to make it investable by concentrating on liquid CCs:
Definition 2. Among investment portfolios with equal performance, the one with the fewest assets is preferable.
We address these goals and problems in two ways. First, these thoughts raise the question of which value of k is "optimal" for building an investable benchmark for the market. Additionally, young and innovative markets in particular may change their structure over time. Therefore, a quantification of an accurate CC benchmark with a sparse number of constituents is called for.
Since the CC market shows a frequently changing market structure with a huge number of illiquid CCs, a time-varying index selection structure is applied. The selection method described below omits illiquid CCs by construction, because only CCs that show changes in their return series can be selected for CRIX. Due to the low transaction costs in this market, a dynamic methodology is applicable, since it does not raise the costs of restructuring a tracking portfolio much. Second, we apply two kinds of weighting schemes, see Table 1. We apply the classical setting to build a proper market index, which is flexible only in terms of its dynamic constituents and tackles the illiquidity issue through the applied selection method. The liquidity weighting assigns higher weights to CCs that are traded more relative to their market capitalization and therefore implicitly attract more financial attention. This weighting scheme boils (2) down to weighting the price development by trading volume instead of market capitalization; the resulting index is referred to as Liquidity CRIX (LCRIX). This approach has the potential to diminish the influence of, e.g., Bitcoin more strongly than market capitalization weighting, if the ratio of trading volume to market capitalization is higher for other CCs. In Section 6 we show that LCRIX has a better mean directional accuracy than CRIX and puts more weight on altcoins.
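To illustrate why volume weighting can shift weight toward altcoins, the sketch below compares relative weights under the two schemes of Table 1 for one hypothetical snapshot. It assumes that liquidity weights are simply proportional to trading volume; it does not reproduce the exact LCRIX formula, and all names and numbers are hypothetical.

```python
def normalize(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def market_cap_weights(prices, quantities):
    """Weights proportional to market capitalization P_i * Q_i."""
    return normalize([p * q for p, q in zip(prices, quantities)])


def volume_weights(volumes):
    """Weights proportional to trading volume (assumed liquidity scheme)."""
    return normalize(volumes)


# Hypothetical snapshot: a dominant coin and an altcoin.
caps = market_cap_weights([900.0, 10.0], [1.0, 10.0])  # caps 900 and 100
vols = volume_weights([60.0, 40.0])                     # volumes 60 and 40
print(caps, vols)  # [0.9, 0.1] [0.6, 0.4]
```

Here the altcoin's volume-to-capitalization ratio (0.4) exceeds that of the dominant coin (about 0.07), so its weight rises from 10% to 40% under volume weighting.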

Dynamic index construction
This section describes the composition rule used to find the number of index members, the backbone of CRIX and LCRIX. Since CRIX is to be a benchmark for the CC market, the dimension and valuation of the market have to be defined:

Definition 3. The total market (TM) consists of all CCs in the CC universe. Its value is the combined market value of the CCs.
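To compare the TM with a benchmark candidate, it is normalized by a Divisor,

$$\text{TM}(K)_t = \frac{\sum_{i=1}^{K} P_{it} Q_{it}}{\text{Divisor}(K)_{t_l^-}} \qquad (6)$$

with K the number of all CCs in the CC universe. Note that no adjustment factor is used for $\text{TM}(K)_t$. For the volume weighting, the TM is defined analogously and denoted LTM. In the following explanations, the focus lies on the TM; when LCRIX is derived, it is optimized against LTM, and the results extend easily to that case. Further define the log returns

$$\varepsilon(K)^{\text{TM}}_t = \log \text{TM}(K)_t - \log \text{TM}(K)_{t-1}, \qquad \varepsilon(k,\beta)^{\text{CRIX}}_t = \log \text{CRIX}(k,\beta)_t - \log \text{CRIX}(k,\beta)_{t-1},$$

where $\text{CRIX}(k,\beta)_t$ is the CRIX with k constituents at time point t.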
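The goal is to optimize k and β so that a sparse but accurate approximation in terms of

$$\min_{k,\beta}\; \mathrm{E}\left[\{\varepsilon(k,\beta)_t\}^2\right], \qquad \varepsilon(k,\beta)_t = \varepsilon(K)^{\text{TM}}_t - \varepsilon(k,\beta)^{\text{CRIX}}_t \qquad (10)$$

is achieved, where $\varepsilon(k,\beta)$ is the difference in the log returns of TM(K) and CRIX(k, β). A squared loss function is chosen in (10), since it heavily penalizes large deviations.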
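A minimal sketch of the tracking differences ε and the sample counterpart of the squared loss in (10), assuming the market proxy and the candidate index are given as level series of equal length. The function names are hypothetical.

```python
import numpy as np


def log_returns(levels):
    """Log returns log(x_t) - log(x_{t-1}) of a positive level series."""
    return np.diff(np.log(np.asarray(levels, dtype=float)))


def tracking_differences(market_levels, index_levels):
    """epsilon_t: market log return minus index log return."""
    return log_returns(market_levels) - log_returns(index_levels)


def mean_squared_loss(market_levels, index_levels):
    """Sample mean of epsilon_t^2, the empirical version of the loss in (10)."""
    eps = tracking_differences(market_levels, index_levels)
    return float(np.mean(eps ** 2))


market = [100.0, 110.0, 99.0, 104.0]
# An index that is a constant multiple of the market has the same log returns,
# so the loss is zero up to floating-point rounding.
print(mean_squared_loss(market, [10.0 * x for x in market]))
# A flat index has zero log returns, so the loss is the market's mean squared return.
print(mean_squared_loss(market, [100.0, 100.0, 100.0, 100.0]))
```

Because only log returns enter, a constant normalizing Divisor does not affect ε, which is why indices on different scales can be compared directly.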
Since the value of $\text{TM}(K)_t$ is unknown and not measurable due to a lack of information, the total market index (TMI) is defined and used as a proxy for TM(K). The definition is inspired by total market indices like crsp_crsp_2015, s&p_dow_2015 and wilshire_associates_wilshire_2015, which use all stocks for which prices are available. This changes (6) to

$$\text{TMI}(k_{max})_t = \frac{\sum_{i=1}^{k_{max}} P_{it} Q_{it}}{\text{Divisor}(k_{max})_{t_l^-}}$$

with $k_{max}$ the maximum number of CCs with available prices, and (10) to

$$\min_{k,\beta}\; \mathrm{E}\left[\{\varepsilon(k_{max})^{\text{TMI}}_t - \varepsilon(k,\beta)^{\text{CRIX}}_t\}^2\right] \quad \text{s.t.} \quad \beta_1 = \dots = \beta_k = 1, \qquad (11)$$

where $\varepsilon(k_{max})^{\text{TMI}}$ are the log returns of the TMI. In the derivation of LCRIX, the optimization is performed analogously against LTMI. Several constraints were introduced with (11). The parameters $\beta_{k+1}, \dots, \beta_{k+s}$ are included to evaluate whether adding s more assets to the index explains the difference between $\varepsilon(k_{max})^{\text{TMI}}$ and $\varepsilon(k,\beta)^{\text{CRIX}}$ better. The first k assets ($k = k_1$) are not adjusted by a parameter, so no parameter estimation is necessary for them; this makes the first term in (12) free of parameters. The choice of $k_1$ is important, since it defines the number of base CCs to be included in the index. The parameters of the next s assets have to be estimated, so (2) becomes

$$\text{CRIX}(k,\beta)_t = \frac{\sum_{i=1}^{k} P_{it} Q_{i,t_l^-} + \sum_{i=k+1}^{k+s} \beta_i P_{it} Q_{i,t_l^-}}{\text{Divisor}(k+s)_{t_l^-}} \qquad (12)$$

A number of criteria are applicable. Model selection criteria (SC) can be categorized by whether they are asymptotically optimal or consistent in choosing the true model. In each case, the number of constituents is chosen as

$$\Theta_{SC} = \arg\min_{k \in \{k_1, k_2, \dots\}} SC\{\varepsilon(k,\beta), s\} \qquad (13)$$

where $k_1, k_2, \dots$ are predefined values and SC ∈ {GC, GFC, $C_p$, SH, FPE, AIC}. Recall that the intention behind CRIX is to discover, under a squared loss function, the best model to describe the data (benchmark), which supports the choice of an asymptotically optimal criterion.
The GC criterion, see craven_smoothing_1978, is defined as

$$GC\{\varepsilon(k,\beta), s\} = \frac{\sum_{t=1}^{T} \varepsilon(k,\beta)_t^2}{T\,(1 - s/T)^2},$$

assuming that s < T. Note that s, and not k + s, defines the number of parameters to penalize, since k parameters are set to 1 and need not be estimated. According to arlot_survey_2010, the asymptotic optimality of GC has been shown in several frameworks.
The GFC, see droge_comments_1996, is an alteration of GC.
A further score, SH, was shown to be asymptotically optimal, see shibata_optimal_1981, and asymptotically equivalent to Mallows' $C_p$ and the AIC.
Mallows' $C_p$, see mallows_comments_1973, is

$$C_p\{\varepsilon(k,\beta), s\} = \frac{1}{T}\sum_{t=1}^{T} \varepsilon(k,\beta)_t^2 + \frac{2s}{T}\,\sigma(k,\beta)^2,$$

with $\sigma(k,\beta)^2$ the variance of $\varepsilon(k,\beta)$. $C_p$ tends to choose models that overfit and is not consistent in selecting the true model, see mallick_bayesian_2013, woodroofe_model_1982 and nishii_asymptotic_1984. The FPE, see akaike_statistical_1970, uses the formula

$$FPE\{\varepsilon(k,\beta), s\} = \frac{1}{T}\sum_{t=1}^{T} \varepsilon(k,\beta)_t^2 \; \frac{T+s}{T-s}.$$

So far, the discussed criteria use little information from the data: only the squared residuals and, in the case of Mallows' $C_p$, the variance are taken into account. The AIC uses more information, since it depends on the maximized likelihood

$$\log L\{\varepsilon(k,\beta)\} = \sum_{t=1}^{T} \log f\{\varepsilon(k,\beta)_t\}, \qquad (20)$$

where f represents the density of $\varepsilon(k,\beta)_t$ over all t. The AIC is defined as, see akaike_information_1998,

$$AIC\{\varepsilon(k,\beta), s\} = -2 \log L\{\varepsilon(k,\beta)\} + 2s.$$

If the true model is of finite dimension, the AIC is not consistent, compare hurvich_regression_1989. shibata_asymptotic_1983 showed the asymptotic efficiency of Mallows' $C_p$ and the AIC under the assumption of an infinite number of regression variables or a number of regression variables increasing with the sample size. Because it uses the density, the AIC draws on more information about the dataset.
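The residual-based criteria and the AIC can be sketched as below, using textbook forms of GC, FPE and Mallows' $C_p$ written in terms of the residual sum of squares, with s the number of estimated parameters. These forms are assumptions wherever the text does not spell them out; GFC and SH are omitted, and the function names are hypothetical.

```python
import numpy as np


def _rss_and_T(resid):
    """Residual sum of squares and sample size."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(resid ** 2)), resid.size


def gc(resid, s):
    """Generalized cross-validation: RSS / (T * (1 - s/T)^2), requires s < T."""
    rss, T = _rss_and_T(resid)
    if s >= T:
        raise ValueError("need s < T")
    return rss / (T * (1.0 - s / T) ** 2)


def fpe(resid, s):
    """Akaike's final prediction error: (RSS / T) * (T + s) / (T - s)."""
    rss, T = _rss_and_T(resid)
    if s >= T:
        raise ValueError("need s < T")
    return rss / T * (T + s) / (T - s)


def mallows_cp(resid, s, sigma2):
    """Mallows' C_p in the form RSS / T + 2 * s * sigma2 / T."""
    rss, T = _rss_and_T(resid)
    return rss / T + 2.0 * s * sigma2 / T


def aic(log_likelihood, s):
    """AIC = -2 log L + 2 s."""
    return -2.0 * log_likelihood + 2.0 * s


resid = [1.0, -1.0, 1.0, -1.0]  # hypothetical residuals, RSS = 4, T = 4
print(gc(resid, 1), fpe(resid, 1), mallows_cp(resid, 1, 1.0), aic(-10.0, 2))
```

All four grow with s for fixed residuals; they differ in how strongly and in what the fit term is built from, which is why only the AIC can exploit a density estimate.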
Considering that (10) implies that the criteria are derived under an expected squared loss function, the density f need not be Gaussian and can be estimated by other means. Here, f is estimated nonparametrically with an Epanechnikov kernel since, according to hardle_nonparametric_2004, the Epanechnikov kernel, see epanechnikov_non-parametric_1969, offers a good balance between variance optimization and numerical performance. With the Epanechnikov kernel, Epa, the estimator of f is

$$\hat f_h(x) = \frac{1}{Th} \sum_{t=1}^{T} \text{Epa}\left(\frac{x - \varepsilon(k,\beta)_t}{h}\right), \qquad \text{Epa}(u) = \frac{3}{4}(1 - u^2)\, \mathrm{I}(|u| \le 1),$$

where h is the bandwidth.
The bandwidth is selected with the plug-in selector of sheather_reliable_1991. Due to the richer information basis of the AIC, we use it as the selection criterion for CRIX. This choice is supported by an empirical analysis in Section 6.
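The following sketch shows an Epanechnikov kernel density estimate and the resulting log-likelihood of a sample of tracking differences. As a stated substitution, it uses the normal-reference rule of thumb $h = 2.34\,\hat\sigma\,T^{-1/5}$ for the Epanechnikov kernel instead of the Sheather-Jones plug-in selector; the function names and simulated data are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel 0.75 * (1 - u^2) on [-1, 1], zero outside."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def kde(x, sample, h):
    """Kernel density estimate f_h(x) = 1/(T h) * sum_t K((x - e_t) / h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (x[:, None] - sample[None, :]) / h
    return epanechnikov(u).sum(axis=1) / (sample.size * h)


def rule_of_thumb_bandwidth(sample):
    """Stand-in for the Sheather-Jones plug-in: h = 2.34 * sd * T^(-1/5),
    the normal-reference rule for the Epanechnikov kernel."""
    sample = np.asarray(sample, dtype=float)
    return 2.34 * sample.std(ddof=1) * sample.size ** (-0.2)


def kde_log_likelihood(sample, h=None):
    """Sum of log f_h(e_t) over the sample itself. Every point lies in the
    support of its own kernel, so all densities are strictly positive."""
    sample = np.asarray(sample, dtype=float)
    if h is None:
        h = rule_of_thumb_bandwidth(sample)
    return float(np.sum(np.log(kde(sample, sample, h))))


rng = np.random.default_rng(0)
eps = rng.normal(scale=0.01, size=250)  # hypothetical tracking differences
print(kde_log_likelihood(eps))
```

Halving all tracking differences halves the bandwidth and doubles every density value, so the log-likelihood rises by exactly $T \log 2$: smaller tracking errors translate into a higher likelihood and a lower AIC.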
To decide with the AIC which number k should be used, a procedure was created that compares the squared differences between the log returns of the TMI, see Definition 4, and those of several candidate indices,

$$\varepsilon(k_j,\beta)_t = \varepsilon(k_{max})^{\text{TMI}}_t - \varepsilon(k_j,\beta)^{\text{CRIX}}_t,$$

where $\varepsilon(k_j,\beta)^{\text{CRIX}}$ is the log return of the CRIX version with $k_j$ constituents and $\varepsilon(k_j,\beta)$ is the respective difference. The candidate indices $\text{CRIX}(k_j,\beta)$ have different numbers of constituents, which fulfill $k_1 < k_2 < k_3 < \cdots$ with $k_j = k_1 + s(j-1)$; the numbers of constituents of the candidate indices are therefore equally spaced. The procedure thus evaluates whether s more assets add information to CRIX. If so, these assets are added to the base set of constituents and the next s assets are tested. Assets with a higher market capitalization are expected to have a higher influence on the AIC, so the following theorem is formulated:
Theorem 1. The rate of improvement of the AIC depends on the relative value of an asset in the market.
The proof of Theorem 1 is given in Appendix 11.1 under the assumption of normally distributed error terms. Therefore, we follow the common practice of including the assets with the highest market capitalization in the index; that is, a top-down approach is applied to decide on the number of index constituents.
For sorting the index constituents by market capitalization, only the closing data of the last day of a month are used. We chose to do so since the next period's CRIX depends only on $Q_{i,t_l^-}$, see (2), and not on data lying further in the past. This is in line with the methodology of, e.g., the DAX. For LCRIX, the CCs with the highest trading volume are chosen accordingly. Since the differences between $\text{TMI}(k_{max})$ and $\text{CRIX}(k_j,\beta)$ are caused over time by the time series missing from $\text{CRIX}(k_j,\beta)$, the independence assumption for $\varepsilon(k_j,\beta)$, for any j, cannot be fulfilled by construction. However, gyorfi_nonparametric_1989 give arguments that, under certain conditions, the rate of convergence of nonparametric density estimation is essentially the same as for an independent sample. The described procedure can be summarized as follows:
1. At time point T + 1, construct $\text{TMI}(k_{max})$.
2. Set j = 2.
3. Construct $\text{CRIX}(k_1, 1)$ and $\text{CRIX}(k_j,\beta)$, $k_1 < k_2 < k_3 < \cdots$.
4. Compute $\varepsilon(k_j,\beta)$ and $\varepsilon(k_1, 1)$.
5. Kernel density estimation (KDE) for the density $f\{\varepsilon(k_1,1)\}$:
   a) Compute the log likelihood (20) for $\varepsilon(k_j,\beta)$ with the KDE for $\varepsilon(k_1,1)$.
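The steps above can be sketched as follows for the market capitalization version on the CRIX grid ($k_1 = 5$, steps of 5, stopping at the first rise of the AIC). This is a minimal sketch under several assumptions not taken from the text: all β are fixed at 1 instead of being estimated; each candidate's log-likelihood is evaluated under an Epanechnikov KDE of its own differences (one possible reading of steps 4 and 5); the normal-reference bandwidth replaces the Sheather-Jones plug-in; the penalty uses s = k - k_1; and the columns of `prices` are assumed to be sorted top-down by market capitalization already. The function names and the simulated data are hypothetical.

```python
import numpy as np


def epa_kde_loglik(sample):
    """Log-likelihood of a sample under its own Epanechnikov KDE, with the
    normal-reference bandwidth 2.34 * sd * T^(-1/5) (stand-in for a plug-in)."""
    e = np.asarray(sample, dtype=float)
    h = 2.34 * e.std(ddof=1) * e.size ** (-0.2)
    u = (e[:, None] - e[None, :]) / h
    dens = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0).sum(axis=1)
    return float(np.sum(np.log(dens / (e.size * h))))


def log_returns(levels):
    return np.diff(np.log(levels))


def cap_index(prices, quantities, k):
    """Unnormalized index of the first k columns (beta fixed at 1). Columns are
    assumed sorted by decreasing market capitalization (top-down)."""
    return prices[:, :k] @ quantities[:k]


def select_k(prices, quantities, k1=5, step=5):
    """Add `step` constituents at a time while the AIC does not rise."""
    n = prices.shape[1]
    tmi = log_returns(cap_index(prices, quantities, n))

    def aic(k):
        eps = tmi - log_returns(cap_index(prices, quantities, k))
        return -2.0 * epa_kde_loglik(eps) + 2.0 * (k - k1)  # s = k - k1

    best_k, best_aic = k1, aic(k1)
    k = k1 + step
    while k < n:  # k = n would reproduce the TMI exactly (eps = 0)
        current = aic(k)
        if current > best_aic:
            break
        best_k, best_aic = k, current
        k += step
    return best_k


rng = np.random.default_rng(42)
T, n = 91, 30  # about three months of daily data, 30 hypothetical CCs
prices = np.exp(np.cumsum(rng.normal(scale=0.03, size=(T, n)), axis=0))
quantities = 1000.0 / np.arange(1, n + 1) ** 2  # hypothetical supplies
print(select_k(prices, quantities))
```

The early stop mirrors the rule used for CRIX in Section 5; evaluating all candidates and taking the minimum instead would correspond to the full search.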

CRIX family rules
The constituents of an index are regularly checked so that the index always represents its asset universe well. It is common to do this on a quarterly basis. In the case of CRIX, this reallocation is performed more frequently. In the past, coins have shown very volatile behavior, and not only in terms of price volatility: in some weeks, many new coins appear out of nowhere while others vanish from the market, even ones that were previously very important, e.g., Auroracoin. This calls for a faster reallocation of the market benchmark than on a quarterly basis. A monthly reallocation is chosen to make sure that CRIX captures the momentum of the CC market well. Therefore, on the last day of every month, the CCs are ranked by their market capitalization on that day, and the first k are included in CRIX for the coming month. Accordingly, for LCRIX the CCs with the highest trading volume are chosen.
Since an index review is commonly performed on a quarterly basis, the number of index members of CRIX is checked quarterly too. The procedure described in Section 3 is applied to the observations of the last three months on the last day of the third month, after the markets have closed. The resulting number of index constituents, k, is used for the next three months. Thus, CRIX corresponds to a monthly rebalanced portfolio whose number of constituents is reviewed quarterly.
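The two review cycles can be sketched as follows. The sketch assumes that the quarterly review of k falls at the end of March, June, September and December (the text only says "the third month"), and the tickers and market capitalizations are hypothetical.

```python
from datetime import date


def is_quarterly_review(month_end: date) -> bool:
    """Assumed schedule: k is reviewed at the end of Mar, Jun, Sep and Dec."""
    return month_end.month % 3 == 0


def monthly_constituents(market_caps: dict, k: int) -> list:
    """The k CCs with the largest market capitalization at the month-end close."""
    return sorted(market_caps, key=market_caps.get, reverse=True)[:k]


snapshot = {"BTC": 9.0, "ETH": 3.0, "XRP": 1.0, "LTC": 0.5}  # hypothetical caps
print(monthly_constituents(snapshot, 2))       # ['BTC', 'ETH']
print(is_quarterly_review(date(2017, 3, 31)))  # True
print(is_quarterly_review(date(2017, 4, 30)))  # False
```

On every month end the constituent list is refreshed with the current k; only on review dates is k itself re-estimated from the preceding three months.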
Some data may be missing for some of the analyzed time series. If a missing value is isolated, meaning that the values before and after it are not missing, Missing At Random (MAR) is assumed. This assumption means that the missingness depends only on observed information, see horton_much_2007. The Last-Observation-Carried-Forward (LOCF) method is then applied to fill the gap for the application of the AIC.
We did not choose a different approach, since a regression or imputation method may alter the data in the wrong direction. With LOCF, no change is implied and the CC is not excluded. If two or more consecutive values are missing, the MAR assumption may be violated, so no method is applied and the corresponding time series is excluded from the computation in the derivation period. If data are missing during the computation of the index values, the LOCF method is applied too. This makes the index insensitive to this CC at this time point. CRIX should mimic market changes; an imputation or regression method for the missing data would therefore distort the view of the market. Before continuing, the described rules are summarized:
• Quarterly altering of the number of index constituents
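The missing-data rule for the derivation period can be sketched as below. The function name is hypothetical, and one detail is an assumption: a gap at the very start or end of a series is treated as not isolated (it lacks a neighbour on one side) and therefore leads to exclusion.

```python
import math


def is_missing(v):
    """None or NaN counts as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def prepare_for_derivation(values):
    """Fill isolated gaps by LOCF; return None if the series must be excluded.

    A missing value counts as isolated only if both neighbours are observed,
    so runs of two or more gaps (and, by assumption, gaps at either end)
    lead to exclusion of the series.
    """
    missing = [is_missing(v) for v in values]
    filled = list(values)
    for i, m in enumerate(missing):
        if not m:
            continue
        isolated = 0 < i < len(values) - 1 and not missing[i - 1] and not missing[i + 1]
        if not isolated:
            return None
        filled[i] = filled[i - 1]  # last observation carried forward
    return filled


print(prepare_for_derivation([1.0, None, 3.0]))        # [1.0, 1.0, 3.0]
print(prepare_for_derivation([1.0, None, None, 4.0]))  # None
```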

The CRIX family
Using the methods and rules described above, three indices, each with a market capitalization weighted and a volume weighted version, are proposed. These indices provide different views of the market.

CRIX/LCRIX:
The first and leading index is CRIX, and LCRIX for volume weighting. When choosing the best number of constituents, the candidate numbers are spaced in steps of 5.
It is common in the financial industry to construct market indices with a number of constituents that is divisible by 5, see e.g. ftse_ftse_2016, s&p_index_2014 and deutsche_boerse_ag_guide_2013. Therefore this selection is applied for CRIX(k), k = 5, 10, 15, ..., with $k_1 = 5$. Since the global minimum of the selection criterion may involve many index constituents, but a sparse index is the goal, the search for the optimal model terminates at level j whenever

$$AIC\{\varepsilon(k_j,\beta), s\} > AIC\{\varepsilon(k_{j-1},\beta), s\},$$

and $k_{j-1}$ index constituents are chosen. Therefore, in most cases merely a local optimum is achieved for $\Theta = \Theta_{AIC}$ in (13). But the choice is still asymptotically optimal when defining $\Theta = \{\Theta_{AIC} \mid k_i \le k_j \; \forall i\}$. Section 6 shows that the performance of the index is already very good.

ECRIX/LECRIX:
The second constructed index is called Exact CRIX (ECRIX), with its volume weighted counterpart Liquidity ECRIX (LECRIX). It follows the above rules too, but the number of its constituents is chosen in steps of 1. Therefore the set of models contains CRIX(k), k = 1, 2, 3, ..., with $k_1 = 1$, and the search stops when

$$AIC\{\varepsilon(k_j,\beta), s\} > AIC\{\varepsilon(k_{j-1},\beta), s\}.$$

EFCRIX/LEFCRIX:
Since the decision procedures for CRIX and ECRIX terminate when the AIC rises for the first time, Exact Full CRIX (EFCRIX) and Liquidity EFCRIX (LEFCRIX) are constructed to check whether the decision procedure works well for the covered indices. The intention is to have an index that may approach the TMI, but only if even small assets help improve the view of the total market: a benchmark for the benchmarks. It is derived with the AIC procedure, compare Section 3. For k = 1, 2, 3, ... with $k_1 = 1$, the decision rule is based on

$$\Theta_{AIC} = \arg\min_{k \in \{1, 2, 3, \dots\}} AIC\{\varepsilon(k,\beta), s\}$$

for $\Theta = \Theta_{AIC}$ in (13). This index computes the AIC for every possible number of constituents and chooses the number at which the AIC is minimal.
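The difference between the stopping rule of CRIX/ECRIX and the full search of EFCRIX can be sketched on a precomputed sequence of AIC values. The AIC values below are hypothetical, and the function names are illustrative; in practice the first-rise rule would not need to compute the values after the stopping point.

```python
def first_rise_choice(ks, aics):
    """CRIX/ECRIX rule: stop at the first j with AIC(k_j) > AIC(k_{j-1})
    and keep k_{j-1}; if the AIC never rises, keep the last candidate."""
    for j in range(1, len(aics)):
        if aics[j] > aics[j - 1]:
            return ks[j - 1]
    return ks[-1]


def global_choice(ks, aics):
    """EFCRIX rule: evaluate every candidate and keep the AIC minimizer."""
    best = min(range(len(aics)), key=lambda j: aics[j])
    return ks[best]


ks = [1, 2, 3, 4, 5]
aics = [10.0, 8.0, 9.0, 6.0, 7.0]  # hypothetical AIC values
print(first_rise_choice(ks, aics), global_choice(ks, aics))  # 2 4
```

In this example the local rule settles on a sparser index (2 constituents) than the global minimum (4), which is exactly the trade-off that EFCRIX is meant to expose.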

Performance analysis
The indices CRIX, ECRIX, EFCRIX with market cap weighting and LCRIX, LECRIX, LEFCRIX with volume weighting have been proposed to give insight into the CC market.
Our RDC CC database covers data for over 1000 CCs, kindly provided by CoinGecko. The data used for the analysis cover daily closing data for prices, market volume and market capitalization.
The indices optimized only until a local optimum are expected to perform worse against the TMI/LTMI than the globally optimized ones. Table 2 and Table 3 give the mean over all months of the monthly MSE and MDA,

$$\text{MSE}_l = \frac{1}{t_l^+ - t_l^-}\sum_{t=t_l^-+1}^{t_l^+} \varepsilon(k,\beta)_t^2, \qquad \text{MDA}_l = \frac{1}{t_l^+ - t_l^-}\sum_{t=t_l^-+1}^{t_l^+} \mathrm{I}\left[\text{sign}\{\text{TMI}_t - \text{TMI}_{t-1}\} = \text{sign}\{\text{CRIX}_t - \text{CRIX}_{t-1}\}\right],$$

where $t_l^-$ and $t_l^+$ are the beginning and end of the month respectively, I(·) is the indicator function and sign(·) gives the sign of its argument. CRIX performs best, which can be explained by its larger number of index constituents. CRIX, ECRIX and EFCRIX are close in terms of the MDA, but the MSE is much better for CRIX. Comparing all model selection criteria, FPE performs best in terms of MSE and MDA, because it chooses high numbers of constituents. The trading volume weighted indices are close to their market capitalization weighted counterparts in terms of MSE and MDA.
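MSE and MDA for one evaluation window can be sketched as follows. The sketch assumes that both level series cover the same days of one month and that MDA counts the days on which both series move in the same direction; the names and data are hypothetical.

```python
import numpy as np


def mse(tmi_levels, index_levels):
    """Mean squared difference of daily log returns."""
    d = np.diff(np.log(tmi_levels)) - np.diff(np.log(index_levels))
    return float(np.mean(d ** 2))


def mda(tmi_levels, index_levels):
    """Share of days on which both series change in the same direction."""
    same = np.sign(np.diff(tmi_levels)) == np.sign(np.diff(index_levels))
    return float(np.mean(same))


tmi = np.array([100.0, 101.0, 100.0, 102.0])
crix = np.array([10.0, 11.0, 12.0, 13.0])
print(mda(tmi, crix))  # 2 of 3 days agree in direction
print(mse(tmi, crix))
```

MDA ignores magnitudes and only rewards matching directions, while MSE penalizes the size of the tracking differences; the two therefore capture different aspects of tracking quality.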
At the same time, the numbers of constituents are mostly smaller for the volume weighted indices. CRIX, constructed with steps of five as is common in practice, performed best under the AIC. In this case the number of constituents was the most stable, while the best performance in terms of MSE and MDA was achieved. Additionally, the analysis showed that, from a practical viewpoint, it is indeed unnecessary to choose the globally optimal AIC under steps of 1.
Even a local optimum with a much more stable number of constituents is able to mimic the market movements very well in terms of MDA and MSE. Furthermore, even for ECRIX, more than one constituent was selected most of the time. This shows that Bitcoin, which currently clearly dominates the market in terms of market capitalization and trading volume, does not account for all the variance in the market; other CCs are important for the market movements too.
Based on the theoretical and empirical analysis, we decided to continue with the AIC.
From the theoretical viewpoint, the AIC uses the most information about the data, since it relies on the density. From the empirical viewpoint, the AIC chooses far fewer constituents than GC, GFC, SH and FPE, while its performance in terms of MSE and MDA is close to that of these criteria. Their better performance results from overparametrization of the index. Therefore, CRIX is derived with the AIC criterion.
Compared with BTC, CRIX tracks the market development better over time. Figure 3 shows the monthly MSE of CRIX with AIC and of BTC. In 2016 CRIX tracked the market development much better than BTC, and in the beginning of 2017 even better, due to the large impact of the price gains of altcoins like Ethereum, Ripple and Dash.
Their performance is visualized in Figure 4, which clearly shows the better performance of CRIX in this period, driven by price gains in altcoins. Due to the log scale and the high gains of altcoins, the difference between CRIX and BTC appears small, although it is in fact considerable. Table 8 shows the actual weights given to BTC and altcoins in the respective indices. In the liquidity indices, altcoins frequently receive a higher weight than in the corresponding indices based on market capitalization weighting.

Application to the German stock market
The CRIX methodology was derived with the idea of finding a method that mimics young and fast-changing markets appropriately. But well-known major markets usually change their structure too. So the proposed methodology is tested on the German stock market, which has four major indices: DAX, MDAX, SDAX and TecDAX. The DAX is used to determine the overall market direction, see jansen_deutsche_1992. Since it is chosen from the so called

The computation of the MSE and MDA, see

Again, the CRIX methodology works well. The MSE is very low compared to that of the IPC35, and the MDA is much better too, see Table 7. Hence the methodology helps to avoid arbitrary weighting rules in the index rules and at the same time enhances the performance of the market index.

We conclude that the CRIX technology enhances the construction of an index if the goal is to find a sparse, investable and accurate benchmark.

Acknowledgments
We would like to thank the editor and an anonymous referee for their valuable comments on this article. Our thanks extend to David Lee Kuo Chuen and Ernie G. S. Teo for their comments in several discussions. Financial support from the Deutsche Forschungsgemeinschaft via CRC 649 "Economic Risk" and IRTG 1792 "High Dimensional Non Stationary Time Series", Humboldt-Universität zu Berlin, is gratefully acknowledged.
Appendix

With the linearity property of the expectation operator, assume without loss of generality … Using the relation $\log(a + b) = \log(a) + \log(1 + b/a)$, it results: … Solving the derivation and writing the terms that do not depend on $\beta_1$ as $A_t$ and the last part of (36) as $B_t$: … Since normally distributed error terms are assumed, note that $\beta_1 = \text{Cov}\{\varepsilon(k,1), \varepsilon_{k+1}\} / \text{Var}\{\varepsilon_{k+1}\}$, where $\varepsilon_{k+1}$ is the log return of $P_{k+1,t} Q_{k+1,0}$. The change in the variance depends on the additional variance that the new constituent can explain, see $\beta_1$. Furthermore, it depends on the value of $P_{k+1,t} Q_{k+1,0}$ relative to $\sum_{i=1}^{k} P_{i,t} Q_{i,0}$, see (36), which is the summed market value of the constituents in the index. This implies that constituents with a higher market capitalization are more likely to be part of the index.
This gives support to using the often applied top-down approach, which we use for the construction of CRIX too.
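Two ingredients of the proof can be checked numerically: $\beta_1 = \text{Cov}/\text{Var}$ is the least-squares slope of $\varepsilon(k,1)$ on $\varepsilon_{k+1}$, and $\log(a+b) = \log(a) + \log(1 + b/a)$. The simulated series and values below are hypothetical stand-ins; the check illustrates the identities and is not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
eps_new = rng.normal(scale=0.02, size=500)                    # stand-in for eps_{k+1}
eps_base = 0.4 * eps_new + rng.normal(scale=0.01, size=500)  # stand-in for eps(k,1)

# beta_1 as covariance over variance (both with ddof = 1)
beta_cov = np.cov(eps_base, eps_new)[0, 1] / np.var(eps_new, ddof=1)
# beta_1 as the slope of an ordinary least-squares fit with intercept
beta_ols = np.polyfit(eps_new, eps_base, 1)[0]
print(beta_cov, beta_ols)

# log(a + b) = log(a) + log(1 + b / a): the gain from adding a new asset of
# value b grows with its value relative to the current index value a.
a, b = 900.0, 100.0  # hypothetical index value and value of the new asset
print(np.isclose(np.log(a + b), np.log(a) + np.log1p(b / a)))  # True
```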