Evolutionary Game Theory: A Renaissance

Economic agents are not always rational or farsighted and can make decisions according to simple behavioral rules that vary according to situation and can be studied using the tools of evolutionary game theory. Furthermore, such behavioral rules are themselves subject to evolutionary forces. Paying particular attention to the work of young researchers, this essay surveys the progress made over the last decade towards understanding these phenomena, and discusses open research topics of importance to economics and the broader social sciences.

This essay surveys recent work in evolutionary game theory, primarily as it relates to the social sciences, with particular attention paid to the work of young researchers. The intended audience is current and potential researchers in evolutionary game theory, as well as a broader audience of interested readers whose specialisms lie in other fields. Evolutionary methods consider how a state variable changes over time. The state variable can be a biological or cultural trait or a profile of strategies in a game. The process by which it changes can be survival of the fittest, imitation or optimization arising from some deliberative rule. Thus, axioms on behavior and decision making are theoretically postulated and can be empirically tested. These axioms lead indirectly to predictions of medium and long run outcomes. This contrasts with fixed point solution concepts, such as Nash equilibrium or the Core, in which axioms on behavior are explicit restrictions on outcomes. In evolutionary game theory, behavioral rules and outcomes are distinct. The broad open spaces between behavior and outcomes are where evolutionary game theorists go to play.

The shadow of Nash equilibrium
To some, the period of intense activity in evolutionary game theory in the mid to late 1990s had two goals, firstly to justify Nash equilibrium and secondly to give some consistent and simple selection criterion for favoring some Nash equilibria over others. See Samuelson (2016) for a brief description of this perspective. The author of the current survey sees such an approach as an exercise in begging the question. There are behavioral rules that do not lead to Nash equilibria. Whether these rules are realistic is ultimately an empirical question, the answer to which can be determined independently of whether or not they lead to Nash equilibrium. Furthermore, different rules may lead to different outcomes and the rule that is applied may be sensitive to context. Again, this is an empirical question. Finally, a given Nash equilibrium might arise from many processes, even, as we shall see later in this survey, ones in which players are unaware of the existence of other players. Hence, in contrast to evolutionary models, the implications of an assumption of Nash equilibrium for out-of-equilibrium behavior are imprecise.

Renaissance and the scope of this survey
The author believes that the evolutionary approach is interesting simply because it is often the correct approach. The world comprises decision makers that are not always far sighted and make decisions according to basic heuristics that vary according to situation. Increasingly many interactions are with other parties who we have never met and know nothing about. Simple rules of decision making can lead to complex social phenomena, as institutions and social facts emerge, compete and disappear. The individual can be simple, but society will still be complex. High quality, book length treatments of such methods, varying in technical and conceptual breadth and depth, can be found in Bowles (2004); Samuelson (1998); Sandholm (2010); Weibull (1995); Young (1998b).
Inspired by such logic, away from the spotlight, a substantial body of researchers has continued to work on evolutionary game theory in the social sciences, taking it in interesting and sometimes unexpected directions. A disproportionate number of these researchers are relatively young, and this survey aims to draw attention to their work. The majority of work discussed at length in this survey was published over the last ten years by researchers awarded a doctoral degree in 2007 or later, though there are many exceptions to this rule.
It is intended that the topics covered here are treated with enough depth to leave the reader with a clear idea of the relevant concepts. Research is organized according to major themes, and connections between studies both within and between these themes are noted. Whilst the survey is predominantly neutral and descriptive, discussion of important growth areas (see Section 10) and open topics (see Open Topics throughout text) is necessarily subjective. The author values the reader's disagreement in such matters. Naturally, the reader who wishes to go deeper should refer to the cited papers themselves, but the presentation here may assist in finding relevant and interesting topics. Furthermore, although we discuss a good deal of literature across many areas, no claim to completeness is made.

Structure of the survey
The structure of the survey is informed by the observation that a model of behavior must, either explicitly or implicitly, answer the question who does what to whom and in what circumstances? All topics discussed here can be thought of with reference to this question. A summary is provided in Figure 1.
Section 2 addresses who, considering the identity of agents, who may be individual humans, groups of friends, firms or even wind turbines. Methods for analyzing multiple levels of agency are considered, as are the implications of such agency, the question of what kind of agency might be expected to evolve, and links between individual and collective agency.
Section 3 addresses to whom, considering the identity of other agents with whom a given agent interacts. The implications of assortativity in interaction are considered, as are methods by which it might arise, such as agents choosing directly to interact with those similar to themselves, choosing institutions that determine their interactions, or deciding to forsake uncooperative partnerships.
Sections 4, 5, 6 and 7 address what. Section 4 considers the evolution of traits that affect behavior and the evolution of culture, both embodied at the individual level and embodied at the collective level in conventions.
Section 5 considers economic applications in areas such as market selection, the learning of rational expectations equilibria, price dispersion, and fluctuations in aggregate inputs and productivity.
Figure 1: Who does what to whom? Agency (Section 2); Assortativity (Section 3); Evolution of behavior (Section 4); Economic applications (Section 5); Evolutionary Nash Program (Section 6); Behavioral dynamics (Section 7); General methodology (Section 8); Empirics (Section 9).
Section 6 considers the relationship between evolutionary game theory and cooperative game theory, addressing topics such as core convergence and selection, matching problems and the evolution of bargaining solutions.
Section 7 surveys work on a broad range of dynamics, including reinforcement learning, imitation, best experienced payoff dynamics, best and better response dynamics, dynamics for games with continuous strategy sets and completely uncoupled dynamics.
Section 8 considers methodology and technical results for perturbed dynamics, stochastic stability, evolutionary stability and systems of distributed control.
Section 9 discusses empirical work, divided into studies relevant to best and better response dynamics, imitation, completely uncoupled dynamics and the nature of errors in perturbed dynamics.
Sections of the survey can be read independently. However, to balance ease and clarity of reading, specialist terminology is precisely defined only once and referenced in the remainder of the text where it is appropriate to do so.

AGENCY - WHO MAKES DECISIONS?
An important consideration when modeling (human) action is the question of the identity of the agent that chooses an action or actions. We can think of many possible agents that might take actions. An agent could be an individual, a group of individuals, a module within a mind that contains many such modules, or even a piece of software. In particular, collaborative decision making is widespread amongst humans and something that humans are especially good at relative to other primates. In the words of Michael Tomasello:
"...humans are able to coordinate with others, in a way that other primates seemingly are not, to form a 'we' that acts as a kind of plural agent to create everything from a collaborative hunting party to a cultural institution." (Tomasello, 2014)
From an economic perspective, an agent could be a consumer, a family, or a firm. From a distributed control perspective, an agent might be a wind turbine or a group of wind turbines (Marden and Shamma, 2012a).
Evolutionary game theory studies adaptive rules that govern behavior. If realistic behavior involves decisions being made at multiple levels of agency, then this can be easily incorporated into such a rule. Consider a set of individuals $N$, such that individual $i \in N$ adopts some strategy $s_i \in S_i$. Let $S$ be the set of strategy profiles, such that $S = \times_{i \in N} S_i$. When strategy updating occurs, instead of an individual $i \in N$ following an individualistic strategy updating rule of the form
$$s_i^{\text{new}} \sim R_i(s), \tag{2.1}$$
we can have a coalition $T \subseteq N$ update $s_T \in S_T = \times_{i \in T} S_i$ by following a collective strategy updating rule of the form
$$s_T^{\text{new}} \sim R_T(s). \tag{2.2}$$
For example, an individual better response dynamic mandates that $i \in N$ chooses a strategy that is at least as good for himself as his current strategy, holding fixed the strategies of other players. That is, he chooses an individual better response from the set
$$\operatorname{supp} R_i(s) = \{ s_i^* \in S_i : \pi_i(s_i^*, s_{-i}) \ge \pi_i(s) \}. \tag{2.3}$$
For a coalitional better response dynamic this can be generalized so that $T \subseteq N$ chooses a strategy subprofile that weakly benefits all of its members. That is, coalition $T$ chooses a coalitional better response from the set
$$\operatorname{supp} R_T(s) = \{ s_T^* \in S_T : \forall i \in T,\ \pi_i(s_T^*, s_{-T}) \ge \pi_i(s) \}. \tag{2.4}$$
Each of these rules can be made into a best response by the addition of a Pareto condition. Furthermore, such rules can be perturbed by errors in strategy choice, thus making applicable the methods and tools of perturbed adaptive dynamics and stochastic stability analysis (Foster and Young, 1990; Freidlin and Wentzell, 1984; Kandori et al., 1993; Young, 1993a). Strategy updating rules that incorporate collective agency have been used before (e.g. Feldman, 1974; Green, 1974). Recently, however, substantial progress has been made in understanding how these rules can be used, the implications of such rules, and whether the ability of individuals to participate in such collective behavior is likely to evolve.
Figure 2: There are three states, x, y and z, all assumed to be rest points of an individualistic best response dynamic. Arrows between states indicate, for the most probable transition path from one state to another, e, the number of random errors on this transition path, and c, the size of the largest coalition that makes a coalitional response on this path.
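The coalitional better response set in expression (2.4) can be illustrated with a short sketch. The function name `coalitional_better_responses` and the payoff normalization in `pd_payoff` (joint defection pays 0, with β = 2 and γ = 1) are assumptions made here for illustration and are not taken from the survey.

```python
from itertools import product

def coalitional_better_responses(s, T, strategy_sets, payoff):
    """Enumerate subprofiles for coalition T that weakly benefit every
    member of T, holding the strategies of players outside T fixed."""
    result = []
    for sub in product(*(strategy_sets[i] for i in T)):
        new = list(s)
        for i, action in zip(T, sub):
            new[i] = action
        new = tuple(new)
        if all(payoff(i, new) >= payoff(i, s) for i in T):
            result.append(sub)
    return result

def pd_payoff(i, s, beta=2.0, gamma=1.0):
    """Hypothetical prisoner's dilemma payoffs, normalized so (D, D) pays 0."""
    table = {("C", "C"): beta, ("C", "D"): -1.0,
             ("D", "C"): beta + gamma, ("D", "D"): 0.0}
    return table[(s[i], s[1 - i])]

strategy_sets = [("C", "D"), ("C", "D")]
start = ("D", "D")
individual = coalitional_better_responses(start, (0,), strategy_sets, pd_payoff)
pair = coalitional_better_responses(start, (0, 1), strategy_sets, pd_payoff)
print(individual, pair)
```

Under these assumed payoffs, a single player at joint defection has no better response other than staying put, whereas the pair acting as one agent can also move to joint cooperation. A singleton coalition reproduces the individual better response set.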
Open Topic 1 The above discussion relates to multiple agency variants of best and better response dynamics. Multiple agency variants of other dynamics are still to be explored. For example, at some weddings, after the marrying couple has started dancing, other couples join the dancing. Some couples join the dancing early, while others wait until a large proportion of people are dancing, so we have something like a collective (pairwise) version of the threshold model of Granovetter (1978). Another example is a group of companies who collectively imitate successful contractual arrangements used by other groups of companies.

Coalitional stochastic stability
It is clear that behavior under a dynamic with frequent coalitional strategy updating may differ from behavior under a purely individualistic dynamic. However, Newton (2012a) notes that even in contexts in which collective agency is infrequent, it may still be more frequent than the kind of errors in strategy choice that determine long run outcomes under perturbed adaptive dynamics. To give some background, the literature on perturbed adaptive dynamics and stochastic stability (see Section 4.2.1 for technical definitions) builds upon the fact that if individuals usually follow some behavioral rule, but occasionally make an error and deviate from this behavioral rule, then these small errors can have a large effect on long run outcomes. Similarly, the idea behind coalitional stochastic stability is that small probabilities of collective agency can have a large effect on long run outcomes.
To fix ideas, consider a situation in which in any given period, with probability $1 - \varepsilon^{b_2} - \varepsilon^{b_3} - \varepsilon^{b_e}$, $0 < b_2 < b_3 < b_e$, some player chooses an individual best response; with probability $\varepsilon^{b_2}$, some pair of players makes a coalitional best response; with probability $\varepsilon^{b_3}$, some trio of players makes a coalitional best response; and with probability $\varepsilon^{b_e}$, some player makes an error and chooses a strategy randomly. For small values of $\varepsilon$, it should be clear that coalitional best responses occur much more frequently than random errors. Moreover, $b_2 < b_3$ implies that coalitional best responses by coalitions of size two occur much more frequently than coalitional best responses by coalitions of size three. If, in addition, we let $b_2 \ll b_3 \ll b_e$, then identically to standard stochastic stability analysis, random errors can be used to select between recurrent classes of the process without random errors. Consider the example given in Figure 2, in which x, y and z are the rest points of an individualistic best response dynamic. In this example, one random error is required to move from x to {y, z}, but two random errors are required to move in the opposite direction, so in the long run, the state will usually be in {y, z}. However, unlike the standard model, once a recurrent class of the process without random errors is selected, further selection can be obtained. Coalitions of size three are required for transitions from z to y, whereas transitions from y to z only require coalitions of size two, which are much more likely. Therefore, in the long run, the process will spend most of the time at state z. Thus, a hierarchy of rare behaviors allows iterative 'drilling down' to select progressively smaller subsets of the strategy space. Note that changes to the ordering of rare behaviors, such as considering $b_2 > b_3$ for example, do not present any additional technical difficulties.
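A rough numerical sketch of this 'drilling down' is given below. It assumes, purely for illustration, that the three states form a linear chain x - y - z, that transition probabilities equal the stated orders of magnitude exactly, and that the exponents take the hypothetical values b2 = 1, b3 = 2 and be = 5. None of these specifics come from the survey.

```python
def stationary_linear_chain(eps, b2=1.0, b3=2.0, be=5.0):
    """Stationary distribution of an assumed chain x - y - z.

    Transition probabilities are illustrative orders of magnitude only.
    """
    p_xy = eps ** be          # one random error is enough to leave x
    p_yx = eps ** (2 * be)    # two random errors are needed to return to x
    p_yz = eps ** b2          # a pair makes a coalitional best response
    p_zy = eps ** b3          # a trio makes a coalitional best response
    # Detailed balance holds for a chain with only neighboring transitions.
    w_x = 1.0
    w_y = w_x * p_xy / p_yx
    w_z = w_y * p_yz / p_zy
    total = w_x + w_y + w_z
    return {"x": w_x / total, "y": w_y / total, "z": w_z / total}

for eps in (0.1, 0.01):
    print(eps, stationary_linear_chain(eps))
```

As ε shrinks, the weight on x vanishes first (selection by random errors), and the remaining weight then concentrates on z rather than y (selection by coalition size).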

Coalitional logit choice
Sawa (2014) describes a perturbed coalitional best response rule whereby at period $t$, some coalition $T \subseteq N$ is randomly chosen and has the opportunity to accept or reject a randomly chosen alternative strategy subprofile for itself. Let the current strategy profile be $s^{t-1} = (s^{t-1}_T, s^{t-1}_{-T})$ and the proposed alternative subprofile be $s_T$. Let each $i \in T$ independently accept the proposed new strategies with probability given by the logit choice rule
$$\frac{\exp\left(\eta^{-1} \pi_i(s_T, s^{t-1}_{-T})\right)}{\exp\left(\eta^{-1} \pi_i(s_T, s^{t-1}_{-T})\right) + \exp\left(\eta^{-1} \pi_i(s^{t-1})\right)}, \tag{2.5}$$
where $\eta > 0$. If every $i \in T$ accepts the proposal, then the strategy profile becomes $s^t = (s_T, s^{t-1}_{-T})$. If at least one $i \in T$ does not accept, then the strategy profile remains $s^t = s^{t-1}$.
Figure 3: Two player prisoner's dilemma. Let β, γ > 0. For each combination of C and D, entries give payoffs for the row player. β is the payoff advantage of (C, C) over (D, D). γ is the gain from defecting on a cooperator. The gain from defecting on a defector is normalized to 1.
With regard to perturbed adaptive dynamics, the salient feature of the individualistic logit choice rule is that the cost (the exponential decay rate of the probability as $\eta \to 0$, see Section 4.2.1) of playing a given non-best response is equal to the expected payoff difference between playing a best response and playing the non-best response in question. Considering $T = \{i\}$ in (2.5), this corresponds to the cost of the transition from $s^{t-1}$ to $(s_T, s^{t-1}_{-T})$ being equal to
$$\max\left\{0,\ \pi_i(s^{t-1}) - \pi_i(s_T, s^{t-1}_{-T})\right\}. \tag{2.6}$$
Now, when $T$ is not a singleton, the probability that every $i \in T$ accepts according to (2.5) is given by these probabilities being multiplied together. Note that each of the individual probabilities takes into account the proposed strategy change by all members of $T$. That is, this is not simply the probability of every member of $T$ switching as per an individualistic logit rule. The exponential decay rate of this combined probability is then the sum of the exponential decay rates of the individual probabilities. That is, the cost of the transition is
$$\sum_{i \in T} \max\left\{0,\ \pi_i(s^{t-1}) - \pi_i(s_T, s^{t-1}_{-T})\right\}.$$
This simple expression is easy to work with, although it should be noted that the clean characterization of stochastically stable states of potential games under individualistic logit choice (Blume, 1993) does not transfer to coalitional settings (see Section 2.4 for a discussion of potential and agency).
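The relationship between acceptance probabilities and transition costs can be checked numerically. The sketch below assumes the standard binary logit form, in which player i accepts with probability exp(π_new/η) / (exp(π_new/η) + exp(π_old/η)). The function names and the payoff numbers are hypothetical.

```python
import math

def log_accept_prob(pi_old, pi_new, eta):
    """Log of the assumed logit acceptance probability for one player."""
    z = (pi_old - pi_new) / eta
    # log(1 / (1 + exp(z))) computed stably as -softplus(z)
    return -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))

def coalition_log_accept(old_payoffs, new_payoffs, eta):
    """Members accept independently, so log-probabilities add up."""
    return sum(log_accept_prob(o, n, eta)
               for o, n in zip(old_payoffs, new_payoffs))

def transition_cost(old_payoffs, new_payoffs):
    """Sum over coalition members of max(0, payoff loss from the switch)."""
    return sum(max(0.0, o - n) for o, n in zip(old_payoffs, new_payoffs))

old = (3.0, 1.0)   # hypothetical current payoffs of the two members
new = (2.0, 4.0)   # hypothetical payoffs after the proposed joint switch
cost = transition_cost(old, new)
for eta in (1.0, 0.1, 0.01):
    print(eta, -eta * coalition_log_accept(old, new, eta), cost)
```

As η becomes small, the quantity -η log(probability) approaches the summed cost: only the member who loses from the switch contributes.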

Frequent or infrequent coalitional behavior
One difference between the models of Newton (2012a) and Sawa (2014) (see Sections 2.1.1 and 2.1.2) is that the former considers infrequent coalitional choice in dynamics, whereas the latter makes no assumption about frequency. An implication is that stable cyclic behavior will arise more frequently under the latter dynamic, precisely in situations in which there is some conflict between different levels of agency. The classic example of a game with conflict between different levels of agency is the prisoner's dilemma, in which the tension between individual agency (defection as a dominant strategy) and collective agency (every pure strategy profile is efficient except for joint defection) defines the dilemma. Figure 3 gives a parameterized prisoner's dilemma that will be occasionally referred to for the remainder of this survey. In prisoner's dilemmas, the model of Newton (2012a) will select (D, D) as coalitionally stochastically stable, whereas in the model of Sawa (2014), cyclic behavior will be stable. In many situations, including many of the examples in Newton (2012a), there is no tension between different levels of agency and qualitative results do not differ according to whether coalitional behavior is rare or frequent. In particular, incentives at all levels of agency are perfectly aligned at a strong equilibrium (Aumann, 1959), a strategy profile at which there is no profitable coalitional deviation available to any subset of players. Avrachenkov and Singh (2016) show that, if every subset of players can coalitionally update its strategies and, when it does so, will choose a coalitional better response with probability 1 − ε, but with probability ε will choose from a full support distribution on all possible non-best responses, then any strong equilibrium is stochastically stable.

Implications of collective agency
The implications of collective agency for evolutionary models in economics have only recently begun to be explored. Nevertheless, some interesting results have emerged. Some of these results (Newton, 2012b; Serrano and Volij, 2008; results on matching) relate to the Evolutionary Nash Program, so discussion of these is deferred until Section 6.
Coordination games on networks
Newton and Angus (2013, 2015) consider players as vertices on a graph (a 'network') who each play a strategy A or B. A player's payoff is the sum of his payoffs from playing his strategy against each of his neighbors in the game in Figure 4[ii]. They study the effect of coalitional behavior on the speed of dispersion of strategy A, the efficient strategy, starting from a state in which all players play B. It turns out that the introduction of coalitional behavior can have either of two effects, (a) a conservative effect (Figure 5[i,ii]), by which coalitional behavior greatly slows the adoption of the new strategy, or (b) a reforming effect [...] Figure 4[iii], and analyze how the size of teams in an organization (cliques on a network) affects long run behavior when small groups of players within a team meet to adjust their strategies. Large teams end up playing A as the risk-dominance effect of the classic individualistic model (Young, 1993a) outweighs the coordination effect of collaborative choice. Medium size teams end up playing B for the opposite reason. The behavior of small teams depends critically on their neighbors within the organization.
Note that in the individualistic model, as long as behavior only depends on payoff differences (either ordinal or cardinal) and not absolute payoff values, behavior in all of the games in Figure 4 is identical: β is a redundant parameter. This is not the case in the presence of collective agency. The link between payoffs and agency is explored in depth in Newton and Sercombe (2017), which is discussed further in Section 2.4.

Matching
If a coalition of size two decides to adjust its behavior to the mutual benefit of its members then it acts as a collective agent. Coalitions of size two are the objects considered by notions of pairwise stability in networks (see, e.g. Jackson, 2010), a special case of which is pairwise stability in matching models. For example, in Roth and Vande Vate (1990), a man and a woman meet randomly and match with one another if and only if they prefer one another to their current partners. Such a rematching is an instance of collective agency, as neither the man nor the woman can effect such a rematching on their own.
In recent years, there have been many studies on evolutionary dynamics in matching problems. Similarly to Roth and Vande Vate (1990), the main results from these studies tend to concern the convergence of the dynamics to particular solutions from cooperative game theory. As such, they are part of the Evolutionary Nash Program and will be discussed at length in Section 6.

Social choice rules
Okada and Sawa (2016) examine an evolutionary model in which the policy followed by a collective is determined by majority (or supermajority) voting by individuals. They consider policies that emerge under given voting rules. A voting rule is a way of choosing collectively. In fact, it can be regarded as a strong way of doing so because the wishes of individuals in a minority are disregarded. It follows that, in this sense, the weakest voting rule is the one that requires all individuals to agree: the unanimity voting rule. Under the unanimity rule, the only way a new policy x can defeat a status quo policy y is if every voting individual weakly prefers x to y. This is a coalitional better response rule based on pairwise comparison and can be compared to expressions (2.2) and (2.4) at the beginning of Section 2. Okada and Sawa (2016) find that when their voting dynamic is perturbed uniformly, Condorcet winning policies (policies that beat all others under simple majority rules) are stochastically stable. Furthermore, if the voting rule is the unanimity rule, then, under coalitional logit choice, Borda (points based rankings, de Borda, 1784) winning policies are stochastically stable.
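The solution concepts named above can be made concrete with a small sketch. It illustrates only the static definitions (Condorcet winner, Borda winner and the unanimity comparison), not the perturbed voting dynamic of Okada and Sawa (2016). The function names and the five-voter profile are hypothetical, and voters are assumed to hold strict rankings.

```python
def prefers(ranking, x, y):
    """True if a voter with this ranking (best first) ranks x above y."""
    return ranking.index(x) < ranking.index(y)

def condorcet_winner(profile, policies):
    """Policy that beats every other policy under simple majority, if any."""
    for x in policies:
        if all(2 * sum(prefers(r, x, y) for r in profile) > len(profile)
               for y in policies if y != x):
            return x
    return None

def borda_scores(profile, policies):
    """Points based ranking: m - 1 points for first place, 0 for last."""
    m = len(policies)
    return {x: sum(m - 1 - r.index(x) for r in profile) for x in policies}

def unanimity_defeats(profile, x, y):
    """Under unanimity, x defeats y only if every voter prefers x to y."""
    return all(prefers(r, x, y) for r in profile)

policies = ("a", "b", "c")
profile = [("a", "b", "c")] * 3 + [("b", "c", "a")] * 2
scores = borda_scores(profile, policies)
print(condorcet_winner(profile, policies), scores)
```

In this assumed profile the two concepts disagree: a wins every pairwise majority contest, while b collects the most Borda points because no voter ranks it last.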

The evolution of collective agency
It is possible to ask whether the ability of multiple individuals to act as a single collective agent is an ability that will be evolutionarily selected for. This may at first seem like a question with an easy answer. After all, we are considering the ability to participate in joint behavior that is mutually beneficial for all concerned. However, such a rush to conclusions proves to be ill founded for several reasons.
Firstly, as discussed above, the ability to participate in collective agency may slow the spread of efficient behavior on a network (Newton and Angus, 2013, 2015). This means that populations in which such an ability is widespread may find that new technology is adopted more slowly than in populations in which such an ability is rare or absent. If this is the case, then under reasonable assumptions on migration and conflict, a group selection model, in which some selection occurs at the population (the 'group') level when high technology populations invade and replace low technology populations, can lead to selection against the ability to participate in collective agency.
Secondly, in some situations, it may be beneficial to be a type of individual that cannot participate in collaborative behavior (Newton, 2017b). One case is when there exists the opportunity to free ride on the collaboration of others in such a way that free riders do better than those who collaborate. An example of this is when Alice and Bob collaborate to hunt an animal but Colm, lacking the ability to collaborate, can still eat the leftover meat without exposing himself to the risks of the hunt. Another case is when there is positive assortativity in types (see Section 3) and the externalities of collaboration are negative. An example of this is when Alice and Bob can collaborate to attack another individual, but due to positive assortativity, this individual is likely to also be a collaborative type. Consequently, those who cannot collaborate benefit from being less likely to be subject to such negative externalities.
Finally, it may be the case that people mistakenly think that they are collaborating with a partner when the partner does not in fact have the ability to collaborate. Rusch (2017) examines this possibility in a model in which players play a variety of symmetric, two strategy, two player games, with the share of any given game given by a probability measure. He finds that as long as the prisoner's dilemma (Figure 3) is not too likely under this measure, then the ability to collaborate will be selected for.

Links between individual and collective agency
Nax and Perc (2015) discuss payoff-based learning in public goods games. This process is completely uncoupled (see Section 7.6), relying neither on opponents' payoffs nor their prior actions. They consider how simultaneous errors by multiple players can end up benefiting the error-making players. Although contribution to a public good may be suboptimal from an individualistic perspective, payoffs may increase when several players simultaneously start to contribute. Hence, profitable coalitional strategy changes are replicated by the errors of individuals. To replicate coalitional moves by larger numbers of players requires a larger number of mistakes and so such moves are relatively less likely. This is similar to the assumption made in Section 2.1.1, although there it is an assumption, whereas here it emerges endogenously. A Nash equilibrium is k-strong if there exists no profitable coalitional deviation for a coalition of any size up to and including k. The authors show that long run behavior depends on the values of k for which the equilibria in their model are k-strong.
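The definition of a k-strong equilibrium can be checked by brute force in a small example. The sketch below assumes a linear public goods game with binary contributions and a hypothetical return rate `r`, and it interprets a 'profitable' deviation as one that strictly benefits every member of the deviating coalition. These modeling choices are illustrative and are not taken from Nax and Perc (2015).

```python
from itertools import combinations, product

def payoff(profile, i, r):
    """Assumed linear public goods game: equal share of r times total
    contributions, minus the player's own contribution (0 or 1)."""
    return r * sum(profile) / len(profile) - profile[i]

def is_k_strong(profile, k, r):
    """True if no coalition of size at most k has a joint deviation that
    strictly benefits every one of its members."""
    n = len(profile)
    for size in range(1, k + 1):
        for coalition in combinations(range(n), size):
            for deviation in product((0, 1), repeat=size):
                new = list(profile)
                for i, action in zip(coalition, deviation):
                    new[i] = action
                if all(payoff(new, i, r) > payoff(profile, i, r)
                       for i in coalition):
                    return False
    return True

no_contribution = (0, 0, 0)
for r in (2.0, 1.2):
    print(r, [is_k_strong(no_contribution, k, r) for k in (1, 2, 3)])
```

With three players, zero contribution is a Nash equilibrium (1-strong) for both assumed return rates. It fails to be 2-strong when r = 2.0, and is 2-strong but not 3-strong when r = 1.2, so the number of simultaneous 'errors' needed to reproduce a profitable coalitional move depends on the rate of return.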
The evolution of preferences literature often follows the indirect evolutionary approach (Güth and Kliemt, 1998) and assumes an outcome given the types of the players. This outcome is usually assumed to be a Nash equilibrium and, in the tradition of Harsanyi (1967), types are usually taken to specify the payoffs of players. Examples of such models can be found in Dekel et al. (2007); Heifetz et al. (2007); Samuelson (2001); Sethi and Somanathan (2001). In general, a change in the type of only one player can lead to a change in everyone's behavior via the assumption of equilibrium. For example, in games of strategic substitutes (e.g. Cournot competition), if the type of one player changes so that he prefers to play higher actions than he did before, the equilibrium actions of the other players will decrease. However, Herold (2012) gives a model in which sufficient numbers of rewarder or punisher types are required to induce a change in equilibrium behavior. When there are sufficient rewarders playing the game, the change in players' behavior induced by the assumption of equilibrium benefits all players. Such a change may benefit rewarders less than non-rewarders, in much the same way that collaborators may benefit less from collaboration than noncollaborators do in Newton (2017b), discussed in Section 2.3. However, in both these papers, the rewarder/collaborator type is more likely to find themselves in a group where there are a sufficient number of rewarder/collaborator types to induce the change in behavior. This observation lies behind both Proposition 1 of Herold (2012) and Theorem 1 of Newton (2017b). A major difference between the two approaches is that the equilibrium approach assumes that non-rewarder types change their behavior as a consequence of rewarders in the population, whereas non-collaborator types may, but are not required to, alter their behavior in response to collaboration.
A coordination game like that in Figure 4[i] played on a network admits an exact potential function (Monderer and Shapley, 1996). A potential function retains information from individual payoff functions on the individual incentives of players, aggregating them into a single function. Young (2011) calls a set of players autonomous if, fixing the strategies of all other players and regardless of what these strategies are, potential is maximized when all of the players in the set play A. Another way of thinking of this, as explored further in Section 4.2.2, is that it will be stochastically stable under logit choice for players in any such set to play A. Newton and Sercombe (2017) link autonomy driven by a potential function, potential autonomy, to autonomy driven by collective agency, agency autonomy. A set of players is agency autonomous if, fixing the strategies of all other players and regardless of what these strategies are, players in the set would all gain from a simultaneous switch by every player in the set from B to A. It is shown that every potential autonomous set on every network is agency autonomous if and only if β ≥ 1 + α. Conversely, every agency autonomous set on every network is potential autonomous if and only if β ≤ α/2. That is, the payoffs of the game provide a connection between aggregation of agency and aggregation of incentives via a potential function.
Open Topic 2 Consider the mind as software that applies an algorithm to solve problems. This software is modular, in that different sections of code achieve different tasks, but nevertheless communicate with one another towards some overall goal. In a similar manner, two people may collaborate and their two minds together be considered as one piece of software with two distinct modules that communicate. It would be interesting to see an evolutionary game theoretic model of choice based on this hierarchy (parts of a mind, whole mind, collective 'mind') in which the relationship between the lower and middle level of the hierarchy is qualitatively similar to the relationship between the middle and upper level.
Figure 6: Assortativity in matching. From a population, individuals are matched into groups to interact (Section 3.3). This matching may be affected by the traits of the individuals (Section 3.2.1) and also by institutions at a societal level (Section 3.2.2). Individuals may be able to choose to join institutions (Section 3.2.3) and the institutions may have their own preferences over their membership. Note that the institution in the Figure is highly positively assortative in that it matches individuals with other individuals of a similar type. Individuals may have the option to interact multiple times with those with whom they are matched, or to leave them and seek new partners (Section 3.4).

ASSORTATIVITY - WITH WHOM DOES INTERACTION OCCUR?
Assortativity is a tendency for agents to engage in a disproportionate share of their interactions with those who are either similar in a trait (positive assortativity) or dissimilar in a trait (negative assortativity). This can be as simple as spending more time with members of one's family than one does with random strangers. Although assortativity and associated effects on behavior have been studied for a long time (see, e.g. Eshel and Cavalli-Sforza, 1982; Wilson and Dugatkin, 1997), the relationship between the two has recently been attracting increased attention from economists. The general structure of the effects governing assortativity that are discussed in this section is given in Figure 6.

Assortativity and preferences
There has long been interest in the relationship between assortativity and selection for altruistic preferences, often modeled as a predilection to play C in a prisoner's dilemma (Figure 3) in which payoffs represent fitness. All evolutionary models of which the author is aware that work in favor of selection of such preferences rely on inducing positive assortativity in behavior. That is, for playing C to be profitable, it must be played a disproportionate amount of the time against C. Examples include repeated interaction (Trivers, 1971), kin-selection (Fisher, 1930; Hamilton, 1963) and group selection (Bowles, 2006a; Choi and Bowles, 2007; Haldane, 1932). For an extensive and detailed discussion of cooperation, the reader is referred to Bowles and Gintis (2011), and for a specialized review of parochial altruism theory, to Rusch (2014).
More recently, Alger and Weibull (2012, 2013) have considered the relation of assortativity to a broader class of preferences. They study the evolution of preferences when players are matched to play a two player game. Payoffs in the game correspond to fitness. Types of players correspond to preferences over outcomes in the game. The matching protocol is exogenously given and exhibits a fixed amount of assortativity. Specifically, consider an incumbent type θ which comprises a 1 − ε share of the population and an invading type τ which comprises an ε share of the population. Let Pr[τ | τ, ε] be the probability that a τ type is matched with a τ type. Let

$$\lim_{\varepsilon \to 0} \Pr[\tau \mid \tau, \varepsilon] = \sigma, \tag{3.1}$$

where σ is independent of τ. That is, any invading mutant type τ that appears in small numbers will be such that any given mutant will be matched to another mutant with probability approximately equal to σ and matched to an incumbent with probability approximately equal to 1 − σ. This use of a coefficient of assortativity, σ, follows Bergstrom (2003).
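The role of σ in an invasion can be illustrated numerically. The following is a minimal sketch rather than the model of the cited papers: it assumes one simple matching rule that has limit σ as ε → 0 (with probability σ an individual meets its own type, otherwise it meets a uniformly drawn member of the population), and it uses hypothetical prisoner's dilemma payoffs that are not those of Figure 3. The names `PAYOFF`, `match_probabilities`, `expected_fitness` and `cooperators_invade` are illustrative.

```python
# Hypothetical prisoner's dilemma fitness payoffs (an assumption, not Figure 3).
# PAYOFF[(own action, partner action)] is the fitness of the player taking "own action".
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 4.0, ("D", "D"): 1.0}


def match_probabilities(sigma, eps):
    """Own-type matching probabilities under the assumed rule.

    With probability sigma an individual meets its own type; otherwise it
    meets a uniformly drawn member of the population, in which mutants
    have share eps and incumbents have share 1 - eps.
    """
    p_mutant_mutant = sigma + (1.0 - sigma) * eps
    p_incumbent_incumbent = sigma + (1.0 - sigma) * (1.0 - eps)
    return p_mutant_mutant, p_incumbent_incumbent


def expected_fitness(own, other, p_own):
    """Expected fitness of a player using action `own` who meets `own` with probability p_own."""
    return p_own * PAYOFF[(own, own)] + (1.0 - p_own) * PAYOFF[(own, other)]


def cooperators_invade(sigma, eps=1e-6):
    """True if rare C-playing mutants earn more than D-playing incumbents."""
    p_mm, p_ii = match_probabilities(sigma, eps)
    return expected_fitness("C", "D", p_mm) > expected_fitness("D", "C", p_ii)


if __name__ == "__main__":
    for sigma in (0.0, 0.2, 0.5, 1.0):
        print(sigma, cooperators_invade(sigma))
```

Under these assumed payoffs, as ε → 0 the mutants earn approximately 3σ and the incumbents approximately 1, so cooperation invades only when assortativity is high enough.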
If σ = 1, then any incumbent population that fails to achieve efficient payoffs will be vulnerable to invasions of mutants who play efficiently against one another. Conversely, if σ = 0, then any incumbent population whose members do not behave in a way consistent with individual fitness maximization will be vulnerable to invasions of mutants who maximize individual fitness. This logic extends to intermediate values of σ, so it transpires that the most stable behavior accords with preferences that are a weighted average of fitness maximization and efficient symmetric choice:

$$u(x, y) = (1 - \sigma)\,\pi(x, y) + \sigma\,\pi(x, x), \tag{3.2}$$

where π(x, y) denotes the fitness of a player who plays x against an opponent who plays y, and where the weighting given to efficient behavior is increasing in assortativity σ. These arguments are expanded to m player symmetric games in Alger and Weibull (2016). See Section 3.3 for a discussion of assortativity of types in games with more than two players.
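To see how such weighted preferences interpolate between selfish and efficient play, consider a minimal sketch under stated assumptions. The utility is taken to be (1 − σ)π(x, y) + σπ(x, x), and the fitness function π(x, y) = x + y − x² is a hypothetical example chosen for tractability, not one from the cited papers. The helper names `fitness`, `utility` and `best_reply` are illustrative.

```python
def fitness(x, y):
    """Hypothetical fitness: both contributions are enjoyed, own contribution has convex cost."""
    return x + y - x * x


def utility(x, y, sigma):
    """Assumed weighted preference: (1 - sigma) * own fitness + sigma * fitness if both played x."""
    return (1.0 - sigma) * fitness(x, y) + sigma * fitness(x, x)


def best_reply(sigma, y=0.0, grid_size=1500, upper=1.5):
    """Grid search for the action maximizing utility against a partner who plays y."""
    grid = [upper * i / grid_size for i in range(grid_size + 1)]
    return max(grid, key=lambda x: utility(x, y, sigma))


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.0):
        # For this fitness function the first-order condition gives x = (1 + sigma) / 2.
        print(sigma, best_reply(sigma), (1.0 + sigma) / 2.0)
```

In this example the best reply does not depend on the partner's action, so it is also the symmetric equilibrium action: it rises from the individually optimal level 1/2 at σ = 0 to the efficient symmetric level 1 at σ = 1.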
The preferences described in (3.2) were earlier derived in Bergstrom (1995) for the special case of games between siblings and stability of preferences to small invasions of dominant mutant genes in a model of sexual reproduction. In this case, a mutant has a probability of one half of having a mutant sibling, equivalent to a level of assortativity in interaction of σ = 1/2. The cited paper also gives results for the case of invasions of recessive mutant genes and compares these stable behaviors to preferences that correspond to Hamilton's coefficient of relatedness (Hamilton, 1964a,b), which in siblings is equal to one half. Alger and Weibull (2010) also make such a comparison for the specific case of altruism.
Bilancini et al. (2018) consider the evolution of cooperation in a population comprised of individuals of two types. Each individual is matched to play a prisoner's dilemma in which he can either cooperate or defect. Individuals of each type suffer a payoff loss from interacting with individuals of the other type. Such a desire to interact with those similar to oneself is known as homophily. Interaction is assumed to be positively assortative (per the specification of Cavalli-Sforza and Feldman, 1981) according to strategies played but not according to types. If payoff losses from interacting with individuals of the other type are sufficiently large, it is then evolutionarily stable for one type to play C and the other type to play D, with individuals of each type choosing their strategy in order to minimize their chance of being matched with an individual of the other type. That is, they use positive assortativity according to strategy to induce positive assortativity according to type.
Note that the preferences described in (3.2) are defined for a fixed level of assortativity that applies to any invading type. The case of differing values of σ for alternative invading types will be considered next.

Individual types and assortativity
Following work in the biology literature on the evolution of assortativity (e.g. Cara et al., 2008; Dieckmann and Doebeli, 1999; Matessi et al., 2002; Otto et al., 2008; Pennings et al., 2008; Servedio, 2011), Newton (2017a) shows that under a specification of type-specific assortativity given by Cavalli-Sforza and Feldman (1981) and applied to the model of Alger and Weibull (2013) discussed in Section 3.1, stability is only possible when an incumbent population behaves efficiently and does not interact at all with invading τ types, thus ensuring that σ = 1 in (3.1). Perfectly assortative efficient behavior is not susceptible to invasion, but everything else is. The reasons for this are that (i) unless incumbent θ types only interact with one another, if they do not behave in a way consistent with individual fitness maximization, then they are vulnerable to invasion by a type τ that maximizes individual fitness and has no predilection for assorting with its own kind, and (ii) if incumbent θ types do not behave efficiently, then they are vulnerable to invasion by a type τ that behaves efficiently and only interacts with other τ types.
Even without assortativity in interaction, individuals can sometimes adjust their behavior so that it correlates with the types, and therefore the behavior, of those with whom they interact. Specific correlations are sometimes generated via the twin devices of equilibrium and preferences. For example, Herold and Kuzmics (2009) show that an incumbent type that earns more than the minmax payoff when playing against itself can maintain stability by playing spitefully against invading mutants so as to minimize their fitness.
In practice, the maximum assortativity based on types that can be maintained may be bounded above, as mutants are not always easy to recognize. This has been considered in the evolution of preferences literature (e.g. Dekel et al., 2007; Heifetz et al., 2007), in which non-observability of types reduces the possibility of assortativity in behavior according to type. That is, if Alice cannot observe that Bob is a mutant, she cannot condition her behavior towards him on whether or not he is a mutant. Nevertheless, high levels of assorting by phenotype (i.e. by observed behavior, not by types per se) can plausibly be maintained by shunning and ostracism of those who exhibit unusual behavior. Moreover, there may be gains to be had from considering different forms of recognition. Do two objects with different charges recognize each other as they attract each other? At what point in a conversation do two fluent English speakers recognize one another as such? Similar examples can be found in Newton (2017b). Finally, note that the cause of differing behavior and assortativity need not be genetic and may be cultural. Types in the above models relate to individuals and hence to culture embodied at the individual level, models of which are discussed further in Section 4.3. Further discussion of assortativity in behavior can also be found in Section 4.1.4.

Institutions that determine assortativity
Assortativity that may vary by individual according to individual traits is not the only assortativity of interest. Assortativity may also be embodied in institutions and determined at a societal level, similarly to other aspects of culture (see Section 4.2). Nax and Rigos (2016) consider a model in which there are two types, each of which plays a given strategy in a two strategy, two player game drawn from a variety of social dilemmas. Assortativity in the matching protocol is determined by logit-style voting for higher or lower assortativity. So the probability of an individual voting for higher assortativity is increasing in his gain from an increased level of assortativity. This weighted majoritarian rule pushes assortativity in the direction favored by one of the types, who then grow as a share of the population and proceed to push assortativity even further in the same direction. Consequently, stability always involves either no assortativity or full positive assortativity (negative assortativity is not permitted by the model). Similar results were subsequently found in Wu (2016), in a setting that considers unweighted majority voting and restricts itself to coordination games, while allowing negative or positive assortativity. This can lead to the stochastic stability, under a best response dynamic with uniform errors, of a Pareto efficient but non-risk dominant equilibrium in the coordination game, as individuals who play the strategy associated with the Pareto efficient equilibrium vote for high assortativity so as to segregate themselves from any players who play the strategy associated with the risk dominant equilibrium. Wu (2017b) gives an institutional setup in which there are equal numbers of two positions, high and low, in a society. Individuals in high positions are matched to play a game against individuals in low positions. The high position in the game is associated with higher payoffs than the low position. 
There are two types of individual, θ and τ. Assortativity in interaction is then determined by the share of the high positions that are held by players of each type. For example, if θ players hold all of the high positions and τ players hold all of the low positions, then matching will be perfectly negatively assortative: individuals will never play against an individual of the same type. It is assumed that the share of each type in a high position is determined not by voting, but by a form of generalized Nash bargaining solution (Nash, 1950) with the bargaining weights given by the population shares of each type.
Wu (2017a) considers a similar model to the above, with a continuum of types and payoffs that are continuous in types. The evolutionary stability of a population of a single type to invasions of slightly different types is considered. Under majoritarian voting for who holds high positions, homogeneous populations of any type are stable, as the incumbents, forming a majority, ensure that invaders occupy low positions. Under the Nash bargaining solution formulation, a type is only stable if, when matched against a slightly different type, it obtains a higher marginal benefit than the slightly different type from being in the high position rather than the low position.

Choosing an institution
Another approach to the evolution of assortativity is found in Alós-Ferrer and Buckenmaier (2017). There are two types of trader, buyers and sellers. Each period, traders either choose an institution at which to trade or remain at the same institution as in the previous period. The role of the institution is to partition the traders at that institution into sets for the purpose of trade. For example, one type of institution, a bazaar, matches buyers and sellers into pairs and leaves the remaining traders unmatched. Another type of institution, a centralized institution, matches all of the buyers and sellers into a single set. In any case, buyers and sellers within the same set trade with each other at a price that is increasing in the ratio of the number of buyers in the set to the number of sellers in the set. Buyers prefer lower prices and sellers prefer higher prices. Over time, traders move between institutions and the environment evolves. Thus, it is the matching algorithm itself, responsible for assortment, that is the unit of selection. The paper goes on to show that the centralized institution, which matches all traders at that institution into one set, has strong attractive properties under updating rules that satisfy certain conditions. Thus the paper contributes to a set of papers that give results for classes of updating rule that satisfy criteria that are plausible in the given context (see also Klaus and Newton, 2016, discussed in Section 6). Note that the many to one matching of individuals to institutions is similar to the 'college admissions' matching problem, evolutionary models of which will be discussed in Section 6.3.
Open Topic 3 In Section 3.2 we have seen how assortativity can arise from culture (or genes) embodied at an individual level (e.g. avoid people with tattoos) or at a collective level (e.g. systems of schooling and university attendance). Some collective cultures of assortativity will be subject to selection driven by individual choice as happens in Alós-Ferrer and Buckenmaier (2017) and in Carvalho (2012), discussed in Section 4.4. Study of such topics is extremely pertinent to societies with large cultural minorities and associated conflict.

Generalized assortative matching protocols
Van Veelen (2011) takes a general approach to exogenous assortativity. Individuals, whose types correspond to one of two strategies, are drawn from a population and matched into groups of m individuals to play a game. The share of each type of individual then evolves according to the replicator dynamic. Alger and Weibull (2016, see Section 3.1) consider a similar situation in which types correspond to a continuous space of preferences over fitness. Even more generally, Jensen and Rigos (2017) consider matching protocols that group individuals from a population to play m player, n strategy symmetric games. Again, strategies correspond to types. Jensen and Rigos (2017) refer to a population state x* and a matching protocol f* as an evolutionary optimum if they lead to the maximum average fitness amongst all (x, f) pairs such that population state x is a stable state given matching rule f. It is shown that there exists a matching protocol h such that (x*, h) is also an evolutionary optimum and a Nash equilibrium of a game derived from the evolutionary setup: a Nash equilibrium under a matching rule. The result is obtained by constructing the matching rule h such that invading types that are not in the support of x* are matched into groups apart from other types. Finally, Newton (2017b) uses a similar setup, allowing asymmetric games and infinite strategy sets, but with only two types, which do not correspond to strategies but rather to the presence and absence of a behavioral trait, as discussed in Section 2.3.
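The combination of group matching and replicator dynamics described above can be sketched in a few lines. The sketch is stylized and is not the specification of any cited paper: it assumes a linear public goods game in groups of m, a matching rule in which, with probability σ, all of an individual's co-members share its type and are otherwise independent draws from the population, and a discrete-time replicator update. All parameter values and the names `fitnesses` and `simulate` are hypothetical.

```python
def fitnesses(x, sigma, m=5, b=3.0, c=1.0, base=1.0):
    """Expected fitness of cooperators and defectors when the cooperator share is x.

    Assumed game: each cooperator pays c and creates benefit b, shared equally
    among the m group members. Assumed matching: with probability sigma all
    m - 1 co-members share one's own type, otherwise each is a cooperator
    with probability x.
    """
    coop_partners_of_c = (m - 1) * (sigma + (1.0 - sigma) * x)
    coop_partners_of_d = (m - 1) * (1.0 - sigma) * x
    f_c = base + b * (1.0 + coop_partners_of_c) / m - c
    f_d = base + b * coop_partners_of_d / m
    return f_c, f_d


def simulate(sigma, x0=0.1, steps=500):
    """Iterate a discrete-time replicator dynamic and return the final cooperator share."""
    x = x0
    for _ in range(steps):
        f_c, f_d = fitnesses(x, sigma)
        average = x * f_c + (1.0 - x) * f_d
        x = x * f_c / average  # share grows in proportion to relative fitness
    return x


if __name__ == "__main__":
    for sigma in (0.0, 0.1, 0.3, 0.5):
        print(sigma, round(simulate(sigma), 4))
```

Under these assumptions the fitness difference between cooperators and defectors is b(1 + (m − 1)σ)/m − c, so with the hypothetical values above cooperation spreads when σ exceeds 1/6 and dies out otherwise.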

Conditional dissociation
One way in which assortativity in interaction may be induced is by conditional dissociation, whereby individuals, who are partnered to play a game, can choose to either remain with the same partner for another period of play or to leave and randomly rematch with some other individual. Such an environment can be thought of as lying somewhere between random matching every period and a setting in which individuals play the same opponent for life.
Fujiwara-Greve and Okuno-Fujiwara (2009) analyze a setting in which players are matched to play a repeated prisoner's dilemma but have the opportunity to conditionally dissociate every period. Cooperation (C, C) is assumed to be efficient as measured by sum of payoffs (γ < 1 + β in Figure 3) for most of the analysis, though the alternative possibility is considered later in the paper. The strategy space is rich: strategies can be contingent on the entire history of play in a partnership, although they cannot be conditioned on events prior to or outside of the current partnership. Mixed strategies are disallowed. They show the (neutral) stability of trust building strategies, in which players defect for a fixed number of periods against their partner before they start to cooperate, following which they continue to cooperate until one of them either dies or fails to cooperate. This initial period of defection reduces continuation payoffs after dissociation from a partner and thus discourages defection. Positive assortativity between types is induced as players only persist with their current partner when their strategies complement one another. Fujiwara-Greve et al. (2012) analyze a variant of this model in which players can, each period, after observing play, furnish their opponent with a reference letter for a small cost. If the partnership terminates due, for example, to the death of one individual, then the other individual can take the reference letter with him. The trust building period between two individuals who hold reference letters can then be shortened. Vesely and Yang (2012) show that consideration of mixed strategies can make some polymorphic neutrally stable states of the model of Fujiwara-Greve and Okuno-Fujiwara (2009) no longer neutrally stable. The reason is that the polymorphism in the prior model causes some partnerships to terminate for reasons other than the death of one of the partners. 
This opens the door to the invasion of mutants who mimic, by playing a mixed strategy, the distribution of existing types in the population. The only difference in the behavior of the mutants is that they use a secret handshake (not terminating partnerships when they are expected to do so) to gain higher payoffs when playing against other mutants. Vesely and Yang (2010) explore this in more detail, constructing neutrally stable states which are resistant to secret handshakes. This is done by avoiding voluntary dissociation (i.e. not caused by death) in equilibrium by punishing deviations within existing partnerships. Notably, such punishment is milder than traditional grim trigger punishments as the possibility of dissociation gives players an outside option, removing the 'assortativity' of remaining with the same partner forever. Izquierdo et al. (2010) obtain cooperation in a model which is similar to Fujiwara-Greve and Okuno-Fujiwara (2009), but has a much simpler strategy space (see Sigmund, 2010, for discussion of similar models). Strategies are triplets that specify (i) Cooperate or defect when first matched with a partner; (ii) Cooperate, defect or dissociate when partner cooperated last period; (iii) Cooperate, defect or dissociate when partner defected last period. When individuals have short expected lives, defection dominates in stable equilibrium, but when individuals have long expected lives, the assortativity induced by strategies in which players start off cooperating and continue to do so as long as their partner also cooperates is enough to generate high levels of cooperation in stable equilibrium. High levels of cooperation can still be generated by tit-for-tat strategies (cooperate but defect if partner defects) in the model without the option to leave. However, without the option to leave, a tit-for-tat player will have to wait for any defecting partner to die, hence the model with the option to leave can generate even more cooperation. 
This advantage is one reason why in the model with the option to leave, players who cooperate but leave when their partner defects end up comprising a share of the population almost twelve times that of tit-for-tat players. Izquierdo et al. (2014) analyze a similar model with a slightly reduced strategy space and give further analytic results for the dynamics and stable states. Rivas (2013), also mentioned in Section 4.2.4, considers the case when pairs of cooperating players automatically remain together in the following period.
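The triplet strategy space described above for Izquierdo et al. (2010) is small enough to enumerate directly. The following minimal sketch lists the strategies and plays out a single partnership until someone dissociates. It is an illustration under stated assumptions: death and rematching are ignored, the decision to leave is assumed to be taken at the start of the period following the observed action, and the names `strategies` and `play_partnership` are hypothetical.

```python
from itertools import product

FIRST_MOVES = ("C", "D")          # (i) action when first matched
REPLIES = ("C", "D", "leave")     # (ii), (iii) reaction to partner's last action

# Each strategy is (first move, reply to C, reply to D).
strategies = list(product(FIRST_MOVES, REPLIES, REPLIES))


def play_partnership(s1, s2, max_periods):
    """Return the list of action pairs played before the partnership ends.

    The partnership ends when either player's rule says "leave" or after
    max_periods periods (an assumed cap standing in for random death).
    """
    history = [(s1[0], s2[0])]
    while len(history) < max_periods:
        last1, last2 = history[-1]
        move1 = s1[1] if last2 == "C" else s1[2]
        move2 = s2[1] if last1 == "C" else s2[2]
        if move1 == "leave" or move2 == "leave":
            break
        history.append((move1, move2))
    return history


if __name__ == "__main__":
    tit_for_tat = ("C", "C", "D")
    out_for_tat = ("C", "C", "leave")   # cooperate, but leave a defector
    always_defect = ("D", "D", "D")
    print(len(strategies))
    print(play_partnership(tit_for_tat, always_defect, 4))
    print(play_partnership(out_for_tat, always_defect, 4))
```

The comparison in the usage example mirrors the argument in the text: the tit-for-tat player remains tied to a defecting partner, whereas the player who leaves on defection is exploited only once before seeking a new partner.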

Network formation
Returning briefly to the time honored topic of risk dominance versus Pareto efficiency in two strategy coordination games (see Figure 4[iii]), Staudigl and Weidenholzer (2014) consider a setup in which any given player chooses both a strategy in the coordination game and a set of other players to play the game with (see Hellmann and Staudigl, 2014 for a discussion of previous work along these lines). The given player's payoff is then given by the sum of his payoffs in the game played against each of these players, minus a fixed cost for each of the players with whom he plays the game (a linking cost). The number of other players with whom a player can play the game is bounded above by a constant k. Considering a perturbed best response dynamic, starting from a state in which all players play the strategy associated with the risk dominant equilibrium, it takes at most k players to make errors and switch to the strategy associated with the Pareto efficient equilibrium for any other player to benefit from switching to the strategy associated with the Pareto efficient equilibrium and choosing to interact with the k players who already play this strategy. This amounts to changing strategy and consciously engaging in positive assortment with those who play that strategy. In this manner, k errors suffice to move to a state in which all players play the strategy associated with the Pareto efficient equilibrium. For low values of k, this means that the Pareto efficient equilibrium can be stochastically stable. The reader will see the strong similarities between these arguments and the arguments of Wu (2016) discussed in Section 3.2.2, although the papers come to the topic from differing perspectives. Bilancini and Boncinelli (2015) consider a model that differs from that of Staudigl and Weidenholzer (2014) in that there are two types of player and players of each type suffer a cost from interacting with individuals of the other type. 
In choosing a best response, players know the types of their current neighbors, but are assumed not to know the types of those who are not their neighbors and instead only know the proportions of each type in the population that currently plays each strategy. When the cost of heterogeneous interaction is high, stochastically stable states involve players segregating by type. More surprisingly, one type plays one strategy and the other type plays the other strategy. The reason that such profiles are especially stable is that, starting from such a profile, following an error, the error making player can, with high probability, infer players' types from the strategies that they are currently playing and use this information to rematch and recoordinate with those of the same type as himself. Goyal et al. (2017) consider a similar model of endogenous interaction to the above but for situations in which two players interact if and only if they both desire this. Payoffs for each interaction are similar to Figure 4[ii], except that there is an additional payoff for coordinating with oneself; and, similarly to the language game of Neary (2012) discussed in Section 4.2.2, the preferences of some players over (A, A) and (B, B) are reversed, so that these players obtain a payoff of 1 from (A, A) and a payoff of 1 + α from (B, B). Under a pairwise coalitional dynamic (see Section 2.2.2), conditions are given for the stochastic stability of the state at which every player interacts with every other player and plays the strategy corresponding to the preferred outcome of the majority of players. Bilancini and Boncinelli (2009) consider network formation in a situation akin to a multiplayer prisoner's dilemma, in which cooperation by a player decreases his payoff relative to defection but creates a benefit to all of his neighbors in the network, with the maximum number of neighbors of each player bounded above. 
Each period, a player chooses whether to cooperate or defect, following which, players have the opportunity to sever any of a randomly chosen subset of their existing links then create new links before payoffs are realized. Players sever links with defectors, so if a player chooses to defect he faces the cost of other players severing links with him. The process converges to either full cooperation, full defection or a mixture of the two, depending on parameters.
Boncinelli and Pin (2017) give a very simple model of the formation of networks in which players have a maximum degree of 1 (i.e. can be matched to at most one other player). Players get a payoff of 1 if they are matched and a payoff of 0 if they are not matched. A perturbed pairwise coalitional best response dynamic is considered under two models of perturbations. In the link-error model, a single perturbation, occurring with probability of order ε, suffices to create or destroy a link that would not otherwise be created or destroyed. In the agent-error model, errors that cause one player to lose payoff occur with probability of order ε, and errors that cause two players to lose payoff occur with probability of order ε^2. The substantive difference between the two models regards the creation of a link that benefits neither of the two players between whom the link is formed. Such an event occurs with probability of order ε under the link-error model and with probability of order ε^2 under the agent-error model. A maximal matching is a network to which no additional edge can be added without causing a player to have degree more than 1. A maximum matching is a maximal matching with the largest number of edges. It is shown that under the link-error model, the set of stochastically stable networks corresponds to the maximal matchings, whereas under the agent-error model, the set of stochastically stable networks corresponds to the maximum matchings.
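The distinction between maximal and maximum matchings is easy to check by brute force on a small example. The sketch below assumes that only some pairs of players are able to link, represented by a hypothetical list of feasible edges (here a path a-b-c-d); this restriction, and the function names, are illustrative assumptions rather than details taken from the text above.

```python
from itertools import combinations


def is_matching(edges):
    """True if no player appears in more than one edge (maximum degree 1)."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def maximal_matchings(feasible):
    """All matchings to which no further feasible edge can be added."""
    result = []
    for size in range(len(feasible) + 1):
        for subset in combinations(feasible, size):
            if not is_matching(subset):
                continue
            extendable = any(
                is_matching(subset + (edge,))
                for edge in feasible
                if edge not in subset
            )
            if not extendable:
                result.append(frozenset(subset))
    return result


def maximum_matchings(feasible):
    """The maximal matchings with the largest number of edges."""
    maximal = maximal_matchings(feasible)
    largest = max(len(matching) for matching in maximal)
    return [matching for matching in maximal if len(matching) == largest]


if __name__ == "__main__":
    path = [("a", "b"), ("b", "c"), ("c", "d")]  # hypothetical feasible links
    print(maximal_matchings(path))
    print(maximum_matchings(path))
```

On the path, the single link between b and c is a maximal matching that leaves a and d unmatched, whereas the only maximum matching pairs a with b and c with d. This is the kind of network that would be stochastically stable under the link-error model but not under the agent-error model.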

EVOLUTION OF BEHAVIOR
This section considers the evolution of behavior, broadly divided into the evolution of (i) traits that can be primarily thought of as genetic; (ii) conventions, the embodiment of culture at a societal level; and (iii) cultural values held by individuals. The general structure that tends to be followed by these models is given in Figure 7. Some work on the evolution of behavior in economic applications and as part of the Evolutionary Nash Program is deferred until Sections 5 and 6 respectively, and general results for various behavioral dynamics are deferred until Section 7.

Evolution of traits
Preferences in economics are a statistic derived from human choice behavior. However, preferences do not tell the whole story. Alice may believe that the world will end at midnight whereas Bob is a drug addict and doesn't think much about the future. Both Alice's choices and Bob's choices will exhibit low discount factors but for quite different reasons. This is fine as long as preferences are descriptive, but in the evolution of preferences literature they are usually assumed to be prescriptive. That is, they are associated with payoffs and describe a goal function. When preferences are considered as a goal function instead of simply a reflection of revealed reality, it becomes necessary to consider other factors that can affect the pursuit of these goals. Much of the work discussed in this section considers attributes such as intelligence and farsightedness that can exist together with or even lead to such a preference ordering.
A few of the models here follow the indirect evolutionary approach (Güth and Kliemt, 1998). This approach typically assumes that, given preferences that are some function of fitness, for example expression (3.2) in Section 3.1, players play a Nash equilibrium of the game given by those preferences and have their fitness determined accordingly. Generalizing, the indirect method can be regarded as a 'black box' that gives an outcome given traits, with Nash equilibrium being only one possible setting for the black box. One case in which Nash equilibrium may be an inappropriate outcome is when collective agency is possible. However, the evolution of collaboration has been discussed in Section 2.3, so will not be discussed here.
Finally, note that the most well developed literature on the evolution of traits is the literature on the evolution of cooperation. This literature tends to consider how cooperation can be sustained in prisoner's dilemmas, usually through some form of assortativity in interaction. See the beginning of Section 3.1 for a brief discussion and references to such work.

Figure 7: The evolution of behavior. From a population, individuals are matched into groups to interact (Section 3). In some cases, the entire population will constitute a single group. Groups of matched players then play a game. How the game is played within a group may depend on the traits of individuals within the group, which may be genetic (Section 4.1) or cultural (Section 4.3). How the game is played may also depend on cultural conventions based on how the game has been played in the past (Section 4.2). Strategies are reproduced through intergenerational transmission or through individuals following some rule of strategic adjustment such as imitation or best response.

Self-confirming beliefs
Gamba (2013) takes an indirect evolutionary approach to studying altruism in the centipede game, but considers self-confirming equilibria (Fudenberg and Levine, 1993) instead of Nash equilibria. There are two types of player, selfish types and altruists. It is assumed that at every decision node, altruistic types take the opposite action to that specified by the subgame perfect equilibrium (SPE) of the centipede game. Selfish types assume that all players will always take the SPE action at every subsequent decision node, so they themselves always take the SPE action. Their beliefs are never disconfirmed, so we have a self-confirming equilibrium. Altruists obtain lower payoffs than selfish types with whom they are matched, but they obtain very high payoffs when they are matched to other altruists. Hence, if there are enough altruists in the population, their population share grows under replicator dynamics and a monomorphic population of altruists is stable.

Level k thinking
Kim and Hwang (2015) also consider selfishness and altruism, but in a model of level k players (Stahl and Wilson, 1995) playing a game with negative externalities and strategic substitutes. Level 0 types play a fixed strategy. Level 1 selfish types best respond to level 0 types, selfish level 2 types best respond to selfish level 1 types, and so on. Selfish level ∞ types play the Nash equilibrium strategy. Altruistic level k + 1 types altruistically best respond, that is best respond according to an altruistic payoff function, to selfish level k types. Altruistic level ∞ types altruistically best respond to selfish level ∞ types. It is shown that, if we restrict the type space to level 0 types together with selfish and altruistic level k types for some k, then, under some conditions, a population of the selfish level k type is not evolutionarily stable, but a population of the altruistic level k type is evolutionarily stable. Mohlin (2012) considers two player, symmetric, normal form games and level k players. In this model, level 0 players uniformly randomize over the strategy space. A type-acyclic game is defined as a game such that for large enough k, level k + 1 players play the same strategy as level k players, and consequently said strategy must be a Nash equilibrium. It is shown that if the highest k in the type space plays this Nash equilibrium strategy, then the set of states at which all types present in the population play this strategy is asymptotically stable. For example, a two strategy coordination game is type-acyclic, as all players of level k, k ≥ 1 choose the same Nash equilibrium strategy, so the set of states in which every player has k ≥ 1 is asymptotically stable. Conversely, in type-cyclic games, games in which a cycle of best responses emerges as k increases, under some regularity conditions there exists a unique asymptotically stable set. 
Furthermore, if the highest k in the population is large enough, then states in this set will always include some players who do not play identically to the highest type player.
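The level k recursion behind the distinction between type-acyclic and type-cyclic games can be illustrated with a minimal Python sketch. It assumes, for illustration only, that level 0 randomizes uniformly, that each level k ≥ 1 plays a pure best response to level k − 1, and that ties are broken in favour of the lowest strategy index. The function names and the two example payoff matrices are hypothetical and are not taken from Mohlin (2012).

```python
from fractions import Fraction

def best_response(payoffs, belief):
    """Pure best response of the row player to a belief over column strategies.
    Ties go to the lowest index (an assumption of this sketch)."""
    n = len(payoffs)
    expected = [sum(payoffs[i][j] * belief[j] for j in range(n)) for i in range(n)]
    return max(range(n), key=lambda i: (expected[i], -i))

def level_k_strategies(payoffs, max_level):
    """Return the pure strategies of levels 1..max_level in a symmetric game."""
    n = len(payoffs)
    belief = [Fraction(1, n)] * n  # level 0 randomizes uniformly
    strategies = []
    for _ in range(max_level):
        choice = best_response(payoffs, belief)
        strategies.append(choice)
        # The next level believes its opponent plays `choice` for sure.
        belief = [Fraction(int(j == choice)) for j in range(n)]
    return strategies

# Hypothetical examples: a coordination game and rock-paper-scissors.
coordination = [[2, 0], [0, 1]]
rock_paper_scissors = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
print(level_k_strategies(coordination, 4))
print(level_k_strategies(rock_paper_scissors, 6))
```

In the coordination game every level k ≥ 1 chooses the same strategy, so the sequence settles immediately, whereas in rock-paper-scissors the best responses cycle through the three strategies as k increases.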

Foresight
Heller (2015) considers a model of a finitely repeated prisoner's dilemma, with sum of payoffs maximized at (C, C) (i.e. γ < 1 + β in Figure 3), in which players have limited foresight in that they do not know which will be the final period of the game. Specifically, foresight is costly and any given player has a level of foresight as part of his strategy. Type $L_1$ only knows that the game is about to end when he reaches the final period. Type $L_k$ knows k periods ahead that the game will end. There is some probability that players observe the type of their opponent. This affects play. For example, an $L_3$ type who knows that he is playing against an $L_1$ type will know when the game is due to end two periods before his opponent does. He knows that his opponent, on realizing that it is the final period, will defect and that he will do likewise. Foreseeing this, he defects in the second-to-last period as well. Similarly, if an $L_3$ type knows that he is playing against an $L_2$ type, he will defect in each of the final three periods. It is shown that combinations of $L_1$ and $L_3$ types that play tit-for-tat have good stability properties. In a population containing both of these types, $L_3$ types defect a period earlier and thus do better than $L_1$ types when playing against $L_1$ types, but $L_1$ types will sustain mutual cooperation against $L_3$ types for longer than $L_3$ types sustain cooperation against $L_3$ types.

Competing cognition
Robalino and Robson (2016) consider a finite extensive form game tree that is played repeatedly, with payoffs at each terminal node drawn randomly from a finite set of possible payoff vectors, to which new payoff vectors are occasionally added. There are two types of player, naive players, who cannot use information about other players' payoffs in making their decisions, and theory of preferences players, who can learn information about other players' payoffs and incorporate this information into their decision making. It is shown that theory of preferences players who optimize their payoffs come to dominate the population.
Heller and Mohlin (2017a) consider the evolution of preferences and cognition in an environment of symmetric two player normal form games. The cognition of a player is a natural number, with larger numbers representing higher levels of cognition. When two players, say Alice and Bob, with different cognitive levels play one another, if Alice has the higher level, then she may, with some probability, deceive Bob and choose Bob's beliefs about her actions. It is shown that in a stable state, types matched with the same type must choose an efficient (with respect to fitness) strategy profile, as otherwise there is the possibility of invasion by mutants who obtain higher payoffs in such interactions. Furthermore, if multiple types are present at a stable state, then any two types must play efficiently against one another, otherwise homogeneous (same type) interactions would give higher fitness than heterogeneous interactions and the state would not be stable against a small increase in the share of either one of the types.
It is worth remarking that the secret handshake (Robson, 1990) style arguments leading to efficiency amongst the same type in the model discussed above are a version of the assortativity arguments discussed in Sections 3.1, 3.2.1. The difference is that rather than a player having assortativity in interaction and interacting mainly with similar types, the player exhibits assortativity in behavior and behaves in one way when playing against one type and in another way when playing against another type. In fact, if we regard no interaction as a form of behavior, we can regard assortativity in interaction as a special case of assortativity in behavior.

Biases: overconfidence and endowment effects
Heller (2014b) gives a simple theory of the evolution of overconfidence. Each generation, each individual in a population chooses between a status quo technology and an individual-specific technology. Each individual observes a public signal of the likelihood that they succeed when they use the status quo technology and a private signal of the likelihood that they succeed when they use their own technology. The success probabilities of individuals using the status quo technology are correlated. Individuals of different types may be over- or underconfident regarding their own technologies, interpreting their signal as indicating an inaccurately high or low probability of success. Consequently, overconfident types will be more likely than a rational payoff maximizer to choose their own technology. The expected fitness of rational payoff maximizers will be higher than that of overconfident agents, but due to the correlation in success when using the status quo technology, average fitness of rational types will have a higher variance. Some level of overconfidence will thus lead overconfident types to have a higher expected logarithm of average fitness, which is the quantity that matters for long run population growth (Lewontin and Cohen, 1969; Robson, 1996).
Frenkel et al. (2018) consider barter trade and allow individuals to exhibit two biases, cursedness whereby a trader does not sufficiently adjust his beliefs about the quality of a trading partner's good in response to the partner's willingness to trade, and the endowment effect whereby a trader overvalues a good in his possession. These two biases can counteract one another so that in barter situations an individual with both biases can behave identically to, that is exhibit the same phenotype as, a rational decision maker. If, aside from barter situations, with small probability p other situations may be faced in which the biases cause their holders to lose payoff, then rational types can invade the population under the replicator dynamics. However, if we consider dynamics with imperfect replication (e.g. sexual reproduction) and two loci of selection (one for cursedness and one for the endowment effect), then offspring of rational types (e.g. from mating with a biased type) may be more likely to exhibit only one of the biases and thus achieve low fitness in barter situations. It transpires that populations of individuals that exhibit a rational phenotype in barter situations can only be invaded by type combinations that are close to them, so evolution can take a long time to eradicate the biases from the population. These results explicitly depend on p being low, so that biases do not harm payoffs greatly, and implicitly depend on assortativity amongst types being low, so that invading rational types do not mate too frequently with other rational types and produce rational offspring.
Open Topic 4 The concept of Nash equilibrium depends on individual incentives yet requires social coordination. Evolutionary dynamics can sometimes justify this social coordination, but this is not always the case. Consequently, even assuming individualistic decision making, Nash equilibrium is not an obvious starting point for social analysis (e.g. for use in the indirect evolutionary method). Maximin strategies, strategies which maximize the lowest possible payoff that a player could obtain by playing them, are not socially determined and so may be a better starting point (see, e.g. Rusch, 2017).
Open Topic 5 Work on the evolution of traits is vulnerable to the criticism that it is merely telling 'just so stories' (Kipling, 1902), inventing unverifiable creation myths that are not tested against alternative hypotheses. Studies often make no attempt to discuss testable implications that could potentially falsify their theories. Ideas could be borrowed from the evolutionary psychology literature, which has taken steps in this direction.

Conventions - culture embodied in society
An example of culture embodied at a collective level is a system of justice. Even if Alice and Bob fail to tell their son Colm about trial by jury, Colm will still have the right to a trial by jury should he be accused of a serious crime. The right to a trial by jury is a social fact (Searle, 1995) in that its continued existence as an institution relies upon the beliefs of those within a society, but it is a social fact that is robust to the ignorance of some members of society. Moreover, it is a convention (Lewis, 1969) in that it exists as a social fact today because it has existed as a social fact in the past.
Conventions are a powerful tool for explaining stable behavior. Moreover, the properties of and relations between conventions can help to explain which conventions may be most stable in the long run. Examples of the effects discussed in this section are easily found in everyday life as well as in economic data, such as the survey data on conventional crop sharing contracts in Illinois considered in Young and Burke (2001).

Perturbed dynamics and stochastic stability
The approach of Lewis (1969) to conventions was mathematically modeled by Young (1993a) using the methods of Foster and Young (1990) and Freidlin and Wentzell (1984). Players update their strategies according to some adaptive dynamic, but occasionally make errors in strategy choice. The adaptive dynamic will often move society to a convention, but the errors generate the possibility of occasional transitions from one convention to another.
Consider a given family of perturbed adaptive dynamics (Markov processes) indexed by a parameter $\varepsilon \geq 0$ that measures the size of the perturbations. A convention is a state which is a rest point of the dynamic when $\varepsilon = 0$. When $\varepsilon > 0$, transitions of a society from one convention to another can be studied. The exponential decay rate of a transition probability as $\varepsilon \to 0$ is typically referred to as the cost or resistance of the transition. The invariant measures of the processes, $\mu^{\varepsilon}$, give the share of time, $\mu^{\varepsilon}(x)$, that a dynamic spends at any given state $x$ in the long run. Conventions $x^*$ that have a non-vanishing probability of being visited as error probabilities approach zero, that is $\mu^{\varepsilon}(x^*) \not\to 0$ as $\varepsilon \to 0$, are referred to as stochastically stable (Foster and Young, 1990).
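How the invariant measure concentrates on a subset of conventions as perturbations vanish can be seen in a minimal Python sketch. It assumes a very special case chosen for tractability: states lie on a line, only adjacent transitions are possible, and the cost of a transition is the upward change in a hypothetical height function, so that the chain is reversible and detailed balance yields the invariant measure directly. The function name, the heights and the values of the perturbation parameter are illustrative assumptions only.

```python
from fractions import Fraction

def invariant_measure(heights, eps):
    """Invariant measure of a birth-death chain on states 0..len(heights)-1.

    From state i the chain moves to each adjacent state j with probability
    (1/2) * eps ** cost, where cost = max(0, heights[j] - heights[i]), and
    otherwise stays put. Detailed balance then pins down the measure.
    """
    def prob(i, j):
        return Fraction(1, 2) * eps ** max(0, heights[j] - heights[i])

    weights = [Fraction(1)]
    for i in range(len(heights) - 1):
        weights.append(weights[-1] * prob(i, i + 1) / prob(i + 1, i))
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical landscape; the rest points when eps = 0 are states 0, 2 and 4.
heights = [0, 2, 1, 3, 0, 1]
for eps in (Fraction(1, 10), Fraction(1, 1000)):
    mu = invariant_measure(heights, eps)
    print(float(eps), [round(float(m), 4) for m in mu])
```

In this example the weight of each state is proportional to the perturbation parameter raised to the power of the state's height, so as the parameter shrinks the mass splits evenly between the two lowest conventions (states 0 and 4), while the convention at state 2 retains only vanishing mass.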
The total cost of a path of consecutive transitions is the sum of the costs of the transitions on that path. Consider a weighted, directed graph on the set of conventions that, for any conventions x and y, includes an edge from x to y, the weight of which is the lowest total cost of any path of transitions from convention x to convention y. A spanning tree is a subgraph of this graph that contains no cycles and such that one convention, the root, has outdegree equal to zero, and every other convention has outdegree equal to one. Of all the possible spanning trees rooted at a convention x, consider one that minimizes the sum of edge weights. The stochastic potential of x equals this minimum value. Stochastically stable conventions can be shown to correspond to the conventions with the lowest stochastic potential (see, e.g. Kandori et al., 1993; Young, 1993a).
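The spanning tree construction can be made concrete with a small brute-force sketch in Python. It is intended only for a handful of conventions, since it enumerates every assignment of one outgoing edge per non-root convention and keeps those that form a tree. The function names and the cost matrix are hypothetical and chosen purely for illustration.

```python
from itertools import product

def reaches_root(successor, start, root):
    """Follow outgoing edges from `start`; True if the root is reached without a cycle."""
    seen = set()
    node = start
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = successor[node]
    return True

def stochastic_potential(cost, root):
    """Minimum total edge weight over spanning trees rooted at `root` (brute force)."""
    others = [v for v in cost if v != root]
    best = None
    for choice in product(list(cost), repeat=len(others)):
        successor = dict(zip(others, choice))
        if any(v == w for v, w in successor.items()):
            continue  # no self-loops
        if not all(reaches_root(successor, v, root) for v in others):
            continue  # contains a cycle, so not a tree
        total = sum(cost[v][w] for v, w in successor.items())
        if best is None or total < best:
            best = total
    return best

# Hypothetical least-cost transition costs between three conventions.
cost = {
    "x": {"y": 3, "z": 5},
    "y": {"x": 3, "z": 2},
    "z": {"x": 4, "y": 1},
}
potentials = {v: stochastic_potential(cost, v) for v in cost}
stable = [v for v, p in potentials.items() if p == min(potentials.values())]
print(potentials, stable)
```

With these illustrative costs, conventions x and y share the lowest stochastic potential and would therefore both be stochastically stable, while z would not.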
A typical perturbed adaptive dynamic involves strategy choice by players whose choice rule is composed of an unperturbed dynamic (e.g. best response) together with the possibility of errors (e.g. playing a non-best response).
Uniform errors are such that the cost of every error is the same, typically set equal to 1 with the probability of each error of order ε. Other errors are payoff-dependent and typically have a cost that increases in the payoff loss from making the error. For example, logit errors have a cost equal to the difference between the expected payoff of playing a best response and the expected payoff of playing the error in question. Probit errors have a cost equal to the square of this difference (see Sandholm, 2010).
If there is some ordering on the strategies such that any given player only makes errors that correspond to strategies higher in the ordering than his best response strategy, then we say that errors are intentional (Naidu et al., 2010). For example, if there exist conventions corresponding to strategies $\{s_j\}_{1 \leq j \leq n}$ and Alice attains higher payoffs at conventions corresponding to higher values of $j$, then when her best response is $s_k$, it may be that the only strategies that she will play in error are $s_j$, $j > k$. Experimental evidence on errors is presented in Section 9.4.
The total cost of a path of transitions can be considered as a combination of two factors, (i) the length of the path, that is, the number of errors on the path, and (ii) the steepness of the path, that is, how unlikely the errors on the path are to occur. These ideas are illustrated in Figure 8. In some models, selection by stochastic stability arises from differences between the lengths of paths from one convention to another, whereas in other models it arises from differences in the steepness of such paths. We shall refer to such selection as length-based and steepness-based respectively.
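The decomposition into length and steepness can be sketched in a few lines of Python, using the costs of uniform, logit and probit errors as described above. The function names, the string labels for the error models and the example path are assumptions made for illustration.

```python
def error_cost(model, payoff_loss):
    """Cost of a single error that loses `payoff_loss` relative to best response."""
    if model == "uniform":
        return 1
    if model == "logit":
        return payoff_loss
    if model == "probit":
        return payoff_loss ** 2
    raise ValueError(f"unknown error model: {model}")

def path_length_and_cost(model, payoff_losses):
    """Length (number of errors) and total cost of a path.

    `payoff_losses` lists, for each transition, the payoff loss of the move;
    a loss of zero marks a best response step, which is not an error.
    """
    errors = [loss for loss in payoff_losses if loss > 0]
    return len(errors), sum(error_cost(model, loss) for loss in errors)

losses = [0, 2, 0, 1]  # hypothetical path with two errors
for model in ("uniform", "logit", "probit"):
    print(model, path_length_and_cost(model, losses))
```

The length of the hypothetical path is two under every error model, but its total cost differs: under uniform errors only length matters, whereas under logit and probit errors the larger payoff loss makes the path steeper.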

Figure 8: The state space is the set of integers from 0 to 11. The only transitions that occur with positive probability are between adjacent states. The cost of a transition, the exponential decay rate of its probability, equals the change in height on the vertical axis if this quantity is positive. Otherwise, the cost is zero. For example, the cost of 9 → 10 is two and the cost of 10 → 9 is zero. The length of the path from x to y is one, as only a single transition on the path, 2 → 3, has strictly positive cost and is therefore an error. However, this transition is relatively steep, with a cost of three. In contrast, the path from y to z has length two as both 6 → 7 and 7 → 8 are errors, but is less steep as each error only has a cost of one, thus the total cost of the path is two. States x and y minimize stochastic potential and are thus stochastically stable.

Coordination games
In two strategy, two player coordination games, length-based selection (see Section 4.2.1) works towards risk dominance, as, by definition, it is a best response to play a risk dominant strategy against a population of possible opponents that is divided equally between two possible strategies. Uniform errors do not give steepness-based selection as all errors that occur with positive probability are equally likely. Errors that decrease according to payoff loss relative to best response (such as logit or the class of weakly payoff-dependent mistakes of Klaus and Newton, 2016) also work towards risk dominance via steepness-based selection.
Staudigl (2012) considers asymmetric two strategy coordination games in a two population environment. For large population sizes, it is shown that the cost of transitions between conventions can be estimated by the solution to a continuous optimal control problem. This methodology is used to show that, if the two populations are the same size, then under uniform errors or logit choice, the convention corresponding to the risk dominant Nash equilibrium is uniquely stochastically stable. The same can be said for probit choice under a (non-generic) condition on the payoffs of the game.

Figure 9: Two player coordination game with heterogeneous preferences. Let $\gamma_i, \gamma_j \in (0, 1)$. For each combination of A and B, entries give payoffs for the row player and column player respectively. Note that if $\gamma_i < 1/2 < \gamma_j$ or vice versa, this game is a Battle of the Sexes.
Neary (2012) discusses the language game, in which players in a finite population play a two strategy, two player coordination game against each of the other players (i.e. interaction is uniform). One type of player prefers one of the coordination outcomes and another type prefers the other (see also Goyal et al., 2017, discussed in Section 3.5). There are three possible conventions, a homogeneous convention corresponding to each strategy and, if the minority type's preferences are sufficiently strong, a heterogeneous convention in which each type plays their preferred strategy. The effect of a range of parameters on stochastic stability is studied.
Naidu et al. (2017) consider a similar two population model in which asymmetry is interpreted as due to one of the strategies corresponding to egalitarian language and the other corresponding to inegalitarian language (e.g. 'tu' vs. 'vous' in French). One of the populations has high status and does better than the other, low status, population under the inegalitarian language. They show that under uniform intentional errors (see Section 4.2.1 for definition and Sections 6.4, 9.4 for further discussion), the inegalitarian convention may be stochastically stable if the low status population is large enough relative to the high status population.
Belloc and Bowles (2013) use a similar model to explain the persistence of inferior cultural conventions, building on previous work that considers the role of conventions in sustaining poverty traps and inequality (Bowles, 2005, 2006b).
Neary and Newton (2017) also consider two strategy, two player coordination games, and allow players on an arbitrary interaction network to have individual specific payoffs for each coordination outcome. When players i and j interact, their payoffs are given by the game in Figure 9. The concept of autonomy governed by a potential function (Young, 2011, see Sections 2.4, 8.1.3) is used to make statements about long run behavior.
In particular, a class of networks, corpulent graphs, is identified such that, for large enough networks in this class, random diversity in ordinal preferences will nearly always lead to heterogeneity in behavior at stochastically stable states, regardless of the cardinal strength of the preferences. Sawa and Wu (2017) analyze a dynamic in which players in a population make choices according to reference dependent utility, with losses relative to the reference point weighted more heavily than gains (Kahneman and Tversky, 1979).
When the dynamic is perturbed by uniform errors, it is shown that, for essentially any model of endogenously determined reference points, if a symmetric two strategy, two player coordination game has a super dominant strategy, defined as a strategy that is both maximin and payoff dominant (hence also risk dominant), then the state at which every player plays this strategy is uniquely stochastically stable. This is because, when we consider the payoffs of a two by two matrix under the reference dependent payoff transformation, a payoff dominant and maximin strategy is risk dominant after the transformation for any given reference point. That is, as long as half the population plays this strategy, it will be a best response to play this strategy, no matter which reference point is used. Sawa and Wu (2016) consider a similar model, with similar intuition, that uses a weaker condition than super dominance, loss dominance (maximin and risk dominant), proving their result for a smaller class of preferences.
Bilancini and Boncinelli (2016) compare stochastically stable conventions in two strategy, two player coordination games for (i) uniform errors, (ii) payoff-dependent errors (like logit and probit), in which the cost of an error is an increasing function f(·) of the loss in expected payoff to the player making the error, and (iii) condition-dependent errors, in which the cost of an error is an increasing function g(·) of the last realized payoff of the player making the error. The effects of condition-dependent errors on steepness-based selection vary. Initial errors are hardest from payoff dominant conventions, but subsequent errors will also depend on payoffs off the main diagonal, with a higher probability of errors by players who are not playing the maximin strategy. Now, consider a model of random matching with varying probabilities of match termination.
When players rematch infrequently, it is always a best response to coordinate with one's current partner, so rematching can lead to the spread, without the aid of errors, of any action that is present in the population. Consequently, transitions from one convention to another only require a single error. In such cases, selection is then steepness-based. For the coordination game in Figure 10, this is illustrated in Figure 11. At the opposite extreme, in the standard model of rematching every period, as discussed earlier in this section, length and steepness-based selection favour risk dominance under uniform and payoff-dependent errors. For condition-dependent errors, length and steepness can work in opposing directions. Figure 12 illustrates these effects.

Figure 12: Stochastic potential under perturbed best response for coordination games under rematching every period. Panels: (i) Uniform errors; (ii) Payoff-dependent errors; (iii) Condition-dependent errors. The game in Figure 10 is played by a population of size n = 8. Updating players maximize their expected payoff over all possible opponents and the threshold between basins of attraction is approximated by $n \frac{b-d}{a-c+b-d}$ as population size $n$ becomes large. In Panel (i), as all errors have equal steepness, the convention with the longer basin of attraction, the risk dominant convention, is stochastically stable. In Panel (ii), the path exiting the risk dominant convention is not only longer, but also steeper, so it remains stochastically stable. In Panel (iii), for large enough populations, we can ignore the g(a) and g(b) terms, so that stochastic stability is determined by comparing $g(c) \frac{b-d}{a-c+b-d}$ and $g(d) \frac{a-c}{a-c+b-d}$. In this example, the difference in steepnesses g(c) and g(d) is large enough that steepness dominates length and the maximin convention is stochastically stable.
There also exist results on stochastic stability in coordination games with an arbitrary number of strategies. Stochastically stable conventions of Nash demand games are related to subsets of the core of cooperative games and are discussed in Section 6.1. Stochastically stable conventions of coordination games with zero payoff for miscoordination are related to bargaining solutions and are discussed in Section 6.4.

Communication and language
There has been considerable prior work on the evolution of language and efficient communication. Examples include Blume (1998); Blume and Arnold (2004); Blume et al. (1993, 1998, 2001); Hurkens et al. (2003); Kim and Sobel (1995); Schlag et al. (1993, 1994); Sobel (1993).
More recently, Heller (2014a) considers coordination games with a unique efficient strategy profile and a round of cheap talk before the game is played. In such settings, Demichelis and Weibull (2008) showed that when there exist messages with preexisting literal meaning corresponding to elements of the strategy set and also lexicographic lying costs (so that if the expected payoffs from a truthful message and a false message are equal, a player prefers telling the truth to lying), then the unique stable outcome is for the efficient strategy profile to be played. Heller (2014a) shows that the interpretation of this as relating to 'small' lying costs is problematic. Specifically, continuous lying costs are considered and it is shown that if these costs are small enough then inefficient equilibria persist. The method is to show that the game without lying costs has an inefficient symmetric equilibrium $\sigma^*$ that assigns positive probability to every pure strategy that is a best response to itself; therefore, nearby games (with small lying costs) have a nearby Nash equilibrium with the same support. As every message is used in equilibrium and the equilibrium is constructed so that lower payoffs are achieved when the same message is sent by both players, any invading mutant types attain low payoffs when playing one another, so $\sigma^*$ is evolutionarily stable.

Best shot and minimum effort games
Boncinelli and Pin (2012) consider stochastic stability in best shot network games. Players interact on a network and each can contribute or not contribute. For a given player, if none of his neighbors contribute, then his best response is to contribute. Otherwise, his best response is to not contribute. The game is effectively a model of threshold public good provision with a threshold of one. It is shown that, if the only type of errors in the model are errors which switch players from not contributing to contributing (each occurring with probability of order ε), then the stochastically stable states are the Nash equilibria of the static model. This is because there always exists a transition path from any Nash equilibrium to any other Nash equilibrium that, for each transition between consecutive Nash equilibria on the path, only involves a single error. In the words of Samuelson (1994), the set of Nash equilibria is a mutation-connected component. If, instead, the only type of errors in the model are errors which switch players from contributing to not contributing, then the stochastically stable states are the Nash equilibria of the static model with the maximum number of contributors. If errors in both directions are permitted at the same rate, then the results of the first case apply. Note that the directional restrictions on possible errors make them similar to uniform intentional errors as defined in Section 4.2.1, with further theoretical results described in Section 6.4 and empirics in Section 9.4.
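The static equilibria of a best shot game are easy to enumerate on a small network, which may help in reading the results above. The following Python sketch checks the best response condition stated in the text (contribute if and only if no neighbor contributes) by brute force. It does not simulate the perturbed dynamic, and the star network and function names are hypothetical.

```python
from itertools import combinations

def is_nash(network, contributors):
    """True if every player best responds: contribute iff no neighbour contributes."""
    for player, neighbours in network.items():
        neighbour_contributes = any(n in contributors for n in neighbours)
        if (player in contributors) == neighbour_contributes:
            return False
    return True

def nash_equilibria(network):
    """Enumerate all pure Nash equilibria as sets of contributors (brute force)."""
    players = sorted(network)
    return [
        set(group)
        for size in range(len(players) + 1)
        for group in combinations(players, size)
        if is_nash(network, set(group))
    ]

# Hypothetical star network: player 0 is linked to players 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
equilibria = nash_equilibria(star)
largest = max(len(e) for e in equilibria)
most_contributors = [e for e in equilibria if len(e) == largest]
print(equilibria, most_contributors)
```

On the star there are two equilibria, one in which only the center contributes and one in which all three peripheral players contribute. The latter is the equilibrium with the maximum number of contributors, which is the kind of state selected in the second case described above.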
A minimum effort game is at the opposite end of the public goods spectrum from a best shot game. Whereas in the best shot game only a single person must contribute for a good to be provided, in the minimum effort game everyone must contribute. Specifically, players each make some level of costly effort and the payoff to a player is the minimum level of effort taken by any of the players with whom he interacts, minus the cost of his own effort. There have been a few papers in recent years (Alós-Ferrer and Weidenholzer, 2014; Angus and Masson, 2010; Cui and Wang, 2016; Khan, 2014) that consider dynamic processes and minimum effort games when the set of players with whom a player interacts is not the same as the set of players he observes when he chooses his strategy. Specifically, a player may imitate the strategy of successful players with whom he does not interact. Thus, small sets of interacting players who play high effort levels can 'reproduce' themselves (i.e. their strategies) by being observed to obtain high payoffs. Similar effects can be obtained in other games, such as prisoner's dilemmas (Rivas, 2013). This is effectively a story of assortativity (see Section 3) and group selection. Individual strategies that are played in successful groups are imitated. In terms of group selection, groups of players who obtain high average payoffs (e.g. a group of cooperators who interact mainly with each other) expand via selection at a group level, even though they may be outperformed in interactions with those outside the group (e.g. against defectors).
Vega-Redondo (1997) showed that in a Cournot oligopoly game, if players imitate (with uniform errors) the strategy of the player who is achieving the highest current payoff, then production of the Walrasian equilibrium quantities, and not the Nash equilibrium quantities, is uniquely stochastically stable. This is due to spite effects, as starting from Nash equilibrium quantities, if firm i makes a single error and increases its quantity, this reduces the profits of firm i but reduces the profits of the other firms even more. Hence firm i, being relatively more successful, will be imitated the following period. Alós-Ferrer (2004) considers an amendment to the model whereby firms instead imitate the strategy that achieved the highest payoff in the previous m periods. That is, firms play the game with players in the present, but may imitate players in the past. In this model there can be a multiplicity of stochastically stable quantities in between the Nash equilibrium and Walrasian quantities. The reason for this multiplicity is that memory is assortative in the sense that strategies at period t generated the payoffs they generated due to other strategies at period t. If every player has played the Nash equilibrium quantity for as long as they can remember, and a player makes an error and plays a higher quantity, this will reduce the payoffs associated with the Nash equilibrium quantity in that period, but will not affect the payoffs associated with the Nash equilibrium quantity in previous periods (a similar assortativity across time is generated in a best response setting by a variant of fictitious play in Marden et al., 2009, discussed in Section 8.4). This makes quantities lower than the Walrasian quantity more robust in the presence of memory. Alós-Ferrer and Shi (2012) show that this effect disappears if at least one player has no memory of any strategy profile other than the current one. Such a player is immune to the lure of the payoffs of the past.

Prisoner's dilemmas
Weibull and Salomonsson (2006) consider symmetric, two player games. Individuals' fitnesses are given by a function of both their own payoff in the game and their opponent's payoff. This sometimes leads to evolutionary stability of behavior that is not a Nash equilibrium of the game. In particular, consider an extensive form game in which players play a prisoner's dilemma, following which a player who has cooperated while his opponent defected has the opportunity to pay a cost to punish his opponent. There is a set of rest points at which all individuals are cooperators and a sufficient number punish defection. The size and stability of this set is considered under a variety of different fitness functions, under which an individual's fitness can increase or decrease in the payoff of his opponent. A similar extensive form model with rewards and punishments, in which fitness is considered in equilibrium and there is no explicit transformation of payoffs into fitnesses, is considered by Herold (2012), discussed in Section 2.4.
Heller and Mohlin (2017b) consider randomly matched pairs of players who play a prisoner's dilemma with sum of payoffs maximized at (C, C) (i.e. γ < 1 + β in Figure 3) against one another. Each player in a matched pair observes a random sample of past play by their opponent against other opponents. Strategies map these observations to distributions over actions. Perturbations of the environment are considered in which some small fraction of the population comprises commitment types, where each commitment type always plays a given strategy and at least one commitment type plays a totally mixed strategy. A steady state of the unperturbed environment is said to be strictly perfect evolutionarily stable if it is a limit of evolutionarily stable states of the perturbed environment as this fraction goes to zero. It is shown that if π(D, C) − π(C, C) > π(D, D) − π(C, D), corresponding to γ > 1 in Figure 3, then the only strictly perfect evolutionarily stable state is for players to always defect, as by the assumed inequality, defecting can only discourage other players from defecting against you.
However, if the opposite inequality holds, then full cooperation can be sustained as a strictly perfect evolutionarily stable state, as equilibrium strategies can then be such that defection increases the probability that future partners will defect against you. There is an essentially unique way in which such cooperation can be maintained, with some members of the population defecting if they observe their opponent defecting at least once, and the remainder of the population defecting if they observe their opponent defecting at least twice.

Culture embodied in individuals
An example of culture embodied at an individual level is the ability to cook Yorkshire pudding. Alice and Bob can teach this skill to their son Colm. Yorkshire pudding will then persist as a cultural phenomenon as long as enough individuals know how to make it. Montgomery (2010) considers the model of intergenerational cultural transmission of Bisin and Verdier (2001). Individuals (within a continuum population) have a cultural type (from a finite set of types) that they have some probability of passing directly to their offspring via direct socialization. If that fails, then the offspring adopts a trait at random from the population. Individuals of type i choose the level of direct socialization for their offspring, trading off a quadratic cost against a desire to maximize a weighted probability over values V ij , where V ij represents the value that an individual of type i places on having an offspring of type j. The cultural distaste of an individual i for trait j is given by Montgomery (2010) shows that this model simplifies to a replicator dynamic on a game where ∆ ij is the payoff from playing strategy i against strategy j. When the number of types in a population is more than two, it is possible that the most tolerant types (low cultural distate) can go extinct. For even higher values of n, multiple equilibria and limit cycles are observed. Cheung and Wu (2018) consider the above model adapted for a continuous set of types T = [0, 1], assuming that V ij and hence ∆ ij are continuous in i and j (which implies that the assumption that T is one dimensional is meaningful). It is shown that monomorphic states are not Lyapunov stable. Furthermore, if ∆ ij is a strictly increasing function h of |i − j|, more can be said. If h is convex, then the unique Nash equilibrium (which is Lyapunov stable) is composed of the two extreme types 0 and 1. 
In fact, all rest points of the dynamic are composed of two types with equal shares of the population, but unless these types are the extreme types, these rest points are not stable. If h is strictly concave, then the abovementioned state with the two extreme types is no longer a Nash equilibrium, although it remains a rest point. Some rest points under concave h have more than two types present.
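As a rough illustration of the finite-type case, the following sketch iterates a replicator dynamic in which the payoff to type i against type j is an entry of a matrix playing the role of ∆ ij. The two-type matrix, the Euler step size and the function names are assumptions made for this example and are not taken from Montgomery (2010).

```python
def replicator_step(q, delta, dt=0.01):
    """One Euler step of the replicator dynamic dq_i/dt = q_i * (f_i - f_bar)."""
    n = len(q)
    # f_i: expected payoff to type i against a randomly drawn member of the population
    fitness = [sum(delta[i][j] * q[j] for j in range(n)) for i in range(n)]
    average = sum(q[i] * fitness[i] for i in range(n))
    q_next = [q[i] + dt * q[i] * (fitness[i] - average) for i in range(n)]
    total = sum(q_next)  # renormalize to guard against floating point drift
    return [share / total for share in q_next]


def simulate(q0, delta, steps=20000, dt=0.01):
    """Iterate the Euler scheme from the initial population shares q0."""
    q = list(q0)
    for _ in range(steps):
        q = replicator_step(q, delta, dt)
    return q


# Hypothetical payoff matrix for two types (illustrative numbers only):
# type 0 has twice the distaste for the other trait that type 1 has.
DELTA = [[0.0, 2.0],
         [1.0, 0.0]]

shares = simulate([0.5, 0.5], DELTA)
```

In this two-type example the interior rest point solves 2(1 − q) = q, so the share of the less tolerant type settles near 2/3, and both traits persist.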

Interaction of culture embodied in individuals and society
The desire of Yorkshiremen to teach their offspring about Yorkshire pudding is a convention. We can easily imagine an alternative Yorkshire in which Lancashire hotpot is a local cuisine. This shows that sometimes individual behaviors will be socially enforced and a cultural phenomenon embodied partly at a collective and partly at an individual level.
A model that combines culture embodied at the individual level (religiosity versus secularism) and culture embodied at a societal level (equilibrium levels of wearing the Islamic veil) is that of Carvalho (2012). There are two types, religious and secular types, in a continuum population. Individuals have the option to veil themselves to some degree (between 0 and 1). Veiling is a commitment device to avoid temptation. Secular types would like to give in to temptation, but religious types would like to resist temptation. Individuals care not only about their own behavior, but also about how others view their behavior. In equilibrium, religious types veil more than secular types. Individuals can also choose to spend on religious education to increase the probability that their offspring will be religious. Religious types may wish to do this, so that their offspring will adopt a higher degree of veiling and be less likely to be tempted. Regulations that mandate (typically at 0 or 1) some degree of veiling thus lead to a reduction in religiosity, as the transmission mechanism from religiosity to behavior via veiling is removed. However, the degree of veiling that is mandated will still determine how often people give in to temptation. Moreover, it is shown that if there is an option for types to segregate themselves and avoid temptation, then banning veiling can lead to a higher degree of religiosity, as individuals seek to increase the probability that their offspring will be religious, in order that they will segregate and avoid temptation. Note that voluntary segregation is a method of endogenously determining assortativity along the lines discussed in Section 3.2.

Macroeconomics, market selection and finance
Chakrabarti and Lahkar (2017a) give a model in which firms choose input levels and production depends on aggregate input, with each firm's share of production equal to their share of total input. Each firm's payoff is given by its production minus a strictly convex input cost. The aggregate production function is concave, so input by any given firm exerts a negative externality on other firms by reducing the average return on input. This model has a potential function, hence play converges to an equilibrium under the (continuous population) logit dynamic (Hofbauer and Sandholm, 2007). Due to the negative externalities in the model, equilibrium production is inefficiently high. As a consequence of this, it may occur that while the process is converging to the (unique, in this model) equilibrium, aggregate payoffs of the firms may rise before falling back down. This can occur due to states that are passed through en route to the equilibrium being closer to efficient production levels than the equilibrium is. The production function itself has a technology parameter and it is shown that when this parameter is changed, thus changing the equilibrium, boom-bust patterns of the type just discussed can be generated. Norman (2017) analyzes a variant of the model of Blume and Easley (2006, with a correction later provided by Massari, 2013), in which there are many possible paths of the economy (sequences of states of nature); there is a true probability measure over these paths; and consumers have differing beliefs over these paths. The model is one of general equilibrium. Norman (2017) assumes that the probability measure over the true state of nature in period t + 1 is determined by the consumption shares of the consumers in period t. The beliefs of consumers are bound by the same restriction. 
It is shown that a perfect-foresight equilibrium, in which there exists a consumer with rational expectations who accounts for all of the consumption in the economy, need not be neutrally or Lyapunov stable, as a change in consumption shares can lead to a change in the state probabilities. However, if the process is ergodic, then there exists an invariant probability measure over states in the distant future, over which belief selection can occur. These results are used to explore the stability of inflation paths (constant versus increasing or decreasing inflation) and liquidity traps in macroeconomic examples. For further work on market selection of beliefs, see Massari (2015, 2017) and Sandroni (2000, 2005), and for the related literature on the market selection of rules of portfolio choice see Alós-Ferrer and Ania (2005).
Foster and Young (2003) study a process of learning to approximate Nash equilibria in repeated games by hypothesis testing. Norman (2015) extends this to a model of learning rational expectations equilibria in a macro-style environment. Agents hold models of the world, given by sets of parameters, and given his model of the world, an agent will play a smoothed response that is close to a best response. Given play (data), every so often an agent will submit his model to a hypothesis test. If his hypothesis test rejects the model, he will randomly adopt another model and henceforth play accordingly. If agents' responses are close to best responses, the hypothesis tests are sensitive enough, and enough data is used for the hypothesis tests, then, in the long run, the process spends most of the time close to a rational expectations equilibrium in that agents' predictions are close to the actual outcomes and agents' responses are close to optimal.
Cho and Kasa (2014) analyze a similar model to Norman (2015), but focus on selection amongst multiple models (corresponding to equilibria) using stochastic stability, although they do not explicitly mention stochastic stability except for a reference to Kandori et al. (1993). They find the costs of transitions that escape the basins of attraction of equilibria. These escapes correspond to model rejection events after which any alternative model may be chosen with positive probability. Hence, a model which has the uniquely highest cost of escape will be uniquely stochastically stable. Freidlin and Wentzell (1984) tree arguments are not required.

Industrial organization
Since the 1990s there has been a steady flow of evolutionary work related to pricing and competition. Examples include Alós-Ferrer et al. (1999); Alós-Ferrer and Ania (2005); Alós-Ferrer et al. (2000); Tanaka (1999); Vega-Redondo (1997). Some recent work has studied the selection of market institutions (Alós-Ferrer and Kirchsteiger, 2010; Alós-Ferrer et al., 2010) and for a flavor of this see the discussion in Section 3.2.3. Recent work has also considered the impact that adaptive multi-agency decision making within organizations can have on corporate culture. This relates to the relationship between collective agency and coordination problems and has been discussed in Section 2.2.1.
Lahkar (2011) analyzes the stability of equilibria in the Burdett and Judd (1983) model of price dispersion and shows the emergence of price cycles. There are populations of buyers and sellers, each of which can be represented by the unit interval. Strategies of sellers are prices in the interval [0, 1]. The set of possible prices is discretized by Lahkar (2011) to simplify analysis. A strategy for a consumer is a number of sellers to sample. Consumers sample sellers independently and each consumer chooses to buy from the seller in her sample that offers the lowest price. Some Nash equilibria of the model involve sellers charging multiple prices. Building on the analysis of Hopkins and Seymour (2002), who show instability of equilibria in multiple prices under the replicator and similar dynamics, it is shown using simulation that, under logit dynamics, rest points that approximate multi-price Nash equilibria of the original model are unstable, and that play converges to limit cycles. This is shown analytically for the special case in which buyers can only sample one or two sellers. In a similar model, Chakrabarti and Lahkar (2017b) let sellers choose a level of technology instead of a price, with buyers sampling multiple sellers and choosing the one which offers the highest technology. 
Buyers produce an output equal to the technology level that they sample. Results are very similar to those of Lahkar (2011), but can instead be interpreted to show the emergence of cycles in productivity.
Dawid and Hellmann (2014) consider a setup in which firms choose which other firms to form R&D partnerships with, and payoffs are then given by the Nash equilibrium of a Cournot game. Each R&D partnership in which a firm is involved costs a fixed amount f and reduces the firm's marginal cost of production by a fixed amount. Note that firms care not only about how many partnerships they are involved in, but also about how many partnerships their partners are involved in, as their profits are affected by their partners' marginal costs. The network of partnerships is updated according to the perturbed pairwise better response dynamic of Jackson and Watts (2002), which is effectively the dynamic of Roth and Vande Vate (1990) augmented with random errors in the style of Young (1993a) (see Section 2.2.2). The stochastically stable states of this process are shown to have one completely connected component of the network in which every firm is connected to every other firm, with the remainder of firms being isolated and not involved in any partnerships. The size of the connected component decreases monotonically in the cost of a partnership f. A comparative static analysis shows that total profits in the industry are nonmonotonic in f. As f increases, profits decrease until the connected component shrinks and profits increase (discontinuously) due to the reduced total cost of partnership formation. The discontinuity in the comparative statics in f implies a coordination failure at values of f just below the values at which total industry profits show a discontinuous increase. The dynamic is a coalitional better response dynamic in which coalitions are restricted to sizes 1 and 2, raising the question of whether the possibility of larger coalitions (e.g. of size 4) might smooth some of these discontinuities.

THE EVOLUTIONARY NASH PROGRAM
The Evolutionary Nash Program is the study of connections between evolutionary game theory and cooperative game theory (see Figure 13). This is similar to the standard Nash Program which studies connections between noncooperative game theory and cooperative game theory (see, e.g. Nash, 1953). A cooperative game in characteristic function form is a set of players N and a characteristic function v(·) : P(N) → R+ from subsets S ⊆ N to the nonnegative real numbers. The characteristic function shows how much surplus a coalition S ⊆ N can generate. It is typically assumed that v(·) is superadditive, that is v(S ∪ T) ≥ v(S) + v(T) for all disjoint S, T ⊆ N, or supermodular, that is v(S ∪ T) + v(S ∩ T) ≥ v(S) + v(T) for all S, T ⊆ N. Note that, with v(∅) = 0, supermodularity implies superadditivity. The core is the set of allocations of surplus x = (x_i)_{i∈N} under the grand coalition N such that ∑_{i∈N} x_i = v(N) and no coalition S can obtain a higher surplus by leaving the grand coalition: ∑_{i∈S} x_i ≥ v(S) for all S ⊆ N. A coalition for which ∑_{i∈S} x_i < v(S) is known as a blocking coalition.
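The definitions of the core and of blocking coalitions can be checked by brute force for small games. The sketch below enumerates all nonempty coalitions; the three-player characteristic function and the helper names are assumptions made for illustration.

```python
from itertools import combinations


def coalitions(players):
    """Yield every nonempty coalition of the player set as a frozenset."""
    for size in range(1, len(players) + 1):
        for members in combinations(players, size):
            yield frozenset(members)


def blocking_coalitions(x, v, players, tol=1e-9):
    """Coalitions S whose members receive strictly less in total than v(S)."""
    return [S for S in coalitions(players)
            if sum(x[i] for i in S) < v(S) - tol]


def in_core(x, v, players, tol=1e-9):
    """True if x shares out exactly v(N) and no coalition blocks it."""
    efficient = abs(sum(x[i] for i in players) - v(frozenset(players))) <= tol
    return efficient and not blocking_coalitions(x, v, players, tol)


PLAYERS = (1, 2, 3)


def v_example(S):
    """Hypothetical game: singletons earn 0, pairs 0.6, the grand coalition 1."""
    return {1: 0.0, 2: 0.6, 3: 1.0}[len(S)]


equal_split = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
unequal_split = {1: 0.5, 2: 0.5, 3: 0.0}
```

Under these assumed values the equal split lies in the core, whereas the unequal split is blocked by each pair that contains player 3.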

Recontracting and Nash demand games
The question of how a society represented by a cooperative game might converge to the core is an old one. Feldman (1974) and Green (1974) give recontracting processes under which subsets of players S ⊆ N randomly meet, and if they can do better than they do under the current allocation x, that is if ∑_{i∈S} x_i < v(S), then they form a coalition with a new allocation x^S such that ∑_{i∈S} x^S_i = v(S). These processes eventually reach the core, assuming that the core is nonempty. Note that any such process suffers from the remainder problem: when a subcoalition S′ leaves a coalition S, it is not clear what should happen to players in S \ S′. Should they be left in a coalition on their own as in Green (1974)? Should they form singleton coalitions as in Feldman (1974)? Each of these makes sense in some context. For example, if eleven people constitute a discussion group, S, and two of them, S′, decide to go climbing instead, it makes sense for the discussion group to continue to meet as S \ S′. However, if S instead formed a football team, the remainder set S \ S′ no longer has enough members to form a football team, so it may make more sense for them to become singletons.
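A single recontracting step, with the remainder problem resolved by turning abandoned partners into singletons, might be sketched as follows. The equal division of the gain within the blocking coalition, the example game and all names are assumptions of this sketch; it is not the exact process of Feldman (1974) or Green (1974).

```python
def recontract(partition, x, S, v):
    """Let coalition S block the current state if it can do strictly better.

    partition: list of frozensets covering the players; x: dict of allocations.
    Returns the new (partition, allocation); unchanged if S does not block.
    """
    S = frozenset(S)
    gain = v(S) - sum(x[i] for i in S)
    if gain <= 0:
        return partition, x                   # S is not a blocking coalition
    new_x = dict(x)
    for i in S:                               # assumed rule: split the gain equally
        new_x[i] = x[i] + gain / len(S)
    new_partition = [S]
    for coalition in partition:
        leftover = coalition - S
        if leftover == coalition:
            new_partition.append(coalition)   # coalition untouched by the deviation
        else:
            for j in leftover:                # remainder problem: become singletons
                new_partition.append(frozenset([j]))
                new_x[j] = v(frozenset([j]))
    return new_partition, new_x


def v_example(S):
    """Hypothetical game: singletons earn 0, pairs 0.6, the grand coalition 1."""
    return {1: 0.0, 2: 0.6, 3: 1.0}[len(S)]


start_partition = [frozenset({1, 2, 3})]
start_x = {1: 0.5, 2: 0.5, 3: 0.0}
new_partition, new_x = recontract(start_partition, start_x, {1, 3}, v_example)
```

In this example players 1 and 3 leave the grand coalition and share 0.6, while player 2 is left alone with his singleton value. Replacing the inner loop with a single coalition made up of the leftover players would give a treatment of the remainder closer in spirit to that described for Green (1974).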
[Figure 13 diagram: the Evolutionary Nash Program links evolutionary game theory to cooperative game theory. For example, rest points of an unperturbed dynamic correspond to the core, and stochastically stable states of a perturbed dynamic correspond to a subset of the core, e.g. the least core.]
Figure 13: The Evolutionary Nash Program. Connections are made between evolutionary game theory and cooperative game theory. For example, sometimes a state space can be derived from an underlying cooperative game. For some evolutionary dynamics, the rest points will correspond to the core of the associated cooperative game. When these dynamics are perturbed, the stochastically stable states will then correspond to a (possibly strict) subset of the core.
Young (1993b) represents a two player cooperative game (N, v(·)) noncooperatively as a Nash demand game, in which the players each make a demand, and if the demands sum to no more than v(N), then they obtain their demands. If the demands sum to more than v(N), then they get nothing. Players have a utility function u(·) that increases in their allocation. Note that, because payoffs arise from individual strategies, the remainder problem disappears. It is shown that if players adjust their demands according to an individualistic best response dynamic with uniform errors, then the stochastically stable state corresponds to the Nash bargaining solution (Nash, 1950), which is within the core. Agastya (1997, 1999) expands the model of Young (1993b) to a multiplayer environment, in which a player i ∈ N may (sometimes probabilistically) obtain his demand if there exists some S ⊆ N, i ∈ S, such that the demands of the players in S sum to no more than v(S). For games with a supermodular characteristic function, convergence to the core occurs under an individualistic best response dynamic. When this dynamic is perturbed with uniform errors, the stochastically stable allocations are those within the core that minimize the allocation of the wealthiest player. Rozen (2013) adjusts strategies in the Agastya (1997) setup by allowing a player's strategy to include not only a demand, but also a list of players with whom he is willing to form coalitions. Under this setup, results of Agastya (1997) do not change. However, this change in the strategy space makes possible another approach, which is explored in Newton (2012b). Newton (2012b) implements collective agency (in the style of Feldman, 1974; Green, 1974) with individual strategies (in the style of Agastya, 1997; Rozen, 2013; Young, 1993b). This allows the separation of collective agency in decision making from collective generation of payoffs. 
For example, two players could meet and take a collective decision to not form coalitions with one another, that is to exclude one another from their sets of acceptable coalition partners. Examples along these lines are quite easy to come up with. For example, if two employees in a firm find it difficult to work with one another (perhaps their skill sets overlap too much), they may agree to request that they never be part of the same team. This would be payoff improving for both of them. The return to collective agency (compared to the individualistic rules of Agastya, 1997, 1999; Rozen, 2013; Young, 1993b) allows convergence to the core to occur for games with a superadditive characteristic function. Moreover, when the dynamic is perturbed by uniform errors, the stochastically stable state is no longer the minmax outcome as it is in Agastya (1999). Rather, when u(·) is concave and u(·)/u′(·) is convex, the stochastically stable state trades off maximizing the wealth of the poorest player against minimizing inequality amongst the remaining |N| − 1 players. In the boundary case of u(x) = ax^b, a, b > 0, only the first of these considerations matters, so stochastic stability results in 'Rawlsian' social choice, maximizing the wealth of the poorest player. Arnold and Schwalbe (2002) give a best response dynamic in which the state includes both a set of existing coalitions and demands which are satisfied for players in coalition S if they sum to no more than v(S). This reintroduces the remainder problem, which is treated in the manner of Green (1974). The dynamic is individualistic, with an updating player choosing a demand and to either join an existing coalition or to form a singleton coalition. Superadditivity is not assumed. However, to obtain convergence to the core, it is assumed that outside of the core, the strategy choice of players in any potential blocking coalition is perturbed by uniform errors, and that no errors occur within the core. 
Nax (2018) obtains convergence to the core, where it exists, treating demands as aspirations, so that players do not best respond but instead may lower their demands when they are not fulfilled and increase their demands when there is the possibility of a higher payoff. The state includes a set of existing coalitions and the remainder problem is treated in the manner of Feldman (1974). The proof is similar to the proof of convergence in Feldman (1974) in that this assumption is leveraged so that payoffs, and consequently demands, can reduce over time until a jump to a core state is possible.
All perturbations discussed so far in this section are uniform, so stochastic stability emerges from length-based selection (see Section 4.2.1 for definition).  uses logit errors and steepness-based selection. This recovers minmax selection by a different route. In Agastya (1999), all errors are equally likely and the stochastically stable state is minmax as it is the wealthiest player who changes his response most readily to errors made by others. In Newton (2012b), the wealth of the wealthiest |N | − 1 players matters (and hence, by omission, the wealth of the poorest player) as these players change their collective response most readily to errors made by the poorest player. In , the stochastically stable state is minmax as it is the wealthiest player who makes errors most easily. Like Newton (2012b),  has collective agency in the dynamic, specifically the coalitional logit choice rule discussed in Section 2.1.2, but returns to having a set of existing coalitions included in the state of the Markov process, hence reintroducing the remainder problem, which is treated in the manner of Feldman (1974).
Following work on coordination games discussed in Section 6.4, Hwang and Rey-Bellet (2017) have recently analyzed the two player model of Young (1993b) under logit errors, also finding the Nash bargaining solution to be stochastically stable. They obtain these results by extending the technical results of Hwang and Newton (2017, see Section 8.1.1) to coordination games that satisfy the marginal bandwagon property of Kandori and Rob (1998, see Section 8.1.1 for definition). They also give results under intentional errors (see Section 4.2.1 for definition).
The above approach deals with recontracting with transferable surplus. In contrast, Serrano and Volij (2008) consider a model of recontracting over discrete goods, a 'housing economy'. Each player is endowed with a single house (the endowment is fixed and does not evolve with the process) and may only have a single house at any allocation. Every period, some set of players S ⊆ N may agree to reallocate their endowment in such a way that they all strictly gain relative to the current allocation. If, when this happens, the allocation of players outside S can no longer be satisfied (i.e. it depended on the endowments of players in S), then the players outside S are allocated their endowments (this is an approach to the remainder problem). Errors involve players agreeing to reallocations that are not a strict improvement for them upon the existing allocation. It is assumed that errors where a player is indifferent between the old and new allocations are much more likely than errors which involve a strict payoff loss. The only allocation for which there does not exist S ⊂ N such that a weakly improving reallocation is possible is the competitive equilibrium allocation (Roth and Postlewaite, 1977). It follows that the competitive equilibrium allocation is uniquely stochastically stable.
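When preferences are strict, the competitive equilibrium allocation of a housing economy can be computed directly by the top trading cycles procedure. The sketch below is that standard procedure, not the recontracting dynamic of Serrano and Volij (2008); the preference lists and names are assumptions for illustration, and each list is assumed to rank every house, with house j denoting the endowment of player j.

```python
def top_trading_cycles(prefs):
    """Return a dict mapping each player to the house they end up with.

    prefs[i] is player i's strict ranking of houses, best first, where
    house j is the house initially owned by player j.
    """
    remaining = set(prefs)
    allocation = {}
    while remaining:
        # Each remaining player points at the owner of their favourite remaining house.
        points_to = {i: next(h for h in prefs[i] if h in remaining)
                     for i in remaining}
        # Follow the pointers from any player until a player is revisited.
        path, current = [], next(iter(remaining))
        while current not in path:
            path.append(current)
            current = points_to[current]
        cycle = path[path.index(current):]
        # Players in the cycle trade along it and leave with their new houses.
        for i in cycle:
            allocation[i] = points_to[i]
        remaining -= set(cycle)
    return allocation


# Hypothetical preferences: players 1 and 2 want to swap, player 3 wants house 1.
PREFS = {1: [2, 1, 3], 2: [1, 2, 3], 3: [1, 2, 3]}
result = top_trading_cycles(PREFS)
```

Here players 1 and 2 exchange houses and player 3 keeps his own, since the house he most prefers leaves the market in the first cycle.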
Open Topic 6. Skyrms (2002) finds that in (discretized) Nash demand games, the basin of attraction of the egalitarian norm increases greatly when costless signals (in the style of Lewis, 1969) are added to the strategy space. It is not clear why this is the case, particularly as signaling gives players the opportunity to coordinate on different asymmetric demand splits with different types. Analysis of simulation data suggests it is related to the effect of transient information: the information transmitted by signals first grows by evolutionary selection, but later fades as the system approaches a stable state. Uncovering the mechanisms by which such information benefits some norms over others would be interesting and important.

Transferable utility matching - the assignment game
A special case of a cooperative game is the assignment game (Shapley and Shubik, 1971). The player set can be divided into two disjoint sets, say N = F ∪ W (firms and workers), and all surplus is generated by pairs of players in which one player is in F and one is in W. That is, for any |S| = 2, v(S) > 0 implies that S = {i, j} for some i ∈ F and j ∈ W. For |S| > 2, v(S) is equal to the maximum amount of surplus that can be generated by such pairs in S.
Several studies have recently shown that, under pairwise dynamics (see Section 2.2.2) of rematching and surplus sharing, convergence to the core of the assignment problem is assured (Biró et al., 2013; Chen et al., 2016; Klaus and Payot, 2013; Nax et al., 2013). A typical dynamic involves two agents meeting every period, and if they can improve upon their current allocation by matching with one another and dividing the resulting surplus, they do so. Nax and Pradelski (2015) consider a perturbed version of such a dynamic. A player's payoff may be subject to a 'shock' with a probability that, like logit errors, is log-linear in the size of the shock. If a player, following a shock, has a payoff lower than that which he could achieve by a change of partner, then he can change his partner. This combination of shock and response adds a discrete element to the model, bringing it closer to non-transferable models of matching. Using arguments adapted from Newton and Sawa (2015), Nax and Pradelski (2015) show that, under this process, the set of stochastically stable states is a subset of the least core (Maschler et al., 1979), the set of allocations such that the minimum value of ∑_{i∈S} x_i − v(S) over all S ⊂ N is as high as possible, or equivalently such that the maximum excess v(S) − ∑_{i∈S} x_i is as low as possible. For the assignment game, the core is nonempty, and it is clear that the least core lies within the core. Similar results for many to one matchings with transferable utility are given in Nax and Pradelski (2016). Klaus and Newton (2016) analyze a different form of perturbation, allowing errors whereby the agents in a given pair remain matched yet adjust the allocation they obtain within the pair. As we shall see below, this weakens the selective power of stochastic stability relative to the model of Nax and Pradelski (2015). Errors are changes to matches and allocations that do not constitute a weak blocking by players who make them. 
This allows the possibility of errors in which no error-committing player loses allocation, such as when two players, previously matched to other partners, match and share surplus in such a way that they attain exactly the same allocation as before. This is not a weak blocking and is therefore an error. This is similar but not identical to the approach of Serrano and Volij (2008) to errors involving indifference discussed in Section 6.1. Error probabilities are assumed to be weakly decreasing in loss of allocation. If all errors are equally likely, then no selection within the core is obtained. If errors which cause no allocation loss to the player or players who commit them are more likely than errors which cause strictly positive allocation losses, then all optimal matchings occur in some stochastically stable state, but there may be strict selection of allocations amongst players who have the same partner at every optimal matching. Pradelski (2015) considers the speed of convergence to the core of an assignment game and finds a decentralized dynamic that attains this convergence in polynomial time. Each player has a changing aspiration level of utility that he seeks to meet. The result on polynomial time convergence relies on 'market sentiment', by which at any point in time, players on one side of the market (i.e. either F or W players) can accept allocations slightly below their aspiration level. Interestingly, the result only holds if market sentiment does not switch too frequently from favoring one side of the market to favoring the other side of the market. Further work that uses such state variables can be found in Section 8.4. 
In contrast, Leshno and Pradelski (2017) show that, if transitions from a given matching and allocation are such that any rematching of a firm and a worker (i) depends only on their allocations at the existing matching and (ii) is a strict blocking, then the expected time to converge to the core may increase exponentially in the number of players.

Non-transferable utility matching - marriage, college admissions
Now consider matching with non-transferable utility (Gale and Shapley, 1962). That is, the payoffs of a player depend solely on the identity of those with whom he is matched. This class of problems includes the marriage problem, in which men and women are matched in pairs; the roommate problem, in which players of any sex are matched in pairs; and the college admissions problem, in which colleges are matched to multiple students, whilst each student is only matched to one college. A college is said to have responsive preferences if its preferences over any two students are independent of the other students to whom it is matched. A college is said to have substitutable preferences if, whenever it would admit a student i from a set of students A that contains i, it would also admit student i from set B ⊂ A, i ∈ B. A matching is pairwise stable if no player can gain by leaving a current partner, and no pair of players who are not currently matched to one another can gain by matching with one another. The set of pairwise stable matchings corresponds to the core in the marriage problem, roommate problem, and college admissions problem when colleges have responsive preferences.
Paths to stability results in such settings give, sometimes implicitly, pairwise dynamics (Section 2.2.2) under which convergence to stable matchings occurs. Such results exist for the marriage problem (Roth and Vande Vate, 1990), variants thereof (e.g. Klaus and Klijn, 2007), and the roommate problem (Diamantoudi et al., 2004). Such dynamics work by randomly selecting a player or pair of players each period. If an individual is selected and can gain by leaving his or her current partner (if he or she has one) to become a singleton, then he or she does so. If a pair of players is selected and both players in the pair can gain by leaving their current partners (if they have them) and getting together, then they do so. The process is repeated until a stable matching is obtained. For college admissions problems, if a college has responsive preferences, then the problem reduces to the marriage problem by mapping each position in a college to a man, and each student to a woman. Kojima and Ünver (2008) give a paths to stability result for many to many matchings when one side of the market has responsive preferences and the other has substitutable preferences.
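The pairwise dynamic for the marriage problem can be sketched as below. This is a simplified illustration rather than the exact construction of Roth and Vande Vate (1990): it assumes that the process starts from the empty matching, that a blocking pair is drawn uniformly at random each period, and that partners missing from a preference list are unacceptable, so that no individual ever wishes to leave a partner and become single. All names and preference lists are hypothetical.

```python
import random


def prefers(prefs, player, new, current):
    """True if `player` strictly prefers `new` to `current` (None means single)."""
    ranking = prefs[player]
    if new not in ranking:
        return False                      # unlisted partners are unacceptable
    if current is None:
        return True
    return ranking.index(new) < ranking.index(current)


def blocking_pairs(match, men, women, prefs):
    """All unmatched-to-each-other pairs in which both would gain by matching."""
    return [(m, w) for m in men for w in women
            if match.get(m) != w
            and prefers(prefs, m, w, match.get(m))
            and prefers(prefs, w, m, match.get(w))]


def random_path_to_stability(men, women, prefs, rng, max_steps=10000):
    """Satisfy randomly chosen blocking pairs until none remains."""
    match = {}                            # empty matching: everyone is single
    for _ in range(max_steps):
        blocks = blocking_pairs(match, men, women, prefs)
        if not blocks:
            return match
        m, w = rng.choice(blocks)
        for player in (m, w):             # abandoned partners become single
            old = match.pop(player, None)
            if old is not None:
                match.pop(old, None)
        match[m], match[w] = w, m
    raise RuntimeError("no stable matching reached within max_steps")


MEN, WOMEN = ["A", "B"], ["X", "Y"]
PREFS = {"A": ["X", "Y"], "B": ["X", "Y"], "X": ["B", "A"], "Y": ["A", "B"]}
stable = random_path_to_stability(MEN, WOMEN, PREFS, random.Random(0))
```

With these preferences the only stable matching pairs A with Y and B with X, so every realization of the random process that terminates ends there.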
The next step is to consider perturbed dynamics. Jackson and Watts (2002) and Klaus et al. (2010) do this for uniform errors and find no selection in marriage and roommate problems respectively. That is, the entire set of stable matchings is stochastically stable. The reason for this is that, under the unperturbed dynamic, from any unstable matching, it is possible to get closer (under a suitable concept of similarity) to any stable matching of one's choice. This implies that it only ever takes one random error to exit the basin of attraction of a stable matching and eventually reach a stable matching that is closer to any given target stable matching. Freidlin and Wentzell (1984) tree arguments immediately imply that every stable matching must be stochastically stable.
These arguments are taken further by Newton and Sawa (2015), who, for marriage problems, roommate problems and college admissions with responsive preferences, expand the analysis to cover any perturbed dynamic (including uniform, logit, probit) for which error probabilities of an individual or a pair do not depend on the partnerships of other players. Let OS (for 'one shot') denote the set of stable matchings at which the probability of an error occurring is lowest. It is shown that, for any given z ∈ OS, from a stable matching x ∉ OS, following a single error, no matter what that error is, it is possible under the unperturbed dynamic to reach a stable matching y that is more similar to z than x is to z. This implies that the root of any least cost Freidlin and Wentzell (1984) tree and hence any stochastically stable state will be within OS. In particular, under logit errors, similarly to Nax and Pradelski (2015) discussed above, stochastically stable states are within a non-transferable utility version of the least core of Maschler et al. (1979).
Open Topic 7. Some interesting characterizations or examples may arise from dropping Assumption 3 of Newton and Sawa (2015) and allowing error probabilities to depend on the partnerships of players other than those making the error. For example, it may be that Alice and Diane are friends and that Alice is more likely to make errors when Diane is in an unhappy partnership and has a low current payoff.

Bargaining solutions and coordination games
Consider a two player coordination game, the main diagonal of which approximates the efficient frontier of a convex bargaining set, and that has zero payoffs off-diagonal. Let there be two populations, each corresponding to one of the positions in the game. Under an individualistic best response dynamic with uniform errors, Young (1998a) shows that stochastically stable states approximate the Kalai-Smorodinsky bargaining solution (Kalai and Smorodinsky, 1975). Newton (2012a) extends the model to a coalitional dynamic and finds that stochastically stable states approximate the Nash bargaining solution (Nash, 1950). Naidu et al. (2010) consider errors which are intentional (see Section 4.2.1) in that the only errors that a player might make are those that involve attempting to coordinate on an outcome that corresponds to a higher equilibrium payoff than the equilibrium payoff associated with his best response. Players never make errors that ask for less than their best response. Under intentional uniform errors, the stochastically stable states again approximate the Nash bargaining solution. Note that under both unintentional uniform errors and intentional uniform errors, all errors in the support of the error distribution occur with similar probability, so there is no steepness-based selection and results are driven by length-based selection (see Section 4.2.1).
[Figure 14, recoverable entries. Uniform errors: unintentional, Kalai-Smorodinsky (Young, 1998a); intentional, Nash bargaining.]
Using the technical results of Hwang and Newton (2017, see Section 8.1.1), extend the analysis to logit dynamics. Logit errors introduce steepness-based selection as well as length-based selection. Under intentional logit errors, it turns out that length-based selection is dominated by steepness-based selection, and given that players who earn higher payoffs have more to lose from making errors and miscoordinating, stochastically stable states approximate the Egalitarian bargaining solution of Kalai (1977). This contrasts with previous justifications of Egalitarianism which have usually assumed some symmetry in the problem faced (Alexander and Skyrms, 1999) or invoked ex-ante symmetry of players with respect to their position in the game (Binmore, 1998, 2005). Under unintentional logit errors (i.e. the standard logit choice rule), length-based and steepness-based selection combine and a new (piecewise) bargaining solution, the logit bargaining solution, emerges. This solution shares features of some of the existing bargaining solutions, but exhibits some curious nonmonotonicities. Results on the relationship between bargaining solutions and the stochastically stable states of coordination games under individualistic dynamics are summarized in Figure 14.
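To make the differences between the classical solutions named above concrete, the following sketch computes the Nash, Kalai-Smorodinsky and Egalitarian solutions on a grid. The frontier `frontier(x) = 2(1 − x²)` and the disagreement point (0, 0) are hypothetical choices made only for illustration; they do not come from any of the cited papers, and the logit bargaining solution is not covered.

```python
import numpy as np

def frontier(x):
    """Hypothetical efficient frontier: player 2's payoff when player 1 gets x."""
    return 2.0 * (1.0 - x ** 2)

# Grid over player 1's payoffs; the disagreement point is assumed to be (0, 0).
x = np.linspace(0.0, 1.0, 100001)
y = frontier(x)

# Nash (1950): maximize the product of the two payoff gains.
nash = x[np.argmax(x * y)]

# Kalai-Smorodinsky (1975): equalize each player's share of their ideal payoff.
x_ideal, y_ideal = x.max(), y.max()
ks = x[np.argmin(np.abs(x / x_ideal - y / y_ideal))]

# Egalitarian (Kalai, 1977): equalize the two payoff gains.
egal = x[np.argmin(np.abs(x - y))]

for name, sol in [("Nash", nash), ("Kalai-Smorodinsky", ks), ("Egalitarian", egal)]:
    print(f"{name}: player 1 gets {sol:.3f}, player 2 gets {frontier(sol):.3f}")
```

On this asymmetric frontier the three solutions select three different points, which is why the identity of the solution selected by a given error model is informative.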

BEHAVIORAL DYNAMICS
There exists a rich literature on a variety of behavioral dynamics, rules which specify the current behavior of agents in a population, given what has occurred in the past. Important earlier work includes Samuelson (1994, 1999); Bomze (1986); Friedman (1991); Hofbauer and Sigmund (2003); Hofbauer and Weibull (1996); Nachbar (1990); Ritzberger and Weibull (1995); Samuelson and Zhang (1992). This work concerns processes with both finite and infinite populations under dynamics with both discrete and continuous time steps, and sometimes the approximation of one type of process by another (Benaïm and Weibull, 2003). Here we consider some recent contributions to this literature.

Argiento et al. (2009) show that efficient signaling in a two state, two signal setting (similar to the signaling model of Lewis, 1969) can be obtained asymptotically with certainty using the urn based reinforcement learning model of Roth and Erev (1995) and Erev and Roth (1998). A sender observes the state of nature (state 1 or 2), and chooses a signal (A or B) by drawing a ball from an urn that corresponds to the state (urns 1 and 2). The receiver observes the signal and chooses an action (1 or 2) by drawing a ball from an urn that corresponds to the signal (urns A and B). If the receiver's action matches the state of nature, then the players 'win'. When they win, they each add a ball to the relevant urns. For example, if a win is attained when the state of nature is 1, the sender chooses B and the receiver chooses 1, then the sender adds a B ball to urn 1 and the receiver adds a 1 ball to urn B. This process converges to one of two possible signaling systems in which state-signal-action triplets correspond perfectly, such as when state 1 induces signal B which induces action 1 and state 2 induces signal A which induces action 2. This result contrasts, qualitatively as the models are very different, with the evolutionary stability of inefficient communication that was discussed in Section 4.2.3.
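The urn process just described is simple enough to simulate directly. The sketch below is an illustration under stated assumptions, not the construction used in the cited proof: each urn is assumed to start with one ball of each type, the two states are assumed equally likely, and the function names are hypothetical.

```python
import random

def draw(rng, urn):
    """Draw a ball type with probability proportional to its count in the urn."""
    types = list(urn)
    return rng.choices(types, weights=[urn[t] for t in types])[0]

def simulate_signaling(periods=20000, seed=0):
    rng = random.Random(seed)
    # Assumption: every urn starts with one ball of each type.
    sender_urns = {1: {"A": 1, "B": 1}, 2: {"A": 1, "B": 1}}   # state -> signal balls
    receiver_urns = {"A": {1: 1, 2: 1}, "B": {1: 1, 2: 1}}     # signal -> action balls
    wins = 0
    for _ in range(periods):
        state = rng.choice([1, 2])            # assumption: states equally likely
        signal = draw(rng, sender_urns[state])
        action = draw(rng, receiver_urns[signal])
        if action == state:                   # a 'win': both players reinforce
            sender_urns[state][signal] += 1
            receiver_urns[signal][action] += 1
            wins += 1
    return sender_urns, receiver_urns, wins

sender, receiver, wins = simulate_signaling()
print(sender, receiver, wins)
```

Inspecting the final urn contents for several seeds gives a feel for how one state-signal-action pairing comes to dominate, although a finite simulation cannot by itself establish the asymptotic result.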

Reinforcement learning
The general version of the reinforcement rule of which the above is an example involves each player, after playing a strategy, adding a number of balls (not necessarily discrete) to his urn for that strategy. Building on results that stochastic reinforcement models can be approximated by deterministic replicator dynamics (Beggs, 2005; Börgers and Sarin, 1997; Hopkins, 2002; Laslier et al., 2001), Ianni (2014) shows that if the state (in terms of the probability of players choosing strategies) is close to a strict Nash equilibrium of the underlying game, then the amount of time that has elapsed under the process (so that the urns are full enough that learning has become slow) can be chosen so that the process converges to the Nash equilibrium with arbitrarily high probability. The proof uses the fact that trajectories of the replicator dynamic that start within the basin of attraction of a strict Nash equilibrium converge exponentially fast to show that, for small ε > 0, if learning is made slow enough, the process can be made to stay within ε of the trajectory of the replicator dynamic with arbitrarily high probability.
An alternative model of reinforcement learning is the Cross Model (Cross, 1973), under which a player updates a probability vector x according to which he plays each of a finite set of actions. After choosing action i (which occurs with probability x_i) and receiving a payoff from the game, the player transfers a share of the probability mass (1 − x_i) from the other actions to action i. The share that is transferred is proportional to the payoff he received. Lahkar and Seymour (2014) extend this model to incorporate the possibility of negative payoffs. If i yields a negative payoff for the player, then a share of the probability mass x_i is transferred to the other actions. The share that is transferred is proportional to the absolute value of the payoff received. It is noted that unlike the basic Cross Model, this model cannot be approximated by the replicator dynamic. In particular, any state at which every player plays strategy i with probability one and obtains a negative payoff cannot be a rest point of the dynamic (even if it corresponds to a Nash equilibrium of the game), as negative reinforcement will then transfer probability away from this strategy. Lahkar (2017) shows conditions under which a continuum population playing a stag hunt under this dynamic will converge to all playing Stag or all playing Hare. If payoffs at the Hare-Hare equilibrium are negative and payoffs at the Stag-Stag equilibrium are positive and sufficiently large, then the state at which every player plays Stag is a globally asymptotically stable rest point. If Hare-Hare payoffs are positive, and the payoff from playing Stag against Hare is negative and greater in magnitude than the difference between Stag-Stag and Hare-Hare payoffs, then the state at which every player plays Hare is an almost globally asymptotically stable rest point, and the state at which every player plays Stag is an unstable rest point.
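A single update step of this kind of rule can be sketched as follows. Two details are assumptions introduced for the illustration rather than taken from the cited papers: payoffs are scaled to lie in [−1, 1] so that the transferred share equals the absolute payoff, and probability removed from action i after a negative payoff is split equally among the other actions. The function name `cross_update` is hypothetical.

```python
import numpy as np

def cross_update(x, i, payoff):
    """One reinforcement step after playing action i and receiving `payoff`.

    Assumptions: payoff lies in [-1, 1]; mass removed from action i after a
    negative payoff is split equally among the other actions.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if payoff >= 0:
        # Move a share `payoff` of the mass (1 - x_i) from the other actions to i.
        target = np.zeros(n)
        target[i] = 1.0
        return x + payoff * (target - x)
    # Negative payoff: move a share |payoff| of the mass x_i to the other actions.
    moved = -payoff * x[i]
    updated = x + moved / (n - 1)
    updated[i] = x[i] - moved
    return updated

print(cross_update([0.5, 0.3, 0.2], 0, 0.4))    # positive reinforcement of action 0
print(cross_update([1.0, 0.0, 0.0], 0, -0.5))   # negative payoff at a pure state
```

The second call illustrates the point made above: starting from a state in which action 0 is played with probability one, a negative payoff moves probability away from that action, so such a state cannot be a rest point.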
Other rest points, including polymorphic ones, exist when neither of these conditions is satisfied.

Mertikopoulos and Sandholm (2016a) follow Sorin (2009) and Hofbauer et al. (2009) in specifying reinforcement learning directly in continuous time. A choice map transforms a vector of scores (cumulative payoffs) from each action over time into mixed strategies. The choice map derives from maximizing the expected score minus a penalty function, the convexity of which encourages mixed strategies over pure strategies. Examples of such choice maps include the projection map (see, e.g. Friedman, 1991; Lahkar and Sandholm, 2008) and the logit map (see, e.g. Littlestone and Warmuth, 1994; Vovk, 1990), which is known to be equivalent to the replicator dynamic (see, e.g. Mertikopoulos and Moustakas, 2010; Rustichini, 1999, and citations earlier in this section). It is shown that iteratively strictly dominated strategies become extinct on solution paths and that, if penalty functions do not approach infinity at the boundaries of the simplex, then this happens in finite time. Weakly dominated strategies either become extinct or the profiles on which they are dominated become extinct. Stationary points of the dynamic correspond to Nash equilibria and strict Nash equilibria are asymptotically stable.
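The equivalence between the logit map applied to cumulative scores and the replicator dynamic can be verified in two lines. The notation here is assumed for the illustration: y_i(t) is the score of action i, u_i(x) is its current payoff, and x_i(t) is the probability placed on it.

```latex
\begin{align*}
x_i(t) &= \frac{\exp\big(y_i(t)\big)}{\sum_j \exp\big(y_j(t)\big)},
\qquad \dot{y}_i(t) = u_i\big(x(t)\big), \\
\dot{x}_i &= x_i \Big( \dot{y}_i - \sum_j x_j \dot{y}_j \Big)
           = x_i \Big( u_i(x) - \sum_j x_j u_j(x) \Big).
\end{align*}
```

The first equality in the second line follows from differentiating the logit formula with respect to time; the second substitutes the score dynamics, which gives the replicator equation.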

Imitation
Consider a setting in which two players repeatedly play a finite, symmetric game. One of the players, Alice, follows the behavioral rule imitate-if-better by which at period t + 1 she imitates the action of the other player at period t if and only if the other player obtained a higher payoff than Alice did in period t. Otherwise, Alice plays the same action at t + 1 as she did at t. Duersch et al. (2012) consider the zero-sum game defined by the payoff differences

∆(x, y) = π(x, y) − π(y, x)    (7.1)

and find conditions under which the sum of payoffs (across time) earned by a player playing against an imitate-if-better player is bounded above as time goes to infinity. When this is the case, they say that imitation is not subject to a money pump. An imitation cycle is a cycle of action pairs

(x_0, y_0), (x_1, y_1), ..., (x_n, y_n) = (x_0, y_0)    (7.2)

such that the non-imitative player always outperforms the imitator, ∆(x_t, y_t) > 0, and the imitator imitates, so y_{t+1} = x_t. It is shown that imitation is subject to a money pump if and only if there exists an imitation cycle. An imitation cycle such as (7.2) can be expanded into a cycle whereby a single player improves their payoff at each step,

(x_0, y_0), (x_0, y_1 = x_0), (x_1, y_1), ..., (x_{n−1}, y_{n−1}), (x_{n−1}, y_n = x_{n−1}), (x_n, y_n) = (x_0, y_0).    (7.3)
Hence the absence of such a payoff improving cycle implies the absence of an imitation cycle. A game is a generalized ordinal potential game if and only if there does not exist such a payoff improving cycle (Monderer and Shapley, 1996), therefore in such a game the imitator cannot be subject to a money pump. It is also shown that if there exists an ordering on the strategy set such that payoffs are quasiconcave in a player's own strategy, then no money pump exists.

Sawa and Zusai (2014) consider (continuum) population games with imitation rules according to which a revising player who is playing i considers an alternative strategy j drawn at random according to the strategy distribution given by the population state. With some strictly positive state dependent probability the player switches from i to j. Players can be of different types and the switching probability is assumed to be formed by adding together two nonnegative terms. The first term does not vary across types and can vary according to the identity of the strategy pair {i, j} under consideration. This is standard. The second term differs across types but does not depend on {i, j}. The second term, by inducing positive imitation independently of i and j, moves all types towards the average distribution of strategies in the population. Over time, the process converges towards the set of states at which there is an equal proportion of each type playing each strategy (the Wright manifold) and the process approximates standard imitative dynamics without type diversity. It is interesting to note that the variability of the non-type dependent term in the switching rate according to the strategy pair {i, j} makes these generalized replicator dynamics possibly nonmonotonic in that strategies that generate higher payoffs need not have higher growth rates. Consequently, even a strategy that is strictly dominated by another (pure) strategy may survive indefinitely (Sethi, 1998).
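Returning to the imitate-if-better rule of Duersch et al. (2012), rock-paper-scissors gives a minimal example of an imitation cycle and hence of a money pump. The sketch below is an illustration only: the payoffs (win 1, lose −1, tie 0) and the function names are assumptions, and the opponent is assumed to always play the action that beats the imitator's current action.

```python
BEATS = {"R": "S", "P": "R", "S": "P"}   # key beats value

def payoff(own, other):
    """Rock-paper-scissors payoff: win 1, lose -1, tie 0 (assumed values)."""
    if own == other:
        return 0
    return 1 if BEATS[own] == other else -1

def delta(x, y):
    """Relative payoff Delta(x, y) = pi(x, y) - pi(y, x), as in (7.1)."""
    return payoff(x, y) - payoff(y, x)

def exploit_imitator(periods, imitator_start="R"):
    """Play the action that beats the imitator's current action in every period."""
    counter = {loser: winner for winner, loser in BEATS.items()}
    y = imitator_start
    total, path = 0, []
    for _ in range(periods):
        x = counter[y]            # opponent's action beats the imitator's action
        path.append((x, y))
        total += payoff(x, y)
        if delta(x, y) > 0:       # imitate-if-better: copy the more successful action
            y = x
    return total, path

total, path = exploit_imitator(6)
print(total, path)
```

The sequence of action pairs repeats every three periods with ∆(x_t, y_t) > 0 throughout, so the opponent's cumulative payoff grows without bound. Rock-paper-scissors is not a generalized ordinal potential game, which is consistent with the result stated above.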
Mertikopoulos and Sandholm (2016b) consider a class of dynamics, Riemannian game dynamics, in which the trajectory of the dynamic is the maximizer of an expression equal to a gain function minus a cost function. For a given population state and a trajectory vector that implies a direction and speed of motion, the gain function equals the scalar product of the vector of payoffs from each action and the trajectory vector. It takes higher values for trajectories that increase the share of the population taking actions currently associated with high payoffs and grows linearly with the speed of motion. For a given population state, the cost function is a positive definite quadratic form, so that it increases quadratically in the speed of motion for any given direction. These cost functions correspond to Riemannian metrics on the state space. For example, the dynamic associated with the Euclidean metric is the projection dynamic (Nagurney and Zhang, 1997) and the dynamic associated with the Shahshahani metric (Shahshahani, 1979) is the replicator dynamic. Results on existence and uniqueness of solution trajectories are derived and it is shown that Riemannian game dynamics satisfy positive correlation, a form of payoff monotonicity. When a Riemannian metric is the Hessian matrix of a convex function, dynamics are similar to those derived for reinforcement learning models by Mertikopoulos and Sandholm (2016a, see discussion in Section 7.1). For this class, it is shown that evolutionarily stable states are asymptotically stable and, for contractive (also known as negative semi-definite) games, a Lyapunov function is derived.

Laraki and Mertikopoulos (2013) consider imitative dynamics in which, rather than the growth rates of strategies in a population being determined by payoffs, the rate of change of such growth rates, or even higher order derivatives, are determined by payoffs.
One difficulty with analyzing such dynamics has been that relevant second order optimization methods are defined for unconstrained problems rather than for problems constrained by a strategy space. Flåm and Morgan (2004) tackle this problem by projecting velocities of orbits onto a subspace of admissible directions. Recognizing that projective methods can lead to discontinuous dynamics, Laraki and Mertikopoulos (2013) instead propose a solution based on the equivalence between the replicator dynamic and logit choice according to cumulative payoff scores used in reinforcement learning (see Section 7.1 for more on this equivalence). That is, they consider dynamics on unconstrained spaces of cumulative scores, including second and higher order cumulative scores created from summing lower order cumulative scores. An equivalence exists between choice based on these higher order scores and higher order variants of the replicator dynamic. Iteratively strictly dominated strategies go extinct under these higher order dynamics and this happens n orders of magnitude as fast under the nth order replicator dynamic as it does under the first order (i.e. standard) replicator dynamic. Moreover, unlike the standard replicator dynamic, for n ≥ 2, if the dynamic starts at rest, then weakly dominated strategies are guaranteed to go extinct. To see this, consider n = 2. If, for Alice, strategy i weakly dominates strategy j, then the acceleration towards strategy i will be greater than the acceleration towards strategy j. If the process starts from rest, this will give the dynamic a greater velocity towards strategy i relative to strategy j. This velocity advantage will be maintained even if the strategy profiles at which i has a strict advantage over j go extinct so that acceleration towards i and j becomes the same. This argument cannot be made iteratively for the following reason.
It may be that following the elimination of Alice's strategy j, Bob's strategy k weakly dominates his strategy l. However, in the time taken for j to be eliminated, it may be the case that strategy l has gained a considerable velocity advantage relative to strategy k. Consequently, the previous argument does not work. Analogous results to those regarding stability and convergence of the standard replicator dynamic are given for the higher order dynamics.

Hofbauer et al. (2009) consider the fact that logit choice according to cumulative payoffs over time is a solution of the replicator dynamic (see Section 7.1). Cumulative payoffs over time are equal to average payoffs over time multiplied by time t, so logit choice according to cumulative payoffs can be seen as logit choice according to time average payoffs multiplied by t. As t diverges to infinity, this approximates a best response to time average payoffs (we can think of t as equivalent to 1/η, with η measuring perturbations from best response). However, best responses to time average payoffs are solutions to a process in which time average payoffs evolve according to a best response dynamic. Together, these results imply that the time average of the replicator dynamic is a perturbed solution to the best response dynamic, with the magnitude of perturbations vanishing over time. Accordingly, results can be derived on the long run behavior of the replicator dynamic in terms of the long run behavior of the best response dynamic on time average payoffs.

Sampling equilibrium and best experienced payoff dynamics
Osborne and Rubinstein (1998) model procedurally rational choice (Simon, 1978) in the following manner. Consider a symmetric game with a finite set of actions A = {a_1, ..., a_m}. Let p be a probability distribution over A. Suppose that a player samples each of the actions in A once, playing the sampled action against an opponent who plays the mixed strategy given by p. Each of the actions in A is thereby associated with a realized payoff. The player who is sampling selects the action with the highest realized payoff (with uniform tie breaking). Letting w_i(p) be the probability that this decision rule selects action a_i, a sampling equilibrium is a mixed strategy p* that satisfies w_i(p*) = p*_i for all i. Osborne and Rubinstein (1998) suggest that these equilibria can be interpreted as rest points of a process whereby players in a large population are randomly matched to play the game, any given player in the population plays the same action every time he plays, and players entering the population sample each action and select one as described above.

Sethi (2000) formalizes the idea of the dynamic proposed above with a family of dynamics according to which, if the current distribution of actions in the population is p, then the share of action a_i in the population increases if and only if w_i(p) > p_i. Under these dynamics, even strict Nash equilibria need not be stable, in contrast to payoff monotone dynamics (Weibull, 1995). Furthermore, it can be the case that the only stable rest points are sampling equilibria in which strictly dominated strategies are played with positive probability. A symmetric action profile (a_i, a_i, ..., a_i) is inferior if, for every action a_j ≠ a_i, there exists a_k ≠ a_i such that a player's payoff from playing a_k when one opponent plays a_j and the other opponents play a_i is strictly greater than his payoff at the symmetric action profile (a_i, a_i, a_i, ..., a_i).
For games with three or more players, if a sampling equilibrium p* corresponds to the play of a strict Nash equilibrium that is an inferior action profile, then p* is unstable. We can connect this observation to the literature discussed in Section 2 by noting that if, for every alternative action a_j, a symmetric strict Nash equilibrium in a_i has a profitable pairwise coalitional deviation involving a_j, then the Nash equilibrium must be inferior.

Recent work by Cárdenas et al. (2015) fits a model of sampling equilibrium to experimental data on games with negative externalities, replicating features of the data despite being parameter free. Mantilla et al. (2017) study public goods games under the above dynamics. It transpires that Nash equilibrium profiles, even in dominant strategies, are unstable for a broad range of payoff specifications. Instability of non-contribution (resp. full contribution) can arise when a player samples the action of contribution, encounters a mutant amongst his opponents who contributes (resp. does not contribute), and misattributes the positive (resp. negative) externality from this opponent's contribution (resp. non-contribution) to his own contribution. The precise conditions under which these effects lead to instability depend on a property of the binomial distribution.

Sandholm et al. (2017) refer to the above dynamic and its variants as the best experienced payoff dynamics. In particular, they note possible alternatives for three parts of the dynamic: (i) the possibility of only sampling a subset of alternatives rather than every alternative; (ii) the size of the sample taken for each action, which as the above cited papers already discuss, can be more than one; and (iii) the tie breaking rule when several strategies give equal expected payoffs (calculated from the sample). It is emphasized that (iii) is more important in extensive form games than in normal form games, in which it generically has no effect.
It is noted that these dynamics are characterized by systems of polynomial equations with rational coefficients and a discussion is given of concepts and methods relevant to solving such equations. Computer assisted proofs are used to analyze the centipede game (Rosenthal, 1981) when all strategies are sampled once and the tie breaking rule is to choose the strategy that defects and ends the game at the earliest decision node, thus making it more difficult to achieve cooperation. Under this rule, for centipede games with at least three decision nodes, the equilibrium corresponding to the backward induction solution is unstable. Furthermore, for centipede games with three to six decision nodes, there also exists an asymptotically stable interior rest point of the dynamic. Note that, as every Nash equilibrium of the centipede game involves the game ending after the first decision node, this stable rest point does not correspond to a Nash equilibrium. This result continues to hold even for moderately large sample sizes. However, if a best experienced payoff dynamic samples all strategies a large number of times it approaches a best response dynamic, under which, as we shall now see, results differ.

Best and better response
Xu (2016) considers generic, finite, extensive-form games of perfect information and shows that every solution trajectory of the continuous best response dynamic (Gilboa and Matsui, 1991) converges to some Nash equilibrium component (a set of Nash equilibria with the same outcome). It is further shown that, from any interior initial state, the dynamic converges to the backwards induction solution for the extensive form game. Furthermore, this last result also holds for approximate best response dynamics that are sufficiently close to the standard best response dynamic. An approximate best response dynamic is one in which updating players may play best responses to strategy profiles which are close to the current strategy profile as well as best responses to the current profile itself. Zusai (2017b) defines the tempered best response dynamics, amending the continuous time best response dynamic for continuous populations. The idea is that the further the current payoff of a player is from his best response payoff, the more likely he should be to update his strategy to a best response. Hence, under these dynamics, the revision rate is strictly increasing in payoff gain from switching to a best response. Furthermore, the revision rate is zero when there is zero payoff gain, hence all Nash equilibria are rest points of the dynamic, even if they are not strict. Similar stability results to those that hold for standard best response dynamics (for potential games etc.) continue to hold. A similar dynamic is considered for finite populations in Kuzmics (2011), discussed in Section 8.1.4. Zusai (2018) generalizes the analysis to a broader class of dynamics in which the set of actions from which a player can choose at any given revision opportunity is allowed to vary and shows asymptotic stability of (regular) evolutionarily stable states (Taylor and Jonker, 1978).
Building on the work of Ritzberger and Weibull (1995), Balkenborg et al. (2013) define a generalized best-reply correspondence as a correspondence that, independently for each player, maps mixed strategy profiles to sets of mixed strategies that contain at least one best response. Under some continuity assumptions, the set of such correspondences can be considered as a lattice with a minimal element, σ, that coincides with a standard best response correspondence on a dense set of strategy profiles. The set of independent strategy mixtures over given subsets of the pure strategies is called a minimal asymptotically stable face if it is asymptotically stable under some generalized best-reply correspondence but does not contain any other such set. Inclusion relations are given between these faces and variants of existing set-valued solution concepts, which are defined either with reference to σ or with reference to the standard best response correspondence. Leslie et al. (2017) consider two player, zero-sum, discounted payoff, stochastic games, in which in every period there is a state of nature (from some finite set) that determines available actions and payoffs. The actions chosen by the players affect the probability of the state of nature in the subsequent period. A stationary strategy specifies a mixture over actions for each state of nature. From such a game, an auxiliary game can be derived for each state of nature (Shapley, 1953). In the auxiliary game, continuation payoffs from each state of nature in the future are assumed to be fixed. Of course, when the players optimize, these continuation payoffs are consistent with optimization in every auxiliary game. Leslie et al. (2017) define a continuous time best response dynamic for such games. Under this dynamic, the state space comprises both stationary strategies and the continuation values according to which best responses are calculated. 
Strategies evolve according to a best response dynamic defined on the payoffs in the auxiliary game, taking continuation values as given. Continuation values evolve by moving closer to the expected discounted payoffs that players believe they will obtain by best responding in the auxiliary game. The dynamic is designed so that continuation values adjust at a slower rate than strategies. It is shown that, under this dynamic, strategies converge to the set of stationary optimal strategies of the game, and that both continuation payoffs and expected discounted payoffs converge to the same value, which is consistent with the strategies.
A finite game is weakly acyclic (Young, 1993a) if, from any strategy profile, there exists a path to a Nash equilibrium on which, at each step of the path, the strategy of only one player is altered and this player gains payoff as a consequence. Young (1993a) showed that, when a game is weakly acyclic and each position in the game is occupied by one player, we can expect play to converge to Nash equilibrium under best response dynamics, provided that strategy updating by players is not so predictable that absorbing cycles exist. Arieli and Young (2016) show that similar convergence can be expected when each position in the game is occupied by a finite population, the players of which update their strategies using pairwise comparison dynamics (i.e. better response). However, the time until such convergence occurs can grow exponentially with population size. The reason for this is that in order to leverage the weak acyclicity assumption, a sufficient amount of homogeneity in the play of at least some of the populations is required, and it may take a long time for this to occur. To achieve this homogeneity faster, it is assumed that for any strategy pair {i, j} corresponding to a position in the game, the ability of any member of the corresponding population to switch from i to j is controlled by a state variable α_ij, which switches on and off at random. This induces sufficient homogeneity that, starting from any state, the process will be, on average, close to Nash equilibrium over a long enough time period, no matter how large the population is. See Section 8.4 for further examples of the use of state variables to achieve desirable outcomes. Also see Hurkens (1995) for results on convergence under best response dynamics to sets of strategies that are closed under rational behavior (Basu and Weibull, 1991).

Babichenko (2013a), building on Babichenko (2013b), considers aggregative games.
The games are aggregative in that the payoff of a player only depends on his own strategy and an aggregate statistic based on the strategies of all of the players. A common example is Cournot oligopoly, in which a player's strategy is a production quantity and aggregate production determines the price (see Section 4.2.5 for results on conventional behavior in such settings). A best react function differs from a best response in that it does not take into account the effect of a player's strategy on the aggregate statistic. However, if this effect is small, the best react function can approximate the best response function (Alos-Ferrer and Ania, 2005). Babichenko (2013a) considers a dynamic whereby players best react to an approximation to the aggregate statistic x given by the highest integer multiple of small ε that is lower than x. This makes the best react correspondence constant on small intervals of the aggregate statistic, which is then used to show that the dynamic converges to an approximate Nash equilibrium in a number of steps of order at most n log n, where n is the number of players and is assumed to be large.
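The effect of rounding the aggregate down to a multiple of ε can be seen in a small Cournot-style sketch. Everything specific here is an assumption made for illustration and is not taken from Babichenko (2013a): inverse demand is a − X with X the aggregate, each player has quadratic cost c·q²/2, and the function names are hypothetical.

```python
import math

def discretize(aggregate, eps):
    """Largest integer multiple of eps not exceeding the aggregate.
    Assumption: an exact multiple maps to itself (up to floating point rounding)."""
    return math.floor(aggregate / eps) * eps

def best_react(aggregate, a=1.0, c=1.0):
    """Quantity maximizing (a - aggregate) * q - c * q**2 / 2 when the price
    a - aggregate is treated as fixed, i.e. ignoring the player's own effect
    on the aggregate."""
    return max(0.0, (a - aggregate) / c)

def approximate_best_react(aggregate, eps, a=1.0, c=1.0):
    """Best react to the rounded-down aggregate rather than to the aggregate."""
    return best_react(discretize(aggregate, eps), a, c)

# Aggregates in the same eps-interval induce the same choice.
print(approximate_best_react(0.31, 0.1), approximate_best_react(0.39, 0.1))
print(best_react(0.31), best_react(0.39))
```

Because the rounded aggregate is constant on each interval of length ε, so is the induced choice, which is the property the convergence argument described above relies on.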
In two strategy symmetric games, a best response strategy will also be favored by the replicator dynamic, giving identical basins of attraction under the best response dynamic and the replicator dynamic. Golman and Page (2010, see also erratum) show that there exist three strategy games with arbitrarily small overlap between basins of attraction. For example, the basin of attraction of strategy i under the best response dynamic can include 1 − ε of the state space, while the basin of attraction of strategy j under the replicator dynamic also includes 1 − ε of the state space. A parameterized sequence of such games is given in which ε → 0 as the parameter becomes large.
Open Topic 8 There is broad scope to develop dynamics that incorporate multiple agency (Section 2) for continuum populations. Such models will answer questions such as: (i) What are sensible ways to model multiple agency (e.g. coalitional updating) in an environment in which interaction is driven by random matching? (ii) When will the presence of multiple agency give different results to individualistic models and when will it make no difference?

Continuous strategy sets
Several recent papers have considered dynamics on games with continuous strategy sets. In the case where the population is a continuum, the population state is then a probability measure over the strategy set. Typically, these papers show existence of a solution trajectory under some continuity assumptions on the payoffs and the dynamic, then show convergence results for some classes of games. Cheung (2014) considers pairwise comparison dynamics (i.e. better response), by which any given player considers some alternative strategy (chosen from an exogenous distribution) to her current strategy, and may switch if the alternative strategy gives a greater payoff. Convergence results are proven for potential games and negative semi-definite (i.e. contractive) games. Cheung (2016) considers imitative dynamics such as the replicator dynamic, in which the alternative strategies that may be imitated are determined by the state. Specifically, the distribution over alternatives that was exogenous for the pairwise comparison dynamic is set equal to the state. Convergence to restricted equilibria, Nash equilibria of the game after some subset of strategies is removed, is shown for potential games. These two papers consider weak convergence of measures, as it seems reasonable that states in which every member of the population plays a strategy that is close to strategy x (under some metric that makes sense given the model), can be thought of as close to the state at which every member of the population plays strategy x. Lahkar and Riedel (2015) consider the logit dynamic, referring to a fixed point of this dynamic as a logit equilibrium. As logit choice is stochastic, these equilibria are diffuse (non-atomic), so strong convergence of measures is not too strong a convergence concept. Results on convergence to logit equilibria are shown for potential games and negative semi-definite (i.e. contractive) games. 
Perkins and Leslie (2014) consider a model of stochastic fictitious play in which players play according to the logit choice rule after forming their beliefs according to a fictitious play rule (see, e.g. Benaïm and Hirsch, 1999; Brown, 1951; Fudenberg and Kreps, 1993; Kaniovski and Young, 1995). Convergence of such a process to the logit dynamic is shown and a convergence result given for negative definite games. The paper also considers the case where measures on the strategy space represent mixed strategies rather than population states. In this case, they show convergence to logit equilibria under stochastic fictitious play for two player zero-sum games with continuous action sets.
Newton (2015) considers stochastic stability (see Section 4.2.1) on general state spaces. Finite state space models of stochastic stability (Kandori et al., 1993; Young, 1993a) dispensed with the requirement for the many regularity assumptions (compactness, continuity, bounded convergence) found in the prior literature (Freidlin and Wentzell, 1984; Kifer, 1988). Newton (2015) returns to infinite state spaces, but considers a finite set of orders of magnitude of transition probabilities. This allows a weakening of assumptions so that, for example, best response dynamics can be considered even when best response correspondences are discontinuous. Conditions are given under which the standard tools of stochastic stability can be used.

Completely uncoupled dynamics
A dynamic is uncoupled if the strategy choice of a player does not directly depend on the payoffs of the other players (see Hart and Mas-Colell, 2003). Examples include best response dynamics, better response, and fictitious play. A dynamic is completely uncoupled if the strategy choice of a player does not depend on any information about the other players (see Foster and Young, 2006). That is, when a player updates his strategy he does not take into account the payoffs or actions of other players, nor even the number of other players in the game. One agenda in this literature is to give simple dynamics that lead to convergence to Nash equilibria or approximate Nash equilibria. One example is regret testing (Foster and Young, 2006;Germano and Lugosi, 2007) under which each player plays some current mixed strategy most of the time, but occasionally chooses a random action. Every so often, a player compares the average payoff from his strategy to his average payoffs from other actions and if the latter exceeds the former by some tolerance level τ > 0, then he randomly chooses another strategy. A further example is trial and error learning (Pradelski and Young, 2012;Young, 2009b) in which a player's proclivity to experiment with new actions is determined by a state variable, the player's mood (see Section 8.4 for further discussion).
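The regret testing step described above can be sketched as follows. This is a simplified illustration rather than the procedure of Foster and Young (2006): the function names, the bookkeeping of payoffs in lists, and the uniform redraw over the simplex are all assumptions. Note that the only inputs are the player's own realized payoffs, in keeping with complete uncoupledness.

```python
import random

def fails_regret_test(own_payoffs, experiment_payoffs, tolerance):
    """Return True if some action's average experimental payoff exceeds the
    average payoff from the current strategy by more than `tolerance`.

    own_payoffs: payoffs realized when the current mixed strategy was played.
    experiment_payoffs: dict mapping each action to the payoffs realized in
        the occasional periods in which that action was tried at random.
    """
    baseline = sum(own_payoffs) / len(own_payoffs)
    for payoffs in experiment_payoffs.values():
        if payoffs and sum(payoffs) / len(payoffs) - baseline > tolerance:
            return True
    return False

def random_mixed_strategy(actions, rng):
    """Draw a new mixed strategy uniformly from the simplex (an assumption)."""
    weights = [rng.expovariate(1.0) for _ in actions]
    total = sum(weights)
    return {a: w / total for a, w in zip(actions, weights)}

def regret_testing_update(strategy, own_payoffs, experiment_payoffs, tolerance, rng):
    """Keep the current strategy unless the regret test fails."""
    if fails_regret_test(own_payoffs, experiment_payoffs, tolerance):
        return random_mixed_strategy(list(strategy), rng)
    return strategy
```

A player whose experiments never beat his current strategy by more than the tolerance keeps that strategy indefinitely, which is what allows play to settle near equilibrium.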
Another agenda is to describe possibility and impossibility results, that is, to answer the question of whether completely uncoupled dynamics can achieve certain types of convergence. Building on previous literature (e.g. Foster and Young, 2001; Hart, 2011; Hart and Mansour, 2010; Hart and Mas-Colell, 2000, 2003; Young, 2007), Babichenko (2012) gives bounds on what convergence is possible under completely uncoupled dynamics. Defining a completely uncoupled strategy mapping as a mapping from sets of possible actions to completely uncoupled learning rules, he shows that there is no completely uncoupled strategy mapping that leads to almost sure convergence of play to pure Nash equilibrium in every finite generic game that has a pure Nash equilibrium. However, if action sets are distinct, in that any two players' action sets are disjoint, then such a strategy mapping exists. Moreover, if players are allowed to condition behavior on either their own identity (their index in the player set) or on the number of players in the game, then such a strategy mapping exists. If learning rules are restricted to have finite memory, then the negative results of the paper continue to hold, whereas the positive results continue to hold if all possible payoffs can be encoded in finite memory.
Minimal cost spanning tree methods for finding stochastically stable states (see Young, 1998b) are conceptually simple and, given least cost transition paths between conventions, can be found in polynomial time using Edmonds' algorithm (Chu and Liu, 1965; Edmonds, 1967). Finding least cost transitions, however, can be tricky. Often the number of conventions remains the same as the size of a finite population increases, but the number of paths between conventions increases exponentially.
To find the cost of least cost transitions between conventions, Sandholm and Staudigl (2016), building on Staudigl (2012), show that, as a finite population grows large, transition costs can be approximated by solutions to continuous optimal control problems. They illustrate the method by analyzing stochastic stability under logit choice of three strategy coordination games that satisfy the marginal bandwagon property of Kandori and Rob (1998), under which, for strategies i, j, k with i ∉ {j, k}, we have that π(i, i) − π(i, k) > π(j, i) − π(j, k). That is, players of strategy i gain most from playing against strategy i compared to playing against another strategy k. A related analysis considers two populations, members of which are matched to play coordination games with an arbitrary number of strategies and zero payoff off-diagonal, and who make decisions according to the logit choice rule. For large populations, the costs of escaping the basins of attraction of conventions are estimated by explicitly constructing bounding functions. See Section 6.4 for an application of these results to the relationship between bargaining solutions and stochastic stability in coordination games.

Cyclic decomposition
Freidlin and Wentzell (1984, Chapter 6.6) show that there exists a cyclic decomposition of a perturbed adaptive dynamic. The underlying idea is simple and we illustrate using an example. Assume that the dynamic without perturbations converges to the set of conventions {v, w, x, y, z} and that the costs of least cost transitions between these conventions are given by Figure 15. From each convention, consider the least cost transition to another convention, illustrated by directed edges between conventions in Figure 16, in which the edges are labeled with their associated cost. This decomposes the state space into cycles. The process will, in expectation, spend a long time within a cycle before it transits to another cycle. The cycles themselves can be treated as stable sets with associated transition costs. The transition cost from the set {v, w, x} to {y, z} is determined by finding the lowest cost tree on {v, w, x, ξ}, rooted at ξ, for some ξ ∈ {y, z}. The reader can check that this tree is {v → w, w → x, x → y}, which has a cost of 14. To find the cost of the transition from {v, w, x} to {y, z} we subtract from this cost the cost of the lowest cost tree on {v, w, x}, which is {w → x, x → v} rooted at v, which has a cost of 4. Hence the cost of the transition from {v, w, x} to {y, z} is 10, as illustrated in Figure 16. The cost of the transition from {y, z} to {v, w, x} can be similarly determined to be 7. Economists are likely to be familiar with such modified cost arguments via the finite state space exposition of Ellison (2000), who gives corollaries of the hitting time results of Freidlin and Wentzell (1984) and the finite state space stochastic stability characterization of Young (1993a).
Figure 16: Cyclic decomposition. The unperturbed dynamic converges to conventions in the set {v, w, x, y, z}. A directed edge from a convention corresponds to a least cost transition from this convention as per Figure 15. This decomposes the set of conventions into two sets, {v, w, x} and {y, z}. Transition costs between these two sets are determined as described in the main text.
The above shows that, for small perturbations, the process will spend much more time in {v, w, x} than in {y, z}, and within {v, w, x} will spend much more time at convention v than at any other convention. The grouping of states into cycles, which in turn can be grouped together in metacycles and so on, gives an idea of the medium term behavior of the dynamic process. The cyclic decomposition has recently been described in a finite state space setup, first by Cui and Zhai (2010) and then by Levine and Modica (2016).
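The modified cost calculation in the example above can be sketched by brute force. Figure 15 is not reproduced in the text, so the cost table `COST` below is hypothetical: it is chosen only so that it agrees with the numbers quoted above (trees of cost 14 and 4, transition costs of 10 and 7). The function names are likewise assumptions, and enumeration stands in for Edmonds' algorithm, which would be needed for larger sets of conventions.

```python
from itertools import product

# Hypothetical least cost transition costs between conventions, chosen to be
# consistent with the numbers quoted in the text (Figure 15 is not available).
COST = {
    "v": {"w": 3, "x": 6, "y": 12, "z": 12},
    "w": {"v": 5, "x": 2, "y": 11, "z": 12},
    "x": {"v": 2, "w": 6, "y": 9, "z": 11},
    "y": {"v": 9, "w": 9, "x": 9, "z": 3},
    "z": {"v": 5, "w": 6, "x": 6, "y": 1},
}

def least_cost_edges(cost):
    """Map each convention to the target of its cheapest outgoing transition."""
    return {a: min(cost[a], key=cost[a].get) for a in cost}

def _reaches(succ, start, root):
    """Follow outgoing edges from `start`; True if the path ends at `root`."""
    seen = set()
    while start != root:
        if start in seen:
            return False  # the path loops without reaching the root
        seen.add(start)
        start = succ[start]
    return True

def min_tree_cost(cost, nodes, root):
    """Cost of the cheapest tree on `nodes` in which every non-root node has
    one outgoing edge and every path leads to `root` (brute force)."""
    others = [n for n in nodes if n != root]
    best = None
    for targets in product(nodes, repeat=len(others)):
        succ = dict(zip(others, targets))
        if any(a == b for a, b in succ.items()):
            continue  # self-loops are not transitions
        if all(_reaches(succ, start, root) for start in others):
            total = sum(cost[a][b] for a, b in succ.items())
            best = total if best is None else min(best, total)
    return best

def transition_cost(cost, cycle, target_cycle):
    """Modified cost of leaving `cycle` for `target_cycle`: cheapest exit tree
    minus the cheapest tree within the cycle."""
    exit_tree = min(min_tree_cost(cost, cycle + [t], t) for t in target_cycle)
    inner_tree = min(min_tree_cost(cost, cycle, r) for r in cycle)
    return exit_tree - inner_tree
```

With these hypothetical costs, `transition_cost(COST, ["v", "w", "x"], ["y", "z"])` gives 14 − 4 = 10 and the reverse transition gives 8 − 1 = 7, matching the values stated in the example.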
Finally, we note that Peski (2010) uses Edmonds' algorithm, which is closely related to the cyclic decomposition, to show that in symmetric, two strategy, two player coordination games played on any network under the best response dynamic with uniform errors, the convention at which every player plays the risk dominant strategy is stochastically stable, adding to existing results that the convention in the risk dominant strategy is uniquely stochastically stable on large enough complete networks (Kandori et al., 1993;Young, 1993a), but may not be uniquely stochastically stable on some networks (Blume, 1996). For completeness, consider the contrast of uniform errors with results for logit choice. Risk dominance is uniquely stochastically stable under asynchronous (only one player updates his strategy at a time) logit choice on any network. This is due to the fact that the game has a potential function and potential maximizers are stochastically stable under asynchronous logit (Blume, 1993). Potential maximizing profiles need not be stochastically stable when synchronicity in strategy updating is possible (see Alós-Ferrer and Netzer, 2010;Marden and Shamma, 2012b).

Convergence time
For a coordination game like Figure 4[ii] played on a network, Young (2011) calls a set of players autonomous if, fixing the actions of all other players, potential is maximized when all of the players in the set play A. When this is the case, the hitting time for the process to reach a state in which all of these players play A can be bounded. If the entire network is decomposable into such sets of bounded size, then the convergence time for the whole network can be similarly bounded. Ideas of autonomy can also be used to find stochastically stable states in coordination games with heterogeneous preferences (Section 4.2.2) and to consider the relationship between the aggregation of incentives and the aggregation of agency (Section 2.4).
Norman (2009) considers dynamics with a switching cost. For example, under best response, if the current strategy profile is s, player i may switch from s_i to a best response ŝ_i if and only if the payoff gain from doing so exceeds δ, where δ > 0 is the switching cost and δ = 0 corresponds to a standard best response dynamic. Consider a two player, two strategy coordination game with strategies s_1, s_2. A finite population has two conventions under the best response dynamic, each corresponding to one of the Nash equilibria in pure strategies. There is also a mixed Nash equilibrium of the game. For population states in which strategy shares approximate the probabilities under the mixed Nash equilibrium, individuals will be almost indifferent between s_1 and s_2. Hence, if a switching cost δ > 0 is present, at such states, individuals playing either strategy will not choose to switch strategies. New conventions are created at these states due to the switching costs. These intermediate stable states will be visited by the process along transitions between the two preexisting conventions. By arguments found in Freidlin and Wentzell (1984) and popularized in economics by Ellison (2000), these intermediate conventions can speed the process of getting from one convention to another.
For example, if, with δ = 0, it takes 10 uniform errors to move between homogeneous conventions and, with δ = δ̄ > 0, it takes 5 errors to move from a homogeneous to a heterogeneous convention or vice versa, then the expected time to transit from one homogeneous convention to the other is of order ε⁻¹⁰ when δ = 0, and is of order ε⁻⁵ when δ = δ̄, as the sum of two quantities of order ε⁻⁵ is also of order ε⁻⁵. This result, that switching costs can reduce waiting times, holds more generally (see the cited paper). Note that the potential conservative effect of coalitional behavior noted in Section 2.2.1 operates according to the same mechanism as this switching cost effect, but in the other direction, as some intermediate conventions cease to be conventions following the introduction of coalitional behavior.
Kreindler and Young (2013, 2014) bound convergence time under the logit dynamic for coordination games (Figure 4[ii]) on a complete network and on any network respectively. Unlike the other papers on convergence time in this section, they deal with finite, non-vanishing amounts of noise and do not consider transitions between conventions, but give conditions under which there is a strictly positive expected per period increase in the number of players choosing A. This is a function of the error rate, the payoff α and the network. For high enough error rate and high enough α, they find a bound on expected waiting time that works for all networks.
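The switching cost mechanism discussed above can be illustrated with a small sketch. The coordination game payoffs (a for both playing s_1, b for both playing s_2, zero otherwise), the population size, the evaluation of payoffs against population shares including a player's own play, and all function names are assumptions made for illustration; they are not taken from Norman (2009).

```python
def may_switch(payoff_current, payoff_alternative, delta):
    """A player switches only if the payoff gain exceeds the switching cost."""
    return payoff_alternative - payoff_current > delta

def rest_states(n, a, b, delta):
    """Population states k (number of s_1 players out of n) at which no
    player wishes to switch strategy."""
    states = []
    for k in range(n + 1):
        pay1 = a * k / n          # payoff to s_1 against the population shares
        pay2 = b * (n - k) / n    # payoff to s_2 against the population shares
        s1_players_stay = k == 0 or not may_switch(pay1, pay2, delta)
        s2_players_stay = k == n or not may_switch(pay2, pay1, delta)
        if s1_players_stay and s2_players_stay:
            states.append(k)
    return states

# Without a switching cost only the two homogeneous conventions are at rest.
no_cost = rest_states(100, 2, 1, 0.0)
# A positive switching cost creates intermediate rest states near the mixed
# Nash equilibrium share of 1/3.
with_cost = rest_states(100, 2, 1, 0.105)

# Orders of magnitude of waiting times from the example, for error rate eps.
eps = 0.01
direct = eps ** -10                # one transition requiring 10 errors
via_intermediate = 2 * eps ** -5   # two transitions requiring 5 errors each
```

The intermediate rest states cluster around the mixed Nash equilibrium, and the two-step route is faster by many orders of magnitude for small error rates, which is the effect described in the text.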

Elimination of weakly dominated strategies
Kuzmics (2011) considers normal form games in which each position in the game corresponds to a finite population. A best response dynamic is considered, with an individual's switching rate given by an increasing (from zero) function of the difference between the payoff from a best response and the payoff from his current strategy. The dynamic is perturbed by uniform errors. Error probabilities are taken to zero as population sizes approach infinity, with the product of error probability and population size approaching infinity. It is shown that, if the switching rate is insensitive to small payoff differences, then states at which a weakly dominated strategy is played may have positive probability under the limiting invariant measure. This is because the strategy profiles at which a weakly dominated strategy does worse than a strategy that dominates it may themselves become rare as ε → 0. When this is the case, the payoff difference between the dominated and dominating strategies will be low, so that if the rate of switching to best responses is insensitive to small payoff differences, the flow of probability away from the weakly dominated strategy due to best responses may not always outweigh the flow of probability towards it due to errors. A similar dynamic is considered for continuum populations in Zusai (2017b), discussed in Section 7.4.

Further stability results
Ely and Sandholm (2005) present a model in which players in a population have fixed types and the state of the dynamic is a Bayesian strategy that specifies a distribution over strategies for every type. The number of each type in the population, together with this strategy, then determines aggregate behavior. It is shown that, when the Bayesian strategies evolve under the best response dynamic, aggregate behavior can be described by a simplified best response dynamic that takes aggregate behavior as its state. The dynamic is aggregable. Zusai (2017a) studies nonaggregable dynamics. In particular, it is shown that the tempered best response dynamic (see Section 7.4) is nonaggregable: if payoff differences affect switching rates so that some types are evolving faster than others, then different Bayesian strategies that induce the same aggregate behavior may induce different aggregate trajectories. Several other common dynamics are likewise shown to be nonaggregable.
Heller (2014c) shows that, contrary to what was previously thought, the concept of limit evolutionarily stable strategy (limit ESS) of Selten (1983) does not imply neutral stability. A limit ESS is a limit of ESSs of some sequence of perturbed games in which each strategy must be played with at least some nonnegative probability that approaches zero. This does not imply neutral stability as it may be that some strategy i is a limit ESS and that on the convergent sequences of perturbed games that make this so, strategy j does not outperform strategy i due to the small probability with which some strategy k is played. However, without the play of strategy k, it may be that i can be outperformed by an invading strategy j, contradicting neutral stability of i. Any ESS in a sequence of convergent ESS must, by definition, be robust to small invasions of other strategies. Heller (2014c) strengthens this requirement and defines uniform limit ESS by requiring the definition of small to remain the same along the sequence. Neutral stability is then implied in the limit by continuity.
Heller (2017) analyzes the evolutionary stability of belief-free equilibria, a type of equilibrium of repeated games in which players do not observe the actions of their opponents but receive private signals which are correlated with those actions. A sequential equilibrium is belief-free if continuation strategies are optimal independently of beliefs about opponents' histories of actions. Heller (2017) shows that, if a belief-free equilibrium is evolutionarily stable, then it must be trivial, meaning that, independently of history, equilibrium strategies must specify that, each period, a Nash equilibrium of the stage game be played. If signals are informative, an equivalent result for neutral stability holds. In this case, a non-trivial belief-free equilibrium can be invaded by a mutant strategy that plays actions that are optimal per the specified continuation equilibrium, but uses information from the signals to induce correlation in actions so as to earn higher payoffs when playing against itself.
Van Veelen (2012) considers indirect invasions, in which an incumbent strategy s* is invaded by a strategy s′ that is neutral with respect to s*, following which another strategy invades and strictly outperforms s′. Specifically, strategy s* is robust against indirect invasions (RAII) if there does not exist a sequence of mutations, s* = s_0, s_1, ..., s_m, such that s_i is neutrally unstable but not evolutionarily unstable against s_{i+1} for i = 0, ..., m − 2, and s_{m−1} is evolutionarily unstable against s_m. Placing the concept within the inclusion hierarchy of existing concepts, any evolutionarily stable strategy is RAII, any RAII strategy is neutrally stable, and any neutrally stable strategy is a Nash equilibrium. Moreover, equivalence classes of strategies that are reachable from one another via sequences of neutral mutations are shown to be the same as minimal evolutionarily stable sets of strategies under the definitions of Thomas (1985) and Balkenborg and Schlag (2001).
Building on previous work (Bendor and Swistak, 1995; Boyd and Lorberbaum, 1987; Farrell and Ware, 1989; Selten and Hammerstein, 1984) that has shown the non-existence of evolutionarily stable strategies in repeated games (due to mutations off the equilibrium path) and the existence of neutrally stable strategies in such games, García and van Veelen (2016) consider strategies that are robust against indirect invasions as per Van Veelen (2012), discussed above. For any two player repeated game, they show that any incumbent repeated game strategy s can be neutrally invaded by a strategy s′ that differs from s only in that its behavior after some time t no longer depends on the opponent's action at time t. That is, making a connection to Section 3, assortativity between future action sequences played between s′ types is unaffected by actions at time t. Consequently, (i) if profile (s′, s′) (equivalently (s, s)) does not lead to a Nash equilibrium of the stage game being played at t, then s′ can be invaded by some strategy s* that maximizes its stage game payoff at t. Furthermore, (ii) if (s′, s′) (equivalently (s, s)) does not lead to efficient play at t amongst symmetric feasible action profiles, then s′ can be invaded by some strategy s̃ that uses actions at t as a signal to induce assortativity, in that, in subsequent periods, s̃ plays efficiently when playing against s̃ and as s′ when playing against s′. The reader may recognize arguments (i) and (ii) as almost perfect parallels of the arguments supporting Nash behavior and efficient behavior in Newton (2017a) as discussed in Section 3.2.1. Argument (ii) is a secret handshake argument (Robson, 1990), the connections to assortativity of which were remarked at the end of Section 4.1.4.

Further convergence results
Oyarzun and Ruf (2014) consider stochastic processes in which the object of interest is a variable P_t that takes values from zero to one, for example the rate of adoption of some technology, or the probability of choosing an optimal action in a decision problem. Sufficient conditions are given for P_t to converge to one with high probability as t → ∞. Firstly, the expected relative hazard rate, (P_{t+1} − P_t) / (P_t (1 − P_t)) (Young, 2009a), is bounded below so that P_t is a submartingale (see, e.g. Williams, 1991). Secondly, the difference between successive values of P_t is required to be sufficiently small. This convergence result is applied to the individual learning model of Börgers et al. (2004). It is further shown that if the difference between successive values of P_t diminishes over time at an appropriate rate, convergence obtains almost surely. This result is applied to a variant of the imitative dynamic of Schlag (1998) and to the reinforcement learning model of Erev and Roth (1998, see Section 7.1).
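As a minimal illustration of the relative hazard rate, the sketch below uses a deterministic logistic adoption path, which is an assumed special case and not one of the stochastic models treated by Oyarzun and Ruf (2014). Along such a path the relative hazard rate equals the constant `rate`, so it is bounded below by a positive number, and the per period increments are small when `rate` is small. The function names are hypothetical.

```python
def relative_hazard_rate(p_now, p_next):
    """(P_{t+1} - P_t) / (P_t (1 - P_t)) for one step of a path in (0, 1)."""
    return (p_next - p_now) / (p_now * (1.0 - p_now))

def logistic_path(p0, rate, periods):
    """Deterministic path with P_{t+1} = P_t + rate * P_t * (1 - P_t)."""
    path = [p0]
    for _ in range(periods):
        p = path[-1]
        path.append(p + rate * p * (1.0 - p))
    return path

path = logistic_path(p0=0.05, rate=0.1, periods=100)
hazards = [relative_hazard_rate(a, b) for a, b in zip(path, path[1:])]
```

Here every entry of `hazards` equals the chosen rate up to rounding error, and the path rises monotonically towards one.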

Distributed control
Rather than ask what the implications of evolutionary dynamics are for games, the literature on distributed control asks how a decentralized system can be designed to achieve some goal (Marden and Shamma, 2012a). It may be impossible for components (agents) within a system to communicate with some centralized decision maker, or even with each other, so it may be necessary to instead focus on the optimal design of agents' decision rules so as to best achieve the goals of a planner. For example, swarm robotics (see, e.g. Brambilla et al., 2013; Nemitz et al., 2017) aims to accomplish complex global tasks through the interactions of large groups of autonomous agents. Quijano et al. (2017) discuss how distributed control can be used in urban planning to create smart cities, giving as examples the design of lighting systems, efficient power generation by microgrids and the control of drainage systems. The literature is particularly concerned with questions of efficiency, computational complexity (difficulty of computations) and time complexity (number of iterations required for convergence).
From an efficiency perspective, it may not always be possible to design a system that achieves the optimal outcome. The price of anarchy is a measure of efficiency defined to equal the ratio of the optimal value of a goal function to the worst possible equilibrium outcome. For games of resource allocation amongst agents, recent results on the price of anarchy under Nash equilibrium and under a form of correlated equilibrium are found in Marden and Roughgarden (2014) and Roughgarden (2015) respectively.
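The definition of the price of anarchy can be made concrete with a small brute-force sketch. The two player coordination game `GAME`, the use of the sum of payoffs as the goal function, and the restriction to pure Nash equilibria are assumptions made for illustration; the cited papers work with richer classes of games and equilibria.

```python
from itertools import product

def pure_nash_equilibria(payoffs, n_actions):
    """Return all pure Nash equilibria of a finite game.

    payoffs: dict mapping each action profile (a tuple of action indices)
        to a tuple of player payoffs.
    n_actions: tuple giving the number of actions of each player.
    """
    equilibria = []
    for profile in product(*(range(n) for n in n_actions)):
        stable = True
        for i, n in enumerate(n_actions):
            for deviation in range(n):
                alt = profile[:i] + (deviation,) + profile[i + 1:]
                if payoffs[alt][i] > payoffs[profile][i]:
                    stable = False  # player i gains by deviating
        if stable:
            equilibria.append(profile)
    return equilibria

def price_of_anarchy(payoffs, n_actions, goal):
    """Optimal value of the goal function divided by its value at the worst
    pure Nash equilibrium (raises ValueError if there is no pure equilibrium)."""
    best = max(goal(payoffs[p]) for p in payoffs)
    worst = min(goal(payoffs[p]) for p in pure_nash_equilibria(payoffs, n_actions))
    return best / worst

# Hypothetical coordination game: both equilibria are stable, one is inefficient.
GAME = {(0, 0): (2, 2), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (1, 1)}
poa = price_of_anarchy(GAME, (2, 2), goal=sum)
```

In this example the optimal welfare is 4, the worst equilibrium welfare is 2, and so the price of anarchy is 2.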
One question that can be asked is whether payoffs for agents can be chosen to give a potential game structure, thus ensuring convergence under a variety of common dynamics. Again in the context of resource allocation, one possibility is for the marginal utility of a resource to an agent to be set equal to the marginal increase in the goal function, taking as given the other agents that hold the resource. Another alternative that also generates a potential game is to use a weighted Shapley value (Marden and Wierman, 2013).
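The claim that marginal contribution payoffs generate a potential game can be checked numerically in a small example. The resource values in `VALUE` and the function names are hypothetical. Each agent holds one resource, the goal function sums the value of each resource given the number of agents holding it, and an agent's utility is the marginal increase in the goal function from his holding the resource, taking the other holders as given.

```python
from itertools import product

# Hypothetical values: VALUE[r][n] is the contribution of resource r to the
# goal function when n agents hold it.
VALUE = {"r1": [0, 6, 9, 10], "r2": [0, 4, 7, 9]}

def welfare(profile):
    """Goal function: total value over resources given who holds what."""
    return sum(VALUE[r][profile.count(r)] for r in VALUE)

def marginal_utility(profile, i):
    """Agent i's payoff: the increase in the goal function due to i holding
    his resource, taking as given the other agents that hold it."""
    r = profile[i]
    n = profile.count(r)
    return VALUE[r][n] - VALUE[r][n - 1]

def is_exact_potential(n_agents):
    """Check that every unilateral deviation changes the deviator's payoff by
    exactly the change in the goal function."""
    resources = list(VALUE)
    for profile in product(resources, repeat=n_agents):
        for i in range(n_agents):
            for r in resources:
                deviation = profile[:i] + (r,) + profile[i + 1:]
                gain = marginal_utility(deviation, i) - marginal_utility(profile, i)
                if gain != welfare(deviation) - welfare(profile):
                    return False
    return True
```

Because the check passes for every profile and deviation, the goal function itself serves as an exact potential in this example, so dynamics that converge in potential games converge here.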
Regarding computational complexity, work has been done to reduce the complexity of dynamics that are known to have nice convergence properties. For example, Marden et al. (2009) give a version of fictitious play in which a player, rather than considering the historical distribution of play by each of his opponents separately, instead assesses the effectiveness of actions against entire action profiles (excluding his own action) played in past periods, effectively assuming correlation in opponents' play. In finite potential games, this dynamic is shown to converge almost surely to a pure Nash equilibrium.
Additional state variables can be added to a dynamic for the explicit purpose of obtaining desirable properties related to efficiency, complexity or convergence. Building on work of Young (2009b) that gives convergence to Nash equilibrium, Pradelski and Young (2012) give dynamics in which an agent's mood affects his likelihood of experimentation, and the dynamic converges to an efficient (in terms of sums of utilities) Nash equilibrium with arbitrarily high probability. Related dynamics achieve the same for efficient strategy profiles rather than just Nash equilibria. Similar use of an additional state variable is found in the use of 'market sentiment' in Pradelski (2015), which was discussed in Section 6.2. Marden (2012) extends the idea of potential to incorporate such a state variable, defining a potential function φ on action profiles and the state variable so that (i) fixing the value of the state variable, φ is a standard potential function on action profiles, and (ii) as long as the action profile remains the same, φ increases as the state evolves.
Open Topic 9 Distributed control is an area in which the methods discussed in Section 2 could be especially fruitful, especially as there is no prejudice as to what may constitute an agent. Indeed, Marden and Shamma (2012a) note that "There is some flexibility in defining what constitutes a single player. For example in wind energy harvesting, a player could be a single turbine or a group of turbines." Consequently, we can imagine systems in which both the wind turbine and the group of turbines exhibit agency in a complementary manner.
Izquierdo et al. (2018) introduce "ABED: Agent-Based Simulation of Evolutionary Game Dynamics", open source software that, for one and two population models, can simulate many of the dynamic processes covered in this survey (see Sections 4.2, 7), including imitative dynamics, dynamics based on sampling, and dynamics based on best responses. It offers options to adjust the underlying game, the size of the populations, the decision rule, the likelihood of perturbations, and the frequency and synchronicity of updating. Visual outputs track the proportions with which each strategy is played in the populations and the expected payoffs of each strategy. Angus and Newton (2015) make MATLAB code and documentation publicly available for simulating games on networks under coalitional better response dynamics. The full model is a multi-generational group selection model of the evolution of the ability to participate in collective agency (discussed in Section 2.3), but the subroutines that deal with coalitional updating on networks within a single generation build on code written for Newton and Angus (2013), which was published without simulations as Newton and Angus (2015).

EMPIRICS
It can be asked whether models of the type surveyed in the current paper are a good fit for observed behavior. "If players play game X under a best response dynamic, then they will converge to convention y" is the type of theorem that we find in these models. If a researcher attempts to replicate this implication empirically and convergence to y is not obtained, then it must be either that game X was not being played or that players did not follow a best response dynamic. Both of these things may be affected by all kinds of extraneous factors that are not in the original model. Consider Figure 17. The shaded inputs are sufficient for a player to follow a best response dynamic. The remaining inputs are then unnecessary. If a player does indeed follow a best response dynamic, his behavior will be the same whether or not he knows the payoffs of his opponents, in the same sense that if Newton's apple were to suddenly become aware of the laws of motion, this would not affect its acceleration towards the ground. However, it may be that knowledge of his opponents' payoffs does indeed affect a player's choices, creating a situation in which the theoretical model does not apply. The knowledge is not part of the model, but is part of the situational context. The importance of extraneous context is something about which the theoretical models have little to say, but about which much can be learned from empirical studies.
Figure 17: Information and context in an experiment. If only the shaded elements pertain, this is sufficient to make it possible that players follow an individualistic best response dynamic. However, all of the elements are compatible with players following such a dynamic, so there is nothing wrong with including any of them in a situational context that is being explored. Some of the elements, such as telling subjects individual best responses, might be expected to work towards inducing such a dynamic, whereas others, such as allowing subjects to talk, might be expected to work against it and in favor of some other dynamic such as coalitional best response.
Another point to consider is that distinct behavioral dynamics may differ from one another only when they are not at rest. A given state y may be a rest point under multiple dynamics which exhibit different speeds and trajectories when not at rest. It may be that players play game X and convergence to y occurs, but that we do not observe the intervening periods during which convergence occurs and, consequently, are left with multiple plausible candidates for the exact dynamic, although dynamics for which y is not a rest point can be ruled out. Furthermore, different dynamics can have very different basins of attraction corresponding to any given rest point (see, e.g. Golman and Page, 2010, discussed in Section 7.4), so without knowing the process, it will not be possible to infer the stability properties of the observed rest point. From an empirical perspective, this shows that it is important to observe systems that are not at rest in order to distinguish between different dynamics.

Best and better response
Oprea et al. (2011) have subjects play a hawk-dove game in continuous time, earning flow payoffs based on the current strategy profile. Both one population and two population treatments are carried out. In the one population treatment, play converges to close to the mixed Nash equilibrium proportions. In the two population treatment, play converges to an equilibrium in which one population plays Hawk and the other plays Dove. Both of these results are in accordance with the predictions of several theoretical dynamics (best response, replicator etc.). Subjects are shown information on their own strategy and payoffs and the average strategy and payoffs of their opponents. Note that the actual payoffs in this experiment are given by the average payoff of the game played against all opponents.
Payoffs in the standard theoretical models can indeed be interpreted in this way, but can also be interpreted as expected payoffs under random matching or, in continuum populations, as average realized payoffs under random matching. This emphasizes that the mapping of theory to experiment is not one to one.
Cason et al. (2013) consider a similar setup to the paper above, but consider rock, paper, scissors games in both continuous and discrete time. Here we discuss the continuous case. Players choose mixed strategies from a heat map on a simplex that shows them which strategies offer the highest instantaneous payoffs. In one treatment a player moves instantly to the chosen strategy, in another treatment his strategy moves continuously towards his chosen 'target'. Cyclic average behavior is observed. When the game payoffs are chosen so that the Nash equilibrium is stable (unstable) under the continuous best response dynamic, the amplitude of cycles is observed to decrease (increase) over time, but not as much as it would under the theoretical dynamic. In particular, in the stable case, average play (within a period) does not come to approximate the mixed strategy Nash equilibrium. This is perhaps related to the low population size of eight.
Doraszelski et al. (2018) consider the frequency response (FR) market, a market in which electricity providers are paid to adjust supply to maintain the frequency of oscillations of alternating current in the United Kingdom electric power grid within a 1% band of 50 Hertz. A learning model is fitted to the data that has two components. (i) Suppliers of FR form beliefs about the prices offered by other suppliers through a fictitious play dynamic that, similarly to Marden et al. (2009, see Section 8.4), assumes correlation in the prices offered by the other suppliers. This dynamic is discounted so that recent periods are weighted more heavily.
(ii) Suppliers learn about the model of demand for FR through adaptive learning, estimating demand as an econometrician might (Evans and Honkapohja, 2012). They find that, following an initial period of disorder after the market was created, there is a period in which their model considerably outperforms equilibrium predictions. However, this outperformance does not persist into the final part of their dataset, which they interpret as indicating that suppliers had by then become more adept at rapidly adjusting to changing market conditions.
Following in the footsteps of Young and Burke (2001), who use evolutionary methods to study crop sharing norms, Koch and Nax (2017) study groundwater usage by farmers in the Upper Big Blue district of Nebraska in the United States of America. The model, one of common resource usage (see also Sethi and Somanathan, 1996), is a stochastic game, with the state variable being the amount of groundwater at the start of a farming season. The solution concept used for the game as a whole is Markov perfect equilibrium, but holding fixed continuation values in auxiliary games for possible future groundwater states, equilibrium behavior in the current period is justified by the convergence of a better-reply (i.e. better response) dynamic (Dindoš and Mezzetti, 2006; Friedman and Mezzetti, 2001). This is similar to the convergence of strategies, taking continuation values as given, shown by Leslie et al. (2017, discussed in Section 7.4), who also show convergence of continuation values under a best response dynamic designed specifically for stochastic games. Koch and Nax (2017) find that, contrary to their model's predictions, there is no empirical support for strategic substitutability in groundwater usage. Instead, low groundwater usage by farmers seems to induce low usage by other farmers.
Friedman et al. (2015) consider a Cournot oligopoly game (two and three player treatments).
Subjects observe the actions taken and payoffs gained by themselves and their opponents in the previous period. Unlike previous experimental studies (e.g. Huck et al., 1999) that have confirmed predictions that Walrasian equilibrium will arise (see discussion in Section 4.2.5 above), they find that, after initial increases in production move play towards the Walrasian outcome, production decreases, eventually reaching levels close to collusive profit maximizing outcomes as players mimic reductions in production by their opponents. One way of interpreting this behavior is that players create a new agent, coming to adopt a new heuristic that communicates via play and improves the welfare of all (see Section 2, in particular Open Topic 2).
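To make concrete the hawk-dove predictions against which Oprea et al. (2011) compare their data, the following Python sketch integrates the replicator dynamic, one of the several dynamics mentioned above, by simple Euler steps. It is only an illustration under stated assumptions: the resource value v, the fight cost c, the step size and the initial conditions are hypothetical choices and are not the parameters of the experiment.

```python
def hawk_advantage(opponent_hawk_share, v, c):
    """Payoff to Hawk minus payoff to Dove against opponents who play Hawk
    with the given frequency, in a standard hawk-dove game with resource
    value v and fight cost c (assumed payoffs, with c > v)."""
    return (v - opponent_hawk_share * c) / 2.0


def one_population(x0, v=2.0, c=4.0, dt=0.01, steps=20000):
    """Euler integration of the one population replicator dynamic.
    x is the share of Hawks; the interior rest point is x = v / c."""
    x = x0
    for _ in range(steps):
        x += dt * x * (1.0 - x) * hawk_advantage(x, v, c)
    return x


def two_population(x0, y0, v=2.0, c=4.0, dt=0.01, steps=20000):
    """Euler integration of the two population replicator dynamic.
    x and y are the Hawk shares in populations 1 and 2; each population
    is matched only against the other."""
    x, y = x0, y0
    for _ in range(steps):
        dx = x * (1.0 - x) * hawk_advantage(y, v, c)
        dy = y * (1.0 - y) * hawk_advantage(x, v, c)
        x += dt * dx
        y += dt * dy
    return x, y
```

With these assumed payoffs, one_population(0.1) approaches the mixed equilibrium share v / c = 0.5, whereas two_population(0.6, 0.4) approaches the asymmetric state in which the first population plays Hawk and the second plays Dove. As noted above, other dynamics share these rest points, so a sketch of this kind cannot by itself identify which dynamic subjects follow.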

Imitation
Clemm von Hohenberg et al. (2017) study a model in which an individual has an opinion o_i before he is subject to social influence. After he is subjected to social influence from a population with opinions distributed according to f, he has an opinion v_i. The difference in opinion v_i − o_i is modelled as being a linear function of the first four moments of f. The parameters in the model are linked to three theories of social influence, (i) the linear positive model in which players adjust their opinion towards average opinion, with the size of the adjustment linear in the distance they find themselves from the average; (ii) the moderated positive model in which individuals are less influenced by those with whom they have large differences of opinion; (iii) the negative influence model in which individuals are repelled by the opinions of those with whom they have large differences of opinion. The study asks participants for an opinion after reading an article and being shown a distribution of the opinions of others (or a simulacrum thereof). Data from the experiment fits model (i). Using the model to predict long run dynamics in large populations, they thus show that social influence will reduce variance in opinions. Shortly after data for the experiment were collected, the tool used to collect the data was changed so that users had to give an opinion before being shown the distribution of prior opinions. Data from after this change further supports the results of the study.
Mohlin et al. (2017) consider data collected by Östling et al. (2011) on the play of a lowest unique positive integer game in which, simultaneously, each player chooses a positive integer that is less than or equal to some maximum value K, and the player that chooses the lowest integer that has not been chosen by any other player wins. Mohlin et al. (2017) find that a learning model they call similarity-weighted global cumulative imitation does a reasonable job of tracking the data.
The model is essentially a reinforcement learning model (see Section 7.1) with two differences. Firstly, a player reinforces the weights he gives to strategies according to the outcomes of the strategies played by every player, not just the strategy he himself played. Secondly, strategies close to the winning integer, and not just the winning integer itself, are positively reinforced.
-Chellew et al. (2015) consider three models of learning in public goods games. A particular goal of the paper is to explain declining payoffs when these games are played repeatedly over time. They consider three treatments, (i) a black box treatment in which participants are told that a black box into which they make their contribution will calculate the amount they get back according to a formula; (ii) a treatment in which participants knew they were playing a public goods game and were told the actions taken in the previous period by the other participants; and (iii) a similar treatment in which participants were also told the payoffs obtained by other participants. They test three hypotheses, (a) payoff-based (completely uncoupled; see Section 7.6) learning in which a previous increase in payoff following an increase (decrease) in contribution leads to an increase (decrease) in contribution (and vice versa for decreases in payoffs); (b) payoff-based learning under an assumption that players have altruistic concerns for the payoffs of other players; and (c) payoff-based learning plus a conditional cooperation motive that leads to an increase in contribution if other participants have increased their contributions. The evidence of the study strongly supports hypothesis (a) but not the other hypotheses, and in fact finds evidence of spite rather than altruism.
also look at payoff-based learning in public goods games and test five specific behaviors, (i) asymmetric inertia, (ii) asymmetric volatility, (iii) asymmetric breadth, under which after a decrease (increase) in payoffs players respectively exhibit lower (higher) inertia in action choice, higher (lower) volatility in action choice, and larger (smaller) changes in their choice of action; (iv) reversion, under which a change in action that leads to a decrease (increase) in payoff is reversed (retained); and (v) directional bias, in which if there is a salient ordering on the actions, a player who has changed their action and seen a payoff increase (decrease) will change it again in the same direction (change it in the opposite direction). Evidence for all five of these features is found at a statistically significant level, regardless of whether (a) a pure black box approach is adopted or (b) players have some idea of the structure of the game.
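The payoff-based rule of hypothesis (a), which also embodies the reversion and directional bias features listed above, can be sketched in a few lines of Python. The sketch is a deliberately stripped-down illustration and not the specification estimated in either study: the linear public goods payoff, the marginal per capita return mpcr, the unit step size and, in particular, the assumption that the other players' total contribution stays fixed are all hypothetical simplifications.

```python
def payoff(own, others_total, endowment=20.0, mpcr=0.4):
    """Payoff in an assumed linear public goods game: keep what is not
    contributed, plus a share mpcr of the total contribution."""
    return endowment - own + mpcr * (own + others_total)


def directional_learner(start, others_total, rounds, step=1.0, endowment=20.0):
    """Completely uncoupled, payoff-based adjustment. The player keeps
    moving his contribution in the same direction after a payoff that did
    not fall, and reverses direction after a payoff that fell. He uses
    only his own past actions and payoffs."""
    contribution = start
    direction = 1  # assumed initial experiment: try contributing more
    last_payoff = payoff(contribution, others_total, endowment)
    path = [contribution]
    for _ in range(rounds):
        # Move one step in the current direction, staying within [0, endowment].
        contribution = min(endowment, max(0.0, contribution + direction * step))
        new_payoff = payoff(contribution, others_total, endowment)
        if new_payoff < last_payoff:
            direction = -direction  # reversion: undo a change that hurt
        last_payoff = new_payoff
        path.append(contribution)
    return path
```

For example, directional_learner(10.0, 30.0, 30) first raises the contribution to 11, sees a lower payoff, reverses, and then lowers its contribution step by step until it reaches zero. Under these assumptions, a decline in contributions arises without any knowledge of the game or concern for others, which is the sense in which such learning can be tested in a black box treatment.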

Errors in perturbed dynamics
Mäs and Nax (2016) study non-best response behavior (i.e. errors) in learning models. They look at two strategy coordination games on networks in which each player has a favorite strategy and the payoff of a player is the number of his neighbors who take the same strategy as he does, plus a bonus payoff if he plays his favorite strategy. Most choices (approx. 96%) are best responses to the strategies of the previous period. Errors (i) exhibit payoff-dependence, decreasing in frequency as conjectured payoff loss relative to playing a best response increases; (ii) occur with higher probability if the player changed strategy in the previous period; (iii) occur with higher probability if the player experienced a decrease in payoff in the previous period; and (iv) exhibit intentional bias (see Section 4.2.1), in that players are more likely to make an error when their best response is not their favorite strategy.
Lim and Neary (2016) conduct a similar experimental study, but consider the language game of Neary (2012) (see Section 4.2.2) played in a population. The study also finds evidence of payoff-dependence and intentional bias in error probabilities.  consider two populations, players from which are matched to play a two player coordination game with zero payoffs off-diagonal (see Section 6.4). In experimental treatments in which each player in the game has five strategies, high levels of best response play (approx. 90%) are observed, as well as payoff-dependence and intentional bias in errors. As the game has five strategies instead of the two strategies in the previous studies, it is possible to understand intentional bias in the stronger sense that, for a given best response, errors that involve playing strategies associated with more preferred conventions (from the perspective of the error-making player) are more likely than errors playing strategies associated with less preferred conventions.
In a two strategy model this cannot be observed, as for a given best response, there is only one strategy that can be played in error.
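Payoff-dependence of errors, feature (i) above, can be illustrated with a logit choice rule applied to the network coordination game with favorite strategies. The logit form, the precision parameter beta, the bonus size and the strategy labels "A" and "B" are assumptions made for this sketch; the studies above document payoff-dependence empirically and are not claimed to estimate this exact specification.

```python
import math


def strategy_payoffs(neighbor_strategies, favorite, bonus=0.5):
    """Payoff to each strategy: the number of neighbors playing it, plus
    a bonus if it is the player's favorite strategy."""
    return {
        s: sum(1 for n in neighbor_strategies if n == s)
        + (bonus if s == favorite else 0.0)
        for s in ("A", "B")
    }


def logit_error_probability(neighbor_strategies, favorite, beta=1.0, bonus=0.5):
    """Probability that a two strategy logit chooser does NOT play a best
    response to last period's neighbor strategies. The probability falls
    as the payoff loss from the error grows."""
    u = strategy_payoffs(neighbor_strategies, favorite, bonus)
    best = max(u, key=u.get)
    other = "B" if best == "A" else "A"
    loss = u[best] - u[other]  # payoff forgone by playing the error
    return 1.0 / (1.0 + math.exp(beta * loss))
```

A player whose favorite is "A" and who has three of four neighbors playing "A" faces a payoff loss of 2.5 from an error, whereas with two of four neighbors playing "A" the loss is only 0.5, so the second error is more probable under this rule. The sketch captures feature (i) only; features (ii) to (iv) would require additional terms, for example dependence on the player's own history.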
Open Topic 10 There is much work to be done on evolution and empirics.
(a) That the behavior of subjects in context X is approximated by dynamic Y may be non-generic in the sense that small changes to X may lead to the connection being broken. This is one reason, beyond the usual reasons, that replication is important, as any replication will never replicate X exactly (e.g. the weather outside the laboratory will be different). An important question is then the size of the set of contexts containing X that can be approximated by Y. This can be examined by including or excluding elements such as those of Figure 17. Resources permitting, for any positive results (i.e. X → Y mapping), X can be adjusted until the mapping fails.
(b) Further study of separate attributes and features of decision rules, as discussed in several papers in Sections 9.3, 9.4, could be promising. In particular, the cues and information that influence each feature could be studied.
(c) There is much real world time series data that could be considered using evolutionary models.
(d) Theories of the evolution of traits, including preferences, should be tested, as suggested in Open Topic 5.

CONCLUSION
We began this survey by summarizing models of behavior as providing answers to the question who does what to whom and in what circumstances? We have seen how evolutionary game theory can be used to study and propose answers to every part of this question. Moreover, the models that arise are rich, deep and plausible, demonstrating how interaction can be complex even when decision making is not. As the survey has progressed, themes have arisen that bind together seemingly different topics. For example, assortativity has shown itself to be relevant to a broad range of subject matter. In particular, we have seen how all noncooperative ways of inducing cooperation in prisoner's dilemmas, from tit-for-tat, through secret handshakes and parochial mutant invaders, to the most arcane strategies in repeated games, amount to inducing assortativity in the strategies that are played against one another. Another theme has been the importance of length-based and steepness-based selection in perturbed dynamics. These effects sometimes operate in isolation and sometimes interact. How they interact depends on properties of the stochastic perturbations, a fact that motivates the empirical challenge of discovering which kind of perturbations are observed in which contexts.
We conclude by mentioning some areas that the author believes warrant special attention over the coming years. Firstly, there is still much work to be done to integrate and deepen our understanding of the role of agency in evolutionary game theory (see Open Topics 1, 2, 8, 9). Work to date has only scratched the surface of understanding the role of the 'who' that is a fundamental part of our behavioral question. Secondly, evolutionary methods should establish themselves more firmly in applied social science. For example, there is no reason that simple evolutionary theories cannot play a significant role in the study of industrial organization. There is a need for someone to do for evolution and industrial organization what Spiegler (2011) has done for bounded rationality and industrial organization. Thirdly, there should be more rigorous empirical research that gradually and carefully increases our understanding of evolution and adaptive decision making in practice (see Open Topics 5, 10). We need to know under what conditions certain models should be used and under what conditions they should be avoided.
The author hopes that the reader has found this survey as stimulating to read as it was to write and that it strengthens the shared intention of researchers in the field that the field grow and flourish.