Wright meets Markowitz: How standard portfolio theory changes when assets are technologies following experience curves

We consider how to optimally allocate investments in a portfolio of competing technologies using the standard mean-variance framework of portfolio theory. We assume that technologies follow the empirically observed relationship known as Wright's law, also called a "learning curve" or "experience curve", which postulates that costs drop as cumulative production increases. This introduces a positive feedback between cost and investment that complicates the portfolio problem, leading to multiple local optima, and causing a trade-off between concentrating investments in one project to spur rapid progress vs. diversifying over many projects to hedge against failure. We study the two-technology case and characterize the optimal diversification in terms of progress rates, variability, initial costs, initial experience, risk aversion, discount rate and total demand. The efficient frontier framework is used to visualize technology portfolios and show how feedback results in nonlinear distortions of the feasible set. For the two-period case, in which learning and uncertainty interact with discounting, we compare different scenarios and find that the discount rate plays a critical role.


Introduction
There is a fundamental trade-off, encountered throughout life, between investing enough effort in any one activity to make rapid progress, and diversifying effort over many projects simultaneously to hedge against failure. On the one hand, by focusing on a single task one can quickly accumulate experience, become an expert, and reap rewards more efficiently. On the other hand, unforeseen circumstances can impede progress or make the rewards less valuable, so it may be wise to maintain progress on several fronts at once, even if each advances more slowly. This brings to mind the familiar adage "don't put all your eggs in one basket", and at first glance appears very similar to the question of how diversified a portfolio of financial investments should be. However, there is a key difference: here learning is involved. The more effort we invest in one area, the more effective that effort becomes, so we may want to put all our eggs in one basket after all.
The dilemma is ubiquitous, and is understood intuitively by us all as we learn new skills, engage in new projects, and attempt to plan for the future. For example, consider trying to decide how many courses to take in university; or how many languages, musical instruments, sports, or web application frameworks to learn. Focusing on one, or just a few, allows us to gain expertise and reach a more rewarding phase of activity sooner. Or at the organisational level, firms and governments must decide how many, and which, strategic and technological capabilities to develop. We present a simple model for understanding this trade-off, and show how it is related to the optimal diversification problem for financial assets.
The reason this decision framework is of particular interest is that, despite its simplicity, it shares several important features with the question of how to allocate investment among competing technologies. This is because often, in the long run, scientific advances and knowledge gains mean that performance-weighted technology investment costs decrease as cumulative deployment increases. Put simply, the more we invest in a technology (whether at the R&D, deployment or other stage) the more effective the technology becomes at delivering the same output, so future investment costs are lower, per unit of output. Hence, in order to achieve certain long term technological goals, understanding the correct allocation of investment among available substitute technologies is vital. The specific question we have in the back of our minds is how to allocate funding among potential clean energy technologies to accelerate the transition to a zero carbon economy: should we invest in solar photovoltaics, or offshore wind, or next generation nuclear, or carbon capture and storage, or a little bit in each?
Despite this high-level motivation, here we focus in on a very simple conceptual model representing the underlying trade-off. The key assumption we make is that increased cumulative investment in a technology leads to reduced investment costs (but with some degree of uncertainty). This mechanism is not so straightforward in reality, as to some extent causality flows both ways. In addition, there are many other complicating factors in real applications, such as correlations between projects, and spillovers (incoming and outgoing) of various kinds. However, while stating the caveats clearly, we set aside these issues for now and just focus on the core problem, which is to find the optimal risk-averse investment in competing technologies following experience curves. This setting brings together the specialization incentives of the learning curve model with the diversification incentives of modern portfolio theory, allowing us to characterize the optimal solution to the trade-off between diversifying and specializing.
Our approach is to consider multiple independent technologies (two), increasing returns to investment (through experience curves), uncertainty (cost is a stochastic process), and a risk-averse decision maker (who minimizes a mean-variance value function). Since the two technologies follow uncorrelated stochastic processes, diversification tends to reduce the risks, but at the same time increasing returns tend to favor specialization. Investing in one option drives down its marginal cost, making it more and more attractive, but ex-ante uncertainty in future benefits from learning suggests that diversifying can limit the risk of over-investing in a technology that eventually shows a poor performance. We characterize optimal investment as a function of learning characteristics (rate and uncertainty of learning), initial conditions (cost competitiveness and accumulated experience), risk aversion and the level of demand. For simplicity, we consider the one-period investment decision.
In classical (Markowitz) portfolio theory, the optimal allocation of investments is unique. In general, the further a portfolio is from the optimum, the worse is its value. By contrast, when the positive feedback of endogenous technological progress is strong, it is better to invest mostly in either of the two options than to split investment more evenly. Except in some knife-edge cases, one of the two specialized portfolios is better than the other, but which one is best depends on the parameters. As a result, a small change in one of the parameters can result in the optimal portfolio being completely different.
In general we characterize three different regimes. In the first regime, one technology is so much better than the other that it dominates the portfolio entirely. This happens either because there is no risk aversion (so we revert to the classic deterministic learning curve winner-takes-all scenario) or because the relative advantage of one technology in terms of initial conditions or speed and uncertainty of learning is very strong. In the second regime, the optimal portfolio features unambiguous diversification, that is, the objective function has a unique optimum corresponding to a balanced mix of technologies. In the third regime, there are two local optima of similar value, corresponding to quite different investment policies. As parameters change, the transition of the global optimum from one local optimum to the other is abrupt. In other words, in this critical region, a small change in a parameter causes a large change in the optimal policy.
We show what this finding implies for the theory of path-dependence and lock-in, and characterize lock-in as a situation where investing in a fast-learning technology is not currently optimal, but it would be if a higher level of demand existed. Intuitively, whether or not one should attempt to bring a technology down its learning curve depends on the size of the market. For some parameter values, though, the transition is sharp: there exists a critical level of demand below which investment in the fast-learning technology is limited, and above which it becomes dominant (i.e. the global optimum switches from one local optimum to the other).
We show analytically that a Markowitz-like case may be recovered in two different ways. First, when there is no learning, increasing returns are absent and it becomes highly unlikely that one would want to specialize entirely. (Specialization is still possible, but only because one technology is currently much better than the other, not because investing in it makes it better.) Second, when future demand is very small compared to the current level, the potential for learning is insignificant and therefore investment is never enough for increasing returns to really matter.
Our results relate to several different branches of literature. First of all we are motivated by the optimal energy systems, energy transition and climate change literatures. It is now clear that to avoid a rise in temperatures that would have "dangerous" effects we must attain net zero carbon emissions, and one of the main factors involved in this is a switch to a clean energy system. However, dirty energy technologies are currently considered to be both cheap and convenient (in some ways due to legacy energy system design and infrastructure, and societal embeddedness), whereas alternatives are still expensive, even though their cost is falling, sometimes very fast. In this context the questions arise of what costs will be in the future, which decisions affect these costs, and what is the best investment or tax/subsidy policy. Energy systems are highly complex and energy experts generally rely on highly detailed models of the energy production and consumption mix. To include endogenous technological change in these models, a simple solution which has been widely adopted (and criticized) is that of experience curves (Gritsevskyi & Nakićenović 2000, Barreto & Kypreos 2004, Alberth & Hope 2007, Criqui et al. 2015, Webster et al. 2015). However, these models end up being very complex so that optimal policies are very hard to determine and understand. Numerical methods have to be used, and it is not always the case that the global optimum is found. Analytical approaches for these complex models generally have to assume a deterministic setting, so that the increasing returns induced by the learning curve lead to full specialization; see for instance Wagner (2014). In this paper we only wish to understand the fundamental trade-off involved in technology investment: diversification against specialization, and the risk of lock-in in systems with path dependent, self-reinforcing dynamics.
Therefore, we do not attempt to provide a realistic model of the energy system but instead work with a more parsimonious model, so that we have confidence that we are able to find all global optima. We can provide results which are intuitive, and that we think would carry over to more complex models to a good extent. Our approach is both more realistic and interesting than deterministic or single-technology models, but also simpler and clearer than large models incorporating all the intricacies of energy systems. Because some of our assumptions are inadequate for energy systems (e.g. no time to build, perfect substitutability, etc.), we do not provide a direct empirical application of our results. Instead, we focus on a theoretical contribution at the intersection of the learning-by-doing and portfolio literatures.
To model technological progress, we use a very specific parametric model. Technological progress is not perfectly predictable, but in many detailed empirical cases it has been found that unit costs tend to decrease by a constant percentage every time cumulative production doubles. Subject to some uncertainty about future shocks, the cost of a technology follows an experience curve which is technology-specific. This relationship between unit cost and cumulative investment has been observed for a long time (Wright 1936, Alchian 1963, Thompson 2012) and is generally explained by the fact that during production learning-by-doing takes place. Starting with Arrow (1962), a large literature has developed to analyse the consequences of this relationship for pricing and output decisions (Rosen 1972, Spence 1981, Mazzola & McCardle 1997). Learning-by-doing decreases marginal cost, which gives an advantage to size and may encourage predatory pricing (Cabral & Riordan 1994) or legitimate the protection of infant industry from international competition (Dasgupta & Stiglitz 1988). When one considers a single firm operating a single technology, learning by doing generates irreversibilities and creates an incentive to delay investment. The optimal investment dynamics can be characterized using the theory of real options (Brueckner & Raymon 1983, Majd & Pindyck 1989, Della Seta et al. 2012). In general, the literature does not study multiple technologies at the same time; when it does so, for instance when characterizing the social optimum for a multi-firm sector, it is generally in the absence of uncertainty. Given our motivation to understand optimal investment in energy technologies, which are very diverse and uncertain, we turn to another branch of literature which has dealt in detail with investment in multiple uncertain assets, that is, modern portfolio theory (Markowitz 1952).
Modern portfolio theory considers a risk-averse decision maker who wishes to invest in financial assets. The key result of portfolio theory is that there exists an optimal way of combining assets in a portfolio such that expected returns are maximized, conditional on a given level of risk (or that risk is minimized, conditional on a given level of expected returns). We argue that this idea is well suited for thinking about technology investment, and we borrow from portfolio theory the mean-variance value function (in our case, both expected costs and variance of the portfolio have to be minimized). For simplicity, however, we assume that technologies are uncorrelated. As opposed to a "learning curve technology", a key property of a financial asset is that investing in it does not change its value, although there are some important exceptions. We recover a classical portfolio setup when the learning parameter is zero or when the total market size is very small compared to the initial production.
To sum up, our model aims at including both an element of increasing returns, favoring specialization, and an element of risk, favoring diversification. Besides our general motivation (energy systems) and the two major ingredients of our model (experience curves and mean-variance portfolio theory), our setup relates to a large literature dealing with optimal control of stochastic processes, which goes well beyond economics and operations research. Of more direct interest are the applications to technology, R&D and innovation problems, where the questions of increasing returns and lock-in are more salient. When investing in an option makes it better and better, history matters. Atkinson & Stiglitz (1969) already pointed out that localized technological progress, an important source of which is learning-by-doing, would justify investing in a technology that is not yet the cheapest. In the technology choice literature, it is well known that increasing returns and uncertainty may result in situations where poor technological options dominate (David 1985). In a model of two competing standards operating under network externalities, Arthur (1989) showed that if chance favors an intrinsically worse option early on, this option's accumulated experience gives it an edge for obtaining the marginal consumer. As this advantage accumulates, it may forever exceed the benefits from switching to the intrinsically better option. In this context a policy maker is interested in a policy that optimally explores the merit of different options before making a final choice. Cowan (1991) characterized a social planner's optimal decision in a two-armed bandit framework, where there is a choice between one of two technologies at every period. In this model, there exists an optimal policy known as the Gittins index, but according to this policy eventually a single technology will be chosen. Thus early bad luck may induce the social planner to lock in the wrong technology.
While this theoretical literature often refers to learning-by-doing, it attempts to model other forms of increasing returns all at once and therefore does not model more explicitly how cost decreases with investment. Zeppini (2015) considered learning curves for clean and dirty technologies in a discrete choice framework, with social interactions as an additional source of increasing returns to adoption and lock-in. He found that policies inducing the clean technology to progress down its learning curve faster have greater potential to induce smooth technological transitions, as opposed to traditional policies such as a pollution tax which can work only by being large enough to induce an equilibrium shift. Finally, another branch of literature has focused on finding optimal liquidation strategies (Almgren & Chriss 2001, He & Mamaysky 2005). Another situation in which financial portfolios incorporate feedback effects is when learning about an asset is taken into account. An investor who is familiar with a particular asset makes more precise estimates of expected returns, so that this asset is relatively more valuable than other assets (Boyle et al. 2012). In turn, holding a lot of a particular asset makes information acquisition about that asset more valuable, which can generate a positive feedback that encourages specialization (Van Nieuwerburgh & Veldkamp 2010).
3 When increasing returns are from the consumer side, typically as in Arthur (1989), they are generally motivated as learning-by-using following Rosenberg (1982).
A further branch of the literature has contrasted the benefits of increasing returns against the benefits of technological diversity by assuming that further technological progress takes place through recombination. This implies that there is some value in giving up on increasing returns from specialization and keeping a range of diverse technologies available for further recombination (Van den Bergh 2008, Zeppini & van den Bergh 2013).
The paper is organized as follows. Section 2 defines the stochastic process for the experience curves and the optimization problem. Section 3 shows how it relates to Markowitz portfolios. Section 4 presents the main results of the optimization and shows under which conditions diversification is optimal. Section 5 analyzes in detail the objective function by characterizing how the number and nature of optima changes with underlying parameter values. Section 6 studies the effect of total demand. Section 7 returns to the comparison of financial and technology portfolios and shows how the efficient frontier changes when technology-like assets are introduced. Section 8 establishes conditions to escape lock-in by studying the case where a mature, cheap but slow-learning technology dominates the market but faces competition from a young, expensive but fast-learning challenger. Finally, Section 9 concludes.

Model
Consider the development of a single technology over one time period. The unit cost of the technology at time $t$ is $c_t$ (measured in \$/unit), and its cumulative production (measured in units) is $z_t$. Let $t = 0$ be the present time and $t = 1$ be some given future time. The current unit cost is $c_0$ and current cumulative production is $z_0$. Production during the period is $q$, and the cumulative production at $t = 1$ is $z_1 = z_0 + q$. We first present the stochastic model for a single technology, then consider a portfolio of two such technologies.

Wright's law
The standard form of the experience curve is

$$c_t = c_0 \left(\frac{z_t}{z_0}\right)^{-\alpha}, \tag{1}$$

where the constant $\alpha$ is the experience exponent (or Wright exponent) for this technology. This leads to two related concepts often used in the literature: the "progress ratio" is defined as the relative cost level seen after each doubling of cumulative production, $PR = 2^{-\alpha}$, while the "learning rate" is defined as the relative cost reduction seen after each such doubling, $LR = 1 - 2^{-\alpha}$. Dutton & Thomas (1984) report learning rates from different studies and find that the vast majority lie between 5% and 40%, corresponding to values of $\alpha$ lying approximately within the range (0.07, 0.7). However, commodities such as minerals and fossil fuels mostly have $\alpha \approx 0$ since they do not exhibit a significant cost decrease over the long run (Newbold et al. 2005, McNerney et al. 2011). The power law relationship between cost and cumulative production was first noted by Wright (1936) in the context of the production of airplanes, so we call it Wright's law. Since then it has been found to describe the available evidence for a number of technologies fairly well (Nagy et al. 2013).

In contrast to a large part of the theoretical literature on experience curves, which deals only with the deterministic form, we model uncertainty explicitly. To do this we make the future cost stochastic by assuming additive noise $\eta$ on the log-first-difference version of Eq. (1):

$$\log c_1 - \log c_0 = -\alpha \left(\log z_1 - \log z_0\right) + \eta. \tag{2}$$

This equation models a situation where, over the course of one period, an underlying linear trend in log-log space advances according to Wright's law, but then is hit by a random shock. It is one of the simplest possible ways of incorporating uncertainty in the experience curve model, chosen here specifically for its clarity and simplicity. The cost of production at $t = 1$, interpreted as the average (or constant) within-period cost, is then given by

$$c_1 = c_0 \left(\frac{z_0 + q}{z_0}\right)^{-\alpha} e^{\eta}. \tag{3}$$

So there is a distribution of possible future costs $c_1$, and Eq. (3) shows clearly how it depends on: i) the current state, $c_0$, $z_0$, of the technology, ii) the technology's experience exponent $\alpha$, iii) the choice of production $q$ over the period, and iv) the noise distribution $\eta$. Next, we suppose that the shock is normally distributed with mean zero and variance $\sigma^2$ (volatility $\sigma$), $\eta \sim N(0, \sigma^2)$. Then the cost is log-normally distributed, and by the standard log-normal properties the cost expectation and variance are given by

$$\mathrm{E}[c_1] = c_0 \left(\frac{z_0 + q}{z_0}\right)^{-\alpha} e^{\sigma^2/2}, \tag{4}$$

$$\mathrm{Var}[c_1] = c_0^2 \left(\frac{z_0 + q}{z_0}\right)^{-2\alpha} \left(e^{\sigma^2} - 1\right) e^{\sigma^2}. \tag{5}$$

These two properties of the stochastic experience curve, specified uniquely by the four parameters $c_0$, $z_0$, $\alpha$, $\sigma$, will now be used to construct the portfolio model.
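The single-technology model above can be checked numerically. The following is a minimal sketch, assuming the power-law form of Wright's law with a mean-zero normal shock on log cost; the function names (`learning_rate`, `wright_exponent`, `cost_moments`, `simulate_costs`) and the parameter values in the usage note are illustrative choices, not taken from the paper.

```python
import math
import random


def learning_rate(alpha):
    """Relative cost reduction per doubling of cumulative production."""
    return 1.0 - 2.0 ** (-alpha)


def wright_exponent(lr):
    """Invert LR = 1 - 2^(-alpha) to recover the Wright exponent alpha."""
    return -math.log2(1.0 - lr)


def cost_moments(c0, z0, alpha, sigma, q):
    """Mean and variance of the log-normal unit cost c_1 after producing q."""
    trend = c0 * ((z0 + q) / z0) ** (-alpha)  # deterministic Wright's law part
    mean = trend * math.exp(sigma ** 2 / 2)
    var = trend ** 2 * (math.exp(sigma ** 2) - 1) * math.exp(sigma ** 2)
    return mean, var


def simulate_costs(c0, z0, alpha, sigma, q, n=100_000, seed=0):
    """Draw n samples of c_1 = trend * exp(eta), with eta ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    trend = c0 * ((z0 + q) / z0) ** (-alpha)
    return [trend * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    mean, var = cost_moments(c0=1.0, z0=1.0, alpha=0.3, sigma=0.3, q=1.0)
    samples = simulate_costs(c0=1.0, z0=1.0, alpha=0.3, sigma=0.3, q=1.0)
    print("analytic mean:", mean, "sample mean:", sum(samples) / len(samples))
    print("alpha for LR = 5%:", wright_exponent(0.05))
    print("alpha for LR = 40%:", wright_exponent(0.40))
```

Comparing the sample mean of the simulated costs with the analytic mean is a quick sanity check on the log-normal moment formulas; the conversion helpers reproduce the mapping between learning rates of 5% to 40% and the quoted range of exponents.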

The optimization
Consider two independent technologies, A and B, each evolving according to the form of Wright's law proposed above, with their own technology-specific parameters. We label variables and parameters with superscripts (e.g. $c_0^A$ for the initial unit cost of technology A). Suppose the technologies are perfect substitutes and that there is a fixed, exogenous demand $K$, which must be satisfied exactly by some combination of production of the two technologies, i.e. there is a production constraint $K = q^A + q^B$. Production is non-negative, so $q^A, q^B \in [0, K]$, and choosing $q^A$ also determines $q^B = K - q^A$. We use $q^A$ as the control variable in the following optimization and present results in terms of the share of total production in technology A, $q^A/K$. Let the total cost of production during the period be $V(q^A)$. This is just the sum of unit costs times unit productions

$$V(q^A) = \sum_{i=A,B} c_1^i \, q^i, \tag{6}$$

where stochastic costs $c_1^i$ depend nonlinearly on productions $q^i$, as in Eq. (3). Thus for a fixed, known set of technology parameters $\{c_0^i, z_0^i, \alpha^i, \sigma^i\}_{i=A,B}$ and total demand $K$, each choice of production $q^A$ maps to a distribution of total costs $V$. The tools for addressing this type of problem are well developed, see for example Krey & Riahi (2013). The goal here is to understand how the parameters and the choice of production together generate the total system cost distribution, from which an optimal production portfolio may be identified. We perform a mean-variance analysis on $V$ because it is simple, intuitive and illustrates clearly the key features of the system. Let $\lambda \geq 0$ be a risk aversion parameter and $f$ be the composite mean-variance objective function. The optimization problem is then

$$\begin{aligned} \text{minimize:} \quad & f(q^A) = \mathrm{E}[V(q^A)] + \lambda \, \mathrm{Var}[V(q^A)] \\ \text{subject to:} \quad & q^A + q^B = K, \quad q^A, q^B \geq 0. \end{aligned} \tag{7}$$

The aim therefore is to find the production mix which, while meeting the production constraint, minimizes the expected total cost of production, plus an additional term characterizing the spread of the distribution of possible outcomes. The risk aversion parameter $\lambda$ scales the contribution of the variance term in $f$, reflecting the extent to which the decision maker prefers to minimize exposure to cost uncertainty. In the risk-neutral case ($\lambda = 0$) the variance term has zero weight so the optimization just discovers the production mix with lowest expected total cost (which in this case is just a single technology). Conversely, in the high risk aversion case ($\lambda \gg 1$) the second term in $f$ dominates the first and so the optimization discovers the production mix with lowest total cost uncertainty, regardless of its expectation. In the intermediate regime both terms play a significant part in determining the outcome of the optimization. Using Eqs. (4), (5) and (6) the objective function in problem (7) may be written explicitly as

$$f(q^A) = \sum_{i=A,B} \left[ q^i c_0^i \left(\frac{z_0^i + q^i}{z_0^i}\right)^{-\alpha^i} e^{(\sigma^i)^2/2} + \lambda \, (q^i)^2 (c_0^i)^2 \left(\frac{z_0^i + q^i}{z_0^i}\right)^{-2\alpha^i} \left(e^{(\sigma^i)^2} - 1\right) e^{(\sigma^i)^2} \right]. \tag{8}$$

Thus $f$ is just the sum of one cost-expectation-based component and one cost-variance-based component for each technology; covariance terms are zero due to the technology independence assumption (i.e. $\eta^A$ and $\eta^B$ are uncorrelated).
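Because the objective depends on a single free variable, it can be evaluated on a grid and minimized by exhaustive search. The sketch below assumes the mean-plus-lambda-times-variance form of the objective described in the text; the `Tech` container, the function names and the grid resolution are hypothetical choices made for illustration, and the parameter values in the usage note are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Tech:
    c0: float     # initial unit cost
    z0: float     # initial cumulative production
    alpha: float  # Wright exponent
    sigma: float  # volatility of the log-cost shock


def objective(q_a, tech_a, tech_b, K, lam):
    """Mean-variance objective: E[V] + lam * Var[V] for production q_a of A."""
    total = 0.0
    for tech, q in ((tech_a, q_a), (tech_b, K - q_a)):
        trend = tech.c0 * (1.0 + q / tech.z0) ** (-tech.alpha)
        mean = q * trend * math.exp(tech.sigma ** 2 / 2)
        var = (q * trend) ** 2 * (math.exp(tech.sigma ** 2) - 1) \
            * math.exp(tech.sigma ** 2)
        total += mean + lam * var
    return total


def brute_force_optimum(tech_a, tech_b, K, lam, n=2001):
    """Return the grid point q_a in [0, K] with the lowest objective value."""
    grid = [K * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda q: objective(q, tech_a, tech_b, K, lam))


if __name__ == "__main__":
    # Hypothetical, nearly identical technologies; A learns slightly faster.
    a = Tech(c0=1.0, z0=1.0, alpha=0.4, sigma=0.1)
    b = Tech(c0=1.0, z0=1.0, alpha=0.3, sigma=0.1)
    for lam in (0.0, 100.0):
        q_star = brute_force_optimum(a, b, K=2.0, lam=lam)
        print("lambda =", lam, "optimal share of A =", q_star / 2.0)
```

A grid search is used rather than a gradient method because the problem is non-convex: a local optimizer could stop at the wrong local minimum, whereas an exhaustive scan of a one-dimensional interval cannot, up to the grid resolution.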
This is a non-convex optimization problem, so it may have more than one local minimum. Since there is only one free variable, $q^A$, it is nevertheless relatively quick to solve by brute-force optimization. Denote the optimum by $q^{A*}$.

Technological maturity and the no-learning limit

Markowitz portfolios
Consider briefly the topic of Markowitz portfolio analysis for standard financial assets (Markowitz 1952). Let $r = (r_1, \ldots, r_n)^T$ be a vector of stochastic returns (possibly correlated) and $w = (w_1, \ldots, w_n)^T$ be a vector of portfolio weights. The portfolio return distribution is $V(w) = w^T r$, on which a mean-variance optimization is carried out, with $w$ as control variable. The classic form of the problem is

$$\text{maximize:} \quad \mathrm{E}[V(w)] - \lambda \, \mathrm{Var}[V(w)] \tag{9}$$

$$\text{subject to:} \quad \sum_{j=1}^{n} w_j = 1. \tag{10}$$
Since this is a mean-variance optimization it looks very similar to our technology portfolio problem (7). There are several differences though; three are superficial but one is fundamental. First, in the Markowitz case the decision maker seeks high expected portfolio return and low variance, while in the technology case they seek low expected portfolio cost and low variance; hence the sign difference of the variance terms in (7) and (9). Second, short-selling is in general allowed, so portfolio weights $w_j$ are not restricted to being non-negative. Third, returns are generally assumed to be correlated, and a lot of attention is paid to understanding these correlations. Finally, though, the fundamental difference between the two problems is that in the Markowitz case asset returns are purely stochastic, so portfolio weights do not affect asset performance, while in the experience curve model the stochastic costs depend explicitly on production, so portfolio weights do affect technology performances. The more one invests in a given technology, the better it gets; there is nonlinear feedback in the technology portfolio model but not in the Markowitz model.

Comparing financial and technology portfolios
To better understand the differences between the two portfolio types, we make a more accurate comparison by using a restricted version of the Markowitz model: the no short-selling, enforced budget, uncorrelated, two-asset model. This is a direct equivalent of our technology portfolio problem in a standard financial setting. It eliminates the second and third superficial differences listed above, making it easier to observe feedback effects.
Suppose there are two assets, A and B, with uncorrelated normal returns $r^A \sim N(\mu^A, (s^A)^2)$ and $r^B \sim N(\mu^B, (s^B)^2)$. Then let $q^A$ and $q^B = 1 - q^A$ be the proportions of wealth invested in A and B respectively, with $q^A, q^B \in [0, 1]$ (the no short-selling condition). The portfolio return distribution is then $V(q^A) = \sum_{i=A,B} r^i q^i$, and the objective function to be maximized is

$$f(q^A) = \sum_{i=A,B} \left[ \mu^i q^i - \lambda \, (s^i)^2 (q^i)^2 \right]. \tag{11}$$

Note that this is just quadratic in the portfolio weights $q^i$. Returning to the technology portfolio problem and considering the role of demand $K$ and initial cumulative productions $z_0^A$ and $z_0^B$ in the objective function, a simple calculation reveals the connection between the financial and technology models. Observe that the technologies objective function, Eq. (8), may be written

$$f(q^A) = \sum_{i=A,B} \left[ q^i c_0^i \left(1 + \frac{q^i}{z_0^i}\right)^{-\alpha^i} e^{(\sigma^i)^2/2} + \lambda \, (q^i)^2 (c_0^i)^2 \left(1 + \frac{q^i}{z_0^i}\right)^{-2\alpha^i} \left(e^{(\sigma^i)^2} - 1\right) e^{(\sigma^i)^2} \right]. \tag{12}$$

When $q^i/z_0^i$ is small we can approximate this in a simpler form. If the maximum future production of technology $i$ is much less than its current cumulative production ($K \ll z_0^i$), then $q^i/z_0^i \ll 1$ and the binomial series representation may be used. Thus if $K \ll z_0^i$ holds for both technologies then to zeroth order the objective function becomes

$$f(q^A) \approx \sum_{i=A,B} \left[ q^i c_0^i \, e^{(\sigma^i)^2/2} + \lambda \, (q^i)^2 (c_0^i)^2 \left(e^{(\sigma^i)^2} - 1\right) e^{(\sigma^i)^2} \right], \tag{13}$$

which no longer includes the learning exponents $\alpha^i$ (see Appendix A for further details). Apart from the sign difference of the variance terms, this has the same form as the Markowitz model (Eq. (11)). In this limit learning plays no part, there is no feedback process by which production affects future costs, and hence a Markowitz-like portfolio problem is the limiting case of the Wright's law portfolio problem as learning effects tend to zero.
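For comparison, the restricted two-asset Markowitz problem has a closed-form solution, which makes the contrast with the technology case concrete. The sketch below assumes the objective is the sum over assets of mu_i * q_i - lambda * s_i^2 * q_i^2 with q_B = 1 - q_A, so that setting the derivative to zero and clipping to [0, 1] gives the constrained optimum of this concave quadratic. The function names and the numerical values are hypothetical, chosen only for illustration.

```python
def markowitz_two_asset_weight(mu_a, mu_b, s_a, s_b, lam):
    """Optimal weight on asset A for the restricted two-asset problem.

    Maximizes sum_i (mu_i * q_i - lam * s_i**2 * q_i**2) with q_b = 1 - q_a
    and no short-selling. Assumes s_a**2 + s_b**2 > 0.
    """
    if lam <= 0:
        # Risk-neutral case: the objective is linear, so pick the better asset.
        return 1.0 if mu_a >= mu_b else 0.0
    # First-order condition of the concave quadratic in q_a.
    q = (mu_a - mu_b + 2.0 * lam * s_b ** 2) / (2.0 * lam * (s_a ** 2 + s_b ** 2))
    # Concavity means clipping to [0, 1] gives the constrained optimum.
    return min(1.0, max(0.0, q))


def markowitz_objective(q_a, mu_a, mu_b, s_a, s_b, lam):
    """Mean-variance objective (to be maximized) for weight q_a on asset A."""
    q_b = 1.0 - q_a
    return (mu_a * q_a - lam * s_a ** 2 * q_a ** 2
            + mu_b * q_b - lam * s_b ** 2 * q_b ** 2)


if __name__ == "__main__":
    # Hypothetical asset parameters.
    args = dict(mu_a=0.08, mu_b=0.05, s_a=0.2, s_b=0.1, lam=2.0)
    q_closed = markowitz_two_asset_weight(**args)
    grid = [i / 1000 for i in range(1001)]
    q_grid = max(grid, key=lambda q: markowitz_objective(q, **args))
    print("closed form:", q_closed, "grid search:", q_grid)
```

The optimum here moves continuously with the parameters, since it is a linear function of the means clipped to the unit interval. This is the benchmark against which the discontinuous behaviour of the technology portfolio stands out.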
Eq. (13) shows that a low-learning regime can exist in two ways for a given technology: first, if its learning rate is intrinsically small, and second, if its initial cumulative production is very large compared to the total demand. The latter condition is problem-specific since it depends on K, not just on the technology itself. All else being equal, as K → 0 technologies behave increasingly like standard financial assets, as noise increasingly dominates learning effects. Very mature technologies automatically behave like standard financial assets (since the incremental gains due to learning decrease with maturity by definition in Wright's law).
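Both routes to the low-learning regime can be illustrated numerically. The sketch below compares the full objective with its no-learning counterpart (the same expression with the learning factor set to one) and reports the relative gap as the initial cumulative production grows relative to demand, or as the exponent goes to zero. The function names and parameter values are hypothetical and serve only as an illustration.

```python
import math


def objective(q_a, tech_a, tech_b, K, lam, learning=True):
    """E[V] + lam * Var[V]; each tech is a tuple (c0, z0, alpha, sigma).

    With learning=False the Wright's law factor is dropped, which gives the
    zeroth-order (no-learning) form of the objective.
    """
    total = 0.0
    for (c0, z0, alpha, sigma), q in ((tech_a, q_a), (tech_b, K - q_a)):
        factor = (1.0 + q / z0) ** (-alpha) if learning else 1.0
        mean = q * c0 * factor * math.exp(sigma ** 2 / 2)
        var = (q * c0 * factor) ** 2 * (math.exp(sigma ** 2) - 1) \
            * math.exp(sigma ** 2)
        total += mean + lam * var
    return total


def relative_gap(z0, alpha, K=1.0, lam=1.0, share=0.5):
    """Relative difference between the no-learning and full objectives."""
    tech = (1.0, z0, alpha, 0.2)  # hypothetical c0 = 1 and sigma = 0.2
    q_a = share * K
    full = objective(q_a, tech, tech, K, lam, learning=True)
    limit = objective(q_a, tech, tech, K, lam, learning=False)
    return abs(limit - full) / limit


if __name__ == "__main__":
    for z0 in (1.0, 10.0, 100.0, 1000.0):
        print("z0 / K =", z0, "relative gap =", relative_gap(z0, alpha=0.3))
    print("alpha = 0:", relative_gap(1.0, alpha=0.0))
```

The gap shrinks as the technologies become more mature relative to demand, and vanishes when the exponent is zero, in line with the two conditions described above.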
Within any given problem then, each technology lies somewhere on a spectrum between more technology-like and more asset-like, depending on the entire set of parameters. We use "asset-like" simply to mean that learning effects are negligible relative to noise, as with standard financial assets.
A key observation is that in real world applications $z_0^i$ is the only one of the four technology parameters ($c_0^i$, $z_0^i$, $\alpha^i$, $\sigma^i$) likely to vary by many orders of magnitude between technologies; the others tend to be more closely bunched. Therefore large differences in initial cumulative production, when they exist, may have a much greater effect on the way technologies behave in the optimization (and hence on the outcome) than we might at first think. The same holds for total demand: it is exogenously defined and can vary by orders of magnitude depending on model context (due to e.g. the level of aggregation of technologies), yet it plays a key role in determining how the technologies behave in the model. These issues would be critical for applications.

Optimization results
The goal here is to understand how the optimal allocation of production between the two competing technologies depends on the technology-specific learning parameters (learning exponent $\alpha$ and volatility $\sigma$) and the initial conditions (cost competitiveness $c_0$ and cumulative production $z_0$) under varying levels of risk aversion $\lambda$, for fixed demand $K$. To do this we hold all other model parameters constant and vary the Wright exponent of technology B, $\alpha^B$, and the risk aversion $\lambda$. This generates a grid of tuples $(\alpha^B, \lambda)$. At each point of this grid the portfolio optimization (7) is performed, and the resulting collection of optima is plotted, giving the surface of optimal production of technology A as a share of total production, $q^{A*}/K$. The whole process may then be repeated for each of the other technology parameters $\sigma^B$, $c_0^B$ or $z_0^B$.
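The grid procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the objective is the mean-plus-lambda-times-variance form, the inner optimization is a plain grid search, and the base parameter values are hypothetical stand-ins (the values actually used in the paper are those of its Table 1). The names `optimal_share` and `share_surface` are likewise illustrative.

```python
import math


def objective(q_a, tech_a, tech_b, K, lam):
    """E[V] + lam * Var[V]; each tech is a tuple (c0, z0, alpha, sigma)."""
    total = 0.0
    for (c0, z0, alpha, sigma), q in ((tech_a, q_a), (tech_b, K - q_a)):
        cost = q * c0 * (1.0 + q / z0) ** (-alpha)
        total += cost * math.exp(sigma ** 2 / 2)
        total += lam * cost ** 2 * (math.exp(sigma ** 2) - 1) \
            * math.exp(sigma ** 2)
    return total


def optimal_share(tech_a, tech_b, K, lam, n=401):
    """Optimal share of technology A, found by grid search over [0, K]."""
    grid = [K * i / (n - 1) for i in range(n)]
    best = min(grid, key=lambda q: objective(q, tech_a, tech_b, K, lam))
    return best / K


def share_surface(tech_a, tech_b, alpha_b_values, lam_values, K):
    """Map each (alpha_B, lambda) grid point to the optimal share of A."""
    c0, z0, _, sigma = tech_b
    surface = {}
    for alpha_b in alpha_b_values:
        varied_b = (c0, z0, alpha_b, sigma)  # only the exponent of B changes
        for lam in lam_values:
            surface[(alpha_b, lam)] = optimal_share(tech_a, varied_b, K, lam)
    return surface


if __name__ == "__main__":
    base = (1.0, 1.0, 0.3, 0.1)  # hypothetical (c0, z0, alpha, sigma)
    surface = share_surface(base, base, [0.2, 0.25, 0.35, 0.4],
                            [0.0, 1.0, 10.0, 100.0], K=2.0)
    for key in sorted(surface):
        print(key, round(surface[key], 3))
```

The same loop structure applies when the varied parameter is the volatility, initial cost or initial cumulative production of technology B instead of its exponent; only the line constructing `varied_b` changes.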

Effects of the relative learning exponents α
We set the initial conditions and parameter values to those shown in Table 1.
Almost identical technologies are used here as this allows us to understand the effects of varying different parameters most effectively. (Asymmetrical technologies are considered in Section 8.) Note that the total production is twice the initial cumulative production of each technology. Hence the technologies are relatively immature, in the sense that there is plenty of potential left for learning to take place relative to how much has occurred in the past. As shown above, this is necessary since if both technologies are sufficiently mature a near-Markowitz scenario emerges.   Fig. 1 shows the surface of optimal technology A production share, q A * /K, over a grid of α B and λ values. This shows how risk aversion and relative learning exponents affect the composition of the optimal portfolio. Red areas correspond to higher production of technology A being optimal and blue areas correspond to higher production of technology B being optimal.

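[Table 1: initial conditions and parameter values for technologies A and B; columns: Symbol, Description, Tech. Table body not recovered.]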
When risk aversion is low the optimal strategy is to concentrate production entirely in either A or B, depending on relative learning exponents, whereas when risk aversion is high portfolios are diversified over both technologies. This is consistent with a general understanding of both deterministic experience curves (in which specialization is always optimal) and standard portfolio theory (in which diversification reduces portfolio risk). However, the nature of the transitions between these regimes depends on model parameters and is of great interest. For low risk aversion there is a discontinuity in the surface, indicating a region of extreme sensitivity to model parameters: an incremental change in either learning exponent or risk aversion can lead to a large change in the optimal portfolio. In contrast, for high risk aversion the surface is smooth, so the optimal portfolio is robust to small changes in parameters. As we shall see, this is caused by the existence of multiple local minima of the objective function in the low risk aversion regime, and a single global minimum in the high risk aversion regime.
On the λ = 0 boundary, variance terms do not feature in the optimization so production is concentrated in the technology with the best expected outcome. As risk aversion increases, up to around 0.2, the asymmetry in noise variance becomes apparent and the threshold for switching from 100% A to 100% B gradually shifts to larger α B values. The preference for the higher learning exponent technology (B in this region) is traded off against a preference for the less noisy technology (A), since the optimization penalizes higher noise variance. As risk aversion increases further portfolios become increasingly balanced. The surface discontinuity becomes less pronounced as the two local optima on either side of it approach a common value. Eventually the discontinuity disappears, and a single stable global optimum exists thereafter.

Figure 1: Surface of the (global) optimal production in technology A as a share of total production. This shows how the optimal strategy varies with risk aversion λ and technology B Wright exponent α B (with α A fixed at 0.5). For low risk aversion full specialization is optimal, with production either 100% in technology A or 100% in technology B depending on relative learning exponents. For high risk aversion diversification over both technologies is preferred and the optimal share varies smoothly with both α B and λ. For intermediate risk aversion the optimal portfolio is diversified but is highly sensitive to model parameters, with production switching instantaneously between distinct optima under infinitesimal changes in α B or λ. Parameter values are shown in Table 1.

Figure 2: […] Fig. 1. This is the surface of optimal investment share in asset A for varying values of risk aversion and asset B expected return. Portfolios are more specialized for low risk aversion and more diversified for high risk aversion as before. However, in contrast to the technologies case the surface is continuous ∀λ > 0, due to the convexity of the objective function.
Therefore some combinations of technologies and risk preferences are more robust than others: in some regions the solution is not particularly sensitive to changes in the underlying parameters, while for others it is extremely sensitive. In the unstable regions, a parameter estimation error could lead to a mix of technologies being chosen that is very different from the true optimal mix.

Comparison with Markowitz portfolios
To illustrate how nonlinearities in the technology portfolio affect the optimization results relative to the financial assets case, we plot the corresponding surface of optimal portfolio weights for the equivalent Markowitz system (10). With model parameters in Eq. (11) set to µ A = 0.5, s A = 1.0, s B = 1.1, Fig. 2 shows the surface of optima over a grid of varying asset B expected return µ B , and risk aversion λ. The usual patterns are present: portfolios are more diversified for higher risk aversion and more specialized for lower risk aversion, and full specialization occurs when one asset sufficiently outperforms the other. However the crucial difference is that now the surface is continuous everywhere except at one single point on the λ = 0 boundary. There are no positive values of risk aversion at which portfolios transition instantaneously from one state to another; portfolios vary continuously with both risk aversion and model parameters. This is because the Markowitz problem is convex. Without the Wright's law nonlinearity in f there do not exist multiple local minima for portfolios to instantaneously switch between as parameters vary, and hence no unstable regions of parameter space.
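The continuity of the Markowitz surface can be seen directly from a closed-form optimum. The sketch below assumes that the restricted system (10) has two uncorrelated assets, that s A and s B are standard deviations, and that the objective to be maximised is E[V] − λ Var[V]; none of these details are restated in this section, so they are assumptions. The function name `markowitz_weight` is hypothetical.

```python
def markowitz_weight(mu_a, mu_b, s_a, s_b, lam):
    """Weight of asset A maximising E - lam * Var for two uncorrelated assets.

    Assumed setup: E = w*mu_a + (1-w)*mu_b, Var = w^2*s_a^2 + (1-w)^2*s_b^2.
    """
    if lam == 0.0:
        # Risk-neutral case: the objective is linear, so hold the better asset.
        return 1.0 if mu_a >= mu_b else 0.0
    # Stationary point of the concave quadratic objective.
    w = (mu_a - mu_b + 2.0 * lam * s_b**2) / (2.0 * lam * (s_a**2 + s_b**2))
    # Projecting onto [0, 1] gives the constrained maximiser.
    return min(1.0, max(0.0, w))


weight = markowitz_weight(mu_a=0.5, mu_b=0.6, s_a=1.0, s_b=1.1, lam=0.25)
```

For any λ > 0 the clipped expression is a continuous function of µ B and λ, which is the property contrasted with the technology case above.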

Analysis
Next we present some analytical observations which help in understanding the character of the problem and the shape of the surface in Fig. 1.

Corner and interior solutions
Since the optimization domain is bounded (q A ∈ [0, K]), solutions are either corner solutions or interior solutions. Corner solutions (q A * = 0 or K) satisfy f ′(q A * ) ≠ 0 almost everywhere in parameter space, while interior solutions (q A * ∈ (0, K)) always satisfy f ′(q A * ) = 0. Corner solutions form both the dark red horizontal plateau with q A * = K on the left of Fig. 1 and the dark blue horizontal floor section with q A * = 0 at the front of the plot. All other points of the surface are interior solutions, at which optimal portfolios are diversified.
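For reference, the distinction between the two solution types can be written as the standard necessary conditions for a minimiser of a differentiable function on a closed interval. This is a textbook statement rather than a result of the paper, and the prime denotes differentiation with respect to q A.

```latex
\begin{aligned}
\text{interior: } & q^{A*} \in (0, K), && f'(q^{A*}) = 0, \quad f''(q^{A*}) \ge 0, \\
\text{corner at } 0\text{: } & q^{A*} = 0, && f'(0) \ge 0, \\
\text{corner at } K\text{: } & q^{A*} = K, && f'(K) \le 0 .
\end{aligned}
```

The inequalities at the corners are strict except on the boundary between corner and interior regions of parameter space.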

Local and global minima of the objective function
The nonlinearity in the model generates interesting behaviour because in some regions of parameter space the objective function has multiple local optima. Fig. 3 shows how the objective function varies along one particular line in parameter space. Risk aversion is fixed at λ = 0.25 and the technology B learning exponent is varied. The objective function is plotted for three different values of α B , showing how distinct local minima emerge and disappear. As α B varies the global minimum switches from one local minimum to another, and very different portfolios of approximately equal objective value exist simultaneously. When the surface discontinuity in Fig. 1 is crossed the global minimum switches from one local minimum to the other. This means that a parameter estimation error can lead to suboptimality and a portfolio that is significantly different from the correct optimal portfolio. Fig. 4 plots the locations of the different optima against α B . This shows that if the measured value of α B is, for example, 0.7 ± 0.02, then the optimal production share is roughly a 20:80 split, but either technology could be the dominant one, depending on what the true value really is.
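A plot of optima against α B, in the spirit of Fig. 4, requires locating every local minimum rather than only the global one. The following sketch does this on a grid. It assumes the same form of objective as elsewhere in these illustrations (f = E[V] + λ Var[V], independent lognormal noise), with c 0 = z 0 = 1 and hypothetical values σ A = 1.0, σ B = 1.1, K = 2, so the number and position of the minima it reports need not match the paper's figures.

```python
import numpy as np

SIGMA_A, SIGMA_B = 1.0, 1.1  # hypothetical noise levels, not from Table 1


def objective(q_a, lam, alpha_b, demand=2.0, alpha_a=0.5):
    """Assumed objective with c0 = z0 = 1 for both technologies."""
    total = 0.0
    for q, alpha, sigma in ((q_a, alpha_a, SIGMA_A), (demand - q_a, alpha_b, SIGMA_B)):
        spend = q * (1.0 + q) ** (-alpha)
        total = total + spend * np.exp(sigma**2 / 2)
        total = total + lam * spend**2 * (np.exp(sigma**2) - 1) * np.exp(sigma**2)
    return total


def local_minima(alpha_b, lam=0.25, demand=2.0, n_grid=4001):
    """Return (share of A, objective value) for every grid-local minimum."""
    q = np.linspace(0.0, demand, n_grid)
    f = objective(q, lam, alpha_b, demand)
    # Pad with +inf so that the end points can qualify as (corner) minima.
    left = np.concatenate(([np.inf], f[:-1]))
    right = np.concatenate((f[1:], [np.inf]))
    idx = np.flatnonzero((f <= left) & (f <= right))
    return [(q[i] / demand, f[i]) for i in idx]


for alpha_b in (0.6, 0.7, 0.8):
    print(alpha_b, local_minima(alpha_b))
```

The global optimum is the entry with the smallest objective value; the remaining entries are the competing local minima between which the solution can switch.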
Finally, since Fig. 4 is just the λ = 0.25 section through Fig. 1, it is apparent that if all optima were plotted on Fig. 1, not just the global minima, the surface would double back under itself in a fold, smoothly connecting the upper and lower edges of the discontinuity. This type of geometry is well-known from the cusp catastrophe bifurcation (see e.g. Poston & Stewart (2014)). Although our setting is different since parameters here are not dynamic, the similarity is worth noting; both involve plotting the zeros of an underlying nonlinear system, resulting in a multivalued surface representing alternative stable states. In Appendix B we describe some other properties of the system that can be derived analytically. To study the other three parameter pairs we repeated the same procedure for each of them in turn.

[Figure caption, beginning missing] […] For smaller α B there is a single interior local minimum with production concentrated mainly in A. As α B increases a second local minimum appears, which then becomes the global minimum, and production switches to being mainly concentrated in B. This is what happens as the surface discontinuity in Fig. 1 is crossed: highly differentiated portfolios of approximately equal objective value exist simultaneously.

Effects of other technology parameters
We found that all technology parameters produce the same effects as α i ; each generates a figure analogous to Fig. 1, but with α B replaced by the corresponding parameter: c B 0 , z B 0 , or σ B . Therefore each parameter pair has its own distinct regions of stability and instability in parameter space, just like α i . The only remaining parameter in the model is demand K, which we examine next.
6 Effect of total demand; demand-driven lock-in
In the example system used so far, total demand K is twice the initial cumulative production of both technologies (K = 2z A 0 = 2z B 0 ). The potential for learning is therefore high and the model behaves very differently from the Markowitz case. We now consider how this behaviour changes as demand is varied. Intuitively we would expect that, all else being equal, the larger K is, the more potential there is for experience to accrue, so the more concentrated the portfolio will be in one technology. We demonstrate that our model produces this behaviour, and show in detail the transition from the small-K, low-learning regime to the large-K, high-learning regime.
In Fig. 5, technology A Wright exponent is still fixed at α A = 0.5, and instead of varying α B as before we also fix it, at α B = 0.65, and vary K. Risk aversion is fixed at λ = 0.25 and other parameters are those shown in Table 1 as before (so technology B progresses faster than A but is still slightly noisier). The plot shows how the local and global optima of the objective function vary with K while all other parameters are held constant. Fig. 5 is therefore analogous to Fig. 4, but with K as the independent variable. As well as the maximum and minima of f , the plot also shows the minimum of the Markowitz approximation to f for comparison. Again we observe instantaneous switching between portfolio states as demand varies.
As K goes from approximately 0 to 1 the share of technology A in the optimal portfolio (dashed black line) initially decreases sharply then reverses direction and increases again. In this region the technologies transition from behaving in a more "asset-like" way to a more "technology-like" way, due to the increased potential for learning. This interpretation is confirmed by noting that for very small K, the minimum of f and the minimum of the Markowitz approximation to f roughly coincide (i.e. the black and yellow dashed lines overlap), so the solution for technologies is almost the same as for financial assets. But as K increases the two curves diverge due to the increasing impact of the nonlinearities in f .

Figure 5: Plot showing how optima of the objective function vary with total production K. The minimum of the Markowitz approximation to f (Eq. 14) is also shown for comparison. As K goes from approximately 0 to 1 the system transitions from a Markowitz-like, low-learning regime to a technology-like, learning regime. When demand is small (approximately K < 4) the more certain, slower progressing technology A (α A = 0.5) dominates the portfolio, but when demand is high there is enough scope for progress to occur that the noisier, faster progressing technology B (α B = 0.65) becomes optimal. The transition between these states is instantaneous as the global optimum switches between different local minima of equal objective value. Risk aversion is fixed at λ = 0.25 and other parameter values are those shown in Table 1 as before.
When K is between approximately 1 and 4, despite technology B having a larger Wright exponent, demand is still too low for it to make enough progress along its experience curve to outweigh its higher variability, so technology A dominates. But as K increases a threshold is crossed (when K is just less than 4), and production suddenly switches to B. Thus we observe demand-driven technological lock-in, in the sense that if only a small amount of future demand is considered then it is optimal to continue investing in the slower progressing, less uncertain technology, but if market size is large enough then it is optimal to switch to the noisier, faster progressing technology now.
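The comparison between the technology optimum and its Markowitz approximation as K varies can be sketched as follows. This is an illustration under stated assumptions: the objective is taken to be f = E[V] + λ Var[V] with independent lognormal noise and c 0 = z 0 = 1, the Markowitz approximation is taken to be the version of f with the learning factor set to 1, and σ A = 1.0, σ B = 1.1 are hypothetical values. The thresholds it produces therefore need not coincide with those quoted for Fig. 5.

```python
import numpy as np

LAM = 0.25
ALPHA = {"A": 0.5, "B": 0.65}
SIGMA = {"A": 1.0, "B": 1.1}  # hypothetical noise levels, not from Table 1


def noise_moments(sigma):
    """Mean and variance of the lognormal cost multiplier exp(N(0, sigma^2))."""
    return np.exp(sigma**2 / 2), (np.exp(sigma**2) - 1) * np.exp(sigma**2)


def tech_share(demand, n_grid=4001):
    """Share of A at the global minimum of the assumed technology objective."""
    q_a = np.linspace(0.0, demand, n_grid)
    total = 0.0
    for name, q in (("A", q_a), ("B", demand - q_a)):
        m, v = noise_moments(SIGMA[name])
        spend = q * (1.0 + q) ** (-ALPHA[name])  # c0 = z0 = 1
        total = total + m * spend + LAM * v * spend**2
    return q_a[np.argmin(total)] / demand


def markowitz_share(demand):
    """Share of A when the learning factor is dropped (quadratic objective)."""
    m_a, v_a = noise_moments(SIGMA["A"])
    m_b, v_b = noise_moments(SIGMA["B"])
    q = (m_b - m_a + 2 * LAM * demand * v_b) / (2 * LAM * (v_a + v_b))
    return min(demand, max(0.0, q)) / demand


for demand in (0.01, 0.1, 1.0, 4.0, 20.0):
    print(demand, tech_share(demand), markowitz_share(demand))
```

With these placeholder values the two shares agree for very small demand and the technology optimum moves to the faster-learning technology B when demand is large, which is the qualitative pattern described above.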

The efficient frontier
Technology portfolios can also be viewed in the efficient frontier framework. This technique is well-known in portfolio theory, and involves plotting each portfolio as a point in expected-return/variance space. We first describe the approach for the restricted Markowitz problem introduced earlier, then show how it applies for technologies.

Financial assets
In the Markowitz system defined above (Eq. (10)) there are two assets, with known return distributions. The portfolio weight of asset A, q A , is the single free control variable. Each q A ∈ [0, 1] describes a unique portfolio and as q A varies from 0 to 1 all feasible portfolios are spanned. For a given value of risk aversion only one of these portfolios is optimal. Each portfolio has a return distribution V (q A ), the expectation and variance of which can be used to plot a single point in expectation-variance space representing the portfolio. This gives the well-known Markowitz diagram: the x-axis is the portfolio variance, Var[V (q A )], and the y-axis is the expected return, E[V (q A )]. The feasible set of portfolios is the curve traced out on these axes as q A varies from 0 to 1. Fig. 6 shows the feasible set for the two assets with fixed parameters µ A = 0.5, µ B = 0.6, s A = 1.0 and s B = 1.1. This horizontal parabola is known as the Markowitz bullet. The colour scheme is the same as before (cf. Fig. 2) so dark red corresponds to 100% asset A and dark blue to 100% asset B.
From the definition of f (Eq. (10)) we have E [V ] = f (V ) + λVar [V ], so on these axes the isolines of f (i.e. level sets of f ), for any fixed value of risk aversion, are just the straight lines of gradient λ. The value of f for each portfolio lying on a given isoline (i.e. the points of intersection with the feasible set) is given by the y-axis intercept. Then since we want to maximise f in this problem, the optimal portfolio for this λ is the unique point of intersection of the feasible set and the isoline of gradient λ with highest y-axis intercept. The efficient frontier is defined as the set of all portfolios which are optimal for some value of λ. Therefore, since λ ∈ [0, ∞), the efficient frontier here is the segment of the feasible set furthest into the upper-left-most quadrant of the diagram. These elements are all shown on Fig. 6.

Figure 6: The feasible set of portfolios for the Markowitz system Eq. (10), with µ A = 0.5, µ B = 0.6, s A = 1.0 and s B = 1.1. This is the path of portfolios traced out as the proportion of asset A in the portfolio varies from 0% (dark blue, q A = 0) to 100% (dark red, q A = 1). To demonstrate how risk aversion and optimality are related geometrically, an isoline of f for risk aversion λ = 0.25 is plotted. The black dot at the point of tangency with the feasible set is the unique optimal portfolio for this λ. The two other black dots represent the two full specialization portfolios.

Technologies
In contrast to the Markowitz model, in the technologies model we want to minimize both the variance and the expected cost, so the sign of the variance part of the objective function is reversed. The isolines of f are therefore now the downward-sloping straight lines, of gradient −λ ∈ (−∞, 0]. And since lower f is now better, an optimal portfolio is a point of intersection of the feasible set with the isoline of lowest y-axis intercept. The efficient frontier therefore consists of the part of the feasible set furthest into the lower-left-most quadrant of the diagram. Fig. 7 shows the expectation-variance diagram for two technologies, analogous to Fig. 6. Technology B has Wright exponent α B = 0.65 and other parameters are those shown in Table 1 as before. There are two significant differences between how technologies and financial assets appear in this framework. First, for technologies the feasible set is rotated and stretched. This is because the objective function is highly nonlinear in portfolio weights, not just quadratic. Second, as a direct consequence of this, the efficient frontier may now be split into two disconnected components. This is the case in Fig. 7, where the efficient frontier consists of both the long red segment on the left (mainly technology A) and the isolated end-point on the right (100% technology B). This splitting of the efficient frontier is how instantaneous optimum switching is manifested in expectation-variance space: as risk aversion goes from λ = 0 to ∞ the optimal portfolio traverses the efficient frontier from one end to the other, jumping from one component to the other at the critical value λ = λ crit . At the point of the discontinuity f has two distinct minima of equal value, and there are two optimal portfolios, both of which lie on the same isoline, of gradient −λ crit . In this case these are the 100% A and 100% B portfolios.
Note that since α B = 0.65 in Fig. 7, the efficient frontier shown corresponds to the α B = 0.65 section through the surface in Fig. 1. Hence the change in the optimal portfolio as risk aversion varies can be traced out equivalently on both diagrams.
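When the two optimal portfolios at the jump are the two full-specialization portfolios, as stated above for this case, the critical risk aversion follows from requiring that both corner points lie on the same isoline of gradient −λ crit. The sketch below computes it under the assumed lognormal cost model, with hypothetical values c 0 = z 0 = 1, σ A = 1.0, σ B = 1.1 and K = 2. It does not check that the corners are in fact the global optima at that λ, so the result is only meaningful when that condition holds.

```python
import math


def corner_moments(demand, alpha, sigma, c0=1.0, z0=1.0):
    """Mean and variance of total cost when all demand goes to one technology."""
    spend = demand * c0 * (1.0 + demand / z0) ** (-alpha)  # noise-free cost
    mean = spend * math.exp(sigma**2 / 2)
    var = spend**2 * (math.exp(sigma**2) - 1) * math.exp(sigma**2)
    return mean, var


# Hypothetical parameter values, not the paper's Table 1.
mean_a, var_a = corner_moments(2.0, alpha=0.5, sigma=1.0)   # 100% technology A
mean_b, var_b = corner_moments(2.0, alpha=0.65, sigma=1.1)  # 100% technology B

# Both corners on one isoline: mean_a + lam*var_a == mean_b + lam*var_b.
lam_crit = (mean_a - mean_b) / (var_b - var_a)
```

The formula is simply the negative of the slope of the straight line joining the two end-points of the feasible set in expectation-variance space.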
Viewing the problem in the expectation-variance framework makes it clear how technology portfolios of similar (or equal) value can coexist simultaneously (unlike in financial portfolios). Similar value portfolios may have either large expectation and small variance, or vice versa, or some combination in between. And since the feasible set is no longer parabolic (due to Wright's law nonlinearities), there may be many very different portfolios lying near the optimal isoline. For example, in Fig. 7 all portfolios with 60-90% technology A (red) lie very near to the optimal λ = 0.25 isoline, because the feasible set has very low curvature here. This is suggestive of the behaviour we would expect to see in a multi-technology model. The increasing returns dynamic allows many different ways of generating portfolios of similar value, using different combinations of the various technologies' expectations and variances. This would result in a highly non-convex optimization problem with many local minima.

Figure 7: The feasible set of portfolios for technologies with α A = 0.5, α B = 0.65. This is the path of portfolios traced out as the proportion of technology A production in the portfolio varies from 0% (dark blue, q A = 0) to 100% (dark red, q A = K). Other parameters are those shown in Table 1 as before. An isoline of f corresponding to risk aversion λ = 0.25 is plotted. The black dot at the point of tangency with the feasible set is the unique optimal portfolio for this λ. The two other black dots represent the two full specialization portfolios. Compared to the Markowitz case, the feasible set is rotated and stretched, and the efficient frontier is now a disconnected set.

Fig. 8 shows the effect of total demand on the feasible set and optimal portfolios. The technologies are fixed, and are the same as in Fig. 7. K is the only parameter which varies. Although the scales on the axes differ in each plot, the ratio between them is constant. Indeed, the dashed lines all have gradient −λ = −0.25; they are the isolines corresponding to the optimal portfolio for λ = 0.25 in each case.

Effect of demand on the efficient frontier
When K is tiny there is very little potential for learning so the technologies behave in an "asset-like" way, and the feasible set is almost a parabolic Markowitz bullet. Here technology A (red) dominates the optimal portfolio for all values of risk aversion. As K increases the potential for learning increases, so the nonlinearities in f start to have an impact and the feasible set starts to become distorted. At around K = 1 the blue arm drops below the red arm, splitting the efficient frontier in two and indicating the presence of two equal value portfolios for the first time.
Here, for λ ≤ λ crit 100% technology B (blue) is optimal, but for λ ≥ λ crit technology A (red) is dominant. At around K = 4 the two arms cross and large portions of the feasible set are very close to the optimal isoline. Hence there are many different nearly-optimal portfolios in this case. After this point the two arms of the feasible set cross over completely so that technology B is dominant for all values of risk aversion. As K gets very large (> 10) the potential for learning is so high that near full specialization in the fastest-learning technology (B) is optimal for all levels of risk aversion.

Asymmetrical technologies and escaping lock-in
The paper so far has focused on the case of two almost symmetrical technologies, studying the behavior of the optimal portfolio as one parameter is changed, while all others are held constant. We now consider a more realistic and interesting example in which an established technology is challenged by a newcomer. Suppose the setting is one where a cheap, mature, slow-learning technology A is challenged by a costly, young, fast-learning technology B. Table 2 shows the parameter values used here.

Figure 8: […] (Table 1). Again, dark blue corresponds to 0% technology A (q A = 0) and dark red to 100% (q A = K). The dashed lines are λ = 0.25 isolines of f , and the black dots are the optimal portfolio for this risk aversion. The axes scales have been omitted for clarity (they are different for each plot). These plots show how the problem transitions from a Markowitz-like, low-learning regime when K is small, to a highly nonlinear high-learning regime when K is large.

Repeating the demand-driven lock-in analysis of Section 6, Fig. 9 shows the optimal portfolio surface over total demand and risk aversion axes. Technology A's initial cost advantage is so strong that a demand of at least 20 times the initial cumulative production of technology B is required to prevent full specialization in A, even for high risk aversion. We observe the familiar optimum switching as demand increases, demonstrating again how important the role of anticipated future demand is in determining the optimal production mix. In this case, the K-λ parameter space separates into three qualitatively different regimes: (i) for low K, 100% technology A is optimal; (ii) for high K but low risk aversion, 100% technology B is optimal; and (iii) for high K and at least moderate risk aversion, the optimal proportion of technology A is around 0-40%. Results like this could be very useful in applications, where often the key challenge is simply reducing the dimension of the decision space.
To summarise the behaviour of the model, note the direction in which each of the parameters would need to move in order to escape lock-in to an incumbent technology. All things being equal, avoiding lock-in to technology A would require: decreased B maturity, decreased B initial cost, increased B Wright exponent, decreased B volatility, increased demand K, or increased risk aversion λ.

Conclusion
In this paper we considered two technologies following stochastic experience curves and we characterised the optimal investment strategies. We used an objective function that accounts for total portfolio cost and uncertainty. The optimal investment depends on risk-aversion, initial conditions (relative technology maturity and initial cost), progress characteristics (mean progress rate and uncertainty of future shocks) and market size.
In contrast to classical Markowitz portfolios, in our setting investment lowers marginal cost, creating a larger and larger incentive to continue investing in the same option. But contrary to the deterministic case, in which these self-reinforcing effects lead to complete specialization, we find that accounting for uncertainty and risk aversion promotes diversification even if one option has better intrinsic technological characteristics. Our results therefore show how the choice of specializing or diversifying depends on the underlying parameters and initial conditions. Crucially, we find that the nonlinearity of the problem leads to multiple local optima, so that very different optimal portfolios can exist simultaneously, and the global optimum switches instantaneously between them as parameters change. This means that inside a critical region of parameter space a small change in one of the parameters can lead to a very significant change in the optimal portfolio.

Figure 9: Optimal share of production of incumbent technology A (red) when in competition with challenger technology B (blue), for varying demand K and risk aversion λ. Parameter values are those shown in Table 2. Technology A has such a strong initial cost advantage that when demand is low (and hence the potential for learning is low), it is optimal to specialize fully in A, for all values of risk aversion shown. As demand increases, so does the potential for learning, and in order to exploit the faster-learning challenger the global optimum switches instantaneously to a new local minimum, in which technology B dominates. In this example the parameter space separates into three qualitatively different regimes, 100% A, 100% B and approximately 0-40% A.
We also established an analytical connection between portfolios of technologies and portfolios of financial assets. The Wright's law model of endogenous technological change may be expanded in a series approximation, the leading terms of which are equivalent to those found in the Markowitz model for financial assets. Only the higher order nonlinear terms contribute towards the learning feedback, and hence the Markowitz model may be regarded as the no-learning limit of the technology portfolio model. We also showed that the strength of learning feedback in the model depends on the complete set of model parameters, but in particular on technologies' previous cumulative production and the level of total future demand. As a result, each technology may be viewed as existing somewhere on a spectrum between more asset-like and more technology-like, depending on the specific set of parameters present. These findings help establish a theoretical basis for understanding how technologies behave in a simple mean-variance framework, and give insight into how multiple optima can arise in the context of uncertain endogenous technological change.
Setting $x$ to $q^i/z^i_0$ (with $q^i < z^i_0$) and $\alpha$ to $-\alpha^i$ gives […]. Thus if $z^i_0$ is much larger than $q^i$ (specifically, if $(q^i)^2 \ll z^i_0$), then the approximation formed by discarding everything except the zero order terms in the series expansions will be reasonable. This gives the lowest order approximation to $f$, which has the same form as the Markowitz objective function, with $c^i_0 e^{(\sigma^i)^2/2}$ playing the role of expected return and $(c^i_0)^2 (e^{(\sigma^i)^2} - 1) e^{(\sigma^i)^2}$ the role of the variance. Note that although the series expansion (18) converges provided $q^i < z^i_0$, and is itself reasonably approximated by its zero order term (i.e. 1) provided $q^i \ll z^i_0$, the lowest order approximation to $f$ (Eq. (20)) involves discarding the $(q^i)^2 / z^i_0$ term in the expectation part of Eq. (19), so is reasonable only when $(q^i)^2 \ll z^i_0$. Thus the correct condition for the full optimization problem to be Markowitz-like is $K^2 \ll z^i_0 \ \forall i$. Higher order approximations to $f$, and the corresponding conditions for validity, may be computed similarly. See e.g. Kraus & Litzenberger (1976) for further details on portfolio selection involving preferences over skewness.
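For orientation, the generalized binomial series with the substitution described above reads as follows. This is the standard expansion, valid for $q^i < z^i_0$, and is presumably what Eq. (18) refers to; the second line, obtained by multiplying through by $q^i$, is included only to show where a term in $(q^i)^2 / z^i_0$ arises and is not a reconstruction of Eq. (19).

```latex
\begin{aligned}
\left(1 + \frac{q^i}{z^i_0}\right)^{-\alpha^i}
  &= 1 - \alpha^i \frac{q^i}{z^i_0}
     + \frac{\alpha^i(\alpha^i + 1)}{2} \left(\frac{q^i}{z^i_0}\right)^{2} - \dots \\
q^i \left(1 + \frac{q^i}{z^i_0}\right)^{-\alpha^i}
  &= q^i - \alpha^i \frac{(q^i)^2}{z^i_0} + \dots
\end{aligned}
```

Keeping only the first term on the right of each line removes all dependence on $\alpha^i$, which is why the lowest order approximation contains no learning feedback.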
B Analytical points to accompany section 5

B.1 Onset of diversification
For a given pair of technologies the risk neutral (λ = 0) portfolio always concentrates production entirely in one technology due to increasing returns, while for sufficiently large risk aversion the portfolio is diversified over both technologies. The value of risk aversion at which the transition between these two regimes occurs (i.e. the onset of diversification) may be found analytically by calculating the intersection of corner and interior solutions. This is done by substituting the relevant boundary condition (q A = 0 or q A = K) in the first-order condition equation (f ′(q A ) = 0), and solving for λ. For example, if we fix α B = 0.8 and use our standard parameters from Table 1 then solving f ′(0) = 0 yields the value λ diversification = 0.254. Checking this against Fig. 1 confirms that this is indeed the point at which technology A first enters the portfolio in the case when α B = 0.8.
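Because the objective is linear in λ, the first-order condition at a corner can be solved for λ in closed form. The sketch below does this at q A = 0 under the assumed lognormal cost model, using derivative formulas obtained by differentiating q c 0 (1 + q/z 0 )^(−α) and its square. The values c 0 = z 0 = 1, σ A = 1.0, σ B = 1.1 and K = 2 are hypothetical, so the number it returns should not be expected to equal the 0.254 quoted above.

```python
import math


def moment_slopes(q, alpha, sigma, c0=1.0, z0=1.0):
    """Derivatives w.r.t. q of the mean and variance of one technology's cost."""
    u = 1.0 + q / z0
    m = c0 * math.exp(sigma**2 / 2)
    v = c0**2 * (math.exp(sigma**2) - 1) * math.exp(sigma**2)
    d_mean = m * (u ** (-alpha) - alpha * (q / z0) * u ** (-alpha - 1))
    d_var = v * (2 * q * u ** (-2 * alpha)
                 - 2 * alpha * (q**2 / z0) * u ** (-2 * alpha - 1))
    return d_mean, d_var


def onset_lambda(demand, tech_a, tech_b):
    """Solve f'(0) = 0 for lam, i.e. the point where A first enters the mix.

    tech_a and tech_b are (alpha, sigma) pairs. At q_A = 0 technology B
    produces the whole of demand, and dq_B/dq_A = -1.
    """
    d_mean_a, d_var_a = moment_slopes(0.0, *tech_a)
    d_mean_b, d_var_b = moment_slopes(demand, *tech_b)
    # f'(0) = (d_mean_a - d_mean_b) + lam * (d_var_a - d_var_b) = 0
    return -(d_mean_a - d_mean_b) / (d_var_a - d_var_b)


lam_div = onset_lambda(2.0, tech_a=(0.5, 1.0), tech_b=(0.8, 1.1))
```

The same function applied at q A = K, with the roles of the two technologies exchanged, gives the onset of diversification from the other corner.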

B.2 Location of the discontinuity
Observe that in Fig. 3 the value of the global minimum varies continuously with α B . This follows from the smoothness of the objective function. In fact the global minimum varies continuously with all the underlying parameters, in particular over the entire α B -λ grid of Fig. 1. Hence the value that f attains at the surface discontinuity in Fig. 1 is the same when approached from both sides, so we have f (q A * ) = f (K − q A * ). This means that near the discontinuity, portfolios on either side of it are symmetric about a 50% production share. For example, in Figs. 3 and 4 the neighbouring portfolios either side of the discontinuity (i.e. the point at which the optimum switches) are seen to be symmetric about 50%, at approximately 20% and 80% shares. Additionally, in the atypical case where q A * is known exactly (i.e. in the small-λ, full specialization region, where q A * = 0 or K), this insight allows the location of the discontinuity to be calculated analytically. For example, fixing λ = 0.1 and solving f (0) = f (K) for α B yields the value α B switch = 0.681, which matches Fig. 1.
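In the full-specialization region the condition f (0) = f (K) involves only the two corner portfolios, so the switching value of α B can be found with a one-dimensional root search. The sketch below uses plain bisection under the assumed lognormal cost model, with hypothetical values c 0 = z 0 = 1, σ A = 1.0, σ B = 1.1 and K = 2; it illustrates the calculation and is not expected to return the 0.681 quoted above.

```python
import math


def corner_objective(demand, lam, alpha, sigma, c0=1.0, z0=1.0):
    """Assumed objective when all of demand is produced by one technology."""
    spend = demand * c0 * (1.0 + demand / z0) ** (-alpha)
    mean = spend * math.exp(sigma**2 / 2)
    var = spend**2 * (math.exp(sigma**2) - 1) * math.exp(sigma**2)
    return mean + lam * var


def switch_alpha_b(lam, demand=2.0, lo=0.5, hi=1.0, iters=60):
    """Bisect for the alpha_B at which 100% A and 100% B have equal value."""
    f_all_a = corner_objective(demand, lam, alpha=0.5, sigma=1.0)

    def gap(alpha_b):
        return corner_objective(demand, lam, alpha_b, sigma=1.1) - f_all_a

    if not (gap(lo) > 0.0 > gap(hi)):
        raise ValueError("bracket [lo, hi] does not contain a sign change")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid  # all-B still more costly: the switch lies at larger alpha_B
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha_switch = switch_alpha_b(lam=0.1)
```

Bisection is sufficient here because the all-B objective decreases monotonically in α B, so the gap changes sign exactly once on the bracket.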