Stochastic programming is a framework for modeling optimization problems that involve uncertainty. Whereas deterministic optimization problems are formulated with known parameters, real-world problems almost invariably include some unknown parameters. When the parameters are known only within certain bounds, one approach to tackling such problems is called robust optimization. Here the goal is to find a solution which is feasible for all such data and optimal in some sense. Stochastic programming models are similar in style but take advantage of the fact that probability distributions governing the data are known or can be estimated. The goal here is to find some policy that is feasible for all (or almost all) the possible data instances and maximizes the expectation of some function of the decisions and the random variables. More generally, such models are formulated, solved analytically or numerically, and analyzed in order to provide useful information to a decision-maker.^{[1]}
As an example, consider two-stage linear programs. Here the decision maker takes some action in the first stage, after which a random event occurs affecting the outcome of the first-stage decision. A recourse decision can then be made in the second stage that compensates for any bad effects that might have been experienced as a result of the first-stage decision. The optimal policy from such a model is a single first-stage policy and a collection of recourse decisions (a decision rule) defining which second-stage action should be taken in response to each random outcome.
Stochastic programming has applications in a broad range of areas ranging from finance to transportation to energy optimization.^{[2]}^{[3]} We present an example of optimizing an investment portfolio over time.
Two-Stage Problems
The basic idea of two-stage stochastic programming is that (optimal) decisions should be based on data available at the time the decisions are made and should not depend on future observations. The two-stage formulation is widely used in stochastic programming. The general formulation of a two-stage stochastic programming problem is given by:
\min_{x\in X}\{ g(x)= f(x) + E[Q(x,\xi)]\}
where Q(x,\xi) is the optimal value of the second-stage problem
\min_{y}\{ q(y,\xi) : T(\xi)x+W(\xi) y = h(\xi)\}
The classical two-stage linear stochastic programming problem can be formulated as
\begin{array}{llrrr} \min\limits_{x\in \mathbb{R}^n} &g(x)= c^T x + E[Q(x,\xi)] & \\ \text{subject to} & Ax &=& b \\ & x &\geq& 0 \end{array}
where Q(x,\xi) is the optimal value of the second-stage problem
\begin{array}{lrrr} \min\limits_{y\in \mathbb{R}^m} & q(\xi)^T y & \\ \text{subject to} & T(\xi)x+W(\xi)y &=& h(\xi) \\ & y &\geq& 0 \end{array}
In this formulation x\in \mathbb{R}^n is the first-stage decision variable vector, y\in \mathbb{R}^m is the second-stage decision variable vector, and \xi=(q,T,W,h) contains the data of the second-stage problem. At the first stage we have to make a "here-and-now" decision x before the realization of the uncertain data \xi, viewed as a random vector, is known. At the second stage, after a realization of \xi becomes available, we optimize our behavior by solving an appropriate optimization problem.
At the first stage we optimize (minimize in the above formulation) the cost c^Tx of the first-stage decision plus the expected cost of the (optimal) second-stage decision. We can view the second-stage problem simply as an optimization problem which describes our supposedly optimal behavior when the uncertain data is revealed, or we can consider its solution as a recourse action where the term Wy compensates for a possible inconsistency of the system Tx\leq h and q^Ty is the cost of this recourse action.
The considered two-stage problem is linear because the objective functions and the constraints are linear. Conceptually this is not essential, and one can consider more general two-stage stochastic programs. For example, integrality constraints could be added to the first-stage problem so that its feasible set is discrete, and nonlinear objectives and constraints could also be incorporated if needed.^{[4]}
Distributional assumption
The formulation of the above two-stage problem assumes that the second-stage data \xi can be modeled as a random vector with a known probability distribution (not merely uncertain). This is justified in many situations. For example, \xi could be derived from historical data, assuming the distribution does not significantly change over the considered period of time. In such situations one may reliably estimate the required probability distribution, and optimizing on average can be justified by the Law of Large Numbers. Another example: \xi could be realizations of a simulation model whose outputs are stochastic. The empirical distribution of the sample can be used as an approximation to the true but unknown output distribution.
Discretization
To solve the two-stage stochastic problem numerically, one often needs to assume that the random vector \xi has a finite number of possible realizations, called scenarios, say \xi_1,\cdots,\xi_K, with respective probability masses p_1,\cdots,p_K. Then the expectation in the first-stage problem's objective function can be written as the summation:
E[Q(x,\xi)]=\sum\limits_{k=1}^{K} p_kQ(x,\xi_k)
and, moreover, the two-stage problem can be formulated as one large linear programming problem (called the deterministic equivalent of the original problem; see the section on the deterministic equivalent below).
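To make the finite-scenario expectation concrete, the sketch below evaluates \sum_k p_k Q(x,\xi_k) directly. It assumes a hypothetical newsvendor-style recourse in which the second-stage LP has the closed-form optimal value b\max(\xi-x,0)+h\max(x-\xi,0); the scenario values, probabilities, and cost coefficients are illustrative assumptions, not data from the text.

```python
# Hypothetical newsvendor-style recourse: Q(x, xi) is the optimal value of a
# tiny second-stage LP (penalize shortfall at rate b, surplus at rate h),
# which has the closed form b*max(xi - x, 0) + h*max(x - xi, 0).

def Q(x, xi, b=4.0, h=1.0):
    """Optimal second-stage cost for demand realization xi."""
    return b * max(xi - x, 0.0) + h * max(x - xi, 0.0)

def expected_Q(x, scenarios, probs):
    """E[Q(x, xi)] written as the finite sum sum_k p_k * Q(x, xi_k)."""
    return sum(p * Q(x, xi) for p, xi in zip(probs, scenarios))

scenarios = [80.0, 100.0, 120.0]   # three illustrative demand scenarios
probs     = [0.3, 0.5, 0.2]

print(expected_Q(100.0, scenarios, probs))   # -> 22.0
```

With x = 100 the recourse cost is h·20 in the low scenario and b·20 in the high one, giving 0.3·20 + 0.2·80 = 22.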
When \xi has an infinite (or very large) number of possible realizations, the standard approach is to represent this distribution by scenarios. This approach raises three questions, namely:
 How to construct scenarios (see the section on scenario construction below);
 How to solve the deterministic equivalent. Commercial optimizers such as CPLEX and Gurobi can solve large linear and nonlinear problems, and the NEOS^{[5]} server hosted at the Argonne National Laboratory provides free access to many modern solvers. The structure of a deterministic equivalent is particularly amenable to Benders' decomposition;
 How to measure the quality of the obtained solution with respect to the "true" optimum.
These questions are not independent. For example, the number of scenarios constructed will affect both the tractability of the deterministic equivalent and the quality of the obtained solutions.
Stochastic linear program
A stochastic linear program is a specific instance of the classical two-stage stochastic program. A stochastic LP is built from a collection of multi-period linear programs (LPs), each having the same structure but somewhat different data. The k^{th} two-period LP, representing the k^{th} scenario, may be regarded as having the following form:
\begin{array}{lccccccc} \text{Maximize} & f^T x & + & g^T y & + & h_k^Tz_k & & \\ \text{subject to} & Tx & + & Uy & & & = & r \\ & & & V_k y & + & W_kz_k & = & s_k \\ & x & , & y & , & z_k & \geq & 0 \\ \end{array}
The vectors x and y contain the first-period variables, whose values must be chosen immediately. The vector z_k contains all of the variables for subsequent periods. The constraints Tx + Uy = r involve only first-period variables and are the same in every scenario. The other constraints involve variables of later periods and differ in some respects from scenario to scenario, reflecting uncertainty about the future.
Note that solving the k^{th} two-period LP is equivalent to assuming the k^{th} scenario in the second period with no uncertainty. In order to incorporate uncertainties in the second stage, one should assign probabilities to different scenarios and solve the corresponding deterministic equivalent.
Deterministic equivalent of a stochastic problem
With a finite number of scenarios, two-stage stochastic linear programs can be modelled as large linear programming problems. This formulation is often called the deterministic equivalent linear program, or abbreviated to deterministic equivalent. (Strictly speaking, a deterministic equivalent is any mathematical program that can be used to compute the optimal first-stage decision, so these exist for continuous probability distributions as well, when one can represent the second-stage cost in some closed form.) For example, to form the deterministic equivalent of the above stochastic linear program, we assign a probability p_k to each scenario k=1,\cdots,K. Then we can maximize the expected value of the objective, subject to the constraints from all scenarios:
\begin{array}{lccccccccccccc} \text{Maximize} & f^T x & + & g^T y & + & p_1h_1^Tz_1 & + & p_2h_2^Tz_2 & + & \cdots & + & p_Kh_K^Tz_K & & \\ \text{subject to} & Tx & + & Uy & & & & & & & & & = & r \\ & & & V_1 y & + & W_1z_1 & & & & & & & = & s_1 \\ & & & V_2 y & & & + & W_2z_2 & & & & & = & s_2 \\ & & & \vdots & & & & & & \ddots & & & & \vdots \\ & & & V_Ky & & & & & & & + & W_Kz_K & = & s_K \\ & x & , & y & , & z_1 & , & z_2 & , & \cdots & , & z_K & \geq & 0 \\ \end{array}
We have a different vector z_k of later-period variables for each scenario k. The first-period variables x and y are the same in every scenario, however, because we must make a decision for the first period before we know which scenario will be realized. As a result, the constraints involving just x and y need only be specified once, while the remaining constraints must be given separately for each scenario.
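Continuing the hypothetical newsvendor example from above, the deterministic equivalent collapses to a single problem in the first-stage variable x whose objective is the first-stage cost plus the probability-weighted recourse costs. Because that objective is convex and piecewise linear, its minimum is attained at x = 0 or at one of the scenario values, so this sketch avoids an LP solver entirely; all numbers are illustrative assumptions.

```python
def total_cost(x, c, scenarios, probs, b=4.0, h=1.0):
    """Deterministic-equivalent objective: c*x + sum_k p_k * Q(x, xi_k)."""
    recourse = sum(p * (b * max(xi - x, 0.0) + h * max(x - xi, 0.0))
                   for p, xi in zip(probs, scenarios))
    return c * x + recourse

scenarios = [80.0, 100.0, 120.0]
probs = [0.3, 0.5, 0.2]
c = 1.0

# The objective is convex piecewise linear in x, so its minimum is
# attained at x = 0 or at one of the scenario values (the breakpoints).
candidates = [0.0] + scenarios
x_star = min(candidates, key=lambda x: total_cost(x, c, scenarios, probs))
print(x_star, total_cost(x_star, c, scenarios, probs))   # -> 100.0 122.0
```

Checking the four breakpoints by hand gives costs 392, 152, 122, and 142, so the optimal first-stage decision is x* = 100. In general (with vector-valued decisions) one would instead pass the block-structured constraint matrix shown above to an LP solver.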
Scenario Construction
In practice it might be possible to construct scenarios by eliciting experts' opinions on the future. The number of constructed scenarios should be relatively modest so that the obtained deterministic equivalent can be solved with reasonable computational effort. It is often claimed that a solution that is optimal using only a few scenarios provides more adaptable plans than one that assumes a single scenario only. In some cases such a claim can be verified by simulation. In theory, there are measures guaranteeing that an obtained solution solves the original problem with reasonable accuracy. Typically in applications only the first-stage optimal solution x^* has a practical value, since almost always a "true" realization of the random data will be different from the set of constructed (generated) scenarios.
Suppose \xi contains d independent random components, each of which has three possible realizations (for example, future realizations of each random parameter are classified as low, medium and high); then the total number of scenarios is K=3^d. Such exponential growth of the number of scenarios makes model development using expert opinion very difficult even for moderate values of d. The situation becomes even worse if some random components of \xi have continuous distributions.
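The growth of K = 3^d can be seen by enumerating the scenarios explicitly for small d and merely counting for larger d; the labels below are just the low/medium/high classification from the text.

```python
import itertools

levels = ("low", "medium", "high")

# Explicit enumeration is feasible only for small d.
d = 4
scenarios = list(itertools.product(levels, repeat=d))
print(len(scenarios))        # -> 81, i.e. 3**4

# For larger d the count K = 3**d quickly becomes unmanageable.
for d in (10, 20, 30):
    print(d, 3 ** d)
```

Already at d = 30 there are over 2×10^14 scenarios, far beyond what any deterministic equivalent could enumerate.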
Monte Carlo sampling and Sample Average Approximation (SAA) Method
A common approach to reduce the scenario set to a manageable size is Monte Carlo simulation. Suppose the total number of scenarios is very large or even infinite. Suppose further that we can generate a sample \xi^1,\xi^2,\cdots,\xi^N of N replications of the random vector \xi. Usually the sample is assumed to be independent and identically distributed (an i.i.d. sample). Given a sample, the expectation function q(x)=E[Q(x,\xi)] is approximated by the sample average
\hat{q}_N(x) = \frac{1}{N} \sum_{j=1}^N Q(x,\xi^j)
and consequently the first-stage problem is given by
\begin{array}{rlrrr} \min\limits_{x\in \mathbb{R}^n} & \hat{g}_N(x) = c^T x + \frac{1}{N} \sum_{j=1}^N Q(x,\xi^j) & \\ \text{subject to} & Ax &=& b \\ & x &\geq& 0 \end{array}
This formulation is known as the sample average approximation (SAA) method. The SAA problem is a function of the considered sample and in that sense is random. For a given sample \xi^1,\xi^2,\cdots,\xi^N the SAA problem is of the same form as a two-stage stochastic linear programming problem with the scenarios \xi^j, j=1,\cdots,N, each taken with the same probability p_j=\frac{1}{N}.
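A minimal SAA sketch in the same hypothetical newsvendor setting: draw an i.i.d. sample of the demand, give each draw weight 1/N, and minimize the resulting piecewise-linear convex objective (again the minimum lies at 0 or at a sample point, so no LP solver is needed). The demand distribution and cost coefficients are illustrative assumptions.

```python
import random

def Q(x, xi, b=4.0, h=1.0):
    """Closed-form optimal value of the hypothetical second-stage LP."""
    return b * max(xi - x, 0.0) + h * max(x - xi, 0.0)

def solve_saa(sample, c=1.0):
    """Minimize c*x + (1/N) * sum_j Q(x, xi^j) over x >= 0."""
    N = len(sample)
    def g_hat(x):
        return c * x + sum(Q(x, xi) for xi in sample) / N
    # The SAA objective is convex piecewise linear with breakpoints at the
    # sample points, so it suffices to check x = 0 and each xi^j.
    return min([0.0] + list(sample), key=g_hat)

random.seed(0)                                      # reproducible sample
sample = [random.gauss(100.0, 15.0) for _ in range(500)]
x_hat = solve_saa(sample)
print(x_hat)
```

Rerunning with a different seed gives a (slightly) different x_hat, which is exactly the sense in which the SAA problem itself is random.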
Statistical Inference
Consider the following stochastic programming problem
\min\limits_{x\in X}\{ g(x) = f(x)+E[Q(x,\xi)] \}
Here X is a nonempty closed subset of \mathbb{R}^n, \xi is a random vector whose probability distribution P is supported on a set \Xi \subset \mathbb{R}^d, and Q: X \times \Xi \rightarrow \mathbb{R}. In the framework of twostage stochastic programming, Q(x,\xi) is given by the optimal value of the corresponding secondstage problem.
Assume that g(x) is well defined and finite valued for all x\in X. This implies that for every x\in X the value Q(x,\xi) is finite almost surely.
Suppose that we have a sample \xi^1,\cdots,\xi^N of N realizations of the random vector \xi. This random sample can be viewed as historical data of N observations of \xi, or it can be generated by Monte Carlo sampling techniques. Then we can formulate a corresponding sample average approximation
\min\limits_{x\in X}\{ \hat{g}_N(x) = f(x)+\frac{1}{N} \sum_{j=1}^N Q(x,\xi^j) \}
By the Law of Large Numbers we have that, under some regularity conditions, \frac{1}{N} \sum_{j=1}^N Q(x,\xi^j) converges pointwise with probability 1 to E[Q(x,\xi)] as N \rightarrow \infty. Moreover, under mild additional conditions the convergence is uniform. We also have E[\hat{g}_N(x)]=g(x), i.e., \hat{g}_N(x) is an unbiased estimator of g(x). Therefore, it is natural to expect that the optimal value and optimal solutions of the SAA problem converge to their counterparts of the true problem as N \rightarrow \infty.
Consistency of SAA estimators
Suppose the feasible set X of the SAA problem is fixed, i.e., it is independent of the sample. Let \vartheta^* and S^* be the optimal value and the set of optimal solutions, respectively, of the true problem and let \hat{\vartheta}_N and \hat{S}_N be the optimal value and the set of optimal solutions, respectively, of the SAA problem.
 Let g: X \rightarrow \mathbb{R} and \hat{g}_N: X \rightarrow \mathbb{R} be a sequence of (deterministic) real-valued functions. Then the following two properties are equivalent:
 for any \overline{x}\in X and any sequence \{x_N\}\subset X converging to \overline{x} it follows that \hat{g}_N(x_N) converges to g(\overline{x})
 the function g(\cdot) is continuous on X and \hat{g}_N(\cdot) converges to g(\cdot) uniformly on any compact subset of X
 If the objective of the SAA problem \hat{g}_N(x) converges to the true problem's objective g(x) with probability 1, as N \rightarrow \infty, uniformly on the feasible set X, then \hat{\vartheta}_N converges to \vartheta^* with probability 1 as N \rightarrow \infty.
 Suppose that there exists a compact set C \subset \mathbb{R}^n such that
 the set S of optimal solutions of the true problem is nonempty and is contained in C
 the function g(x) is finite valued and continuous on C
 the sequence of functions \hat{g}_N(x) converges to g(x) with probability 1, as N \rightarrow \infty, uniformly in x\in C
 for N large enough the set \hat{S}_N is nonempty and \hat{S}_N \subset C with probability 1

 then \hat{\vartheta}_N \rightarrow \vartheta^* and \mathbb{D}(S^*,\hat{S}_N)\rightarrow 0 with probability 1 as N\rightarrow \infty . Note that \mathbb{D}(A,B) denotes the deviation of set A from set B, defined as
 \mathbb{D}(A,B) := \sup_{x\in A} \{ \inf_{x' \in B} \|x-x'\| \}
In some situations the feasible set X of the SAA problem is estimated; then the corresponding SAA problem takes the form
\min_{x\in X_N} \hat{g}_N(x)
where X_N is a subset of \mathbb{R}^n depending on the sample and is therefore random. Nevertheless, consistency results for SAA estimators can still be derived under some additional assumptions:
 Suppose that there exists a compact set C \subset \mathbb{R}^n such that
 the set S of optimal solutions of the true problem is nonempty and is contained in C
 the function g(x) is finite valued and continuous on C
 the sequence of functions \hat{g}_N(x) converges to g(x) with probability 1, as N \rightarrow \infty, uniformly in x\in C
 for N large enough the set \hat{S}_N is nonempty and \hat{S}_N \subset C with probability 1
 if x_N \in X_N and x_N converges with probability 1 to a point x, then x \in X
 for some point x \in S^* there exists a sequence x_N \in X_N such that x_N \rightarrow x with probability 1.

 then \hat{\vartheta}_N \rightarrow \vartheta^* and \mathbb{D}(S^*,\hat{S}_N)\rightarrow 0 with probability 1 as N\rightarrow \infty .
Asymptotics of the SAA optimal value
Suppose the sample \xi^1,\cdots,\xi^N is i.i.d. and fix a point x \in X. Then the sample average estimator \hat{g}_N(x) of g(x) is unbiased and has variance \frac{1}{N}\sigma^2(x), where \sigma^2(x):=Var[Q(x,\xi)] is assumed to be finite. Moreover, by the central limit theorem we have that
 \sqrt{N} [\hat{g}_N(x) - g(x)] \xrightarrow{\mathcal{D}} Y_x
where \xrightarrow{\mathcal{D}} denotes convergence in distribution and Y_x has a normal distribution with mean 0 and variance \sigma^2(x), written as \mathcal{N}(0,\sigma^2(x)).
In other words, \hat{g}_N(x) has an asymptotically normal distribution, i.e., for large N, \hat{g}_N(x) has approximately a normal distribution with mean g(x) and variance \frac{1}{N}\sigma^2(x). This leads to the following (approximate) 100(1-\alpha)\% confidence interval for g(x):

 \left[ \hat{g}_N(x)-z_{\alpha/2} \frac{\hat{\sigma}(x)}{\sqrt{N}}, \hat{g}_N(x)+z_{\alpha/2} \frac{\hat{\sigma}(x)}{\sqrt{N}}\right]
where z_{\alpha/2}:=\Phi^{-1}(1-\alpha/2) (here \Phi(\cdot) denotes the cdf of the standard normal distribution) and

 \hat{\sigma}^2(x) := \frac{1}{N-1}\sum_{j=1}^{N} \left[ Q(x,\xi^j)-\frac{1}{N} \sum_{j'=1}^N Q(x,\xi^{j'}) \right]^2
is the sample variance estimate of \sigma^2(x). That is, the error of estimation of g(x) is (stochastically) of order O(1/\sqrt{N}).
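The confidence interval above is straightforward to compute from a sample. The sketch below uses an assumed toy recourse Q(x,\xi)=\max(\xi-x,0) with \xi uniform on [0,1] and f\equiv 0, so that at x=0.5 the true value g(x)=E[\max(\xi-0.5,0)]=0.125 is known and can be compared with the interval.

```python
import math
import random

def Q(x, xi):
    # Assumed toy recourse cost, chosen so g(0.5) = 0.125 exactly.
    return max(xi - x, 0.0)

random.seed(1)
x, N = 0.5, 10_000
vals = [Q(x, random.random()) for _ in range(N)]        # xi^j ~ Uniform(0, 1)

g_hat = sum(vals) / N                                   # sample-average estimate
var_hat = sum((v - g_hat) ** 2 for v in vals) / (N - 1) # sample variance
z = 1.959964                                            # Phi^{-1}(0.975), 95% level
half_width = z * math.sqrt(var_hat / N)
print(f"g_hat = {g_hat:.4f}, 95% CI = "
      f"[{g_hat - half_width:.4f}, {g_hat + half_width:.4f}]")
```

Note the half-width shrinks like 1/\sqrt{N}: quadrupling the sample size only halves the interval.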
Multistage Portfolio Optimization
We now present an example from finance of multistage stochastic programming. Suppose that at time t=0 we have initial capital W_0 to invest in n assets. Suppose further that we are allowed to rebalance our portfolio at times t=1,\cdots,T-1 but without injecting additional cash into it. At each period t we make a decision about redistributing the current wealth W_t among the n assets. Let x_0=(x_{10},\cdots,x_{n0}) be the initial amounts invested in the n assets. We require that each x_{i0} is nonnegative and that the balance equation \sum_{i=1}^{n}x_{i0}=W_0 holds.
Consider the total returns \xi_t=(\xi_{1t},\cdots,\xi_{nt}) for each period t=1,\cdots,T. This forms a vector-valued random process \xi_1,\cdots,\xi_T. At time period t=1, we can rebalance the portfolio by specifying the amounts x_1=(x_{11},\cdots,x_{n1}) invested in the respective assets. At that time the returns of the first period have been realized, so it is reasonable to use this information in the rebalancing decision. Thus, the second-stage decisions, at time t=1, are actually functions of the realization of the random vector \xi_1, i.e., x_1=x_1(\xi_1). Similarly, at time t the decision x_t=(x_{1t},\cdots,x_{nt}) is a function x_t=x_t(\xi_{[t]}) of the available information given by \xi_{[t]}=(\xi_{1},\cdots,\xi_{t}), the history of the random process up to time t. A sequence of functions x_t=x_t(\xi_{[t]}), t=0,\cdots,T-1, with x_0 being constant, defines an implementable policy of the decision process. Such a policy is said to be feasible if it satisfies the model constraints with probability 1, i.e., the nonnegativity constraints x_{it}(\xi_{[t]})\geq 0, i=1,\cdots,n, t=0,\cdots,T-1, and the balance of wealth constraints,
\sum_{i=1}^{n}x_{it}(\xi_{[t]}) = W_t,
where in period t=1,\cdots,T the wealth W_t is given by
W_t = \sum_{i=1}^{n}\xi_{it} x_{i,t1}(\xi_{[t1]}),
which depends on the realization of the random process and the decisions up to time t.
Suppose the objective is to maximize the expected utility of this wealth at the last period, that is, to consider the problem
\max E[U(W_T)].
This is a multistage stochastic programming problem, where stages are numbered from t=0 to t=T-1. Optimization is performed over all implementable and feasible policies. To complete the problem description one also needs to define the probability distribution of the random process \xi_1,\cdots,\xi_T. This can be done in various ways. For example, one can construct a particular scenario tree defining the time evolution of the process. If at every stage the random return of each asset is allowed to have two continuations, independent of other assets, then the total number of scenarios is 2^{nT}.
In order to write dynamic programming equations, consider the above multistage problem backward in time. At the last stage t=T-1, a realization \xi_{[T-1]}=(\xi_{1},\cdots,\xi_{T-1}) of the random process is known and x_{T-2} has been chosen. Therefore, one needs to solve the following problem
\begin{array}{lrclr} \max\limits_{x_{T-1}} & E[U(W_T)\,|\,\xi_{[T-1]}] & \\ \text{subject to} & W_T &=& \sum_{i=1}^{n}\xi_{iT}x_{i,T-1} \\ &\sum_{i=1}^{n}x_{i,T-1}&=&W_{T-1}\\ & x_{T-1} &\geq& 0 \end{array}
where E[U(W_T)\,|\,\xi_{[T-1]}] denotes the conditional expectation of U(W_T) given \xi_{[T-1]}. The optimal value of the above problem depends on W_{T-1} and \xi_{[T-1]} and is denoted Q_{T-1}(W_{T-1},\xi_{[T-1]}).
Similarly, at stages t=T-2,\cdots,1, one should solve the problem
\begin{array}{lrclr} \max\limits_{x_{t}} & E[Q_{t+1}(W_{t+1},\xi_{[t+1]})\,|\,\xi_{[t]}] & \\ \text{subject to} & W_{t+1} &=& \sum_{i=1}^{n}\xi_{i,t+1}x_{i,t} \\ &\sum_{i=1}^{n}x_{i,t}&=&W_{t}\\ & x_{t} &\geq& 0 \end{array}
whose optimal value is denoted by Q_{t}(W_{t},\xi_{[t]}). Finally, at stage t=0, one solves the problem
\begin{array}{lrclr} \max\limits_{x_{0}} & E[Q_{1}(W_{1},\xi_{[1]})] & \\ \text{subject to} & W_{1} &=& \sum_{i=1}^{n}\xi_{i,1}x_{i0} \\ &\sum_{i=1}^{n}x_{i0}&=&W_{0}\\ & x_{0} &\geq& 0 \end{array}
Stagewise independent random process
For a general distribution of the process \xi_t, it may be hard to solve these dynamic programming equations. The situation simplifies dramatically if the process \xi_t is stagewise independent, i.e., \xi_t is (stochastically) independent of \xi_1,\cdots,\xi_{t-1} for t=2,\cdots,T. In this case, the corresponding conditional expectations become unconditional expectations, and the functions Q_t(W_t), t=1,\cdots,T-1, do not depend on \xi_{[t]}. That is, Q_{T-1}(W_{T-1}) is the optimal value of the problem
\begin{array}{lrclr} \max\limits_{x_{T-1}} & E[U(W_T)] & \\ \text{subject to} & W_T &=& \sum_{i=1}^{n}\xi_{iT}x_{i,T-1} \\ &\sum_{i=1}^{n}x_{i,T-1}&=&W_{T-1}\\ & x_{T-1} &\geq& 0 \end{array}
and Q_t(W_t) is the optimal value of
\begin{array}{lrclr} \max\limits_{x_{t}} & E[Q_{t+1}(W_{t+1})] & \\ \text{subject to} & W_{t+1} &=& \sum_{i=1}^{n}\xi_{i,t+1}x_{i,t} \\ &\sum_{i=1}^{n}x_{i,t}&=&W_{t}\\ & x_{t} &\geq& 0 \end{array}
for t=T-2,\cdots,1.
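For a concrete (and deliberately tiny) instance of these equations, take two assets, a two-point stagewise independent return distribution, and logarithmic utility U(W)=\log W. For log utility the value functions separate as Q_t(W)=\log W + \text{const}, so the same investment fractions are optimal at every stage, each stage problem reduces to a one-dimensional search over the fraction w held in asset 1, and the optimal value of the whole problem is \log W_0 + T\alpha with \alpha the optimal one-period expected log return. All returns and probabilities below are illustrative assumptions.

```python
import math

# Two-point joint distribution of per-period total returns (xi_1, xi_2):
# asset 1 is risky, asset 2 returns 1.05 in every state.
outcomes = [((1.50, 1.05), 0.5), ((0.70, 1.05), 0.5)]

def one_period_value(w):
    """E[log(w*xi_1 + (1 - w)*xi_2)] for fraction w invested in asset 1."""
    return sum(p * math.log(w * r1 + (1.0 - w) * r2)
               for (r1, r2), p in outcomes)

# Each stage problem reduces to this one-dimensional search (a grid here for
# simplicity); by stagewise independence the optimal fraction is the same at
# every stage t = 0, ..., T-1.
grid = [k / 1000 for k in range(1001)]
w_star = max(grid, key=one_period_value)
alpha = one_period_value(w_star)

T, W0 = 5, 100.0
value = math.log(W0) + T * alpha   # optimal expected log-wealth E[log W_T]
print(w_star, alpha, value)
```

Setting the first-order condition of the one-period problem to zero gives w* = 1/3 for these numbers, which the grid search reproduces; the backward recursion then simply adds \alpha at each of the T stages.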
Biological applications
Stochastic dynamic programming is frequently used to model animal behaviour in such fields as behavioural ecology.^{[6]}^{[7]} Empirical tests of models of optimal foraging, life-history transitions such as fledging in birds, and egg laying in parasitoid wasps have shown the value of this modelling technique in explaining the evolution of behavioural decision making. These models are typically many-staged, rather than two-staged.
Economic applications
Stochastic dynamic programming is a useful tool in understanding decision making under uncertainty. The accumulation of capital stock under uncertainty is one example; it is often used by resource economists to analyze bio-economic problems^{[8]} where the uncertainty enters through factors such as weather.
Solvers

 FortSP – a solver for stochastic programming problems
 FuncDesigner – free software with a commercial add-on for stochastic programming and optimization
See also
References
Further reading
 John R. Birge and François V. Louveaux. Introduction to Stochastic Programming. Springer Verlag, New York, 1997.
 G. Ch. Pflug: Optimization of Stochastic Models. The Interface between Simulation and Optimization. Kluwer, Dordrecht, 1996.

 Andras Prekopa. Stochastic Programming. Kluwer Academic Publishers, Dordrecht, 1995.
 Andrzej Ruszczynski and Alexander Shapiro (eds.) (2003) Stochastic Programming. Handbooks in Operations Research and Management Science, Vol. 10, Elsevier.
 Stein W. Wallace and William T. Ziemba (eds.) (2005) Applications of Stochastic Programming. MPSSIAM Book Series on Optimization 5
External links