How to price options with Monte Carlo methods

I first implemented Monte Carlo methods during a computational finance course at ETH. We were using randomness to solve problems that had exact, deterministic answers. Today, Monte Carlo methods are everywhere: physicists use them to model particle interactions, financial engineers use them to price complex derivatives, statisticians use them for Bayesian inference.

The name comes from the Monte Carlo Casino in Monaco; Nicholas Metropolis suggested it during Stanislaw Ulam and John von Neumann's work on the Manhattan Project in the 1940s, when they needed to simulate neutron diffusion in fissile materials. Random sampling provided a practical solution to otherwise intractable integrals.

I am writing this post to keep these concepts fresh in my mind and to consolidate my understanding of them. I will start with the measure-theoretic foundations (probability spaces, the Law of Large Numbers, the Central Limit Theorem), then move to option pricing as a concrete application, and finally discuss variance reduction techniques.

Measure-theoretic foundations

Before I can justify why Monte Carlo methods work, I need to establish the probability theory that underlies them. You can use these methods without understanding this theory (many practitioners do), but having a solid grasp of the mathematics helps you understand when the methods work well, when they might fail, and how to improve them.

Probability spaces

Every probabilistic model begins with a probability space. This is a triple \((\Omega, \mathcal{F}, P)\) where \(\Omega\) is the sample space (the set of all possible outcomes), \(\mathcal{F}\) is a σ-algebra on \(\Omega\) (a collection of subsets that we can assign probabilities to), and \(P\) is a probability measure.

The σ-algebra \(\mathcal{F}\) must satisfy three properties: it contains the empty set, it is closed under complementation, and it is closed under countable unions. These requirements are there to ensure that we can perform all the usual probability operations without running into paradoxes.

A probability measure \(P\) must satisfy \(P(\Omega) = 1\) and must be countably additive. This means that if I have a sequence of disjoint sets \(A_1, A_2, A_3, \ldots\) in \(\mathcal{F}\), then \(P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)\).

Random variables and integration

A random variable is a measurable function \(X: \Omega \to \mathbb{R}\). Measurable here means that for every Borel set \(B\) in \(\mathbb{R}\), the preimage \(X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}\) belongs to \(\mathcal{F}\). This condition is what allows me to talk about probabilities like \(P(X \in B)\).

The expected value of a random variable \(X\) is defined as the integral \(\mathbb{E}[X] = \int_{\Omega} X(\omega) \, dP(\omega)\). This generalizes the familiar sum-over-outcomes formula from discrete probability. When this integral exists and is finite, we say that \(X\) is integrable.

The key insight of Monte Carlo methods is the following: I can approximate this integral by sampling. If I draw independent samples \(X_1, X_2, \ldots, X_n\) from the distribution of \(X\), then the sample mean \(\frac{1}{n}\sum_{i=1}^{n} X_i\) approximates \(\mathbb{E}[X]\), and this approximation gets better as \(n\) increases.

The law of large numbers

The mathematical guarantee that sample means converge to expected values comes from the Law of Large Numbers. The strong law, stated below, gives the most useful form of convergence for our purposes.

Figure: as the number of samples n increases, the sample mean converges to the true mean μ.

Theorem (Strong Law of Large Numbers): Let \(X_1, X_2, X_3, \ldots\) be independent and identically distributed random variables with \(\mathbb{E}[|X_1|] < \infty\). Then \[ \frac{1}{n}\sum_{i=1}^{n} X_i \to \mathbb{E}[X_1] \quad \text{almost surely as } n \to \infty. \]

The phrase "almost surely" means that the set of outcomes \(\omega \in \Omega\) for which the convergence fails has probability zero. This is stronger than convergence in probability (which would allow for a small probability of large deviations even as \(n\) grows).

The proof requires tools from measure theory, in particular the Borel-Cantelli lemmas and Kolmogorov's maximal inequality. I will not reproduce the full proof here, but the key idea is to show that the probability of the sample mean deviating from the true mean by more than any fixed \(\epsilon > 0\) decreases so rapidly with \(n\) that these deviation events happen only finitely many times (with probability one).
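
To make the theorem concrete, here is a minimal numerical illustration (a sketch assuming NumPy; the exponential distribution and the sample sizes are arbitrary choices of mine):

    import numpy as np

    rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility
    true_mean = 2.0  # mean of the Exponential(scale=2) distribution

    # Sample means at increasing n drift toward the true mean (strong LLN)
    for n in [10, 1_000, 100_000]:
        samples = rng.exponential(scale=true_mean, size=n)
        print(f"n = {n:>7,}: sample mean = {samples.mean():.4f}")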

The central limit theorem and error bounds

The Law of Large Numbers tells me that Monte Carlo estimates converge, but not how fast. This is where the Central Limit Theorem comes in.

Figure: the sample mean is approximately normally distributed, with 95% of values falling within ±1.96σ/√n of the true mean.

Theorem (Central Limit Theorem): Let \(X_1, X_2, X_3, \ldots\) be i.i.d. random variables with \(\mathbb{E}[X_1] = \mu\) and \(\text{Var}(X_1) = \sigma^2 < \infty\). Then \[ \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right) \overset{d}{\longrightarrow} N(0, \sigma^2) \quad \text{as } n \to \infty. \]

This theorem tells me that the error in my Monte Carlo estimate scales as \(O(1/\sqrt{n})\). This is both good and bad. Good because I get predictable error bounds (I can construct confidence intervals). Bad because halving my error requires quadrupling my sample size.

The Central Limit Theorem also allows me to construct confidence intervals. If I estimate \(\mu\) by \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i\) and I estimate \(\sigma^2\) by the sample variance \(S_n^2\), then an approximate 95% confidence interval for \(\mu\) is \[ \bar{X}_n \pm 1.96 \cdot \frac{S_n}{\sqrt{n}}. \] This formula reappears throughout the code sketches below.
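
In code, the interval takes a few lines (a minimal sketch assuming NumPy; the function name is mine, and ddof=1 gives the unbiased sample variance \(S_n^2\)):

    import numpy as np

    def confidence_interval_95(samples: np.ndarray) -> tuple[float, float]:
        """Approximate 95% confidence interval for the mean, via the CLT."""
        n = len(samples)
        half_width = 1.96 * samples.std(ddof=1) / np.sqrt(n)  # 1.96 * S_n / sqrt(n)
        return samples.mean() - half_width, samples.mean() + half_width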

Monte Carlo integration

Now I can describe the general Monte Carlo integration problem. Suppose I want to evaluate an integral \[ I = \int_D f(x) \, dx, \] where \(D \subseteq \mathbb{R}^d\) is some domain and \(f: D \to \mathbb{R}\) is an integrable function. If I can sample points \(X_1, X_2, \ldots, X_n\) uniformly from \(D\), then I can write \[ I = \text{Vol}(D) \cdot \mathbb{E}[f(X)] \quad \text{where } X \sim \text{Uniform}(D), \] and therefore the Monte Carlo estimator is \[ \hat{I}_n = \frac{\text{Vol}(D)}{n} \sum_{i=1}^{n} f(X_i). \] By the Law of Large Numbers, \(\hat{I}_n \to I\) almost surely.
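
A classic instance of this estimator: take \(D = [-1, 1]^2\), so \(\text{Vol}(D) = 4\), and let \(f\) be the indicator of the unit disk, so that \(I = \pi\). A minimal sketch assuming NumPy (the function name is mine):

    import numpy as np

    def estimate_pi(n: int, seed: int = 0) -> float:
        """Monte Carlo estimate of pi as Vol(D) * mean of the disk indicator."""
        rng = np.random.default_rng(seed)
        points = rng.uniform(-1.0, 1.0, size=(n, 2))  # X_i ~ Uniform(D)
        inside = (points ** 2).sum(axis=1) <= 1.0     # f(X_i) in {0, 1}
        return 4.0 * inside.mean()                    # Vol(D) * sample mean of f

    print(estimate_pi(1_000_000))  # approaches 3.14159... as n grows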

The real power of Monte Carlo integration becomes apparent when \(d\) is large. Deterministic quadrature rules (like Simpson's rule or Gaussian quadrature) extend to dimension \(d\) as product rules: with \(n\) points per axis they require \(N = n^d\) function evaluations, so a rule with one-dimensional error \(O(n^{-k})\) achieves only \(O(N^{-k/d})\) in terms of the total work \(N\). This becomes prohibitively expensive as \(d\) grows. Monte Carlo, by contrast, has error \(O(1/\sqrt{N})\) regardless of dimension. This makes it the method of choice for high-dimensional integration problems.

Application: option pricing

Financial derivatives provide one of the most important applications of Monte Carlo methods. I will focus on pricing European call options. There exists a closed-form solution (the Black-Scholes formula), but working through the Monte Carlo approach illustrates the general technique and sets the stage for more complex derivatives that do not have analytical solutions.

The Black-Scholes model

The Black-Scholes model assumes that the price \(S_t\) of a stock follows a geometric Brownian motion described by the stochastic differential equation \[ dS_t = \mu S_t \, dt + \sigma S_t \, dW_t, \] where \(\mu\) is the drift rate, \(\sigma\) is the volatility, and \(W_t\) is a standard Brownian motion. This equation captures the idea that stock prices have both a deterministic trend component (\(\mu S_t \, dt\)) and a random noise component (\(\sigma S_t \, dW_t\)) that scales with the current price.

Using Itô's lemma (a generalization of the chain rule to stochastic processes), we can solve this SDE to obtain \[ S_T = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma W_T\right), \] where \(S_0\) is the initial stock price, \(T\) is the time to maturity, and \(W_T \sim N(0, T)\). The term \(-\sigma^2/2\) comes from the Itô correction and ensures that \(\mathbb{E}[S_T] = S_0 e^{\mu T}\).
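
One practical consequence of the closed-form solution: I can sample \(S_T\) exactly by drawing \(W_T \sim N(0, T)\), with no time discretization. A quick sanity check (a sketch assuming NumPy; the parameter values are illustrative, not from any particular market):

    import numpy as np

    S0, mu, sigma, T = 100.0, 0.08, 0.20, 1.0  # illustrative parameters
    rng = np.random.default_rng(7)

    W_T = rng.normal(loc=0.0, scale=np.sqrt(T), size=1_000_000)  # W_T ~ N(0, T)
    S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)

    # The sample mean of S_T should be close to S0 * exp(mu * T)
    print(S_T.mean(), S0 * np.exp(mu * T))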

Risk-neutral pricing

One of the key ideas in mathematical finance is risk-neutral pricing. The fundamental theorem of asset pricing tells us that, in a complete market with no arbitrage, there exists a unique risk-neutral measure \(\mathbb{Q}\) under which discounted asset prices are martingales. This allows me to price derivatives without knowing the actual drift rate \(\mu\) of the stock.

In the Black-Scholes setting, this means I can replace the drift \(\mu\) with the risk-free rate \(r\) when pricing derivatives. A European call option with strike price \(K\) and maturity \(T\) has payoff \(\max(S_T - K, 0)\) at expiration, and its price today is \[ C_0 = e^{-rT} \mathbb{E}^{\mathbb{Q}}[\max(S_T - K, 0)], \] where the expectation is taken with respect to \[ S_T = S_0 \exp\left(\left(r - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T} Z\right), \] with \(Z \sim N(0, 1)\).

Monte Carlo implementation

The Monte Carlo approach is straightforward. I generate \(n\) independent samples \(Z_1, Z_2, \ldots, Z_n\) from \(N(0,1)\), compute the corresponding terminal stock prices \[ S_T^{(i)} = S_0 \exp\left(\left(r - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T} Z_i\right), \] calculate the payoffs \(V_i = \max(S_T^{(i)} - K, 0)\), and estimate the option price as \[ \hat{C}_n = e^{-rT} \cdot \frac{1}{n}\sum_{i=1}^{n} V_i. \] The standard error of this estimate is \(\text{SE} = e^{-rT} \cdot S_n / \sqrt{n}\), where \(S_n\) is the sample standard deviation of the payoffs.
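
Here is this recipe as code (a minimal sketch assuming NumPy; the function name and the example parameter values are mine):

    import numpy as np

    def mc_call_price(S0, K, r, sigma, T, n, seed=0):
        """Monte Carlo price of a European call, with its standard error."""
        rng = np.random.default_rng(seed)
        Z = rng.standard_normal(n)
        S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
        payoffs = np.maximum(S_T - K, 0.0)               # max(S_T - K, 0)
        discount = np.exp(-r * T)
        price = discount * payoffs.mean()
        std_err = discount * payoffs.std(ddof=1) / np.sqrt(n)
        return price, std_err

    price, se = mc_call_price(S0=100, K=100, r=0.05, sigma=0.2, T=1.0, n=1_000_000)
    print(f"{price:.4f} +/- {1.96 * se:.4f}")  # estimate with 95% half-width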

For comparison, the Black-Scholes formula gives the exact price as \[ C_0 = S_0 \Phi(d_1) - K e^{-rT} \Phi(d_2), \] where \(\Phi\) is the cumulative distribution function of the standard normal distribution and \[ d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \quad d_2 = d_1 - \sigma\sqrt{T}. \] The sketch below computes this analytical value for comparison with the Monte Carlo estimate.
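
The closed form is a few lines with SciPy's normal CDF (a sketch; the function name is mine, and the parameters match the illustrative values above):

    import numpy as np
    from scipy.stats import norm

    def bs_call_price(S0, K, r, sigma, T):
        """Black-Scholes price of a European call."""
        d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)
        return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

    print(bs_call_price(S0=100, K=100, r=0.05, sigma=0.2, T=1.0))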

European call option pricing

To see the convergence in action, run the two sketches above with the same parameters and increase \(n\): the Monte Carlo estimate should settle within a standard error or two of the Black-Scholes value.

Variance reduction techniques

The Monte Carlo method I described above works, but it can be inefficient for high-accuracy requirements because of the \(O(1/\sqrt{n})\) error scaling. If I want to reduce my error by a factor of 10, I need to increase my sample size by a factor of 100. For complex derivatives or real-time trading applications, this can be prohibitive.

Variance reduction techniques offer a way to improve efficiency without simply throwing more samples at the problem. The idea is simple (surprisingly so): reduce the variance \(\sigma^2\) in the Central Limit Theorem error bound instead of increasing \(n\).

Antithetic variates

The antithetic variates technique exploits symmetry to induce negative correlation between paired samples. If I want to estimate the mean of a symmetric distribution, I can get a more precise estimate by ensuring that my samples are balanced across the distribution rather than allowing them to cluster by chance on one side.

In option pricing, I generate a standard normal sample \(Z \sim N(0, 1)\) and then use both \(Z\) and \(-Z\) to create a pair of terminal stock prices. Since the standard normal distribution is symmetric around zero, \(Z\) and \(-Z\) have the same marginal distribution but are perfectly negatively correlated. If I compute payoffs \(V_1 = g(Z)\) and \(V_2 = g(-Z)\), then the variance of their average is \[ \text{Var}\left(\frac{V_1 + V_2}{2}\right) = \frac{1}{4}\left(\text{Var}(V_1) + \text{Var}(V_2) + 2\text{Cov}(V_1, V_2)\right). \] Since \(\text{Var}(V_1) = \text{Var}(V_2)\) by symmetry, this becomes \[ \text{Var}\left(\frac{V_1 + V_2}{2}\right) = \frac{1}{2}\text{Var}(V_1)\left(1 + \rho_{V_1, V_2}\right), \] where \(\rho_{V_1, V_2}\) is the correlation coefficient between \(V_1\) and \(V_2\).

For monotone functions (and the call option payoff is monotonically increasing in \(Z\)), we have \(\rho_{V_1, V_2} < 0\), which means the variance is reduced compared to standard Monte Carlo. The reduction can be 50% or more at the same number of payoff evaluations, which is equivalent to doubling the effective sample size.
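
A sketch of the antithetic estimator for the call option (assuming NumPy; the function name is mine, and note that n_pairs pairs means 2 * n_pairs payoff evaluations):

    import numpy as np

    def mc_call_antithetic(S0, K, r, sigma, T, n_pairs, seed=0):
        """Antithetic-variates Monte Carlo price of a European call."""
        rng = np.random.default_rng(seed)
        Z = rng.standard_normal(n_pairs)
        drift = (r - 0.5 * sigma**2) * T
        vol = sigma * np.sqrt(T)
        payoff_plus = np.maximum(S0 * np.exp(drift + vol * Z) - K, 0.0)   # g(Z)
        payoff_minus = np.maximum(S0 * np.exp(drift - vol * Z) - K, 0.0)  # g(-Z)
        pair_means = 0.5 * (payoff_plus + payoff_minus)  # one sample per pair
        discount = np.exp(-r * T)
        price = discount * pair_means.mean()
        std_err = discount * pair_means.std(ddof=1) / np.sqrt(n_pairs)
        return price, std_err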

Control variates

Control variates use auxiliary information to reduce variance. Suppose I want to estimate \(\mathbb{E}[X]\) but I also have access to another random variable \(Y\) whose expectation \(\mathbb{E}[Y]\) I know exactly. If \(X\) and \(Y\) are correlated, I can construct an improved estimator \[ \hat{X}_{\text{cv}} = \bar{X}_n - \beta(\bar{Y}_n - \mathbb{E}[Y]), \] where \(\bar{X}_n\) and \(\bar{Y}_n\) are sample means.

The term \(\beta(\bar{Y}_n - \mathbb{E}[Y])\) acts as a correction. If my sample mean \(\bar{Y}_n\) happens to be higher than \(\mathbb{E}[Y]\), this suggests that my random samples were biased toward high values, and I should correct \(\bar{X}_n\) downward (assuming \(\beta > 0\) and positive correlation).

The variance of this estimator is \[ \text{Var}(\hat{X}_{\text{cv}}) = \text{Var}(\bar{X}_n) + \beta^2 \text{Var}(\bar{Y}_n) - 2\beta \text{Cov}(\bar{X}_n, \bar{Y}_n). \] Taking the derivative with respect to \(\beta\) and setting it to zero, the optimal choice is \[ \beta^* = \frac{\text{Cov}(X, Y)}{\text{Var}(Y)}, \] and with this choice the variance becomes \[ \text{Var}(\hat{X}_{\text{cv}}) = \text{Var}(\bar{X}_n) (1 - \rho_{X,Y}^2), \] where \(\rho_{X,Y}\) is the correlation coefficient. This means the variance reduction factor is \(\rho_{X,Y}^2\). If \(X\) and \(Y\) have correlation 0.9, I reduce the variance by 81% (which would require more than a 5-fold increase in sample size to achieve with standard Monte Carlo).

In the sketch below, I use the terminal stock price itself as a control variate, since its expected value under the risk-neutral measure is known exactly: \(\mathbb{E}^{\mathbb{Q}}[S_T] = S_0 e^{rT}\).
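
A minimal sketch assuming NumPy (the function name is mine; \(\beta\) is estimated from the same samples, which introduces a small bias that vanishes as \(n\) grows):

    import numpy as np

    def mc_call_control_variate(S0, K, r, sigma, T, n, seed=0):
        """Monte Carlo call price using S_T as a control variate."""
        rng = np.random.default_rng(seed)
        Z = rng.standard_normal(n)
        S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
        payoffs = np.maximum(S_T - K, 0.0)

        # beta* = Cov(payoff, S_T) / Var(S_T), estimated from the samples
        cov_matrix = np.cov(payoffs, S_T, ddof=1)
        beta = cov_matrix[0, 1] / cov_matrix[1, 1]

        # Correct each payoff using the known mean E[S_T] = S0 * exp(r * T)
        adjusted = payoffs - beta * (S_T - S0 * np.exp(r * T))
        discount = np.exp(-r * T)
        price = discount * adjusted.mean()
        std_err = discount * adjusted.std(ddof=1) / np.sqrt(n)
        return price, std_err

In practice, estimating \(\beta\) from a small pilot run avoids even this minor bias from reusing the samples.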

Variance reduction comparison

To compare standard Monte Carlo, antithetic variates, and control variates, run the three sketches above with the same total number of normal draws: the variance-reduced estimators should report markedly smaller standard errors, which translates directly into faster convergence.

Computational considerations

A few practical notes on implementing Monte Carlo methods.

The quality of random number generation matters more than you might expect. Pseudorandom generators like the Mersenne Twister are usually sufficient, but there are cases where correlations in the sequence can bias Monte Carlo estimates (especially in high-dimensional problems). For critical applications, you might want quasi-random sequences like Sobol sequences, which are designed to fill high-dimensional spaces more evenly.
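
SciPy exposes Sobol sequences through scipy.stats.qmc. Here is a minimal sketch that reuses the \(\pi\) example from earlier with quasi-random points (the choice of m=16 is arbitrary; Sobol points are best taken in powers of two):

    import numpy as np
    from scipy.stats import qmc

    sampler = qmc.Sobol(d=2, scramble=True, seed=0)
    points = sampler.random_base2(m=16)   # 2**16 Sobol points in [0, 1)^2
    points = 2.0 * points - 1.0           # rescale to D = [-1, 1]^2
    inside = (points ** 2).sum(axis=1) <= 1.0
    print(4.0 * inside.mean())            # quasi-Monte Carlo estimate of pi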

When should you use Monte Carlo instead of other numerical methods? The general rule is that Monte Carlo becomes more competitive as dimensionality increases. For low-dimensional integrals (dimension 1 to 3), traditional quadrature methods are usually faster. For high dimensions (more than 10), Monte Carlo is often the only practical choice because deterministic methods suffer from the curse of dimensionality while Monte Carlo's convergence rate remains \(O(1/\sqrt{n})\) regardless of dimension.

References

  1. Glasserman, P. (2003). Monte Carlo Methods in Financial Engineering. Springer.
  2. Shreve, S. E. (2004). Stochastic Calculus for Finance II: Continuous-Time Models. Springer.
  3. Billingsley, P. (1995). Probability and Measure (3rd ed.). Wiley.
  4. Black, F., & Scholes, M. (1973). The Pricing of Options and Corporate Liabilities. Journal of Political Economy, 81(3), 637-654.
  5. Metropolis, N., & Ulam, S. (1949). The Monte Carlo Method. Journal of the American Statistical Association, 44(247), 335-341.