Random Number Generation

Statistical distributions

MATLAB provides four core functions for generating pseudo-random numbers:

  • rand - Uniform pseudo-random number generator on the interval (0,1)

  • randn - Standard Normal pseudo-random number generator

  • randg - Standard Gamma pseudo-random number generator

  • randi - Uniform integer pseudo-random number generator

Other distributions can be sampled by transforming the output of these four generators, e.g.,

a + (b-a)*rand % Sample from U(a,b)
m + s*randn % Sample from N(m,s^2)

To create a vector of samples from the standard normal distribution we write

n = 10; % number of samples
x = randn(n,1) % n-by-1 vector of standard normal draws
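
As a quick sanity check of the transformations above, the following sketch draws large samples and compares their sample moments with the theoretical values. The values of N, a, b, m and s are arbitrary illustrative choices and do not come from the text.

```matlab
N = 1e5;                % number of draws (illustrative choice)
a = 2; b = 5;           % bounds of the uniform distribution (illustrative)
m = 1; s = 3;           % mean and standard deviation of the normal (illustrative)

u = a + (b-a)*rand(N,1);   % N draws from U(a,b)
z = m + s*randn(N,1);      % N draws from N(m,s^2)
k = randi(6,N,1);          % N draws from the integers 1,...,6

% Sample moments should be close to their theoretical counterparts
fprintf('U(a,b):   mean %.3f (theory %.3f)\n', mean(u), (a+b)/2);
fprintf('N(m,s^2): mean %.3f, std %.3f (theory %.3f, %.3f)\n', mean(z), std(z), m, s);
fprintf('randi(6): mean %.3f (theory %.3f)\n', mean(k), 3.5);
```

With this many draws the sample and theoretical values should agree to roughly two decimal places, although the exact numbers change from run to run.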

Generating simulated data sets

When we write our own estimation routines we use simulated data to check how well they perform. Say we want to examine the properties of the OLS estimator. We need to create a dataset that obeys

$$Y = X\beta + e$$

For simplicity, let's assume that all the entries of X, apart from a constant column of ones, are iid standard normal random variables. Then we proceed as follows,

simulate_ols.m
n = 50; % sample size
beta = [1,2,3]'; % true coefficient vector (intercept and two slopes)
sigma = 1; % true standard deviation of the error term

X = [ones(n,1) randn(n,size(beta,1)-1)];  % the matrix of regressors
ep = sigma*randn(n,1); % the error term
Y = X*beta + ep; % the dependent variable

We now have a cross-sectional dataset with which we can run experiments!
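
One possible first experiment is to estimate the model on the simulated data and compare the estimates with the true parameters. The sketch below regenerates the data so that it runs on its own. The names beta_hat, resid and sigma2_hat are illustrative and are not part of simulate_ols.m.

```matlab
n = 50;                     % sample size, as in simulate_ols.m
beta = [1,2,3]';            % true coefficients
sigma = 1;                  % true error standard deviation

X = [ones(n,1) randn(n,size(beta,1)-1)];  % constant plus two normal regressors
Y = X*beta + sigma*randn(n,1);            % dependent variable

beta_hat = X\Y;             % OLS estimate, computed with the backslash operator
resid = Y - X*beta_hat;     % fitted residuals
sigma2_hat = (resid'*resid)/(n - size(X,2));  % degrees-of-freedom corrected estimate of sigma^2

disp([beta beta_hat]);      % true vs. estimated coefficients, side by side
```

The backslash operator solves the least-squares problem directly, which is generally preferred to forming inv(X'*X)*X'*Y explicitly. Repeating the exercise for larger n should bring beta_hat closer to beta.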

Simulating time series data requires a little more effort due to the dependent nature of the data. Say we want to simulate data from an AR(1) process, that is

$$Y_t = \rho \, Y_{t-1} + u_t$$

We use a "burn-in" period to remove the influence of the starting value, so that the retained observations are (approximately) draws from the stationary distribution of Y_t, which exists when |ρ| < 1. If we assume that u_t is iid N(0,1) we can proceed as follows

simulate_ar1.m
n = 500; % number of observations
rho = 0.5; % persistence parameter
burn_in = 100; % number of burn-in periods

Ytemp = zeros(n+burn_in,1); % initialise temp. Y vector

u = randn(n+burn_in,1); % draw error vector
for i = 2:n+burn_in
    Ytemp(i) = rho*Ytemp(i-1) + u(i); % simulate AR(1) model
end

Y = Ytemp(burn_in+1:end,1); % discard burn-in draws
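
To check that the simulated series behaves as intended, one option is to estimate the persistence parameter from the data and to compare the sample variance with the stationary variance 1/(1-ρ^2) implied by unit-variance errors. The sketch below repeats the simulation so that it runs on its own. The names rho_hat and var_theory are illustrative and are not part of simulate_ar1.m.

```matlab
n = 500; rho = 0.5; burn_in = 100;   % same settings as simulate_ar1.m

Ytemp = zeros(n+burn_in,1);          % initialise temp. Y vector
u = randn(n+burn_in,1);              % draw error vector
for i = 2:n+burn_in
    Ytemp(i) = rho*Ytemp(i-1) + u(i);
end
Y = Ytemp(burn_in+1:end,1);          % discard burn-in draws

% Regressing Y_t on Y_{t-1} (no intercept) gives an estimate of rho
rho_hat = Y(1:end-1)\Y(2:end);

% For |rho| < 1 and unit-variance errors the stationary variance is 1/(1-rho^2)
var_theory = 1/(1 - rho^2);
fprintf('rho_hat = %.3f (true value %.3f)\n', rho_hat, rho);
fprintf('sample variance = %.3f (theory %.3f)\n', var(Y), var_theory);
```

Both numbers are subject to sampling error, so they should be close to, but not exactly equal to, the theoretical values.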

Replicability of simulated data

A problem with working on simulated data arises when you need to replicate your results. Because the data generating process is "random", different results may appear each time the code is run.

However, pseudo-random numbers, such as those generated by MATLAB, are created by an entirely deterministic algorithm that starts from an initial value. To see this, close MATLAB if you have it open, start it again, type randn and write down the number you get. Now close MATLAB, open it once more and type randn. You should get the same number again! We refer to this initial value of the algorithm as the seed. By specifying the seed at the beginning of our code we can ensure that the results we report are the same as those obtained by anyone who runs our code.

One way of specifying the seed in MATLAB is by using the function rng(). The following example shows how to use rng() to replicate a random vector and compares the two draws graphically.

rng(42); % Set the seed
X1 = randn(20,1); % Draw sample data
rng(42); % Reset the seed
X2 = randn(20,1); % Draw sample data
plot(X1)
hold on
plot(X2)

In the example above, try changing the seed to different numbers and explore how the random samples change.
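
The comparison can also be made numerically rather than graphically. The sketch below uses isequal to confirm that resetting the seed reproduces the draws exactly, and shows that the current generator settings can be saved and restored by calling rng without and with an argument. The second seed (43) and the variable names are arbitrary illustrative choices.

```matlab
rng(42);                 % set the seed
X1 = randn(20,1);
rng(42);                 % reset to the same seed
X2 = randn(20,1);
same_seed = isequal(X1,X2);      % true: same seed, identical draws

rng(43);                 % a different seed (arbitrary choice)
X3 = randn(20,1);
diff_seed = isequal(X1,X3);      % false: different seed, different draws

s = rng;                 % save the current generator settings
A = rand(3,1);
rng(s);                  % restore the saved settings
B = rand(3,1);
same_state = isequal(A,B);       % true: the generator was restored before drawing B
```

Saving and restoring the settings can be handy when only one part of a longer script needs to be replicated.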
