Random Number Generation
Statistical distributions
Matlab uses four core random number generators to create pseudo-random numbers:
- rand: uniform pseudo-random number generator on the interval (0,1)
- randn: standard normal pseudo-random number generator
- randg: standard gamma pseudo-random number generator
- randi: uniform integer pseudo-random number generator
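As a quick illustration of what three of these functions return, here is a minimal sketch. The seed, sample sizes and integer ranges are arbitrary choices for the example, and randg is left out so the snippet only uses rand, randn and randi.

```matlab
% Illustrative calls to three of the core generators (randg omitted).
rng(1);                      % fix the seed so the draws are reproducible
u  = rand(5,1);              % 5 uniform draws on the open interval (0,1)
z  = randn(5,1);             % 5 standard normal draws
k  = randi(6,5,1);           % 5 integers drawn uniformly from 1..6 (a die roll)
k2 = randi([-3 3],5,1);      % 5 integers drawn uniformly from -3..3
disp([u z k k2])             % one column per generator call
```

The first argument of randi is either the largest integer imax (drawing from 1..imax) or a two-element range [imin imax].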
By transforming draws from these generators we can sample from other distributions, e.g.,
a + (b-a)*rand % Sample from U(a,b)
m + s*randn % Sample from N(m,s^2)
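A minimal sketch of these two transformations applied to whole vectors, with a sanity check of the sample moments against the theoretical ones. The values of a, b, m and s are hypothetical, chosen only for illustration.

```matlab
% Transform base draws into U(a,b) and N(m,s^2) samples (illustrative values).
rng(0);                       % reproducible draws
n = 1e5;                      % large sample so sample moments are close to theory
a = 2;  b = 5;                % hypothetical bounds for U(a,b)
m = 1;  s = 3;                % hypothetical mean and standard deviation
xu = a + (b-a)*rand(n,1);     % uniform on (a,b)
xn = m + s*randn(n,1);        % normal with mean m and variance s^2
% Theory: U(a,b) has mean (a+b)/2 and variance (b-a)^2/12;
%         N(m,s^2) has mean m and variance s^2
fprintf('U: mean %.3f (theory %.3f), var %.3f (theory %.3f)\n', ...
    mean(xu), (a+b)/2, var(xu), (b-a)^2/12);
fprintf('N: mean %.3f (theory %.3f), var %.3f (theory %.3f)\n', ...
    mean(xn), m, var(xn), s^2);
```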
To create a vector of samples from the standard normal distribution we write
n = 10; % number of samples
x = randn(n,1); % n-by-1 column vector of N(0,1) draws
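The size arguments decide the shape of the output, and a single argument does not give a vector. A small sketch of the common cases (the choice of 3 columns is just an example):

```matlab
% How the size arguments of randn shape the output.
n = 10;
x_col = randn(n,1);   % n-by-1 column vector
x_row = randn(1,n);   % 1-by-n row vector
X_sq  = randn(n);     % n-by-n matrix: a single argument gives a square matrix
X_rec = randn(n,3);   % n-by-3 matrix, e.g. three columns of regressors
disp(size(x_col)); disp(size(x_row)); disp(size(X_sq)); disp(size(X_rec));
```

The same size conventions apply to rand.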
Generating simulated data sets
When we write our own estimation routines, we use simulated data to check how well they perform. Say we want to examine the properties of the OLS estimator. We need to create a dataset that obeys the linear model
$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n).$$
For simplicity, let's assume that, apart from the constant column, all the entries of X are iid standard normal random variables. Then we proceed as follows:
n = 50; % sample size
beta = [1,2,3]'; % we need to specify the true parameters
sigma = 1; % we need to specify the true parameters
X = [ones(n,1) randn(n,size(beta,1)-1)]; % the matrix of regressors
ep = sigma*randn(n,1); % the error term
Y = X*beta + ep; % the dependent variable
We now have a cross sectional dataset with which we can run experiments!
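One experiment we could run is a small Monte Carlo check that OLS recovers beta on average. This is a sketch, not a prescribed procedure: it reuses the design above, and the number of replications R and the seed are arbitrary. OLS is computed with MATLAB's backslash operator, which solves the least-squares problem.

```matlab
% Monte Carlo check of OLS: repeat the simulation R times and average the estimates.
rng(123);                           % reproducible experiment
n = 50; beta = [1,2,3]'; sigma = 1; % same design as above
R = 1000;                           % number of replications (illustrative)
k = size(beta,1);
bhat = zeros(k,R);                  % one column of estimates per replication
for r = 1:R
    X = [ones(n,1) randn(n,k-1)];   % fresh regressors each replication
    ep = sigma*randn(n,1);          % fresh errors
    Y = X*beta + ep;
    bhat(:,r) = X\Y;                % OLS via least-squares solve
end
disp([beta mean(bhat,2) std(bhat,0,2)])  % true value, MC mean, MC std. dev.
```

If the estimator is unbiased, the Monte Carlo means in the second column should sit close to the true values in the first.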
Simulating time series data requires a little more effort due to the dependent nature of the data. Say we want to simulate data from an AR(1) process, that is
$$Y(t) = \rho\, Y(t-1) + u(t).$$
We use a "burn-in" period to remove the effect of the starting value: when |ρ| < 1 the process is stationary, and after enough periods the simulated Y(t) behaves (approximately) like a draw from its stationary distribution. If we assume that u(t) is iid N(0,1), we can proceed as follows:
n = 500; % number of observations
rho = 0.5; % persistence parameter
burn_in = 100; % number of burn-in periods
Ytemp = zeros(n+burn_in,1); % initialise temp. Y vector
u = randn(n+burn_in,1); % draw error vector
for i = 2:n+burn_in
Ytemp(i) = rho*Ytemp(i-1) + u(i); % simulate AR(1) model
end
Y = Ytemp(burn_in+1:end,1); % discard burn-in draws
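The loop can also be replaced by MATLAB's filter function, and a quick regression of Y(t) on Y(t-1) gives a sanity check that the simulated series has persistence close to rho. A sketch, assuming the same n, rho and burn_in as above and an arbitrary seed; zeroing the first shock is only there to reproduce the loop's starting value Y(1) = 0.

```matlab
% Vectorised alternative to the AR(1) loop, plus a quick check of rho.
rng(7);
n = 500; rho = 0.5; burn_in = 100;
u = randn(n+burn_in,1);
% Loop version, as in the text (Yloop(1) = 0 is the starting value)
Yloop = zeros(n+burn_in,1);
for i = 2:n+burn_in
    Yloop(i) = rho*Yloop(i-1) + u(i);
end
% filter(1,[1 -rho],v) computes y(t) = v(t) + rho*y(t-1) with y(0) = 0
v = u; v(1) = 0;                   % zero first shock so that y(1) = 0
Yfilt = filter(1, [1 -rho], v);
Y = Yfilt(burn_in+1:end);          % discard burn-in draws
% OLS of Y(t) on Y(t-1) (no constant) should give an estimate near rho
rho_hat = Y(1:end-1) \ Y(2:end);
fprintf('max |loop - filter| = %g, rho_hat = %.3f\n', max(abs(Yloop - Yfilt)), rho_hat);
```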
Replicability of simulated data
A problem with working on simulated data arises when you need to replicate your results. Due to the "randomness" of the data generating process, different results may appear each time the code is run.
However, pseudo-random numbers, such as those generated by Matlab, are created by an entirely deterministic algorithm given an initial value. (Try closing MATLAB if you have it open, then opening it and typing randn, and write down the number you get. Now close MATLAB, open it again and type randn. You should get the same number again!) We refer to this initial value of the algorithm as the seed. By specifying the seed at the beginning of our code, we can ensure that the results we report are the same as those we get every time we run our code.
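The restart experiment can be mimicked without closing MATLAB. A minimal sketch, assuming a release that provides the rng function (which is also used below): rng('default') restores the generator settings MATLAB uses at startup, so the first draw after each reset is the same.

```matlab
% Mimic a fresh MATLAB session without restarting it.
rng('default');          % restore the startup generator settings
a1 = randn;              % first draw of a "new session"
rng('default');          % reset again
a2 = randn;              % the same first draw
disp([a1 a2])            % the two numbers are identical
```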
One way of specifying the seed in Matlab is by using the function rng(). The following example shows how to use rng() to replicate a random vector and compare the two draws graphically.
rng(42); % Set the seed
X1 = randn(20,1); % Draw sample data
rng(42); % Reset the seed
X2 = randn(20,1); % Draw sample data
plot(X1) % first draw as a line
hold on
plot(X2,'o') % second draw as markers; they lie exactly on the line
hold off
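Instead of comparing the draws by eye, we can test them for exact equality. The sketch below also shows a second way to replicate draws: capturing the current generator settings with s = rng and restoring them later with rng(s). The vector lengths are arbitrary.

```matlab
rng(42);  X1 = randn(20,1);
rng(42);  X2 = randn(20,1);
disp(isequal(X1, X2))       % 1 (true): the two draws are identical
% Alternatively, save the generator settings and restore them later
s = rng;                    % capture the current generator settings (a struct)
A = rand(3,1);
rng(s);                     % restore the captured settings
B = rand(3,1);
disp(isequal(A, B))         % 1 (true)
```

Saving and restoring the settings is handy when you want to replay the same draws at some point in the middle of a script without choosing a new seed.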