This section provides exercises meant to deepen your knowledge of the topics covered here and to give you experience solving real-world problems.
In this exercise you will compute the OLS estimator on a simulated data set using basic MATLAB commands. Please refer to the theory section below for the necessary formulas.
Import the simulated data from olsdata.mat and compute the OLS estimator $\hat{\beta}$ using matrix expressions. Create a results matrix which stacks the estimated parameters and the values supplied in the vector beta_true side by side. Are the estimated and the true values close?
Next, read up on MATLAB's regress function in the MATLAB documentation. Estimate the OLS coefficients using this function and compare the results to the ones you computed manually.
Create a new folder for this exercise and copy the olsdata.mat file into it
In MATLAB, create a new script and save it into the same folder
Start your script with the following commands
clear all; close all; clc;
load olsdata.mat
Check that the data has been loaded into your workspace. You should see a matrix X, a vector y, and a vector beta_true, which contains the true values of β that were used to generate the data
Add a line which computes the OLS estimator and saves it into a new variable beta_hat
beta_hat = ...formula goes here...
Note that you can code up the formula in two ways: either compute $(X'X)^{-1}$ separately from $X'y$ and then multiply them, or write the formula directly in one line.
% Set up the workspace and load the data
clear all; close all; clc;
load olsdata
% Compute OLS estimator
beta_hat = (X'*X)\(X'*y);
% Compare to true value
[beta_hat, beta_true]
% Extension: Compare to MATLAB's regress function (requires the Statistics and Machine Learning Toolbox)
beta_regress = regress(y,X);
[beta_hat, beta_regress]
Theoretical Background
Let y be an N×1 vector of data on the dependent variable and let X be an N×K matrix with data on the regressors, where the first column is a vector of ones.
The OLS estimator of the regression coefficients is defined as $\hat{\beta} = (X'X)^{-1}(X'y)$.
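The formula above can be tried out without the exercise file. The following is a minimal sketch on hypothetical simulated data: the sample size, coefficient values and noise level are assumptions made for illustration and are not the contents of olsdata.mat. It compares the explicit-inverse version of the formula with two backslash-based variants.

```matlab
% Hypothetical simulated data (assumed values, not the contents of olsdata.mat)
rng(0);                              % fix the seed for reproducibility
N = 500; K = 3;
X = [ones(N,1), randn(N,K-1)];       % first column is a vector of ones
beta_true = [1; -2; 0.5];            % assumed true coefficients
y = X*beta_true + 0.1*randn(N,1);    % dependent variable with small noise

% Variant 1: textbook formula with an explicit inverse
XtX_inv  = inv(X'*X);
beta_inv = XtX_inv*(X'*y);

% Variant 2: solve the normal equations with backslash
beta_bs = (X'*X)\(X'*y);

% Variant 3: let backslash solve the least-squares problem directly
beta_ls = X\y;

% Columns: explicit inverse, normal equations, least squares, true values
disp([beta_inv, beta_bs, beta_ls, beta_true])
```

All three variants should agree up to rounding error. The backslash forms avoid forming the inverse explicitly, which is generally the numerically safer choice in MATLAB.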
2. Computing the log-likelihood of a logit model
In this exercise you will compute the log-likelihood of a logit model on a simulated data set using basic MATLAB commands. Please refer to the theory section below for the necessary formulas.
Import the simulated data from logitdata.mat and calculate the value of the log-likelihood for different values of the parameter vector β using matrix expressions.
Approximately, for which value of β is the log-likelihood maximal?
Create a new folder for this exercise and copy the logitdata.mat file into it
In MATLAB, create a new script and save it into the same folder
Start your script with the following commands
clear all; close all; clc;
load logitdata.mat
Check that the data has been loaded into your workspace. You should see a matrix X and a vector y
Fix a value for β by setting it to e.g. 0.5
beta = 0.5
Create a variable L and assign it the value of the formula for the log-likelihood from the theory section
L = ...formula goes here...
Hint: When coding up the formula, write it as a function of a vector. Start with the inner parts of the formula, i.e. first think about what $x_i\beta$ looks like for different i and how you can write it as a vector. Then think about what applying functions like exp() and ln (which in MATLAB is the log function) does to this vector. Finally, think about how to evaluate the sum. It might be easiest to split the formula into two separate parts (the one starting with $y_i$ and the one starting with $(1-y_i)$) that you save in different variables, and then evaluate the sum operator over the sum of these variables.
To check that your log-likelihood implementation works correctly, try out different values for β and compare the resulting log-likelihood values. The data was generated with $\beta_0 = 1.5$, so your code should give the maximal log-likelihood value close to this point.
Here is a plot of some reference values for the function.
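The steps above can be sketched on self-generated data. This is a minimal sketch, not a reference solution: it assumes the textbook logit log-likelihood (one term in $y_i$, one in $1-y_i$), draws its own hypothetical sample instead of loading logitdata.mat, and uses the illustrative names x, beta_grid, part1 and part2.

```matlab
% Hypothetical simulated logit data (assumed design, not logitdata.mat)
rng(1);                               % fix the seed for reproducibility
N = 2000;
beta0 = 1.5;                          % value the text says generated the data
x = randn(N,1);                       % single regressor, no constant
p = 1./(1 + exp(-x*beta0));           % logistic probability that y = 1
y = double(rand(N,1) < p);            % binary outcome

% Evaluate the log-likelihood on a grid of candidate values for beta
beta_grid = 0:0.1:3;
LL = zeros(size(beta_grid));
for j = 1:numel(beta_grid)
    xb = x*beta_grid(j);                          % N-by-1 vector of x_i*beta
    part1 = y.*log(exp(xb)./(1 + exp(xb)));       % term starting with y_i
    part2 = (1 - y).*log(1./(1 + exp(xb)));       % term starting with (1 - y_i)
    LL(j) = sum(part1 + part2);                   % sum over all observations
end

% Grid point with the largest log-likelihood
[LL_max, j_max] = max(LL);
beta_best = beta_grid(j_max);
fprintf('Grid maximizer: %.1f (log-likelihood %.2f)\n', beta_best, LL_max);
```

On a sample like this the grid maximizer should land near 1.5, though not exactly on it because of sampling noise. Calling plot(beta_grid, LL) shows the shape of the function around its maximum.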
Theoretical Background
Consider the following discrete choice logit model with no constant and one regressor
$y_i = x_i\beta + \varepsilon_i$
where all variables are scalars and $y_i$ is a binary variable (i.e. it has 0/1 values).
The log-likelihood of the data given a value for the parameter vector β is defined as
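For reference, the standard logit log-likelihood has the structure the hint describes, with one term in $y_i$ and one in $1-y_i$. Assuming this textbook form, and writing N for the number of observations (a symbol introduced here for illustration), it can be written as:

```latex
\ln L(\beta) = \sum_{i=1}^{N} \left[
    y_i \ln\!\left( \frac{\exp(x_i\beta)}{1+\exp(x_i\beta)} \right)
    + (1-y_i) \ln\!\left( \frac{1}{1+\exp(x_i\beta)} \right)
\right]
```

Each term is the log of a probability strictly between 0 and 1, so the log-likelihood is always negative and the maximal value is the one closest to zero.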
3. Estimating a factor model using Principal Components (Advanced)
In this exercise you will estimate a factor model on a simulated data set using basic MATLAB commands. Please refer to the theory section below for the necessary formulas.
Caution: For a real symmetric matrix such as XX′, eig usually returns the eigenvalues and corresponding eigenvectors in ascending order of the eigenvalues, although sorted output is not guaranteed in general. Make sure you extract the r eigenvectors corresponding to the r largest eigenvalues, for example by sorting the eigenvalues explicitly.
The estimation above is valid only under the assumption that the factors are orthogonal, i.e. $F'F/T = I$. Use MATLAB's scatter(x,y) command to verify graphically that the normalization holds for the estimated factors. In the scatter command, x should be the first factor (i.e. the first column of $\hat{F}$) and y should be the second factor.
Theoretical Background
We will use the following factor model
$X_t = \Lambda F_t + u_t$
where $X_t$ is a large N×1 vector of series which we would like to explain by a smaller number of factors. $F_t$ is an r×1 vector of factors and $u_t$ an N×1 vector of idiosyncratic shocks. Λ is a matrix of factor loadings of dimension N×r. T is the number of observations.
Under some normalizations, the r factors and their factor loadings Λ can be estimated by principal components using the following formulae.
$\hat{F} = \sqrt{T}\, EV(XX')_{1:r}$
$\hat{\Lambda} = \hat{F}'X/T$
where F is a T×r matrix and X is a T×N matrix. $EV(A)_{1:r}$ denotes the first r eigenvectors of the matrix A which correspond to the r largest eigenvalues.
Read up on MATLAB's eig function in the MATLAB documentation. Use the eig function to estimate the matrix of factors F and loadings Λ for the following dataset for r=2.
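The principal components estimator from the theory section can be sketched as follows. This is a minimal sketch under stated assumptions: the data are hypothetical and simulated inside the script rather than taken from the exercise dataset, the scaling factor is taken to be sqrt(T) so that the normalization F′F/T = I holds, and the names S, V_r and order are illustrative.

```matlab
% Hypothetical simulated factor data (assumed dimensions and noise level)
rng(2);                                  % fix the seed for reproducibility
T = 200; N = 50; r = 2;
F = randn(T,r);                          % true factors, T-by-r
Lambda = randn(N,r);                     % true loadings, N-by-r
X = F*Lambda' + 0.5*randn(T,N);          % data matrix, T-by-N

% Eigen-decomposition of the T-by-T matrix X*X'
S = X*X';
S = (S + S')/2;                          % enforce exact symmetry
[V, D] = eig(S);

% Sort explicitly so that the largest eigenvalues come first
[~, order] = sort(diag(D), 'descend');
V_r = V(:, order(1:r));                  % eigenvectors of the r largest eigenvalues

% Principal components estimates
F_hat = sqrt(T)*V_r;                     % T-by-r estimated factors
Lambda_hat = F_hat'*X/T;                 % r-by-N; transpose for the N-by-r layout

% Numerical check of the normalization F_hat'*F_hat/T = I
disp(F_hat'*F_hat/T)
```

Because the eigenvectors of a symmetric matrix returned by eig are orthonormal, the displayed matrix should equal the identity up to rounding error. Calling scatter(F_hat(:,1), F_hat(:,2)) then gives the graphical check described in the exercise.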