Working with Datasets
Last updated
Was this helpful?
Last updated
Was this helpful?
Often, we would like to process data in MATLAB that comes from an external source and which is supplied in either .csv or .xls/.xlsx format. MATLABs importdata
, csvread
and xlsread
commands allow us to easily read and process data files.
importdata
or csvread
For the example below, please download the INDPRO series from https://fred.stlouisfed.org/series/INDPRO as a .csv file
The importdata
command has the following syntax.
importdata(file, delimiter, skiprows)
loads data fromfile
where the data columns are separated with thedelimiter
char and numeric data starts at lineskiprows+1
.
A quick look at the INDPRO.csv file reveals that the numeric data starts in line 2 and columns are separated by commas (,).
Therefore, we can use the following command to read the industrial production series.
The command creates a struct object called mydata
which contains two sub-objects, a matrix called data
containing all numerical data from the file (in this case the values of the series) and a cell-array called textdata
which contains the string from the file (here header names and dates). We can explore these objects in the workspace browser or by accessing them directly.
Alternatively we can also read in only the numeric data, by using the csvread
command. csvread
has the following syntax.
csvread(file, rowindex, columnindex)
reads numeric data fromfile
starting in rowrowindex
and columncolindex
where 0 is the first row/column.
If we only want to read the values of the INDPRO series (which starts in the second row and second column), we can use the following command.
xlsread
Reading excel files works best on Windows. If you are using macOS it is best practice to use .csv files if they are available or to open the .xls file in excel and to export it in a .csv format. Files in a .xlsx format usually tend to work.
For the example below, please download the UNRATE series from https://fred.stlouisfed.org/series/UNRATE as an .xls file and save it as an .xlsx file using excel.
xlsread
has the following syntax.
xlsread(file, sheetname, excelrange)
reads data fromfile
and worksheetsheet
where the data is in the blockexcelrange
(which is supplied in the typical excel format e.g. B2:F18).
In order to read the UNRATE series, we can use the following command.
Alternatively, we can read both the numerical data along with all the text in the worksheet. We do this by specifying two outputs from xlsread
.
This returns the unemployment rate in the vector unrate, and a cell that contains the text data where the dates corresponding to the unemployment rate start in column 1, row 12.
datetime
Often when importing time series data, dates or datetimes are supplied as strings in various formats. When using the importdata
command, dates are thus treated as textdata and imported as vectors of strings. In order to work with dates to e.g. plot a series against them, we need to convert the date strings to the datetime format which MATLAB understands.
The datetime
function can be used to convert an array of string dates of a specific format into an array of datetime values. The general syntax is as follows.
formatString
is a string that describes the date format of the string representation. Here are some examples of typically used date formats.
formatString
Example
'yyyy-MM-dd'
2018-09-27
'dd.MM.yyyy'
27.09.2018
'MM/dd/yyyy'
09/27/2018
'MMMM d, yyyy'
September 27, 2018
Consider the csv import example above where we have imported the INDPRO.csv file. The first column of the mydata.textdata
array contains string representations of the dates of the time series starting in line 2. We can convert these dates into the datetime format in the following way.
These converted dates are useful e.g. when plotting the INDPRO series. Even though we will cover this more in detail in the next section, try out the following command to show a time series plot of the INDPRO series data.
We will see the following plot.
Similarly, we can make a time series plot of the unemployment rate by first converting the dates into the datetime format
Then we can plot the unemployment rate using
Which gives us
.mat
fileThe easiest way to export/save data in MATLAB is to export them as MATLAB datafile (which has a .mat
extension).
As you have seen before in this course, .mat
files can easily be loaded using the load
command.
To save data from your MATLAB workspace into a .mat
file you can use the save
command. It has the following syntax.
save(filename)
stores all variables in the current workspace into a file calledfilename.mat
.save(filename, variables)
stores the variables whose name is in the string listvariables
into a file calledfilename.mat
.
Here are two examples.
Saving data as .mat
files is the most efficient solution when you would like to export the data to use them in a different MATLAB script or to access them later again with MATLAB. The downside is that .mat files are only readable by MATLAB. If you would like to export data in a format such that they can be imported into other programs, please read the next section.
.csv
fileSometimes we would like to export our data in a format such that it can be imported into other programs which are not capable of reading .mat
files.
If you would like to export numeric data that can be represented in a large matrix, MATLABs csvwrite
function is a good option to write your data into a .csv
file. It has the following syntax.
csvwrite('filename.csv', matrix)
writes matrixmatrix
into a comma-separated values (.csv) file that is namedfilename.csv
.
Here is an example.
Note that .csv files which are exported this way do not contain column headers.
.xls
fileAlternatively, we can export our data to a .xls
or .xlsx
file using the xlswrite
function. In general, xlswrite
behaves just like csvwrite
.
Furthermore, we can use xlswrite
to write excel file that contain column headers by first converting the output matix into a cell. As an example, lets read the data from UNRATE.xls, convert the dates to datetime, plot the time series and then save the time series to an excel file with a column for the dates and a heading for the dates and the unemployment rate.