Re: [Help] Control Theory
Dear Thuc Anh,
Here are some ideas behind the AR, MA, ARMA and AIC topics. This area
of study is called time-series analysis. The book by Box and Jenkins (Time
Series Analysis: Forecasting and Control) is the standard reference in this
area. However, it is rather mathematical and could present difficulty for
beginners. You may find Chatfield's book more "friendly".
My summary here is taken from a lecture note I gave many, many
years ago and could be outdated now, but it should give you some basic
ideas, which, I hope, are what you need. OK, let us start:
AR Model
--------
Consider a time series with m observations Y(t), t = 1, 2,
3, ..., m. The idea is to account for the correlation between adjacent
observations in such a series. For example, you can use knowledge of the
data at time period t-1 to predict the observation at time t, i.e.
Y(t) = a.Y(t-1) + e(t) [1]
In [1], a is called the autoregressive coefficient, and e(t) is the
random (white noise) term with mean 0 and constant variance s^2.
This model is referred to as the first-order autoregressive model, or
AR(1).
But Y(t) may depend not only on Y(t-1), but also on Y(t-2) and so on.
The above model can then be extended to take previous observations into
account, such as:
Y(t) = a.Y(t-1) + b.Y(t-2) + e(t) [2]
which is called the second-order AR model, or AR(2). Similarly,
you can build a p-th order AR model, AR(p):
Y(t) = a.Y(t-1) + b.Y(t-2) + c.Y(t-3) + ... + d.Y(t-p) + e(t) [3]
where a, b, c, ..., d are autoregressive coefficients to be estimated
from observed data.
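To make this concrete, here is a minimal sketch in Python (my own
illustration, not part of the original lecture note): it simulates an
AR(2) series with known coefficients and recovers them by ordinary
least squares on the lagged values. The values a = 0.6 and b = 0.3 are
arbitrary choices that keep the process stationary.

    import numpy as np

    rng = np.random.default_rng(0)
    m, a, b = 500, 0.6, 0.3        # arbitrary, chosen to be stationary
    e = rng.normal(0.0, 1.0, m)    # white noise: mean 0, variance s^2 = 1
    Y = np.zeros(m)
    for t in range(2, m):
        Y[t] = a * Y[t-1] + b * Y[t-2] + e[t]   # equation [2]

    # Regress Y(t) on its lags Y(t-1) and Y(t-2) to estimate a and b.
    X = np.column_stack([Y[1:-1], Y[:-2]])      # columns: Y(t-1), Y(t-2)
    a_hat, b_hat = np.linalg.lstsq(X, Y[2:], rcond=None)[0]
    print(a_hat, b_hat)            # should come out near 0.6 and 0.3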
Lag Operator
------------
Before discussing more models in this topic, let us get some
notation right. Now define the lag operator as: L.Y(t) = Y(t-1), where
Y(t) and Y(t-1) are the elements of the series. If you apply L twice,
you get:
L^2.Y(t) = L(L.Y(t)) = L.Y(t-1) = Y(t-2)
get the idea? So, in general:
L^m.Y(t) = Y(t-m)
so model [1] can be written as:
Y(t) = a.L.Y(t) + e(t)
or (1 - a.L).Y(t) = e(t)
and model [3] can be written as:
(1 - a.L - b.L^2 - c.L^3 - ... - d.L^p).Y(t) = e(t)
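In code, the lag operator is just a shift of the array. A small sketch
(again my own, with assumed variable names): applying the AR(1)
polynomial (1 - a.L) to a simulated AR(1) series should give back the
white noise e(t).

    import numpy as np

    rng = np.random.default_rng(1)
    m, a = 200, 0.7
    e = rng.normal(size=m)
    Y = np.zeros(m)
    for t in range(1, m):
        Y[t] = a * Y[t-1] + e[t]       # an AR(1) series, as in [1]

    LY = np.empty(m)                   # L.Y(t) = Y(t-1)
    LY[1:] = Y[:-1]
    LY[0] = np.nan                     # Y(-1) is not observed
    residual = Y - a * LY              # (1 - a.L).Y(t)
    print(np.allclose(residual[1:], e[1:]))   # True: we recover e(t)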
MA (moving average) Model
-------------------------
Assume that the time-series data in fact has an infinite AR
representation of the form:
Y(t) = -A.Y(t-1) - B.Y(t-2) - C.Y(t-3) - ... + e(t) [4]
in which the coefficients decay geometrically, i.e. B = A^2, C = A^3,
and so on. Because [4] holds true for all integers t, it also holds at
time t-1:
Y(t-1) = -A.Y(t-2) - B.Y(t-3) - ... + e(t-1) [5]
Multiply [5] by A and subtract it from [4]; all the lagged Y terms
cancel and you get:
Y(t) = e(t) - A.e(t-1) = (1 - A.L).e(t) [6]
So Y(t) is just a weighted sum of members of the white noise series;
this is called a MOVING AVERAGE (MA) process. A q-th order MA model,
MA(q), can be written as:
Y(t) = (1 + A.L + B.L^2 + ... + D.L^q).e(t) [7]
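Here is a minimal sketch of [6] in Python (my illustration; A = 0.5 is
an arbitrary choice): an MA(1) series built directly from white noise.
A known property of the MA(1) process is that its lag-1 autocorrelation
is -A/(1 + A^2), which the simulation can confirm.

    import numpy as np

    rng = np.random.default_rng(2)
    m, A = 5000, 0.5
    e = rng.normal(size=m)             # the white noise series
    Y = np.empty(m)
    Y[0] = e[0]
    Y[1:] = e[1:] - A * e[:-1]         # Y(t) = e(t) - A.e(t-1), as in [6]

    # Lag-1 autocorrelation of an MA(1) is -A/(1 + A^2) in theory.
    r1 = np.corrcoef(Y[1:], Y[:-1])[0, 1]
    print(r1, -A / (1 + A**2))         # the two should be close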
The ARMA Model
--------------
Under certain conditions, an AR process has an MA
representation, and vice versa. Your task is to find the most
parsimonious model to represent the data. Such a representation can
involve both AR and MA components, such as:
Y(t) = a.Y(t-1) + e(t) + A.e(t-1) [8]
or (1 - a.L).Y(t) = (1 + A.L).e(t)
This is called the ARMA model of order (1,1) or ARMA(1,1). More
generally:
(1 - a.L - b.L^2 - ... - d.L^p).Y(t) = (1 + A.L + B.L^2 + ... + D.L^q).e(t) [9]
which is called the ARMA(p, q) model or ARMA(p, q) process.
These days there is also the ARIMA model (the "I" stands for
"integrated", i.e. the series is differenced before fitting), which is
not discussed here.
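If you have access to Python with the statsmodels package (an
assumption on my part; any time-series package will have an
equivalent), fitting [8] takes only a few lines. In statsmodels, an
ARIMA model with order (p, 0, q), i.e. no differencing, is exactly the
ARMA(p, q) model above.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # Simulate an ARMA(1,1) series per equation [8], with a = 0.6 and
    # A = 0.4 chosen arbitrarily for the example.
    rng = np.random.default_rng(3)
    m, a, A = 500, 0.6, 0.4
    e = rng.normal(size=m)
    y = np.zeros(m)
    for t in range(1, m):
        y[t] = a * y[t-1] + e[t] + A * e[t-1]

    res = ARIMA(y, order=(1, 0, 1)).fit()   # order = (p, d, q), d = 0
    print(res.params)                       # estimates of a and A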
AIC
---
Consider a set of data points Y(t), where t = 1, 2, 3, ..., m.
Suppose that you can represent it by two components: one systematic
(the model) and one random, such as:
Observed = Model + Random
or Y(t) = F(Y(t-1), Y(t-2), ..., Y(t-p)) + e
that is, Y(t) is a function of Y(t-1), Y(t-2), ..., Y(t-p) and the
random noise e.
Now, adding more parameters to the model will allow it to
"explain" (i.e. fit) the data better, i.e. with a smaller e. For
example, the model
Y(t) = a.Y(t-1) + b.Y(t-2) + e
would fit the data better than the model
Y(t) = a.Y(t-1) + e
But the issue is: how much better? In fact, the model that fits the
data better is not necessarily more useful. What counts is the ability
to predict a wide range of data with a minimum number of parameters.
For example, if a $20 book gives you exactly the same amount of
information as a $30 book, you would surely opt for the $20 one,
wouldn't you? This is Occam's Razor, the principle of parsimony. The
AIC is a measure of how well a model fits the data, adjusted for the
number of parameters it uses. AIC is equal to the difference between
the chi-squared statistic and twice the number of degrees of freedom
(df) associated with that statistic, i.e.
AIC = ChiSq - 2 x df
Remember that ChiSq is a measure of the deviation between the observed
and predicted values, i.e. a function of e.
When comparing two models, the one with the lower AIC is the
more parsimonious model.
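As a sketch of how this works in practice (assuming statsmodels again;
note that it computes AIC as -2 x log-likelihood + 2 x number of
parameters, a related form in which lower is likewise better), you can
fit AR models of increasing order and keep the one with the smallest
AIC:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(4)
    m = 500
    e = rng.normal(size=m)
    y = np.zeros(m)
    for t in range(2, m):
        y[t] = 0.6 * y[t-1] + 0.3 * y[t-2] + e[t]   # true model: AR(2)

    for p in (1, 2, 3, 4):
        res = ARIMA(y, order=(p, 0, 0)).fit()       # AR(p)
        print(p, round(res.aic, 1))   # AIC should bottom out near p = 2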
I hope this information is helpful to you. I am pretty busy at
the moment and have no time to check textbooks on how these parameters
are estimated; however, you can find more details on these models in
any good textbook on time-series analysis.
Good luck,
Tuan (Australia)