
Re: [Help] Control Theory




Hello Thuc Anh,

	Here are some ideas behind the AR, MA, ARMA and AIC topics. This area 
of study is called time-series modelling. The book by Box and Jenkins (Time 
Series Analysis: Forecasting and Control) is the standard reference in this 
area. However, it is rather mathematical and could present difficulties for 
beginners. You may find Chatfield's book more "friendly".

	My summary here is taken from lecture notes I gave here many, many 
years ago and could be outdated now, but it should give you some basic ideas, 
which, I hope, are what you need. OK, let us start:


AR Model
--------

	Consider time-series data with m observations Y(t), t = 1, 2, 
3, ..., m. The idea is to account for the correlation between adjacent 
observations in such a series. For example, you can use knowledge of the 
data at time period t-1 to predict the observation at time t, i.e.

		Y(t) = a.Y(t-1) + e(t)			[1]

	In [1], a is called the autoregressive coefficient, and e(t) is the 
random (white noise) term with mean 0 and constant variance s^2. 
This model is referred to as the first-order autoregressive model, or 
AR(1).  

	But Y(t) may depend not only on Y(t-1) but also on Y(t-2), and so 
on. The above model can then be extended to take previous observations 
into account:

		Y(t) = a.Y(t-1) + b.Y(t-2) + e(t)	[2]

	which is called the second-order AR model, or AR(2). Similarly, 
you can build a pth-order AR model, AR(p):

	Y(t) = a.Y(t-1) + b.Y(t-2) + c.Y(t-3) + ... + d.Y(t-p) + e(t)

									[3]

where a, b, c, ..., d are autoregressive coefficients to be estimated from 
the observed data.
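
	To make this concrete, here is a small sketch in Python with numpy 
(my choice of language, not part of the original note) that simulates the 
AR(2) model [2]. The coefficient values and series length are arbitrary 
illustrations:

import numpy as np

# Simulate the AR(2) model [2]: Y(t) = a.Y(t-1) + b.Y(t-2) + e(t).
# The coefficients a, b and the length m are made-up illustrative values.
rng = np.random.default_rng(0)

m = 500                      # number of observations
a, b = 0.6, 0.3              # autoregressive coefficients (chosen to keep the series stationary)
e = rng.normal(0.0, 1.0, m)  # white noise: mean 0, constant variance s^2 = 1

Y = np.zeros(m)
for t in range(2, m):
    Y[t] = a * Y[t - 1] + b * Y[t - 2] + e[t]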


Lag Operator
------------

	Before discussing more models in this topic, let us get some 
notation right. Define the lag operator L by L.Y(t) = Y(t-1), where 
Y(t) and Y(t-1) are elements of the series. If you apply L twice, 
you get:

			L^2.Y(t) = L(L.Y(t)) = L.Y(t-1) = Y(t-2)

get the idea? So, in general, for any lag k:

			L^k.Y(t) = Y(t-k) 


so the model [1] can be written as:

			Y(t) = a.L.Y(t) + e(t)

	or		(1 - a.L).Y(t) = e(t)

and model [3] can be written as:

			(1 - a.L - b.L^2 - c.L^3 - ... - d.L^p).Y(t) = e(t)
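
	If you want to play with this on a computer, here is a minimal 
Python/numpy sketch of the lag operator (again my own illustration, not 
from the original note); the first k entries have no predecessor, so they 
are padded with NaN:

import numpy as np

# The lag operator L: (L^k Y)(t) = Y(t-k). Entries with no
# predecessor are filled with NaN.
def lag(Y, k=1):
    Y = np.asarray(Y, dtype=float)
    out = np.full_like(Y, np.nan)
    out[k:] = Y[:-k]
    return out

Y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(lag(Y, 1))   # [nan  1.  2.  3.  4.]  i.e. L.Y
print(lag(Y, 2))   # [nan nan  1.  2.  3.]  i.e. L^2.Y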


MA (moving average) Model
-------------------------

	Assume that the time-series data has in fact an infinite AR 
representation whose coefficients decline geometrically, say:

		Y(t) = -A.Y(t-1) - A^2.Y(t-2) - A^3.Y(t-3) - ... + e(t)	[4]

Because [4] holds true for all integers t, it also holds at time t-1:

		Y(t-1) = -A.Y(t-2) - A^2.Y(t-3) - ... + e(t-1)		[5]
	
Multiply [5] by A and subtract it from [4]; all the lagged Y terms cancel 
and you get:

		Y(t) = e(t) - A.e(t-1) = (1 - A.L).e(t)			[6]

So Y(t), which is just a weighted sum of members of the white noise series, 
is called a MOVING AVERAGE (MA) process. A qth-order MA model, MA(q), can 
be written as:

		Y(t) = (1 + A.L + B.L^2 + ... + D.L^q).e(t)		[7]
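
	As a quick illustration, this Python/numpy sketch (my own, with 
made-up values) simulates the MA(1) process [6]:

import numpy as np

# Simulate the MA(1) process [6]: Y(t) = e(t) - A.e(t-1) = (1 - A.L).e(t).
# A and the series length are arbitrary illustrative values.
rng = np.random.default_rng(0)

m = 500
A = 0.5
e = rng.normal(0.0, 1.0, m + 1)  # one extra draw so that e(t-1) exists at t = 0
Y = e[1:] - A * e[:-1]           # each Y(t) is a weighted sum of two noise terms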


The ARMA Model
-------------- 

	Under certain conditions, an AR process has an MA 
representation, and vice versa. Your task is to find the most 
parsimonious model to represent the data. Such a representation can 
involve both AR and MA components, such as:

	Y(t) = a.Y(t-1) + e(t) + A.e(t-1)			[8]

or	(1 - a.L).Y(t) = (1 + A.L).e(t)

This is called the ARMA model of order (1,1) or ARMA(1,1). More 
generally:

(1 - a.L - b.L^2 - ... - d.L^p).Y(t) = (1 + A.L + B.L^2 + ... + D.L^q).e(t)
	
									[9]

which is called the ARMA(p, q) model or ARMA(p, q) process.
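
	In the same spirit, here is a Python/numpy sketch (mine, with 
arbitrary coefficients) that simulates the ARMA(1,1) model [8]:

import numpy as np

# Simulate the ARMA(1,1) model [8]: Y(t) = a.Y(t-1) + e(t) + A.e(t-1).
# The coefficients a, A are made-up illustrative values.
rng = np.random.default_rng(0)

m = 500
a, A = 0.7, 0.4
e = rng.normal(0.0, 1.0, m)

Y = np.zeros(m)
for t in range(1, m):
    Y[t] = a * Y[t - 1] + e[t] + A * e[t - 1]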

        These days there is also the ARIMA (AR Integrated MA) model, which 
is not discussed here.


AIC
---

	Consider a set of data points Y(t), where t = 1, 2, 3, ..., m. 
Suppose you can represent it by two components: one has to do with the 
model and one has to do with a random component, such as:


		Observed = Model + Random

or		Y(t) = F(Y(t-1), Y(t-2), ..., Y(t-p)) + e
 
that is, Y(t) is a function of Y(t-1), Y(t-2), ..., Y(t-p) and the random 
noise e. 

	Now, adding more parameters to the model will allow it to 
"explain" (i.e. fit) the data better, i.e. give a smaller e. For example, 
the model

	Y(t) = a.Y(t-1) + b.Y(t-2) + e

would fit the data better than the model

	Y(t) = a.Y(t-1) + e

But the issue is: how much better? In fact, the model that fits the data 
better is not necessarily the more useful one. What counts is the ability 
to predict a wide range of data with a minimum number of parameters. For 
example, if a $20 book gives you exactly the same amount of information 
as a $30 book, then you would surely opt for the $20 one, wouldn't you? 
This is Occam's Razor, the principle of parsimony. The AIC is a measure 
of how well a model fits the data: it is equal to the difference between 
the Chi-squared (ChiSq) statistic and twice the number of degrees of 
freedom (df) associated with the ChiSq statistic, i.e.

		AIC = ChiSq - 2xdf

	Remember, ChiSq is a measure of the deviation between observed and 
predicted values, i.e. a function of e.

	When comparing two models, the one with the lower AIC is the more 
parsimonious model.
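
	To see the idea in action, here is a Python/numpy sketch that fits 
AR(1) and AR(2) models by least squares and compares them by AIC. One 
assumption on my part: it uses the common least-squares form 
AIC = n.ln(RSS/n) + 2k (k = number of estimated parameters) rather than 
the ChiSq - 2xdf expression above; both penalise extra parameters in the 
same spirit:

import numpy as np

def fit_ar(Y, p):
    # Regress Y(t) on Y(t-1), ..., Y(t-p) and solve by ordinary least squares.
    m = len(Y)
    X = np.column_stack([Y[p - j: m - j] for j in range(1, p + 1)])
    y = Y[p:]
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    n = len(y)
    aic = n * np.log(rss / n) + 2 * p   # least-squares form of AIC (my assumption)
    return coef, aic

# Simulate data from a true AR(1) model, then let AIC choose between AR(1) and AR(2).
rng = np.random.default_rng(0)
e = rng.normal(0.0, 1.0, 500)
Y = np.zeros(500)
for t in range(1, 500):
    Y[t] = 0.6 * Y[t - 1] + e[t]

for p in (1, 2):
    coef, aic = fit_ar(Y, p)
    print(f"AR({p}): coefficients {coef.round(3)}, AIC = {aic:.1f}")
# The model with the lower AIC (usually the true AR(1) here) is preferred.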

	I hope this info is helpful to you. I am pretty busy at the 
moment and have no time to check textbooks on how these parameters are 
estimated; however, you can find more details on these models in any 
good textbook on time-series analysis.

	Good luck,


	Tuan (Australia)