Lecture 5: Complexity Analysis (Program Efficiency Analysis)
Big-O notation
Given two algorithms that perform some function - e.g., two get(pos) implementations - which one is better?
Key issues: how much time, disk I/O, and memory does a program use?
We will generally concentrate on how much time the code requires as a function of the problem size.
e.g., if the code searches for a name, the problem size is the number of names.
Time required ~ # of basic operations performed
One basic operation:
  one arithmetic op (e.g., a+b)
  one assignment (a=b)
  one boolean expression evaluation (e.g., a<3)
  one read or one write of a primitive type
(a machine-independent measure)
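For example, counting by these rules (a made-up snippet, just for illustration):

    int x = a + b;   // 2 basic ops: one addition, one assignment
    if (x < 3)       // 1 basic op: one boolean expression evaluation
        x = 0;       // 1 basic op: one assignment (performed only when x < 3)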
Some methods always perform the same number of ops (on every call)
e.g., ArrayList size() always does one operation:
    return numItems;
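As a concrete sketch (SimpleArrayList, items, and numItems are made-up names, loosely modeled on an array-based list), size() really is one operation no matter how many items are stored:

    // Sketch of an array-based list; the later examples add methods
    // to this same hypothetical class.
    public class SimpleArrayList {
        private Object[] items = new Object[10]; // underlying storage
        private int numItems = 0;                // number of slots in use

        public int size() {
            return numItems;   // one operation, regardless of list size
        }
    }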
Some methods do a different number of ops on different calls, depending on the value or size of some parameter or field; this value/size is called the problem size or input size.
e.g., the List methods add(int pos, Object ob) and contains(ob):
in general, a longer list will take a longer time
the problem size = the size of the list
complexity analysis goal: how does the time change as the problem size changes?
not the exact number of operations or the exact time
instead: if the problem size doubles, does the time qualitatively
· stay the same
· double
· do something else
We usually care about the worst case
e.g., List remove(pos)
· removes the item at position pos by moving all items to the right of pos one place to the left
· worst case is removing the first item; for list size N, remove(1) has time proportional to N
· time is linear in the size of the list
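A sketch of such a remove, continuing the hypothetical SimpleArrayList above (1-based positions as in lecture; error checking omitted):

    public Object remove(int pos) {
        Object removed = items[pos - 1];
        // shift every item to the right of pos one place to the left;
        // the worst case, pos = 1, moves all N - 1 remaining items
        for (int i = pos - 1; i < numItems - 1; i++) {
            items[i] = items[i + 1];
        }
        numItems--;
        items[numItems] = null;   // drop the stale reference
        return removed;
    }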
Constant-time method: when the problem size doubles, the worst-case time stays the same
Linear-time method: when the problem size doubles, the worst-case time doubles too
· usually, the smallest growth rate is the best choice
· example trade-off: sorted list: add is slower, contains is faster
You try: work in pairs
Which List methods are constant, and which are linear, with respect to the size of the List? Why?
(REMEMBER: always think of the worst case)
  add(pos, ob) - add at a given position
  isEmpty()
  contains(ob) - contains a given object
  get(pos) - get the object at a given position
  remove(pos)
  size()
Results:
  add(pos, ob): worst case is linear (add at front → move all items)
  isEmpty(): worst case is constant
  contains(ob): worst case is linear (look at all items)
  get(pos): constant time (return the item at pos)
  remove(pos): worst case is linear (when pos = 1)
  size(): constant
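Sketches of the linear vs. constant cases, again using the hypothetical SimpleArrayList fields:

    public boolean contains(Object ob) {
        // worst case (ob is absent): the loop examines all N items
        for (int i = 0; i < numItems; i++) {
            if (items[i].equals(ob)) {
                return true;
            }
        }
        return false;
    }

    public Object get(int pos) {
        return items[pos - 1];   // one array access, however large N is
    }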
add(ob) is tricky:
  constant time when the array is not full
  when the array is full, the time to expand is linear in N
  but if we double the size of the array in expandArray, the copying cost averages out to about one copy operation per add
  (consider maintaining a double-size array in parallel with the list)
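A sketch of add(ob) with a doubling expandArray (same hypothetical class as above):

    public void add(Object ob) {
        if (numItems == items.length) {
            expandArray();        // rare, but linear in N when it happens
        }
        items[numItems] = ob;     // 1 op
        numItems++;               // 1 op
    }

    private void expandArray() {
        // the copies done during doublings total about N over the first
        // N adds, i.e., about one copy operation per add on average
        Object[] bigger = new Object[2 * items.length];
        for (int i = 0; i < numItems; i++) {
            bigger[i] = items[i];
        }
        items = bigger;
    }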
constant and linear time are not the only possibilities
Another example:
    while (!L.isEmpty()) {
        L.remove(1);
    }
what does this code do? (removes all items)
how many iterations will be performed? (N)
how much work on each iteration? N, N-1, ..., 3, 2, 1
total time: N + (N-1) + (N-2) + ... + 3 + 2 + 1 = N(N+1)/2
graph the time for each call to remove with X's on graph paper:
the first call takes time N, the second N-1, etc.
the X's fill about half of an N-by-N square (total area N²); the exact sum is N²/2 + N/2
how does the number of ops change as the list size changes?

    N        | 1 | 2 | 4  | 8
    N(N+1)/2 | 1 | 3 | 10 | 36

time is proportional to N²; 2× problem size → 4× time
the method is quadratic in the problem size
quadratic-time method: when the problem size doubles, the time quadruples
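One way to see the quadrupling empirically is a sketch like this one, which uses Java's 0-indexed java.util.ArrayList (remove(0) removes the front item) rather than the lecture's 1-based List:

    import java.util.ArrayList;

    public class RemoveAllTiming {
        public static void main(String[] args) {
            for (int n = 2000; n <= 32000; n *= 2) {
                ArrayList<Integer> list = new ArrayList<>();
                for (int i = 0; i < n; i++) {
                    list.add(i);
                }
                long start = System.nanoTime();
                while (!list.isEmpty()) {
                    list.remove(0);   // shifts all remaining items left
                }
                long ms = (System.nanoTime() - start) / 1_000_000;
                // doubling n should roughly quadruple the time
                System.out.println("n = " + n + ": " + ms + " ms");
            }
        }
    }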
problem: was anyone else born in November?
alg 1: ask everyone at once; anyone born in November says yes
alg 2: ask each person individually; stop if someone says yes
alg 3: ask one person; if yes, quit; if no, ask that person to ask the next and report back;
       if still no, ask the 1st to ask the 2nd to ask the next person, etc.
think about the complexity of each
check with the person next to you
5 volunteers act out the algorithms
-- O(1), O(N), O(N²)
Announcements:
· P1 due today
· H2 due Thursday
· P2 due
· Java review session #2: today, 4-5:15p, 1325 CS
  Topics: strings, aliasing, parameter passing, boolean expressions
Big-O notation
Express complexity using big-O notation.
For problem size = N:
  constant-time code is O(1)
  linear-time code is O(N)
  quadratic code is O(N²)
Not O(3), O(N+1), O(2N), or O(N² + N)
i.e., leave out the constants & the lower-order terms
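For example, with a made-up operation count:

    T(N) = 3N² + 5N + 2
    drop the constant factor (3) and the lower-order terms (5N, 2)
    → T(N) is O(N²)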
· usually, the smallest growth rate is the best choice
  i.e., a method that is O(1) is better than one that is O(N), which is better than a method that is O(N²)
Measured running times:

    N       | O(N²)  | O(N log N) | O(N)
    --------+--------+------------+-------
    100     | 0.01   | 0.0049     | 0.0006
    1,000   | 1.12   | 0.0584     | 0.0033
    10,000  | 111.13 | 0.6863     | 0.0304
    100,000 | N/A    | 8.0113     | 0.2983
Caveats:
· often there is a design trade-off: e.g., sorted list: add is slower, contains is faster
· sometimes ignoring the constants is a bad idea: e.g., disk accesses are slow; 2N log N may be better than 1000 N (if N < 10⁴; at N = 1,000, 2N log₂ N ≈ 20,000 while 1000 N = 1,000,000)
· worst-case analysis is overly pessimistic for an application where the worst case is rare
Formal definition: function T(N) is O(f(N)) if
  there exists a constant c and a value n0 such that
  for all values of N ≥ n0: T(N) ≤ c·f(N)
· T(N) is the exact complexity of the code
· f(N) is an upper bound on the exact complexity: the actual time will be no worse than a constant times f(N)
· we want the smallest f(N) that makes the above true
Example: L.add(1, ob);
  N - copy N items (or 2N+1 if a larger array is needed)
  1 - add the new item
  1 - increment numItems
T(N) = N + 2
Claim: T(N) is O(N)
  want c and n0 such that N + 2 ≤ c·N
  does it work for c = 1? (no: N + 2 > N for every N)
  does it work for c = 2? (not if n0 = 1, but yes if n0 = 2: N + 2 ≤ 2N whenever N ≥ 2)
Use the formal definition to check that your informal guess is correct.
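For instance, a quick sanity check of the claim above (a sketch; testing values doesn't prove the inequality, but it catches a wrong guess for c or n0 right away):

    public class BigOCheck {
        public static void main(String[] args) {
            int c = 2, n0 = 2;   // the guess: N + 2 <= 2N for all N >= 2
            for (int n = n0; n <= 1_000_000; n++) {
                if (n + 2 > c * n) {
                    System.out.println("claim fails at N = " + n);
                    return;
                }
            }
            System.out.println("N + 2 <= 2N held for every N tested");
        }
    }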