Lecture 5: Complexity Analysis (Program Efficiency Analysis)
Big-O notation
Given two algorithms that perform some function - e.g., two get(pos) implementations - which one is better?
Key issues: how much time, disk I/O, and memory does a program use?
We will generally concentrate on how much time the code requires as a function of the problem size.
e.g., if the code searches for a name, the problem size is the number of names.
Time required ~ # of basic operations performed
One basic operation:
  one arithmetic op (e.g., a+b)
  one assignment (a=b)
  one boolean expression evaluation (e.g., a<3)
  one read or one write of a primitive type
(a machine-independent measure)
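For example, counting by these rules (a made-up snippet, just for illustration):

    int x = a + b;   // 2 basic ops: one addition, one assignment
    if (x < 3)       // 1 basic op: one boolean expression evaluation
        x = 0;       // 1 basic op: one assignment (performed only when x < 3)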
Some methods always perform the same number of ops (on every call)
e.g., ArrayList size() always does one operation:
    return numItems;
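As a concrete sketch (SimpleArrayList, items, and numItems are made-up names, loosely modeled on an array-based list), size() really is one operation no matter how many items are stored:

    // Sketch of an array-based list; the later examples add methods
    // to this same hypothetical class.
    public class SimpleArrayList {
        private Object[] items = new Object[10]; // underlying storage
        private int numItems = 0;                // number of slots in use

        public int size() {
            return numItems;   // one operation, regardless of list size
        }
    }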
Some methods do a different number of ops on different calls, depending on the value or size of some parameter or field; this value/size is called the problem size or input size.
e.g., the List methods add(int pos, Object ob) and contains(ob):
in general, a longer list will take a longer time
the problem size = the size of the list
complexity analysis goal: how does the time change as the problem size changes?
not the exact number of operations or the exact time
instead: if the problem size doubles, does the time qualitatively
· stay the same
· double
· do something else
We usually care about the worst case
e.g., List remove(pos)
· removes the item at position pos by moving all items to the right of pos one place to the left
· worst case is removing the first item; for list size N, remove(1) has time proportional to N
· time is linear in the size of the list
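A sketch of such a remove, continuing the hypothetical SimpleArrayList above (1-based positions as in lecture; error checking omitted):

    public Object remove(int pos) {
        Object removed = items[pos - 1];
        // shift every item to the right of pos one place to the left;
        // the worst case, pos = 1, moves all N - 1 remaining items
        for (int i = pos - 1; i < numItems - 1; i++) {
            items[i] = items[i + 1];
        }
        numItems--;
        items[numItems] = null;   // drop the stale reference
        return removed;
    }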
Constant-time method: when the problem size doubles, the worst-case time stays the same
Linear-time method: when the problem size doubles, the worst-case time doubles too
· usually, the smallest growth rate is the best choice
· example trade-off: sorted list: add is slower, contains is faster
You try: work in pairs
Which List methods are constant, and which are linear, with respect to the size of the List? Why?
(REMEMBER: always think of the worst case)
  add(pos, ob) - add at a given position
  isEmpty()
  contains(ob) - contains a given object
  get(pos) - get the object at a given position
  remove(pos)
  size()
Results:
  add(pos, ob): worst case is linear (add at front → move all items)
  isEmpty(): worst case is constant
  contains(ob): worst case is linear (look at all items)
  get(pos): constant time (return the item at pos)
  remove(pos): worst case is linear (when pos = 1)
  size(): constant
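Sketches of the linear vs. constant cases, again using the hypothetical SimpleArrayList fields:

    public boolean contains(Object ob) {
        // worst case (ob is absent): the loop examines all N items
        for (int i = 0; i < numItems; i++) {
            if (items[i].equals(ob)) {
                return true;
            }
        }
        return false;
    }

    public Object get(int pos) {
        return items[pos - 1];   // one array access, however large N is
    }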
add(ob) is tricky:
  constant time when the array is not full
  when the array is full, the time to expand is linear in N
  but if we double the size of the array in expandArray, the copying cost averages out to about one copy operation per add
  (consider maintaining a double-size array in parallel with the list)
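A sketch of add(ob) with a doubling expandArray (same hypothetical class as above):

    public void add(Object ob) {
        if (numItems == items.length) {
            expandArray();        // rare, but linear in N when it happens
        }
        items[numItems] = ob;     // 1 op
        numItems++;               // 1 op
    }

    private void expandArray() {
        // the copies done during doublings total about N over the first
        // N adds, i.e., about one copy operation per add on average
        Object[] bigger = new Object[2 * items.length];
        for (int i = 0; i < numItems; i++) {
            bigger[i] = items[i];
        }
        items = bigger;
    }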
constant and linear time are not the only possibilities
Another example:
    while (!L.isEmpty()) {
        L.remove(1);
    }
what does this code do? (removes all items)
how many iterations will be performed? (N)
how much work on each iteration? N, N-1, ..., 3, 2, 1
total time: N + (N-1) + (N-2) + ... + 3 + 2 + 1 = N(N+1)/2
graph the time for each call to remove with X's on graph paper:
the first call takes time N, the second N-1, etc.
the X's fill about half of an N-by-N square (total area N²); the exact sum is N²/2 + N/2
how does the number of ops change as the list size changes?

    N        | 1 | 2 | 4  | 8
    N(N+1)/2 | 1 | 3 | 10 | 36

time is proportional to N²; 2× problem size → 4× time
the method is quadratic in the problem size
quadratic-time method: when the problem size doubles, the time quadruples
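One way to see the quadrupling empirically is a sketch like this one, which uses Java's 0-indexed java.util.ArrayList (remove(0) removes the front item) rather than the lecture's 1-based List:

    import java.util.ArrayList;

    public class RemoveAllTiming {
        public static void main(String[] args) {
            for (int n = 2000; n <= 32000; n *= 2) {
                ArrayList<Integer> list = new ArrayList<>();
                for (int i = 0; i < n; i++) {
                    list.add(i);
                }
                long start = System.nanoTime();
                while (!list.isEmpty()) {
                    list.remove(0);   // shifts all remaining items left
                }
                long ms = (System.nanoTime() - start) / 1_000_000;
                // doubling n should roughly quadruple the time
                System.out.println("n = " + n + ": " + ms + " ms");
            }
        }
    }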
problem: was anyone else born in November?
alg 1: ask everyone at once; anyone born in November says yes
alg 2: ask each person individually; stop if someone says yes
alg 3: ask one person; if yes, quit; if no, ask that person to ask the next and report back;
       if still no, ask the 1st to ask the 2nd to ask the next person, etc.
think about the complexity of each
check with the person next to you
5 volunteers act out the algorithms
-- O(1), O(N), O(N²)
Announcements:
· P1 due today
· H2 due Thursday
· P2 due
· Java review session #2: today, 4-5:15p, 1325 CS
  Topics: strings, aliasing, parameter passing, boolean expressions
Big-O notation
Express complexity using big-O notation.
For problem size = N:
  constant-time code is O(1)
  linear-time code is O(N)
  quadratic code is O(N²)
Not O(3), O(N+1), O(2N), or O(N² + N)
i.e., leave out the constants & the lower-order terms
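For example, with a made-up operation count:

    T(N) = 3N² + 5N + 2
    drop the constant factor (3) and the lower-order terms (5N, 2)
    → T(N) is O(N²)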
· usually, the smallest growth rate is the best choice
  i.e., a method that is O(1) is better than one that is O(N), which is better than a method that is O(N²)
Measured running times:

    N       | O(N²)  | O(N log N) | O(N)
    --------+--------+------------+-------
    100     | 0.01   | 0.0049     | 0.0006
    1,000   | 1.12   | 0.0584     | 0.0033
    10,000  | 111.13 | 0.6863     | 0.0304
    100,000 | N/A    | 8.0113     | 0.2983
Caveats:
· often there is a design trade-off: e.g., sorted list: add is slower, contains is faster
· sometimes ignoring the constants is a bad idea: e.g., disk accesses are slow; 2N log N may be better than 1000 N (if N < 10⁴; at N = 1,000, 2N log₂ N ≈ 20,000 while 1000 N = 1,000,000)
· worst-case analysis is overly pessimistic for an application where the worst case is rare
Formal definition: function T(N) is O(f(N)) if
  there exists a constant c and a value n0 such that
  for all values of N ≥ n0: T(N) ≤ c·f(N)
· T(N) is the exact complexity of the code
· f(N) is an upper bound on the exact complexity: the actual time will be no worse than a constant times f(N)
· we want the smallest f(N) that makes the above true
Example: L.add(1, ob);
  N - copy N items (or 2N+1 if a larger array is needed)
  1 - add the new item
  1 - increment numItems
T(N) = N + 2
Claim: T(N) is O(N)
  want c and n0 such that N + 2 ≤ c·N
  does it work for c = 1? (no: N + 2 > N for every N)
  does it work for c = 2? (not if n0 = 1, but yes if n0 = 2: N + 2 ≤ 2N whenever N ≥ 2)
Use the formal definition to check that your informal guess is correct.
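For instance, a quick sanity check of the claim above (a sketch; testing values doesn't prove the inequality, but it catches a wrong guess for c or n0 right away):

    public class BigOCheck {
        public static void main(String[] args) {
            int c = 2, n0 = 2;   // the guess: N + 2 <= 2N for all N >= 2
            for (int n = n0; n <= 1_000_000; n++) {
                if (n + 2 > c * n) {
                    System.out.println("claim fails at N = " + n);
                    return;
                }
            }
            System.out.println("N + 2 <= 2N held for every N tested");
        }
    }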