Lecture 5: Complexity Analysis (Program Efficiency Analysis)

Big-O notation

Given two algorithms that perform some function

     - e.g., two get(pos) implementations - which one is better?

key issues:  how much time, disk I/O, and memory does a program use?

 

We will generally concentrate on how much time the code requires as a function of the problem size  

    e.g., if the code searches for a name,

the problem size is the number of names

 

Time required ~ # of basic operations performed

One basic operation: 

one arithmetic op  (e.g., a+b)

        one assignment (a=b)

        one boolean expression evaluation (e.g., a<3)

        one read or write of a primitive type

(machine-independent measure)

Some methods always perform the same number of ops

(on every call)

        e.g., ArrayList size() always does one operation: 

return numItems;
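A minimal sketch of an array-based list like the one the lecture refers to; the numItems field comes from the lecture, but the class name and initial capacity are made up for illustration:

    // Sketch of a simplified array-based list; only the parts needed to show
    // that size() does the same single operation on every call.
    public class SimpleArrayList {
        private Object[] items = new Object[10];   // backing array (capacity is arbitrary)
        private int numItems = 0;                  // how many slots are actually in use

        // Constant time: one operation, no matter how many items are stored.
        public int size() {
            return numItems;
        }
    }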

Some methods do a different number of ops on different calls, depending on the value or size of some parameter or field

        this value/size is called the problem size or input size

          e.g., List methods add(int pos, Object ob) or contains(ob)

                   in general, a longer list will take a longer time

the problem size = size of the list

complexity analysis goal: 

how does time change as problem size changes?

                not exact number of operations or exact time

instead: if problem size doubles, does the time qualitatively

·       stay the same

·       double

·       do something else

We usually care about the worst case

        e.g., List remove(pos)

·       Removes the item at position pos by moving all items to the right of pos one place to the left

·       Worst case is removing the first item; for list size N, remove(1) has time proportional to N

·       Time is linear in the size of the list
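A sketch of how such a remove could look, continuing the SimpleArrayList sketch above and assuming the 1-based positions the lecture uses (the code is illustrative, not the course's actual implementation):

    // Remove the item at position pos (1-based) by shifting everything to
    // its right one place to the left.  Worst case pos = 1: about N shifts.
    public Object remove(int pos) {
        Object removed = items[pos - 1];
        for (int i = pos - 1; i < numItems - 1; i++) {
            items[i] = items[i + 1];       // shift one item left
        }
        numItems--;
        items[numItems] = null;            // clear the now-unused slot
        return removed;
    }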

Constant time method: 

   when problem size doubles, worst case time stays the same

Linear time method: 

   when problem size doubles, worst case time doubles too

·       usually, the smallest growth rate is best choice

·       example trade-off: sorted list:  add slower, contains faster

You try:  work in pairs 

which List methods are constant, which are linear

with respect to the size of the List?   why?

(REMEMBER:  always think of worst case)

add(pos, ob) - add at given position

isEmpty()

contains(ob) - contains a given object

get(pos) - get object at given position

remove(pos)

size()

Results:

        add (pos,ob):  worst case is linear

                                  (add at front → move all items)

        isEmpty():  worst case is constant

        contains(ob):  worst case is linear (look at all items)

          get(pos):  constant time (return item at pos)

        remove(pos):  worst case is linear (when pos=1)

        size():  constant

add(ob) is tricky:

        constant time when array is not full

        when array is full, time to expand is linear in N

        but if we double the size of the array in expandArray,

                the extra cost averages out to about one copy operation per add

        (intuition: imagine maintaining the double-size array in parallel with the list)
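A sketch of what such an add(ob) might look like, assuming the array-based list from the earlier sketch; expandArray here just doubles the array as described above (the details are illustrative, not the course's actual code):

    // Add ob at the end.  Usually constant time; when the array is full we
    // pay O(N) once to copy into a double-size array, which averages out to
    // roughly one extra copy per add.
    public void add(Object ob) {
        if (numItems == items.length) {
            expandArray();                 // rare, but linear when it happens
        }
        items[numItems] = ob;
        numItems++;
    }

    private void expandArray() {
        Object[] bigger = new Object[2 * items.length];
        for (int i = 0; i < numItems; i++) {
            bigger[i] = items[i];          // copy the existing items over
        }
        items = bigger;
    }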

constant and linear time are not the only possibilities

Another example:

while(!L.isEmpty()) {

        L.remove(1);

}

          what does this code do?    (removes all items)

        how many iterations will be performed?  (N)

        how much work on each iteration?

                N, N-1, …, 3, 2, 1

total time:  (N) + (N-1) + (N-2) + … + 3 + 2 + 1 = N(N+1)/2

        graph time in each call to remove with X's on graph paper

                first time is N, second is N-1, etc.

                the X's fill about half of an N×N square (area N²);  the total sum is N²/2 + N/2

        how does the number of ops change as list size changes?

                      N               1       2       4       8       …

               N(N+1)/2               1       3      10      36

          time is proportional to N²;  2× problem size → 4× time

method is quadratic in the problem size

quadratic time method

        when problem size doubles, time quadruples
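To see the quadrupling concretely, one can simply count the copy operations instead of timing the code; a small throwaway sketch (the class and method names are made up for this demo):

    public class QuadraticDemo {
        // Copies needed to empty a list of size n by repeatedly removing the
        // front item: n + (n-1) + ... + 1 = n(n+1)/2, as computed above.
        static long copiesToEmpty(long n) {
            long total = 0;
            for (long size = n; size >= 1; size--) {
                total += size;             // removing the front costs ~size operations
            }
            return total;
        }

        public static void main(String[] args) {
            long small = copiesToEmpty(1000);
            long big   = copiesToEmpty(2000);
            // prints a ratio very close to 4: double the size, quadruple the work
            System.out.println(small + " vs " + big + "  (ratio " + (double) big / small + ")");
        }
    }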

 

problem:  was anyone else born in November?

        alg 1:  ask everyone at once; anyone born in November says “yes”

        alg 2:  ask each person individually, stop if “yes”

        alg 3:  ask one person; if yes, quit;

     if no, ask that person to ask the next person and report back;

     if no, ask the 1st to ask the 2nd to ask the next person;

etc.

        think about complexity of each

– check with person next to you

        5 volunteers act out the algorithms  -- O(1), O(N), O(N²)
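If we model the class as an array of birth months (1–12), algorithms 2 and 3 might be sketched like this (the method names and the relay loop are illustrative assumptions, not the acted-out versions):

    // Alg 2: ask each person in turn, stop at the first "yes"  -- O(N)
    static boolean anyNovember(int[] months) {
        for (int m : months) {
            if (m == 11) return true;      // 11 = November
        }
        return false;
    }

    // Alg 3: restart from person 1 every round and relay the question down
    // the line; reaching person i costs about i "asks", so the total is
    // 1 + 2 + ... + N  -- O(N²)
    static boolean anyNovemberRelay(int[] months) {
        for (int i = 0; i < months.length; i++) {
            for (int j = 0; j <= i; j++) {
                // one "ask" per hop in the relay
            }
            if (months[i] == 11) return true;
        }
        return false;
    }

Alg 1 is O(1) only because N people answer simultaneously; a single processor still has to get the question to everyone somehow.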

 

Announcements:

·       P1 due today

·       H2 due Thursday

·       P2 due …

·       Java review session #2:  today, 4-5:15p, 1325CS

Topics:  strings, aliasing, parameter passing, boolean expressions

Big-O notation

Express complexity using big-O notation

For problem size = N,

Constant time code is O(1)

        Linear time code is O(N)

        Quadratic time code is O(N²)

       

Not O(3), O(N+1), O(2N), O(N² + N)

        i.e., leave out the constants & the lower order terms

 

·       usually, the smallest growth rate is best choice

i.e., a method that is O(1) is better than one that is O(N), which is better than a method that is O(N²)

                                                             

        N            O(N²)       O(N log N)      O(N)

        100            0.01        0.0049        0.0006
        1,000          1.12        0.0584        0.0033
        10,000       111.13        0.6863        0.0304
        100,000         N/A        8.0113        0.2983

(measured running times)

 

 

 

Caveats:

·       often have a design trade-off:

e.g., sorted list:  add slower, contains faster

·       sometimes ignoring the constants is a bad idea

e.g., disk accesses are slow; 

2N log N  may be better than 1000 N  (if N < 10⁴)

·       worst case analysis is overly pessimistic for an application where worst case is rare

 

 

Formal definition:  function T(N) is O(f(N)) if

        there exists a constant c and a value n0 such that

                for all values of N ≥ n0:  T(N) ≤ c·f(N)

 

·       T(N) is the exact complexity of the code

·       f(N) is an upper bound on the exact complexity: for all N past n0, the actual time is no worse than c·f(N)

·       We want the smallest f(N) that makes the above true

 

Example:  L.add(1, ob);

N     - copy N items  (or, 2N+1 if larger array is needed)

1       - add new item

1       - increment numItems

T(N) = N+2

Claim:  T(N) is O(N)

        want c and n0 such that N+2 ≤ c·N

        does it work for c = 1?   (no)

        does it work for c = 2?   (not if n0=1, but yes if n0 = 2)
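A quick numeric sanity check of that choice of c and n0 (not a proof; the algebra above is the argument, and the class name here is made up):

    // Check N + 2 <= 2*N for N = 2 .. 1,000,000, i.e., c = 2 and n0 = 2.
    public class BigOCheck {
        public static void main(String[] args) {
            boolean ok = true;
            for (long n = 2; n <= 1_000_000; n++) {
                if (n + 2 > 2 * n) { ok = false; break; }
            }
            System.out.println(ok ? "N+2 <= 2N held for every N tested"
                                  : "found a counterexample");
        }
    }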

 

Use the formal definition to check that your informal guess is correct.