Optimizer Costs
Variable Definitions
Variable Name | Description
|
---|
NTuples | Tuples
|
---|
NKeys | Number of Distinct Key
|
---|
IHeight | height of btree,
|
---|
ICost | # of pages required for hash look up a tuple
|
---|
INPages | Number of Index Leaf Pages (width of btree, # of hash table pages)
|
---|
BPR | Buffer Pool Pages Required; how many pages are
required in the buffer pool in order for this query to run.
|
---|
NPages | Number of Relation Pages
|
---|
NTuples | Number of Tuples
|
---|
PSize | Page Size
|
---|
RF(Primary) | Primary Predicate Reduction Factor
|
---|
NPages | Result Number of Pages
|
---|
RF | Reduction Factors(all predicates for this node)
|
---|
TSize | Result Tuple Sizes
|
---|
Reduction Factors
With only minor alterations, this comes right from "Access Path Selection
in a Relational Database Management System" by Selinger et al.
column = value | RF = 1 / NKeys if index on column
| RF = 1/10 otherwise
| column1 = column2 | RF = 1 / MAX (1.NKeys,
2.NKeys) if index on columns 1 and 2
RF = 1 / I.NKeys
if only one index on column i.
| RF = 1/10 otherwise
| column != value | RF = 1 - 1 / NKeys if index on column
| RF = 1 - 1/10 otherwise
| column1 != column2 | RF = 1 - 1 / MAX (1.NKeys,
2.NKeys) if index on columns 1 and 2
RF = 1 - 1 / I.NKeys
if only one index on column i.
| RF = 1 - 1/10 otherwise
| value1 < col < value2 | (note: for our purposes, no
real point in distinguishing between < and <= with regard to RF)
| RF = (value2 - value1) / (high key value - low key value) if
int or real
| 1/4 otherwise
| value < column | (or any other open-ended
comparison -- the formula is similar)
| RF = (high key value - value) / (high key value - low key
value) if numeric
| RF = .3
| column1 < column2 | RF = .3
| A AND B | RF = RF(A) * RF(B)
| A OR B | RF = RF(A) + RF(B) - RF(A) * RF(B)
| | |
Access Method Costs
If all attributes used in either the selection or
projection in a plan node can be supplied by
the index, that node will be marked as being an index-only plan.
In such a plan, the indexed relation need not be accessed, and this
is reflected in the following costs.
Access Method | Description
|
---|
File Scan |
NPages
|
---|
File Scan, file is sorted
|
log(NPages) + (NPages * RF(Primary))
|
---|
We can use a binary search to find the correct first page, and then we only
need to read in as many pages that match the condition based on the sorted
attribute.
|
Hash Index-Only Scan
|
INPages
|
---|
We can read through all of the pages of the hash index instead of
going to the base relation.
|
Clustered Hash Scan
|
ICost + (NPages * RF(Primary))
|
---|
The cost to get to the appropriate pages (ICost), plus the number of
pages that contain records (since clustered, we know they are all together).
|
Unclustered Hash Scan
|
ICost + NTuples(Fetched)
|
---|
The cost to get to the appropriate pages (ICost), plus one page IO for
each matching tuple (since each could be on a different page).
|
Index-Only BTree Index
|
IHeight + (INPages * RF(Primary))
|
---|
The depth of the btree, plus the number of index leaves we have to look
at. We don't have to go to the base relation.
|
Clustered BTree Index
|
IHeight + (INPages * RF(Primary)) + (NPages * RF(Primary))
|
---|
The depth of the btree, plus the number of index leaves we have to look
at, plus the number of
pages that contain records (since clustered, we know they are all together)
|
Unclustered B Tree Index
|
IHeight + (INPages * RF(Primary)) + NTuples(Fetched)
|
---|
In this case, each record could be on a different page.
|
Joins
In general, we assume all joins are pipelined; that is, the results of one
join are directly fed into the next.
Index Nested Loops
The optimizer never considers whether the left plan was already
ordered or not. If we could make a guess on how many keys are entering in
the left hand side, we could then use the L.NKeys instead of L.NTuples in
the formulas, which could be cheaper.
The cost estimate is computed using the following formula for
joining plan R and plan L, where R is the right subplan and L is the
left:
L.TotalCost + (L.Cardinality * R.Access)
Where R.Access is the cost to get one tuple. Note, that RF(Primary) is the
selectivity of the join, so the selectivity for just one probe should be (RF(Primary) /
L.NTuples) if we have L.NTuples probes. So, what I did was grab the cost for R.Access
from the access-method level lookups & then did some algebra to simplify it
up a little bit. This would be a high estimate if the left plan was sorted
& the index is clustered. (Note, that these will work in non-equijoin joins
if the index supports equijoins)
Right Relation Sorted |
(not strictly an index, but close)
L.Total + L.NTuples * ( log(R.NPages) + R.NPages * RF(Primary) / L.NTuples )
(Clustered or Unclustered) Index-Only Hash Index |
L.Total + L.NTuples * R.ICost
|
---|
Clustered Hash Index |
L.Total + L.NTuples * ( R.ICost + R.NPages * RF(Primary) / L.NTuples)
|
---|
Unclustered Hash Index |
R.ICost is cost to use index once
|
---|
L.Total + L.NTuples * ( R.ICost + NTuples(Fetched) / L.NTuples + R.ICost)
| Note that NTuples(Fetched) = NTuples(Input) * RF(Primary),
or NTuples(Fetched) = L.NTuples * R.NTuples * RF(Primary)
|
Index-Only BTree Index
|
L.Total + L.NTuples * ( R.IHeight + R.INPages * RF(Primary)/L.NTuples)
|
---|
Clustered BTree Index
|
L.Total + L.NTuples * ( R.IHeight + R.INPages * RF(Primary)/L.NTuples + R.NPages * RF(Primary)/L.NTuples)
|
---|
Unclustered BTree Index
|
L.Total + L.NTuples * ( R.IHeight + R.INPages * RF(Primary)/L.NTuples + NTuples(Fetched)/L.NTuples)
|
---|
(note that NTuples(Fetched) = NTuples(Input) * RF(Primary) = L.NTuples * R.NTuples * RF(Primary))
| |
---|
Sort Merge
Hash indexes are not allowed.
In general, the cost is the cost to sort one or both incoming relations,
if necessary, plus the cost to access each one once. (If there is an index,
it will include the cost to use the index)
We also assume we can always sort in two runs (not always reasonable)
Note; we assume that sort-merge is done in two distinct steps -- a sort
step, and a merge step. It assumes (like the current version of minibase)
that merging is not done at the same time as the last sort run.
Both relations unsorted
| 4L.NPages + 4R.NPages + L.TotalCost + R.NPages
|
---|
Left relation sorted, right unsorted after access method, no b-tree index on right
| 4R.NPages + L.TotalCost + R.NPages
|
---|
Left relation unsorted, right sorted after access method, no b-tree index on right
| 4L.NPages + L.TotalCost + R.NPages
|
---|
Left relation unsorted, right sorted after access method,
clustered b-tree index on right
| 4L.NPages + L.TotalCost + R.IHeight + R.INPages + R.NPages
|
---|
Left relation unsorted, right sorted after access method,
unclustered b-tree index on right
| 4L.NPages + L.TotalCost + R.IHeight + R.INPages + R.NTuples
|
---|
Both relations sorted after access method, no b-tree index on
right
| L.TotalCost + R.NPages
|
---|
Both relations sorted after access method, clustered b-tree index
on right
| L.TotalCost + R.IHeight + R.INPages + R.NPages
|
---|
Both relations sorted after access method, unclustered b-tree
index on right
| L.TotalCost + R.IHeight + R.INPages + R.NTuples
|
---|
Other Join Methods
Tuple Nested Loops
| (L.NTuples * R.NPages) + L.Total
|
---|
Page Nested Loops
| (L.NPages * R.NPages) + L.Total
|
---|
Hash Join
| L.TotalCost + 2L.NPages + 2R.NPages + R.NPages
|
---|
Order By, Group By, Distinct, and Aggregates
Aggregates are `free'. If all aggregates can be computed using some index,
and index-only plan is produced. (However, all the aggregates must be
computable from the same index. It could be cheaper to instead scan two
different indexes and join them together, but the optimizer isn't smart
enough to consider this currently.)
Order By and Group By: Group by and distinct are done by sorting the
relation and then stepping through the different groups; after this point
they are not significantly different as far as the optimizer is concerned
from a similar order by clause.
If the plan is
already in the order desired,
there is no additional cost. Otherwise, 4 * NP is added to the cost
of the plan to sort the relation.
Click here for the public interface to the planner
Back to the Minibase Home Page
Back to the Components Home Page
|