International Collegiate Programming
Contest
|
Advanced Data Structures and String Processing
This session focuses on advanced data structures and string processing. If you are not
familiar with the relevant concepts and algorithms, or you need a refresher,
you can use the resources below.
Problem Set (#7)
-
Triangle Wave
-
Potentiometers
(To use fast I/O, use scanf/printf in c++ or only cin/cout, with the lines:
ios_base::sync_with_stdio(false);
cin.tie(0);
as the first two lines of main. In Java, use BufferedReader, as seen
here.)
-
AGTC
(You may assume a O(nm) algorithm is sufficiently efficient, where n is the length of x and m the length of y)
-
Grammar Evaluation
(The grammar for <component> should read <component> := <factor> | <factor> * <component>).
I realized this problem is tricky. You need to use Recursive Descent Parsing:
https://en.wikipedia.org/wiki/Recursive_descent_parser to evaluate
the grammar directly. It handles all order of operations.
-
Frequent Values
-
Power Strings
(Hint: KMP)
-
Ahoy, Pirates!
Note: All problems expect I/O to be done using standard input and standard
output. The solution to problems must be written in a single file.
If you are programming in Java the application class should be called
"Main" and should be in the default package.
Resources
Advanced Data Structures
There are a few data structures outside of what you typically learn in a
data structures or algorithms course that are sometimes necessary in ICPC.
These data structures are not supplied with the standard library of Java
or C++. Because of this, we recommend having bug-free implementations in
library code. In particular, we already have working implementations of
the data structures discussed at
https://github.com/atmorgan/ICPC2014.
Binary Indexed (Fenwick) Tree
Occasionally a problem based on an array or sequence will require range sum
queries (RSQ); that is, it will ask for the the sum of elements between
two indices in the sequence. If the array is static, the problem can be solved
in O(1) time per query with O(n) time preprocessing, using dynamic programming.
However, if the values of the array can change as part of the problem, then
naive solutions will either require O(n) time per update or O(n) time per RSQ.
A more efficient approach is to use a Binary-Indexed Tree (BIT), which can achieve
O(log n) time update and O(log n) time query. A BIT is a perfect binary tree
built into an array that utilizes bit operations in its updates and queries.
Because of this, it is particularly fast in practice. The code is also very
simple and short. Read more about Fenwick trees in the wikipedia article
here. To learn how
to implement your own BIT, take a look at
the following topcoder tutorial.
As seen by our library code,
BITs can additionally be extended to work in 2 or more dimensions, and
modified to support range-updates, where a contiguous sequence of indices
change value.
Segment Tree
Segment trees provide similar operations as BITs but with greater flexibility.
In particular, segment trees can also be used for RSQ, but they are capable of
a whole lot more (at a cost of additional complexity and more code).
The basic idea of a segment tree is to utilize divide and conquer
to again provide O(log n) time queries on a contiguous range of indices
and O(log n) time update. The difference with segment trees as opposed to
BITs is the query operation can be any function that can be computed (efficiently)
by merging results of queries on smaller intervals. For example, segment trees
are capable of dynamic Range Minimum Queries (RMQ); that is, they support finding
a minimum between indices (i, j) in O(log n) time, and updating a value at a particular
index in O(log n) time. As with BITs, segment trees can be modified to work in
2 or more dimensions, and can also support range updates via lazy propagation.
The wikipedia article for segment trees provides information about the computational-geometry
segment tree for stabbing queries on a set of intervals, which is typically not needed in
programming competitions. However, you can read about segment trees used in ICPC in
the following article.
As a warning to ICPC contestants utilizing a segment tree
for a contest problem, the trees can unfortunately be quite slow. It is not uncommon
to get Time Limit Exceeded on a problem with an implementation that asymptotically
runs in the same time complexity as a model solution. For this reason, we have optimized our
implementation with a few tricks that bring down some of the constants involved with
working with segment trees. Take a look
here.
String Processing
Working with strings, parsing obscure input, and producing complicated output are all
tasks that appear frequently in ICPC. Familiarity with how your language of choice accomplishes
basic string manipulation is essential to being competitive in a contest setting.
In addition to basic string manipulation, often string tasks in ICPC are
solvable with a dynamic program. The classic problems in this space are
edit distance and
the related
longest common subsequence
problem.
In more advanced tasks, ICPC problems may require efficient string-processing algorithms
or data structures. Although less common, understanding these algorithms
and having an implementation handy is necessary to solve these problems.
Knuth-Morris-Pratt (KMP) Algorithm
The Knuth-Morris-Pratt (KMP) algorithm is an algorithm to determine if a query string Q of
length m is a substring of a given string S of length n, and if so, the first index
at which a match occurs. Although standard libraries have
such string matching functions, the C++ and Java implementations run in worst
case O(nm) complexity. The KMP algorithm runs in worst case O(n + m) complexity by
preprocessing Q into a KMP table. This table can have uses for more than just the KMP
algorithm, so it is useful to understand how it works. The
wikipedia article for
the KMP algorithm explains the details of the algorithm. Furthermore, we have an
implementation here.
Trie
A trie is a data structure for storing a set of strings. Each node of a trie
is a letter, and a path from root to leaf represents a string in the
trie. Words with shared prefixes share the same beginning path.
When used purely as a dictionary, tries are no better than the standard
library implementations of hash tables. Tries can be useful, however, if more information regarding the prefixes
themselves is needed. For more information about tries, consult the
wikipedia article.
Suffix Array
Suffix arrays are advanced string data structures that rarely come up
in ICPC. However, they can provide efficient operations that are not possible
with elementary approaches. The idea behind a suffix array is to determine all the suffixes
of a given string S of length n, and sort them in lexicographical order. Naively
this would take (at least) O(n^2) time, but it turns out it can be done in O(n log^2 n)
time (easily), and optimized to O(n log n) when the alphabet is bounded by a constant
(suffix arrays can actually be constructed in linear time with more complicated algorithms).
Once constructed, a suffix array can achieve the following operations:
- String Matching (same as what KMP does) in O(m log n) time -
where n is the length of the preprocessed suffix array string and m is the length of the
query string
- Longest Common Prefix in O(n) time
- Longest Repeated Substring in O(n) time
- Longest Common Substring in O(n) time
For an overview of suffix arrays, consult the
following article.
We have a limited suffix array implementation available
here.
|