/*<std-header orig-src='shore' incl-file-exclusion='MAINPAGE_H'>

 $Id: mainpage.h,v 1.5 2010/07/07 21:43:42 nhall Exp $

SHORE -- Scalable Heterogeneous Object REpository

Copyright (c) 1994-99 Computer Sciences Department, University of
                      Wisconsin -- Madison
All Rights Reserved.

Permission to use, copy, modify and distribute this software and its
documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

THE AUTHORS AND THE COMPUTER SCIENCES DEPARTMENT OF THE UNIVERSITY
OF WISCONSIN - MADISON ALLOW FREE USE OF THIS SOFTWARE IN ITS
"AS IS" CONDITION, AND THEY DISCLAIM ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

This software was developed with support by the Advanced Research
Project Agency, ARPA order number 018 (formerly 8230), monitored by
the U.S. Army Research Laboratory under contract DAAB07-91-C-Q518.
Further funding for this work was provided by DARPA through
Rome Research Laboratory Contract No. F30602-97-2-0247.

*/

/*  this file contains only Doxygen documentation */


/** \mainpage SHORE Storage Manager: The Multi-Threaded Version
 * \section Brief Description
 *
 * This is an experimental test-bed library for use by researchers who wish to
 * write multi-threaded software that manages persistent data.
 *
 * This storage engine provides the following capabilities:
 *  - transactions with ACID properties, with ARIES-based logging and recovery,
 *    primitives for partial rollback,
 *    transaction chaining, and early lock release,
 *  - prepared-transaction support for two-phase commit,
 *  - persistent storage structures:
 *    B+ tree indexes, R* trees (spatial indexes), and files of untyped records,
 *  - fine-grained locking for records and B+ tree indexes with deadlock detection,
 *    optional lock escalation and optional coarse-grained locking,
 *  - in-memory buffer management with optional prefetching,
 *  - extensible statistics-gathering, option-processing, and error-handling
 *    facilities.
 *
 * This software runs on Pthreads, thereby providing its client software
 * (e.g., a database server) multi-threading
 * capabilities and resulting scalability from modern SMP and NUMA
 * architectures, and has been used on Linux/x86-64 and Solaris/Niagara
 * architectures.
 *
 * \section Background
 *
 * The SHORE (Scalable Heterogeneous Object REpository) project
 * at the University of Wisconsin - Madison Department of Computer Sciences
 * produced the first release
 * of this storage manager as part of the full SHORE release in 1996.
 * The storage manager portion of the SHORE project was used by
 * other projects at the UW and elsewhere, and was intermittently
 * maintained through 2008.
 *
 * The SHORE Storage Manager was originally developed on single-cpu Unix-based systems,
 * providing support for "value-added" cooperating peer servers, one of which was the
 * SHORE Value-Added Server (http://www.cs.wisc.edu/shore), and another of which was
 * Paradise (http://www.cs.wisc.edu/paradise) at the University of Wisconsin.
 * The
 * TIMBER (http://www.eecs.umich.edu/db/timber) and
 * Periscope (http://www.eecs.umich.edu/persiscope) projects
 * at the University of Michigan,
 * PREDATOR (http://www.distlab.dk/predator) at Cornell
 * and
 * Lachesis (http://www.vldb.org/conf/2003/papers/S21P03.pdf)
 * used the SHORE Storage Manager.
 * The storage manager has been used for innumerable published studies since
 * then.
 *
 * The storage manager had its own "green threads" and communications
 * layers, and until recently, its code structure, nomenclature,
 * and contents reflected its SHORE roots.
 *
 * In 2007, the Data Intensive Applications and Systems Laboratory (DIAS)
 * at Ecole Polytechnique Federale de Lausanne
 * began work on a port of release 5.0.1 of the storage manager to Pthreads,
 * and developed more scalable synchronization primitives, identified
 * bottlenecks in the storage manager, and improved the scalability of
 * the code.
 * This work was on a Solaris/Niagara platform and was released as Shore-MT
 * (http://diaswww.epfl.ch/shore-mt).
 * It was a partial port of the storage manager and did not include documentation.
 * Projects using Shore-MT include
 * StagedDB/CMP (http://www.cs.cmu.edu/~stageddb/) and
 * DORA (http://www.cs.cmu.edu/~ipandis/resources/CMU-CS-10-101.pdf).
 *
 * In 2009, the University of Wisconsin - Madison took the first Shore-MT
 * release and ported the remaining code to Pthreads.
 * This work was done on a Red Hat Linux/x86-64 platform.
 * This release is the result of that work, and includes this documentation,
 * bug fixes, and supporting test code.
 * In this release some of the scalability changes of the DIAS
 * release have been disabled as bug work-arounds, with the hope that
 * further work will improve scalability of the completed port.
 *
 * \section Copyrights
 *
 This distribution contains code and documentation subject to one or
 more of the following copyrights.

 The main code base of the storage manager is subject to the SHORE/UW
 copyright (given below) and most of it is also subject to
 the SHORE-MT/DIAS copyright (also given below).
 Both copyrights are hereby extended to the date of this release, 2010.

 The atomic operations library is taken from the OPENSOLARIS release and
 is subject to Sun Microsystems copyright, and to the OPENSOLARIS license,
 found in src/atomic_ops/OPENSOLARIS.LICENSE.
 It is lengthy and so it is not included here.

 The strstream compatibility code found in src/fc/w_compat_strstream.h and
 src/fc/w_compat_strstream.cpp
 is subject to the Silicon Graphics copyright, below.

 The regex code found in the src/common/ library is subject to the
 Henry Spencer/ATT copyright and license, contained in src/common/regex2.h, and
 included below.

 What little remains of the old SHORE sthreads library is subject to
 copyright given in those source files (src/sthread/sthread.h) as well as to the SHORE/UW and
 SHORE-MT/DIAS copyrights.


 - \b SHORE/UW \b Copyright:

    SHORE -- Scalable Heterogeneous Object REpository

    Copyright (c) 1994-2010 Computer Sciences Department, University of
                            Wisconsin -- Madison
    All Rights Reserved.

    Permission to use, copy, modify and distribute this software and its
    documentation is hereby granted, provided that both the copyright
    notice and this permission notice appear in all copies of the
    software, derivative works or modified versions, and any portions
    thereof, and that both notices appear in supporting documentation.

    THE AUTHORS AND THE COMPUTER SCIENCES DEPARTMENT OF THE UNIVERSITY
    OF WISCONSIN - MADISON ALLOW FREE USE OF THIS SOFTWARE IN ITS
    "AS IS" CONDITION, AND THEY DISCLAIM ANY LIABILITY OF ANY KIND
    FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

    This software was developed with support by the Advanced Research
    Project Agency, ARPA order number 018 (formerly 8230), monitored by
    the U.S. Army Research Laboratory under contract DAAB07-91-C-Q518.
    Further funding for this work was provided by DARPA through
    Rome Research Laboratory Contract No. F30602-97-2-0247.

 - \b SHORE-MT/DIAS \b Copyright:

    Shore-MT -- Multi-threaded port of the SHORE storage manager

    Copyright (c) 2007-2009
    Data Intensive Applications and Systems Labaratory (DIAS)
    Ecole Polytechnique Federale de Lausanne

    All Rights Reserved.

    Permission to use, copy, modify and distribute this software and
    its documentation is hereby granted, provided that both the
    copyright notice and this permission notice appear in all copies of
    the software, derivative works or modified versions, and any
    portions thereof, and that both notices appear in supporting
    documentation.

    This code is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. THE AUTHORS
    DISCLAIM ANY LIABILITY OF ANY KIND FOR ANY DAMAGES WHATSOEVER
    RESULTING FROM THE USE OF THIS SOFTWARE.


 - \b Silicon \b Graphics \b Copyright \b and \b License:

    Copyright (c) 1998
    Silicon Graphics Computer Systems, Inc.

    Permission to use, copy, modify, distribute and sell this software
    and its documentation for any purpose is hereby granted without fee,
    provided that the above copyright notice appear in all copies and
    that both that copyright notice and this permission notice appear
    in supporting documentation. Silicon Graphics makes no
    representations about the suitability of this software for any
    purpose. It is provided "as is" without express or implied warranty.

 - \b Henry \b Spencer/ATT \b Copyright \b and \b License:

    Copyright 1992, 1993, 1994, 1997 Henry Spencer. All rights reserved.
    This software is not subject to any license of the American Telephone
    and Telegraph Company or of the Regents of the University of California.

    Permission is granted to anyone to use this software for any purpose on
    any computer system, and to alter it and redistribute it, subject
    to the following restrictions:

    1. The author is not responsible for the consequences of use of this
       software, no matter how awful, even if they arise from flaws in it.

    2. The origin of this software must not be misrepresented, either by
       explicit claim or by omission. Since few users ever read sources,
       credits must appear in the documentation.

    3. Altered versions must be plainly marked as such, and must not be
       misrepresented as being the original software. Since few users
       ever read sources, credits must appear in the documentation.

    4. This notice may not be removed or altered.

 *
 * \section START Getting Started With the Shore Storage Manager
 * A good place to start is with \ref SSMAPI.
 *
 *\section BUILD Configuring and Building the Storage Manager
 * See \ref OPT "this page" to configure and build the storage manager.
 *
 * \section IMPLNOTES1 Implementation Notes
 * See \ref IMPLNOTES "this page" for some implementation details.
 *
 * \section REFS References
 * See \ref REFERENCES "this page" for references to selected papers
 * from which ideas are used in the Shore Storage Manager.
 */

/**\addtogroup OPT
 * Configuring and building the storage manager consists of these steps,
 * all done at the root of the distribution directory tree.
 * - bootstrap
 * - configure
 * - build (make)
 *
 *\section OPTBOOT Bootstrapping
 * Bootstrapping might not be necessary, but if you have the autotools
 * installed, it might save time to bootstrap the first time you try to
 * build, particularly if you are installing on a system other than Linux.
 * To bootstrap, type ./bootstrap. You can also look at that script and
 * run selected parts of it, since all it does is run the autotools.
 *
 * Autotools run abysmally slowly on Solaris.
 *
 *\section OPTCONF Configuring
 * There are two parts to configuring the storage manager.
 * The original configuration scheme of SHORE was encapsulated in
 * \e config/shore.def, which described all or most pertinent CPP macros.
 * We are moving away from that scheme and replacing it with
 * autoconf options and features, but a few things still remain under
 * the control of \e shore.def.
 * These fall into three categories:
 * - details related to autoconf-controlled options, such as pathnames
 * - basic compile-time constants that someone extending the
 *   storage manager might want to change
 * - maintainer's tools
 *
 * There remain some CPP macros that are not
 * described in \e config/shore.def:
 * - code of occasional utility for debugging purposes
 * - code intended to be the subject of future experimentation
 *
 * Configuring amounts to running ./configure from the root of the
 * distribution directory tree, and, depending on the features you
 * wish to use, editing \e config/shore.def.
 *
 * \remarks
 * The storage manager API contains a method ss_m::config_info (q.v.) that
 * allows a server to determine, at run time, some of the
 * compile-time limits determined by the configuration.
 *
 *\subsection CONFIGOPT Configuration Options
 * To find the configuration options, type
 *\code ./configure --help \endcode, the output of which is reproduced here.
 *
 * \verbatim
SHORE-specific Features:
  --enable-lp64         default:yes     Compile to use LP 64 data model
                                        No other data model is supported yet.
                                        But we hope some day to port back to LP32.
  --enable-checkrc      default:no      Generate (expensive) code to verify return-code checking
                                        If a w_rc_t is set but not checked with
                                        method is_error(), upon destruction the
                                        w_rc_t will print a message to the effect
                                        "error not checked".
  --enable-trace        default:no      Include tracing code
                                        Run-time costly.  Good for debugging
                                        problems that are not timing-dependent.
                                        Use with DEBUG_FLAGS and DEBUG_FILE
                                        environment variables.  See \ref SSMTRACE.
  --enable-dbgsymbols   default:no      Turn on debugger symbols
                                        Use this to override what a given
                                        debugging level will normally do.
  --enable-explicit     default:no      Compile with explicit templates
                                        NOT TESTED.
\todo mainpage.h compile with or remove explicit templates

  --enable-valgrind     default:no      Enable running valgrind run-time behavior
                                        Includes some code for valgrind.
  --enable-purify       default:no      Enable build of <prog>.pure
  --enable-quantify     default:no      Enable build of <prog>.quant
  --enable-purecov      default:no      Enable build of <prog>.purecov

SHORE-specific Optional Packages:
  --with-hugetlbfs        Use the hugetlbfs for the buffer pool.
                          Depending on the target architecture, this might
                          be useful.  If you use it, you will need to set
                          a path for your hugetlbfs in config/shore.def.
                          The default is :
                          #define HUGETLBFS_PATH "/mnt/huge/SSM-BUFPOOL"
  --without-mmap          Do not use mmap for the buffer pool.  Trumps
                          hugetlbfs option.
  --with-debug-level1     Include level 1 debug code, optimize.
                          This includes code in w_assert1 and
                          #if W_DEBUG_LEVEL > 0 /#endif pairs and
                          #if W_DEBUG_LEVEL >= 1 /#endif pairs and
                          W_IFDEBUG1
  --with-debug-level2     Include level 2 debug code, no optimize.
                          Equivalent to debug level 1 PLUS
                          code in w_assert2 and
                          #if W_DEBUG_LEVEL > 1 /#endif pairs and
                          #if W_DEBUG_LEVEL >= 2 /#endif pairs and
                          W_IFDEBUG2
  --with-debug-level3     Include level 3 debug code, no optimize.
                          Equivalent to debug level 2 PLUS
                          includes code in w_assert3 and
                          #if W_DEBUG_LEVEL > 2 /#endif pairs and
                          #if W_DEBUG_LEVEL >= 3 /#endif pairs and
                          W_IFDEBUG3
  \endverbatim
  \bug GNATS 136 Only 64-bit platforms are supported. The issue is that
  lsns and some other data structures need atomic methods.

  \todo Convert w_assert9 to w_assert3 where the asserts are still reasonable and remove the rest. Some of these are obsolete, some are racy in the new mt-context.
  All the w_assert9's are what used to be w_assert3; they were turned into 9 to disable them until they could be evaluated for usefulness. Many have since been converted to level-2 or level-3 asserts; many remain to be addressed.

 * \subsection SHOREDEFOPT Description of Selected CPP Macros
 * In this section we describe selected macros defined (or not) in
 * \e config/shore.def.
 *
 * - HUGETLBFS_PATH See --with-hugetlbfs in \ref CONFIGOPT;
 *   see also \ref REFHUGEPAGE1 for use of hugetlbfs with Linux.
 *
 * - USE_SSMTEST Define this if you want to include crash test hooks in your
 *   smsh.  This is for a maintainer's testing purposes and should not be
 *   defined for a release version of the storage manager.
 *
 * - COMMON_GTID_LENGTH : You can override the default length of a global
 *   transaction id.
 *   Useful only if your server implements distributed
 *   transactions.
 *
 * - COMMON_SERVER_HANDLE_LENGTH : You can override the default length of a
 *   server handle, the handle by which the server identifies the
 *   coordinator of a distributed transaction.
 *   Useful only if your server implements distributed
 *   transactions.
 *
 * - SM_LOG_PARTITIONS : You can override the default maximum number of
 *   open partitions for the log by defining this.
 *
 * - SM_PAGESIZE : You can override the default page size by defining this.
 *   Warning: this has not been tested in a long time, and in any case,
 *   the maximum page size is 64KB.
 *
 * - SM_EXTENTSIZE : You can override the default extent size with this.
 *   Warning: you must also address the alignment of Pmap_Align4
 *   based on the resulting size of a Pmap.  See extent.h, logrec.h,
 *   pmap.h.
 *   Warning: this has never been tested.  This information is included
 *   only for those hacking on the storage manager who may require larger
 *   extents.
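 *
 * To see which of the macros described in this section a given tree
 * currently defines, something like the following sketch may help. It
 * assumes only that \e config/shore.def uses ordinary CPP define lines
 * (as the HUGETLBFS_PATH default does); the function name
 * list_shore_macros is made up for this example and is not part of the
 * distribution.
 *
```shell
#!/bin/sh
# List the selected macros from this section that are defined in a
# shore.def-style file, prefixed with their line numbers.
# Assumption: definitions are plain CPP "#define NAME ..." lines.
list_shore_macros() {
    def_file=${1:-config/shore.def}
    macros='HUGETLBFS_PATH|USE_SSMTEST|COMMON_GTID_LENGTH'
    macros="$macros|COMMON_SERVER_HANDLE_LENGTH|SM_LOG_PARTITIONS"
    macros="$macros|SM_PAGESIZE|SM_EXTENTSIZE"
    grep -n -E "^[[:space:]]*#[[:space:]]*define[[:space:]]+($macros)([[:space:]]|\$)" "$def_file"
}

# Example: run from the root of the distribution directory tree.
if [ -f config/shore.def ]; then
    list_shore_macros config/shore.def ||
        echo "none of the selected macros are defined"
fi
```
 *
 * A macro that is commented out or absent is simply not listed, which
 * matches the "defined (or not)" wording at the start of this section.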
 *
 *\subsection OPTTCL Tcl and smsh
 * The storage manager test shell, smsh, uses Tcl.
 * Autoconf tries to find Tcl in a standard place.
 * If it is not found, you must define two paths, to your Tcl
 * library and to its include files, in \e Makefile.local
 * at the top of the directory tree.
 * If your Tcl installation is not built for multithreading, you must
 * install such a copy and put its path in \e Makefile.local.
 *
 * Tcl is available from
 * - ActiveState (http://www.activestate.com/activetcl/), or
 * - SourceForge (http://sourceforge.net/projects/tcl/files/).
 *
 *\section OPTBLD Building
 * Building the storage manager consists of running
 * \code
   make
   \endcode
 *
 * Since it is not always easy to tell what options were used for the
 * most recent build in a directory, the compiler options used on the
 * build are put in the file \e makeflags
 * and the rest of the options are determined in \e config/shore-config.h,
 * produced at configuration time.
 *
 * \note For Solaris users: if you use CC rather than gcc,
 * you will probably have to run configure
 * with environment variable CXX defined as the path to your
 * CC compiler, and you might also need
 * \code configure --enable-dependency-tracking \endcode.
 *
 * \section SHOREMKCHECK Checking the Release
 *
 * After building the storage manager, you can check it by running
 * \code
   make check
   \endcode
 * in the root of the directory tree.
 * This runs unit tests for each library.
 *
 * \attention
 * The storage manager test shell, smsh, is run by
 * \code make check \endcode. Smsh uses Tcl. The path to your
 * Tcl installation is given in \e Makefile.local at the
 * top of the directory tree.
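 *
 * Taken together, the bootstrap, configure, build, and check steps
 * described so far might be scripted as in the following sketch. The
 * helper names run and ssm_build_and_check and the DRY_RUN variable are
 * invented for this example; they are not part of the distribution.
 * By default the script only prints the commands, so that it can be
 * inspected before anything is run.
 *
```shell
#!/bin/sh
# Sketch of the bootstrap / configure / build / check sequence.
# DRY_RUN defaults to 1, so the commands are only printed; set DRY_RUN=0
# to run them from the root of the distribution directory tree.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Any arguments are passed through to ./configure unchanged.
# The ./bootstrap step may be unnecessary; drop it if so.
ssm_build_and_check() {
    run ./bootstrap &&
    run ./configure "$@" &&
    run make &&
    run make check
}

ssm_build_and_check --with-debug-level1
```
 *
 * Chaining the steps with && stops the sequence at the first failure,
 * so a failed configure does not go on to run make.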
 *
 * \note
 * If you do not have Tcl installed and want to test the installation
 * without smsh, you may run
 * \e make \e check and ignore the fact that
 * it chokes trying to build smsh, because
 * smsh is the last test that \e make \e check runs.
 *
 * \section SHOREMKINSTALL Installing the Release
 *
 * You may run
 * \code
   make install
   \endcode
 * This installs:
 * - the header files in
 *   - <includedir> [default: <prefix>/include]
 * - the libraries in
 *   - <libdir> [default: <exec-prefix>/lib]
 *
 * To change the prefixes, use one or more of these configure options:
 * \code configure --prefix=<path> \endcode
 * or
 * \code configure --libdir=<path> \endcode
 * or
 * \code configure --includedir=<path> \endcode
 *
 */

/**\page HUGETLBFS HugeTLBfs
 *
 * See \ref REFHUGEPAGE1 for assorted on-line documentation about
 * using large pages to avoid excessive load on the TLB.
 *
 * Here we do not claim to be complete for all target architectures.
 * This is meant to serve as an example for Linux targets.
 * The following steps are what we did on one RHEL5 system.
 *
 * NOTE: If you have kernel documentation installed, see:
 * /usr/share/doc/kernel-doc-<version>/Documentation/vm/hugetlbpage.txt
 *
 * First steps (most of this must be done by the super-user):
 * - Determine that our kernel supports hugetlbfs.
 *   - grep Hugepagesize /proc/meminfo
 * - Ensure that we have adequate huge pages (this example is
 *   for an 8GB buffer pool) on reboot:
 *   - echo "vm.nr_hugepages=4096" >> /etc/sysctl.conf
 *   To dynamically allocate pages,
 *   - echo 4096 > /proc/sys/vm/nr_hugepages
 * - Create a group for users of the hugetlbfs using your sys admin
 *   applications.  My group is called "ssm" and has gid 55555.
 * - Add users to the ssm group.
 * - Create the mount point for a hugetlbfs:
 *   - mkdir -p /mnt/huge
 * - Mount a pseudo filesystem of type hugetlbfs at that mount point, and
 *   have it mounted on every reboot by appending an entry to /etc/fstab:
 *   - echo "none /mnt/huge hugetlbfs rw,gid=55555,size=10g,mode=0770 0 0" >> /etc/fstab
 *
 * - Reboot.
 *
 *\warning
 * <b> If you have configured the storage manager for use with hugetlbfs
 * and have not taken the above steps to ensure that your system actually
 * has the hugetlbfs pages available for use, your storage manager (or
 * the 'make check' tests) will likely fail.  Unfortunately, we have
 * not yet figured out a way to determine ahead of time, programmatically,
 * whether things will go well or not, before we try to write to the
 * huge pages. </b>
 *
 *
 * Second steps (this can be done by users in the ssm group):
 * - Edit the default path for the hugetlbfs node in config/shore.def.
 * - Create a file in the hugetlbfs, owned by the ssm group, whose
 *   name matches the default path in config/shore.def, e.g.,
 *   - touch /mnt/huge/SSM-BUFPOOL
 * - Configure and build the storage manager with --with-hugetlbfs.
 * - Running with the hugetlbfs is now the default; to run without it,
 *   change the sm_hugetlbfs_path run-time option to the value "NULL";
 *   here is an example from the .shoreconfig file in the smsh directory:
 *   - *.server.*.sm_hugetlbfs_path: NULL
 *
 * Note that your buffer pool size will have to be set to a multiple
 * of the huge page size for your system.  Thus, if your huge pages are 2 MB
 * you will get an error from mmap if you use a 3 MB buffer pool.
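 *
 * The rounding this implies can be sketched in shell. The function names
 * round_up_to_hugepage and hugepages_needed are invented for this example
 * and are not part of the storage manager; all sizes are in KB, matching
 * the units of the Hugepagesize line in /proc/meminfo.
 *
```shell
#!/bin/sh
# Round a requested buffer pool size up to the next multiple of the
# huge page size. Both arguments are in KB.
round_up_to_hugepage() {
    _want_kb=$1
    _page_kb=$2
    echo $(( (_want_kb + _page_kb - 1) / _page_kb * _page_kb ))
}

# Number of huge pages needed to back a pool of the given size (KB),
# after rounding the size up to a whole number of huge pages.
hugepages_needed() {
    _pool_kb=$(round_up_to_hugepage "$1" "$2")
    echo $(( _pool_kb / $2 ))
}

# Huge page size in KB as reported by the kernel; empty if the file or
# the Hugepagesize line is missing.
huge_kb=$(awk '/^Hugepagesize:/ { print $2 }' /proc/meminfo 2>/dev/null || true)

if [ -n "$huge_kb" ]; then
    echo "3 MB request rounds up to $(round_up_to_hugepage 3072 "$huge_kb") KB"
    echo "8 GB pool needs $(hugepages_needed 8388608 "$huge_kb") huge pages"
fi
```
 *
 * With 2 MB huge pages, an 8 GB pool works out to 4096 pages, which is
 * the vm.nr_hugepages value used in the first steps.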
 *
 * On the whole, the use of mmap with the hugetlbfs is not reliable.
 * Although all huge pages are supposed to be returned to the system
 * at process end, we have seen cases in which pages were "lost" and
 * mmap failed from then on, a condition repaired only by a reboot.
 * Consider this feature for performance experiments only, on systems
 * that do not require high availability.
 *
 */