Data Mining Your Website
by Jesus Mena
Overview
Turn Web data into knowledge about your customers.
This exciting book will help companies create, capture, enhance, and analyze one of their most valuable new sources of marketing information: usage and transactional data from a website. A company's website is a primary point of contact with its customers and a medium in which visitors' actions are messages about who they are and what they want.
Data Mining Your Website will teach you the tools, techniques, and technologies you'll need to profile current and potential customers and predict on-line interests and behavior. You'll learn how to extract insights into on-line buying patterns from the huge pools of information your website generates, and how to apply this knowledge to design a website that better attracts, engages, and retains on-line customers.
Data Mining Your Website explains how data mining is a foundation for the new field of web-based, interactive retailing, marketing, and advertising. This innovative book will help web developers and marketers, webmasters, and data management professionals harness powerful new tools and processes.
The first book to apply data mining specifically to e-commerce
Learn effective methods for gathering, managing, and mining Web customer information
Use data mining to profile customers and create personalized e-commerce programs
Product Details
- ISBN-13: 9781555582227
- Publisher: Elsevier Science
- Publication date: 07/15/1999
- Pages: 384
- Product dimensions: 7.05(w) x 9.19(h) x 0.78(d)
Read an Excerpt
Chapter 1: What Is Data Mining?
...New Data Mining Applications
In a web-centric networked environment, data mining is able to deliver new decision-support applications in two areas. First, data mining can be used to perform analysis on Internet-generated data from a website, which is what this book is about. And second, the technology can be used to monitor and detect the signature of network anomalies and potential problems before they happen.
One way of defining what data mining is about is to contrast it with more traditional methods of data analysis, such as statistics, Online Analytical Processing (OLAP), visualization, and web traffic analysis tools. We will now examine these various methods of data analysis and show that they are not exclusive of each other but are instead complementary, each contributing to your insight into your website visitors and customers and to the effectiveness of your online presence.
Statistics
For centuries, man has been looking at the world and gathering data in an attempt to explain natural phenomena. People have been manually analyzing data in their search for explanations and patterns. The science of statistics originated as a way of making sense out of these observations. Traditional statistics have developed over time to help scientists, engineers, psychologists, and business analysts make sense of the data they generate and collect. Descriptive statistics, for example, provide general information about these observations, such as the average and median values, the observed errors, and the distribution of values. Another form of statistics is regression analysis, which is a technique used to interpolate and extrapolate these observations in an effort to make some intelligent predictions about the future.
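To make the contrast concrete, here is a minimal sketch of both ideas using Python's standard statistics module; the monthly sales figures are invented for illustration, and linear_regression requires Python 3.10 or later.

```python
import statistics

# Hypothetical monthly sales observations, used only for illustration.
monthly_sales = [120.0, 135.0, 150.0, 148.0, 170.0, 185.0]

# Descriptive statistics: central tendency and spread of the observations.
print("mean:  ", statistics.mean(monthly_sales))
print("median:", statistics.median(monthly_sales))
print("stdev: ", statistics.stdev(monthly_sales))

# Regression analysis: fit a least-squares line to the observations and
# extrapolate it to make a prediction about the next month.
months = list(range(1, len(monthly_sales) + 1))
slope, intercept = statistics.linear_regression(months, monthly_sales)
print("forecast for month 7:", intercept + slope * 7)
```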
Insurance and financial services companies were some of the first to try to predict customer behavior and to do risk analysis. Banks and insurers have typically used regression models to rank their customers' overall value or risk, all of which are attempts to fit a line to an observed phenomenon. SAS has built an empire by providing a host of enterprise-wide statistical applications to the financial services market sector. Fair Isaac, a modeling company, provides a wide assortment of statistics-based services, including behavior scoring and its FICO credit scoring, for financial services firms and other companies that do not do in-house analysis. Claritas provides demographic data designed to help financial services firms estimate customers' sales potential and segment markets by age, income, and so on; it sells its P$YCLE demographic data to financial companies, which allows them to differentiate households in terms of expected financial behavior.
This kind of statistical data analysis involves analysts who formulate a theory about a possible relation in a database and then convert that hypothesis into a query; it is a manual, user-driven, top-down approach to data analysis. In statistics, you usually begin the process of data analysis with a hypothesis about the relation in the data, which is the reverse of how data mining works (a query sketch illustrating this appears after the tool list below). Some of the most popular statistical tools include the following:
Data Desk uses animation to help you see patterns you might miss in a static display. For example, you can easily link a sliding control to any part of an equation and see the effects of sliding the value as the display updates. Data Desk automatically makes sliders to help you find optimal transformations of variables, to learn about the sensitivity of analyses to small shifts in variables, and to assess the sensitivity of nonlinear regressions. You can easily build your own animations.
MATLAB is a powerful integrated technical computing environment that combines numeric computation, advanced graphics and visualization, and a high-level programming language.
SAS is a modular, integrated, hardware-independent statistical and visualization system of software for enterprise-wide information delivery. SAS recently recognized the benefits of data mining and is now offering Enterprise Miner, a data mining add-on module to its base system.
S-Plus is the commercial version of "S," an interactive, object-oriented programming language for data analysis developed by AT&T Bell Labs. It is supported and marketed by MathSoft. S-Plus has proved to be a useful platform for general-purpose data analysis, including clustering, classification, summarization, visualization, regression, and CART.
SPSS is a powerful, easy-to-use, easy-to-learn statistical package for business or research. It features most of the standard statistics, along with high-resolution graphics with reporting and distributing capabilities. SPSS also recognized the importance of data mining when it purchased Clementine, a data mining tool from ISL (Integrated Solutions Limited).
STATlab is exploratory data analysis software for drilling down into data and performing a multitude of analyses. STATlab can import data from common formats such as relational databases, ASCII files, popular spreadsheets, and most of the popular statistical data systems.
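As promised above, here is a minimal sketch of the hypothesis-driven, top-down approach: the analyst formulates a theory (say, "West Coast visitors spend more per order") and converts it by hand into a query. The sketch uses Python's built-in sqlite3 module, and the orders table and its columns are hypothetical.

```python
import sqlite3

# Build a tiny in-memory table of hypothetical orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("West", 42.50), ("West", 61.00), ("East", 38.25), ("East", 29.99),
])

# The hypothesis becomes a specific aggregate query written by the analyst,
# not a pattern discovered automatically by an algorithm.
for region, avg_amount in conn.execute(
        "SELECT region, AVG(amount) FROM orders GROUP BY region"):
    print(region, round(avg_amount, 2))
```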
Data Mining vs. Statistics
The defining distinction between statistics and data mining is the direction of the query: in data mining, the interrogation of the data is done by the machine-learning algorithm or neural network, rather than by the statistician or business analyst. In other words, data mining is data-driven, rather than user-driven or verification-driven, as most statistical analyses are. Manual statistical factorial and multivariate analyses of variance may be performed with tools such as SPSS or SAS to identify the factors influencing the outcome of product sales. Pearson's correlation may be generated for every field in a database to measure the strength and direction of each field's relationship to some dependent variable, such as total sales.
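As a minimal modern sketch of that kind of field-by-field analysis, in Python rather than SAS or SPSS, with hypothetical column names standing in for fields in a customer database:

```python
import pandas as pd

# Hypothetical customer fields; total_sales is the dependent variable.
customers = pd.DataFrame({
    "age":             [34, 45, 23, 52, 41, 29],
    "visits_per_week": [3, 1, 5, 2, 4, 6],
    "pages_viewed":    [22, 8, 40, 12, 30, 55],
    "total_sales":     [180.0, 95.0, 260.0, 120.0, 210.0, 330.0],
})

# Pearson's correlation of every field with total_sales; the sign gives the
# direction of the relationship and the magnitude its strength.
correlations = customers.corr(method="pearson")["total_sales"].drop("total_sales")
print(correlations.sort_values(ascending=False))
```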
A skilled SAS statistician conversant with that system's PROC syntax can perform that type of analysis rather quickly. However, one of the problems with this approach, aside from the fact that it is very resource-intensive, is that the techniques tend to focus on tasks in which all the attributes have continuous or ordinal values. Many of them are also parametric; for instance, a linear classifier assumes that class can be expressed as a linear combination of the attribute values. Statistical methodology assumes a bell-shaped normal distribution of data, which in the real world of business and Internet databases simply does not exist and is too costly to accommodate. These statistical tool vendors are well aware of such shortcomings, however, and both SPSS and SAS are now making data mining modules and add-ons available for their main products.
Data mining also has major advantages over statistics as databases increase in size, simply because manual approaches to data analysis become impractical. For example, suppose there are 100 attributes in a database to choose from, and you don't know which are significant. Even with this small problem there are 100 x 99 = 9,900 combinations of attributes to consider. If there are three classes, such as high, medium, and low, there are now 100 x 99 x 98 = 970,200 possible combinations. If there are 800 attributes, such as in our large website bookseller customer database ... well, you get the picture. Consider analyzing millions of transactions on a daily basis, as is the case with a large electronic retailing site, and it quickly becomes apparent that the manual approach to pattern recognition simply does not scale to the task. Data mining, rather than hindering the traditional statistical approach to data analysis and knowledge discovery, extends it by allowing the automated examination of large numbers of hypotheses and the segmentation of very large databases.
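A quick back-of-the-envelope sketch of that combinatorial growth; the counts below are ordered selections of attributes, which is how the arithmetic above works out:

```python
from math import perm

# Ordered selections of 2 and 3 attributes out of 100: 9,900 and 970,200.
print(perm(100, 2), perm(100, 3))

# With 800 attributes, as in the large bookseller example, the counts explode.
print(perm(800, 2), perm(800, 3))
```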
Online Analytical Processing
OLAP tools are descendants of query generation packages, which are in turn descendants of mainframe batch report programs. They, like their ancestors, are designed to answer top-down queries from the data or draw "what if" scenarios for business analysts. Recently, OLAP tools have grown very popular as the primary methods of accessing company databases, data marts, and data warehouses.
OLAP tools were designed to get data analysts out of the custom report-writing business and into the "cube construction" business. The OLAP data structure is similar to a Rubik's Cube of data that an analyst can twist and twirl in different ways to work through multiple reports and "what-would-happen" scenarios. OLAP tools primarily provide multidimensional data analysis; that is, they allow data to be broken down and summarized by product line and marketing region. The basic difference between OLAP and data mining is that OLAP is about aggregates, while data mining is about ratios. OLAP is addition, while data mining is division.
OLAP deals with facts or dimensions typically containing transactional data relating to a firm's products, locations, and times. Each dimension also can contain some hierarchy. For example, the time dimension may drill down from year, to quarter, to month, and even to weeks and days. A geographical dimension may drill up from city, to state, to region, to country, and even to hemisphere, if necessary. A report might show Widget A, by Western Region, by Month of November, which can be further "drilled" to Blue Widget A, by San Jose, by November 10, and so on. The data in these dimensions, called measures, is generally aggregated (for example, total or average sales in dollars or units, or budget dollars or sales-forecast numbers).
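As a minimal sketch of that kind of aggregation and drill-down, using pandas with hypothetical product, region, city, month, day, units, and dollars columns standing in for the dimensions and measures described above:

```python
import pandas as pd

# A tiny, invented fact table of sales transactions.
sales = pd.DataFrame({
    "product": ["Widget A", "Widget A", "Blue Widget A", "Widget B"],
    "region":  ["Western", "Western", "Western", "Eastern"],
    "city":    ["San Jose", "Portland", "San Jose", "Boston"],
    "month":   ["1999-11", "1999-11", "1999-11", "1999-11"],
    "day":     ["1999-11-03", "1999-11-07", "1999-11-10", "1999-11-12"],
    "units":   [120, 80, 35, 60],
    "dollars": [2400.0, 1600.0, 875.0, 1500.0],
})

# Aggregate the measures: Widget A, by Western Region, by Month of November.
monthly = (sales[sales["month"] == "1999-11"]
           .groupby(["product", "region"])[["units", "dollars"]]
           .sum())
print(monthly)

# Drill down the geography and time hierarchies: Blue Widget A, by city, by day.
drill = (sales[sales["product"] == "Blue Widget A"]
         .groupby(["city", "day"])[["units", "dollars"]]
         .sum())
print(drill)
```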
Many organizations have been in existence for some time and have accumulated considerable quantities of data that could be useful for business planning. Historical trends and future projections could be used to analyze business alternatives and make more informed decisions that could gain or maintain a competitive advantage. During the last decade, data warehouses have become common in large corporations, many of which use OLAP tools for reports and decision support. These OLAP applications span a variety of organizational functions. Finance departments use OLAP for applications such as budgeting, activity-based costing (allocations), financial performance analysis, and financial modeling. Sales analysis and forecasting are two of the OLAP applications found in sales departments. Among other applications, marketing departments use OLAP for market research analysis, sales forecasting, promotion analysis, customer analysis, and market/customer segmentation. Typical manufacturing OLAP applications include production planning and defect analysis...