\section{Introduction}
The LASSO-Patternsearch algorithm \cite{shi:wahba:wright:2008} (LPS) is proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. As a global method, it considers all possible patterns up to certain order. Therefore, it can handle very complicated correlation structures among the predictor variables, which can cause serious problems for the sequential methods. The novel computational algorithm makes LPS a very efficient method. 

However, every method has a limitation due to the limited computer memory. For LPS, the limitation is more stringent because all possible patterns are being considered. For example, given that the maximum number of unknowns we can solve is 2 million, only 1999 variables can be included if we consider second order patterns. The current trends in genetic epidemiology are to evaluate as many markers as possible. The number can easily exceed ten thousand or more \cite{valdar:solberg:2006}. For these kinds of data sets, we will have to add a screen step before applying LPS, just like in \cite{shi:wahba:lee:2007}. In this article, the Grouped LASSO-Patternsearch Algorithm (GLPS) is proposed to handle huge data sets. Like LPS, we assume that all predictor variables are binary, or have been dichotomized before the analysis. Although we consider myriad parameters, the solution is believed to be sparse. There are two steps in the Grouped LASSO-Patternsearch Algorithm, the group step and the post-group step. GLPS, as suggested by its name, begins with dividing the covariates into small groups. The number of variables in each group is the same. It is determined by the total number of unknowns we can solve in each run of LPS. Suppose we only want to consider main effects and size two patterns. For each group, we run LPS with $q = 2$ on all variables in this group; for each pair of groups, we run LPS with $q = 2$ on all variables in both groups, with the restriction that the size two patterns must have one variable from each group. We call this step of the algorithm the group step. All possible size two patterns have been considered in this step. We collect all the variables that show up in at least one of these runs, either as main effects or in second order patterns. In the post-group step we run LPS again on all variables surviving the group step, and the resulting model is our final model. For LPS in the group step, the tuning parameter is chosen by BGACV. For LPS in the post-group step, we use two tuning parameters, one for main effects and one for interactions. They are chosen by BGACV2, which is a variation of BGACV. Properties of GLPS will be examined via simulations and in genetic data by evaluating its predictive power. 

The rest of the article is organized as follows. In Section \ref{sec.glps} we describe the details of the GLPS algorithm. Section \ref{sec.glpssim} presents three simulation examples, designed to demonstrate the properties of GLPS as well as comparing GLPS with Logic regression, SPLR and Random forest. Section \ref{sec.genebar} applies the method to gene expression data dichotomized by the bar code method. And finally in Section \ref{sec.glpsdisc} we present a discussion. 

\section{Grouped LASSO-Patternsearch Algorithm}
\label{sec.glps}
Considering $n$ subjects with $p$ binary predictor variables, we first divide the $p$ variables into $k$ groups so that each group has $g$ variables. For ease of presentation we assume that $p = kg$. If $(k-1)g<p<kg$, we can easily add $kg-p$ dummy variables that are not related to the response (like zeros) to the $k$th group. The data is $\{y,x_j, j = 1,\cdots,p\}$, where $y = (y_1,y_2,\cdots,y_n) \in \{0,1\}$ is the response, $x_j = (x_j(1),x_j(2),\cdots,x_j(n))$ is the $j$th covariate, $x_j(i)\in\{0,1\}$. We reorganize the $p$ predictor variables and denote them by $x_{st}$, $s = 1,\cdots,k$ indexes groups and $t = 1,\cdots,g$ indexes variables within each group. Let $q$ be the highest order of patterns we consider. Then there will be $N_B = \sum_{\nu=0}^q{{p}\choose{\nu}}$ patterns if we apply LPS. We use $q=2$ to illustrate the GLPS algorithm. The case with larger $q$ can easily be extended. 

Let us look at the between group patterns between the $s_1$th group and the $s_2$th group, where $1\leq s_1 < s_2 \leq k$. The basis functions we consider are $\{B_{t_1t_2} = x_{s_1t_1}\times x_{s_2t_2}, t_1,t_2 = 1,\cdots,g\}$, $\{B_{t_1} = x_{s_1t_1}, t_1 = 1,\cdots,g\}$ and $\{B_{t_2} = x_{s_2t_2}, t_2 = 1,\cdots,g\}$. In words, all main effects and all size two patterns that consist of one variable from each group are included. There will be a total of $N_B = 2g + g^2 + 1$ patterns, including the constant. 

Letting $p(x) = \mbox{Prob}[y= 1|x]$ and the logit (log odds ratio) be $f(x) = \log [p(x)/(1-p(x))]$, we estimate $f$ by minimizing 
\begin{equation}\label{penlik}
I_{\lambda}(y,f) =  \cL (y,f) +\lambda J(f)
\end{equation}
where $\cL (y,f)$ is $\frac{1}{n}$ times the negative log likelihood:
\begin{equation}\label{defL}
\cL(y,f) =\frac{1}{n}  \sum_{i=1}^n [ -y_if(x(i)) + \log (1+e^{f(x(i))})]
\end{equation}
with 
\begin{equation}\label{f.def}
f(x) = \mu +  \sum_{\ell=1}^{N_B-1} c_{\ell}B_{\ell}(x),
\end{equation}
where we are relabeling the $N_B-1$ (non-constant) patterns from $1$ to $N_B-1$, and 
\begin{equation}\label{J}
J(f) =  \sum_{\ell = 1}^{N_B-1} |c_{\ell}|.
\end{equation}

We first solve (\ref{penlik}) with the between group basis functions defined above. The smoothing parameter $\lambda$ is chosen by BGACV. We then build a logistic regression model on the remaining basis functions and select models by the backward elimination method with the selection criterion being BGACV. We call this step the between-group step because of the way we choose variables for the size two patterns. We loop through all possible pairs of groups here and record all the surviving variables.

Now we move to the size two patterns within each group. We call this step the within-group step. Between-group step and within-group step are named group steps. For the $s$th group, the basis functions we consider are $\{B_{t_1t_2} = x_{st_1}\times x_{st_2}, t_1,t_2 = 1,\cdots,g\}$ and $\{B_{t} = x_{st}, t = 1,\cdots,g\}$. There will be a total of $\frac{g(g+1)}{2} + 1$ patterns, including the constant. Note that this number is about half of that of the between-group step, so in practice we combine two neighboring groups, resulting in $N_B = g^2 + g + 1$ patterns. Similar to the above step, we use LASSO followed by logistic regression to do variable selection and record all the surviving variables. Figure \ref{plotgroup} is a graphical presentation of the group steps.

\begin{figure}
  \begin{center}
    \includegraphics[scale= 0.8]{group.ps}\\
  \end{center}
  \caption{The plot of the group step of GLPS. We use 5 groups as an example. The length of the squares is the number of variables in every group. Each dot represents a size two pattern. The x axis indexes which group the first variable is from and the y axis indexes which group the second variable is from. The red squares are all runs in the between-group step and the green triangles are all runs in the within-group step.}
  \label{plotgroup}
\end{figure}

We take all variables that survive either the between-group step or the within-group step. For a surviving size two pattern, both variables are taken. Some of the variables may appear more than once and we only need one replicate. Usually only a small number of variables, denoted by $p^*$, will survive and they can be handled by LPS. We call this step the post-group step. Unlike the usual LPS, we use different smoothing parameters for main effects and size two patterns here. Suppose there are $N_{B_1} ~(=p^*)$ main effects and $N_{B_2} ~(={{p^*}\choose{2}})$ size two patterns. Denote the main effects by $\{B_{1\ell}, \ell = 1,\cdots,N_{B_1}\}$ and the size two patterns by $\{B_{2\ell}, \ell = 1,\cdots,N_{B_2}\}$. The objective function becomes
\begin{equation}
\label{penlik2}
I_\lambda(y,f) = {\cal L}(y,f) + \lambda_1J_1(f) + \lambda_2J_2(f),
\end{equation}
where the link function $f$ is
\begin{equation}
\label{f.def2}
f(x) = \mu + \sum_{\ell=1}^{N_{B_1}}c_{1\ell} B_{1\ell}(x) + \sum_{\ell=1}^{N_{B_2}}c_{2\ell} B_{2\ell}(x),
\end{equation}
and the penalties are
\begin{equation}
\label{J2}
J_1(f) = \sum_{\ell = 1}^{N_{B_1}}|c_{1\ell}|,~~~ J_2(f) = \sum_{\ell = 1}^{N_{B_2}}|c_{2\ell}|.
\end{equation}

The choice of smoothing parameters, $(\lambda_1, \lambda_2)$, in (\ref{penlik2}) are critical. We were surprised to find out that BGACV does not work very well in these circumstances. In a problem where both main effects and size two patterns are important, often times only size two patterns appear and very few times only main effects appear. In the first scenario true main effects appear in size two patterns with some other variables; in the second scenario true size two patterns appear as main effects. We believe that this phenomena is caused by the huge difference in the numbers of coefficients penalized by the two smoothing parameters. $N_{B_2}$ is close to half of the square of $N_{B_1}$. The solution is very sensitive to small changes of smoothing parameters. Grid search method, which is commonly used in the search of smoothing parameters, can not find the exact optimal solutions. To fix this problem, we introduce the following penalty
\begin{equation}
\label{BGACV2}
BGACV2(\lambda_1,\lambda_2) = BGACV(\lambda_1,\lambda_2)\times (1+0.5\frac{|n_{b_1} - n_{b_2}|}{n_{b_1} + n_{b_2}}),
\end{equation}
where $n_{b_1}$ is the number of nonzero coefficients of main effects and $n_{b_2}$ is the number of nonzero coefficients of size two patterns. We add a penalty to force these two numbers to be close. Then the two scenarios described above can not happen. If the true model only has main effects the number of main effects showing up will be smaller than before due to the new penalty. However, BGACV is very conservative (see discussion in \cite{shi:wahba:wright:2008}). We won't miss any important patterns by using the new penalty. We are just making it less conservative. Some size two patterns might show up as well. This can be taken care of by the parametric logistic regression step followed by solving (\ref{penlik2}). The argument for a model with size two patterns only is similar. 

Some minor changes need to be made if we are interested in size three patterns ($q = 3$). Firstly, there will be four group level steps. Let $1\leq s_1 \leq s_2 \leq s_3 \leq k$ be the groups from which the three variables are chosen. We have the following four cases: $s_1<s_2<s_3$, $s_1=s_2 < s_3$, $s_1<s_2=s_3$ and $s_1=s_2=s_3$. In the post-group step, we will still be using two smoothing parameters, one for main effects and one for interactions, including size two patterns and size three patterns. The penalty added to BGACV is slightly different:
\begin{equation}
\label{BGACV3}
BGACV3(\lambda_1,\lambda_2) = BGACV(\lambda_1,\lambda_2)\times (1+0.5\frac{|n_{b_1} - n_a|+|n_{b_2} - n_a|+|n_{b_3} - n_a|}{n_{b_1} + n_{b_2} + n_{b_3}}),
\end{equation}
where $n_{b_1}$ is the number of nonzero main effects, $n_{b_2}$ is the number of nonzero size two patterns, $b_{b_3}$ is the number of nonzero size three patterns and $n_a$ is the average of the three. 

In practice we only implemented $q = 2$ and $q = 3$. Higher order patterns will not be addressed in this paper. However, they will not be hard to implement if necessary. From now on, we use GLPS to denote the case where $q = 2$. And we use GLPS3 to denote the case where $q = 3$. Sometimes, GLPS also means general GLPS with different $q$, depending on the setting. 

The choice of $g$, the number of variables in each group, depends on the power of the computer. On our super server (AMD Dual-Core 2.8 GHz with 64 GB memory), we usually set $g = 2,000$ for $q = 2$. This generates $N_B = 2,001,001$ basis functions, which can be handled by the super server comfortably. On a more standard computer (Intel(R) Pentium(R) 4 2.80GHz with 2 GB memory), we usually set $g = 200$ for $q = 2$ and $g = 35$ for $q = 3$. One big advantage of our algorithm is that the runs in group steps are parallel. We can run them simultaneously on different machines. The computing system Condor ({\tt http://www.cs.wisc.edu/condor/}) provides us an excellent opportunity to do this job. We request all jobs to go to machines with at least 2 GB of memory so that we can use $g = 200$ for $q = 2$ and $g = 35$ for $q = 3$. Different $g$ values usually give very similar results, if not the same. When better computers are available bigger $g$ is preferred because it is usually faster. 

\section{Simulation Studies}
\label{sec.glpssim}
In this section we study the empirical performance of GLPS through three simulated examples. In the first example the data set is relatively small and all variables are independent. Three patterns are included, one main effect and two second order interactions. The second example has a very large data set and strong correlations among neighboring variables. Two main effects and two second order interactions are included. The last example studies the performance of GLPS3. Two main effects, one second order interaction and one third order interaction are included. We compare GLPS with three other methods, Logic regression \cite{ruczinski:kooperberg:leblanc:2003}, Stepwise Penalized Logistic Regression (SPLR) \cite{park:hastie:2008} and Random forest (RF) \cite{breiman:2001}. We use the R package {\tt LogicReg} to run Logic regression, the R package {\tt stepPlr} to run SPLR, and the R package {\tt randomForest} to run random forest. The number of trees and number of leaves in Logic regression are selected by 5-fold cross validation. The smoothing parameter in SPLR is also selected by 5-fold cross validation, and then the model size is selected by BIC.

\subsection{Simulation Example \ref{sec.glps.sim1}}
\label{sec.glps.sim1}
400 iid Bernoulli(0.5) random variables are generated. The sample size $n = 700$. The true logit function is
\[f(x) = -2+1.5X_{50} + 1.5X_{150}X_{250} + 1.5X_{251}X_{252}.\]
We simulated 100 data sets according to this model and ran all four methods on these data sets. 
\begin{table}[htbp]
\caption{The result of Simulation \ref{sec.glps.sim1}. For each pattern, the number outside the parenthesis is the appearance frequency of the pattern; the two numbers inside the parenthesis are the appearance frequencies of the two variables in the pattern (they can appear as main effects, or in different patterns).}
\label{simglps1}
\begin{center}
\begin{tabular}{c|cccc}
  \hline
  Methods&$X_{50}$&$X_{150}X_{250}$&$X_{251}X_{252}$&noise\\\hline
  GLPS &94 (100)&99 (99,99)&96 (97,97)&153\\
  Logic&100 (100)&70 (88,91) &65 (84,90)&190\\
  RF&NA (100)&NA (96,97)&NA (94,96)&517\\
  SPLR &100 (100)&97 (100,97)&91 (100,98)&712\\\hline
\end{tabular}
\end{center}
\end{table}

The results of this simulation are shown in Table \ref{simglps1}. We use two numbers to show the appearance frequency of a main effect. The first number (outside the parenthesis) counts how many times the variable appears as a main effect and we call this number the pattern count; the second number (inside the parenthesis) counts how many times the variables appears in the model, either as a main effect or in some interactions and we call this number the variable count. Similarly, we need three numbers to show the appearance frequency of a size two pattern, one pattern count and two variable counts. The pattern count counts the number of times it appears as the exact pattern and the two variable counts count the number of times each variable appears in the model. Random forest doesn't generate a specific model. It only provides the importance scores of all variables. For each run, we took the top 10 variables as the selected variables. Therefore, we are not able to get the pattern count for random forest. They are denoted by 'NA' in the table. 

GLPS selected all three patterns almost perfectly and the least number of noise patterns. Logic regression didn't do very well in the size two patterns and it selected slightly more noise. Random forest did well in selecting the important variables but it also selected lots of noise. Remember that we took top 10 variables for random forest. The number of noise variables will drop if we take less, but the appearance frequencies of the important variables will also drop. SPLR performed similar to GLPS in selecting the patterns but it selected much more noise. 

\subsection{Simulation Example \ref{sec.sim.glps2}}
\label{sec.sim.glps2}
This simulation studies the behavior of GLPS on a very large data set with correlations among the covarriates. The sample size $n = 1000$ and the number of variables $p = 8000$. $X_i^*\sim N(0,1)$, $i = 1,\cdots,p$. corr$(X_i^*,X_{i+1}^*) = 2/3$, corr$(X_i^*,X_{i+2}^*) = 1/3$, $i = 1,2,\cdots,p-2$. $X_i^*$ and $X_j^*$ are independent if $|i-j|>2$. $X_i$ = 1 if $X_i^*>0$ and 0 otherwise, $i = 1,\cdots,p$. The true logit function is
\[f(x) = -4 + 2X_{500} + 3X_{5000} + 2X_{1000}X_{3000} + 3X_{7000}X_{7002}.\]
The simulation was run 50 times. Logic regression has an upper limit on the number of predictor variables, 1000, so it can not be applied in this example. For random forest, we selected the top 12 variables. The results are shown in Table \ref{simglps2}. GLPS did almost perfectly in selecting the important patterns. The number of noise patterns being selected was the smallest. RF did quite bad on the pattern $X_{1000}X_{3000}$. SPLR did perfectly on all four patterns but the number of noise patterns was huge. There is a parameter in SPLR that needs to be specified by the user, the maximum number of terms. We set it to be 20 here. Actually we reached the maximum in all 50 runs because $(50\times4+800)/50 = 20$. We would reach the maximum even if the maximum number of terms is set to be 50. 
\begin{table}[htbp]
\caption{The result of Simulation Example \ref{sec.sim.glps2}. Logic regression not applied.}
\label{simglps2}
\begin{center}
\begin{tabular}{c|ccccc}
  \hline
  Methods&$X_{500}$&$X_{5000}$&$X_{1000}X_{3000}$&$X_{7000}X_{7002}$&noise\\\hline
  GLPS &50 (50)&50 (50)&48 (48,50)&50 (50,50)&278\\
  RF&NA (50)&NA (50)&NA (28,37) & NA (50,50)&335\\
  SPLR &50 (50)&50 (50)&50 (50,50)&50 (50,50)&800\\\hline
\end{tabular}
\end{center}
\end{table}

\subsection{Simulation Example \ref{sec.sim.glps3}}
\label{sec.sim.glps3}
This simulation studies the behavior of GLPS3 on a large data set. The sample size $n = 1000$ and the number of variables $p = 500$. The correlation structure of these variables is the same as that of the previous example. In other words, two variables are highly correlated if they are next to each other; they are moderately correlated if they are separated by one variable; they are independent if they are separated by at least two variables. The marginal distribution of all 500 variables is Bernoulli(0.5). The true logit function is
\[f(x) = -4 + 2X_{100} + 3X_{200} + 2X_{300}X_{400} + 3X_{150}X_{450}X_{451}.\]
The simulation was run 50 times and the results are shown in Table \ref{simglps3}. GLPS3 did quite well here, although it selected slightly more noise patterns than Logic regression. Logic regression didn't do very well on the two interaction terms. Random forest did well on the size three pattern but it missed the size two pattern quite a few times. Similar to previous examples, SPLR did well in selecting important patterns, but it also selected lots of noise patterns. 

\begin{table}[htbp]
\caption{The result of Simulation Example \ref{sec.sim.glps3}.}
\label{simglps3}
\begin{center}
\begin{tabular}{c|ccccc}
  \hline
  Methods&$X_{100}$&$X_{200}$&$X_{300}X_{400}$&$X_{150}X_{450}X_{451}$&noise\\\hline
  GLPS3 &47 (50) &   50 (50)  &  47 (50,50) &  47 (50,49,48)& 204\\
  Logic & 50 (50)  &  50 (50)  &  34 (43,44)  &  30 (50,44,41)&151\\
  RF&NA (50)&NA (50)&NA (36,40) & NA (49,47,49)&279\\
  SPLR &50 (50)& 50 (50)& 45 (49,50) & 50 (50,50,50)&554\\\hline
\end{tabular}
\end{center}
\end{table}

We summarize the performance of all four methods here. Logic regression can't handle very large data sets and it doesn't do well on the interaction terms. Random forest doesn't give a specific model. It does well most of the time but it can be bad if the signal is not very strong (small coefficients). SPLR is good at picking the right patterns but it also picks lots of noise. GLPS can get the right patterns almost all times and keep the noise within a reasonable amount. 

\section{The Gene Expression Barcode Data}
\label{sec.genebar}
With current microarray technology we are able to measure thousands of RNA transcripts at one time. This allows for the characterization of cells and tissues in greater depth. However, feature characteristics such as probe sequence can cause the observed intensity to be far away from the actual expression. Even though the 'probe effect' is large, it is very consistent across different hybridizations, meaning that the probe effect is very similar when comparing the intensities of different hybridizations for the same gene. Therefore, the majority of microarray data analysis uses relative expression rather than absolute expression. To overcome this, a gene expression bar code (GEBC) \cite{zilliox:irizarry:2007} was proposed recently. They wanted to know what intensity relates to no expression for a given gene and microarray platform. GEBC starts with preprocessing all genes by Robust Multi-array Analysis (RMA) \cite{irizarry:hobbs:2003}. Then for each gene an empirical density smoother was used to estimate the density function of this gene across tissues. The smallest mode of the density function was considered as the expected intensity of an unexpressed gene. Gene expressions to the left of the mode were used to estimate the standard deviation of unexpressed genes. Lastly, they selected a constant $K$ and considered genes to be expressed in tissues when the log expression estimates were $K$ standard deviations larger than the unexpressed mean. $K$ was chosen to be 6 by cross-validation. The expressed genes were coded as 1 and unexpressed genes coded as 0. 

GEBC \cite{zilliox:irizarry:2007} downloaded raw data that is public available from 40 different studies. This resulted in a database of 1092 human samples representing 118 different tissues. Of these, 498 were normal tissues, 500 were breast tumors, and 94 were other diseases. There were a total of 22,215 genes for each sample. In this section we applied GLPS on these data sets after the genes being dichotomized by GEBC. Many of these genes have unbalanced expression levels, i.e., only a small percentage are expressed or unexpressed. These genes are removed from our analysis, with 7,654 genes remained. In our first analysis, we took all normal tissues as controls and all non-breast tumor samples as cases. In the second analysis we considered the survival time of breast cancer patients. We defined those died before 5 years as cases and those were alive after 10 years as controls. 

\subsection{Normal vs Cancer}
In this analysis, all normal tissues and cancer tissues other than breast tumors are used. Breast tumors are excluded because no normal breast tissue is available. There are 503 normal tissues and 70 cancer tissues with a malignancy rate of 12.2$\%$. 


We divided the data into a training set and a test set. The training set consists of 4/5 of the samples and the test set is the rest. We first applied GLPS on the training set with all 7,654 genes. The resulting model is shown in Table \ref{tabcancerglps1}. We picked four main effects and two size two interactions. We tested the performance of this model on the test set, which yields a prediction accuracy of 96.5$\%$ with a sensitivity of 80.0$\%$ and a specificity of 97.0$\%$. 

\begin{eqnarray}
f &=& - 8.81 + 1.51\times NOTCH3 + 2.89\times S100P + 1.22 \times HOXB7 \nonumber\\
  &~& + 1.67\times MFSD10 + 2.51\times VAMP8 \times TMEM47 \nonumber\\
  &~& + 3.17\times VEGFA \times AIM1. 
\end{eqnarray}

We want to compare our model with models fitted by Logic regression and SPLR. However, Logic regression can handle at most 1,000 variables. Therefore we used a screen step to select a subset of most important genes. Here 636 genes were selected and we denoted this the reduced cancer data. Applying GLPS on these genes gives the model in Table \ref{tabcancerglps2}. The prediction accuracy of this model is 98.2$\%$ with a sensitivity of 86.7$\%$ and a specificity of 100$\%$.

\begin{eqnarray}
f &=& - 8.15 + 3.58\times CALU \times ERBB3 + 1.93\times LAMC1\times CD24 \nonumber\\
  &~& + 3.29\times LPCAT1\times ACY1 + 3.75\times FXYD3\times GNL3 \nonumber\\
  &~& + 2.34\times NOTCH3\times CD24. 
\end{eqnarray}

The models in Tables \ref{tabcancerglps1} and \ref{tabcancerglps2} are quite different. They only share one gene in common. However, their predictive power is almost the same. The reason is that many of these genes are highly correlated. Choosing one important gene or another correlated gene doesn't make a big difference. Note that this is a special property of the $l_1$ norm. It tends to pick the most sparse model. An $l_2$ norm will pick all correlated genes that are predictive. 

We also applied GLPS3, Logic regression and SPLR on the reduced data set. The results are summarized in Table \ref{tabcancersum}. We present the number of genes in the model ($\#$ Gene), the number of parameters in the model ($\#$ Para), the highest order of interactions ($q$), the summation of the previous three (Total), prediction accuracy (Pred), sensitivity (Sens) and specificity (Spec) of the model. The first three measure the complexity of the model. We add them together to get an overall criterion. We see that GLPS dominates in almost all criteria. GLPS3 has the smallest number of genes and SPLR has equally best specificity. The detailed models by these methods are listed in Appendix \ref{gebcmore}.


\begin{table}[htbp]
\caption{Summary of Non-breast cancer - 5 fold cv no pred}
\label{tabcancersum2}
\begin{center}
\begin{tabular*}{0.85\textwidth}{@{\extracolsep{\fill}}c|ccccc}
  \hline
Methods&$\#$ Gene & $\#$ Para& $q$ & Total &AUC\\
\hline
GLPS &9.2&6.6&{\bf 2.0}&{\bf 17.8}&{\bf 0.982}\\
GLPS3&{\bf 8.4}&6.4&3.0&{\bf 17.8}&0.945\\
Logic&14.0&{\bf 5.2}&5.0&24.2&0.956\\
SPLR&17.2&20.6&5.6&43.4&0.962\\
\hline
\end{tabular*}
\end{center}
\end{table}


\subsection{Breast Tumors with Survival Time}
The survival time of breast cancer patients depends on many factors, such as grade, stage and oestrogen-receptor status. In this section we study the possible genetic effects on the survival time. we denote patients that died before 5 years as cases and patients that were alive after 10 years as controls. Patients with a censored death time less than 10 years and patients that died between 5 and 10 years are excluded. This gives us a total of 243 patients, among which 80 are cases. The incidence rate is 80/243 = 32.9$\%$. Similar to the previous section we used a screen step, which selected 592 genes and denote this set the reduced breast tumor data.

Table \ref{tabbreastglps} shows the model fitted by GLPS. There are one main effect and four size two interactions. The prediction accuracy of this model is 77.1$\%$ with a sensitivity of 53.3$\%$ and a specificity of 87.1$\%$. 

\begin{eqnarray}
f &=& 3.21 - 1.59\times PODXL - 2.00\times SYNE2\times AKAP11 + 2.05\times CD20\times CREB1\nonumber\\
  &~&  - 1.88\times STAT5A\times MAPT -1.89 \times MAOB\times IFFO1.
\end{eqnarray}

The models fitted by GLPS3, Logic regression and SPLR along with the one by GLPS are summarized in Table \ref{tabbreastsum}. The detailed models by these methods are also listed in Appendix \ref{gebcmore}. Among the seven criteria we present in Table \ref{tabbreastsum}, GLPS does the best in four and Logic regression in three. Logic regression picked a size eight interaction which doesn't make much sense. Note that the incidence rate is 32.9$\%$ so we will get an accuracy of 67.1$\%$ if we predict all to be controls. The performance of Logic regression and SPLR is not quite different from this. GLPS is the only one that stands out. We do not get as high prediction accuracy as in the cancer data set, which implies that there are other factors other than genes that are important to the survival time of breast cancer patients. One of them is grade. We have the grade information so we added it to the model along with all genes. However, it did not show up because it's highly correlated with some genes. 

\begin{table}[htbp]
\caption{Summary of breast cancer - 5 fold cv}
\label{tabbreastsum2}
\begin{center}
\begin{tabular*}{0.85\textwidth}{@{\extracolsep{\fill}}c|cccccccc}
  \hline
Methods&$\#$ Gene & $\#$ Para& $q$ & Total & AUC\\
\hline

GLPS&10.0&6.8&{\bf 2.0}&18.8&{\bf 0.824}\\
GLPS3&10.2&6.6&3.0&19.8&0.780\\
Logic&{\bf 4.4}&{\bf 2.6}&3.8&{\bf 10.8}&0.721\\
SPLR&19.4& 20.6&  5.0& 45.0&0.793\\
\hline
\end{tabular*}
\end{center}
\end{table}

\section{Discussion}
LASSO-Patternsearch Algorithm is a global method that handles binary data very well. However, it doesn't work when the number of parameters is huge. Traditional methods for small $n$ large $p$ problem deals with each variable separately. Grouped LASSO-Patternsearch Algorithm proposed in this chapter is in between a global method and a uni-variate method. It breaks the big problem into smaller ones that are much bigger than those in a uni-variate analysis. We show through simulations that GLPS is better than competing methods in selecting the right variables and controlling the amount of noise. We also show that GLPS gives smaller models with better prediction accuracy through two gene expression data sets. 

Unlike LPS, we use two tuning parameters in GLPS and GLPS3. The tuning criterion is slightly different too. We impose a penalty on the difference between the number of main effects and the number of interactions for GLPS and a penalty on the difference among the numbers of main effects, size two interactions and size three interactions for GLPS3. These penalties prevent the extreme cases where only main effects or interactions come up in the LASSO step. The extreme cases appear way too often with the original criterion. If an extreme case is the truth, the LASSO step will select more noise but the parametric step will still get the right model.
\label{sec.glpsdisc}