Open-source Molecular Simulation Library (MSL)

MSL is an open-source C++ library for molecular modeling, analysis, and design, developed by a small group of collaborators. The core of MSL comprises about 100,000 lines of C++ code in about 160 classes. These classes provide functionality of varying complexity, ranging from simple operations such as measuring distances and angles between atoms, through geometric transformations of molecules, to complex sidechain optimization algorithms and energy minimization routines. Two open-source libraries, the GNU Linear Programming Kit (GLPK) and the GNU Scientific Library (GSL), are used to implement sidechain optimization and energy minimization, respectively.
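To give a flavor of the simplest operations mentioned above, here is a minimal sketch of measuring a distance and an angle between atom positions in plain C++. The names `Vec3`, `atomDistance` and `angleDegrees` are hypothetical and chosen for illustration only; they are not MSL classes or functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an atom position; not an MSL class.
struct Vec3 {
    double x, y, z;
};

Vec3 subtract(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Euclidean distance between two positions.
double atomDistance(const Vec3& a, const Vec3& b) {
    const Vec3 d = subtract(a, b);
    return std::sqrt(dot(d, d));
}

// Angle a-b-c in degrees, with b at the vertex.
double angleDegrees(const Vec3& a, const Vec3& b, const Vec3& c) {
    const Vec3 u = subtract(a, b);
    const Vec3 v = subtract(c, b);
    const double norms = std::sqrt(dot(u, u) * dot(v, v));
    assert(norms > 0.0);  // the angle is undefined if two points coincide
    double cosine = dot(u, v) / norms;
    // Clamp to [-1, 1] to guard against floating-point round-off.
    if (cosine > 1.0) cosine = 1.0;
    if (cosine < -1.0) cosine = -1.0;
    const double pi = std::acos(-1.0);
    return std::acos(cosine) * 180.0 / pi;
}
```

Clamping the cosine before calling `std::acos` avoids a NaN result when round-off pushes the value slightly outside the valid range, which can happen for nearly collinear atoms.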

Most of my time is spent either enhancing MSL or writing protein modeling applications with it. MSL is available as a free download on SourceForge. You can find more details here.

Transmembrane dimers with Cα-mediated hydrogen bonding (CATM)

CATM is an MSL program that predicts how two helical protein segments associate to form a symmetric homodimer in a membrane environment. Fundamentally, it is a search over the space of dimer geometries (three rotational and one translational degrees of freedom) for structures with favorable energy scores. The search is made tractable by imposing geometric constraints derived from biochemical principles and sequence-specific analysis.
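The idea of a constrained search over four degrees of freedom can be sketched as a simple grid scan. This is only an illustration under stated assumptions: the names `DimerGeometry`, `linspace` and `gridSearch` are hypothetical, the energy function and the constraint are supplied by the caller, and an exhaustive grid is used here for simplicity. It is not CATM's actual search strategy or scoring function.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical description of one dimer geometry; not a CATM or MSL type.
struct DimerGeometry {
    double rotation[3];  // three rotational degrees of freedom
    double translation;  // one translational degree of freedom
};

struct SearchResult {
    DimerGeometry best;
    double energy;  // stays at +infinity if every geometry was rejected
};

// Evenly spaced samples from lo to hi, inclusive.
std::vector<double> linspace(double lo, double hi, int count) {
    assert(count >= 2);
    std::vector<double> values(count);
    const double step = (hi - lo) / (count - 1);
    for (int i = 0; i < count; ++i) values[i] = lo + i * step;
    return values;
}

// Scan every grid point, skip geometries that violate the constraint,
// and keep the lowest-energy geometry among the rest.
SearchResult gridSearch(
    const std::vector<double>& rotationGrid,
    const std::vector<double>& translationGrid,
    const std::function<bool(const DimerGeometry&)>& isAllowed,
    const std::function<double(const DimerGeometry&)>& energy) {
    SearchResult result{{{0.0, 0.0, 0.0}, 0.0},
                        std::numeric_limits<double>::infinity()};
    for (double r0 : rotationGrid)
        for (double r1 : rotationGrid)
            for (double r2 : rotationGrid)
                for (double t : translationGrid) {
                    const DimerGeometry g{{r0, r1, r2}, t};
                    if (!isAllowed(g)) continue;  // pruned before scoring
                    const double e = energy(g);
                    if (e < result.energy) {
                        result.best = g;
                        result.energy = e;
                    }
                }
    return result;
}
```

The point of the `isAllowed` filter is that a cheap geometric test can discard most grid points before the expensive energy evaluation runs, which is what makes a four-dimensional scan affordable.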

CATM is efficient enough to scan entire genomes on our 128-node computational cluster. In fact, we have applied CATM to all predicted transmembrane proteins in the human genome, and the results are available here. The entire run, which scans about 2,300 proteins, takes two to three days.

Energy-based protein sidechain conformation libraries (EBL and BEBL)

Sidechain optimization is an important component of any protein design or structure prediction method. Each position on the given protein backbone is allowed a set of discrete representative geometries, or conformations, called a conformer or rotamer library. Each position can assume its conformations independently of the other positions, and the resulting combinatorial space is searched for the configuration (one conformation per position) with the lowest energy. The search is performed with popular algorithms implemented in MSL: greedy trials, dead-end elimination, Monte Carlo simulated annealing, linear programming, and self-consistent mean field.
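The combinatorial problem described above can be sketched as follows, assuming the total energy is a sum of per-position self energies and pairwise interaction energies. The names `PairEnergy`, `totalEnergy` and `greedyDescent` are hypothetical, and the search shown is a simple greedy coordinate descent that may stop in a local minimum. It is meant to illustrate the problem structure, not to reproduce any of the MSL algorithms.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// self[i][r]: energy of conformer r at position i (hypothetical layout).
using SelfEnergy = std::vector<std::vector<double>>;
// Interaction between conformer r at position i and conformer s at
// position j. Only ever called with i < j.
using PairEnergy = std::function<double(std::size_t i, std::size_t r,
                                        std::size_t j, std::size_t s)>;

// Look up a pair energy with the lower position index first.
double pairLookup(const PairEnergy& pair, std::size_t i, std::size_t r,
                  std::size_t j, std::size_t s) {
    return (i < j) ? pair(i, r, j, s) : pair(j, s, i, r);
}

// Energy of a full configuration (one conformer index per position).
double totalEnergy(const SelfEnergy& self, const PairEnergy& pair,
                   const std::vector<std::size_t>& choice) {
    assert(choice.size() == self.size());
    double e = 0.0;
    for (std::size_t i = 0; i < self.size(); ++i) {
        e += self[i][choice[i]];
        for (std::size_t j = i + 1; j < self.size(); ++j)
            e += pair(i, choice[i], j, choice[j]);
    }
    return e;
}

// Repeatedly give each position its best conformer with all other
// positions held fixed, until a full sweep changes nothing.
std::vector<std::size_t> greedyDescent(const SelfEnergy& self,
                                       const PairEnergy& pair,
                                       int maxSweeps) {
    std::vector<std::size_t> choice(self.size(), 0);
    for (int sweep = 0; sweep < maxSweeps; ++sweep) {
        bool changed = false;
        for (std::size_t i = 0; i < self.size(); ++i) {
            assert(!self[i].empty());
            std::size_t bestR = choice[i];
            double bestE = 0.0;
            bool first = true;
            // Score the current conformer first so ties keep it in place.
            for (std::size_t k = 0; k < self[i].size(); ++k) {
                const std::size_t r = (choice[i] + k) % self[i].size();
                double e = self[i][r];
                for (std::size_t j = 0; j < self.size(); ++j)
                    if (j != i) e += pairLookup(pair, i, r, j, choice[j]);
                if (first || e < bestE) {
                    bestE = e;
                    bestR = r;
                    first = false;
                }
            }
            if (bestR != choice[i]) {
                choice[i] = bestR;
                changed = true;
            }
        }
        if (!changed) break;  // converged to a (possibly local) minimum
    }
    return choice;
}
```

Because a position only switches conformer when that strictly lowers the energy, each accepted move lowers the total energy and the loop terminates. The price of this simplicity is that the result depends on the starting configuration, which is one reason methods such as dead-end elimination or simulated annealing are used in practice.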

The conformer library is typically created using some form of geometric clustering. We have instead developed an energy-based method, built on importance sampling, to compile more efficient conformer libraries. The resulting library, called the EBL, outperforms all popular sidechain libraries in both efficiency (speed) and accuracy of prediction. We have created both backbone-dependent and backbone-independent libraries and use them extensively in all our protein modeling applications.