Below are some of the current CBB students and their research.
Rotations
| Mark Gerstein | The study of neurodegenerative diseases |
| Michael Krauthammer | Focused on the use of text mining to find related articles |
| Michael Snyder | Validated the predictor for identifying yeast genes that are essential to the G0 (quiescent) state |
Rotations
| Mark Gerstein | Examined transcription factor binding site patterns in yeast |
| Michael Snyder | Analyzed the data produced by ChIP-Seq experiments |
| Perry Miller/Kei Cheung | Explored possible data representation methods/structures for high-throughput, next-generation sequencing data |
David is developing statistical and computational methods to better identify expression quantitative trait loci (eQTL) underlying complex diseases. Once the eQTL are identified, he plans to relate them to clinical traits in order to establish (causative) gene networks. He is also applying biological pathway information to genome-wide association studies, with the objective of developing new methods that use this known biological information to prioritize SNPs for association testing.
Rotations
| Mark Gerstein | Statistical approaches to filtering noise from signal data in microarray experiments |
| Kevin White | Several studies of Drosophila genomes across eight species |
| Perry Miller | Created a web application for single-SNP analysis |
Jamie’s research focuses on understanding the targeting mechanisms of activation-induced cytidine deaminase (AID), which is responsible for somatic hypermutation in germinal center B cells. Her lab has recently been working to identify the cis-regulatory modules responsible for recruiting AID to the immunoglobulin loci and to other recently identified target genes. The goal is to determine why some genes are targeted by AID while others are not, and why some of the mutated genes are repaired in an error-free manner whereas others are repaired in an error-prone manner.
Rotations
| David Tuck | Investigated the differences between breast cancer subtypes using microarray data |
| Annette Molinaro | Began the initial setup of a data adaptive system for analysis of tissue microarrays |
| Steven Kleinstein | Analyzed mutations occurring in non-immunoglobulin genes |
Tara studies the link between a small molecule's phenotypic effect and its structural characteristics. Specifically, she uses a variety of machine learning techniques to characterize this relationship.
Rotations
| Michael Snyder | Investigated the binding partners of the putative S. cerevisiae transcription factor Mga1 using ChIP-chip, an experimental technique that identifies the genomic targets of transcription factors |
| Mark Shlomchik | Developed an automated method to construct phylogenetic trees from the sequence data of cells undergoing affinity maturation |
| Mark Gerstein | Measured pathway "dysregulation" in microarray data |
Rotations
| Michael Snyder | Established three single-stranded cDNA libraries from different human cell lines for high-throughput 454 sequencing |
| Steven Kleinstein | Investigated the migration patterns of B cells entering and exiting the germinal center during affinity maturation |
| Mark Gerstein | Mapped the human transcriptome using high-throughput sequencing |
Sujun researches steroid hormone signaling pathways and breast cancer. Specifically, his research focuses on
Song is studying how outbred stock mice with pedigree information contribute to admixture mapping.
Rotations
| Kei-Hoi Cheung | Web service technology for interoperation between biological databases and for gene clustering analysis |
| Mark Gerstein | Statistical methods for preprocessing and scoring tiling microarray data |
| Hongyu Zhao | Statistical issues in mapping quantitative trait loci for gene expression levels |
Jia’s research focuses on genome-wide association studies. One way to gain statistical power for detecting significant associations between SNPs (single nucleotide polymorphisms) and disease status is to perform a summary analysis of several combined studies, an approach known as meta-analysis. The central challenge in meta-analysis is achieving comparability between studies. Jia is currently exploring approaches to meta-analysis of combined Crohn’s disease case-control studies, incorporating different imputation methods to expand the sample size. He also hopes to find ways to account for population structure when combining datasets.
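To make the idea of combining studies concrete, the sketch below shows one textbook way to pool per-study results for a single SNP: fixed-effect, inverse-variance weighting of log odds ratios, plus Cochran's Q as a rough check on between-study comparability. This is a generic illustration, not necessarily the approach Jia uses; the function name and example numbers are hypothetical, and it does not address imputation or population structure.

```python
import math


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance fixed-effect meta-analysis for one SNP (illustrative sketch).

    log_ors:  per-study log odds ratios for the same allele
    std_errs: per-study standard errors of those log odds ratios
    Returns (pooled log OR, pooled SE, z statistic, two-sided p-value, Cochran's Q).
    """
    if not log_ors or len(log_ors) != len(std_errs):
        raise ValueError("need one standard error per study")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errs]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate;
    # large values suggest the studies are not estimating the same effect.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return pooled, pooled_se, z, p, q


if __name__ == "__main__":
    # Hypothetical results from three case-control studies for one SNP.
    print(fixed_effect_meta([0.18, 0.25, 0.10], [0.08, 0.12, 0.10]))
```

A fixed-effect model assumes every study estimates the same underlying effect; when Q indicates heterogeneity, a random-effects model is the usual alternative.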
Rotations
| William Jorgensen | Docked a library of 24,000 ligands into the tautomerase site of macrophage migration inhibitory factor (MIF) |
| Hongyu Zhao | Analyzed the data and investigated the function of the p38 pathway at the molecular level |
| Kei Cheung | Implemented a web interface that allows users to upload and convert tab-delimited text files |
Kevin is developing a method to accurately determine atomic coordinates for backbone atoms from low-resolution RNA crystal structures. To do this, he is using both a reduced representation of RNA developed by the Pyle lab and the RNA backbone rotamer library developed by the Richardson lab at Duke University. His goal is to extract accurate reduced-representation information from the electron density and then use it to determine the appropriate rotamer.
Rotations
| Anna Pyle | Comparing and analyzing two methods to examine RNA backbone structure: a reduced representation and a rotamer library |
| Mark Gerstein | Analyzing protein hinges, or large inter-domain motions in proteins |
| Kevin White | Attempting to develop microarray probes that could be used across multiple species of Drosophila |
In one collaboration, Hugo is studying how genetic variation between different strains of Saccharomyces cerevisiae leads to quantitative differences in transcription factor binding. He is also surveying pseudogenes in prokaryotic metagenomes, investigating how their distribution across protein families varies between geographic locations. This project seeks to determine how different environments and nutritional factors affect the number of pseudogenes, categorized by their parent protein families. In a third project, he is developing a pipeline for analyzing SH3 domain motifs using comparative genomic, structural, and genomic approaches.
Rotations
| Perry Miller | Using Web Ontology Language (OWL) to integrate two neuronal databases, CoCoDat and SenseLab |
| Mark Gerstein | Working on microarray data optimization |
| Michael Snyder | Analysis of transcription factors for pseudohyphal growth in different yeast strains |
The aim of Karen’s research is to develop methods for discovering patterns in high-dimensional data, specifically survival data. She is studying nonparametric algorithms that partition observations based on their covariate values, with the aim of minimizing the residual sum of squares within each partition. She has extended partDSA (the partitioning Deletion/Substitution/Addition algorithm) to accommodate censored survival data by implementing an inverse probability of censoring weighting (IPCW) scheme.
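The general IPCW idea can be sketched as follows, assuming the common form w_i = δ_i / Ĝ(T_i−), where δ_i is the event indicator and Ĝ is a Kaplan-Meier estimate of the censoring survival function. Censored observations receive weight zero, and observed events are up-weighted to stand in for them; a partitioning algorithm would then minimize the weighted residual sum of squares (for example, on log survival times) within each partition. The function names are hypothetical, this is not code from partDSA, and the weighting used in the extension may differ in details such as tie handling.

```python
def censoring_survival_before(times, events, t):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), the censoring survival
    just before time t. Censoring (event == 0) is treated as the 'event'."""
    g = 1.0
    for s in sorted(set(times)):
        if s >= t:
            break  # only censorings strictly before t contribute
        at_risk = sum(1 for x in times if x >= s)
        censored = sum(1 for x, d in zip(times, events) if x == s and d == 0)
        g *= 1.0 - censored / at_risk
    return g


def ipcw_weights(times, events):
    """IPCW weights: 0 for censored subjects, 1 / G(T_i-) for observed events."""
    return [
        1.0 / censoring_survival_before(times, events, t) if d else 0.0
        for t, d in zip(times, events)
    ]


def weighted_rss(values, weights):
    """Weighted residual sum of squares around the weighted mean of one partition."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values))


if __name__ == "__main__":
    # Hypothetical follow-up times; event = 1 means death observed, 0 means censored.
    times, events = [1, 2, 3, 4], [1, 0, 1, 1]
    w = ipcw_weights(times, events)
    print(w, weighted_rss(times, w))
```

In this toy example the subject censored at time 2 gets weight zero, and the two later events are each up-weighted by 3/2, so the weights still sum to the sample size.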
Rotations
| Paul Lizardi | Studied the basis for hypermethylation and hypomethylation in CpG islands |
| Steven Kleinstein | Modeled B-cell mutations using a discrete stochastic model that tracks the number of mutations in each B cell |
| Annette Molinaro | Focused on the challenge of missing-data imputation when employing nonparametric search algorithms |
ThaiBinh’s work involves mining the biomedical literature to map mentions of genes and diseases to a repository of gene and disease identifiers. The aim is to make it easier to classify research papers and to identify papers relevant to researchers.
Rotations
| Hongyu Zhao | Establishing a database for a large microarray data set used to test the effects of drugs and toxicants on rat organs |
| Michael Krauthammer | Analyzing several methods of term mapping in order to identify terms found in biological abstracts |
| David Tuck | Classification of transcription factors in PubMed abstracts |
Laura is using the complementary approaches of statistical methods and pharmacokinetic modeling to explore possible mechanisms underlying alcohol dependence.
Rotations
| Hongyu Zhao | Evaluating the performance of HapGraph, a program which determines the dependence among genetic loci, by testing it on SNP data from the International HapMap Project |
| Joe Chang | Analyzed data from the Multiple Crime Study, which examines an isolated population in Russia in which individuals have committed multiple crimes; used identity-by-descent (IBD) analysis to determine whether any of the genetic markers are linked to mental health or behavioral traits |
| Kenneth Kidd | Evaluation of markers to cluster people into ethnic populations |

Sara researches protein structure sampling algorithms for applications such as structure determination and protein-protein or protein-ligand interactions. Her research focuses on methods to explore the energy surface more efficiently than standard Monte Carlo sampling algorithms. She is interested in secondary structure initiation, side chain sampling, and global optimization for protein folding.
Rotations
| Mark Gerstein | Helping to develop a server to characterize helix-helix interactions in proteins, taking special interest in interactions involving proteins that sit in the lipid bilayer |
| Andrew Miranker | Studying a small model peptide that aggregates in vitro to determine whether it also aggregates in silico |
| Bill Jorgensen | Setting up a folding simulation for a small beta hairpin protein using all-atom simulations in implicit water |
Rotations
| Michael Krauthammer | Investigated detailed relationships between a gene and disease |
| Mark Gerstein | Examined comparative genomics of functional elements in C. elegans and C. briggsae |
| Michael Snyder | Used the ChIP-seq experimental method in order to identify transcription factor binding sites in C. elegans |

Jill's research focuses on epigenetic markers, such as methylation. In particular, she studies the mapping and analysis of these markers on a genome wide scale. The main goals of this work are:
Rotations
| David Tuck | Using in silico modeling of tissue micro-heterogeneity to determine whether interaction between diverse clones of cells can lead to tumorigenesis |
| Paul Lizardi | Studying the mechanisms of isothermal whole-genome amplification using in silico modeling |
| Michael Krauthammer | Quantifying the strength of relationships between pathological terms and genes based on co-occurrence statistics in the literature |
Pavi works with protein-protein interaction networks to find disease-related clusters and genes within them. She uses methods such as dimensionality reduction and diffusion to analyze these networks. She also works with microarray data from melanoma patients to find related genes.
Rotations
| Michael Krauthammer | Analyzing protein interaction networks using graph theory algorithms to find subnetworks of disease genes |
| Mark Gerstein | Building a web interface for Primer3, a program that designs primers for PCR |
| Michael Snyder | Creating an abstract framework for tagging experimental data |
Chong currently works on mapping 5' UTR sequences in yeast by processing data from large-scale 5' RACE experiments. The project aims to find errors in the annotated positions of translation start codons as well as errors in the original genome sequence. It also works toward a complete map of 5' UTR sequences for all yeast transcripts.
Rotations
| Mark Gerstein | Examined yeast regulatory networks to find the targets of essential transcription factors |
| Michael Snyder | Performed ChIP-chip experiments on the yeast Pol-2 transcription factor to find regulatory binding sites; also produced three biological replicates and submitted the data to the UCSC database |
| Hongyu Zhao | Inferred protein-protein interacting domains using high-throughput data from diverse organisms |
Michael works on developing new techniques, algorithms, and software to efficiently handle the complexity of modeling large and multiscale biological systems. His particular emphasis is on stochastically simulating biochemical reaction networks that are generally intractable using traditional simulation methods. He is applying his new techniques to model the bacterial chemotaxis system in order to study how single cells and populations of cells process information and communicate as they navigate complex environments.
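For context, the baseline that faster stochastic methods are usually compared against is Gillespie's stochastic simulation algorithm (direct method), which fires one reaction at a time and therefore becomes expensive for large or multiscale networks. The sketch below is that textbook baseline, not Michael's new techniques; the function name and the toy birth-death network in the usage example are hypothetical.

```python
import random


def gillespie_direct(x0, stoich, propensities, t_max, rng=None):
    """Gillespie's direct-method SSA (illustrative sketch).

    x0:           initial molecule counts, one int per species
    stoich:       per-reaction state-change vectors
    propensities: per-reaction functions a_j(x) -> rate (float)
    Returns a list of (time, state) pairs, one per fired reaction plus the start.
    """
    rng = rng or random.Random()
    t, x = 0.0, list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        a = [f(x) for f in propensities]
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with total rate a0.
        t += rng.expovariate(a0)
        if t > t_max:
            break
        # Choose reaction j with probability a_j / a0 via a cumulative-sum search.
        r = rng.random() * a0
        j, cum = 0, a[0]
        while cum <= r and j < len(a) - 1:
            j += 1
            cum += a[j]
        x = [xi + dxi for xi, dxi in zip(x, stoich[j])]
        trajectory.append((t, tuple(x)))
    return trajectory


if __name__ == "__main__":
    # Hypothetical birth-death network: 0 -> X at rate 10, X -> 0 at rate 1 * X.
    traj = gillespie_direct(
        [0], [(1,), (-1,)], [lambda x: 10.0, lambda x: 1.0 * x[0]],
        t_max=50.0, rng=random.Random(0),
    )
    print(len(traj), traj[-1])
```

Because every individual reaction event is simulated, the cost grows with the total reaction rate, which is why exact SSA quickly becomes impractical for the large, multiscale systems described above.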
Rotations
| Steven Kleinstein | Developed new computational techniques and software to statistically characterize white blood cell trafficking that was imaged in lymph nodes of live mice |
| Michael Snyder | Worked on microarray based experiments to study how differences in transcription factor binding between several strains of yeast affect observed phenotype |
| Thierry Emonet | Created a stochastic model of the bacterial flagellar motor and used it to study how slow fluctuations in the chemotaxis signaling system affect the swimming behavior of single cells |
Emmett works on creating systems models of breast cancer pathology, with a focus on HER2+ breast cancers. He is currently investigating copy number variation across different patients as well as in HER2+ breast cancer cell lines.
Rotations
| Mark Gerstein | Investigated network features related to bottlenecks |
| David Tuck | Developed a software tool to help with network analysis |
| Steven Kleinstein | Investigated the combined effects of IFN-lambda with IFN-alpha or IFN-gamma on IFN-stimulated gene expression and Hepatitis C Virus replication in hepatocytes |
The focus of Sebastian’s research is the design of microarray chips that detect patterns of genomic methylation, e.g. in cancer versus normal tissues. He hopes that his analysis of the data from experiments using these chips will help functionally annotate uncharted genomic regions, often called "junk" DNA.
Rotations
| David Tuck | Creating a simulation of the DNA damage response pathway in yeast using differential equations and agent-based frameworks |
| Paul Lizardi | Developing and testing novel microarray normalization methods, analyzing methylation changes across the human genome, and determining the potential role of repetitive elements in the human genome |
| Michael Krauthammer | Created custom gene ontologies based on preprocessed literature from PubMed |
Rotations
| Hongyu Zhao | Employed a method that uses both gene expression data and pathway information |
| Michael Snyder | Performed chromatin immunoprecipitation (ChIP) to identify the binding sites of RNA polymerase II (Pol II) and acetylated histone H4 (Ac-H4) |
| Paul Lizardi | Classified each known SVA sequence in the genome into the consensus sub-family to which it is best aligned |
| Annette Molinaro | Analyzed a lung cancer survival dataset to identify predictors (combinations of variables) of patient life expectancy |

Mohamed’s research involves computational analysis of the immune system. Specifically, he is studying Immunoglobulin (Ig) receptor sequences and lineage trees.
Rotations
| Kenneth Kidd | Created interactive simulations to model several population genetics principles |
| Steven Kleinstein | Investigated lineage trees of the B-cell populations for various selection values |
| Perry Miller/Hongyu Zhao | Identified significant genes associated with Age-related Macular Degeneration using a simple GWAS analysis |

Xiaowei's research focuses on two main areas:
Valentin was the first graduate of the CBB program, receiving his PhD in May 2007. His research focused on informatics issues in the analysis of SNP data, particularly as they relate to determining the genetic basis of disease. He developed statistical algorithms for pathway-based association analysis of genomic data. Valentin also developed algorithms for pivoting clinical data stored in Entity-Attribute-Value (EAV) databases and investigated their performance. In addition, he investigated the genomic coverage and copy number polymorphism detection capabilities of multiple microarray platforms.
Yin developed statistical methods to analyze large-scale genomic and proteomic data and applied these methods to biological problems. In particular, she developed statistical approaches for genome-wide protein interaction prediction and protein complex identification in yeast. Protein interaction data and microarray gene expression data were integrated to reconstruct signal transduction pathways. Yin also worked on identifying allelic associations between genetic variants located on different chromosomes using human HapMap project data.
Tom studied the Affymetrix and NimbleGen microarray technologies. Specifically, he sought to quantify the sources of signal variability in tiling microarray experiments, with the goal of improving the measurement accuracy of these technologies and broadening the scope of their application.