Below are some of the current CBB students and their research focuses.
Rotations
| Mark Gerstein | Statistical approaches to filtering noise from signal data in microarray experiments |
| Kevin White | Several studies of Drosophila genomes across eight species |
| Perry Miller | Creation of a web application to do single SNP analysis |
Valentin works on disease-oriented analysis of high-density microarray genomic data and on clinical database modeling. He uses Affymetrix 100K and 500K SNP GeneChips to investigate the association of genomic loci with complex diseases, such as age-related macular degeneration and cardiac diseases. He is also investigating copy number changes in cancer using Affymetrix, Nimblegen, and Illumina high-density genomic microarrays. One particular focus of his research is developing tools for pathway-based association analysis of genomic data. He is also investigating issues related to the use of the entity-attribute-value modeling approach to clinical databases, such as improving performance through in-memory data processing.
Rotations
| Michael Snyder | Performed wet-lab work for microarray hybridization. Used mRNA to prepare labeled cDNA, hybridized it to Nimblegen microarrays, and scanned and analyzed data. |
| Perry Miller | Worked on quality control analysis for microarray data in the Yale Microarray Database |
| Mark Gerstein | Investigated various data mining techniques for predicting protein solubility and pseudogene classification |
Rotations
| David Tuck | Investigated the differences between breast cancer subtypes using microarray data |
Tara studies the link between a small molecule's phenotypic effect and its structural characteristics. Specifically, she uses a variety of machine learning techniques to characterize this relationship.
Rotations
| Michael Snyder | Investigated the binding partners of the putative S. cerevisiae transcription factor Mga1 using ChIP-chip, an experimental technique that can identify targets of transcription factors |
| Mark Shlomchik | Developed an automated method to construct phylogenetic trees from the sequence data of cells undergoing affinity maturation |
| Mark Gerstein | Measured pathway "disregulation" in microarray data |
Sujun researches steroid hormone signaling pathways and breast cancer. Specifically, his research focuses on
Rotations
| Kei-Hoi Cheung | Web service technology to interoperate biological databases and analyze gene clustering |
| Mark Gerstein | Statistical methods for preprocessing and scoring tiling microarray data |
| Hongyu Zhao | Statistical issues in mapping quantitative trait loci for gene expression levels |
Rotations
| William Jorgensen | 3D-docking a ligand library containing 24,000 ligands into the tautomerase site of Macrophage Migration Inhibitory Factor. |
Kevin researches RNA structure. In particular, he is developing a method to accurately determine the positions of backbone atoms from low-resolution crystal structures. This involves combining information from a backbone rotamer library with a reduced representation of RNA developed in the Pyle lab, which uses only two atoms per residue. He is also continuing research from his rotation in the Gerstein lab on predicting protein hinges.
Rotations
| Anna Pyle | Comparing and analyzing two methods to examine RNA backbone structure: a reduced representation and a rotamer library |
| Mark Gerstein | Analyzing protein hinges, or large inter-domain motions in proteins |
| Kevin White | Attempting to develop microarray probes that could be used across multiple species of Drosophila |
Rotations
| Perry Miller | Using Web Ontology Language (OWL) to integrate two neuronal databases, CoCoDat and SenseLab |
| Mark Gerstein | Working on microarray data optimization |
| Michael Snyder | Analysis of transcription factors for pseudohyphal growth in different yeast strains |
Yin develops statistical methods to analyze large-scale genomics and proteomics data and applies these methods to study biological problems. In particular, she developed statistical approaches for genome-wide protein interaction prediction and protein complex identification in yeast. She has integrated protein interaction data with gene expression data from microarray chips to reconstruct signal transduction pathways. Yin is also working on identifying allelic associations between genetic variations located on different chromosomes using human HapMap project data.
Rotations
| Perry Miller | Developed an XML-based approach to standardizing microarray data analysis |
| Mark Gerstein | Performed statistical analysis on annotated human pseudogenes and helped construct an online human pseudogene database |
| Hongyu Zhao | Investigated the underlying evolutionary mechanisms of human pseudogenes |
Rotations
| Paul Lizardi | Studied the basis for hypermethylation and hypomethylation in CpG islands. |
ThaiBinh's current project involves mining biomedical literature to map instances of gene strings and diseases to a repository of gene/disease identifiers. The aim of this process is to classify research papers more easily and to identify relevant papers for researchers.
Rotations
| Hongyu Zhao | Establishing a database for a large microarray data set used to test the effects of drugs and toxicants on rat organs |
| Michael Krauthammer | Analyzing several methods of term mapping in order to identify terms found in biological abstracts |
| David Tuck | Classification of transcription factors in PubMed abstracts |
Laura is analyzing two related programs used to study genotype data. The PHASE program infers haplotypes from genotype data. FastPHASE is a much faster program; however, it is thought to be less accurate than PHASE. She is currently comparing the two algorithms to determine whether the supposed increased accuracy of the PHASE results is worth the computation time.
Rotations
| Hongyu Zhao | Evaluating the performance of HapGraph, a program that determines the dependence among genetic loci, by testing it on SNP data from the International HapMap Project |
| Joe Chang | Analyzing data from the Multiple Crime Study, which reports on an isolated population in Russia where individuals have committed multiple crimes. Laura is using IBD analysis to determine whether any of the genetic markers are linked to mental health or behavioral traits. |
| Kenneth Kidd | Evaluation of markers to cluster people into ethnic populations |

Sara researches protein structure sampling algorithms for applications such as structure determination and protein-protein or protein-ligand interactions. Her research focuses on methods to explore the energy surface more efficiently than standard Monte Carlo sampling algorithms. She is interested in secondary structure initiation, side chain sampling, and global optimization for protein folding.
Rotations
| Mark Gerstein | Helping to develop a server to characterize helix-helix interactions in proteins, taking special interest in interactions involving proteins that sit in the lipid bilayer |
| Andrew Miranker | Looking at a small model peptide that aggregates to determine if it aggregates in silico as well as in vitro |
| Bill Jorgensen | Setting up a folding simulation for a small beta hairpin protein using all-atom simulations in implicit water |

Tom studies the Affymetrix and Nimblegen microarray technologies. Specifically, he is attempting to quantify the sources of signal variability within tiling microarray experiments. Through studying tiling microarrays, Tom hopes to improve the measurement accuracy of these technologies and broaden the scope of their application.

Jill's research focuses on epigenetic markers, such as methylation. In particular, she studies the mapping and analysis of these markers on a genome-wide scale. The main goals of this work are:
Rotations
| David Tuck | Using in silico modeling of tissue micro-heterogeneity to determine whether interaction between diverse clones of cells can lead to tumorigenesis |
| Paul Lizardi | Studying the mechanisms of isothermal whole-genome amplification using in silico modeling |
| Michael Krauthammer | Quantifying the strength of relationships between pathological terms and genes based on statistics of co-occurrence in the literature |
Pavi works with protein-protein interaction networks to find disease-related clusters and genes within them. She uses methods such as dimensionality reduction and diffusion to analyze these networks. She also works with microarray data from melanoma patients to find related genes.
Rotations
| Michael Krauthammer | Analyzing protein interaction networks using graph theory algorithms to find subnetworks of disease genes |
| Mark Gerstein | Building a web interface for Primer3, a program that designs primers for PCR |
| Michael Snyder | Creating an abstract framework for tagging experimental data |
Chong currently works on mapping 5' UTR sequences in yeast by processing data from large-scale 5' RACE experiments. The project attempts to find annotation errors in gene translation start codon positions and original sequencing errors. It also works towards the development of a complete map of 5' UTR sequences in all yeast transcripts.
Rotations
| Mark Gerstein | Examined yeast regulatory networks to find the targets of essential transcription factors |
| Michael Snyder | Performed ChIP-chip experiments on yeast Pol-2 transcription factor to find regulatory binding sites. Also produced three biological replicates and submitted the data to the UCSC database. |
| Hongyu Zhao | Inferred protein-protein interacting domains using high-throughput data from diverse organisms |
Rotations
| Steven Kleinstein | Developed computational and statistical techniques to characterize the global migration pattern of B cells |
Rotations
| Mark Gerstein | Investigated the related network features of bottlenecks |
Sebastian's research interests include novel ways of annotating "junk" and other unannotated regions of DNA by combining static data widely available through Genome Browser repositories with dynamic data, i.e., experimental datasets such as methylation data. His focus is to unravel the epigenetic component of genome regulation and accessibility.
Rotations
| David Tuck | Creating a simulation of the DNA damage response pathway in yeast using differential equations and agent-based frameworks |
| Paul Lizardi | Developing and testing novel microarray normalization methods, analyzing methylation changes across the human genome, and determining the potential role of repetitive elements in the human genome |
| Michael Krauthammer | Created custom gene ontologies based on preprocessed literature from PubMed |

Rotations
| Kenneth Kidd | Created interactive simulations to model several population genetics principles |
| Steven Kleinstein | Investigated lineage trees of the B-cell populations for various selection values |

Xiaowei's research focuses on two main areas: