Center for Nonlinear Studies
 
Monday, April 07, 2014
3:00 PM - 4:00 PM
CNLS Conference Room (TA-3, Bldg 1690)

Colloquium

Quantitative modeling of transcription factor binding specificities using DNA shape

Remo Rohs
University of Southern California

Our current knowledge of genome function is the result of sequence-based data in the form of one-dimensional strings of letters. However, DNA-binding proteins recognize the double helix as a three-dimensional object. Therefore, an understanding of transcription factor (TF) binding specificity must ultimately include DNA shape. The sequence-structure relationship in DNA is highly degenerate: different nucleotide sequences can give rise to the same structure, while single-nucleotide sequence variants sometimes change DNA shape over a region of several base pairs. To explore these effects on a genomic scale, we developed a method for the high-throughput prediction of DNA shape features. We used these structural features to augment nucleotide sequence in binding specificity models derived from statistical machine learning approaches such as support vector regression (SVR) and regularized multiple linear regression (MLR). Using these approaches, we learned in vitro DNA binding specificity models from protein binding microarray (PBM), genomic-context PBM, and HT-SELEX/SELEX-seq data. Based on data for many TFs from diverse protein families, we demonstrated that shape-augmented models are generally more efficient than existing sequence models in terms of accuracy, number of features, and computation time. Our models provide information on the importance of specific DNA sequence and shape features and thus reveal TF family-specific readout mechanisms and better explain why a given TF binds in vivo to a specific genomic target site.
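
As a rough illustration of what "augmenting nucleotide sequence with shape features" could look like in practice, the sketch below builds one feature vector per binding site by concatenating one-hot sequence indicators with per-position shape values. It is a minimal sketch under stated assumptions, not the speaker's implementation: the function names, the shape feature names, and the numeric shape values are all hypothetical placeholders, and the abstract does not specify the actual encoding. Vectors built this way could then serve as input to a regression model such as SVR or regularized MLR.

```python
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Encode each base as four 0/1 indicators in A, C, G, T order."""
    features: List[float] = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return features


def shape_augmented_features(
    seq: str, shape: Dict[str, Sequence[float]]
) -> List[float]:
    """Concatenate sequence indicators with per-position shape values.

    `shape` maps a (hypothetical) shape feature name to one value per
    position of `seq`; how those values are predicted is outside this sketch.
    """
    features = one_hot(seq)
    # Sorted order keeps feature columns aligned across different sites.
    for name in sorted(shape):
        values = shape[name]
        if len(values) != len(seq):
            raise ValueError(f"{name}: expected {len(seq)} values")
        features.extend(float(v) for v in values)
    return features


# Placeholder shape values, invented purely for illustration.
example = shape_augmented_features(
    "ACGT",
    {"shape_feature_1": [0.1, 0.2, 0.3, 0.4],
     "shape_feature_2": [1.0, 0.9, 0.8, 0.7]},
)
print(len(example))  # 4 positions * 4 bases + 2 features * 4 positions
```

Keeping the sequence and shape blocks as separate, fixed-order column groups makes it straightforward to compare a sequence-only model against a shape-augmented one on the same sites.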

Host: Boian Alexandrov