In the second half of the quarter, students will research, design, and implement a translational bioinformatics project on a topic of their choice.
All times for due dates are 5:00 pm (Pacific), unless otherwise noted. Submissions should be emailed to bmi217submit@gmail.com.
Each student will give an oral presentation about their project in class on March 19 in a special session between 12:15 and 3:15 pm. Students taking the class for 4 units will speak for 10 minutes or less. Those taking BMI218 for 2 credits will have 5 minutes. There will be ~2 minutes for questions while the next speaker sets up. We may end a few minutes late. Timing will be adjusted as we figure out how many students stick with the course until the end.
Remote students must audio-record their presentation along with their slides and send both to us before the deadline. These recordings will be played and projected in the same session as the other students' presentations. Like the other students, remote students must keep to time.
Otherwise, there are no specific requirements for the presentation other than being within the time limit. Cover the details of your work, with some background, methodology, your findings, what you learned, and what you would do next.
Contact us ASAP if you have a conflict.
The final project report should be in the format of a scientific research paper: title, abstract, introduction, methods, results, discussion, references.
You will also submit your code, which does not count towards the page limit. We will base approximately 10% of your written report grade on your code implementation (primarily clarity of methods through code commenting/documentation).
There is a 10 page limit, including figures and references.
The report should be along the lines of a research grant proposal. Write as if you are going to actually implement the project. It should include the following sections.
There is a 5 page limit, including figures and references.
The final presentation could be slides built around each of these sections.
The title and abstract should be written in the style of a research paper or grant proposal. For advice on writing an abstract, see http://www.ece.cmu.edu/~koopman/essays/abstract.html.
The Specific Aims section should include 2-3 specific goals and/or milestones of your project, in the style of a grant proposal. For advice on writing Specific Aims, see NIAID.
The final project grading breakdown is
You will be graded using roughly the same criteria that NIH uses for grant proposals:
1. Approach: Are the conceptual or clinical framework, design, methods, and analyses adequately developed, well integrated, well reasoned, and appropriate to the aims of the project? Does the student acknowledge potential problem areas?
2. Innovation: Is the project original and innovative? For example: Does the project challenge existing paradigms or clinical practice; address an innovative hypothesis or critical barrier to progress in the field? Does the project develop or employ novel concepts, approaches, methodologies, tools, or technologies for this area?
3. Significance: Does this study address an important problem? If the study is successful and accurate, how will scientific knowledge or clinical practice be advanced? What will be the effect of these studies on the concepts, methods, technologies, treatments, services, or preventative interventions that drive this field?
4. Success: Did the project work? Can findings be reported, either positive or negative?
The final papers for BIOMEDIN217 should be formatted according to the submission guidelines of the CSB conference held here at Stanford.
You can read more about the conference here.
Call for Papers - Computational Systems Bioinformatics
The Computational Systems Bioinformatics Conference will be held this year on August 10-13, 2008 on the Stanford campus. This conference has clearly supported students in the past by increasing opportunities for student presentations and inviting students and post-docs as featured speakers and tutors. Many student projects from BMI classes were first presented as posters or papers here and (with further development) have been accepted as papers at other conferences or as journal articles. I strongly encourage you to submit your work, since early publication is highly beneficial. Since the conference is held on campus, it is very affordable. Your research advisors and programs are much more likely to support your attendance at this conference, using this venue as an otherwise unaffordable opportunity to build you up scientifically. If you don't have a research advisor yet, cost is a major consideration, and CSB conference organizers have been extremely sympathetic in the past. Most other local conferences fall short in some way: they do not attract such a wide audience, their proceedings are not indexed in Medline, or their scientific quality is neither as deep nor as broad. In addition, many BMI alumni find this conference especially attractive because they know the conference organizers very well; this is the conference where they presented their first poster, paper, or presentation, or joined the organizing committee. Last but not least, the BMI program plans to hold a mixer during this time to enhance your opportunity to meet with your peers and the alumni. This is a great opportunity to find out what is happening, learn about the job market, and have fun!
The Call for Papers can be found here
The program committee is interested in medical informatics; you may submit your work with full confidence that your work will be fully and carefully reviewed.
Critical Dates:
Please contact Betty Cheng (betty.cheng@stanford.edu) if you have questions about the conference.
These are scanned in from print-outs, so please excuse the errors in optical character recognition.
Alexander Morgan - Regulators of Senescence Linked Transcriptional Profiles
We used several publicly available microarray datasets to develop a model of a shared mammalian transcriptional profile for aging which is able to effectively recapitulate genes previously associated with human aging (area under the ROC curve of 71%). Using this transcriptional profile and predicted human transcription factor binding sites, we suggest several transcription factors (XBP-1, E4BP4, Sox-5, AML1, and AP-1) as being associated with age-related regulation of expression. These are then potential targets for interfering with the process of biological senescence.
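For readers unfamiliar with the metric quoted in this abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from ranked scores. It is not the author's code: the function name `roc_auc` and the toy scores and labels are assumptions made purely for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 marks a gene previously associated with aging.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value of 0.71 indicates that known aging genes tend to be ranked above other genes, though far from perfectly.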
Ibrahim Emam - Using Gene Regulatory Networks to Study Gene Interactions in Human Liver Cancer
Gene Regulatory Networks are promising conceptual models for understanding gene interactions at higher cellular levels. A Bayesian network model was used to build two gene networks to examine the difference between gene interactions in normal human liver tissues versus human hepatocellular carcinoma (HCC) resulting from chronic hepatitis caused by HCV and HBV. Results indicate that this approach can reveal valuable information to scientists that might enable them to discover the mechanism of tumor development in HCC as well as discover new targets for drug development.
David Chen - Using Clinical Laboratory Data to Infer Gene Function
The elucidation of genes involved in biological processes has involved the examination of gene expression for particular samples of interest compared to normal or control samples through a combination of laboratory work and statistical analysis. Although genes that change significantly for a particular condition give us valuable insight into biological processes, the mechanisms by which they work and their effects on the system as a whole are more often than not unknown. Model organisms are then used to probe the phenotypic expression that perturbation of genes of interest causes. We propose an in silico method to examine the effects of phenotypic expression of genes by examining clinical laboratory profiles across 8 distinct diseases and comparing that to gene expression across the same diseases. We show that this method can retrieve relevant clinical biomarkers that are correlated with gene expression patterns.
Sarah Aerni - Integration of Copy Number Variation, Putative Regulatory Modules, and Gene Expression Data to Link Genome Rearrangements to Breast Cancer Subtypes
Breast cancer is a highly heterogeneous disease, with individual tumors composed of cells with highly variable genomic structure and gene expression patterns. New technologies have emerged which allow researchers to examine the genomes of tumor cells without incurring the cost of sequencing. Array comparative genomic hybridization (aCGH) can be used to assess segmental duplications and deletions, and can correlate copy number variability to changes in expression profiles and disease severity. End sequence profiling (ESP) has been used to examine genome rearrangements by identifying genomic breakpoints. However, neither of these methods has been completely effective at giving a complete picture of the state of a tumor genome, and as a result only a few examples have been successful at elucidating the mechanisms responsible for disease progression. By using aCGH data we provide a tool to give additional insight into genome rearrangement detected by ESP, and examine regulatory changes which may be responsible for disease progression and prognosis.
Daniel Holbert - A Comparative Study of Cellular Responses to Microgravity
I performed two separate analyses to explore the cellular effects of microgravity. My first goal was to find genes that exhibit similar responses to microgravity across species. To achieve this, I compared expression patterns from human T-cells, mouse osteoblasts, and yeast cells to find genes that are consistently up- or down-regulated in microgravity environments. My second goal was to explore to what extent the human T-cell microgravity sample resembled various immunodeficiency syndromes. I compared the microgravity T-cell dataset to other datasets of T-cells with HIV, classical Hodgkin's lymphoma, and sarcoidosis.
Noah Zimmerman - Analyzing Time Course Expression Data Using GO Term Enrichment
Time course microarray experiments are a powerful tool for measuring the dynamic expression of genes over time. Recently, there has been progress in developing tools and techniques that are appropriate for analyzing the temporal microarray data. We present a method that builds on these techniques by describing the expression of genes at each time point using the Gene Ontology. We apply the method to data obtained from human brain samples and show that groups of differentially expressed genes can be clustered by time point and function to create a timeline of process activities.
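As a rough illustration of what term enrichment at a single time point can look like, the sketch below uses a one-sided hypergeometric test, which is a common choice for Gene Ontology enrichment. The abstract does not state which test the author used, so the test itself, the function name `enrichment_p_value`, and the toy counts are all assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value for term enrichment.

    population: total number of genes measured
    annotated:  genes in the population carrying the GO term
    selected:   differentially expressed genes at this time point
    overlap:    selected genes that carry the GO term

    Returns P(X >= overlap) when `selected` genes are drawn at random
    without replacement from the population.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes measured, 3 carry the term,
# 3 are differentially expressed, and all 3 of those carry the term.
toy_p = enrichment_p_value(population=10, annotated=3, selected=3, overlap=3)
```

Repeating such a test for every term at every time point, with a multiple-testing correction, would give one enriched-term list per time point, which is the kind of input a timeline of process activities could be built from.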
Marc Schaub - Prediction of mRNA Expression in Cancer Using microRNA Expression Levels
MicroRNAs play a key role in gene regulation by lowering the abundance of mRNA transcripts. Recent studies have indicated that microRNAs are involved in a large number of biological processes and play a role in cancer. In this work, we search for clusters in which the variation in mRNA expression level is mainly controlled by microRNAs, and then learn models of mRNA expression within these clusters. We start by collecting a large set of micro-array experiments, and use them to find clusters of co-expressed genes. In each cluster, we search for microRNA binding sites whose frequency is enriched in the cluster. We then try to predict gene expression in different types of normal and cancer cells that were not part of the data used for the clustering, and for which we have measurements of both mRNA and microRNA expression levels. In each cluster in which a pattern of several microRNAs is significantly enriched, we learn a model using only the microRNAs in the pattern. We use cross-validation in order to evaluate the performance of the learned models. Unfortunately, this approach is unable to find very significant microRNA binding site enrichment in clusters of co-expressed genes, and the learned models fail to predict gene expression. We discuss how this approach could be extended to incorporate transcriptional regulation, as well as issues related to the available data.
Don Rule - Searching for Correlations Between Compound Mechanism and Gene Expression Data
This is a project that compared the mechanism of action for a set of anti-cancer agents and gene expression profiles within the NCI-60 data set. I compared the variations among cells in their susceptibility to anti-cancer agents of a known class to the gene expression profiles for those same cells. The results pointed to a number of genes implicated in various forms of cancer.
Megan So - Translation in Volume As Well As Meaning: An Investigation of Animal and Human Responses to Drugs
Translational medicine is a science which relies upon accurate and sensitive animal models. But, how sensitive are they? In this study, I investigated the ability of the rat model of five diseases (hypertension, type II diabetes, ulcers, Alzheimer's disease, and arthritis) to reflect the human response to treatment in what I call “volume”: that is, do the rat models reproduce details of human response such as which drugs are more effective than others? I investigated both translation of efficacy, as well as translation of adverse events, with a literature search and a microarray analysis. I found that the accuracy of the rat model in predicting human responses to drugs varied depending on disorder, and that the key in translational medicine is finding the correct model. To facilitate this goal of finding and evaluating animal models, I also developed a simple tool for performing PubMed searches on animal models of drugs which have a high incidence of adverse events.
Tim Chang - Application of Logic Regression to F2 BXH Mouse Strains
Correlation of genetic polymorphisms to phenotypes in complex diseases such as obesity is difficult because of epistatic interactions. Standard linear regression techniques cannot model the interactions of features. Logic regression is a potential solution. Some 22 phenotypes were studied for F2 BXH mice with 1347 single nucleotide polymorphisms (SNP) measured. Of interest was the discovery of Esm1 for the phenotype LDL and VLDL in a model of two Boolean expressions where each contained two predictors. Esm1 has been shown to be involved in human obesity, but was insignificant in Markov Chain Monte Carlo (MCMC) sampling. Insulin was found to have significant SNP predictors using MCMC but no biological information could be deduced about them. Total fat, aortic calcification and leptin did not show any significant predictors using MCMC. These results suggest that logic regression prefers low complexity models, which may be too sparse to compare with known biological data. In turn, the proposal of novel biologically relevant correlations will seem unreliable. The future extension and application of logic regression's multiple interaction modeling still seems promising with its richer and more biologically plausible expressive power.
Marina Sirota - Integrative Biology Approach to Identifying Novel Tumor Antigens and Fusion Proteins Through Antibody and Gene Expression Profiling
Arrays of immobilized proteins (ProtoArray) have been developed for the discovery and characterization of novel protein biomarkers specific for infectious diseases, cancers, and autoimmune diseases. In this study, plasma was collected from several leukemia patients pre-transplant and one year post-transplant, and from their donors. The ProtoArray allows us to at once screen thousands of antibodies of interest for targets of allogeneic antibodies. We have identified a set of potential tumor antigens through subtraction of antibody levels present in AML patients in comparison to healthy individuals. We examined the presence of coding non-synonymous single nucleotide polymorphisms in each of the proposed tumor antigens. The list of antigens targeted in leukemia patients was intersected with gene expression data relevant to AML to find if there was significant over-representation of genes with differential antibody expression in one or more leukemia experiments. We tested if the set of potential tumor antigens is enriched in genes near known leukemia breakpoints. In this paper we present a set of new tumor antigens as potential drug targets for leukemia. To complement our computational analysis, the novel potential tumor antigens presented in this paper require validation by large clinically characterized patient samples.
Christopher Egner - Learning Pathological Genetic Mutation Patterns in Disease
Single nucleotide polymorphisms (SNPs) are one of the most common mutations in the human genome and are thought to account for the majority of sequence differences between individual humans. Through the examination of patterns in SNPs, I explore the extent to which genotypic variations are associated with phenotypic states, specifically Parkinson's Disease (PD). First, I derive arguments as to why simpler statistical techniques may be insufficient given the nature of the data. I then explore the extent to which machine learning techniques may be used to classify patients as having PD or not. I further show how the learned hypotheses may be used to guide research into which SNPs and genes may play a role in the development of PD. From a subset of the SNPs on chromosome 11, I construct a classifier for PD that achieves an estimated generalisation error of less than 5%.
Irene Liu - Deciphering Genetic and Environmental Influences on Human Diseases
We propose to create three networks of diseases, one based on genetic factors alone, one based on environmental factors alone, and one based on disease co-occurrence data. By comparing these three networks, we will be able to identify diseases that are influenced primarily by either genetic or environmental factors, as well as diseases that are influenced by both. The relative impact of various factors can also be calculated. The results will provide insights on the best approaches for disease prevention and control.
Gavi Kohlberg - Significance of human genetic variation compared to viral genotype in deciding course of therapy in HIV-1 infected patients
David Feliciano - Pharmacogenomic Variation in Anti-Angiogenic Treatment Response
T.B.A.