Computational Genomics with R: Book Price Comparison

Book information

· Title: Computational Genomics with R (Paperback, 1)
· Category: Foreign books > Medicine > Epidemiology
· ISBN: 9780367634605
· Pages: 462

1. Introduction to Genomics
· Genes, DNA and central dogma
· What is a genome?
· What is a gene?
· How genes are controlled? The transcriptional and the post-transcriptional regulation
· What does a gene look like?
· Elements of gene regulation
· Transcriptional regulation
· Post-transcriptional regulation
· Shaping the genome: DNA mutation
· High-throughput experimental methods in genomics
· The general idea behind high-throughput techniques
· High-throughput sequencing
· Visualization and data repositories for genomics

2. Introduction to R for Genomic Data Analysis
· Steps of (genomic) data analysis
· Data collection
· Data quality check and cleaning
· Data processing
· Exploratory data analysis and modeling
· Visualization and reporting
· Why use R for genomics?
· Getting started with R
· Installing packages
· Installing packages in custom locations
· Getting help on functions and packages
· Computations in R
· Data structures
· Vectors
· Matrices
· Data Frames
· Lists
· Factors
· Data types
· Reading and writing data
· Reading large files
· Plotting in R with base graphics
· Combining multiple plots
· Saving plots
· Plotting in R with ggplot
· Combining multiple plots
· ggplot and tidyverse
· Functions and control structures (for, if/else etc)
· User defined functions
· Loops and looping structures in R
· Exercises
    · Computations in R
    · Data structures in R
    · Reading in and writing data out in R
    · Plotting in R
    · Functions and control structures (for, if/else etc)

3. Statistics for Genomics
· How to summarize collection of data points: The idea behind statistical distributions
· Describing the central tendency: mean and median
· Describing the spread: measurements of variation
· Precision of estimates: Confidence intervals
· How to test for differences between samples
· Randomization based testing for difference of the means
· Using t-test for difference of the means between two samples
· Multiple testing correction
· Moderated t-tests: using information from multiple comparisons
· Relationship between variables: linear models and correlation
· How to fit a line
· How to estimate the error of the coefficients
· Accuracy of the model
· Regression with categorical variables
· Regression pitfalls
· Exercises
    · How to summarize collection of data points: The idea behind statistical distributions
    · How to test for differences in samples
    · Relationship between variables: linear models and correlation

4. Exploratory Data Analysis with Unsupervised Machine Learning
· Clustering: grouping samples based on their similarity
· Distance metrics
· Hierarchical clustering
· K-means clustering
· How to choose “k”, the number of clusters
· Dimensionality reduction techniques: visualizing complex data sets in 2D
· Principal component analysis
· Other matrix factorization methods for dimensionality reduction
· Multi-dimensional scaling
· t-Distributed Stochastic Neighbor Embedding (t-SNE)
· Exercises
    · Clustering
    · Dimension Reduction

5. Predictive Modeling with Supervised Machine Learning
· How machine learning models are fit?
· Machine learning vs Statistics
· Steps in supervised machine learning
· Use case: Disease subtype from genomics data
· Data preprocessing
· Data transformation
· Filtering data and scaling
· Dealing with missing values
· Splitting the data
· Holdout test dataset
· Cross-validation
· Bootstrap resampling
· Predicting the subtype with k-nearest neighbors
· Assessing the performance of our model
· Receiver Operating Characteristic (ROC) Curves
· Model tuning and avoiding overfitting
· Model complexity and bias variance trade-off
· Data split strategies for model tuning and testing
· Variable importance
· How to deal with class imbalance
· Sampling for class balance
· Altering case weights
· Selecting different classification score cutoffs
· Dealing with correlated predictors
· Trees and forests: Random forests in action
· Decision trees
· Trees to forests
· Variable importance
· Logistic regression and regularization
· Regularization in order to avoid overfitting
· Variable importance
· Other supervised algorithms
· Gradient boosting
· Support Vector Machines (SVM)
· Neural networks and deep versions of it
· Ensemble learning
· Predicting continuous variables: regression with machine learning
· Use case: Predicting age from DNA methylation
· Reading and processing the data
· Running random forest regression
· Exercises
    · Classification
    · Regression

6. Operations on Genomic Intervals and Genome Arithmetic
· Operations on Genomic Intervals with GenomicRanges package
· How to create and manipulate a GRanges object
· Getting genomic regions into R as GRanges objects
· Finding regions that do/do not overlap with another set of regions
· Dealing with mapped high-throughput sequencing reads
· Counting mapped reads for a set of regions
· Dealing with continuous scores over the genome
· Extracting subsections of Rle and RleList objects
· Genomic intervals with more information: SummarizedExperiment class
· Create a SummarizedExperiment object
· Subset and manipulate the SummarizedExperiment object
· Visualizing and summarizing genomic intervals
· Visualizing intervals on a locus of interest
· Summaries of genomic intervals on multiple loci
· Making karyograms and circos plots
· Exercises
    · Operations on Genomic Intervals with GenomicRanges package
    · Dealing with mapped high-throughput sequencing reads
    · Dealing with contiguous scores over the genome
    · Visualizing and summarizing genomic intervals

7. Quality Check, Processing and Alignment of High-throughput Sequencing Reads
· FASTA and FASTQ formats
· Quality check on sequencing reads
· Sequence quality per base/cycle
· Sequence content per base/cycle
· Read frequency plot
· Other quality metrics and QC tools
· Filtering and trimming reads
· Mapping/aligning reads to the genome
· Further processing of aligned reads
· Exercises

8. RNA-seq Analysis
· What is gene expression?
· Methods to detect gene expression
· Gene Expression Analysis Using High-throughput Sequencing Technologies
· Processing raw data
· Alignment
· Quantification
· Within sample normalization of the read counts
· Computing different normalization schemes in R
· Exploratory analysis of the read count table
· Differential expression analysis
· Functional Enrichment Analysis
· Accounting for additional sources of variation
· Other applications of RNA-seq
· Exercises
    · Exploring the count tables
    · Differential expression analysis
    · Functional enrichment analysis
    · Removing unwanted variation from the expression data

9. ChIP-seq analysis
· Regulatory protein-DNA interactions
· Measuring protein-DNA interactions with ChIP-seq
· Factors that affect ChIP-seq experiment and analysis quality
· Antibody specificity
· Sequencing depth
· PCR duplication
· Biological replicates
· Control experiments
· Using tagged proteins
· Pre-processing ChIP data
· Mapping of ChIP-seq data
· ChIP quality control
· The data
· Sample clustering
· Visualization in the Genome Browser
· Plus and minus strand cross-correlation
· GC bias quantification
· Sequence read genomic distribution
· Peak calling
· Types of ChIP-seq experiments
· Peak calling - sharp peaks
· Peak calling - Broad regions
· Peak quality control
· Peak annotation
· Motif discovery
· Motif comparison
· What to do next?
· Exercises
    · Quality control

10. DNA methylation analysis using bisulfite sequencing data
· What is DNA methylation?
· How DNA methylation is set?
· How to measure DNA methylation with bisulfite sequencing
· Analyzing DNA methylation data
· Processing raw data and getting data into R
· Data filtering and exploratory analysis
· Reading methylation call files
· Further quality check
· Merging samples into a single table
· Filtering CpGs
· Clustering samples
· Principal component analysis
· Extracting interesting regions: segmentation and differential methylation
· Differential methylation
· Methylation segmentation
· Working with large files
· Annotation of DMRs/DMCs and segments
· Further annotation with genes or gene sets
· Other R packages that can be used for methylation analysis
· Exercises
    · Differential methylation
    · Methylome segmentation

11. Multi-omics Analysis
· Use case: Multi-omics data from colorectal cancer
· Latent variable models for multi-omics integration
· Matrix factorization methods for unsupervised multi-omics data integration
· Multiple Factor Analysis
· Joint Non-negative Matrix Factorization
· iCluster
· Clustering using latent factors
· One-hot clustering
· K-means clustering
· Biological interpretation of latent factors
· Inspection of feature weights in loading vectors
· Making sense of factors using enrichment analysis
· Interpretation using additional covariates
· Exercises
    · Matrix factorization methods
    · Clustering using latent factors
    · Biological interpretation of latent factors