Data Mining with R
The versatile capabilities and large set of add-on packages make R an excellent alternative to many existing and often expensive data mining tools. Exploring this area from the perspective of a practitioner, Data Mining with R: Learning with Case Studies uses practical examples to illustrate the power of R and data mining. Assuming no prior knowledge of R or data mining/statistical techniques, the book covers a diverse set of problems that pose different challenges in terms of size, type of data, goals of analysis, and analytical tools. To present the main data mining processes and techniques, the author takes a hands-on approach that utilizes a series of detailed, real-world case studies:
* Predicting algae blooms
* Predicting stock market returns
* Detecting fraudulent transactions
* Classifying microarray samples
With these case studies, the author supplies all necessary steps, code, and data.
Web Resource
A supporting website mirrors the do-it-yourself approach of the text. It offers a collection of freely available R source files that encompass all the code used in the case studies. The site also provides the data sets from the case studies as well as an R package of several functions.
Introduction to Scientific Programming and Simulation Using R
Known for its versatility, the free programming language R is widely used for statistical computing and graphics, but is also a fully functional programming language well suited to scientific programming. An Introduction to Scientific Programming and Simulation Using R teaches the skills needed to perform scientific programming while also introducing stochastic modelling. Stochastic modelling in particular, and mathematical modelling in general, are intimately linked to scientific programming because the numerical techniques of scientific programming enable the practical application of mathematical models to real-world problems. Following a natural progression that assumes no prior knowledge of programming or probability, the book is organised into four main sections:
* Programming In R starts with how to obtain and install R (for Windows, MacOS, and Unix platforms), then tackles basic calculations and program flow, before progressing to function-based programming, data structures, graphics, and object-oriented code
* A Primer on Numerical Mathematics introduces concepts of numerical accuracy and program efficiency in the context of root-finding, integration, and optimization
* A Self-contained Introduction to Probability Theory takes readers as far as the Weak Law of Large Numbers and the Central Limit Theorem, equipping them for point and interval estimation
* Simulation teaches how to generate univariate random variables, perform Monte Carlo integration, and apply variance reduction techniques
In the last section, stochastic modelling is introduced using extensive case studies on epidemics, inventory management, and plant dispersal. A tried and tested pedagogic approach is employed throughout, with numerous examples, exercises, and a suite of practice projects. Unlike most guides to R, this volume is not about the application of statistical techniques, but rather shows how to turn algorithms into code. It is for those who want to make tools, not just use them.
Time Series Analysis
Time Series Analysis With Applications in R, Second Edition, presents an accessible approach to understanding time series models and their applications. Although the emphasis is on time domain ARIMA models and their analysis, the new edition devotes two chapters to the frequency domain and three to time series regression models, models for heteroscedasticity, and threshold models. All of the ideas and methods are illustrated with both real and simulated data sets. A unique feature of this edition is its integration with the R computing environment. The tables and graphical displays are accompanied by the R commands used to produce them. An extensive R package, TSA, which contains many new or revised R functions and all of the data used in the book, accompanies the written text. Script files of R commands for each chapter are available for download. There is also an extensive appendix in the book that leads the reader through the use of R commands and the new R package to carry out the analyses.
Introductory Time Series with R
Yearly global mean temperature and ocean levels, daily share prices, and the signals transmitted back to Earth by the Voyager spacecraft are all examples of sequential observations over time known as time series. This book gives you a step-by-step introduction to analysing time series using the open source software R. Each time series model is motivated with practical applications and is defined in mathematical notation. Once the model has been introduced, it is used to generate synthetic data using R code, and these generated data are then used to estimate its parameters. This sequence enhances understanding of both the time series model and the R function used to fit the model to data. Finally, the model is used to analyse observed data taken from a practical application. By using R, the whole procedure can be reproduced by the reader. All the data sets used in the book are available on the website http://www.massey.ac.nz/~pscowper/ts. The book is written for undergraduate students of mathematics, economics, business and finance, geography, engineering and related disciplines, and postgraduate students who may need to analyse time series as part of their taught programme or their research.
Paul Cowpertwait is a senior lecturer in statistics at Massey University with a substantial research record in both the theory and applications of time series and stochastic models. Andrew Metcalfe is an associate professor in the School of Mathematical Sciences at the University of Adelaide, and an author of six statistics textbooks and numerous research papers. Both authors have extensive experience of teaching time series to students at all levels.
Bioconductor Case Studies (Use R)
Bioconductor software has become a standard tool for the analysis and comprehension of data from high-throughput genomics experiments. Its application spans a broad field of technologies used in contemporary molecular biology. In this volume, the authors present a collection of cases that apply Bioconductor tools to the analysis of microarray gene expression data. Topics covered include: (1) import and preprocessing of data from various sources; (2) statistical modeling of differential gene expression; (3) biological metadata; (4) application of graphs and graph rendering; (5) machine learning for clustering and classification problems; (6) gene set enrichment analysis. Each chapter of this book describes an analysis of real data using a hands-on, example-driven approach. Short exercises help in the learning process and invite more advanced considerations of key topics. The book is a dynamic document. All the code shown can be executed on a local computer, and readers are able to reproduce every computation, figure, and table.
The R Language: Practical Data Analysis and Visualization Techniques (R语言:实用数据分析和可视化技术)
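A senior data expert distills decades of teaching and practical experience into a thorough account of how to use 20% of R's functionality to accomplish 80% of modern data work. The book explains all the fundamentals of the R language in a simple, direct way, along with how common statistical methods and models are carried out in R. Through a large number of examples, it helps readers quickly understand and master R's core functionality and solve practical problems effectively. The book has 24 chapters:
* Chapters 1 to 3 cover obtaining and installing R, setting up the R environment, and the basics of R packages
* Chapters 4 and 5 cover R fundamentals and advanced data structures, including mathematical operations, vectors, function calls, data frames, lists, matrices, and arrays
* Chapter 6 covers importing data
* Chapter 7 covers statistical graphics in detail, including base graphics and ggplot2
* Chapters 8 to 10 cover writing R functions, including their structure, arguments, and return rules; controlling program flow with if, ifelse, and compound statements; and iteration with for and while loops
* Chapters 11 to 13 cover group-wise operations, data reshaping, and string manipulation
* Chapters 14 and 15 cover probability distributions and descriptive statistics
* Chapters 16 to 20 cover linear models, generalized linear models, model diagnostics, regularization and shrinkage, and nonlinear models
* Chapter 21 covers time series and autocorrelation
* Chapter 22 covers clustering methods, including K-means and hierarchical clustering
* Chapter 23 discusses reproducibility, reports, and slide presentations with knitr
* Chapter 24 covers how to build R packages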