Introduction to Data Mining
Introduction to Data Mining presents fundamental concepts and algorithms for those learning data mining for the first time. Each concept is explored thoroughly and supported with numerous examples. The text requires only a modest background in mathematics. Each major topic is organized into two chapters, beginning with basic concepts that provide necessary background for understanding each data mining technique, followed by more advanced concepts and algorithms.
Quotes:
"This book provides a comprehensive coverage of important data mining techniques. Numerous examples are provided to lucidly illustrate the key concepts." -Sanjay Ranka, University of Florida
"In my opinion this is currently the best data mining text book on the market. I like the comprehensive coverage which spans all major data mining techniques including classification, clustering, and pattern mining (association rules)." -Mohammed Zaki, Rensselaer Polytechnic Institute
Data Mining
As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references. Highlights of the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; algorithmic methods at the heart of successful data mining, including tried and true techniques as well as leading-edge methods; performance improvement techniques that work by transforming the input or output; downloadable Weka, a collection of machine learning algorithms for data mining tasks, including tools for data pre-processing, classification, regression, clustering, association rules, and visualization, all in a new, interactive interface; and much more.
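Social Computing (社会计算)
The past decade has witnessed the emergence of the participatory Web and social media, which connect people in many creative ways. Today, thousands upon thousands of users are playing, tagging, working, and socializing online, and collaboration, communication, and intelligence are taking unprecedented new forms. The rise of social media has driven changes in business models, influenced how opinions and emotions are communicated, and opened up countless opportunities to study human interaction and collective behavior at scale. This book introduces the characteristics of social media from a data mining perspective, reviews representative work in social media computing, and describes the challenges that social media poses. It introduces basic concepts and uses accessible examples to present state-of-the-art and effective evaluation methods. In particular, it discusses graph-based community detection techniques and extends them in important ways to handle the dynamic, heterogeneous networks found in social media. It also shows how the discovered community patterns can be used for social media mining. The concepts, algorithms, and methods in this book can help people make better use of social media and support the building of socially intelligent systems. The book is an introductory text on community detection and mining in social media, suitable for students, researchers, and practitioners in data-centric social media disciplines. The book's website, http://dmml.asu.edu/cdm/, provides lecture slides, all figures from the book, the main references, some small datasets used in the book, and source code for several representative algorithms.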
Machine Learning for Hackers
Now that storage and collection technologies are cheaper and more precise, methods for extracting relevant information from large datasets are within the reach of any experienced programmer willing to crunch data. With this book, you'll learn machine learning and statistics tools in a practical fashion, using black-box solutions and case studies instead of a traditional math-heavy presentation. By exploring each problem in this book in depth, including both viable and hopeless approaches, you'll learn to recognize when your situation closely matches traditional problems. Then you'll discover how to apply classical statistics tools to your problem. Machine Learning for Hackers is ideal for programmers from private, public, and academic sectors.
Mining of Massive Datasets
The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and that can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.
Scaling up Machine Learning
This book presents an integrated collection of representative approaches for scaling up machine learning and data mining methods on parallel and distributed computing platforms. Demand for parallelizing learning algorithms is highly task-specific: in some settings it is driven by the enormous dataset sizes, in others by model complexity or by real-time performance requirements. Making task-appropriate algorithm and platform choices for large-scale machine learning requires understanding the benefits, trade-offs and constraints of the available options. Solutions presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters, concurrent programming frameworks including CUDA, MPI, MapReduce and DryadLINQ, and learning settings (supervised, unsupervised, semi-supervised and online learning). Extensive coverage of parallelization of boosted trees, SVMs, spectral clustering, belief propagation and other popular learning algorithms and deep dives into several applications make the book equally useful for researchers, students and practitioners.