Introduction to Data Mining
Introduction to Data Mining presents fundamental concepts and algorithms for those learning data mining for the first time. Each concept is explored thoroughly and supported with numerous examples. The text requires only a modest background in mathematics. Each major topic is organized into two chapters, beginning with basic concepts that provide necessary background for understanding each data mining technique, followed by more advanced concepts and algorithms.

Quotes:

"This book provides a comprehensive coverage of important data mining techniques. Numerous examples are provided to lucidly illustrate the key concepts." -Sanjay Ranka, University of Florida

"In my opinion this is currently the best data mining text book on the market. I like the comprehensive coverage which spans all major data mining techniques including classification, clustering, and pattern mining (association rules)." -Mohammed Zaki, Rensselaer Polytechnic Institute -
Data Mining
As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references. Highlights of the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; algorithmic methods at the heart of successful data mining, including tried and true techniques as well as leading edge methods; performance improvement techniques that work by transforming the input or output; downloadable Weka, a collection of machine learning algorithms for data mining tasks, including tools for data pre-processing, classification, regression, clustering, association rules, and visualization, in a new, interactive interface; and much more. -
Mining of Massive Datasets
The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike. -
Beautiful Data
In this insightful book, you'll learn from the best data practitioners in the field just how wide-ranging - and beautiful - working with data can be. Join 39 contributors as they explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video. With "Beautiful Data", you will explore the opportunities and challenges involved in working with the vast number of datasets made available by the Web; learn how to visualize trends in urban crime, using maps and data mashups; discover the challenges of designing a data processing system that works within the constraints of space travel; learn how crowdsourcing and transparency have combined to advance the state of drug research; understand how new data can automatically trigger alerts when it matches or overlaps pre-existing data; and learn about the massive infrastructure required to create, capture, and process DNA data. That's only a small sample of what you'll find in "Beautiful Data". For anyone who handles data, this is a truly fascinating book. Contributors include: Nathan Yau; Jonathan Follett and Matt Holm; J.M. Hughes; Raghu Ramakrishnan, Brian Cooper, and Utkarsh Srivastava; Jeff Hammerbacher; Jason Dykes and Jo Wood; Jeff Jonas and Lisa Sokol; Jud Valeski; Alon Halevy and Jayant Madhavan; Aaron Koblin and Valdean Klump; Michal Migurski; Jeff Heer; Coco Krumme; Peter Norvig; Matt Wood and Ben Blackburne; Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen; Lukas Biewald and Brendan O'Connor; Hadley Wickham, Deborah Swayne, and David Poole; Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza; and Toby Segaran. -
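Web Data Mining
Web Data Mining aims to describe these tasks and their core mining algorithms. It covers each topic as broadly as possible and gives enough detail that readers can gain reasonably complete knowledge of the algorithms and techniques without additional reading. Four chapters, on structured data extraction, information integration, opinion mining, and Web usage mining, are distinctive features of the book; these topics are not covered in existing books, yet they hold a very important place in Web data mining. The traditional Web mining topics, such as search, page crawling and resource discovery, and link analysis, are also described in detail. Although titled "Web Data Mining", the book still covers the core topics of data mining and information retrieval, because Web mining makes heavy use of their algorithms and techniques. The data mining part consists mainly of the three most important data mining tasks, namely association rules and sequential patterns, supervised learning (classification), and unsupervised learning (clustering), together with the relatively advanced topic of semi-supervised learning. For information retrieval, the core topics most important to Web mining are all discussed. -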
Learning From Data
Machine learning allows computational systems to adaptively improve their performance with experience accumulated from the observed data. Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title 'learning from data' that faithfully describes what the subject is about, and made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover.

Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems.

Learning from data is a very dynamic field. Some of the hot techniques and theories at times become just fads, and others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable him or her to venture out and explore further techniques and theories, or perhaps to contribute their own.

The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.