-
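The Beauty of Mathematics (数学之美), Second Edition
A few years ago, the "Beauty of Mathematics" series of articles first appeared on 谷歌黑板报, Google's Chinese-language blog, where it drew more than a million views and high praise from readers. Readers said that only after reading the series did they find that the mathematics they had learned at university, such as Markov chains, matrix computation, and even the cosine function, could feel so approachable and vivid, and that natural language and information processing could be so interesting. In writing the print edition, the author, Dr. Wu Jun (吴军), rewrote nearly every article in order to explain the underlying mathematical principles more accessibly, so that non-specialist readers can also appreciate the appeal of mathematics. Through concrete examples, readers learn a way of thinking about problems: how to reduce the complex to the simple, how to use mathematics to solve engineering problems, and how to step outside fixed habits of thought and keep innovating. The second edition adds material on big data and machine learning to meet readers' demand for learning current technology; it also corrects errors on the basis of feedback from experts and readers and updates some of the content. The first edition of 数学之美 won the 8th Wenjin Book Award of the National Library of China; was selected by China's media regulator (广电总局) for its "2014 list of one hundred outstanding books recommended to young people nationwide"; and was named an industry-wide outstanding bestseller for 2012-2013. The latest work by Dr. Wu Jun, author of 浪潮之巅 and 文明之光, with a foreword by Kai-Fu Lee (李开复), and a million views on the Google blog. -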
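Mining the Social Web (社交网站的数据挖掘与分析)
Facebook, Twitter, and LinkedIn generate a large amount of valuable social data, but how can you find out who is connecting through social media, what they are talking about, or where they are? This concise and practical book shows how to answer these questions and more. You will learn how to combine social web data, analysis techniques, and visualization to find what you have been looking for in the social world, as well as useful information you did not know existed. Each standalone chapter introduces techniques for mining data in a different area of the social web, including blogs and email. All you need is some programming experience and a willingness to learn basic Python tools. • Gain an intuitive understanding of the social web landscape • Use adaptable scripts on GitHub to harvest data from social network APIs such as Twitter, Facebook, and LinkedIn • Learn how to apply easy-to-use Python tools to slice and dice the data you collect • Explore microformat-based social connections with the XHTML Friends Network (XFN) • Apply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detection • Build interactive visualizations with web technologies based on HTML5 and JavaScript toolkits -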
Introduction to Information Retrieval
Class-tested and coherent, this groundbreaking new textbook teaches classic web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also create a buzz for researchers and professionals alike.
Contents: 1. Information retrieval using the Boolean model; 2. The dictionary and postings lists; 3. Tolerant retrieval; 4. Index construction; 5. Index compression; 6. Scoring and term weighting; 7. Vector space retrieval; 8. Evaluation in information retrieval; 9. Relevance feedback and query expansion; 10. XML retrieval; 11. Probabilistic information retrieval; 12. Language models for information retrieval; 13. Text classification and Naive Bayes; 14. Vector space classification; 15. Support vector machines and kernel functions; 16. Flat clustering; 17. Hierarchical clustering; 18. Dimensionality reduction and latent semantic indexing; 19. Web search basics; 20. Web crawling and indexes; 21. Link analysis.
Reviews: "This is the first book that gives you a complete picture of the complications that arise in building a modern web-scale search engine. You'll learn about ranking SVMs, XML, DNS, and LSI. You'll discover the seedy underworld of spam, cloaking, and doorway pages. You'll see how MapReduce and other approaches to parallelism allow us to go beyond megabytes and to efficiently manage petabytes." -Peter Norvig, Director of Research, Google Inc. "Introduction to Information Retrieval is a comprehensive, up-to-date, and well-written introduction to an increasingly important and rapidly growing area of computer science. Finally, there is a high-quality textbook for an area that was desperately in need of one." -Raymond J. Mooney, Professor of Computer Sciences, University of Texas at Austin "Through compelling exposition and choice of topics, the authors vividly convey both the fundamental ideas and the rapidly expanding reach of information retrieval as a field." -Jon Kleinberg, Professor of Computer Science, Cornell University -
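Programming Collective Intelligence (集体智慧编程)
Want to explore the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This engaging book shows you how to build Web 2.0 applications that mine the large volumes of data produced by participatory Internet applications. Using the advanced algorithms in this book, you can write smart programs that access interesting datasets from other websites, collect data from the users of your own applications, and analyze and understand the data you find. Programming Collective Intelligence takes you into the world of machine learning and statistics and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior from the information that you and others collect every day. Each algorithm is described clearly and concisely, and the accompanying code can be used immediately on your website, blog, wiki, or specialized application. The book covers the following topics: • Collaborative filtering techniques that let online retailers recommend products or media • Clustering methods for finding groups of similar items in large datasets • Optimization algorithms that choose the best solution to a problem from millions of candidates • Bayesian filtering, used in spam filtering based on word types and other features • Support vector machines, used for matchmaking on online dating sites • Evolving intelligence for problem solving: how a computer improves its own code and gains skill by playing the same game many times. Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you. -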
Statistical Decision Theory and Bayesian Analysis
In this new edition the author has added substantial material on Bayesian analysis, including lengthy new sections on such important topics as empirical and hierarchical Bayes analysis, Bayesian calculation, Bayesian communication, and group decision making. With these changes, the book can be used as a self-contained introduction to Bayesian analysis. In addition, much of the decision-theoretic portion of the text was updated, including new sections covering such modern topics as minimax multivariate (Stein) estimation. -
Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning)
Handling inherent uncertainty and exploiting compositional structure are fundamental to understanding and designing large-scale systems. Statistical relational learning builds on ideas from probability theory and statistics to address uncertainty while incorporating tools from logic, databases and programming languages to represent structure. In Introduction to Statistical Relational Learning, leading researchers in this emerging area of machine learning describe current formalisms, models, and algorithms that enable effective and robust reasoning about richly structured systems and data. The early chapters provide tutorials for material used in later chapters, offering introductions to representation, inference and learning in graphical models, and logic. The book then describes object-oriented approaches, including probabilistic relational models, relational Markov networks, and probabilistic entity-relationship models as well as logic-based formalisms including Bayesian logic programs, Markov logic, and stochastic logic programs. Later chapters discuss such topics as probabilistic models with unknown objects, relational dependency networks, reinforcement learning in relational domains, and information extraction. By presenting a variety of approaches, the book highlights commonalities and clarifies important differences among proposed approaches and, along the way, identifies important representational and algorithmic issues. Numerous applications are provided throughout. Lise Getoor is Assistant Professor in the Department of Computer Science at the University of Maryland. Ben Taskar is Assistant Professor in the Computer and Information Science Department at the University of Pennsylvania.