Introduction to Information Retrieval, Chinese edition (信息检索导论)
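The cover image shows the Selfridges department store in Birmingham, England. Its strongly linear silhouette is graceful and flows like rippling water. The exterior is hung with 15,000 aluminium discs, which create a distinctly modern textured surface that shimmers like moonlit water under a night sky and gives full expression to the building's commercial character. The British architecture firm Future Systems, which designed the building, arranged the store's interior around a top-lit atrium crossed by escalators. This gives the shopping space a cohesive, centripetal pull and turns it into a showcase for advertising. As a landmark of Birmingham, Britain's second commercial city, the building has been called "the department store of the future." For its avant-garde design it won several awards in 2004, including a Royal Institute of British Architects (RIBA) award and an award from the Royal Fine Art Commission.

Written from a computer science perspective, this book introduces the fundamentals of information retrieval, reviews recent developments in the field, and focuses on the core technologies behind search engines, such as document classification and document clustering, along with machine learning and numerical methods. Every important idea is explained with examples, which makes the book vivid and engaging and ties theory closely to practice.

All three authors are leading experts in information retrieval: two come from academia and one from industry in Silicon Valley, so the book combines a solid theoretical foundation with state-of-the-art practice. On publication it was immediately recognized as an authoritative work in the field, and it has since been adopted as the textbook for information retrieval courses at many leading universities worldwide.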
Introduction to Information Retrieval
Class-tested and coherent, this groundbreaking new textbook teaches web-era information retrieval from basic concepts, including web search and the related areas of text classification and text clustering. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it ideal for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also interest researchers and professionals alike.

Contents
1. Information retrieval using the Boolean model
2. The dictionary and postings lists
3. Tolerant retrieval
4. Index construction
5. Index compression
6. Scoring and term weighting
7. Vector space retrieval
8. Evaluation in information retrieval
9. Relevance feedback and query expansion
10. XML retrieval
11. Probabilistic information retrieval
12. Language models for information retrieval
13. Text classification and Naive Bayes
14. Vector space classification
15. Support vector machines and kernel functions
16. Flat clustering
17. Hierarchical clustering
18. Dimensionality reduction and latent semantic indexing
19. Web search basics
20. Web crawling and indexes
21. Link analysis

Reviews
“This is the first book that gives you a complete picture of the complications that arise in building a modern web-scale search engine. You'll learn about ranking SVMs, XML, DNS, and LSI. You'll discover the seedy underworld of spam, cloaking, and doorway pages. You'll see how MapReduce and other approaches to parallelism allow us to go beyond megabytes and to efficiently manage petabytes.”
- Peter Norvig, Director of Research, Google Inc.

“Introduction to Information Retrieval is a comprehensive, up-to-date, and well-written introduction to an increasingly important and rapidly growing area of computer science. Finally, there is a high-quality textbook for an area that was desperately in need of one.”
- Raymond J. Mooney, Professor of Computer Sciences, University of Texas at Austin

“Through compelling exposition and choice of topics, the authors vividly convey both the fundamental ideas and the rapidly expanding reach of information retrieval as a field.”
- Jon Kleinberg, Professor of Computer Science, Cornell University
Baidu: Truly Focused! (百度:如此专注!)
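"Focus: The Story of Baidu's Success" (专注:百度成功的故事) opens on August 5, 2005, when Baidu (NASDAQ: BIDU), China's largest internet search provider, listed on NASDAQ and issued 4,040,402 American Depositary Shares. On its IPO day the stock rose 354%, ranking 18th by first-day gain in US IPO history and setting a new record for a foreign company's IPO. From then on, Baidu had people going wild. The book traces the life of Robin Li (李彦宏), the legendary returnee from overseas, and shows how he blended Chinese and Western culture to shape Baidu's destiny. Why did Baidu go public? What became of Baidu, billed as "the Google of China," after its listing? Some call Baidu's IPO a milestone for the Chinese internet, but does going public really mean success? What choice will Robin Li face: keep pushing the share price up, or invest in technology development? And amid the frenzy, what sober reflections does Baidu prompt among IT professionals in China and abroad?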
Text Mining (文本挖掘)
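Text Mining (English edition) is a classic in the field of text mining, written by world-renowned authorities. It covers core text mining operations, text preprocessing techniques, categorization, clustering, information extraction, probabilistic models for information extraction, preprocessing applications, visualization approaches, link analysis, and text mining applications, and it combines text mining theory and practice well. The book suits researchers and practitioners in text mining and information retrieval, and it also works as a textbook for graduate courses in data mining and knowledge discovery in computer science and related disciplines.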
Inside Search Engines (深入搜索引擎)
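Inside Search Engines: Compressing, Indexing and Querying Massive Information (深入搜索引擎:海量信息的压缩、索引和查询) is one of the preferred textbooks for Stanford University's information retrieval and mining course and has become a standard information retrieval textbook at major universities worldwide. Balancing theory and practice, it presents, clearly and accessibly, a complete set of solutions for processing massive volumes of information, covering every aspect of compression, indexing, and querying. Its distinguishing feature is that it does more than support the study of information retrieval theory: more importantly, it sets out the problems likely to arise in practice and how to solve them.

Because it is used as a university course text, the book is fairly demanding. It is aimed mainly at senior undergraduates and graduate students in information retrieval, search engine engineers in industry, and technical staff working on large-scale data processing.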
Lucene in Action, Second Edition
HIGHLIGHT
New edition of the top-selling book on the new version of Lucene, the core open-source technology behind most full-text search and "Intelligent Web" applications.

DESCRIPTION
When Lucene first hit the scene five years ago, it was nothing short of amazing. By using this open-source, highly scalable, super-fast search engine, developers could integrate search into applications quickly and efficiently. A lot has changed since then: search has grown from a "nice-to-have" feature into an indispensable part of most enterprise applications. Lucene now powers search at organizations as diverse as Akamai, Netflix, LinkedIn, Technorati, HotJobs, Epiphany, FedEx, Mayo Clinic, MIT, New Scientist Magazine, and many others.

Some things remain the same, though. Lucene still delivers high-performance search features through a disarmingly easy-to-use API. Thanks to its vibrant and diverse open-source community of developers and users, Lucene is relentlessly improving, with evolving APIs, significant new features such as payloads, and a huge increase (as much as 8x) in indexing speed with Lucene 2.3. With clear writing, reusable examples, and unmatched advice on best practices, Lucene in Action, Second Edition is still the definitive guide to developing with Lucene.

KEY POINTS
* Completely revised and updated to the current Lucene 2.3 APIs.
* Practical coverage, such as how to index MS Word, PDF, HTML, and XML.
* Full introduction to Intelligent Web topics like smart searching, sorting, and filtering.