Rating:  Summary: Very clear, but misses some key real-world issues Review: As others have said, MG is a good introductory text for Information Retrieval. However, I think it spends a little too much time on compression techniques and lacks a good discussion of incremental or on-line indexing. The book tends to assume that the set of texts to be searched is static; if new documents can be added or old ones deleted, the whole problem becomes much harder and many of MG's techniques are no longer relevant. That said, I eagerly look forward to Managing Terabytes (if it ever appears).
Rating:  Summary: Well written, with plenty of nuts and bolts Review: I found MG exceedingly readable, and particularly useful. The ideas are very well explained, and the problems are solved in a stepwise fashion, leading from a simple, inefficient solution to a more complex, efficient one. Where appropriate, pseudocode is included to communicate the algorithms unambiguously. I use the free MG software in my research on information retrieval, and this book is an indispensable supplement to the software. The ideas on compression and efficiency described in the book and implemented in the software are the best that I know of in the public domain, and I've looked!
Rating:  Summary: The best book on information retrieval I've read. Review: Managing Gigabytes (MG) is by far the best book on IR that I've read. The writing is clear, to the point, detailed, and leads one from the abstract to the implementation. Of particular note is the attention spent on maintaining compressed indices.
Rating:  Summary: Compression, Algorithms, Full Text Retrieval Review: Managing Gigabytes is a must-read for anyone interested in how to transmit, access, store, and search large amounts of data. I'm the President and CTO of Aladdin Systems, Inc., the creators of the StuffIt compression product line for Mac and Windows, and I find it an invaluable addition to my reference library. The authors take complex information and present it in an organized, easy-to-read format, suitable for novices and experts alike. I highly recommend this book.
Rating:  Summary: Great Book on Information Retrieval Review: Managing Gigabytes is the best book out there on information retrieval. If you're interested in implementing your own IR system, there's nothing available that comes close to this book. But the book is good not just because it's the only one out there: the writing is excellent, the algorithms are presented clearly and explained well, and the coverage is thorough. Additionally, the coverage of compression algorithms is the best I've found in any book. All algorithms and pseudocode in the book are presented clearly enough that any competent programmer should be able to implement them. If all else fails, however, the free downloadable source code for the mg system can fill in any gaps. All in all, this is the best computer science book I've purchased in years. I wish all CS books were written like this one: it doesn't skimp on the theory or on the implementation details.
Rating:  Summary: Good introduction to searching/indexing in data. Review: MG gave a good introduction to the components of practical Information Retrieval (IR). You can clearly see that the authors have a genuine interest in the field! But I would like some more theoretical analysis of the algorithms used (e.g., big-O notation), and more focus on parallel implementations of IR systems. Another book related to the same area worth mentioning is "Modern Information Retrieval".
Rating:  Summary: This is an ideal entry text for IR-related study. Review: This is a hands-on textbook. That is, the writing is easy to understand. It doesn't give you tons of references to papers you must retrieve before you can move on to the next section. On the other hand, it doesn't cover everything, but it shows you at least one way to accomplish each task. You will need other books to get detailed information, but this is the best starting point.
Rating:  Summary: This is a great book. Review: This is one of those rare books that succeeds both on a theoretical and practical level. The theory underlying management and retrieval of large collections of mixed text and image data is thoroughly covered. The authors' experience in developing the accompanying software shines through in the clarity of their explanations and enables them to give practical information regarding the techniques discussed. The software is not just of academic interest, either - an appendix describes a digital library, accessible over the web, that is supported by the mg software. In summary, this is a great book - readable, thorough and practical.
Rating:  Summary: Great Book on Information Retrieval Review: This is the only book there is that will actually teach you how to build an information retrieval system (aka search engine). It discusses all the algorithms and tradeoffs, and comes with free downloadable source code to experiment with. Some of the material is standard, but covered in more implementation detail here than anywhere else. Some of the material is novel: you won't find better coverage of compression unless you hand-assemble twenty research papers and reverse-engineer them to figure out how they're implemented. But with "Managing Gigabytes", it's all here. (Although, after a particularly invigorating discussion of how to string together a bunch of techniques to compress their corpus and save a couple of hundred MB, I did a check and found you could buy 512MB of RAM for less than the cost of the book. Knowledge is Power, but sometimes a little cash is more powerful.) The only negative is that this book is not called "Managing Terabytes", as the first edition promised/threatened it might be. RAM and disk are cheap, but not that cheap, and for now terabytes (and sometimes petabytes) are managed only by NASA, Google, and a few others. I can't wait to see the third edition!