Enterprise Search Sourcebook 2008 - (Page SMS_33)

crawlers have to know how to do incremental updates from the source repository, either by keeping their own database of documents they have seen before or by keeping some kind of high-water mark.

Most enterprise search vendors supply crawlers that use a modular pipeline approach. One component interacts directly with the source repository API. In the case of a web crawler, this component makes HTTP requests and retrieves webpages. It then parses each webpage for the links it contains and adds them to the list of pages to download and index. File system crawlers are much simpler: they usually walk a directory structure recursively and do not bother to parse the files for links.

Document management system crawlers, on the other hand, use a variety of techniques. The most common is to use the native API of the document management system and retrieve documents according to rules set forth in the crawler configuration specific to that repository. Another technique is to create a simple web front-end page that contains a site map and templates, which render the document management repository as a set of simple webpages. These webpages can then be crawled by the HTTP module.

Finally, databases are the most flexible of all data sources. Since good normalized database design is an impedance mismatch with good full-text index design, your well-normalized database must be denormalized into a flat table of “documents.” There are three general techniques for database indexing:

1. Build SQL or stored procedures on the database side, which denormalize the tables into a single “documents” table.
2. Use database connectivity from your programming language (e.g., JDBC for Java) to retrieve the individual component tables and do the “join” in your program code.
3. Build HTML or XML templates hosted on a website, which render the database information as “pages” that can be crawled by the HTTP crawler.
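To make the database case concrete, here is a minimal sketch of the idea in Python, using the standard sqlite3 module. It is closest to the first technique (the join is done in SQL), and it adds a high-water mark so that a repeat run picks up only rows changed since the last crawl. Everything named here is an assumption made up for illustration: the article, author, tag, and article_tag tables, the modified column, and the build_documents function are hypothetical and do not come from any particular product.

```python
import sqlite3

# Hypothetical normalized schema, invented for this illustration.
SCHEMA = """
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT,
                      author_id INTEGER, modified TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE article_tag (article_id INTEGER, tag_id INTEGER);
"""

# One flat row per article: author and tags are joined in.
FLATTEN_SQL = """
SELECT a.id, a.title, a.body, au.name,
       GROUP_CONCAT(t.label, ' ') AS tags, a.modified
FROM article a
JOIN author au ON au.id = a.author_id
LEFT JOIN article_tag atg ON atg.article_id = a.id
LEFT JOIN tag t ON t.id = atg.tag_id
WHERE a.modified > ?
GROUP BY a.id
ORDER BY a.modified
"""


def make_demo_db():
    """Create an in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO author VALUES (?, ?)",
                     [(1, "Ada"), (2, "Grace")])
    conn.executemany("INSERT INTO article VALUES (?, ?, ?, ?, ?)", [
        (1, "Crawling basics", "How crawlers fetch pages.", 1,
         "2008-01-10T09:00:00"),
        (2, "Faceted search", "Facets narrow results.", 2,
         "2008-02-15T12:30:00"),
    ])
    conn.executemany("INSERT INTO tag VALUES (?, ?)",
                     [(1, "crawler"), (2, "search"), (3, "facets")])
    conn.executemany("INSERT INTO article_tag VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 2), (2, 3)])
    return conn


def build_documents(conn, high_water_mark=""):
    """Return (documents, new_high_water_mark).

    Only articles modified after high_water_mark are returned. The
    timestamps are ISO 8601 strings, so string comparison orders them.
    """
    docs = [
        {"id": row[0], "title": row[1], "body": row[2], "author": row[3],
         "tags": row[4] or "", "modified": row[5]}
        for row in conn.execute(FLATTEN_SQL, (high_water_mark,))
    ]
    # Rows are ordered by modified, so the last one carries the new mark.
    new_mark = docs[-1]["modified"] if docs else high_water_mark
    return docs, new_mark


if __name__ == "__main__":
    conn = make_demo_db()
    docs, mark = build_documents(conn)        # full crawl
    for doc in docs:
        print(doc)
    later, mark = build_documents(conn, mark)  # incremental crawl
    print(len(later), "documents changed since", mark)
```

Each dictionary returned is one flat “document” ready to hand to an indexer. Note that a high-water mark like this one does not notice deleted rows, and rows that share the exact timestamp of the mark can be missed; a crawler that needs either guarantee would have to keep its own record of documents seen.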
In a way, it is kind of nice that Solr does not impose its own method of crawling data. In my experience, crawlers are often the weakest part of a commercial full-text search system. They are the redheaded stepchildren of companies that build fancy commercial systems. It is an unglamorous job to write these things, and all the cool kids want to work on the kernel.

Although there is no built-in crawler for Solr, there is a loose integration between the Nutch crawler and Solr. Currently, this integration is implemented as a patch for the Nutch code base. To use it, you must apply the patches and build Nutch from scratch. In the open source community this is seen as no big deal, so if you are an open source Java geek, right on! To me, it’s kind of a pain in the neck, and it’s one side of the double-edged sword that is open source software.

Bells, Whistles, and Spangles

Solr is a big step toward true enterprise search, but it’s not quite there yet. If you are expecting a complete out-of-box experience for indexing and searching your enterprise, then look elsewhere. While Lucene, Nutch, and Solr are quite sophisticated and are not much more difficult to install than equivalent closed source software, you sometimes may have to dive into the arcane world of tools like Subversion,

Solr is for enterprises that:

- Have access to Java programmers, either in-house or as consultants.
- Have an understanding of their search domain and are willing to study up a bit on some pretty advanced concepts.
- Are willing to write some code or repurpose some existing code to get documents or data into the index.
- Are willing to write code for the front end of the search engine (this is required for all search engines, not just Solr).
- Either have no search access control requirements or are confident in implementing them at the search-engine level.
- Want faceted search.
- Are willing to accept the terms of the Apache License 2.0.
- Need exceptional control over scoring and ranking algorithms.
- Want to add their own scoring and ranking algorithms.
- Have an enterprise mandate to use open source software when available.
- Do not have organizational biases or restrictions against using open source software.

Solr is not for enterprises that:

- Don’t have access to Java programmers.
- Don’t have a strong business case for using a sophisticated enterprise search platform.
- Don’t want to write code for crawlers or connectors.
- Have complex access control requirements that cannot be encoded easily into search fields for filter queries.
- Have organizational biases or restrictions against using open source software.
- Require only unsophisticated search, or have very few documents, which can be indexed by one of the many free or inexpensive commercial search tools available.

WWW.ENTERPRISESEARCHCENTER.COM