Enterprise Search Sourcebook 2008 - Open Source Search: Elixir or Poison?

Ant, Maven, and Eclipse to make your own build of the software. This can be daunting even for grizzled veterans, since there are many different ways to do it, and all the open source guys (yes, they are mostly guys) have their own ideas about what constitutes a good build environment.

One thing you can do is index a huge number of documents into Solr and search them. The theoretical limit is the largest value of a Java int (a signed 32-bit integer), 2,147,483,647, or roughly 2 billion documents. That's a lot, even for commercial engines. Of course, if you really are thinking of indexing that many documents, you should consider Nutch rather than Solr; Nutch is better suited to web-scale applications than to enterprise-scale ones. While I haven't personally tried to index and search that many documents, the confident chatter on the message boards is that 30 million documents is a walk in the park for Solr. You probably don't have 30 million documents to index, but it is comforting to know that you could.

In the Schema Things

As I mentioned before, Lucene is a low-level library. All information in Lucene is stored as text, even when the values could be interpreted as integers, floating-point numbers, dates, or even custom data types. It lacks any kind of schema. This means that, out of the box, you can't do arithmetic comparisons, numeric range searches, or date-range searches, which can seriously limit the effectiveness of search in the enterprise.

Solr enhances Lucene by adding sophisticated schema support. Not only does this schema support common data types, but you can also define your own custom data types, as well as dynamic fields whose names match a specific pattern. For example, you can declare that every field whose name ends in _s (the pattern *_s) is treated as a string field. This gives you some flexibility if you don't know the types of all the fields being added to the index beforehand, yet still want to represent different fields in different ways.
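To make the schema discussion concrete, here is a minimal Java sketch of two ideas at work: resolving a field's type from a dynamic-field pattern such as *_s, and why a typed comparison matters for range searches. It is a toy model, not Solr's implementation. The class name, the field names, the *_i integer pattern, and the "first matching pattern wins" rule are all assumptions for illustration; Solr's own precedence rules may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicFieldSketch {

    enum FieldType { STRING, INT }

    // Hypothetical schema: explicitly named fields are checked first.
    static final Map<String, FieldType> EXPLICIT_FIELDS = new LinkedHashMap<>();
    // Dynamic-field patterns with a single leading or trailing '*', checked in declaration order
    // (first match wins; this ordering rule is an assumption of the sketch).
    static final Map<String, FieldType> DYNAMIC_FIELDS = new LinkedHashMap<>();

    static {
        EXPLICIT_FIELDS.put("id", FieldType.STRING);
        DYNAMIC_FIELDS.put("*_s", FieldType.STRING);
        DYNAMIC_FIELDS.put("*_i", FieldType.INT);
    }

    // Returns the type for a field name, or null if nothing in the schema matches.
    static FieldType resolveType(String fieldName) {
        FieldType explicit = EXPLICIT_FIELDS.get(fieldName);
        if (explicit != null) {
            return explicit;
        }
        for (Map.Entry<String, FieldType> entry : DYNAMIC_FIELDS.entrySet()) {
            String pattern = entry.getKey();
            boolean matches = pattern.startsWith("*")
                    ? fieldName.endsWith(pattern.substring(1))            // "*_s" matches "title_s"
                    : pattern.endsWith("*")
                      && fieldName.startsWith(pattern.substring(0, pattern.length() - 1)); // "s_*" style
            if (matches) {
                return entry.getValue();
            }
        }
        return null;
    }

    // Range check that honors the resolved type: numeric for INT, lexicographic for STRING.
    static boolean inRange(String fieldName, String value, String low, String high) {
        FieldType type = resolveType(fieldName);
        if (type == null) {
            throw new IllegalArgumentException("No schema entry for field: " + fieldName);
        }
        if (type == FieldType.INT) {
            int v = Integer.parseInt(value);
            return v >= Integer.parseInt(low) && v <= Integer.parseInt(high);
        }
        return value.compareTo(low) >= 0 && value.compareTo(high) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(resolveType("title_s"));             // STRING
        System.out.println(resolveType("price_i"));             // INT
        System.out.println(inRange("price_i", "9", "1", "50")); // true: 1 <= 9 <= 50
        System.out.println(inRange("price_s", "9", "1", "50")); // false: "9" sorts after "50" as text
    }
}
```

The last two lines show the problem with text-only storage: compared as strings, "9" sorts after "50", so a range check that should succeed numerically fails.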
Configuration in Solr is handled by a schema configuration XML file, which lets you specify advanced Lucene analyzers for each field type. You can also specify things like stemming, lemmatization, synonyms, stop-word lists, and sounds-like filters in the configuration file. Solr supports "keyword in context" results highlighting, advanced query caching, index replication, and integration with the Luke index toolbox. It also has a nifty feature called "copy fields," which lets you treat the same incoming data in different ways for different purposes. You will find all of these features, in varying mixes, in the best commercial software packages. It is surprising to see such a complete set of features this early in the release cycle of an open source software project.

Licensed to Drive

Technology isn't the only difference between open source and vendor-based solutions. Remember: open source is not public domain. One of the practical problems for any organization considering the use of open source

API
  Lucene: Java-based traditional API.
  Nutch: Java-based traditional API.
  Solr: REST (HTTP/XML). Wrappers for Java and Ruby.

CRAWLER
  Lucene: None. Roll your own.
  Nutch: Web and file system crawler included.
  Solr: Accepts XML documents in a specific schema. Also accepts comma-separated (CSV) lists of documents.

FILTERS
  Lucene: None. Roll your own.
  Nutch: Built-in support for HTML, text, Microsoft Office, OpenOffice, and PDF. Roll your own filters with the plug-in architecture.
  Solr: None. You convert from the source format to XML or CSV.

SCHEMA
  Lucene: None. All data is stored as text; you are responsible for converting to other data types.
  Nutch: Fixed.
  Solr: Fully configurable schema. You can specify field types and storage methods declaratively in a configuration file.

PLUG-IN MODULES
  Lucene: None. Roll your own by subclassing and implementing interfaces. Très POJO.
  Nutch: The plug-in system is based on the one used in Eclipse 2.x. Plug-ins are central to how Nutch works: all of the parsing, indexing, and searching that Nutch does is actually accomplished by various plug-ins.
  Solr: Plug-in code can be loaded into Solr by putting JARs containing your classes in a "lib" directory in your Solr Home directory before starting your servlet container.
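The feature comparison notes that Solr takes documents as XML over HTTP and leaves conversion from the source format to you. A minimal Java sketch of that last step might look like the following: it builds an <add><doc>...</doc></add> message with one <field name="..."> element per field and, only if you supply a URL, POSTs it. The class and method names are hypothetical, and the URL http://localhost:8983/solr/update is an assumption based on the example server's defaults; check your own installation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrXmlUpdateSketch {

    // Escapes the characters that are significant in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities are not double-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds an <add> message containing one <doc>; fields appear in the map's iteration order.
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(field.getKey())).append("\">")
               .append(escapeXml(field.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    // POSTs an XML message to an update URL and returns the HTTP status code.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "doc-1");
        doc.put("title_s", "Open Source Search: Elixir or Poison?");
        doc.put("body", "Lucene, Nutch & Solr compared");
        String addXml = buildAddXml(doc);
        System.out.println(addXml);

        // Only contact a server when an update URL is passed, e.g. http://localhost:8983/solr/update
        if (args.length > 0) {
            System.out.println("add: HTTP " + post(args[0], addXml));
            System.out.println("commit: HTTP " + post(args[0], "<commit/>"));
        }
    }
}
```

Run without arguments, the sketch only prints the message. A separate <commit/> is sent because added documents do not become searchable until a commit, unless auto-commit is configured.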