Faceted search is becoming a de facto search feature for websites such as online stores and digital libraries. From an interaction design perspective, faceted search is essentially an alternative to advanced search: “post-coordinate boolean operations via a navigational metaphor”. It tends to enhance the affordance of advanced search, as less effort is required from users to perform equivalent, and traditionally more convoluted, search tasks.
This post describes the faceted search infrastructure development of the JISC UX2 project. The infrastructure is based on Apache Solr, a Java-based faceted search platform. In Part 1 of this post, I described the general setup of Solr for multi-sourced data, metadata (MARC XML) mapping and the experience of importing the CERN book dataset using Solr's Data Import Handler (DIH). For the purpose of UI prototyping and testing, UX2 is incorporating the CERN dataset in combination with existing digital library content held in a Fedora Commons repository. The rest of this post provides an account of the Solr development work that enables metadata and rich binary documents (PDF, PowerPoint, etc.) from a Fedora repository to be indexed for the faceted search service.
This is also a technical report for JISC. If the content is too lengthy, you can skip to "Epilogue" to review the outputs directly.