Lucene – Knowledge and References

Modern Predictive Analytics and Big Data Systems Engineering

Published in Anna M. Doro-on, Handbook of Systems Engineering and Risk Management in Control Systems, Communication, Space Technology, Missile, Security and Defense Operations, 2023

Anna M. Doro-on

Solr uses the Lucene search library at its core and wraps it to add features and expose it as a RESTful service that can be accessed over HTTP (Shahi 2015). Development of Solr and Lucene merged in March 2010, and both code bases reside in the same trunk in Apache Subversion (SVN), which ensures that the latest Solr releases include the latest Lucene features (Shahi 2015). Because Solr is based on open standards, it is highly extensible (ASF 2018h). Solr queries are simple HTTP request URLs, and the response is a structured document: mainly JSON, but it could also be XML, CSV, or other formats (ASF 2018h). Solr has evolved beyond its search engine capabilities and can secondarily be used as a data store: for applications that already use Solr for indexing and searching and also need a data store, it is worth evaluating Solr as an alternative NoSQL solution (Shahi 2015).
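To make the "query as a URL" idea concrete, here is a minimal sketch that builds a Solr select URL with Python's standard library. The host, port, collection name `articles`, and field name `title` are hypothetical; `q`, `rows`, and `wt` are standard Solr query parameters. The snippet only constructs the URL and sends no request.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, collection, query, rows=10, wt="json"):
    """Return a Solr select URL for the given query string.

    base_url and collection are placeholders for a real deployment.
    """
    params = {"q": query, "rows": rows, "wt": wt}
    # urlencode percent-encodes reserved characters such as ':' in the query.
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical local Solr instance and collection.
url = build_solr_query_url("http://localhost:8983", "articles", "title:lucene", rows=5)
print(url)
```

Changing `wt` to `xml` or `csv` would request one of the other response formats mentioned above.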

Technological Overview

Published in Peng Liu, Wang Chao, Computational Advertising, 2020

Liu Peng, Wang Chao

Among open source tools, Lucene is one of the more commonly used Java-based full-text retrieval toolkits. Lucene is not a complete search engine, but it is an essential building block of a computational advertising system, and it makes full-text indexing and retrieval easy to implement. Lucene can index text-based data, and its main function is to index every keyword in a document. In addition, Lucene provides a set of APIs for reading, filtering, analyzing, and marshalling documents and for using indexes. Lucene was chosen because, in addition to its efficiency and simplicity, it allows users to customize functional logic for key elements. However, some special retrieval algorithms, such as the relevance retrieval described in Chapter 13, are not directly supported in Lucene; they must be added by modifying or extending it, which requires an in-depth understanding of the source code.
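The idea of indexing every keyword in a document can be illustrated with a toy inverted index. This is a conceptual sketch in plain Python, not Lucene's API or its on-disk format; the function names and sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents.
docs = {
    1: "Lucene indexes text documents",
    2: "Solr wraps Lucene as a service",
    3: "Advertising systems retrieve documents by keyword",
}
index = build_inverted_index(docs)
print(search_all(index, "lucene documents"))
```

Looking up a term is a dictionary access rather than a scan over all documents, which is the property that makes keyword retrieval fast.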

Remote Sensing Data Organization and Management in a Cloud Computing Environment

Published in Lizhe Wang, Jining Yan, Yan Ma, Cloud Computing in Remote Sensing, 2019

Lizhe Wang, Jining Yan, Yan Ma

After spatial organization of multi-source remote sensing data, a full-text index of the metadata should be constructed to enable quick retrieval. It should be added that, as the query index of remote sensing data involves many terms, the column-oriented key-value data store, like HBase, cannot effectively handle multi-condition joint retrieval. Hence, in this chapter, the multi-sourced remote sensing metadata retrieval used the full-text index, and its construction was mainly implemented by Lucene and SolrCloud. In essence, Lucene is a high-performance, full-featured text search engine library written entirely in Java, and the ready-to-use search platform provided by SolrCloud is also based on Lucene. Lucene supports the full-text index construction of static metadata fields and dynamic domain fields. However, Lucene is not a complete full-text search engine; it should be combined with Solr or SolrCloud to provide a complete search service [71].

A Comparison of Lucene Search Queries Evolved as Text Classifiers

Published in Applied Artificial Intelligence, 2018

Laurence Hirsch, Teresa Brunsdon

The current system produces binary classifiers that simply indicate whether a document is contained within a particular category. Indexing systems such as Lucene incorporate complex and highly efficient scoring systems so that the set of documents matching a query is ranked according to some measure of closeness to the query. We believe that this could be incorporated into the fitness test to enhance classification accuracy. We are also working on developing an unsupervised learning system for clustering sets of documents using search queries to retrieve disjoint sets of documents. In addition, we believe that there is a need to drill into results, as this can show the consistency of classifiers and aid in our understanding of when and why one works over another. We hope to revisit the statistical modeling approach of the experiments conducted here to find a more suitable way to quantify this.

Towards intelligent geospatial data discovery: a machine learning framework for search ranking

Published in International Journal of Digital Earth, 2018

Yongyao Jiang, Yun Li, Chaowei Yang, Fei Hu, Edward M. Armstrong, Thomas Huang, David Moroni, Lewis J. McGibbney, Christopher J. Finch

Some authors consider that the core search functionality of most existing geospatial data portals is powered by Apache Lucene, an open-source information retrieval library, or by products built upon Lucene such as Apache Solr or Elasticsearch (Li, Goodchild, and Raskin 2014). For example, NOAA’s OneStop project is based on Elasticsearch, and the search engine of PO.DAAC is developed using Solr. Lucene-based techniques use the Boolean model to find matching documents (e.g. data) and various similarity algorithms to calculate relevance (Gormley and Tong 2015). As one of the widely used similarity algorithms, the formula of the practical scoring function is described in the Appendix. Relying solely on the practical scoring function is insufficient for discovering the most applicable dataset out of a vast range of available geospatial datasets, as it only considers text content while the domain knowledge (e.g. spatial resolution and processing level) is ignored. Therefore, two questions need to be answered in order to address the ranking challenge of geospatial data discovery: (1) What features can represent users’ search preferences for geospatial data? (2) How can the ranking reach a balance of all these features?
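The two-stage process described above (Boolean matching, then relevance scoring) can be sketched as follows. This is a simplified illustration, not the exact formula from the Appendix: it assumes the classic TF-IDF components tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), and a length norm of 1 / sqrt(numTerms), and it omits query normalization, the coordination factor, and boosts. The sample corpus is hypothetical.

```python
import math
from collections import Counter


def classic_score(query_terms, doc_terms, corpus):
    """Simplified TF-IDF score of one document for a list of query terms."""
    num_docs = len(corpus)
    freqs = Counter(doc_terms)
    # Shorter documents get a higher length norm (assumed form: 1/sqrt(length)).
    length_norm = 1.0 / math.sqrt(len(doc_terms)) if doc_terms else 0.0
    score = 0.0
    for term in query_terms:
        doc_freq = sum(1 for d in corpus if term in d)
        tf = math.sqrt(freqs[term])
        idf = 1.0 + math.log(num_docs / (doc_freq + 1))
        score += tf * idf ** 2 * length_norm
    return score


def search(query_terms, corpus):
    """Boolean OR match first, then rank the matches by descending score."""
    matches = [i for i, d in enumerate(corpus) if any(t in d for t in query_terms)]
    return sorted(
        matches,
        key=lambda i: classic_score(query_terms, corpus[i], corpus),
        reverse=True,
    )


# Hypothetical, already-tokenized dataset descriptions.
corpus = [
    ["sea", "surface", "temperature", "level", "2"],
    ["sea", "ice", "extent"],
    ["land", "cover", "classification"],
]
ranked = search(["sea", "temperature"], corpus)
print(ranked)
```

Note that the score depends only on term statistics; nothing in it reflects spatial resolution or processing level, which is the limitation the passage points out.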

Intelligent evaluation of test suites for developing efficient and reliable software

Published in International Journal of Parallel, Emergent and Distributed Systems, 2021

Masoud Mohammadian, Zafer Javed

Lucene is a search engine library developed by Apache [27] in the Java language. It performs full-text search and has been widely used as the underlying search engine in other applications, tools, and websites [28]. Its source code consists of 331 files and 66,702 lines of code. The source code of the Searcher component has 112 files containing 18,388 lines of code.