Storage and databases for big data
Published in Jun Deng, Lei Xing, Big Data in Radiation Oncology, 2019
Tomas Skripcak, Uwe Just, Ida Schönfeld, Esther G.C. Troost, Mechthild Krause
For situations in which high I/O rates and low latencies are a top priority, as is common for real-time analytical tasks, in-memory database storage is an attractive option, provided the database size does not exceed the available memory. Currently, these systems cannot cope with the largest data volumes; however, they comfortably accommodate small-to-medium-sized databases while providing the best possible I/O performance, which is a prerequisite for real-time data processing algorithms. This fits use cases where processing jobs are triggered from end-user-facing components and results should be delivered within a reasonable time. In-memory systems usually provide an alternative storage mechanism to prevent data loss (backup) in case of memory failure. This can be implemented as replication of data to other nodes in the cluster or as persistence to disk (database images or a transactional log). In-memory database systems usually require some sort of data model (which varies with the database implementation) in order to structure the data for organization and processing.
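To make the disk-persistence idea concrete, below is a minimal Python sketch of the "database image" pattern using SQLite's in-memory mode; the table and file names are illustrative, and a production system would schedule the snapshot or stream a transactional log rather than take a single backup.

```python
import sqlite3

# In-memory database: all reads and writes stay in RAM.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
mem.executemany("INSERT INTO events (payload) VALUES (?)",
                [("sample-1",), ("sample-2",)])
mem.commit()

# Persist a snapshot ("database image") to disk so the data survives a
# memory or process failure; real systems would run this periodically.
disk = sqlite3.connect("backup.db")  # illustrative file name
mem.backup(disk)
disk.close()
```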
Big Data
Published in Preston de Guise, Data Protection, 2020
Traditionally, such performance limits were resolved by buying or building larger and larger servers with more RAM and more CPU, and putting the data to be analyzed on high-performance storage—what we would call scale-up. In-memory database servers such as SAP HANA, for instance, might have multiple terabytes of RAM. This, however, has substantial limitations, both in the maximum size and performance achievable and in the cost associated with building such a server. While it's possible to configure servers with multiple processors, each with 10 or more cores, and even terabytes of memory, there will always be practical or fiscal limits to how far a system can be scaled.
A Survey of Big Data and Computational Intelligence in Networking
Published in Yulei Wu, Fei Hu, Geyong Min, Albert Y. Zomaya, Big Data and Computational Intelligence in Networking, 2017
Yujia Zhu, Yulei Wu, Geyong Min, Albert Zomaya, Fei Hu
There are a variety of high-performance computing (HPC) platforms, widely applied in areas such as economics and computer networks. For example, SAP HANA [65], a new generation of in-memory database and integrated analytics platform, provides higher processing speed and accuracy. The authors in Ref. [66] took advantage of the HANA platform to implement a real-time event analysis and monitoring system (REAMS); because logs reflect system status and user behavior, both internal and external, analyzing them in real time helps ensure system reliability and security. They collected user events, especially logon and logoff events, from multiple sources and stored them in a unified format for further efficient analysis (illustrated in the sketch below). The Cloud environment is well suited to streaming data analysis for several reasons. For example, Cloud deployment on virtual machines provides application portability, platform independence, and dynamic resource allocation for specific tasks. Moreover, Cloud-based tools enable rich performance-information collection and application optimization. Chef [67], Puppet [68], and Ansible [69] are widely used Cloud automation tools for deploying applications to a single Cloud provider, while the GEANT network infrastructure [70], an implementation of the zero-touch provisioning, operations, and management (ZTPOM) concept, allows inter-Cloud service delivery and provisioning. As for specific applications in networked big data platform deployment, and in view of the growing phenomenon of cybercriminals spreading malicious payloads through spam, AlMahmoud et al. [71] proposed a privacy-preserving collaborative spam detection platform (Spamdoop) built on top of a standard MapReduce facility. Network designers often use simulators to pre-evaluate the performance of a designed network with artificial traffic prior to actual deployment. The authors in Ref. [72] introduced a novel method for modeling the network traffic patterns of big data platforms, extracting their communication behaviors and replaying these instead of packet traces. Experiments demonstrated the reliability and scalability of this methodology when compared to real traffic.
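To illustrate the "unified format" mentioned for REAMS above, here is a hedged Python sketch that normalizes logon/logoff events from heterogeneous sources into one schema; the field names and record layouts are assumptions for illustration, not the actual REAMS design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class UnifiedEvent:
    # Assumed unified schema; the real REAMS schema may differ.
    timestamp: datetime
    source: str   # e.g., "windows" or "linux-auth"
    user: str
    action: str   # "logon" or "logoff"

def from_windows(record: dict) -> UnifiedEvent:
    # Hypothetical Windows security-log record layout; 4624/4634 are
    # the real Windows event IDs for logon/logoff.
    return UnifiedEvent(
        timestamp=datetime.fromtimestamp(record["time"], tz=timezone.utc),
        source="windows",
        user=record["user"],
        action="logon" if record["event_id"] == 4624 else "logoff",
    )

def from_syslog(line: str) -> UnifiedEvent:
    # Hypothetical "epoch user action" line, e.g. "1700000000 alice logon".
    ts, user, action = line.split()
    return UnifiedEvent(datetime.fromtimestamp(int(ts), tz=timezone.utc),
                        "linux-auth", user, action)
```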
Generating Graph-Inspired Descriptors by Merging Ground-Level and Satellite Data for Robot Localization
Published in Cybernetics and Systems, 2023
Vasiliki Balaska, Loukas Bampis, Stefanos Katsavounis, Antonios Gasteratos
First, our final system was tested with regard to the achieved semantic localization performance. The selected measurement is the percentage of query robot observations (Rq) correctly localized against the respective memory map. Specifically, our system's accuracy is defined as the proportion of query descriptors (Dq) that are correctly matched to the memory database (Ds) according to ground truth. Hence, the robustness of the localization process based on the proposed SemMetric descriptors is compared against the corresponding accuracy of the floating-point Speeded Up Robust Features (SURF) (Bay et al. 2008) and Scale-Invariant Feature Transform (SIFT) (Lowe 2004) feature extraction mechanisms, as well as the binary Oriented FAST and Rotated BRIEF (ORB) (Rublee et al. 2011) descriptors. Specifically, we extract the above-mentioned descriptors and produce visual-word vectors for each scene of the query ground-level route and the enhanced satellite map (i.e., the database memory map). Subsequently, we again compute the score S between those vectors, and data association is achieved by identifying the highest-scoring pair. Thus, the corresponding similarity matrix M is produced, as mentioned in Section 2.4.
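For concreteness, the matching step just described can be sketched as follows; cosine similarity stands in for the paper's score S, which is an assumption on our part, and the array names mirror the notation above.

```python
import numpy as np

def localize(Dq: np.ndarray, Ds: np.ndarray) -> np.ndarray:
    """Match query visual-word vectors Dq (nq x k) against the memory
    database Ds (ns x k); returns the best map scene index per query."""
    # Normalize so the dot product acts as a cosine similarity score.
    Dq = Dq / np.linalg.norm(Dq, axis=1, keepdims=True)
    Ds = Ds / np.linalg.norm(Ds, axis=1, keepdims=True)
    M = Dq @ Ds.T            # similarity matrix M, shape (nq, ns)
    return M.argmax(axis=1)  # data association: highest-scoring pair
```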
Integrating memory-mapping and N-dimensional hash function for fast and efficient grid-based climate data query
Published in Annals of GIS, 2021
Mengchao Xu, Liang Zhao, Ruixin Yang, Jingchao Yang, Dexuan Sha, Chaowei Yang
In 2005, the prototype of MonetDB was introduced as a main-memory database system that uses a column-at-a-time execution model for the data warehouse. Although MonetDB is not a native array database but a full-fledged relational DBMS (Idreos et al. 2012), it provided useful ideas for processing array models. It is a column-oriented database, and each column, or BAT (Binary Association Table), is implemented as a C array at the storage level (Boncz, Zukowski, and Nes 2005). MonetDB has focused on optimizing the major components of the traditional database architecture to make better use of modern hardware in database applications that support the analysis of massive data volumes (Boncz, Kersten, and Manegold 2008). In 2008, Cornacchia et al. introduced an example of Information Retrieval (IR) researchers using a matrix framework with a Sparse Relational Array Mapping (SRAM); they used MonetDB/X100 as the relational backend, which provided fast response and good precision. The matrix framework is based on the array abstraction: by mapping arrays onto the relational model and developing array queries, MonetDB allowed them to optimize performance and build a high-performance IR application.
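The column-at-a-time idea can be sketched in a few lines of Python; contiguous arrays play the role of BATs, and each operator touches a whole column per call. The column names are illustrative, not MonetDB internals.

```python
import numpy as np

# Each column is a contiguous array, analogous to MonetDB's BATs.
price = np.array([9.5, 12.0, 7.25, 30.0])
qty   = np.array([3, 1, 10, 2])

# Column-at-a-time execution: one vectorized pass per operator over an
# entire column, instead of interpreting the plan row by row.
revenue  = price * qty            # materializes an intermediate column
selected = revenue[revenue > 20]  # selection is also a whole-column pass
```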
NoSQL real-time database performance comparison
Published in International Journal of Parallel, Emergent and Distributed Systems, 2018
Diogo Augusto Pereira, Wagner Ourique de Morais, Edison Pignaton de Freitas
A model to evaluate the Velocity, Volume, and Variety (scalability) of different in-memory database systems is presented in [14]. Using this model, the CRUD operations of four different database systems were compared against each other. VoltDB [15] performed better than MongoDB and MySQL in most of the tests, and SQLite [16] was significantly better than MongoDB and MySQL on select, update, and delete operations.
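As a rough illustration of the kind of CRUD micro-benchmark such comparisons rely on, here is a hedged Python sketch timing the four operations against an in-memory SQLite database; it is not the benchmark from [14], and the row count is arbitrary.

```python
import sqlite3
import time

def timed(label, fn):
    # Measure one CRUD operation's wall-clock time.
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.4f}s")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
rows = [(i, f"value-{i}") for i in range(100_000)]

timed("create", lambda: db.executemany("INSERT INTO t VALUES (?, ?)", rows))
timed("read",   lambda: db.execute("SELECT * FROM t WHERE id = 4242").fetchone())
timed("update", lambda: db.execute("UPDATE t SET v = 'x' WHERE id % 2 = 0"))
timed("delete", lambda: db.execute("DELETE FROM t WHERE id % 3 = 0"))
db.commit()
```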