Data-intensive computing – Knowledge and References

Transition from Relational Database to Big Data and Analytics

Published in Mohiuddin Ahmed, Al-Sakib Khan Pathan, Data Analytics, 2018

To address the above characteristics of big data, several new technologies such as Not only SQL (NoSQL), Hadoop, and Spark were developed.

NoSQL database: A NoSQL [18,19] database is a schema-less database used to store and manage unstructured data, in which the management layer is separated from the storage layer. The management layer provides assurance of data integrity. NoSQL provides high-performance, scalable data storage with low-level access to the data management layer, so that data management tasks are handled at the application layer. An advantage of NoSQL is that the structure of the data can be modified at the application layer without changing the original data in the tables.

Parallel processing: Many processors (300 or more) work in a loosely coupled, or shared-nothing, architecture. Independent processors, each with its own operating system and memory, work in parallel on different parts of the program to improve processing speed and memory utilization. Tasks communicate with one another through a messaging interface.

Distributed file system (DFS): A DFS allows multiple users working on different machines to share files, memory, and other resources. Access lists on both servers and clients give client nodes restricted access to the file system rather than to the whole block of storage, although the details depend on the protocol.

Hadoop: Hadoop is a fundamental framework for managing big data, on which many analytical tasks and stream computing are carried out. Apache Hadoop [20] allows distributed processing of huge datasets across clusters of commodity hardware. It provides a high degree of fault tolerance and scales horizontally from a single machine to thousands of machines.

Data-intensive computing: Parallel computing applications use a data-parallel approach to process big data [21]. The approach is based on the principle of associating data with the programs that perform the computation on them.
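The data-parallel idea described above can be illustrated with a small word count: the data are split into non-overlapping chunks, each worker processes only its own chunk, and the partial results are merged at the end. This is a minimal sketch rather than anything from the chapter. The function names (`partition`, `count_words`, `data_parallel_word_count`) are hypothetical, and threads in one process stand in for the independent machines of a real shared-nothing cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts non-overlapping chunks (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_words(chunk):
    """Work done by one worker: it reads only its own chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def data_parallel_word_count(records, n_workers=4):
    """Apply count_words to every chunk in parallel, then merge the results."""
    chunks = partition(records, n_workers)
    # Assumption: threads stand in for separate nodes; a real cluster would
    # ship each chunk (or the program) to a different machine.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    lines = ["big data", "data parallel", "Big Data analytics"]
    print(data_parallel_word_count(lines, n_workers=2))
```

Because each worker touches only its own chunk, the only coordination needed is the final merge, which is what lets this style of computation scale out across many machines.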

An improved density-based spatial clustering of application with noise

Published in International Journal of Computers and Applications, 2018

Limin Wang, Mingyang Li, Xuming Han, Kaiyue Zheng

With the maturity of the big data era and artificial intelligence, human civilization has entered a new era of data-intensive computing, in which data has become an extremely important asset. Because data has proliferated rapidly in recent years, effectively analyzing and using vast amounts of raw data to retrieve valuable information has become a focus for many researchers and fields [1]. Data mining aims to identify potential rules and valuable information in massive raw data through a series of scientific analysis and processing steps [2]. Cluster analysis, an important branch of data mining, requires no prior information: it extracts valuable information from huge volumes of data by exploring the internal structure of the data and the similarity relationships between data points [3].
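To make the idea of clustering without prior information concrete, the following is a minimal sketch of the standard DBSCAN procedure named in the article title. It is not the improved variant the article proposes. The names `region_query` and `dbscan` and the parameters `eps` and `min_pts` are illustrative assumptions, and the neighbor search is a brute-force scan suitable only for small inputs.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i]."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

No cluster count or labeled examples are supplied: groups emerge only from how densely the points sit relative to `eps` and `min_pts`, and isolated points are reported as noise.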