High-Performance Computing for Fluid Flow and Heat Transfer
Published in W.J. Minkowycz, E.M. Sparrow, Advances in Numerical Heat Transfer, 2018
In recent years, high-speed networking and improved microprocessor performance have made networks of workstations an appealing vehicle for cost-effective parallel computing [12]. Clusters of workstations and personal computers have become increasingly popular. The incremental scalability of processors, memories, and mass storage systems, together with high-performance interconnection networks, has made clusters a cost-effective platform for distributed and parallel computing. Clusters and networks of computers built using commodity hardware or software are playing a major role in redefining the concept of high-performance computing. Since 1995, we have seen explosive growth in the use of high-performance communications for information access, research at the frontiers of science, and commercial endeavors.
Big Data Computing Using Cloud-Based Technologies
Published in Mahmoud Elkhodr, Qusay F. Hassan, Seyed Shahrestani, Networks of the Future, 2017
Samiya Khan, Kashish A. Shakil, Mansaf Alam
First, there are several low-cost storage solutions available with the cloud. In addition, the user pays for the services he or she uses, which makes the solution all the more cost-effective. Second, cloud solutions offer commodity hardware, which allows effective and efficient processing of large datasets. For these two reasons, cloud computing is considered an ideal infrastructural solution for big data analytics.
Big Data, Cloud, Semantic Web, and Social Network Technologies
Published in Bhavani Thuraisngham, Murat Kantarcioglu, Latifur Khan, Secure Data Science, 2022
Bhavani Thuraisngham, Murat Kantarcioglu, Latifur Khan
We utilize the cloud platform for managing and analyzing large datasets. We will see throughout this book that cloud computing is at the heart of managing large datasets. Cloud computing has emerged as a powerful paradigm for service-oriented computing, and many computing services are being outsourced to the cloud. Such cloud-based services can be used to host various cyber security applications, such as insider threat detection and identity management. Google introduced the MapReduce framework for processing large amounts of data on commodity hardware, and Apache's Hadoop Distributed File System (HDFS), together with integrated components such as MapReduce, has emerged as a leading software stack for cloud computing [DEAN2004, GHEM2003, HDFS]. Clouds such as HP's Open Cirrus Testbed utilize HDFS. This growth, in turn, has been accompanied by numerous social networking sites with massive amounts of data to be shared and managed. For example, we may want to analyze multiple years of stock market data statistically to reveal a pattern, or to build a reliable weather model from several years of weather and related data. To handle such massive amounts of data distributed across many sites (i.e., nodes), scalable hardware and software components are needed. The cloud computing model has emerged to address the explosive growth of web-connected devices and to handle massive amounts of data; it is defined and characterized by massive scalability and new Internet-driven economics.
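To make the MapReduce idea mentioned above concrete, the following is a minimal, single-process Python sketch of the map, shuffle, and reduce phases for a word count. It illustrates the programming model only; it is not Hadoop or Google's implementation, and the function names (map_phase, shuffle, reduce_phase) and the sample documents are invented for the example.

```python
# Minimal MapReduce-style word count, illustrating the map/shuffle/reduce
# phases conceptually. This is a single-process sketch, not Hadoop code:
# on a real cluster the framework distributes these phases across nodes.
from collections import defaultdict

def map_phase(document):
    # Emit (key, value) pairs: one (word, 1) pair per word.
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Aggregate all values for a key; for word count, simply sum them.
    return key, sum(values)

if __name__ == "__main__":
    documents = ["the cloud stores data", "the cloud processes data"]
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    print(counts)  # {'the': 2, 'cloud': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

On a real deployment, the map and reduce calls run on many commodity nodes and the shuffle is performed by the framework over the network; only the two user-supplied functions change from problem to problem.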
A classification-based fuzzy-rules proxy model to assist in the full model selection problem in high volume datasets
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2022
Angel Díaz-Pacheco, Carlos Alberto Reyes-Garcia
In this section, we present some antecedents of the MapReduce programming model, the standard framework for the development of Big Data applications. This programming model was designed to work over computing clusters, and it operates under the master-slave communication model. The master is responsible for scheduling jobs on worker nodes, and the worker nodes execute tasks as directed by the master. The design of MapReduce considers the following fundamental principles:
Low-cost, unreliable commodity hardware. MapReduce is designed to run on large clusters of commodity hardware.
Extremely scalable cluster. Each MapReduce node uses its own local hard drive. Nodes can be taken out of service with almost no impact, and the MapReduce jobs keep running.
Fault-tolerant. The MapReduce framework applies straightforward mechanisms to replicate data and launches backup tasks to keep processes running in case of failures.
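The master-slave scheduling and fault-tolerance principles above can be illustrated with a small Python sketch: a "master" submits data chunks to a pool of worker processes and resubmits any task that fails, loosely mimicking MapReduce's retry/backup-task behaviour. It is a single-machine stand-in, not the MapReduce framework itself; worker_task, the simulated failure rate, and the retry limit are invented for the example.

```python
# Sketch of the master-worker pattern with simple fault tolerance:
# the "master" submits tasks to a pool of workers and re-launches any
# task that fails, loosely mirroring MapReduce's retry/backup-task idea.
# Single-machine stand-in for a cluster; names and rates are illustrative.
import random
from concurrent.futures import ProcessPoolExecutor

def worker_task(chunk):
    # Process one data chunk; occasionally fail to emulate unreliable
    # commodity hardware.
    if random.random() < 0.1:
        raise RuntimeError("simulated node failure")
    return sum(chunk)

def run_with_retries(executor, chunk, max_retries=4):
    # Master-side retry loop: resubmit a failed task up to max_retries times.
    for _attempt in range(max_retries):
        try:
            return executor.submit(worker_task, chunk).result()
        except RuntimeError:
            continue
    raise RuntimeError("chunk failed on every attempt")

if __name__ == "__main__":
    chunks = [list(range(i, i + 10)) for i in range(0, 100, 10)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = [run_with_retries(pool, c) for c in chunks]
    print(sum(partials))  # total over all chunks, despite simulated failures
```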
A distributed unsupervised learning algorithm and its suitability to physical based observation
Published in International Journal of Parallel, Emergent and Distributed Systems, 2022
The presented algorithm provides several benefits when clustering physically based samples. A sample set with an unknown a priori number of classes may be classified, the algorithm provides inherent parallelism, and the clustering maintains resolving capability when merging clusters. The clustering operation may be tuned to commodity hardware limitations and run on a scalable grid architecture. The base algorithm may also be altered to identify and cluster longer-term signals, either through a distance metric on an STFT of the event or through least-squares regression on parameter sets of generalised models. Such models may be observed in Radio-Spectroscopy/Fast Radio Burst clustering, in large datasets collected by Radio-Astronomy observations, and in Sonar and Radar data acquisitions, to name a few. The algorithm, however, relies on a set of presumptions about the data characteristics. First, there is a minimum number of datapoints lying within a single sub-cluster, such that the class is not considered anomalous and discarded during clustering. Second is the presumption that higher-SNR samples will found the class and that these will lie at least roughly near the centre of the cluster. Although the algorithm variant will adapt the centroid based on the linear classification of the sample, there remains a possibility that the algorithm may establish a class founded on edge cases. Third is the requirement that large event numbers are observed, too many to fit in the memory of a single node (Figure 9).
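As one concrete ingredient of the approach described above, the sketch below computes a distance between two recorded events as the Frobenius norm of the difference of their STFT magnitude spectrograms, the kind of STFT-based distance metric the excerpt mentions for longer-term signals. It is an illustrative assumption, not the authors' implementation; the window length, the choice of norm, and the synthetic chirp/noise events are arbitrary choices for the example.

```python
# Illustrative sketch of an STFT-based distance between two recorded
# events, one ingredient mentioned for clustering longer-term signals.
# Not the paper's implementation; the window length and the Frobenius
# norm are arbitrary choices made for this example.
import numpy as np
from scipy.signal import stft

def stft_features(event, nperseg=128):
    # Magnitude spectrogram of the event, used as its feature matrix.
    _, _, zxx = stft(event, nperseg=nperseg)
    return np.abs(zxx)

def stft_distance(event_a, event_b):
    # Distance between two events: Frobenius norm of the difference
    # of their magnitude spectrograms.
    a = stft_features(event_a)
    b = stft_features(event_b)
    # Trim to a common number of time frames if the events differ in length.
    frames = min(a.shape[1], b.shape[1])
    return np.linalg.norm(a[:, :frames] - b[:, :frames])

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 4096)
    chirp_a = np.sin(2 * np.pi * (50 + 200 * t) * t)  # synthetic event A
    chirp_b = np.sin(2 * np.pi * (60 + 200 * t) * t)  # similar event B
    noise = np.random.randn(4096)                     # unrelated event
    # Similar events should be closer to each other than to noise.
    print(stft_distance(chirp_a, chirp_b) < stft_distance(chirp_a, noise))
```

In a distributed setting, such pairwise distances (or distances to running centroids) would be evaluated on worker nodes so that only compact spectrogram features, rather than the full event streams, need to be held in any single node's memory.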