Big Data Computing and Graph Databases
Published in Vivek Kale, Agile Network Businesses, 2017
This approach to parallel processing is often referred to as the shared-nothing approach, since each node, consisting of a processor, local memory, and disk resources, shares nothing with other nodes in the cluster. In parallel computing this approach is considered suitable for data-processing problems that are embarrassingly parallel—that is, where it is relatively easy to separate the problem into a number of parallel tasks and there is no dependency or communication required between the tasks other than their overall management. These types of data-processing problems are inherently adaptable to various forms of distributed computing, including clusters, data grids, and cloud computing.
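To make the idea concrete, here is a minimal, hypothetical Python sketch (not drawn from the chapter) of an embarrassingly parallel job: the input is split into independent partitions, each worker processes its partition without communicating with the others, and only the final combination step touches more than one partial result. The partitioning scheme and the per-partition function are illustrative assumptions.

```python
# Minimal sketch of an embarrassingly parallel job: each partition is
# processed independently, with no communication between tasks beyond
# the pool's own management of workers and results.
from multiprocessing import Pool

def process_partition(partition):
    # Hypothetical per-partition work: a simple local aggregation.
    return sum(x * x for x in partition)

if __name__ == "__main__":
    # Split the input into independent partitions (one per worker/"node").
    data = list(range(1_000_000))
    n_workers = 4
    partitions = [data[i::n_workers] for i in range(n_workers)]

    with Pool(processes=n_workers) as pool:
        partial_results = pool.map(process_partition, partitions)

    # Only this final combination step touches more than one partition.
    print(sum(partial_results))
```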
Emerging Concepts and Approaches for Efficient and Realistic Uncertainty Quantification
Published in Dan Frangopol, Yiannis Tsompanakis, Maintenance and Safety of Aging Infrastructure, 2014
Michael Beer, Ioannis A. Kougioumtzoglou, Edoardo Patelli
In order to provide more accurate and realistic results, the complexity of models is continuously increasing and, in turn, so is the computational effort required to evaluate them. The explicit quantification of the effects of uncertainties increases these computational costs by orders of magnitude. Moreover, these numerical methods need to be scalable and to perform efficiently on the hardware resources available today, i.e., high performance computing. The term high performance computing is most commonly associated with computing used for scientific research, although high performance computing resources are becoming cheaper and increasingly popular. In fact, computers continue to evolve in terms of speed and accessibility: nowadays everyone has access to multicore, gigaflops machines (i.e., desktop and laptop computers), and new machines may have 16, 32, 48, 64, or more cores per processor, so calculations can routinely be performed on office computers. Moreover, almost everyone now has access to Linux clusters and/or cloud computing, further increasing the available computational resources. What distinguishes grid computing from typical cluster computing systems is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed. Also, while a computing grid may be dedicated to a specialized application, it is often constructed with the aid of general purpose grid software libraries (Magoules et al. 2009).

The ability to take advantage of parallel computing clearly depends on the characteristics of the algorithm itself: for instance, plain Monte Carlo simulation is an embarrassingly parallel problem, since different samples can be computed completely independently of each other, and therefore little or no effort is required to separate the problem into a number of parallel tasks (with the exception of creating parallel streams of random numbers). On the other hand, algorithms which are inherently sequential, such as Markov chains and Sequential Monte Carlo, have a relatively low degree of parallelism. Nevertheless, applications of these sequential algorithms that require repeated execution of analysis tasks (e.g., optimization, reliability-based optimization, sensitivity analysis) are particularly suitable for parallelisation. Although the importance of parallel computing in the field of ageing and maintenance has long been recognized (see, e.g., Schuëller (Ed.) 2007), it has not been emphasized enough so far.
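As a hedged illustration of the point about plain Monte Carlo, the following Python sketch (not taken from the chapter) evaluates a stand-in integrand on independent worker processes and uses NumPy's SeedSequence.spawn to create the parallel random streams mentioned above; the model function, seed, and sample counts are purely illustrative assumptions, and a real analysis would replace the integrand with the expensive model evaluation.

```python
# Sketch of plain Monte Carlo as an embarrassingly parallel task:
# each worker draws samples from its own independent random stream
# and only the per-worker estimates are combined at the end.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def mc_estimate(seed_seq, n_samples):
    rng = np.random.default_rng(seed_seq)
    # Hypothetical model evaluation: estimate E[g(X)] with g(x) = exp(-x^2)
    # and X ~ N(0, 1); in practice this is the expensive model run.
    x = rng.standard_normal(n_samples)
    return np.mean(np.exp(-x ** 2))

if __name__ == "__main__":
    n_workers, n_samples = 4, 250_000
    # Independent child streams avoid overlapping random numbers across workers.
    streams = np.random.SeedSequence(2014).spawn(n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as ex:
        estimates = list(ex.map(mc_estimate, streams, [n_samples] * n_workers))
    print(np.mean(estimates))
```

Spawning child seed sequences is what keeps the streams statistically independent; reusing one seed on every worker would silently duplicate samples and defeat the parallelisation.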
Big Data Computing
Published in Vivek Kale, Digital Transformation of Enterprise Architecture, 2019
This approach to parallel processing is often referred to as a “shared nothing” approach, since each node, consisting of a processor, local memory, and disk resources, shares nothing with other nodes in the cluster. In parallel computing, this approach is considered suitable for data processing problems that are “embarrassingly parallel,” that is, where it is relatively easy to separate the problem into a number of parallel tasks and there is no dependency or communication required between the tasks other than overall management of the tasks. These types of data processing problems are inherently adaptable to various forms of distributed computing, including clusters, data grids, and cloud computing. Analytical environments are deployed in different architectural models. Even on parallel platforms, many databases are built on a shared-everything approach in which the persistent storage and memory components are all shared by the different processing units. Parallel architectures are classified by what shared resources each processor can directly access. One typically distinguishes shared memory, shared disk, and shared nothing architectures (as depicted in Figure 17.3).

In a shared memory system, all processors have direct access to all memory via a shared bus. Typical examples are the common symmetric multiprocessor systems, where each processor core can access the complete memory via the shared memory bus. To preserve this abstraction, processor caches, which buffer a subset of the data closer to the processor for fast access, have to be kept consistent with specialized protocols. Because disks are typically accessed via the memory, all processors also have access to all disks.

In a shared disk architecture, all processors have their own private memory, but all disks are shared. A cluster of computers connected to a SAN is representative of this architecture.

In a shared nothing architecture, each processor has its own private memory and private disk. The data is distributed across all disks, and each processor is responsible only for the data on its own connected memory and disks. To operate on data that spans the different memories or disks, the processors have to explicitly send data to other processors. If a processor fails, the data held by its memory and disks becomes unavailable. Therefore, the shared nothing architecture requires special considerations to prevent data loss.
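The following is a small, hypothetical Python sketch of the shared-nothing pattern described above: each worker process holds a private partition, and any result that spans partitions has to be moved explicitly between processes (here over a queue) rather than read from shared memory. The partitioning, worker function, and toy aggregation are assumptions made only for illustration.

```python
# Sketch of the shared-nothing pattern: each worker process owns its own
# partition (private memory); nothing is shared, so any operation that
# spans partitions must move data explicitly over a channel (here a Queue).
from multiprocessing import Process, Queue

def worker(partition, out_queue):
    # Local work on privately held data only.
    local_total = sum(partition)
    # Explicitly send the result to the coordinator; no shared memory.
    out_queue.put(local_total)

if __name__ == "__main__":
    data = list(range(100))
    partitions = [data[0:25], data[25:50], data[50:75], data[75:100]]

    results = Queue()
    procs = [Process(target=worker, args=(p, results)) for p in partitions]
    for p in procs:
        p.start()
    # The cross-partition aggregate exists only after explicit data movement.
    total = sum(results.get() for _ in procs)
    for p in procs:
        p.join()
    print(total)
```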
Cloud-based storage and computing for remote sensing big data: a technical review
Published in International Journal of Digital Earth, 2022
Chen Xu, Xiaoping Du, Xiangtao Fan, Gregory Giuliani, Zhongyang Hu, Wei Wang, Jie Liu, Teng Wang, Zhenzhen Yan, Junjie Zhu, Tianyang Jiang, Huadong Guo
Data-separable computing is supported by cloud computing and is known as embarrassingly parallel or pleasingly parallel computing in computer science (Barcelona-Pons et al. 2019). This type of computing has been widely applied in RSBD using quantitative remote sensing, artificial intelligence, etc. For example, Pekel et al. integrated the computing power of 10,000 computers to map global water bodies at 30 m resolution over almost 30 years based on an expert system classifier (Pekel et al. 2016). Ni et al. extracted 10 m rice-growing areas in northeast China using machine learning (Ni et al. 2021). Xie et al. produced 30 m annual irrigation maps for the United States from 1997 to 2017, based on MODIS and Landsat data, using a random forest classifier (Xie and Lark 2021). All of the above studies relied on the parallelization of data-separable computing. Additionally, the studies used Earth Engine’s ‘image tiling data distribution model’ for spatial partitioning and ‘streaming collections’ for temporal partitioning (Gorelick et al. 2017).
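As a rough illustration (not Earth Engine’s actual API, nor the workflows of the cited studies), the Python sketch below partitions a raster scene into spatial tiles and classifies each tile independently, which is the property that makes such workloads data-separable; the threshold classifier, tile size, and random scene are arbitrary assumptions.

```python
# Illustrative sketch of spatial partitioning: split a scene into tiles
# so each tile can be classified independently (data-separable computing).
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def classify_tile(tile):
    # Hypothetical per-tile classifier: threshold a single band to a
    # binary mask; a real workflow would apply a trained model instead.
    return (tile > 0.5).astype(np.uint8)

def split_into_tiles(raster, tile_size):
    h, w = raster.shape
    return [raster[i:i + tile_size, j:j + tile_size]
            for i in range(0, h, tile_size)
            for j in range(0, w, tile_size)]

if __name__ == "__main__":
    scene = np.random.rand(1024, 1024)   # stand-in for a remote sensing scene
    tiles = split_into_tiles(scene, 256)
    with ProcessPoolExecutor() as ex:
        masks = list(ex.map(classify_tile, tiles))
    print(len(masks), "tiles classified independently")
```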