Performance Evaluation of Components of the Hadoop Ecosystem
Published in Salah-ddine Krit, Mohamed Elhoseny, Valentina Emilia Balas, Rachid Benlamri, Marius M. Balas, Internet of Everything and Big Data, 2021
Nibareke Thérence, Laassiri Jalal, Lahrizi Sara
As organizations are flooded with massive amounts of raw data, the challenge is that traditional tools are poorly equipped to deal with the scale and complexity. That is where Hadoop comes in. Hadoop is well suited to many Big Data challenges, especially high volumes of data and data with a variety of structures [16, 17]. Hadoop is a framework for storing data on large clusters of commodity hardware, i.e. affordable and readily available computers, and for running applications against that data. A cluster is a group of interconnected computers (known as nodes) that can work together on the same problem. As mentioned, the current Apache Hadoop ecosystem consists of the Hadoop Kernel, MapReduce [18], HDFS, and a number of other components such as Apache Hive, Pig, and Flume.
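The MapReduce model mentioned above can be illustrated with a minimal, single-process sketch (plain Python, not the Hadoop API): a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key (here: sum the counts).
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big clusters", "Hadoop stores big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])  # 3
```

In Hadoop itself, the map and reduce functions run on many nodes in parallel and HDFS supplies the input splits; the dataflow, however, is the same as in this toy version.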
Published in Carlos Alberto Vélez Quintero, Optimization of Urban Wastewater Systems using Model Based Design and Control, 2020
A cluster is a local group of networked computers with installed software that allows them to work simultaneously in parallel. Clusters for parallel computing require a high-speed, low-latency network in order to achieve high performance. Latency refers to the time it takes for one processor to communicate with another. Key features are the bus speeds that connect the CPU to memory, power consumption per CPU, and the networking technology that connects the CPUs to one another (Creel and Goffe 2008). If networks of workstations (NOWs) are used, there is no need to physically build a cluster, but the parallel communication software must still be installed. Reliability then also depends on the network connection and the availability of idle workstations.
Big Data Techniques and Security
Published in Rakesh M. Verma, David J. Marchette, Cybersecurity Analytics, 2019
Rakesh M. Verma, David J. Marchette
The high volume and velocity of data means that individual machines are inadequate for the tasks of data handling and processing, and clusters of computers must be employed. Clustering software is used to transform the cluster of machines into a pooled resource that acts like one large machine with easy scalability and high availability. A cluster should be easily scalable by adding more machines as needed without having to change the characteristics of the machines that are already in the cluster. With many machines in the cluster, even if each individual machine is highly reliable, there are bound to be failures. Therefore, a cluster should easily tolerate failures of individual machines or storage components and provide high throughput.
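The fault-tolerance requirement above can be sketched as a simple failover policy (plain Python with hypothetical node names; real clustering software adds heartbeats, replication, and rescheduling): a task that fails on one node is transparently retried on another, so the pooled cluster keeps behaving like one reliable machine.

```python
def run_with_failover(task, nodes, execute):
    # Try each node in turn; a single node failure must not fail the task.
    errors = []
    for node in nodes:
        try:
            return execute(node, task)
        except RuntimeError as err:          # node crashed or timed out
            errors.append((node, str(err)))
    raise RuntimeError(f"task failed on all nodes: {errors}")

# Simulated cluster: node "n0" is down, the others work.
def execute(node, task):
    if node == "n0":
        raise RuntimeError("node unreachable")
    return f"{task} done on {node}"

result = run_with_failover("count-words", ["n0", "n1", "n2"], execute)
print(result)  # count-words done on n1
```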
High-Performance Computing for Nuclear Reactor Design and Safety Applications
Published in Nuclear Technology, 2020
Afaque Shams, Dante De Santis, Adam Padee, Piotr Wasiuk, Tobiasz Jarosiewicz, Tomasz Kwiatkowski, Sławomir Potempski
Over the past decade, computer clusters have dominated the world of supercomputing. They are the most cost-effective way to build a high-performance computing (HPC) installation. Although the communication model of clusters is more difficult for programmers than that of classical supercomputers (e.g., there is no shared memory or automatic process migration between cluster nodes), their price-to-performance ratio is unbeatable because many popular PC hardware components can be used. This is especially important for the CPUs and accelerator architectures (Intel, Nvidia, etc.) as well as for the operating system platform (usually Linux). Clusters are usually controlled by advanced schedulers that are responsible for assigning CPUs and other resources to multiple jobs submitted by multiple users. Nevertheless, for tasks utilizing significant subsets of the available resources, finding a homogeneous partition can be difficult. This is especially true for computational fluid dynamics (CFD), which relies heavily on an efficient interconnect and on homogeneity of both the node architecture and the communication topology. Moreover, an application may require a special dedicated system software configuration.
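The partition-finding difficulty can be illustrated with a toy scheduler sketch (plain Python; the node attributes and names are hypothetical, not any real scheduler's API): a request succeeds only when enough idle nodes share the same architecture and interconnect.

```python
def find_homogeneous_partition(nodes, n_needed):
    # Group idle nodes by (architecture, interconnect) and return the first
    # group large enough for the job; None if no homogeneous partition exists.
    groups = {}
    for node in nodes:
        if node["idle"]:
            key = (node["arch"], node["network"])
            groups.setdefault(key, []).append(node["name"])
    for names in groups.values():
        if len(names) >= n_needed:
            return names[:n_needed]
    return None

cluster = [
    {"name": "a1", "arch": "x86", "network": "ib",  "idle": True},
    {"name": "a2", "arch": "x86", "network": "ib",  "idle": True},
    {"name": "g1", "arch": "gpu", "network": "eth", "idle": True},
]
print(find_homogeneous_partition(cluster, 2))  # ['a1', 'a2']
print(find_homogeneous_partition(cluster, 3))  # None
```

Even with three idle nodes, a three-node homogeneous job cannot be placed here, which is exactly the situation a heavily loaded heterogeneous cluster presents to large CFD runs.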
A branch-and-bound algorithm for the cell formation problem
Published in International Journal of Production Research, 2018
Irina E. Utkina, Mikhail V. Batsyn, Ekaterina K. Batsyna
To speed up our approach, it is possible to parallelise the branch-and-bound algorithm. However, two issues should be taken into account. First, different subtrees of the search tree in a branch-and-bound algorithm usually have very different sizes, which raises the issue of load balancing between working threads or processes. Second, the search tree is huge, so we have to use depth-first search to limit the space complexity. An interesting approach which deals with these two problems is developed by Pietracaprina et al. (2015). It has constant space complexity O(1) for every processor and sublinear running time for a reasonably small height h of the search tree and number of processors p in comparison with the large size n of the tree. In the case of MPI-based parallelisation over a large number of machines, it makes sense to use the Parallel Enumeration and Branch-and-Bound Library (PEBBL) implemented by Eckstein, Hart, and Phillips (2015). To provide efficient communication between a large number of machines, it organises processors into clusters, each having a hub processor and a number of worker processors.
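The depth-first branch-and-bound scheme described above can be sketched sequentially (plain Python; a 0/1 knapsack stands in as the illustrative problem, not the cell formation objective): DFS keeps memory proportional to the tree height h rather than the tree size n, and subtrees whose optimistic bound cannot beat the incumbent are pruned.

```python
def knapsack_bb(weights, values, capacity):
    # Depth-first branch-and-bound: at level i decide whether to take item i.
    n = len(weights)
    best = 0
    suffix = [0] * (n + 1)               # suffix[i] = total value of items i..n-1
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + values[i]

    def dfs(i, weight, value):
        nonlocal best
        best = max(best, value)
        # Prune: even taking every remaining item cannot beat the incumbent.
        if i == n or value + suffix[i] <= best:
            return
        if weight + weights[i] <= capacity:
            dfs(i + 1, weight + weights[i], value + values[i])  # take item i
        dfs(i + 1, weight, value)                               # skip item i

    dfs(0, 0, 0)
    return best

print(knapsack_bb([3, 4, 5], [4, 5, 6], 7))  # 9
```

A parallel version would hand different subtrees (calls to `dfs`) to different workers, and the very uneven subtree sizes visible even in this tiny example are what makes load balancing between workers the central difficulty.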
The role of an ant colony optimisation algorithm in solving the major issues of the cloud computing
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2023
Saied Asghari, Nima Jafari Navimipour
Load balancing splits the workload among various cloud resources so that more work can be completed and all users are served faster (Elenin & Kitakami, 2011). Load balancing improves the workload distribution across numerous resources, such as data storage, computers, network links, computer clusters, and disk drives. It tries to optimise resource utilisation, maximise throughput, reduce response time, and avoid resource overload. In the following, some of the important techniques in this category are reviewed.
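One simple technique in this family, greedy least-loaded assignment, can be sketched as follows (plain Python; the task costs and resource count are hypothetical): each incoming task is placed on the resource with the smallest current load, which directly targets the response-time and overload goals above.

```python
import heapq

def assign_tasks(task_costs, n_resources):
    # Greedy least-loaded balancing: a min-heap keyed on current load
    # always yields the least-loaded resource for the next task.
    heap = [(0, r) for r in range(n_resources)]   # (current load, resource id)
    heapq.heapify(heap)
    placement = {}
    for task, cost in enumerate(task_costs):
        load, res = heapq.heappop(heap)
        placement[task] = res
        heapq.heappush(heap, (load + cost, res))
    # Summarise the final load on each resource.
    loads = [0] * n_resources
    for task, res in placement.items():
        loads[res] += task_costs[task]
    return placement, loads

placement, loads = assign_tasks([5, 3, 2, 7, 1], 2)
print(loads)  # [12, 6]
```

Metaheuristic approaches such as ant colony optimisation search the space of such placements more globally, trading the simplicity of this greedy rule for better worst-case balance.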