Big Data Stream Processing
Published in Vivek Kale, Parallel Computing Architectures and APIs, 2019
The components of Spark are as follows:
Driver: The driver is the code that includes the “main” function; it defines the RDDs (resilient distributed datasets), their transformations, and their relationships (a minimal driver sketch follows this list).
DAG scheduler: The DAG scheduler optimizes the code and arrives at an efficient DAG (directed acyclic graph) that represents the data-processing steps in the application. The resulting DAG is sent to the cluster manager.
Cluster manager: The cluster manager is responsible for assigning specific processing tasks to workers. It has information about the workers, the assigned threads, and the location of data blocks, and it is also the service that replays the DAG in the case of worker failure. The cluster manager can be Yet Another Resource Negotiator (YARN), Mesos, or Spark’s standalone cluster manager.
Worker: The worker receives units of work and data to manage. It executes its specific task without knowledge of the entire DAG, and its results are sent back to the driver application.
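To make the driver’s role concrete, the following minimal PySpark sketch (not taken from the chapter; the application name, master URL, and input path are illustrative assumptions) defines an RDD lineage whose DAG is only scheduled and shipped to workers when an action is invoked.

from pyspark import SparkConf, SparkContext

# Driver program: contains the "main" logic and defines RDDs and their transformations.
conf = SparkConf().setAppName("driver-sketch").setMaster("local[*]")  # master could also point to YARN or Mesos
sc = SparkContext(conf=conf)

# Transformations only build up the lineage (the DAG); nothing executes yet.
lines = sc.textFile("hdfs:///data/input.txt")  # hypothetical input path
counts = (lines.flatMap(lambda line: line.split())
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))

# The action triggers scheduling of the DAG: the cluster manager assigns tasks
# to workers, and their results are returned to this driver.
print(counts.take(10))

sc.stop()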
Portal Infrastructure and NFR Planning
Published in Shailesh Kumar Shivakumar, A Complete Guide to Portals and User Experience Platforms, 2015
A portal cluster consists of multiple server instances (sometimes referred to as nodes) that are managed by a cluster manager. The server nodes can run on the same machine or on different machines. The cluster manager is responsible for synchronizing the code, configuration, data, sessions, and caches across all its member nodes. It also routes requests based on the performance and availability of its member nodes and hence helps in achieving high availability. There are two main types of clustering: vertical and horizontal.
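As a purely illustrative sketch (not tied to any particular portal product; all class and field names are hypothetical), the following Python toy model shows a cluster manager that holds replicated state for its member nodes and routes each request to the healthiest, least-loaded node, which is the behaviour described above.

# Toy model of cluster-manager routing; assumes in-process nodes, not real servers.
class Node:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.load = 0            # e.g. number of active sessions

class ClusterManager:
    def __init__(self, nodes):
        self.nodes = nodes
        self.shared_state = {}   # stands in for replicated session/cache data

    def replicate(self, key, value):
        # Synchronize data, session, and cache entries across member nodes.
        self.shared_state[key] = value

    def route(self):
        # Route to the healthy node with the lowest load (availability + performance).
        candidates = [n for n in self.nodes if n.healthy]
        if not candidates:
            raise RuntimeError("no healthy nodes available")
        node = min(candidates, key=lambda n: n.load)
        node.load += 1
        return node

nodes = [Node("portal-1"), Node("portal-2"), Node("portal-3")]  # horizontal cluster
cm = ClusterManager(nodes)
nodes[1].healthy = False            # simulate a failed member node
print(cm.route().name)              # the request is routed to a surviving node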
Parallel computing in railway research
Published in International Journal of Rail Transportation, 2020
Qing Wu, Maksym Spiryagin, Colin Cole, Tim McSweeney
In addition to Hadoop MapReduce, another cloud computing technique called Apache Spark has also found railway applications [71–73]. Within the Apache Spark framework, there are four main components: a driver node, a number of worker nodes, a cluster manager, and executors (one for each worker node). The driver node is essentially the user interface, and it also partitions the computing tasks. The worker nodes (slave computing units) carry out the specific computing tasks in parallel. The cluster manager, which is under the control of the driver node, coordinates the parallel computing process, while the executors manage the computing tasks within each worker node. Hadoop MapReduce and Apache Spark move data in different ways and are therefore suited to different applications. Hadoop MapReduce moves data via networks and disks, has slower computing speed, and is most suitable for analysis of archived Big Data. Apache Spark caches data in memory, has faster computing speed, and is most suitable for real-time data processing. Comparatively, Hadoop MapReduce is well suited to analysing data already collected from field tests, while Spark is better suited to processing on-line condition monitoring data.
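For illustration, the in-memory caching that distinguishes Spark from MapReduce might look like the following PySpark sketch; the file path and column name are hypothetical, and this is not code from the cited railway studies.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("condition-monitoring-sketch").getOrCreate()

# Load sensor readings once and keep them cached in executor memory.
readings = spark.read.csv("hdfs:///monitoring/bearing_temps.csv",
                          header=True, inferSchema=True)
readings.cache()

# Repeated analyses reuse the cached data instead of re-reading from disk,
# whereas a MapReduce pipeline would write intermediate results back to HDFS.
overheating = readings.filter(readings["temperature"] > 90.0)
print(overheating.count())

spark.stop()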