Explore chapters and articles related to this topic
Machine Learning
Published in Industrial Applications of Machine Learning, 2019
Pedro Larrañaga, David Atienza, Javier Diaz-Rozo, Alberto Ogbechie, Carlos Puerto-Santana, Concha Bielza
For clustering and supervised classification, we have selected the following five software tools: WEKA, R, scikit-learn, KNIME and RapidMiner. WEKA (Hall et al., 2009) is a Java-based open-source machine learning platform developed at the University of Waikato, New Zealand. The software is free under the GNU/GPL 3 license for non-commercial purposes. It is popular mainly because it is user-friendly and offers a large number of implemented algorithms. WEKA offers four user interface options: command-line interface, Explorer, Experimenter and Knowledge Flow. The preferred option is Explorer, which can be used to define the data source, prepare the data, run machine learning algorithms, and visualize the results. Experimenter is mainly used for comparing the performance of different algorithms on the same dataset. Knowledge Flow is useful for specifying the dataflow using connected visual components. Massive Online Analysis (MOA) (Bifet et al., 2012) is based on the WEKA framework and includes many online learning algorithms for evolving data streams.
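The idea behind comparing algorithms on the same dataset can be illustrated with a minimal sketch in plain Python. This is not WEKA's Experimenter or its API; the two classifiers (a majority-class baseline and a 1-nearest-neighbor rule), the function names, and the tiny labeled dataset are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def majority_class_predict(train, x):
    """Baseline: always predict the most frequent label in the training set."""
    labels = [label for _, label in train]
    return Counter(labels).most_common(1)[0][0]


def nearest_neighbor_predict(train, x):
    """1-nearest-neighbor: predict the label of the closest training point."""
    nearest = min(train, key=lambda item: math.dist(item[0], x))
    return nearest[1]


def accuracy(predict, train, test):
    """Fraction of test instances whose predicted label matches the true label."""
    correct = sum(1 for x, y in test if predict(train, x) == y)
    return correct / len(test)


def compare(algorithms, train, test):
    """Evaluate every algorithm on the same train/test split."""
    return {name: accuracy(fn, train, test) for name, fn in algorithms.items()}


# Hypothetical two-feature dataset with labels "ok" and "fault".
TRAIN = [((0.0, 0.0), "ok"), ((0.1, 0.2), "ok"), ((0.2, 0.1), "ok"),
         ((1.0, 1.0), "fault"), ((0.9, 1.1), "fault")]
TEST = [((0.05, 0.05), "ok"), ((1.0, 0.9), "fault"),
        ((0.15, 0.1), "ok"), ((0.95, 1.0), "fault")]

ALGORITHMS = {
    "majority": majority_class_predict,
    "1-nn": nearest_neighbor_predict,
}

if __name__ == "__main__":
    for name, score in compare(ALGORITHMS, TRAIN, TEST).items():
        print(f"{name}: accuracy = {score:.2f}")
```

Holding the split fixed across algorithms is the point of the sketch: any difference in accuracy then reflects the algorithm rather than the data each one happened to see.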
Big data analytics in medical engineering and healthcare: methods, advances and challenges
Published in Journal of Medical Engineering & Technology, 2020
Lidong Wang, Cheryl Ann Alexander
Storm, Flink and Spark Streaming are major open-source platforms for distributed stream data processing. Storm and Flink achieve up to 15 times higher throughput than Spark Streaming, which relies on micro-batch processing. Storm offers the best throughput of the three; however, Spark Streaming is robust to node failures and can recover without any losses [54]. Big data can be streamed in real time by integrating Apache Spark with Apache Kafka. Apache Spark provides a distributed framework for processing big data, while Kafka is a distributed messaging system suited to both offline and online message consumption. Stream data generated by monitoring devices pass through a Kafka producer, which delivers the data to the Kafka cluster, where messages are saved in chunks. The Kafka consumer (integrated with Spark) consumes the data from the Kafka server and processes it. Spark works with the Kafka consumer APIs to consume stream data using either the Direct approach or the Receiver-based approach [55]. In addition, Massive Online Analysis (MOA) is a framework for mining stream data. It contains offline and online collections for classification and clustering as well as evaluation tools. It can be used to perform regression, classification, clustering, frequent pattern mining and frequent graph mining [56].
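The producer, cluster and consumer flow described above can be sketched with a toy in-memory model. This is not the Kafka or Spark API: the classes `ToyTopic`, `ToyProducer` and `ToyConsumer`, the `(device_id, value)` reading format, and the per-device averaging step are all hypothetical stand-ins, used only to show how messages move from a producer into a log and are then consumed in micro-batches.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a message log (assumption, not a Kafka topic)."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # position of the stored message


class ToyProducer:
    """Delivers device readings to the topic."""

    def __init__(self, topic):
        self.topic = topic

    def send(self, message):
        return self.topic.append(message)


class ToyConsumer:
    """Reads from the topic and tracks its own read offset."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self, max_messages):
        batch = self.topic.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


def process_micro_batch(batch):
    """Hypothetical processing step: mean reading per device in one batch."""
    totals = defaultdict(lambda: [0.0, 0])
    for device_id, value in batch:
        totals[device_id][0] += value
        totals[device_id][1] += 1
    return {device: total / count for device, (total, count) in totals.items()}


def run_pipeline(readings, batch_size=3):
    """Produce all readings, then consume them in fixed-size micro-batches."""
    topic = ToyTopic()
    producer = ToyProducer(topic)
    for reading in readings:
        producer.send(reading)

    consumer = ToyConsumer(topic)
    results = []
    while True:
        batch = consumer.poll(batch_size)
        if not batch:
            break
        results.append(process_micro_batch(batch))
    return results


if __name__ == "__main__":
    # Hypothetical readings from two monitoring devices.
    readings = [("a", 60), ("b", 80), ("a", 70), ("b", 90)]
    for i, summary in enumerate(run_pipeline(readings)):
        print(f"batch {i}: {summary}")
```

Because the consumer advances its own offset after each poll, every message is processed exactly once in this toy setting; a real deployment would add a broker, partitions and failure recovery, none of which are modeled here.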