Explore chapters and articles related to this topic
High-End Tools and Technologies for Managing Data in the Age of Big Data
Published in Chiranji Lal Chowdhary, Intelligent Systems, 2019
N. Nivedhitha, P. M. Durai Raj Vincent
Apache Mahout is a scalable open source project developed mainly for implementing machine learning techniques. Machine learning is a technique in which algorithms are trained on past data and then tested on how well they predict future outcomes. Mahout runs alongside Hadoop to manage huge volumes of data; its algorithms are written on top of Hadoop. Mahout offers ready-to-use algorithms for data mining tasks. It mainly supports collaborative filtering, which mines user behavior to drive recommendation systems; clustering, in which similar items are grouped together; and classification, which assigns data to specific classes (Giacomelli, 2013; Meng et al., 2016).
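To make the idea of collaborative filtering concrete, the following is a minimal single-machine sketch in plain Python. It does not use Mahout or Hadoop; the toy ratings, the function names, and the choice of cosine similarity are all assumptions made for illustration, not details taken from the chapter.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not from the source.
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted mean rating."""
    scores = {}
    weights = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only suggest items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend(RATINGS, "alice"))
```

In this toy data set, item D is the only item alice has not rated, so it is the only candidate. A distributed implementation would partition the similarity computation across machines rather than loop over all users in memory.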
The Art of In-Memory Computing for Big Data Processing
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
Mihaela-Andreea Vasile, Florin Pop
Apache Mahout is a library for building scalable data mining and machine-learning applications. Mahout provides implementations of multiple machine-learning algorithms. It runs over Hadoop using MapReduce. Its main design goals are efficient processing, ease of use, easy integration with different data stores, and extensibility. It has been used in multiple applications requiring machine-learning implementations, such as clustering Wikipedia's articles [17], collaborative filtering [18], and bioinformatics (where the clustering algorithms are also used).
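As an illustration of how a clustering algorithm can be phrased in MapReduce terms, the sketch below runs one k-means iteration as a map step (assign each point to its nearest centroid) followed by a reduce step (average the points per centroid). It is a single-process approximation written in plain Python under assumed names (`map_assign`, `reduce_mean`, `kmeans_iteration`); it is not Mahout's implementation and does not call Hadoop.

```python
from collections import defaultdict


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances)), point


def reduce_mean(points):
    """Reduce step: average all points that share a centroid index."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One k-means iteration: map, group by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_assign(point, centroids)
        groups[key].append(value)
    # A centroid with no assigned points is left unchanged.
    return [
        reduce_mean(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_iteration(pts, [(1.0, 1.0), (9.0, 9.0)]))
```

In a real cluster the map calls would run in parallel on separate data partitions and the grouping would be handled by the framework's shuffle phase; the loop here only imitates that data flow.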
Design and implementation of algorithm recommended by new users based on Mahout
Published in Amir Hussain, Mirjana Ivanovic, Electronics, Communications and Networks IV, 2015
However, implementing a recommender system algorithm is very laborious, and the appearance of Mahout solves this problem. Apache Mahout's algorithms run on the Hadoop platform. The Mahout project encapsulates commonly used algorithms, such as clustering, classification, recommendation engines, and frequent itemset mining, and is designed to help developers build intelligent applications more quickly and easily.
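For readers unfamiliar with frequent itemset mining, the following small sketch counts how often each item and item pair occurs across a set of transactions and keeps those that meet a minimum support. It is a brute-force illustration in plain Python, with a hypothetical function name and toy data; it is not the algorithm or the API that Mahout provides.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets of up to max_size items and keep the frequent ones."""
    counts = Counter()
    for transaction in transactions:
        # Sort and deduplicate so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical shopping baskets, used only to exercise the function.
    baskets = [
        ["milk", "bread"],
        ["milk", "bread", "eggs"],
        ["bread", "eggs"],
        ["milk"],
    ]
    for itemset, n in sorted(frequent_itemsets(baskets, min_support=2).items()):
        print(itemset, n)
```

Enumerating every combination grows quickly with basket size, which is one reason scalable libraries rely on more specialised algorithms and distributed execution for this task.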
Malicious accounts detection from online social networks: a systematic review of literature
Published in International Journal of General Systems, 2021
Imen Ben Sassi, Sadok Ben Yahia
One of the key challenges of using OSN data is handling data streams and designing an online, real-time solution. In malicious account detection, most of the studies we reviewed focused on the batch processing mode, validating their models on a test set. Only Miller et al. handled the streaming nature of online posts, particularly tweets, proposing an anomaly detection system that clusters streaming data (Miller et al. 2014). In the big data domain, we refer to this challenge as the velocity problem. For example, bots spreading fake news in OSNs are usually short-lived accounts. After the 2016 U.S. election, most news accounts no longer exist. We can see this behavior as a strategy to evade detection by the OSN immune systems. Detecting misbehavior in real time and identifying the malicious entities engaged in spreading online content are very difficult in terms of data storage and computation. Thus, a distributed solution needs to be implemented for rapid and efficient analysis. Two frameworks are available to address the problem of mining large social network datasets using graph and machine learning methods for malicious behavior detection. Apache Giraph is a graph processing framework designed to address the scalability of large-scale graph-based algorithms. Apache Mahout is a framework that provides distributed implementations of machine learning algorithms.