Gradient boosting machines
Published in Brandon M. Greenwell, Tree-Based Methods for Statistical Learning in R, 2022
XGBoost [Chen and Guestrin, 2016] is one of the most popular and scalable implementations of GBMs. While XGBoost follows the same principles as the standard GBM algorithm, there are some important differences, a few of which are listed below:
- more stringent regularization to help prevent overfitting;
- a novel sparsity-aware split finding algorithm;
- a weighted quantile sketch for fast and approximate tree learning;
- parallel tree building (across nodes within a tree);
- out-of-core processing for maximum scalability on a single machine;
- dropout, a concept borrowed from deep learning, to mitigate the problem of overspecialization [Vinayak and Gilad-Bachrach, 2015].
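For a concrete, hedged illustration, the sketch below maps several of these features onto parameters of the xgboost Python package; the toy data and the particular parameter values are invented for the example, not taken from the book.

```python
import numpy as np
import xgboost as xgb

# Invented toy data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "booster": "dart",        # dropout for boosted trees (DART)
    "rate_drop": 0.1,         # fraction of trees dropped each round
    "lambda": 1.0,            # L2 regularization on leaf weights
    "alpha": 0.5,             # L1 regularization on leaf weights
    "gamma": 0.1,             # minimum loss reduction required to split
    "tree_method": "approx",  # approximate learning via quantile sketch
    "nthread": 4,             # parallel (within-tree) split finding
}
bst = xgb.train(params, dtrain, num_boost_round=50)
```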
Numerical Techniques
Published in Gianni Comini, Stefano Del Giudice, Carlo Nonino, Finite Element Analysis in Heat Transfer, 2018
Gianni Comini, Stefano Del Giudice, Carlo Nonino
Related to the efficient solution of large algebraic systems are the in-core and out-of-core methods of storing the system matrix. A solution scheme that retains the system matrix in a dimensioned array is an in-core algorithm, while a solution scheme that uses external files is an out-of-core algorithm. Out-of-core algorithms had to be used on old computers, which could address only a relatively limited amount of central memory. On modern computers, which operate with virtual memory, in-core algorithms are always more convenient than out-of-core algorithms. In fact, out-of-core algorithms imply explicit reading and writing to disk, while modern operating systems automatically unload large arrays to backing storage and process them very efficiently in small sections. Thus, in Chapter 5 we will present an in-core algorithm, and here we have chosen to concentrate only on in-core storage schemes. However, once an effective in-core finite element solution has been studied, little difficulty should be encountered in understanding specific out-of-core implementations of the same strategy.
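As a hedged illustration of the distinction, the Python sketch below contrasts an in-core array with a file-backed array whose pages the operating system moves to and from disk on demand; the file name and matrix size are invented for the example.

```python
import numpy as np

n = 5000  # number of equations (illustrative size)

# In-core: the entire system matrix is held in one dimensioned array in RAM.
K_incore = np.zeros((n, n))

# Out-of-core flavor: the same matrix backed by an external file; the
# operating system's virtual memory pages sections in and out on demand.
K_outofcore = np.memmap("system_matrix.dat", dtype=np.float64,
                        mode="w+", shape=(n, n))
K_outofcore[0, 0] = 1.0  # touched pages are materialized and written back
K_outofcore.flush()
```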
Large Graph Computing Systems
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
Chengwen Wu, Guangyan Zhang, Keqin Li, Weimin Zheng
X-Stream is also a disk-based large-graph computing system for a single machine. Unlike GraphChi's vertex-centric computation model, however, it adopts an edge-centric computation model in which computational state is maintained on the vertices. Its computational model is shown in Figure 17.8. In the scatter phase, X-Stream iterates over all the edges and sends an update over each edge; in the gather phase, it iterates over the updates and applies each one to the corresponding vertex. This edge-centric approach accesses edges sequentially, making better use of the disk's bandwidth. However, it incurs random access to the vertex state. To mitigate this overhead, a streaming-partition approach is used, which partitions the vertices into subsets so that the state of each subset fits in high-speed memory: the cache for in-memory graphs, and main memory for out-of-core graphs. Each vertex subset is associated with an edge partition, which stores all the out-edges of that subset. Figure 17.9 shows the edge-centric computation model with streaming partitions. In the scatter phase, X-Stream processes all the streaming partitions; for each partition, it loads the vertex subset, streams the partition's edges from storage to generate updates, and writes them to an output buffer (Uout). Each update is then appended to the local input buffer (Uin) of its destination partition. In the gather phase, X-Stream reads the update values from Uin and updates the vertex state.
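To make the two phases concrete, here is a minimal, in-memory Python sketch of the edge-centric scatter-gather loop; the function names and the BFS-style update rule are illustrative assumptions, not X-Stream's actual code, and the real system streams edges from disk through the Uout/Uin buffers described above.

```python
def scatter(edges, state):
    """Stream edges sequentially, emitting one update per edge."""
    return [(dst, state[src] + 1) for src, dst in edges]

def gather(updates, state):
    """Apply each update to its destination vertex's state."""
    for dst, value in updates:
        state[dst] = min(state[dst], value)  # e.g., BFS-style relaxation
    return state

# One iteration on a toy graph: vertex 0 is the BFS source.
edges = [(0, 1), (0, 2), (1, 2)]
state = {0: 0, 1: float("inf"), 2: float("inf")}
state = gather(scatter(edges, state), state)  # -> {0: 0, 1: 1, 2: 1}
```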
Acoustic signal based water leakage detection system using hybrid machine learning model
Published in Urban Water Journal, 2023
M. Saravanabalaji, N. Sivakumaran, S. Ranganthan, V. Athappan
XGBoost, a cutting-edge algorithm, is gaining popularity not only for regression but also for classification problems (Ogunleye and Wang 2020) because of its high performance. Chen and Guestrin (2016) created XGBoost as a scalable tree-boosting system. Thanks to parallel, distributed, out-of-core, and cache-aware computation, the technique is reported to be more than ten times faster than typical ML and DL models. The approach also has the benefit of being well tuned and scalable, making it possible to handle billions of samples in distributed or memory-constrained environments. This cutting-edge use of gradient boosting machines was designed to address real-world problems where sparse input data is a typical occurrence: the technique explicitly accounts for incomplete data, excess zeros in the database, and artifacts of the feature-engineering methods used. As in any boosting ensemble, each newly added model marginally improves on the performance of the models already fitted, and the gradient descent approach drives the model loss toward a minimum.
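As a small, hedged illustration of the sparsity awareness mentioned above, the sketch below trains XGBoost on a toy matrix containing missing entries; the data are invented, and only the documented DMatrix missing-value handling is assumed.

```python
import numpy as np
import xgboost as xgb

# Toy matrix with missing entries; XGBoost's sparsity-aware split
# finding learns a default branch direction for them.
X = np.array([[1.0, np.nan, 0.0],
              [np.nan, 2.0, 0.0],
              [3.0, 1.0, 1.0],
              [2.0, np.nan, 1.0]])
y = np.array([0, 1, 0, 1])

dtrain = xgb.DMatrix(X, label=y, missing=np.nan)
bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)
```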
Terrestrial laser scanning for the comprehensive structural health assessment of the Baptistery di San Giovanni in Florence, Italy: an integrative methodology for repeatable data acquisition, visualization and analysis
Published in Structure and Infrastructure Engineering, 2018
Michael Hess, Vid Petrovic, Mike Yeager, Falko Kuester
In order for the interactive visualization system to effectively serve this role, the visualization framework must offer both performance, to deal interactively with billions of data points, and flexibility, to minimize the friction involved in data organization and exchange. To this end, a custom algorithm was developed (Petrovic, Vanoni, Richter, Levy, & Kuester, 2014; Petrovic et al., 2011) using an adaptively and progressively refining rendering strategy that loads data on the fly as required (a technique commonly referred to as out-of-core). The algorithm leverages a GPU-based point buffer to decouple the interactive performance of visualization from the costs of streaming data from disk or a network server. On-demand loading of data is fully feedback-driven, governed by real-time estimates of the contributions the various data subsets make to the rendering. This paradigm allows data assets, regardless of size, to be dynamically added, removed, interactively repositioned, and otherwise transformed while maintaining interactive visualization performance, enabling the real-time exploration of billions of data points and allowing the user to flexibly inspect the complete virtual site surrogate.
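The following Python sketch gives a rough flavor of such feedback-driven, budgeted refinement over a spatial hierarchy; the Node class, its contribution field, and the greedy selection rule are illustrative assumptions, not the authors' actual algorithm.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class Node:
    n_points: int            # points stored at this hierarchy node
    contribution: float      # per-frame estimate of visual importance
    children: list = field(default_factory=list)

def select_resident(root, budget):
    """Greedily refine the hierarchy, keeping the highest-contribution
    nodes loaded until the point budget is exhausted."""
    resident, used = [], 0
    heap = [(-root.contribution, id(root), root)]
    while heap:
        _, _, node = heapq.heappop(heap)
        if used + node.n_points > budget:
            continue  # over budget: leave on disk, stream later if needed
        resident.append(node)
        used += node.n_points
        for child in node.children:
            heapq.heappush(heap, (-child.contribution, id(child), child))
    return resident
```

Re-running a selection like this every frame, with contributions re-estimated from the current view, is one simple way to keep loading fully feedback-driven while the resident set stays within a fixed memory budget.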
Integration of machine learning and statistical models for crash frequency modeling
Published in Transportation Letters, 2022
Dongqin Zhou, Vikash V. Gayah, Jonathan S. Wood
The XGBoost method adopts a gradient boosting scheme and introduces some critical improvements. First, it introduces regularization to control over-fitting. Second, it incorporates numerous system optimizations to facilitate fast execution with less computation while ensuring effective predictions. These include distributed and out-of-core computing to train sizable models on large datasets, parallelization of tree construction, and cache optimization to best utilize computational hardware. Finally, it embodies algorithmic enhancements such as sparsity awareness to handle missing values and a weighted quantile sketch to allow approximate tree learning. With these improvements, the XGBoost method can achieve superior predictive results with computational efficiency.
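As a hedged sketch of out-of-core training in practice, the example below uses the xgboost Python package's DataIter interface to feed data in batches so the library can cache them on disk; the chunked toy data, the class name, and the cache prefix are invented for illustration.

```python
import numpy as np
import xgboost as xgb

class ChunkIter(xgb.DataIter):
    """Feeds the data one chunk at a time so XGBoost can build an
    on-disk cache and train out-of-core (a sketch; a real pipeline
    would read each chunk from files rather than keep them in RAM)."""
    def __init__(self, chunks):
        self._chunks, self._it = chunks, 0
        super().__init__(cache_prefix="xgb-cache")  # external cache files
    def next(self, input_data):
        if self._it == len(self._chunks):
            return 0                 # no more batches
        X, y = self._chunks[self._it]
        input_data(data=X, label=y)  # hand this batch to XGBoost
        self._it += 1
        return 1
    def reset(self):
        self._it = 0

rng = np.random.default_rng(0)
chunks = [(rng.normal(size=(200, 5)), rng.integers(0, 2, 200))
          for _ in range(3)]
dtrain = xgb.DMatrix(ChunkIter(chunks))
bst = xgb.train({"objective": "binary:logistic", "tree_method": "hist"},
                dtrain)
```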