Published in Nailong Zhang, A Tour of Data Science, 2020
R is a lazy programming language in the sense that it uses lazy evaluation by default [13], most visibly for function arguments. A lazy evaluation strategy delays the evaluation of an expression until its value is needed. When an expression in R is evaluated, it generally follows an outermost reduction order. Python, by contrast, is not a lazy programming language: its expressions follow an innermost reduction order, so the arguments of a function call are evaluated before the function body runs. The code snippets below illustrate innermost versus outermost reduction.
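A minimal Python sketch of the contrast, not taken from the book: the helpers `noisy`, `first` and `first_lazy` are hypothetical names. Plain calls show innermost reduction, because both arguments are reduced before `first` runs. Passing zero-argument lambdas (thunks) emulates outermost reduction, because only the argument the body actually uses is ever evaluated. R's promises behave similarly but also cache the value once forced (call-by-need), whereas a thunk re-evaluates on every call. For comparison, in R `f <- function(a, b) a; f(1, stop("never evaluated"))` returns 1, because the promise for `b` is never forced.

```python
# Innermost (eager) reduction: every argument is evaluated before the call.
evaluated = []


def noisy(x):
    """Record that this argument expression was reduced, then return it."""
    evaluated.append(x)
    return x


def first(a, b):
    """Return a; b is never used in the body."""
    return a


first(noisy(1), noisy(2))
print(evaluated)  # [1, 2]: both arguments were reduced before first() ran

# Emulating outermost (lazy) reduction: pass zero-argument lambdas (thunks)
# and force them only when the body needs the value, much like R's promises.
evaluated.clear()


def first_lazy(a, b):
    """Force only the thunk for a; the thunk for b is never called."""
    return a()


first_lazy(lambda: noisy(1), lambda: noisy(2))
print(evaluated)  # [1]

# A failing argument is harmless when it is never forced lazily...
print(first_lazy(lambda: 1, lambda: 1 / 0))  # 1
# ...but eager evaluation raises before first() is even entered.
try:
    first(1, 1 / 0)
except ZeroDivisionError:
    print("ZeroDivisionError raised while evaluating the arguments")
```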
A methodology for prioritizing safety indicators using individual vehicle trajectory data
Published in Journal of Transportation Safety & Security, 2023
Yunjong Kim, Kawon Kang, Juneyoung Park, Cheol Oh
For the random forest model, two parameters were adjusted for optimization: the number of trees to grow (ntrees) and the number of variables randomly sampled as candidates at each tree node (mtries) (Das et al., 2020; Lee et al., 2020). To optimize the random forest model, a model classifying traffic flows was trained using the 33 safety indicators calculated for each traffic flow case as input variables. The random forest model used for variable selection was optimized with the R-based data mining package caret. When ranger is run through caret, caret automatically performs a grid search of mtry over the whole mtry parameter space: it evaluates 3 points in the parameter space (the smallest and largest possible mtry and the mean of these two values), using 10 bootstrap iterations as the evaluation strategy, and finally chooses the value with the lowest classification error rate (Probst et al., 2019). For training the model, the optimized number of trees (ntrees) was 500, and the optimization selected mtries = 33, that is, 33 randomly chosen variables are considered when splitting each node. In the optimized model, the out-of-bag (OOB) error was 12.86%, corresponding to a classification accuracy of approximately 87.14%.
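To make the tuning loop concrete, here is a minimal Python sketch of the procedure as the paragraph describes it. It is not caret's implementation, and caret's actual grid construction for ranger may differ in detail. The names `three_point_mtry_grid`, `tune_mtry`, `accuracy_from_oob` and the `error_fn` callback are hypothetical. `error_fn` stands in for fitting a random forest on a bootstrap sample with a given mtry and returning its error on the rows left out of that sample. In the usage lines it is replaced by a placeholder that trains nothing. In R, a comparable setup would presumably be something like `train(x, y, method = "ranger", trControl = trainControl(method = "boot", number = 10))`, though the study's exact call is not given.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: trains on the row indices in `train` with the given
# mtry and returns the error rate on the row indices in `holdout`.
ErrorFn = Callable[[List[int], List[int], int], float]


def three_point_mtry_grid(p: int) -> List[int]:
    """Smallest possible mtry (1), largest (p) and the mean of the two,
    rounded down, following the description in the text."""
    lo, hi = 1, p
    return sorted({lo, (lo + hi) // 2, hi})


def tune_mtry(n_rows: int, p: int, error_fn: ErrorFn,
              n_boot: int = 10, seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Score each grid point by its mean error over n_boot bootstrap
    resamples and return the mtry with the lowest mean error."""
    rng = random.Random(seed)
    mean_error: Dict[int, float] = {}
    for mtry in three_point_mtry_grid(p):
        errors = []
        for _ in range(n_boot):
            # Bootstrap sample: n_rows draws with replacement.
            train = [rng.randrange(n_rows) for _ in range(n_rows)]
            in_bag = set(train)
            # Rows never drawn form the held-out set for this resample.
            holdout = [i for i in range(n_rows) if i not in in_bag]
            errors.append(error_fn(train, holdout, mtry))
        mean_error[mtry] = statistics.mean(errors)
    best = min(mean_error, key=mean_error.get)
    return best, mean_error


def accuracy_from_oob(oob_error: float) -> float:
    """For classification, accuracy is the complement of the error rate."""
    return 1.0 - oob_error


# Placeholder error function for illustration only; it does not train a model.
def placeholder_error(train: List[int], holdout: List[int], mtry: int) -> float:
    return 0.2 - 0.002 * mtry


best, scores = tune_mtry(n_rows=200, p=33, error_fn=placeholder_error)
print(three_point_mtry_grid(33))  # [1, 17, 33]
print(best)  # 33 (only because of the placeholder error function)
print(round(accuracy_from_oob(0.1286), 4))  # 0.8714
```

With p = 33 the grid is 1, 17 and 33, so the reported mtries = 33 is the upper end of the grid. At that value every split considers all 33 indicators, which makes the forest equivalent to bagged decision trees. The last line checks the text's arithmetic: an OOB error of 12.86% leaves an accuracy of 87.14%.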