Subsetting – Knowledge and References

HDF5

Published in Praveen Kumar, Jay Alameda, Peter Bajcsy, Mike Folk, Momcilo Markus, Hydroinformatics: Data Integrative Approaches in Computation, Analysis, and Modeling, 2005

Michael J. Folk

The HDF5 format implements the structural aspects of the data model. It determines how the objects (datasets, datatypes, groups, etc.) are stored, and uses structures that facilitate fast direct access and partial access, including subsetting. The format also allows objects to be organized or encoded in a variety of ways to achieve efficient storage and access in different circumstances. Because the format has so many options and features, it is necessarily complex, requiring applications to use the HDF5 API for virtually all access.

Bayesian Dynamic Feature Partitioning in High-Dimensional Regression With Big Data

Published in Technometrics, 2022

Rene Gutierrez, Rajarshi Guhaniyogi

Assessing dynamic partitions of the set of parameters over time. For the strategies implemented to dynamically construct subsets in high-dimensional regression with either shrinkage priors or variable selection priors, we monitor the stability of subsets as time progresses. To this end, we evaluate the Adjusted Rand Index (ARI) (Hubert and Arabie 1985) between partitions of parameters corresponding to two successive time points and plot the ARI over time. The ARI evaluates the agreement in subset assignment between two subsetting/partitioning configurations and is corrected for chance. It ranges between –1 and 1, with larger values indicating stronger agreement between partitioning configurations. Thus, the ARI should converge to around 1 as time progresses if the partitions stabilize over time. For the partitioning algorithm implemented for shrinkage priors, we additionally examine the trace plot of the optimal value over time and assess the sensitivity of inference to the choice of M. To avoid repetition, we present the trace plot and the sensitivity to the choice of M only for the Bayesian Lasso prior. The conclusions are similar for the Horseshoe prior.
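The stability check described above can be illustrated with a minimal sketch, which is not the authors' code. It computes the ARI from the pair-counting formula of Hubert and Arabie (1985) and applies it to successive partitions. The function names `adjusted_rand_index` and `ari_trace`, and the convention of returning 1.0 in degenerate cases, are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two partitions given as equal-length label sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same parameters")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two items counts as agreement

    # Pairs placed together in both partitions, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # chance correction
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # assumption: degenerate (trivial) partitions agree
    return (together_both - expected) / (maximum - expected)


def ari_trace(partitions):
    """ARI between each pair of successive partitions (hypothetical helper)."""
    return [adjusted_rand_index(p, q) for p, q in zip(partitions, partitions[1:])]


# Illustrative data only: subset labels for six parameters at four time points.
history = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [1, 1, 0, 2, 2, 2],  # same grouping as the previous step, relabelled
    [1, 1, 0, 2, 2, 2],
]
print(ari_trace(history))
```

Because the ARI depends only on which parameters share a subset, relabelling the subsets between time points does not lower the score, so values near 1 in the trace indicate that the partition itself has stabilised.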

swmmr - an R package to interface SWMM

Published in Urban Water Journal, 2019

Dominik Leutnant, Anneke Döring, Mathias Uhl

At its core, the package relies on the tidy data concept (Wickham 2014), which is expressed through a set of harmonised packages sharing common data representation principles (‘tidyverse’ – Wickham (2017)). Although most tasks could have been addressed with base R, packages from the ‘tidyverse’ tend to simplify both the programming and the data analysis. For example, swmmr uses tibbles (Müller and Wickham 2017) instead of R’s built-in data.frame to represent SWMM sections because tibbles have a convenient print method which only shows the first 10 rows of data, and all the columns that fit on screen (Wickham and Grolemund 2016). This becomes especially useful when dealing with large SWMM data using functions such as read_inp(), read_rpt() and read_lid_rpt() (Table 1), as the console output remains readable even when large data are printed. Generally, these functions take the path to a corresponding SWMM file (*.inp or *.rpt) and parse its content into a named list of tibbles or a single tibble, respectively. read_inp() creates an object of class inp, whose list element names are identical to the names of the SWMM input sections, written in lowercase (e.g. options, subcatchments, etc.). To print a summary or to quickly visualise the model structure of the inp object, two generic functions, summary() and autoplot(), are implemented for inp objects. read_rpt() creates a named list of class rpt containing summary sections from the SWMM report file (e.g. subcatchment_runoff_summary). While both of the aforementioned functions maintain the original SWMM file structure, read_lid_rpt() interprets text files from specific LID elements and returns either a single tibble or index-based time series data as an xts object. The latter option is provided because xts objects, which are introduced with the xts package and build upon R’s built-in matrix data type, efficiently represent time series data and offer index-focused data subsetting methods.
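The idea of index-focused subsetting of a time series can be illustrated with a short Python sketch. This is an analogy only: it does not use swmmr or xts, and the function `subset_by_time` and the sample data are hypothetical. It assumes the timestamps are already sorted, so that a time window can be located by binary search instead of scanning every record.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def subset_by_time(timestamps, values, start, end):
    """Return the records whose timestamp lies in the closed interval [start, end].

    Assumes `timestamps` is sorted in ascending order and aligned with `values`.
    """
    lo = bisect_left(timestamps, start)   # first position with timestamp >= start
    hi = bisect_right(timestamps, end)    # first position with timestamp > end
    return timestamps[lo:hi], values[lo:hi]


# Illustrative data only: six hourly values starting at midnight.
t0 = datetime(2019, 1, 1)
times = [t0 + timedelta(hours=h) for h in range(6)]
flows = [0.0, 0.4, 1.2, 0.9, 0.3, 0.1]

window_times, window_flows = subset_by_time(
    times, flows, datetime(2019, 1, 1, 1), datetime(2019, 1, 1, 3)
)
print(window_flows)
```

Keeping the time index sorted is what makes this kind of window lookup cheap, which is the general motivation for index-based time series structures.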