Explore chapters and articles related to this topic
Addressing the Utilization of Popular Regression Models in Business Applications
Published in K. Hemachandran, Sayantan Khanra, Raul V. Rodriguez, Juan R. Jaramillo, Machine Learning for Business Analytics, 2023
Meganathan Kumar Satheesh, Korupalli V. Rajesh Kumar
Quantile regression, which extends the idea of a quantile to conditional quantile functions, is used to estimate conditional models (Koenker & Hallock, 2001). Quantiles divide a sample into equal-sized subgroups; for example, the 25%, 50%, and 75% sample quantiles (Yu, Lu, & Stander, 2003) divide it into four, and quantiles far from the center can be used to examine extreme values of the sample (Jareño, Ferrer, & Miroslavova, 2016). Linear regression captures only the average relationship between the dependent and independent variables, whereas quantile regression gives a fuller picture of that relationship by plotting quantile regression curves (Yu et al., 2003). The coefficients obtained by quantile regression are not unduly disturbed by outliers (Jareño et al., 2016), because the method minimizes a weighted sum of absolute deviations, and quantile regression estimators can perform more efficiently than ordinary least squares when the error terms are not normally distributed (Hung, Shang, & Wang, 2010).
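The contrast between a mean fit and a quantile fit can be illustrated with a small sketch. It is not taken from the cited chapter: the function names, the toy data, and the brute-force search are assumptions made for illustration. The search relies on the standard result that, for a straight-line model, some minimizer of the weighted sum of absolute deviations passes through two data points, so trying every pair of points is enough for small samples.

```python
from itertools import combinations


def check_loss(residual, tau):
    """Asymmetric absolute loss: tau*r for r >= 0, (tau - 1)*r otherwise."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def quantile_line(xs, ys, tau):
    """Fit y = intercept + slope*x at quantile level tau by trying every pair of points."""
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line is not a function of x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(check_loss(y - (intercept + slope * x), tau) for x, y in points)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy data (hypothetical): nine points on y = 2x + 1 plus one gross outlier.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[-1] = 100.0

a_med, b_med = quantile_line(xs, ys, 0.5)  # median regression
a_ols, b_ols = ols_line(xs, ys)
print("median fit:", a_med, b_med)
print("OLS fit:   ", a_ols, b_ols)
```

On this toy data the median fit stays on the line through the nine clean points, while the least-squares slope is pulled toward the outlier. The pairwise search costs on the order of n³ operations, so it is only a teaching device; practical implementations solve the same problem by linear programming.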
Multivariable Linear Regression
Published in Harry G. Perros, An Introduction to IoT Analytics, 2021
The q-quantiles are q − 1 points that divide the area under the curve of a probability distribution into q equal parts. For instance, for a normal distribution with q = 4, we have three points x1, x2, and x3 such that the area under the curve is divided into four equal parts (−∞, x1), (x1, x2), (x2, x3), and (x3, +∞), as shown in Figure 5.4. Let F(x) be the cumulative distribution; then F(x1) = 1/4, F(x2) = 2/4, and F(x3) = 3/4. Consequently, the three points x1, x2, and x3 can be obtained by inverting F(x) at the points 1/4, 2/4, and 3/4, respectively. Quantiles can be calculated in the same way as percentiles (see Section 3.5.4, Chapter 3) in the case where we only have a sample of observations, rather than a theoretical distribution. Let y1 ≤ y2 ≤ … ≤ yn be the sample sorted in ascending order. Then, for q = 4, the three quantiles x1, x2, and x3 are given by the value yk where k = ⌈0.25 × n⌉, ⌈0.50 × n⌉, ⌈0.75 × n⌉, respectively.
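The sample rule above can be written out as a short sketch. The function name `sample_quantiles` and the example data are hypothetical; the rank rule k = ⌈(j/q) × n⌉ is the one stated in the text, computed here with integer arithmetic to avoid floating-point rounding.

```python
def sample_quantiles(sample, q=4):
    """Return the q - 1 sample quantiles y_k with k = ceil(j * n / q), j = 1..q-1."""
    ys = sorted(sample)  # y1 <= y2 <= ... <= yn
    n = len(ys)
    cuts = []
    for j in range(1, q):
        k = (j * n + q - 1) // q  # integer ceiling of j*n/q (1-based rank)
        cuts.append(ys[k - 1])    # convert the rank to a 0-based index
    return cuts


data = [7, 1, 5, 3, 9, 2, 8, 4, 6, 10]
quartiles = sample_quantiles(data)
print(quartiles)  # [3, 5, 8]
```

With n = 10 the ranks are k = 3, 5, and 8, so the three quartiles are the third, fifth, and eighth smallest observations. Other conventions interpolate between neighboring observations and can give slightly different values.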
Structural status pre-warning method for operational bridge utilizing single-class support vector machine
Published in Hiroshi Yokota, Dan M. Frangopol, Bridge Maintenance, Safety, Management, Life-Cycle Sustainability and Innovations, 2021
Damage feature indicators are extracted from the structural vibration responses recorded by the SHM system. Stability of the indicator is a fundamental performance requirement: a damage feature should maintain a stable value for the same structural damage case. In this paper, three statistical indicators are selected: the variance, mean, and quantiles of the structural acceleration response. The variance and mean can be computed directly with standard statistical formulas. Quantiles are the numerical points that divide the probability distribution of a random variable into several parts. As shown in Figure 5, the p quantile is the abscissa value Qp corresponding to the cumulative probability p on the probability density curve. For the daily acceleration record measured at a given test point, a frequency count is carried out to analyze the overall characteristics of the acceleration response. A total of six quantile indexes of the acceleration statistics are studied in this article: the 25%, 50%, 70%, 80%, 90%, and 95% quantiles.
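As a rough sketch of how the three kinds of indicator might be computed from one acceleration record, consider the following. The excerpt does not say which variance formula or quantile estimator the authors used, so the population variance and the simple rank rule (the smallest value whose cumulative frequency reaches p) are assumptions, as are the function name `damage_indicators` and the synthetic record.

```python
import statistics

# Quantile levels studied in the text, as integer percentages.
QUANTILE_LEVELS = (25, 50, 70, 80, 90, 95)


def damage_indicators(acceleration):
    """Return mean, variance, and the six quantile indexes of one record."""
    ys = sorted(acceleration)
    n = len(ys)
    indicators = {
        "mean": statistics.fmean(ys),
        "variance": statistics.pvariance(ys),  # population variance (assumed)
    }
    for pct in QUANTILE_LEVELS:
        k = max(1, (pct * n + 99) // 100)  # integer ceiling of pct*n/100
        indicators[f"q{pct}"] = ys[k - 1]
    return indicators


# Synthetic stand-in for a daily record (hypothetical values, not measured data).
record = [float(v) for v in range(1, 101)]
result = damage_indicators(record)
print(result)
```

Computing the same dictionary for each day gives one time series per indicator, which is the form in which the stability of an indicator can then be examined.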
Bivariate Functional Quantile Envelopes With Application to Radiosonde Wind Data
Published in Technometrics, 2021
Our proposed method provides quantile envelope estimation for bivariate functional data. Since the quantile-based method is robust against outliers, it can be used to identify potential outliers. In Section 2.2.1, we detected the bivariate outliers using extreme quantile envelopes and marked launches as outlying if they had outliers at any pressure level. Here, we present some other approaches to finding functional outliers. Functional outliers can take the form of magnitude outliers, in which curves that are distant from the majority of curves are marked as outliers, or shape outliers, where the shape of the curve differs from the shape of the majority of curves in the sample. We suggest two distance measures that might detect some magnitude and shape outliers. If outlier detection is the only purpose rather than visualization, there exist more sophisticated outlier detection procedures in the literature (Hubert, Rousseeuw, and Segaert 2015; Rousseeuw and Hubert 2018). To illustrate the two kinds of outliers, we simulate bivariate curves from a mean-zero bivariate Gaussian process with a bivariate Matérn covariance function (as explained in Section 3.1); then, we contaminate them to simulate a magnitude outlier and a shape outlier, as shown in Figure 4. The curve in red is the shape outlier, while the magnitude outlier is shown in blue.
Flood estimation at Hathnikund Barrage, River Yamuna, India using the Peak-Over-Threshold method
Published in ISH Journal of Hydraulic Engineering, 2020
Mukesh Kumar, Mohammed Sharif, Sirajuddin Ahmed
Daily discharge data at Hathnikund spanning 37 years were used to conduct flood frequency analysis. The first step in estimating flood magnitudes for different return levels at Hathnikund was to extract annual peaks using the traditional annual maxima approach. Using the POT approach, a data set of peaks over a chosen threshold was then created. For the POT analysis, R version 3.2.2, a language and environment for statistical computing and graphics, has been used (R Core Team 2014). A package called POT is available in R to perform POT analysis. To identify an appropriate threshold, a number of different threshold levels were applied to the daily flow data. Graphical diagnostics in the form of p-p plots, q-q plots, probability density plots, and return level plots were then used to determine the threshold value that resulted in the best fit to the data. The POT package produces four plots: (1) probability plot, (2) q-q plot, (3) probability density plot, and (4) return-level plot. A 45-degree reference line is also plotted. The quantile-quantile (q-q) plot is a graphical technique for determining whether two data sets come from populations with a common distribution. By a quantile, we mean the fraction (or percent) of points below the given value. In the POT package, a q-q plot is a plot of the quantiles of the observed (empirical) data against the quantiles of the theoretical (model) data set. Similarly, a probability plot is produced by the POT package.
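The two steps described here, extracting peaks over a threshold and pairing empirical with model quantiles for a q-q plot, can be sketched as follows. This is not the R POT package's code. It assumes the generalized Pareto distribution as the model for threshold excesses, uses plotting positions (i − 0.5)/n, and takes the scale and shape as given rather than fitting them. The function names, the flow values, and the parameter values are all hypothetical, and no declustering of neighboring peaks is attempted.

```python
import math


def peaks_over_threshold(flows, threshold):
    """Return the excesses (flow - threshold) of all flows above the threshold."""
    return [x - threshold for x in flows if x > threshold]


def gpd_quantile(p, scale, shape):
    """Quantile function of the generalized Pareto distribution for excesses."""
    if shape == 0:
        return -scale * math.log(1 - p)  # exponential limit
    return scale / shape * ((1 - p) ** (-shape) - 1)


def qq_pairs(excesses, scale, shape):
    """Return (model quantile, empirical quantile) pairs for a q-q plot."""
    ys = sorted(excesses)
    n = len(ys)
    return [
        (gpd_quantile((i - 0.5) / n, scale, shape), y)
        for i, y in enumerate(ys, start=1)
    ]


# Hypothetical daily flows and threshold, for illustration only.
flows = [10.0, 55.0, 42.0, 80.0, 30.0, 65.0]
excesses = peaks_over_threshold(flows, threshold=50.0)
pairs = qq_pairs(excesses, scale=10.0, shape=0.0)
for model_q, empirical_q in pairs:
    print(round(model_q, 3), empirical_q)
```

If the model fits well, the pairs lie close to the 45-degree reference line. Repeating the comparison for several candidate thresholds mirrors the threshold selection procedure described in the text.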
Empirical Dynamic Quantiles for Visualization of High-Dimensional Time Series
Published in Technometrics, 2019
Daniel Peña, Ruey S. Tsay, Ruben Zamar
Quantiles are a useful tool to describe properties of the underlying distribution of the data. Given a random sample x1, …, xn of a scalar random variable X with empirical cumulative distribution function (CDF) Fn(x), the empirical pth quantile is defined as q(p) = inf{x : Fn(x) ≥ p}, and it is well known (see, for instance, Ferguson (1967)) that this quantile can be computed by q(p) = argmin over q of [ Σ(xi ≥ q) p |xi − q| + Σ(xi < q) (1 − p) |xi − q| ].
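The empirical quantile can be characterized in two standard ways: as the smallest value at which the empirical CDF reaches p, and as a minimizer of an asymmetric absolute loss. The sketch below compares the two on a small sample. The function names and data are hypothetical, and the search is restricted to the sample values, which is enough because the loss is piecewise linear with kinks only at the observations.

```python
def empirical_quantile(sample, p):
    """Smallest sample value x whose empirical CDF satisfies F_n(x) >= p (0 < p <= 1)."""
    ys = sorted(sample)
    n = len(ys)
    for i, y in enumerate(ys, start=1):
        if i / n >= p:  # i/n is the empirical CDF at the i-th order statistic
            return y


def total_check_loss(sample, q, p):
    """Sum of p*|x - q| for x >= q plus (1 - p)*|x - q| for x < q."""
    return sum(p * (x - q) if x >= q else (1 - p) * (q - x) for x in sample)


def quantile_by_minimization(sample, p):
    """Sample value minimizing the total check loss (smallest one on exact ties)."""
    return min(sorted(sample), key=lambda q: total_check_loss(sample, q, p))


data = [2.0, 9.0, 4.0, 7.0, 1.0, 5.0, 8.0]
for p in (0.25, 0.5, 0.9):
    print(p, empirical_quantile(data, p), quantile_by_minimization(data, p))
```

For this sample of seven values both functions return 2.0, 5.0, and 9.0 at p = 0.25, 0.5, and 0.9. When p × n is an integer the loss is flat between two neighboring observations, so the minimizer is not unique and the infimum definition selects the smaller endpoint.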