Statistical and Graphical Foundation
Published in Terry A. Slocum, Robert B. McMaster, Fritz C. Kessler, Hugh H. Howard, Thematic Cartography and Geovisualization, 2022
Terry A. Slocum, Robert B. McMaster, Fritz C. Kessler, Hugh H. Howard
Finally, we need to recognize that regression techniques can be applied both globally and locally. By globally, we mean that a regression equation is applied uniformly throughout a geographic region. In contrast, local regression techniques consider the notion that a single model might not be appropriate for an entire region, as local variation might necessitate different models within subregions. As with modifications due to spatial autocorrelation, a full discussion of this issue is beyond the scope of this book. For an overview of the local–global issue applied to regression and other statistical methods, see Fotheringham (1997); a detailed discussion of the related technique of geographically weighted regression (GWR) can be found in Fotheringham et al. (2002).
Stochastic reservoir operation with data-driven modeling and inflow forecasting
Published in Journal of Applied Water Engineering and Research, 2022
Raul Fontes Santana, Alcigeimes B. Celeste
The k-nearest neighbors (kNN) classification process is the most popular form of instance-based learning. For application in the ISO procedure, the optimal data generated by the deterministic model (PFDO; Section 2.1.2) is used to build a training database. Given a new instance of input data (initial storage and inflow), the 1NN model (when k = 1) chooses as its allocation the value in the training data whose vector [storage, inflow] lies at the minimum distance from the queried vector [storage, inflow]. For k > 1, the kNN model chooses the weighted average of the allocations of the k nearest neighbors (the closer the neighbor, the higher its weight). Choosing the k parameter is very important, as different values of k can result in different classification labels (Kuang and Zhao 2009). Locally weighted regression is complementary to kNN: instead of constructing a global regression model for the entire domain of the function, a local regression model is created for each point of interest from the k data points nearest to the query point.
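The distance-weighted averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the excerpt says that closer neighbors receive higher weights but does not give the weighting function, so inverse-distance weights are assumed here, as is the Euclidean distance on unscaled [storage, inflow] vectors. The function name `knn_allocation` and the small constant `eps` are hypothetical.

```python
import numpy as np

def knn_allocation(train_X, train_y, query, k=3, eps=1e-12):
    """Distance-weighted kNN estimate of an allocation.

    train_X : array of shape (n, 2), rows are [storage, inflow]
    train_y : array of shape (n,), allocations from the training database
    query   : the new [storage, inflow] vector
    """
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query = np.asarray(query, dtype=float)

    # Euclidean distance from the query to every training vector
    # (assumption: the excerpt does not name the distance metric).
    dists = np.linalg.norm(train_X - query, axis=1)

    # Indices of the k closest training instances.
    nearest = np.argsort(dists)[:k]

    # Inverse-distance weights (assumption): closer neighbors weigh more.
    # eps avoids division by zero when the query matches a training point.
    weights = 1.0 / (dists[nearest] + eps)

    # Weighted average of the neighbors' allocations.
    return float(np.sum(weights * train_y[nearest]) / np.sum(weights))
```

With `k=1` the function reduces to the 1NN rule, returning the allocation of the single closest training instance. In practice storage and inflow may need to be rescaled before distances are computed, since they can differ greatly in magnitude.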
Understanding public perspectives on fracking in the United States using social media big data
Published in Annals of GIS, 2023
Xi Gong, Yujian Lu, Daniel Beene, Ziqi Li, Tao Hu, Melinda Morgan, Yan Lin
Bandwidth is the range (distance or number of nearest neighbours) over which data is borrowed in each local regression calculation; it measures the spatial scale of the relationship between an independent variable and a dependent variable (Li et al. 2020; Oshan et al. 2019). In other words, it can provide intuitive interpretations of the spatial scale of each variable’s underlying data generating process (Li et al. 2020; Oshan et al. 2019). Table 4 shows the optimal bandwidths with 95% confidence intervals for the aforementioned 12 explanatory variables, where the numbers represent the number of nearest counties to a regression focus (a county) that have been borrowed and down-weighted according to distance in the local regression. The female-to-male ratio, the unemployment rate, the median household income, and the percentages of Hispanics, non-Hispanic African Americans, people aged 18 to 49, and people without a bachelor’s degree all have a bandwidth of between 250 and 275. Considering that the total number of counties included in this study is 276, the influence of these seven variables on the negative tweet percentage towards fracking is virtually the same in each county across the country. Therefore, these relationships demonstrate more global stationarity, as shown in Figure 4(a,b,d,e–g,j). The percentage of people aged 50 or older and the tweet density per 100,000 people in US counties have bandwidths of around 100, which indicate more locally varying relationships with the negative tweet percentage. However, the patterns are not clear, as the coefficients associated with these two variables are insignificant at the 0.05 level across the country (Figure 4(c,l)). The other three variables have moderate bandwidths of around 150 to 200, which show relatively stationary effects on the negative tweet percentage within regions and varying effects across regions.
For example, the ratio of Democratic to Republican voters (Figure 4(k)) and the sum of fracking activities within a 100 km buffer (Figure 4(h)) show significant associations in the Southern and Eastern regions respectively, but these relationships do not persist in other regions.
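To make the role of a nearest-neighbour bandwidth concrete, the following Python sketch fits one local regression at a single focus location. It is a minimal illustration and not the authors' implementation: the bisquare kernel, the function name `local_fit`, and the argument layout are all assumptions, and a single bandwidth is shared by all variables here, whereas the excerpt reports a separate bandwidth for each variable.

```python
import numpy as np

def local_fit(coords, X, y, focus, bandwidth):
    """Weighted least-squares fit at one focus location.

    coords    : (n, 2) array of location coordinates
    X         : (n,) or (n, p) array of explanatory variables
    y         : (n,) array of the dependent variable
    focus     : index of the regression focus
    bandwidth : number of nearest locations (the focus included) to borrow from
    Returns [intercept, slope_1, ..., slope_p] estimated at the focus.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Distance from the focus to every location.
    d = np.linalg.norm(coords - coords[focus], axis=1)

    # Kernel radius: distance to the bandwidth-th nearest location.
    h = np.sort(d)[min(bandwidth, len(d)) - 1]
    if h == 0:
        raise ValueError("bandwidth too small: kernel radius is zero")

    # Bisquare kernel (assumption): weights fall smoothly from 1 at the
    # focus to 0 at distance h; locations at or beyond h are ignored.
    w = np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)

    # Weighted least squares via row scaling by sqrt(w).
    A = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta
```

When `bandwidth` approaches the total number of locations, every focus draws on nearly the same data and the coefficients barely change from place to place, which corresponds to the near-global behaviour described above for bandwidths of 250 to 275 out of 276 counties. Smaller bandwidths let the coefficients vary from one focus to the next.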