Count Data Models
Published in Simon Washington, Matthew Karlaftis, Fred Mannering, Panagiotis Anastasopoulos, Statistical and Econometric Methods for Transportation Data Analysis, 2020
Simon Washington, Matthew Karlaftis, Fred Mannering, Panagiotis Anastasopoulos
Count data can be properly modeled by using a number of methods, the most popular of which are Poisson and negative binomial regression models. Poisson regression is the more popular of the two and is applied to a wide range of transportation count data. The Poisson distribution approximates rare-event count data, such as accident occurrence, failures in manufacturing or processing, and number of vehicles waiting in a queue. One requirement of the Poisson distribution is that the mean of the count process equals its variance. When the variance is significantly larger than the mean, the data are said to be overdispersed. There are numerous reasons for overdispersion, some of which are discussed later in this chapter. In many cases, overdispersed count data are successfully modeled using the negative binomial model.
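A quick way to see whether the equal mean and variance requirement is plausible for a data set is to compare the sample variance with the sample mean. The following is a minimal sketch, not a formal test; the function name `dispersion_index` and the crash counts are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance-to-mean ratio of a sequence of counts.

    A ratio near 1 is consistent with a Poisson process; a ratio well
    above 1 suggests overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical yearly crash counts at ten sites (illustrative only).
crash_counts = [0, 0, 1, 0, 3, 0, 7, 1, 0, 12]
ratio = dispersion_index(crash_counts)
print(f"variance/mean = {ratio:.2f}")
```

A ratio far above 1, as in this made-up sample, is the kind of pattern for which the negative binomial model is usually considered instead of the Poisson model.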
Mixture Modelling of Discrete Data
Published in Sylvia Frühwirth-Schnatter, Gilles Celeux, Christian P. Robert, Handbook of Mixture Analysis, 2019
A very common extension of a finite mixture model for count data is the zero-inflated model. In fact, a zero-inflated model can be regarded as a finite mixture with one component degenerating at zero. The zero-inflated Poisson (ZIP) distribution, for instance, is defined as

$$p(y \mid \theta, \eta) = \begin{cases} \eta + (1-\eta)\exp(-\lambda), & y = 0, \\ (1-\eta)\exp(-\lambda)\,\dfrac{\lambda^{y}}{y!}, & y = 1, 2, \ldots \end{cases}$$
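The ZIP probability mass function can be written out directly. The sketch below assumes η is the weight of the component degenerate at zero and λ is the Poisson rate; the function name `zip_pmf` and the parameter values are hypothetical.

```python
import math


def zip_pmf(y, lam, eta):
    """Zero-inflated Poisson probability of observing count y.

    lam: Poisson rate (lam > 0)
    eta: weight of the point mass at zero (0 <= eta <= 1)
    """
    if y < 0 or y != int(y):
        raise ValueError("y must be a non-negative integer")
    y = int(y)
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # Zeros come from the point mass or from the Poisson component.
        return eta + (1.0 - eta) * poisson
    return (1.0 - eta) * poisson


# Illustrative parameter values (not taken from the source).
lam, eta = 2.0, 0.3
probabilities = [zip_pmf(y, lam, eta) for y in range(60)]
print(f"P(y = 0) = {probabilities[0]:.4f}")
print(f"sum over y = 0..59: {sum(probabilities):.6f}")
```

Setting `eta = 0` recovers the ordinary Poisson distribution, which is a convenient sanity check on the mixture form.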
Total Quality Management – Adapted to Occupational Safety
Published in Michael B. Weinstein, Total Quality Safety Management and Auditing, 2018
Shearer [62] describes different types of control charts that are used to help analyze and control processes depending on the type of data (either count, attribute, or variable) obtained, and Krause [63] describes the application of control charts to occupational safety data. Count data are data which give actual occurrences, with no time or area boundaries – such as defects on a product. Attribute data list items in a finite group, such as the number of cars with defective wheels. Variable data are data measured during a process. Variable data can be subdivided whereas count and attribute data are integral.
Corridor-level network screening and modeling of fatal and serious injury crashes on urban and suburban arterial corridors in Florida
Published in Journal of Transportation Safety & Security, 2023
John McCombs, Adrian Sandt, Haitham Al-Deek
Next, count models were developed and compared using R (R Core Team, 2022). The crash frequencies (number of crashes per year) for KABCO crashes and for only KA crashes were used as response variables in these models. Developing and comparing a KABCO and a KA model can show how the significant variables differ between the two, providing insights into potentially effective safety improvements for fatal and serious injury crashes. Count data are often overdispersed, meaning that the variance of the response variable is greater than its mean (Venables & Ripley, 2002). To account for overdispersion, NB regression models are often used. Under an NB distribution, the response variable has a mean $\mu$ and variance $\mu + \mu^2/\theta$, where $\theta$ is the overdispersion parameter (Venables & Ripley, 2002). Another common parametrization for this variance is $\mu + k\mu^2$, where $k = 1/\theta$ is the overdispersion parameter. R uses $\theta$ as the overdispersion parameter, although the HSM uses $k$. According to the HSM (2010), “the closer the overdispersion parameter is to zero, the more statistically reliable the [SPF].”
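The two ways of writing the NB variance can be related with a few lines of arithmetic. This sketch assumes the usual forms, variance = μ + μ²/θ (the form reported by, for example, `glm.nb` in R's MASS package) and variance = μ + kμ² with k = 1/θ (the HSM form). The function names and the numbers are hypothetical.

```python
def nb_variance_theta(mu, theta):
    """NB variance in the theta parametrization: mu + mu^2 / theta."""
    return mu + mu ** 2 / theta


def nb_variance_k(mu, k):
    """NB variance in the k parametrization: mu + k * mu^2."""
    return mu + k * mu ** 2


def theta_to_k(theta):
    """Convert theta to k, assuming k = 1 / theta."""
    return 1.0 / theta


# Illustrative values (not taken from the source).
mu, theta = 4.0, 2.0
k = theta_to_k(theta)
print(f"k = {k}")
print(f"variance (theta form) = {nb_variance_theta(mu, theta)}")
print(f"variance (k form)     = {nb_variance_k(mu, k)}")
```

Under this assumption a large θ corresponds to a small k, so "close to zero" in the HSM sense means a large θ in R output, with the variance approaching the Poisson value μ.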
Comparing four regression techniques to explore factors governing the number of forest fires in Southeast, China
Published in Geomatics, Natural Hazards and Risk, 2021
Qianqian Cao, Lianjun Zhang, Zhangwen Su, Guangyu Wang, Shuaichao Sun, Futao Guo
However, GWPR is more challenging than a global Poisson model because overdispersion is a common problem in spatial count data (Haining et al. 2009). Overdispersion is a problem because the Poisson distribution strictly requires the mean of the count response variable to equal its variance. Given the nature of rare events, the number (count) of forest fires usually has a much larger variance than mean (i.e., overdispersion), because zero counts tend to occur more often than higher numbers of fire events. To model spatial count data with overdispersion, it may be more appropriate to use a negative binomial distribution instead of a Poisson distribution. Da Silva and Rodrigues (2014) proposed the geographically weighted negative binomial regression (GWNBR) for modeling spatial count data with overdispersion. Including spatial effects in statistical models is valuable for understanding relationships geographically and identifying local ‘hot spots’ of high fire risk. In past studies of forest fire modeling, a single modeling method was usually used, and comparisons among different regression techniques were rarely conducted.
Longitudinal jerk and celeration as measures of safety in bus rapid transit drivers in Tehran
Published in Theoretical Issues in Ergonomics Science, 2020
Bahram Khorram, A. E. af Wåhlberg, Ali Tavakoli Kashani
First, correlation coefficients were calculated to observe the variables’ correlation with the crashes and to find the one with the strongest association. Based upon those results, a model was fitted to the data to see the measure’s effects on crash frequency. Since crashes are count variables (i.e. they’re non-negative integers) they can be modelled with count models. Negative Binomial and Poisson regression models are the most popular models for count data modelling (Washington, Karlaftis, and Mannering 2010). For this study, a Negative Binomial model was used, since the variance of the crash data was greater than the mean (descriptive statistics of the crash data are shown in Table 1), and such over-dispersion violates the assumptions of the Poisson count model (Coruh, Bilgic, and Tortum 2015; Miaou 1994; Washington et al. 2010). Also, the dispersion parameter was significantly greater than zero (Z = 1.683, p = .046), meaning that using a Poisson regression would be incorrect. The Negative Binomial model is a modified version of the Poisson model which addresses the problem of over-dispersion in data (Dereli and Erdogan 2017). Also, to investigate if there was any statistical difference between age groups regarding the number of crashes and critical jerks, Kruskal-Wallis and Mann-Whitney tests were conducted. For statistical calculations, the SPSS 21 software was used.
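The reported significance level can be reproduced from the Z statistic with a short calculation. The sketch below assumes that the test of the dispersion parameter is one-sided (dispersion greater than zero) and that Z is referred to a standard normal distribution; the source does not state this explicitly, and the function name is hypothetical.

```python
from statistics import NormalDist


def one_sided_p_value(z):
    """Upper-tail probability P(Z >= z) under a standard normal."""
    return 1.0 - NormalDist().cdf(z)


z_reported = 1.683  # Z statistic quoted in the text
p = one_sided_p_value(z_reported)
print(f"one-sided p = {p:.3f}")
```

Under these assumptions the upper-tail probability rounds to .046, which agrees with the value quoted above and suggests a one-sided test was used.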