Techniques for Making Sense of Behavior in Complex Datasets
Published in Donald L. Fisher, William J. Horrey, John D. Lee, Michael A. Regan, Handbook of Human Factors for Automated, Connected, and Intelligent Vehicles, 2020
Visualization provides a way to better understand the outcome to examine and the explanatory variables to include. While summary statistics such as means, medians, and standard deviations are important point estimates, exploration of complex datasets benefits from visualizing relationships, trends, and distributions. These visualizations can help you find interesting features of the data that would not be observed otherwise. For example, you can visualize where an autonomous vehicle has traveled over the course of the day and use color coding based on travel speed to reveal how fast the car is going in specific areas. These data can then help researchers, planners, and engineers identify commute patterns and the best locations for infrastructure changes.
How Data Science Happens
Published in Natalie M. Scala, James P. Howard, Handbook of Military and Defense Operations Research, 2020
“Microdata” are data that represent a single observation of something (Samarati, 2001). The speed of a vehicle at one point in time as it goes down the highway is microdata. And so is the age of the driver. This is different from the way you are used to seeing the data. News reports, briefings, and other presentations tend to focus on summary statistics, rather than the individual data elements. Summary statistics include percentages, means, and standard deviations. Microdata are the individual observations that compose those summary statistics and are a greater focus of data science than means and percentages.
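The relationship between microdata and summary statistics can be illustrated with a minimal Python sketch. The records, the field names (`speed_kmh`, `driver_age`), and the 100 km/h threshold below are all hypothetical, invented only for illustration; they do not come from the handbook.

```python
from statistics import mean, stdev

# Hypothetical microdata: one record per observed vehicle (all values are made up).
observations = [
    {"speed_kmh": 96.0, "driver_age": 34},
    {"speed_kmh": 104.0, "driver_age": 51},
    {"speed_kmh": 88.0, "driver_age": 27},
    {"speed_kmh": 112.0, "driver_age": 45},
]


def summarize_speeds(records, limit_kmh=100.0):
    """Collapse individual speed observations into three summary statistics."""
    speeds = [r["speed_kmh"] for r in records]
    over_limit = sum(1 for s in speeds if s > limit_kmh)
    return {
        "mean_speed": mean(speeds),
        "sd_speed": stdev(speeds),  # sample standard deviation (n - 1 denominator)
        "pct_over_limit": 100.0 * over_limit / len(speeds),
    }


summary = summarize_speeds(observations)
print(summary)
```

The four records are the microdata; the dictionary returned by `summarize_speeds` is what a briefing or news report would typically show instead.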
Cleaning and wrangling data
Published in Tiffany Timbers, Trevor Campbell, Melissa Lee, Data Science, 2022
Tiffany Timbers, Trevor Campbell, Melissa Lee
As a part of many data analyses, we need to calculate a summary value for the data (a summary statistic). Examples of summary statistics we might want to calculate are the number of observations, the average/mean value for a column, the minimum value, etc. Oftentimes, this summary statistic is calculated from the values in a data frame column, or columns, as shown in Figure 3.15.
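As a rough illustration of calculating summary statistics from a data frame column, the sketch below uses a plain Python dictionary of columns as a stand-in for a data frame. The table contents and the helper name `summarize_column` are assumptions made for this example, not material from the book.

```python
# A tiny stand-in for a data frame: each key is a column name,
# and each list holds that column's values (all values are made up).
table = {
    "region": ["north", "south", "east", "west"],
    "population": [1200, 850, 430, 990],
}


def summarize_column(frame, column):
    """Return the number of observations, the mean, and the minimum of one numeric column."""
    values = frame[column]
    return {
        "n": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
    }


population_summary = summarize_column(table, "population")
print(population_summary)
```

Each statistic reduces the whole column to a single value, which is what distinguishes a summary from the underlying rows.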
Data-driven algorithm for throughput bottleneck analysis of production systems
Published in Production & Manufacturing Research, 2018
Mukund Subramaniyan, Anders Skoogh, Hans Salomonsson, Pramod Bangalore, Maheshwaran Gopalakrishnan, Azam Sheikh Muhammad
The term ‘analytics’ is defined as the science of the logical sequence of steps used to transform data into actions through analysis and insights (Liberatore & Luo, 2010). The main applications of data analytics in understanding and explaining past performance from real data are descriptive and diagnostic analytics. These are briefly discussed in the context of manufacturing. Descriptive analytics: the science of identifying what has happened and what is happening (Delen & Demirkan, 2013). It includes quantitative description of data using graphical or tabular representation, or summary statistics of data that are useful as a basis for decisions (Banerjee, Bandyopadhyay, & Acharya, 2013). Examples include average throughput, machine downtimes, and machine blockage and starvation times. Diagnostic analytics: the science of identifying why something happened (Banerjee et al., 2013). It is useful in identifying the causes behind performance (Shao et al., 2015) and is exploratory in nature. For example, increased machine downtime can be traced to any or all of various possible factors, such as non-availability of spare parts, worker absenteeism or increased priority of another machine.
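A descriptive summary of the kind mentioned above (downtime, blockage, and starvation times per machine) might be sketched as follows. The event log, machine names, and state labels are hypothetical, and this is not the algorithm proposed in the article; it only shows a simple descriptive aggregation.

```python
from collections import defaultdict

# Hypothetical event log: (machine, state, duration in minutes). All values are made up.
events = [
    ("M1", "down", 12.0),
    ("M2", "blocked", 5.0),
    ("M1", "starved", 3.0),
    ("M2", "down", 25.0),
    ("M1", "down", 8.0),
]


def total_minutes_by_state(log):
    """Sum the minutes each machine spent in each non-producing state."""
    totals = defaultdict(lambda: defaultdict(float))
    for machine, state, minutes in log:
        totals[machine][state] += minutes
    # Convert nested defaultdicts to plain dicts for easier printing and comparison.
    return {machine: dict(states) for machine, states in totals.items()}


totals = total_minutes_by_state(events)

# Descriptive question: which machine accumulated the most downtime?
most_downtime = max(totals, key=lambda m: totals[m].get("down", 0.0))
print(totals, most_downtime)
```

This answers what happened. Explaining why a machine accumulated more downtime (spare parts, absenteeism, priorities) is the diagnostic step and needs additional data.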
Food Insecurity Is Associated with Diarrhea, Respiratory Illness, and Stunting but Not Underweight or Obesity in Low-Resource New Delhi Households
Published in Journal of Hunger & Environmental Nutrition, 2023
Rishika Chakraborty, M. Margaret Weigel, Khalid M Khan
The study data were analyzed using R software (v3.6, Lucent Technologies). Summary statistics were derived as number and percent for categorical variables or mean ± standard deviation (SD) for continuous variables. Household food insecurity (HFI), our main exposure (independent variable), was analyzed as a binary variable (moderate-severe food insecure vs. food secure/mild food insecure). Bivariate analyses were performed using the Mann-Whitney test for continuous variables and chi-square tests or Fisher’s exact test for categorical variables, as appropriate. Bivariate and multiple logistic regression analyses were performed to investigate the association between HFI and the child outcome (dependent) variables (diarrhea, RI symptoms, BAZ and HAZ).
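To make the bivariate step concrete, here is a minimal Python sketch of a Pearson chi-square test on a 2×2 table of a binary exposure against a binary outcome. The counts are invented for illustration and are not the study's data. The authors worked in R; this sketch omits the continuity correction that R's `chisq.test` applies to 2×2 tables by default, so it should not be read as a reproduction of their analysis.

```python
from math import erfc, sqrt

# Hypothetical 2x2 counts (made up for illustration, NOT the study's data):
#                     outcome present   outcome absent
# exposed                  a = 30            b = 20
# unexposed                c = 15            d = 35
a, b, c, d = 30, 20, 15, 35


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


chi2 = pearson_chi_square_2x2(a, b, c, d)

# With 1 degree of freedom, the upper-tail probability of the chi-square
# distribution equals erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

# Number and percent with the outcome in each exposure group.
pct_exposed = 100.0 * a / (a + b)
pct_unexposed = 100.0 * c / (c + d)

print(f"exposed: {a} ({pct_exposed:.1f}%), unexposed: {c} ({pct_unexposed:.1f}%)")
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

When expected cell counts are small, Fisher's exact test is the usual alternative, as the passage notes.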