Explore chapters and articles related to this topic
It's All about the Data
Published in James Luke, David Porter, Padmanabhan Santhanam, Beyond Algorithms, 2022
James Luke, David Porter, Padmanabhan Santhanam
With so much data available, it’s all too easy for the Algorithm Addict to fall off the wagon! Surely, all you need to do is gather as much data as possible and the algorithm will do the rest? No! It’s time to get back into rehab and remember that there is no magic algorithm and just throwing algorithms at data never delivers value. To successfully deliver AI solutions, you are going to need to learn a little more about data and how to deal with it. It really is all about the data, and in an AI project you can expect to invest up to 80% of the available resources just on accessing, understanding, cleansing, preparing and managing the data. Remember, once your model is trained, you are going to have to apply most or all of the data transformations you put your training data through to your live data before you can use it. Spoiler alert: if your data transformation code needed to look up massive tables of reference data to clean or augment the transaction data, so will your live application. If you need the application that model sits in to deliver sub-second responses, you may want to rethink your business case.
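A minimal sketch (not from the book) of why training-time transformations must travel with the model: bundling the preprocessing and the estimator in one pipeline means the same cleansing and reference-data lookup steps run on live transactions at inference time. The reference table, the augment step and the column names here are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical reference table used to augment raw transactions.
COUNTRY_RISK = {"GB": 0.2, "DE": 0.1, "BR": 0.6}

def augment(df: pd.DataFrame) -> pd.DataFrame:
    """Join reference data onto transactions; the live system needs this step too."""
    out = df.copy()
    out["country_risk"] = out["country"].map(COUNTRY_RISK).fillna(0.5)
    return out

# Preprocessing is packaged with the model, so train and live data get identical treatment.
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["country"])],
    remainder="passthrough",
)
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])

train = augment(pd.DataFrame({"country": ["GB", "DE", "BR", "GB"],
                              "amount": [10.0, 250.0, 40.0, 5.0]}))
model.fit(train, [0, 1, 1, 0])

# At inference the identical augment + preprocess steps must run on the live data,
# including the reference-data lookup, before the model can score a transaction.
live = augment(pd.DataFrame({"country": ["DE"], "amount": [99.0]}))
print(model.predict(live))
```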
Common attributes of analytics projects
Published in Ondřej Bothe, Ondřej Kubera, David Bednář, Martin Potančok, Ota Novotný, Data Analytics Initiatives, 2022
Ondřej Bothe, Ondřej Kubera, David Bednář, Martin Potančok, Ota Novotný
If we go higher in IT maturity, it can easily happen that the project is delivered not just on one platform but across multiple ones: for example, an ingest platform used for loading data, a data storage platform used for storing data, an integration platform used for data extraction and transformation, a master data management platform used for connection to reference data, etc. Despite the many benefits of the platforms described above, this approach significantly increases the complexity of delivery and support, as it requires alignment across many platforms and teams. Such an approach also typically requires multiple specific roles (business analyst, data modeller, ETL developer, report developer, solution architect...), and we will discuss it further in the WoW chapter (3.2 Ways of working - WoW).
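An illustrative sketch (not from the chapter) of how one delivery can span several platforms, each with its own team and role mix; the platform names, stages and roles are placeholders chosen to mirror the example above.

```python
# Each stage lives on a different platform and involves different roles,
# so every hand-off between stages is a coordination point between teams.
PIPELINE = [
    {"stage": "ingest",    "platform": "ingest platform",       "roles": ["ETL developer"]},
    {"stage": "store",     "platform": "data storage platform", "roles": ["data modeller"]},
    {"stage": "transform", "platform": "integration platform",  "roles": ["ETL developer", "business analyst"]},
    {"stage": "reference", "platform": "MDM platform",          "roles": ["data modeller"]},
    {"stage": "report",    "platform": "reporting platform",    "roles": ["report developer"]},
]

handoffs = len(PIPELINE) - 1
platforms = {step["platform"] for step in PIPELINE}
print(f"{handoffs} hand-offs to align across {len(platforms)} platforms")
```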
Multiscale Habitat Mapping and Monitoring Using Satellite Data and Advanced Image Analysis Techniques
Published in Prasad S. Thenkabail, Land Resources Monitoring, Modeling, and Mapping with Remote Sensing, 2015
Stefan Lang, Christina Corbane, Palma Blonda, Kyle Pipkins, Michael Förster
Within BIO_SOS, a preoperational knowledge-driven open-source three-stage processing system was developed, capable of combining multiseasonal EO data (HR and VHR) and in situ data (including ancillary information and in situ measurements) for subsequent translation of LC to habitat maps. This system, named EO data for habitat monitoring (Lucas et al. 2014), is based on expert knowledge elicited from botanists, ecologists, remote-sensing experts, and management authorities in order to monitor large and inaccessible areas without any ground reference data. Ontologies are used to formally represent the expert knowledge (Arvor et al. 2013). The FAO-LCCS and the GHC taxonomies, from which HabDir Annex I habitats can be defined, were used for describing LC/LU and habitat categories (Tomaselli et al. 2013) and for subsequent translation to habitats. In addition, BIO_SOS focused on the development of a modeling framework for (1) filling the gap between LC/LU and habitat domains (Blonda et al. 2012a) by coupling the FAO-LCCS taxonomy with the GHC and EUNIS classification schemes and providing a reliable cost-effective knowledge-driven long-term biodiversity monitoring scheme of protected areas and their surrounds (Tomaselli et al. 2013, Adamo et al. 2014, Kosmidou et al. 2014, Lucas et al. 2014); (2) analyzing appropriate spatial and temporal scales of EO data
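A toy sketch of the knowledge-driven translation step described above: land-cover classes (FAO-LCCS style codes) are mapped to General Habitat Categories via expert-elicited rules. The class codes, labels and rules below are illustrative placeholders, not the actual BIO_SOS rule base or ontology.

```python
# Hypothetical expert-elicited lookup from LCCS-style land-cover classes to GHC labels.
LCCS_TO_GHC = {
    "A12/A1": "FPH (forest phanerophytes)",
    "A12/A2": "TPH (tall phanerophytes)",
    "B28/A6": "EHY (emergent hydrophytes)",
}

def translate(lccs_class: str, seasonal_flooding: bool = False) -> str:
    """Translate one LCCS class to a GHC label; in situ evidence can refine the result."""
    ghc = LCCS_TO_GHC.get(lccs_class, "unresolved: needs expert review")
    if seasonal_flooding and ghc.startswith("EHY"):
        ghc += " [seasonally flooded]"
    return ghc

print(translate("B28/A6", seasonal_flooding=True))
```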
A methodology to boost data-driven decision-making process for a modern maintenance practice
Published in Production Planning & Control, 2021
Adalberto Polenghi, Irene Roda, Marco Macchi, Alessandro Pozzetti
The first phase of the methodology prescribes the development of a reference data model grounded in theoretical contributions. Data models are selected because they offer modelling flexibility: they are capable of formalizing varied concepts (or entities), such as an information system, a stakeholder, or a process step, linked through semantic relationships (West 2011). The output of this phase is a reference data model able to formalize the different facets of the business process of interest, including data and information, relevant analyses to be performed and final decisions to be taken. To this end, theoretical contributions relevant to the domain of interest, i.e. the targeted business process, need to be investigated. Relevant contributions include scientific literature, already available data models and taxonomies, and domain experts’ knowledge. Industrial standards are also worth examining, since they enable alignment with standardized, agreed-upon, and shared best practices. Once the relevant knowledge has been gathered, two activities (to be intended as technological rules in DSR) follow:
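A minimal sketch, under assumed entity and field names, of what such a reference data model could formalize: entities such as a process step, a stakeholder and an information system, linked by explicit semantic relationships. The names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InformationSystem:
    name: str                        # e.g., the CMMS holding maintenance work orders

@dataclass
class Stakeholder:
    role: str                        # e.g., "maintenance engineer"

@dataclass
class ProcessStep:
    name: str                        # e.g., "failure data collection"
    performed_by: Stakeholder        # semantic relationship: step -> stakeholder
    supported_by: InformationSystem  # semantic relationship: step -> information system
    produces: List[str] = field(default_factory=list)  # data/information items

step = ProcessStep(
    name="failure data collection",
    performed_by=Stakeholder("maintenance engineer"),
    supported_by=InformationSystem("CMMS"),
    produces=["failure mode", "downtime"],
)
print(step)
```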
Vehicle localization by dynamic programming from altitude and yaw rate time series acquired by MEMS sensor
Published in SICE Journal of Control, Measurement, and System Integration, 2021
Figure 11 shows a velocity profile of the reference vehicle obtained from wheel pulse data when it stopped at an intersection shortly after it started from the start point. If we set a threshold value for deciding whether the vehicle is stopped, it can be seen that it stopped for 25 s, starting at 23 s, at a traffic light. Thus, unwanted stops in the reference data can be extracted and the reference data can be modified.
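A simple sketch of the stop-extraction idea described above: threshold the wheel-pulse velocity profile to find intervals where the vehicle is considered stopped, so that unwanted stops (such as the one at the traffic light) can be removed from the reference data. The threshold value and the sample profile are assumptions, not values from the paper.

```python
import numpy as np

def stop_intervals(t: np.ndarray, v: np.ndarray, v_stop: float = 0.1):
    """Return (start, end) times of intervals where velocity stays below v_stop."""
    stopped = v < v_stop
    intervals, start = [], None
    for ti, s in zip(t, stopped):
        if s and start is None:
            start = ti                       # stop interval begins
        elif not s and start is not None:
            intervals.append((start, ti))    # stop interval ends
            start = None
    if start is not None:
        intervals.append((start, t[-1]))
    return intervals

t = np.arange(0, 60, 1.0)                    # seconds
v = np.where((t >= 23) & (t < 48), 0.0, 5.0) # synthetic profile: stopped for 25 s from t = 23 s
print(stop_intervals(t, v))                  # -> [(23.0, 48.0)]
```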
An evaluation of data completeness of VGI through geometric similarity assessment
Published in International Journal of Image and Data Fusion, 2018
Alireza Chehreghan, Rahim Ali Abbaspour
Considering the subject of this study, completeness measures the comprehensiveness of a data set. This component not only reports the amount of missing data, but also encompasses the amount of data to be excluded. Generally, there are two types of completeness measurement methods: unit-based and object-based. In the unit-based methods, the comparison of the reference data and VGI is performed by measuring the total length, area, or number of objects in OSM relative to the reference data set (Zhang and Malczewski 2017). Numerous studies (Haklay et al. 2010, Zielstra and Zipf 2010, Neis et al. 2011, Jokar Arsanjani et al. 2013a, Forghani and Delavar 2014, Hochmair et al. 2015, Jokar Arsanjani and Vaz 2015, Mashhadi et al. 2015) have used these methods to assess the completeness of the OSM data. In the object-based methods, on the other hand, the corresponding objects in the OSM data set and the reference are identified using matching methods, and the completeness is then measured. Yet, not many studies (Ludwig et al. 2011, Koukoletsos et al. 2012, Abdolmajidi et al. 2015) have used these methods to measure the completeness of the OSM data. In fact, the value calculated through the object-based methods is closer to the concept of completeness, because identifying the corresponding objects in the OSM data set and the reference makes it possible to identify the missing and excluded data. However, the number of matching pairs identified is considerably dependent on the criteria used in the matching methods. Koukoletsos et al. (2012) and Abdolmajidi et al. (2015) measured completeness in the OSM data set by providing a matching approach based on length and orientation criteria, while Ludwig et al. (2011) used the buffer distance to identify the corresponding objects.
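A brief sketch of the unit-based completeness measure described above: compare the total length of OSM features against the reference data set. The geometries are made-up examples, not real OSM or reference data, and the use of shapely is an assumption for illustration.

```python
from shapely.geometry import LineString

# Reference road network vs. the (incomplete) OSM counterpart, as line geometries.
reference = [LineString([(0, 0), (0, 100)]), LineString([(0, 100), (80, 100)])]
osm       = [LineString([(0, 0), (0, 98)])]   # part of the network is missing in OSM

ref_len = sum(g.length for g in reference)
osm_len = sum(g.length for g in osm)

# Unit-based completeness: total length in OSM relative to the reference (here about 54%).
print(f"completeness = {osm_len / ref_len:.2%}")
```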