Toward Data Integration in the Era of Big Data
Published in Archana Patel, Narayan C. Debnath, Bharat Bhushan, Semantic Web Technologies, 2023
Houda EL Bouhissi, Archana Patel, Narayan C. Debnath
Let us analyze the following scenario: the Ministry of Higher Education in Algeria manages more than 34 universities across the country; some deploy big data architectures, while other structures operate with traditional databases. Suppose the Ministry wants to integrate the activities of these universities so that they operate centrally, in order to offer better services to students (Master's applications, etc.). Data integration involves the collection, storage, structuring, and combining of data so that it can be used as a unified view. It plays an important role in easing information exchange and communication across the enterprise, whether that means integrating core systems, processes, administrative tasks, or databases. Data integration requires appropriate software tools that automatically gather and analyze real-time information from various online data sources. This is where researchers turn to ontologies, because ontologies represent knowledge in a way that is understandable by both humans and machines. Ontologies describe the semantics of data and are widely used for semantic interoperability and integration.
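The unified view described above can be illustrated with a minimal sketch. All field names, mappings, and records below are invented for illustration; a real ontology-based system would use a shared vocabulary (e.g. an OWL ontology) rather than plain dictionaries, but the mapping idea is the same.

```python
# Hypothetical sketch: two universities store student records under different
# local field names; per-source mappings translate them into a shared schema.
COMMON_SCHEMA = {"student_id", "full_name", "program"}

# Per-source mappings from local field names to the shared vocabulary
# (all names here are invented for illustration).
MAPPING_UNIV_A = {"id": "student_id", "name": "full_name", "programme": "program"}
MAPPING_UNIV_B = {"matricule": "student_id", "nom": "full_name", "filiere": "program"}

def to_common(record, mapping):
    """Translate one source record into the shared schema."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def integrate(sources):
    """Build a unified view across all sources."""
    unified = []
    for records, mapping in sources:
        unified.extend(to_common(r, mapping) for r in records)
    return unified

univ_a = [{"id": "A1", "name": "Amina", "programme": "CS"}]
univ_b = [{"matricule": "B7", "nom": "Karim", "filiere": "Math"}]
view = integrate([(univ_a, MAPPING_UNIV_A), (univ_b, MAPPING_UNIV_B)])
```

After integration, every record exposes the same fields regardless of which university produced it, which is what enables central services such as processing Master's applications.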
Collection of Data
Published in Shyama Prasad Mukherjee, A Guide to Research Methodology, 2019
Data integration offers several benefits to business enterprises in building up business intelligence and, therefore, in business research. As mentioned earlier, umpteen experiments are being carried out with more or less similar objectives on the same phenomenon in medicine, bioinformatics, geochemistry, astrophysics, or some facet of climate change in different laboratories across the globe, and quite often the resulting data sets are put in the public domain. To get a unified view of the data, sometimes seemingly disparate and discordant, we need to integrate the data, taking care of possible differences in operational definitions, designs of experiment, methods of measurement, or the ways measurements were summarized into the indices reported in the public data sets. Thus, in such situations, cleaning the different data sets plays a significant role before the ETL process is taken up.
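The cleaning-before-ETL step mentioned above can be sketched as follows. The data, field names, and conventions are all invented for illustration: two hypothetical labs report the same measurement under different operational definitions (different units, different missing-value codes), and cleaning harmonises them before the data are combined.

```python
# Illustrative sketch (invented data): harmonising two labs' conventions
# before integrating their measurements.
def clean_lab_a(row):
    # Lab A reports temperature in Fahrenheit; convert to Celsius.
    return {"sample": row["sample"], "temp_c": (row["temp_f"] - 32) * 5 / 9}

def clean_lab_b(row):
    # Lab B already reports Celsius but uses -999 as a missing-value code.
    t = row["temp_c"]
    return {"sample": row["sample"], "temp_c": None if t == -999 else t}

def etl(lab_a_rows, lab_b_rows):
    cleaned = [clean_lab_a(r) for r in lab_a_rows] + \
              [clean_lab_b(r) for r in lab_b_rows]
    # Load step: keep only rows with valid measurements.
    return [r for r in cleaned if r["temp_c"] is not None]

rows = etl([{"sample": "s1", "temp_f": 98.6}],
           [{"sample": "s2", "temp_c": -999}])
```

Without the cleaning functions, the Fahrenheit reading and the -999 sentinel would silently corrupt any statistics computed over the merged data, which is exactly why cleaning precedes ETL in such situations.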
The impact of Big Data on making evidence-based decisions
Published in Matthias Dehmer, Frank Emmert-Streib, Frontiers in Data Science, 2017
Rodica Neamtu, Caitlin Kuhlman, Ramoza Ahsan, Elke Rundensteiner
Data extraction is the process of acquiring raw data points from these diverse types of sources and relies on access to the data-storage systems. Proprietary and open-source software options are available for information integration solutions that take the autonomy, heterogeneity, and dynamics of data sources into account. These range from services offered by large companies such as IBM, Microsoft, and Oracle to data integration and management tools such as Talend,∗ KNIME,† and Pentaho.‡ Other tools are tailored to specific fields, such as the Ingenuity platform for genomics.§ From these data integration systems to web-based mash-ups pulling data from multiple online providers [10], extracting the rich variety of data required to answer today’s increasingly complex questions poses a variety of technical challenges, which go right to the heart of the Vs of Big Data.
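At its smallest scale, extraction from heterogeneous storage formats reduces to per-source readers that emit records in one common shape. The sketch below uses two toy sources (a CSV string and a JSON string, both invented); real extraction tools handle connection management, scheduling, and schema drift on top of this core idea.

```python
# Minimal extraction sketch (formats and field names invented): raw records
# are pulled from two storage formats into one list of Python dicts.
import csv
import io
import json

def extract_csv(text):
    """Read CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_json(text):
    """Parse a JSON array of objects into a list of dict records."""
    return json.loads(text)

csv_source = "sensor,value\npump,3.2\n"
json_source = '[{"sensor": "valve", "value": 1.7}]'
records = extract_csv(csv_source) + extract_json(json_source)
```

Note that even in this tiny example heterogeneity leaks through: the CSV reader yields the value as the string `"3.2"` while JSON yields the number `1.7`, so a downstream integration step must still reconcile types.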
A real-world service mashup platform based on data integration, information synthesis, and knowledge fusion
Published in Connection Science, 2021
Lei Yu, Yucong Duan, Kuan-Ching Li
Data integration means the logical or physical integration of data from different sources, formats, and characteristics to provide comprehensive data sharing. Information synthesis focuses on the acquisition, storage, analysis, and utilisation of information resources, which in turn supports decision-making based on those resources. Knowledge fusion merges two knowledge maps (e.g. ontologies), i.e. it fuses the descriptions of the same entity or concept from multiple sources. Cross-modal technology addresses data that exist in different forms but describe the same thing or event. In information retrieval and processing, we often need data from other modalities to enrich our knowledge of the same thing or event, and cross-modal technology is required to retrieve these different modal data. Similarly, cross-dimensional technology searches data in different dimensions by unifying the range of eigenvalues.
A data mining based approach for process identification using historical data
Published in International Journal of Modelling and Simulation, 2022
Ridouane Oulhiq, Khalid Benjelloun, Yassine Kali, Maarouf Saad
Data integration is the process of merging and consolidating data from different sources. In the process industry, data integration may concern the combination of both automatic and manual data, from operators’ reports to data stored in historian servers. Another possible use of data integration is when data are recorded from redundant sensors. For example, in the case study, two sensors are used to measure the value of a flowrate, and the sensors are used one at a time. Thus, the maximum value of the two sensors was taken as the real value of the measured parameter. Hereafter, the collected, formatted, and integrated data is denoted , a database of vectors.
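The redundant-sensor rule described above can be sketched as an element-wise maximum over two aligned measurement series. The readings below are invented; the idea is that since only one sensor is active at a time, the inactive one reads low or not at all, so the maximum recovers the true measurement.

```python
# Sketch of the redundant-sensor integration rule (invented readings):
# at each timestamp, the maximum of the two recorded values is taken
# as the real value of the measured flowrate.
def integrate_redundant(sensor_1, sensor_2):
    """Element-wise maximum over two aligned series; a missing
    reading (None) falls back to the other sensor's value."""
    merged = []
    for v1, v2 in zip(sensor_1, sensor_2):
        candidates = [v for v in (v1, v2) if v is not None]
        merged.append(max(candidates) if candidates else None)
    return merged

flow = integrate_redundant([10.2, None, 9.8], [0.0, 11.5, 10.1])
```

Taking the maximum is only sound under the stated assumption that exactly one sensor is active at a time; with simultaneously active sensors, averaging or a voting scheme would be more appropriate.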
Digital twin-driven product design framework
Published in International Journal of Production Research, 2019
Fei Tao, Fangyuan Sui, Ang Liu, Qinglin Qi, Meng Zhang, Boyang Song, Zirong Guo, Stephen C.-Y. Lu, A. Y. C. Nee
According to the definition of DT, data from different sources, formats, and characteristics must be integrated before the next step. Data integration involves combining data that reside in different sources and providing users with a unified view (Lenzerini 2002).