Explore chapters and articles related to this topic
Research on Big Data
Published in Jay Gohil, Manan Shah, Application of Big Data in Petroleum Streams, 2022
Reservoir modelling has also seen a tremendous amount of research in recent years, especially for unconventional oil and gas resources, and has proved beneficial to production operations. For instance, a study by Lin [8] provided insights into the principles of big data algorithms and their applications to unconventional O&G resources through a three-level workflow, namely computational domains & numerical schemes, fitting of pressure valves, and domain & boundary conditions convergence. Another study by Chelmis et al. [9] proposed a semi-automatic, semantic assistance approach to manual data curation for smart oil fields with the help of big data, which addressed the problem of managing and storing very large amounts of user-generated unstructured data that lack proper metadata. The research discussed so far in this section is summarized in Table 9.1.
Big data in radiation oncology: Opportunities and challenges
Published in Jun Deng, Lei Xing, Big Data in Radiation Oncology, 2019
Several national and international data curation initiatives are underway. In the United States, the Radiation Therapy Oncology Group (RTOG), National Surgical Adjuvant Breast and Bowel Project (NSABP), and Gynecologic Oncology Group (GOG) have already created a cloud to gather radiation oncology data,55 alongside the existing platforms: the Radiation Oncology Incident Learning System (RO-ILS),56 the National Radiation Oncology Registry,57 and Johns Hopkins’ Oncospace (https://oncospace.radonc.jhmi.edu/). The National Institutes of Health Personalized Medicine Initiative will gather data from a million patients.58 Finally, the American Society of Clinical Oncology (ASCO) has created its own initiative, CancerLinQ (Cancer Learning Intelligence Network for Quality; http://www.cancerlinq.org), which the American Society for Radiation Oncology (ASTRO) joined in 2017. In Europe, the German Cancer Consortium (DKTK), the Eurocan Platform,59 and the EuroCAT60 have also been created with the same goals.
Automated Processing of Big Data in Sleep Medicine
Published in Ervin Sejdić, Tiago H. Falk, Signal Processing and Machine Learning for Biomedical Big Data, 2018
Sara Mariani, Shaun M. Purcell, Susan Redline
Ontologies can facilitate the use of standardized information models for harmonizing and integrating varied sleep data. Specific to sleep medicine, and as prototyped by the NSRR, a Sleep Domain Ontology (SDO) [130] contains standardized terms representing sleep disorders, medications, clinical findings, and physiological phenotypes, as well as terms representing procedures and devices used in sleep medicine, such as PSGs. SDOs can be used for data curation, federated data integration, visual query interfaces, analysis, and standardizing metadata parameters associated with sleep data collection. While the International Classification of Sleep Disorders has proposed some standards, there is not yet a nationally endorsed SDO [11]. There are several NIH-supported data warehouse systems, such as BIRN [131] and i2b2/SHRINE [132]; however, these resources do not specifically address sleep research needs including PSG data management. Adoption of SDOs by professional societies may further their use and utility across the sleep medicine community.
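As a rough illustration of how standardized terms can support data curation, the sketch below maps site-specific labels onto a shared vocabulary and flags labels that have no mapping. The mapping table, the labels, and the `harmonize` helper are hypothetical and are not taken from the SDO or the NSRR.

```python
# Hypothetical mapping from site-specific labels to standardized terms.
# These entries are illustrative only; they are not drawn from the SDO.
LOCAL_TO_STANDARD = {
    "osa": "obstructive sleep apnea",
    "obstructive apnea": "obstructive sleep apnea",
    "psg": "polysomnography",
    "sleep study": "polysomnography",
}


def harmonize(label):
    """Return the standardized term for a label, or None if it is unmapped."""
    return LOCAL_TO_STANDARD.get(label.strip().lower())


# Labels as they might arrive from different collection sites.
records = ["OSA", "Sleep Study ", "narcolepsy"]
harmonized = [harmonize(r) for r in records]

# Unmapped labels are kept aside for manual curation rather than dropped.
unmapped = [r for r, h in zip(records, harmonized) if h is None]
```

A real ontology would also carry identifiers, synonyms, and relations between terms, so a flat lookup like this one covers only the simplest harmonization step.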
Approaches and tools for user-driven provenance and data quality information in spatial data infrastructures
Published in International Journal of Digital Earth, 2023
Julia Fischer, Lukas Egli, Juliane Groth, Caterina Barrasso, Steffen Ehrmann, Heiko Figgemeier, Christin Henzen, Carsten Meyer, Ralph Müller-Pfefferkorn, Arne Rümmler, Michael Wagner, Lars Bernard, Ralf Seppelt
While there is a good overview of the major challenges related to the availability and accessibility of data quality and provenance information in fitness-for-use assessments, a comprehensive synthesis of the needs and, most importantly, of specific ways to address them that integrates the perspectives of data users, data producers, and software developers is missing. Since earth system sciences are a strongly data-driven and data-producing research field, researchers need clear approaches and guidelines to foster the curation of research data. Although such approaches partly exist, an overview is needed to facilitate their selection and use. Data curation provides a methodological and technological basis for improving data management, data quality, and the (re-)usability of datasets, including transparency and reproducibility (Freitas and Curry 2016), throughout the full life cycle of a data product.
A review on machine learning methods for in silico toxicity prediction
Published in Journal of Environmental Science and Health, Part C, 2018
Gabriel Idakwo, Joseph Luttrell, Minjun Chen, Huixiao Hong, Zhaoxian Zhou, Ping Gong, Chaoyang Zhang
The quality of the data used to train a model is considered more important than the choice of algorithm. Fourches et al.14 designed a workflow that can aid reproducibility of the data cleaning process. However, the process of cleaning and standardizing compounds prior to feature generation remains unclear and irreproducible in many published works. Details of the data curation process should be well documented. Molecular descriptors play an integral role in modeling the relationship between structure and activity. The choice of descriptors and the selection/extraction methods employed to keep only useful explanatory features for modeling were discussed. Although thousands of molecular descriptors exist, there is room to develop more informative and explanatory descriptors for molecules. Several methods, each with its advantages and disadvantages, have been proposed for dealing with imbalanced data. Such methods can prevent the development of biased or over-trained models. Defining and dealing with activity landscapes remains an active research area, and no definitive work has been reported on the effect of removing activity cliff generators. One solution could be the use of ensemble learning methods to account for different regions of the chemical space being modeled.88
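One common way of handling imbalanced data is to weight each class inversely to its frequency, so that errors on the rare class count for more during training. The sketch below is a minimal illustration of that idea and is not a method taken from the review; the function name and the toy labels are assumptions. The formula matches the "balanced" heuristic used by scikit-learn, `n_samples / (n_classes * class_count)`.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toxicity labels: 8 inactive (0) and 2 active (1) compounds.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

Weights like these can be passed to learners that accept per-class or per-sample weights. Resampling is an alternative approach with its own trade-offs.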