Explore chapters and articles related to this topic
Data Curation Challenges for Artificial Intelligence
Published in Jinzhong Yang, Gregory C. Sharp, Mark J. Gooding, Auto-Segmentation for Radiation Oncology, 2021
Ken Chang, Mishka Gidwani, Jay B. Patel, Matthew D. Li, Jayashree Kalpathy-Cramer
The choice of a well-defined sample population is important when beginning data curation. To obtain primary medical data on a pathology of interest, an authorized user must access the electronic medical record (EMR) with the approval of an institutional review board (IRB). This user may not be the researcher, so the cohort of interest must be clearly defined and communicated. The authorized user also needs to be able to search and harvest data from the EMR, which is complicated by heterogeneously tagged data and differing formats. When medical data is downloaded and saved, often in an anonymized fashion, relevant information may be lost, such as prior treatments, comorbidities, and genetic characteristics, all of which may influence data analysis. Importantly, the DICOM header may contain useful information, such as the resolution, orientation, and acquisition settings, that may be stripped away during anonymization. Another aspect of data selection that should be mentioned is concept drift, that is, how the classification of disease (and thus the relevant annotations) changes as medical knowledge evolves. In addition, there can be technology shift, where newer imaging systems replace older ones. It is important that data selection captures the most recent clinical standards to ensure that trained algorithms can be used prospectively. As such, data selection, curation, and annotation must be a continuous process. More broadly, the training and testing data should be similar. If the training data significantly predates, or otherwise differs from, the testing data, the performance of the algorithm may degrade. Additionally, it is critical that the positive and negative cases are acquired under the same conditions. If not, the algorithm may learn differences in image acquisition instead of the disease of interest, achieving deceptively high performance [39].
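One way to act on the last point is to check, before training, whether positive and negative cases differ in an acquisition attribute. The following is a minimal sketch, not a method from the chapter: the record layout, the `label` and `scanner` keys, and the function names are all hypothetical, and total variation distance is simply one convenient summary of the mismatch.

```python
from collections import Counter


def attribute_distribution(records, attribute):
    """Return the relative frequency of each value of `attribute`."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def acquisition_gap(records, attribute, label_key="label"):
    """Compare how an acquisition attribute is distributed in positive vs. negative cases."""
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    return total_variation_distance(
        attribute_distribution(positives, attribute),
        attribute_distribution(negatives, attribute),
    )


# Hypothetical cohort: most positives come from scanner "A", all negatives from "B".
cohort = [
    {"label": 1, "scanner": "A"},
    {"label": 1, "scanner": "A"},
    {"label": 1, "scanner": "B"},
    {"label": 0, "scanner": "B"},
    {"label": 0, "scanner": "B"},
    {"label": 0, "scanner": "B"},
]

gap = acquisition_gap(cohort, "scanner")
print(f"scanner distribution gap between classes: {gap:.2f}")
```

A gap near 0 suggests the two classes were acquired under similar conditions for that attribute; a gap near 1 suggests a model could separate the classes from acquisition differences alone. The same comparison can be run between training and testing sets to look for technology shift.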
Stress emotion classification using optimized convolutional neural network for online transfer learning dataset
Published in Computer Methods in Biomechanics and Biomedical Engineering, 2022
G. Linda Rose, M. Punithavalli
However, acquiring data via sensors and converting it into specific knowledge from the stress domain to the emotion domain was time-consuming. Also, it assumes a fixed and slowly varying input distribution and cannot handle sudden concept drift in real-world applications. Hence, the major contributions of this paper are the following: First, a novel online OCNNTL (O2CNNTL) model is proposed to tackle the issue of concept drift learning. The main goal of this model is to perform online learning in the stress and emotion domains by converting knowledge from small data from these domains. It also assumes that data in the emotion domain follow the same distribution as data in the stress domain, and it enhances the OCNNTL process in the stress-emotion domain by exploiting the prior knowledge learned from training data in these two domains. In the O2CNNTL model, two different configurations of OTL are considered: (1) OTL on homogeneous domains with a common feature space, to cope with the concept drift problem exhibited by prior works, and (2) OTL across heterogeneous domains with different feature spaces, reducing the complexity of gathering data from multiple sensors. To tackle concept-drifting data streams in homogeneous domains, online classification via an OCNN classifier is utilized. Moreover, learning on varied feature spaces is handled effectively by a co-regularization learning method for knowledge transfer, which combines OCNN classifiers co-trained from different views of identical training instances to improve learning efficiency. Thus, this model can reduce time consumption when handling real-time scenarios in which new data are acquired sequentially.
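The idea of online learning on a stream with sudden concept drift can be illustrated with a small sketch. This is not the O2CNNTL model: it swaps in a plain online perceptron and a sliding-window error monitor, and every name here (`OnlinePerceptron`, `WindowedDriftMonitor`, the window size, the threshold, the toy stream) is an assumption chosen for illustration.

```python
from collections import deque


class OnlinePerceptron:
    """Binary perceptron updated one example at a time."""

    def __init__(self, n_features, lr=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def update(self, x, y):
        """Predict, then correct the weights on a mistake. Returns True if the prediction was right."""
        pred = self.predict(x)
        err = y - pred
        if err != 0:
            self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * err
        return pred == y


class WindowedDriftMonitor:
    """Flags drift when the error rate over the last `window` predictions reaches `threshold`."""

    def __init__(self, window=4, threshold=0.5):
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, correct):
        self.recent.append(0 if correct else 1)
        full = len(self.recent) == self.recent.maxlen
        if full and sum(self.recent) / len(self.recent) >= self.threshold:
            self.recent.clear()  # start a fresh window after signalling drift
            return True
        return False


def make_stream(n=16, flip_at=8):
    """Toy stream: label is 1 for x > 0 before `flip_at`, and 1 for x < 0 afterwards."""
    stream = []
    for t in range(n):
        x = 1.0 if t % 2 == 0 else -1.0
        y = (1 if x > 0 else 0) if t < flip_at else (1 if x < 0 else 0)
        stream.append(([x], y))
    return stream


def run_stream(stream):
    """Learn online and record the time steps at which drift is flagged."""
    model = OnlinePerceptron(n_features=1)
    monitor = WindowedDriftMonitor(window=4, threshold=0.5)
    drift_points = []
    for t, (x, y) in enumerate(stream):
        correct = model.update(x, y)
        if monitor.observe(correct):
            drift_points.append(t)
    return drift_points


drift_points = run_stream(make_stream())
print("drift flagged at steps:", drift_points)
```

On this toy stream the concept flips at step 8, and the monitor flags drift one step later, once two of the last four predictions have been wrong. A real system would typically respond to the flag, for example by resetting or re-weighting the model, rather than only recording it.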