Published in Chad A. Mirkin, Spherical Nucleic Acids, 2020
Ali H. Alhasan, Alexander W. Scott, Jia J. Wu, Gang Feng, Joshua J. Meeks, C. Shad Thaxton, Chad A. Mirkin
Raw Scano-miR expression data were extracted from 4608 probes using GenePix Pro-6 software (Molecular Devices). Expression values below the background threshold, as well as probes with an abnormal shape index, were excluded from downstream data analysis. An average of three probe replicates per miRNA target was used for expression analysis. In total, 705 human miRNAs were screened for each sample. The identities and frequencies of the expression profiles were calculated for five exclusively expressed miRNAs that were detected solely in aggressive serum samples, where frequency denotes the number of times the miRNA was detected in the serum samples divided by the number of aggressive samples. The 583 miRNAs that were not expressed across all 16 samples were excluded from further expression analysis. Quantile normalization was performed on the 16 samples with 167 coexpressed features. Heat maps were clustered using Pearson correlation as a distance metric and visualized using MATLAB.
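The excerpt above describes quantile normalization of the 167-by-16 expression matrix followed by clustering with a Pearson-correlation distance. The sketch below illustrates those two steps in Python with NumPy/SciPy; it is not the authors' MATLAB pipeline, and the matrix contents and variable names are placeholder assumptions.

```python
# Sketch: quantile normalization of a miRNA expression matrix, then hierarchical
# clustering using Pearson correlation as the distance metric (1 - r).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

def quantile_normalize(x):
    """Force every column (sample) to share the same empirical distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)   # rank of each value within its sample
    mean_sorted = np.sort(x, axis=0).mean(axis=1)       # mean value of each quantile across samples
    return mean_sorted[ranks]

rng = np.random.default_rng(0)
expr = rng.lognormal(size=(167, 16))      # placeholder for the 167 miRNAs x 16 samples matrix
expr_qn = quantile_normalize(expr)

dist = pdist(expr_qn, metric="correlation")            # Pearson distance: 1 - correlation
tree = linkage(dist, method="average")
row_order = dendrogram(tree, no_plot=True)["leaves"]   # row ordering for the clustered heat map
heatmap = expr_qn[row_order, :]
```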
Patient Stratification and Treatment Response Prediction
Published in Inna Kuperstein, Emmanuel Barillot, Computational Systems Biology Approaches in Cancer Research, 2019
Inna Kuperstein, Emmanuel Barillot
Figure 6.1 summarizes the performance of using the raw mutation profile, or its normalized version by NetNorM or NSQN (for network smoothing and quantile normalization, which refers here to the method of Hofree et al.20), for survival prediction on eight cancer types. We see that for two cancers (LUSC, HNSC), none of the methods manages to outperform a random prediction, calling into question the relevance of the mutation information in this context. For OV, BRCA, KIRC and GBM, all three methods are significantly better than random, although the estimated CI remains below 0.56, and we again observe no significant difference between the raw data and the data transformed by NSQN or NetNorM. Finally, the last two cases, SKCM and LUAD, are the only ones for which we reach a median CI above 0.6. In both cases, processing the mutation data with NetNorM significantly improves performance compared to using the raw data or profiles processed with NSQN. More precisely, for LUAD the median CI increases from 0.56 for the raw data and 0.53 for NSQN to 0.62 for NetNorM. In the case of SKCM, the median CI increases from 0.48 for the raw data to 0.52 for NSQN, and to 0.61 for NetNorM. For SKCM, both NetNorM and NSQN are significantly better than the raw data (p < 0.01).
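To make the NSQN preprocessing concrete, the sketch below shows a simplified version of the idea in the spirit of Hofree et al.: a binary patient-by-gene mutation matrix is smoothed over a gene network by a random-walk-with-restart diffusion and then quantile-normalized across patients. This is not the reference implementation; the toy network, matrix sizes, and the diffusion parameter alpha are illustrative assumptions.

```python
# Simplified NSQN-style preprocessing of mutation profiles (network smoothing
# followed by quantile normalization across patients).
import numpy as np

def network_smooth(mut, adj, alpha=0.7, n_iter=50):
    """Propagate binary mutation profiles over a gene-gene network."""
    w = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)  # row-normalized transition matrix
    f = mut.astype(float)
    for _ in range(n_iter):
        f = alpha * f @ w + (1 - alpha) * mut                    # random-walk-with-restart update
    return f

def quantile_normalize_rows(x):
    """Give every patient (row) the same value distribution."""
    ranks = np.argsort(np.argsort(x, axis=1), axis=1)
    mean_sorted = np.sort(x, axis=1).mean(axis=0)
    return mean_sorted[ranks]

rng = np.random.default_rng(1)
mutations = (rng.random((20, 100)) < 0.02).astype(float)    # 20 patients x 100 genes, sparse
adjacency = (rng.random((100, 100)) < 0.05).astype(float)   # toy symmetric gene network
adjacency = np.maximum(adjacency, adjacency.T)

smoothed = network_smooth(mutations, adjacency)
nsqn_profiles = quantile_normalize_rows(smoothed)   # features fed to the survival model
```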
Big Data and Transcriptomics
Published in Shampa Sen, Leonid Datta, Sayak Mitra, Machine Learning and IoT, 2018
Sudharsana Sundarrajan, Sajitha Lulu, Mohanapriya Arumugam
The most common application of RNA-Seq is the estimation of gene and transcript expression, which depends on the number of reads that map to each transcript. Numerous statistical methods are used to quantify transcript abundance based on read coverage. RPKM (reads per kilobase per million mapped reads) is a widely used measure that normalizes read counts with respect to the total number of mapped reads and gene length. Besides read coverage, sequencing depth, gene length, and isoform abundance also influence the estimated transcript abundance. Many algorithms, such as RSEM, eXpress, Sailfish, and kallisto, have been developed to estimate transcript-level expression. Once read counts have been estimated, data normalization is a crucial processing step, since it strongly affects the accuracy of the gene expression results and of any further analysis. Multiple parameters, such as transcript size, sequencing depth and error rate, and GC-content, should be considered when choosing a normalization method. For example, quantile normalization can improve the quality of mRNA-Seq data, and EDASeq, an R package that applies within-lane and between-lane normalization, can correct for GC-content bias.
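The two quantities mentioned above, RPKM and quantile-normalized expression, can be computed directly from a count matrix. The sketch below shows both in Python; the counts and gene lengths are made-up placeholders, and real pipelines would rely on dedicated tools such as RSEM, kallisto, or EDASeq rather than this minimal code.

```python
# RPKM = (reads mapped to gene * 1e9) / (total mapped reads in sample * gene length in bp),
# followed by quantile normalization of the resulting matrix.
import numpy as np

counts = np.array([[500, 300],
                   [1200, 900],
                   [80, 40]], dtype=float)                 # genes x samples (placeholder)
gene_length_bp = np.array([2000, 5000, 800], dtype=float)  # placeholder gene lengths

total_mapped = counts.sum(axis=0)
rpkm = counts * 1e9 / (total_mapped[np.newaxis, :] * gene_length_bp[:, np.newaxis])

# Quantile normalization: force each sample (column) onto the same distribution.
ranks = np.argsort(np.argsort(rpkm, axis=0), axis=0)
mean_sorted = np.sort(rpkm, axis=0).mean(axis=1)
rpkm_qn = mean_sorted[ranks]
```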
A generic evolutionary ensemble learning framework for surface roughness prediction in manufacturing
Published in International Journal of Computer Integrated Manufacturing, 2023
Shutong Xie, Zongbao He, Chunjin Wang, Chao Liu, Xiaolong Ke
Data normalization is a common method of bringing data from different scales onto a common scale, which allows for faster convergence of the model and more accurate prediction results. There are various normalization techniques, such as zero-mean (z-score) normalization, logarithmic transformation, and quantile normalization. In this module, Min-Max normalization is chosen to process the data in order to reduce the effect of differing units or magnitudes among data features. Min-Max normalization rescales each data feature to the range 0 to 1, as shown in Equation (1).
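A minimal sketch of Min-Max normalization is shown below, using the standard rescaling (x - min) / (max - min) applied per feature; the sample values are illustrative and not taken from the surface-roughness dataset.

```python
# Min-Max normalization: rescale every column (feature) to the range [0, 1].
import numpy as np

def min_max_normalize(x):
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (x - x_min) / np.maximum(x_max - x_min, 1e-12)   # guard against constant features

features = np.array([[0.8, 1200.0],
                     [1.5,  800.0],
                     [2.3, 1500.0]])   # e.g. two hypothetical machining parameters
print(min_max_normalize(features))
```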