Using Molecular Methods to Identify and Monitor Xenobiotic-Degrading Genes for Bioremediation
Published in Ederio Dino Bidoia, Renato Nallin Montagnolli, Biodegradation, Pollutants and Bioremediation Principles, 2021
Edward Fuller, Victor Castro-Gutiérrez, Juan Carlos Cambronero-Heinrichs, Carlos E. Rodríguez-Rodríguez
With the reduced cost of next-generation sequencing, RNA-Seq provides a very powerful method for the detection and quantification of RNA within a given sample. The method relies on the isolation of RNA, followed by its conversion into the more stable cDNA. Once converted, the sequences can be subjected to nucleotide sequencing (Stark et al. 2019). By quantifying the change in expression levels under different conditions, such as with and without the xenobiotic substrate, the expression profile can reveal inducible genes and potential catabolic operons. This approach can be used on single bacterial strains where culturing them is possible (Wang et al. 2018, Levy-Booth et al. 2019). However, a metagenomic approach using RNA-Seq can also be undertaken. By detecting and quantifying gene expression in the environment, inducible genes can be identified without the need to isolate and culture the xenobiotic-degrading microorganism(s). This is one of RNA-Seq’s greatest strengths and allows for the identification of inducible novel catabolic genes that may have been overlooked using other methods (Culligan et al. 2014).
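The comparison of expression with and without the xenobiotic substrate can be illustrated with a log2 fold change. The following is only a minimal sketch: the gene names, the normalized counts, the pseudocount and the threshold of 2.0 are all hypothetical, and it omits the replicate-aware statistical testing that a real differential-expression analysis would need.

```python
import math


def log2_fold_change(induced, control, pseudocount=1.0):
    """Log2 ratio of mean normalized expression, induced vs. control.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes that are silent in one condition.
    """
    mean_induced = sum(induced) / len(induced)
    mean_control = sum(control) / len(control)
    return math.log2((mean_induced + pseudocount) / (mean_control + pseudocount))


def inducible_genes(expression, threshold=2.0):
    """Return gene names whose log2 fold change meets the threshold."""
    return sorted(
        gene
        for gene, (induced, control) in expression.items()
        if log2_fold_change(induced, control) >= threshold
    )


# Hypothetical normalized counts: (with substrate, without substrate).
expression = {
    "geneA": ([410.0, 395.0, 430.0], [12.0, 9.0, 15.0]),
    "geneB": ([101.0, 97.0, 105.0], [99.0, 103.0, 98.0]),
}

if __name__ == "__main__":
    for gene, (induced, control) in expression.items():
        print(gene, round(log2_fold_change(induced, control), 2))
    print("candidate inducible genes:", inducible_genes(expression))
```

In this toy data set only geneA passes the threshold; geneB is expressed at nearly the same level in both conditions and would not be flagged as inducible.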
Big Data and Transcriptomics
Published in Shampa Sen, Leonid Datta, Sayak Mitra, Machine Learning and IoT, 2018
Sudharsana Sundarrajan, Sajitha Lulu, Mohanapriya Arumugam
Studies on individual transcripts started in the early 1970s. In the 1980s, low-throughput sequencing methods, namely Sanger sequencing, were used to sequence random transcripts called expressed sequence tags (ESTs). With the advent of high-throughput methods, such as sequencing by synthesis, Sanger sequencing was superseded. A few other methods, which were laborious and captured only a small subsection of the transcriptome, such as northern blotting and reverse transcriptase quantitative PCR (RT-qPCR), were also available before the evolution of current transcriptomics techniques. Serial analysis of gene expression (SAGE) was one of the early transcriptome techniques; it was based on Sanger sequencing of concatenated random transcript fragments. Later on, during the early 2000s, the contemporary microarray and RNA-Seq techniques were developed. In the late 2000s, a variety of microarrays were produced, which were used to measure the expression of known genes of various model organisms. Advances in the technology allowed more genes to be measured on a single array. Improvements in fluorescence detection also increased the measurement accuracy and the sensitivity for low-abundance transcripts. RNA-Seq refers to the sequencing of cDNAs, and the number of counts of each transcript determines its abundance. In 2004, massively parallel signature sequencing (MPSS) utilized a complex series of hybridizations to sequence 16–20 bp sequences. After 2008, RNA-Seq gained popularity and allowed the measurement of 10^9 transcript sequences.
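The idea that read counts determine transcript abundance can be made concrete with a simple depth normalization. The sketch below computes counts per million (CPM), one common convention chosen here for illustration; the transcript names and counts are hypothetical, and the passage above does not prescribe a particular normalization.

```python
def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million.

    This removes the effect of sequencing depth, so abundances from
    libraries of different sizes can be compared.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no mapped reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Hypothetical raw read counts for three transcripts in one library.
raw_counts = {"tx1": 500, "tx2": 1500, "tx3": 8000}
cpm = counts_per_million(raw_counts)

if __name__ == "__main__":
    for name, value in cpm.items():
        print(name, value)
```

CPM corrects only for library size. Comparing different transcripts within a sample would additionally require accounting for transcript length.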
Bacterial Small RNA and Nanotechnology
Published in Sunil K. Deshmukh, Mandira Kochar, Pawan Kaur, Pushplata Prasad Singh, Nanotechnology in Agriculture and Environmental Science, 2023
Next-generation transcriptome sequencing, or RNA-Seq, is a common method to analyse gene expression, uncover novel RNA species, and analyse unknown genes and novel transcript isoforms, thus carrying out complete transcriptome profiling of a biological sample (Hrdlickova et al., 2017). It is one of the best methods to identify the presence of non-coding sRNAs in a genome. For better output and sRNA enrichment, the processed RNAs (rRNA and tRNA), which make up the majority of total RNA, can be degraded and removed (Liu et al., 2009; Yoder-Himes et al., 2009) before proceeding to RNA-Seq. Fan et al. (2015) used the RNA-Seq approach to identify a large number of regulatory RNAs in the gram-positive bacterium Bacillus amyloliquefaciens, which included 53 cis-encoded riboswitches, 136 antisense RNAs, and 86 potential sRNA candidates, most of which were validated by northern blotting. Seven potential novel intergenic sRNAs have been identified using RNA-Seq in P. aeruginosa (Heera et al., 2015). In another study, RNA-Seq was used to uncover antisense RNAs and new putative sRNAs in the human pathogen Streptococcus pyogenes (Le Rhun et al., 2015). Liu et al. (2009) improved the specificity of the sequencing strategy by depleting the starting RNA sample of tRNA and rRNAs; in addition to the 20 known sRNAs in V. cholerae, 500 new putative intergenic and 127 antisense sRNAs were revealed. An improved RNA sequencing strategy known as differential RNA sequencing (dRNA-seq) has been developed, in which an appropriate exonuclease [5’ monophosphate-dependent terminator exonuclease (TEX)] is used to degrade processed RNAs (Podkaminski et al., 2014). This approach has been used to map the primary transcriptome and identify sRNAs in diverse species. More than 60 sRNAs in the gram-negative human pathogen Helicobacter pylori were discovered using this strategy (Sharma et al., 2010). In the model cyanobacterium Synechocystis sp. PCC6803, this approach was used to establish a genome-wide map of 3,527 transcriptional start sites (TSS). Orphan TSS located in the intergenic regions subsequently led to the prediction of 314 non-coding RNAs in this organism (Nitschke et al., 2011).
RNA-Seq analysis of Phanerochaete sordida YK-624 degrades neonicotinoid pesticide acetamiprid
Published in Environmental Technology, 2023
Jianqiao Wang, Yilin Liu, Ru Yin, Nana Wang, Tangfu Xiao, Hirofumi Hirai
RNA sequencing (RNA-Seq) has gradually improved in the last ten years [24,25]. The transcriptome links genomic and proteomic information to the biological function of genes. Therefore, transcriptome analysis has become an important tool in molecular biology and plays an important role in understanding genomic function. RNA-Seq is most commonly employed to analyse differentially expressed genes (DEGs). In the present study, we utilised RNA-Seq to explore the DEGs of the white-rot fungus P. sordida YK-624 under ACE-degrading conditions and in the absence of ACE. The findings of this study may help to determine the functional genes involved in the degradation of ACE by white-rot fungi.
Novel hybrid DCNN–SVM model for classifying RNA-sequencing gene expression data
Published in Journal of Information and Telecommunication, 2019
Phuoc-Hai Huynh, Van-Hoa Nguyen, Thanh-Nghi Do
Classifying RNA-Seq gene expression data has provided useful information for cancer diagnosis and drug discovery (Li et al., 2017). Gene expression can be simply defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA-Seq technology has become a prevalent approach to quantifying gene expression and is expected to provide better insights into a number of biological and biomedical questions than DNA microarray technology (Johnson, Dhroso, Hughes, & Korkin, 2018). The processing of RNA-Seq gene expression data includes many stages to obtain the data matrix (RNASeqV2 level 3 expression data) (MIT and Harvard, 2016). The gene expression data matrix contains gene expression values taken under given sampling conditions. In this data structure, each row represents a gene expression profile and each column corresponds to an RNA-Seq experiment. A characteristic of gene expression data is that the number of variables (genes) n far exceeds the number of samples m, commonly known as the ‘curse of dimensionality’ issue (Clarke et al., 2008). The issue leads to statistical and analytical challenges, and conventional statistical methods give improper results because of the high dimension of gene expression data with a limited number of patterns (Köppen, 2000). During the past decade, many algorithms have been used to classify gene expression data, including support vector machines (SVM) (Furey et al., 2000), neural networks (Khan et al., 2001), k nearest neighbours (Li, Weinberg, Darden, & Pedersen, 2001), decision trees (Netto et al., 2010), random forests (Díaz-Uriarte & De Andres, 2006), random forests of oblique decision trees (Do, Lenca, Lallich, & Pham, 2010), bagging and boosting (Dettling, 2004; Tan & Gilbert, 2003), and random ensemble oblique decision stumps (Huynh, Nguyen, & Do, 2018b). Although there have been many studies on classifying gene expression data, there remains a critical need for better accuracy.
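The matrix layout described above (genes as rows, experiments as columns) can be sketched as follows. The gene and sample names and the expression values are hypothetical, and the variance filter is only one simple, illustrative way to reduce the number of variables; it is not presented as the method used by the authors.

```python
from statistics import pvariance

# Hypothetical expression matrix: one row per gene, one column per sample.
genes = ["g1", "g2", "g3", "g4"]
samples = ["s1", "s2", "s3"]
matrix = [
    [5.0, 5.1, 4.9],    # g1: nearly constant across samples
    [0.0, 8.0, 16.0],   # g2: strongly variable
    [2.0, 2.0, 2.0],    # g3: constant
    [10.0, 1.0, 10.0],  # g4: variable
]


def top_variance_genes(matrix, genes, k):
    """Keep the k genes whose expression varies most across samples."""
    ranked = sorted(
        zip(genes, matrix),
        key=lambda pair: pvariance(pair[1]),
        reverse=True,
    )
    return [gene for gene, _ in ranked[:k]]


def samples_as_rows(matrix):
    """Transpose so each row is one sample, the layout most classifiers expect."""
    return [list(column) for column in zip(*matrix)]


if __name__ == "__main__":
    print("genes kept:", top_variance_genes(matrix, genes, 2))
    print("first sample vector:", samples_as_rows(matrix)[0])
```

Even in this toy case there are more genes (n = 4) than samples (m = 3). In real RNA-Seq data sets the imbalance is far larger, which is why some form of dimensionality reduction or a classifier suited to high-dimensional data is usually needed.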
DCNN and SVM are two successful approaches for pattern recognition (Christopher, 2016). On the one hand, the SVM has several advantages for classifying high-dimensional data. The main idea of this algorithm is to maximize the margin and to minimize an upper bound on the generalization error (Vapnik, 1995). SVM achieves this by solving a convex optimization problem, which yields globally optimal solutions. Although SVM often outperforms other algorithms, it is a shallow architecture with a single adjustable layer. On the other hand, a deep convolutional neural network (DCNN) is a deep architecture that learns latent representations. Traditional DCNN architectures use multinomial logistic regression (softmax activation) at the top layer for classification. In fact, SVM is a widely used alternative to softmax for classification (Boser, Guyon, & Vapnik, 1992) that can improve classification performance.
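The difference between a softmax top layer and a margin-based SVM objective can be seen by evaluating both losses on the same class scores. The sketch below is illustrative only: the scores are hypothetical, and the summed multiclass hinge loss is one common formulation, assumed here rather than taken from the article.

```python
import math


def softmax_cross_entropy(scores, label):
    """Negative log-probability of the true class under a softmax."""
    peak = max(scores)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum_exp - scores[label]


def multiclass_hinge(scores, label, margin=1.0):
    """Sum of margin violations of the wrong classes against the true class."""
    return sum(
        max(0.0, margin + score - scores[label])
        for index, score in enumerate(scores)
        if index != label
    )


# Hypothetical class scores produced by the last layer of a network.
scores = [3.0, 1.0, 0.5]

if __name__ == "__main__":
    for label in range(len(scores)):
        print(
            label,
            round(softmax_cross_entropy(scores, label), 4),
            multiclass_hinge(scores, label),
        )
```

When the true class already leads by at least the margin (label 0 here), the hinge loss is exactly zero, whereas the softmax loss stays positive and keeps pushing the scores apart. This reflects the margin-maximizing behaviour attributed to SVM in the paragraph above.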