Big Data and Transcriptomics
Published in Shampa Sen, Leonid Datta, Sayak Mitra, Machine Learning and IoT, 2018
Sudharsana Sundarrajan, Sajitha Lulu, Mohanapriya Arumugam
Raw sequence-based metrics check the experiment at a low level, since prior sequence alignment is not required. Raw sequence quality is assessed using the Phred quality score (Q), which measures base-calling reliability and was originally developed for Sanger sequencing chromatograms. It is defined as Q = −10 × log10(P), where P is the probability of an erroneous base call; for example, Q = 20 corresponds to a 1 in 100 chance of error. GC content gives the percentage of bases in a DNA sequence that are either guanine or cytosine, and is a simple measure of nucleotide composition. Read duplication is often determined by read length, transcript abundance, PCR amplification, and sequencing depth.
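The two metrics defined above can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this example; the formulas are the ones given in the excerpt.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(P) to recover the error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Compute Q = -10 * log10(P) for an error probability 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def gc_content(seq: str) -> float:
    """Return the percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)
```

Under this definition, each increase of 10 in Q corresponds to a tenfold drop in the probability of a wrong base call.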
Contrasting histories of microcystin-producing cyanobacteria in two temperate lakes as inferred from quantitative sediment DNA analyses
Published in Lake and Reservoir Management, 2019
Shinjini Pilon, Arthur Zastepa, Zofia E. Taranu, Irene Gregory-Eaves, Marianne Racine, Jules M. Blais, Alexandre J. Poulain, Frances R. Pick
Initial trimming of the sequence data was performed using the program MOTHUR (v. 1.33.3; Schloss et al. 2009). The quality file was used to eliminate sequences with an average Phred quality score below 25. Sequences shorter than 100 bp or longer than 1000 bp were removed, as were sequences with ambiguous bases. The remaining sequences were then aligned using Silva reference files, and chimeras were subsequently removed using Chimera Slayer (Haas et al. 2011). Sequences representative of cyanobacteria were identified using the online Ribosomal Database Project (RDP) classifier, which is specifically designed for classifying bacterial 16S rRNA sequences (Wang et al. 2007; Cole et al. 2014). Sequences not identified as cyanobacterial by the RDP classifier were removed from further analyses. Previous studies have found that cyanobacterial-specific primers can amplify non-cyanobacterial species and have similarly filtered these out of downstream analyses (Kleinteich et al. 2014).
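The read-level filters described above (average quality, length range, ambiguous bases) can be sketched in Python as follows. This is not MOTHUR's implementation: the function name is hypothetical, the average is assumed to be the arithmetic mean of per-base Phred scores, and any character other than A, C, G, or T is assumed to count as ambiguous.

```python
from statistics import mean


def passes_filters(seq, quals, min_avg_q=25, min_len=100, max_len=1000):
    """Return True if a read survives the three filters in the excerpt.

    seq   -- nucleotide string
    quals -- list of per-base Phred scores, same length as seq
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not quals:
        return False
    # Length filter: drop reads shorter than min_len or longer than max_len.
    if not (min_len <= len(seq) <= max_len):
        return False
    # Ambiguity filter (assumption: anything outside ACGT is ambiguous).
    if any(base not in "ACGT" for base in seq.upper()):
        return False
    # Quality filter: drop reads whose mean Phred score is below min_avg_q.
    return mean(quals) >= min_avg_q
```

The cheap length and ambiguity checks run before the quality average, so most rejected reads are discarded without computing a mean.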
Characterization of bacterial community structure in a hydrocarbon-contaminated tropical African soil
Published in Environmental Technology, 2018
Lateef B. Salam, Mathew O. Ilori, Olukayode O. Amund, Yee LiiMien, Hideaki Nojiri
Bases with a Phred quality score below 20 were trimmed, and all the sequence data were converted to reverse complement. In the sequence data analysis, sequence reads shorter than 200 bp were excluded. Chimera check of the sequenced clones was conducted using ss_DECIPHER v1.0.4 [29]. Sequence alignment was carried out using the inference of RNA alignments aligner tool in the Ribosomal Database Project (RDP) pipeline [30]. The nucleotide sequences of all 437 clones in the MWO library were submitted to the GenBank database using the Sequin software (downloaded from the National Center for Biotechnology Information (NCBI)) and were assigned the accession numbers KF916697–KF917133. The taxonomic affiliation of each sequenced clone was determined using RDP Classifier v2.5 [31] in the RDP-II database. Phylogenetic trees were constructed from the clone library with a representative of each of the classified sequences using the neighbor-joining method within the program MEGA 6.06 (The Biodesign Institute) and bootstrapped with 100 repetitions. Alpha-diversity indices such as the Shannon-Wiener Index, Chao1, Simpson’s inverse, Fisher’s Alpha, and evenness, together with the rarefaction curve for the clone library, were computed using the RDP Pipeline and EstimateS v9.1.0 [32]. The rarefaction curve was plotted with the number of clones (sequences) on the X-axis and the number of operational taxonomic units (OTUs) on the Y-axis. Sequences showing more than 97% similarity were considered to belong to the same OTU. Good’s coverage formula, [1 − (n/N)] × 100, where n is the number of OTUs represented by a single clone and N is the total number of sequences in the analyzed sample, was used to evaluate the MWO library coverage.
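Good's coverage as defined above can be computed directly from a list of OTU assignments. The sketch below is illustrative only: the function name and the input format (one OTU label per sequenced clone) are assumptions, not part of the RDP Pipeline or EstimateS.

```python
from collections import Counter


def goods_coverage(otu_labels):
    """Return Good's coverage, [1 - (n/N)] * 100, as a percentage.

    otu_labels -- iterable with one OTU label per sequence (clone)
    n -- number of OTUs that contain exactly one sequence
    N -- total number of sequences
    """
    counts = Counter(otu_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences supplied")
    singletons = sum(1 for c in counts.values() if c == 1)
    return (1 - singletons / total) * 100


# Hypothetical library: 10 clones in 4 OTUs, two of which are singletons.
example = ["otu_a"] * 5 + ["otu_b"] * 3 + ["otu_c", "otu_d"]
coverage = goods_coverage(example)
```

In the hypothetical library, n = 2 and N = 10, so the coverage is [1 − (2/10)] × 100, that is, 80%.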
Genomic-wide analysis approach revealed genomic similarity for environmental Mexican S. Oranienburg genomes
Published in International Journal of Environmental Health Research, 2023
J. R. Aguirre-Sanchez, I. F. Vega-Lopez, N. Castro Del Campo, J. A. Medrano-Felix, J. Martínez-Urtaza, C. Chaidez-Quiroz
Initial read quality was evaluated using FASTQC (Andrews 2017). Next, the first 20 bp were removed from each read, and a 4-base-wide sliding window was used to cut the read where the average Phred quality score per base fell below 15, using Trimmomatic v0.32 (Bolger et al. 2014). Reads shorter than 50 bp were removed from the dataset. Draft genomes were assembled de novo with the pipeline A5-miseq v20160825 (Coil et al. 2015). The resulting assemblies were selected according to genome size, N50, and number of scaffolds.
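The trimming steps described above (head crop, sliding-window cut, minimum length) can be sketched in Python. This is a simplified illustration rather than a reimplementation of Trimmomatic, whose exact cut position may differ in detail; the function name is hypothetical and the defaults mirror the values given in the excerpt.

```python
def trim_read(seq, quals, head_crop=20, window=4, min_avg_q=15, min_len=50):
    """Trim one read; return (seq, quals) or None if the read is too short.

    seq   -- nucleotide string
    quals -- list of per-base Phred scores, same length as seq
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    # Step 1: remove the first head_crop bases.
    seq, quals = seq[head_crop:], quals[head_crop:]
    # Step 2: scan from the 5' end; cut at the start of the first window
    # whose mean quality falls below min_avg_q (simplifying assumption).
    cut = len(seq)
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg_q:
            cut = start
            break
    seq, quals = seq[:cut], quals[:cut]
    # Step 3: discard reads shorter than min_len after trimming.
    if len(seq) < min_len:
        return None
    return seq, quals
```

Returning None for discarded reads lets the caller drop them with a simple filter when processing a whole FASTQ file.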