Genomic Informatics in the Healthcare System
Published in Salvatore Volpe, Health Informatics, 2022
As mentioned earlier, NGS technologies are designed to yield millions of relatively short sequence reads (50–400 bp) that redundantly overlap a specified genomic region of interest (targeted sequencing) or potentially extend across the whole genome (whole-genome sequencing). The portion of the targeted genome for which reads are actually generated during sequencing represents the extent of coverage provided by a sequencing run. A typical NGS bioinformatics pipeline includes a series of complex and computationally expensive data analysis processes that derive a list of genomic alterations from raw NGS signal output through signal processing and alignment against a reference genome (Figure 21.1). The pipeline usually begins with proprietary, platform-specific algorithms that generate sequential base calls from primary fluorescent, chemiluminescent, or electrical current signals. Each predicted nucleotide base is assigned a quality score (Phred-like score or Q score), which reflects the degree of statistical confidence that the base call is correct. The sequence reads generated during this process are stored in one of several file formats (FASTQ, XSEQ, unaligned BAM, or FASTA), with or without the base quality score information. Because this portion of the pipeline is platform-specific and proprietary, these Q scores are not comparable across different sequencing systems.
Human Gut Microbiota–Transplanted Gn Pig Models for HRV Infection
Published in Lijuan Yuan, Vaccine Efficacy Evaluation, 2022
Sequencing reads were processed with Quantitative Insights into Microbial Ecology (QIIME) (Caporaso et al., 2010). High-quality reads with Phred quality score ≥20 (corresponding to a sequencing error rate ≤0.01) were clustered into operational taxonomic units (OTUs) with the program UCLUST (Edgar, 2010). Chimeric sequences were identified with CHIMERASLAYER (Haas et al., 2011) and removed from further analysis. Bacterial taxonomy was assigned by using a naïve Bayes classifier (Wang et al., 2007b) against reference databases and bacterial taxonomy maps at Greengenes (McDonald et al., 2012). A phylogenetic tree was constructed (Price et al., 2010) from PyNAST-aligned sequences representing each OTU. Principal coordinate analysis on stool samples was based on UniFrac distances (Lozupone and Knight, 2005). Distance-based redundancy analysis for the effect of HRV on community structures was performed with the Vegan package (Vegan: Community Ecology Package, 2013). Shannon and Simpson diversity indices and a rank abundance curve were generated with QIIME.
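The Q ≥ 20 cutoff mentioned above can be illustrated with a minimal sketch. The excerpt does not say how the threshold was applied to a read, so the rule used here (every base in the read must reach the threshold) is an assumption, and it is not QIIME's own filtering logic. The function names and the toy reads are hypothetical.

```python
def error_probability(q):
    """Convert a Phred score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def passes_threshold(scores, min_q=20):
    """Assumed rule: keep a read only if every base has Phred score >= min_q."""
    return len(scores) > 0 and min(scores) >= min_q


def filter_reads(reads, min_q=20):
    """Keep (name, scores) pairs whose scores satisfy passes_threshold."""
    return [(name, scores) for name, scores in reads
            if passes_threshold(scores, min_q)]


# Hypothetical reads: a name plus a list of per-base Phred scores.
reads = [
    ("read_a", [38, 35, 30, 22]),   # all bases >= 20, kept
    ("read_b", [40, 12, 33, 31]),   # one base below 20, dropped
]
kept = filter_reads(reads)
print([name for name, _ in kept])
print(error_probability(20))        # error rate at the Q20 cutoff
```

A mean-quality rule or a sliding-window rule would be equally reasonable readings of the excerpt; only the conversion between Q and error rate is fixed by the Phred definition.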
Quality Check, Processing and Alignment of High-throughput Sequencing Reads
Published in Altuna Akalin, Computational Genomics with R, 2020
Line 1 begins with the ‘@’ character and is followed by a sequence identifier and an optional description. This line is used by the sequencing technology and usually contains technology-specific information. It can contain flow cell IDs, lane numbers, and information on read pairs. Line 2 contains the sequence letters. Line 3 begins with a ‘+’ character; it marks the end of the sequence and is optionally followed by the same sequence identifier as in Line 1. Line 4 encodes the quality values for the sequence in Line 2 and must contain the same number of symbols as there are letters in the sequence. Each symbol corresponds to a quality score. Although there are different definitions of quality scores, the de facto standard in the field is to use “Phred quality scores”. These scores represent the likelihood of the base being called wrong. Formally, $Q = -10 \log_{10}(e)$, where e is the probability that the base is called wrong. Since the score is on a negative log scale, the higher the score, the less likely it is that the base is called wrong.
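The four-line record layout and the Phred definition can be sketched in a few lines of Python. This is a minimal illustration, not a full FASTQ parser. It assumes the Phred+33 ASCII encoding (Sanger and Illumina 1.8+), which the excerpt does not specify, and the function names and example record are hypothetical.

```python
import math

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_to_error(q):
    """Probability e that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_to_phred(e):
    """Phred score Q = -10 * log10(e) for error probability e."""
    return -10 * math.log10(e)


def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (identifier, sequence, scores)."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@"):
        raise ValueError("Line 1 must start with '@'")
    if not plus.startswith("+"):
        raise ValueError("Line 3 must start with '+'")
    if len(seq) != len(qual):
        raise ValueError("Lines 2 and 4 must have the same length")
    scores = [ord(ch) - PHRED_OFFSET for ch in qual]
    return header[1:], seq, scores


# Hypothetical record, one quality symbol per base.
record = ["@read1 example", "ACGT", "+", "I5+!"]
name, seq, scores = parse_fastq_record(record)
print(name, seq, scores)
print([phred_to_error(q) for q in scores])
```

Older Illumina pipelines used an offset of 64 instead of 33, so the offset should be confirmed for the data at hand before decoding.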
RNA-sequencing revealed apple pomace ameliorates expression of genes in the hypothalamus associated with neurodegeneration in female rats fed a Western diet during adolescence to adulthood
Published in Nutritional Neuroscience, 2023
Ayad A. Alawadi, Vagner A. Benedito, R. Chris Skinner, Derek C. Warren, Casey Showman, Janet C. Tou
All sample paired-read FASTQ files were quality checked using FastQC (version 0.11.5). This step showed that the sequencing files had an average per-base Phred score > 35, which indicates high-quality base calls. The VICE DESeq2 pipeline was used to analyze gene expression through the CyVerse Discovery Environment (de.cyverse.org) [16]. RMTA v2.6.3 and RStudio-DESeq2 were used to determine gene expression according to the pipeline workflow. The Rattus norvegicus (GCF_000001895.5/Rnor_6.0) genome assembly was used as a reference [17]. The gene annotation (Rattus norvegicus) Rnor_6.0.99.gtf.gz was obtained from the Ensembl website (useast.ensembl.org) and used to annotate the genome assembly. HISAT2 was used to map sample reads to the genome assembly, according to the pipeline, producing feature counts [16]. Data characterization was performed using principal component analysis (PCA) in pcaExplorer [18]. Gene expression profiles for each dietary treatment group were clustered using Pearson correlation coefficients. The proportions of total variance explained by the first principal component (PC1) and second principal component (PC2) were 24.32% and 16.06%, respectively.
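An average per-base Phred score of the kind reported above can be approximated with a short sketch that averages the decoded scores at each read position. This is not FastQC's implementation (FastQC reports a distribution per position, not only a mean). It assumes Phred+33 encoding, and the function name and example quality strings are hypothetical.

```python
def per_position_mean(quality_strings, offset=33):
    """Mean Phred score at each read position across all quality strings.

    Reads may differ in length; each position is averaged over the reads
    that are long enough to cover it.
    """
    totals = []
    counts = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):        # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Hypothetical quality lines (Line 4 of three FASTQ records).
quals = ["IIII", "II5", "I5"]
means = per_position_mean(quals)
print(means)
print(all(m > 35 for m in means))  # does every position exceed Q35?
```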
Females with impaired ovarian function could be vulnerable to environmental pollutants: identification via next-generation sequencing of the vaginal microbiome
Published in Journal of Obstetrics and Gynaecology, 2022
Seongmin Kim, Se Hee Lee, Kyung Jin Min, Sanghoon Lee, Jin Hwa Hong, Jae Yun Song, Jae Kwan Lee, Nak Woo Lee, Eunil Lee
A total of 937 species were identified in the analysis of the vaginal microbiome, and 24 dominant species were selected for further analysis. The samples had 56,338–147,588 read counts, and >89.06% had a PHRED (Q) score above 30, indicating a 1:1000 probability of an incorrect base call. Taxonomic abundance showed that lactobacilli and enterococci were the main taxa in the normal group, but that the species tended to be more heterogeneous in the IO group (Figure 1). A heat map was drawn using the taxonomy information, which showed that the two groups were not different from each other (Figure 2). The proportions of each species were compared between the two groups, and only two species (Propionibacterium acnes and Prevotella copri) were significantly more abundant in the POI group (p = .005 and p = .002, respectively).
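The share of base calls at or above Q30 is a common run-level summary and can be sketched as follows. The excerpt says "above 30" while Q30 statistics are conventionally computed as Q ≥ 30, so the inclusive comparison here is an assumption, as is the Phred+33 encoding. The function name and example data are hypothetical.

```python
def fraction_at_least(quality_strings, threshold=30, offset=33):
    """Fraction of all base calls whose Phred score is >= threshold."""
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# Hypothetical quality lines: 'I' = Q40, '?' = Q30, '5' = Q20.
quals = ["II5?"]
print(fraction_at_least(quals))              # share of bases at Q30 or better
print(10 ** (-30 / 10))                      # error probability at Q30
```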
Determining the accuracy of next generation sequencing based copy number variation analysis in Hereditary Breast and Ovarian Cancer
Published in Expert Review of Molecular Diagnostics, 2022
Nihat Bugra Agaoglu, Busra Unal, Ozlem Akgun Dogan, Payam Zolfagharian, Pari Sharifli, Aylin Karakurt, Burak Can Senay, Tugba Kizilboga, Jale Yildiz, Gizem Dinler Doganay, Levent Doganay
The NGS raw data generated by Illumina MiSeq and NextSeq 500 are in FASTQ format, which contains a quality score for each base. All samples were analyzed in a single workflow for SNVs, INDELs, and CNVs with the Sophia Genetics Data Driven Medicine (DDM) platform. FASTQ DNA sequence files, with a Phred Quality Score of 30 (Q30), were automatically uploaded and immediately processed by specific algorithms and machine learning approaches. The sequences were mapped to the hg19 human reference genome, and CNV regions were then evaluated by the Sophia Genetics MUSKAT algorithm. The CNVs were identified by measuring the coverage levels of the desired regions along with samples in the same run. The CNV attributions were also classified by high or medium confidence level, high being less than 50 mapped reads, and samples that do not achieve this quality level are considered rejected analyses.