Explore chapters and articles related to this topic
Next-Generation Sequencing (NGS) for Companion Diagnostics (CDx) and Precision Medicine
Published in Il-Jin Kim, Companion Diagnostics (CDx) in Precision Medicine, 2019
Il-Jin Kim, Mendez Pedro, David Jablons
Sample preparation for MinION is inexpensive because it dispenses with fluorophores, polymerases, and ligases.24 DNA is first fragmented and then ligated to two adaptors, forming a leader-hairpin complex (Fig. 5.3).18 The 1D and 2D reads then pass sequentially through the nanopore, through which a current flows.18 When DNA moves into the pore with the help of a motor protein, a voltage shift occurs and is recorded as a k-mer sequence (Fig. 5.3).25 Because no library amplification is required and 2–4 TB of sequencing per two-day run is expected, MinION offers very high throughput and fast sequencing.18
Quality Check, Processing and Alignment of High-throughput Sequencing Reads
Published in Altuna Akalin, Computational Genomics with R, 2020
After the quality check and potential pre-processing, the reads are ready to be mapped or aligned to the reference genome. This process simply finds the most probable origin of each read in the genome. Since there might be errors in sequencing and mutations in the genomes, we may not find exact matches of reads in the genome. An important feature of alignment algorithms is therefore that they tolerate potential mismatches between reads and the reference genome. In addition, efficient algorithms and data structures are needed for the alignment to be completed in a reasonable amount of time. Alignment methods usually create data structures to store and efficiently search the genome for matching reads. These data structures are called genome indices, and creating them is the first step of read alignment. Based on how indices are created, there are two major types of methods. One class of methods relies on “hash tables” to store and search the genomes. Hash tables are simple lookup tables in which all possible k-mers point to locations in the genome. The general idea is that overlapping k-mers constructed from a read go through this lookup table. Each k-mer points to potential locations in the genome. Then, the final location for the read is obtained by optimizing the k-mer chain by their distances in the genome and in the read. This optimization process removes k-mer locations that are distant from the other k-mers, which map near each other.
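The hash-table idea described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular aligner: the function names `build_index` and `map_read`, the toy genome, and the simple voting rule (each k-mer hit votes for the read start position it implies, standing in for the k-mer chain optimization) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_index(genome, k):
    """Hash-table index: every k-mer maps to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_read(read, index, k):
    """Return (implied_start, supporting_kmers) for the best location, or None.

    Each overlapping k-mer of the read is looked up in the index. A hit at
    genome position `pos` for the k-mer at read offset `offset` implies the
    read starts at `pos - offset`. Hits that agree reinforce each other,
    while isolated hits far from the others collect few votes.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


# Toy data (hypothetical): the read is genome[9:23] with one mismatch.
genome = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCATGA"
index = build_index(genome, k=5)
read = "GCCGATAGGTAGGC"
print(map_read(read, index, k=5))  # (9, 5)
```

In this toy case the single mismatch destroys five of the ten overlapping 5-mers, but the remaining five still agree on start position 9, which shows how seeding with short k-mers tolerates mismatches.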
Detection of Klebsiella pneumoniae human gut carriage: a comparison of culture, qPCR, and whole metagenomic sequencing methods
Published in Gut Microbes, 2022
Kenneth Lindstedt, Dorota Buczek, Torunn Pedersen, Erik Hjerde, Niclas Raffelsberger, Yutaka Suzuki, Sylvain Brisse, Kathryn Holt, Ørjan Samuelsen, Arnfinn Sundsfjord
Kp strain analysis was performed using StrainGST, part of the Strain Genome Explorer (StrainGE) toolkit.37 A custom database of KpSC genomes (n = 3604) was constructed with the default k-mer size of 23. The database consisted of i) all Kp genomes from RefSeq (NCBI) (n = 1010), downloaded on 02/02/2022 using the NCBI Genome Downloading Scripts (https://github.com/kblin/ncbi-genome-download), ii) 484 KpSC genomes from our KpSC carriage study (303 Kp1, 134 Kp3, 31 Kp2, and 16 Kp4 genomes), and iii) 2109 KpSC genomes from the recent SPARK study (1705 Kp1, 279 Kp3, 76 Kp2, and 49 Kp4 genomes).20,74 With the default lower limit for database clustering of 0.90 k-mer similarity, closely related sequence types (STs) co-clustered (e.g., ST11 and ST258); a lower limit of 0.95 was therefore used for final database clustering.
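The effect of the clustering threshold can be illustrated with a small Python sketch. The excerpt does not state how StrainGST computes k-mer similarity or performs clustering, so the Jaccard index and the greedy single-pass clustering below are assumptions, as are the names `kmer_set`, `jaccard`, and `cluster` and the toy sequences (which use k = 3 rather than 23 to stay readable).

```python
def kmer_set(seq, k=23):
    """Set of all distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets: shared k-mers / all k-mers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(genomes, threshold, k=23):
    """Greedy clustering: a genome joins the first cluster whose
    representative it matches at or above the threshold, otherwise it
    starts a new cluster and becomes its representative."""
    reps = []      # (representative name, its k-mer set)
    clusters = {}  # representative name -> member names
    for name, seq in genomes.items():
        kmers = kmer_set(seq, k)
        for rep_name, rep_kmers in reps:
            if jaccard(kmers, rep_kmers) >= threshold:
                clusters[rep_name].append(name)
                break
        else:
            reps.append((name, kmers))
            clusters[name] = [name]
    return clusters


# Toy genomes (hypothetical) sharing 3 of 5 distinct 3-mers: similarity 0.6.
genomes = {"g1": "ACGTAC", "g2": "ACGTAG"}
print(cluster(genomes, threshold=0.5, k=3))  # {'g1': ['g1', 'g2']}
print(cluster(genomes, threshold=0.7, k=3))  # {'g1': ['g1'], 'g2': ['g2']}
```

Raising the threshold splits genomes that a looser threshold would merge, which mirrors the reason given above for moving from 0.90 to 0.95.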
Novel technologies to characterize and engineer the microbiome in inflammatory bowel disease
Published in Gut Microbes, 2022
Alba Boix-Amorós, Hilary Monaco, Elisa Sambataro, Jose C. Clemente
An important caveat of these studies is that multiple software pipelines exist for identifying bacterial taxa at high resolution from metagenomic data,129 and the interpretation of these results depends significantly on the specific tool used.42,130,131 While some tools construct k-mers and assemble them using de Bruijn graphs,132 others assign reads to specific genomes using a set of marker genes.39–41 While computationally less expensive, this marker-based approach depends strongly on the reference database used. Methods such as MetaPhlAn have often been used in IBD studies,15,46–49 and although MetaPhlAn was recently expanded with additional genomes,39 it remains to be determined to what extent these results are limited by existing genome collections. Recently developed methods based on strain collections42 have greater potential to identify strains with high confidence and resolution, although they are also limited by the size of the collection being used.
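For readers unfamiliar with the k-mer approach mentioned above, the following Python sketch shows a de Bruijn graph in its simplest form. It is a toy illustration under stated assumptions, not the method of any tool cited here: the names `de_bruijn` and `walk` and the example reads are hypothetical, edge multiplicities are ignored, and real assemblers additionally handle sequencing errors, branches, and reverse complements.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and every k-mer in a
    read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk(graph, start):
    """Spell out a contig by following edges while the path is unambiguous
    (exactly one outgoing edge) and no node is revisited."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads (hypothetical) from one short sequence.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = de_bruijn(reads, k=4)
print(walk(graph, "ATG"))  # ATGGCGTGCA
```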
Improved gut microbiome recovery following drug therapy is linked to abundance and replication of probiotic strains
Published in Gut Microbes, 2022
Jamie FitzGerald, Shriram Patel, Julia Eckenberger, Eric Guillemard, Patrick Veiga, Florent Schäfer, Jens Walter, Marcus J Claesson, Muriel Derrien
After trimming or fully removing DNA sequences (“reads”) below a quality threshold of Q = 26, and purging sequences considered “contaminant,” samples were reduced by approximately 6% to an average of 61 million reads (±27%), giving a total of 33.5 billion reads across all samples. Kraken 2 (version 2.0.8) assigned taxonomy to reads through comparison of sequence k-mer frequencies using the Ecogenomics GT Database (release 89); the abundances of identified taxa were then estimated using Bracken.51 Separately, the HUMAnN2 pipeline (version 2.8.1)52 aligned reads to pre-computed reference databases (full chocophlan database plus viral sequences, v0.1.1; uniref90 database v. 1.1), and then compiled the tallied gene abundances to determine whether pathways were fully represented within samples (coverage of MetaCyc metabolic pathways and superpathways),53 which microbes contributed those pathways (pathway source organism), and at what frequencies those pathways were present (MetaCyc metabolic pathway abundance). The set of pathway features was then filtered to include only those with an assigned function for further analysis.
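The quality-filtering step can be sketched in Python as follows. The excerpt does not name the trimming tool or its algorithm, so this is only one plausible reading: it assumes Phred+33 quality encoding, trims low-quality bases from the 3' end only, and discards reads that end up shorter than a hypothetical `min_len`. The function names are likewise illustrative.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def trim_read(seq, quality, min_q=26, min_len=5):
    """Trim 3' bases with quality below min_q; drop reads that get too short.

    Returns (trimmed_seq, trimmed_quality), or None if the read is removed.
    min_len is a hypothetical parameter, not taken from the study.
    """
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quality[:end]


# 'I' encodes Q40, ';' Q26, '5' Q20, and '#' Q2 in Phred+33.
print(trim_read("ACGTACGT", "IIIII;5#"))  # ('ACGTAC', 'IIIII;')
print(trim_read("ACGT", "####"))          # None
```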