Explore chapters and articles related to this topic
Genetic Basis of Neuromuscular Disorders
Published in Maher Kurdi, Neuromuscular Pathology Made Easy, 2021
The classical RNA-Seq pipeline begins by generating FASTQ-format files that contain the sequencing reads from the NGS platform. However, mapping RNA-Seq reads to the genome is considerably more challenging than mapping DNA sequencing reads because many reads span splice junctions.
Quality Check, Processing and Alignment of High-throughput Sequencing Reads
Published in Altuna Akalin, Computational Genomics with R, 2020
An extension of the FASTA format is the FASTQ format. This format is designed to handle the base quality metrics output by sequencing machines. In this format, each base and its quality score are each represented by a single ASCII character. The format uses four lines for each sequence, and these four-line blocks are stacked on top of each other in the text files output by sequencing workflows. Each block of four lines represents one read. Figure 7.2 shows those four lines with brief explanations for each line.
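The four-line layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for an established parser: the names `FastqRecord`, `parse_fastq` and `phred_scores` are hypothetical, the example reads are made up, and the quality offset of 33 (the Phred+33 encoding used by current Illumina output) is an assumption about the input.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # header line without the leading "@"
    sequence: str  # base calls
    quality: str   # one ASCII character per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per block of four lines."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].strip(), sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode each quality character into an integer Phred score."""
    return [ord(char) - offset for char in quality]


# Two made-up reads, stacked as consecutive four-line blocks.
example = io.StringIO("@read1\nGATTA\n+\nII5!#\n@read2\nCC\n+\n??\n")
records = list(parse_fastq(example))
print(records[0].read_id, phred_scores(records[0].quality))
# prints: read1 [40, 40, 20, 0, 2]
```

Reading line by line keeps memory use flat, which matters because real FASTQ files routinely hold millions of reads.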
Molecular Diagnosis of Autosomal Dominant Polycystic Kidney Disease
Published in Jinghua Hu, Yong Yu, Polycystic Kidney Disease, 2019
Matthew Lanktree, Amirreza Haghighi, Xueweng Song, York Pei
The Illumina-provided software bcl2fastq (v2.20) is used to convert the per-cycle BCL base call files to standard sequencing output in FASTQ format. Reads are aligned to the reference human genome (GRCh37) using BWA mem (v0.7.12). Duplicate reads are marked using Picard Tools (v2.5.0). Indel realignment, base quality score recalibration and germline variant detection with HaplotypeCaller are performed using GATK 3.7 following recommended best practices. Variants are annotated using an ANNOVAR-based pipeline. ERDS (v1.1) and CNVnator (v0.3.2) call CNVs, and a custom annotation and prioritization pipeline is used to define rare CNVs [35]. Manta (v0.29.6) is used to call structural variants.
Macrophage polarization induces endothelium-to-myofibroblast transition in chronic allograft dysfunction
Published in Renal Failure, 2023
Zeping Gui, Xiang Zhang, Qianguang Han, Zhou Hang, Ruoyun Tan, Min Gu, Zijie Wang
Raw data (raw reads) in FASTQ format were first processed through in-house Perl scripts. An index of the reference genome was built, and paired-end clean reads were aligned to it. The number of reads mapped to each gene was counted using featureCounts v1.5.0-p3. Subsequently, the FPKM of each gene was calculated based on the length of the gene and the read count mapped to it. Differential expression analysis of the M1 and M0 groups was performed using the DESeq2 R package (1.20.0). The resulting P-values were adjusted using the Benjamini and Hochberg approach. Genes with an adjusted P-value less than 0.05 were considered differentially expressed. The clusterProfiler R package was used to test the statistical enrichment of differentially expressed genes (DEGs) in KEGG pathways.
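The FPKM step mentioned above can be illustrated with a short Python sketch. It uses the standard definition, FPKM = count × 10^9 / (gene length in bp × total mapped fragments), and assumes that the total is the sum of the counts assigned to genes; the excerpt does not say how the authors defined the total, and the function name `fpkm` and the example values are hypothetical.

```python
from typing import Dict


def fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Fragments per kilobase of gene per million mapped fragments.

    Assumption: the library size is the sum of the per-gene counts.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped fragments")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }


# Made-up example: geneB is three times longer and collects three times
# as many fragments, so both genes end up with the same FPKM.
counts = {"geneA": 500, "geneB": 1500}
lengths_bp = {"geneA": 1000, "geneB": 3000}
values = fpkm(counts, lengths_bp)
print(values)
```

The example shows why length normalization matters: raw counts alone would suggest geneB is expressed three times more strongly than geneA.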
Visual Observation of Abdominal Adhesion Progression Based on an Optimized Mouse Model of Postoperative Abdominal Adhesions
Published in Journal of Investigative Surgery, 2023
Zijun Wang, Enmeng Li, Cancan Zhou, Bolun Qu, Tianli Shen, Jie Lian, Gan Li, Yiwei Ren, Yunhua Wu, Qinhong Xu, Guangbing Wei, Xuqi Li
Sequencing libraries were generated using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer’s recommendations. Raw data (raw reads) in FASTQ format were first processed by in-house Perl scripts. High-quality clean data were used for all downstream analyses. The mapped reads of each sample were assembled by StringTie (v1.3.3b) with a reference-based approach. The number of reads mapped to each gene was counted by featureCounts. The fragments per kilobase per million mapped fragments (FPKM) of each gene was then calculated from the length of the gene and the read count mapped to it. Two conditions (four biological replicates per condition) were used for differential expression analysis with the DESeq2 R package (1.16.1). The resulting P values were adjusted using the Benjamini–Hochberg method to control the false discovery rate. Genes with an adjusted P value <0.05 in DESeq2 were considered differentially expressed. Differentially expressed genes (DEGs) were subjected to Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using the clusterProfiler R package, with correction for gene length bias. For the DEGs, protein–protein interaction network (PPIN) analysis was performed based on the STRING database. The cytoHubba plugin was used for clustering coefficient analysis to predict the key targets and subnetworks that were more closely related in the PPIN.
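The Benjamini–Hochberg adjustment referred to above can be sketched as follows. This is a plain-Python illustration of the standard step-up procedure, not the code DESeq2 runs internally; the function name `benjamini_hochberg` and the P values are made up for the example.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted P values in the same order as the input."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling each by m / rank and
    # enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted)
print(significant)
```

In this example four raw P values are below 0.05, but only the two smallest remain below the threshold after adjustment, which is the effect the 0.05 cutoff on adjusted values is meant to capture.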
Determining the accuracy of next generation sequencing based copy number variation analysis in Hereditary Breast and Ovarian Cancer
Published in Expert Review of Molecular Diagnostics, 2022
Nihat Bugra Agaoglu, Busra Unal, Ozlem Akgun Dogan, Payam Zolfagharian, Pari Sharifli, Aylin Karakurt, Burak Can Senay, Tugba Kizilboga, Jale Yildiz, Gizem Dinler Doganay, Levent Doganay
The NGS raw data generated by the Illumina MiSeq and NextSeq 500 are in FASTQ format, which contains a quality score for each base. All samples were analyzed in a single workflow for SNVs, INDELs, and CNVs with the Sophia Genetics Data Driven Medicine (DDM) platform. FASTQ DNA sequence files, with a Phred Quality Score of 30 (Q30), were automatically uploaded and immediately processed by specific algorithms and machine learning approaches. The sequences were mapped to the hg19 human reference genome, and CNV regions were then evaluated by the Sophia Genetics MUSKAT algorithm. The CNVs were identified by measuring the coverage levels of the desired regions along with samples in the same run. The CNV calls were also classified as high or medium confidence, high being less than 50 mapped reads, and samples that did not achieve this quality level were considered rejected analyses.
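For context on the Q30 figure, a Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q30 means roughly one error in 1,000 bases. The sketch below shows how the share of bases at or above Q30 could be computed from a FASTQ quality string. It assumes Phred+33 encoding; the function names and the example string are hypothetical, and the code does not reflect how the Sophia DDM platform evaluates quality.

```python
def error_probability(phred: float) -> float:
    """Estimated probability that a base call with this Phred score is wrong."""
    return 10 ** (-phred / 10)


def fraction_at_least(quality: str, threshold: int = 30, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is at or above the threshold."""
    if not quality:
        raise ValueError("empty quality string")
    scores = [ord(char) - offset for char in quality]
    return sum(score >= threshold for score in scores) / len(scores)


# Made-up quality string: four bases at Q40 ("I"), one at Q20 ("5"),
# and one at exactly Q30 ("?").
quality_line = "IIII5?"
q30_fraction = fraction_at_least(quality_line)
print(error_probability(30), q30_fraction)
```

Here five of the six bases reach Q30, so the fraction is 5/6.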