Multiple sequence alignment – Knowledge and References

An Efficient Protein Structure Prediction Using Genetic Algorithm

Published in Abdel-Badeeh M. Salem, Innovative Smart Healthcare and Bio-Medical Systems, 2020

Mohamad Yousef, Tamer Abdelkader, Khaled El-Bahnasy

HHpred is a web-based server for protein homology detection and structure prediction. Its input can be a single amino acid sequence or a multiple sequence alignment. It uses a novel approach based on pairwise alignment of profile hidden Markov models (HMMs), and it searches a variety of databases such as SCOP (Structural Classification of Proteins), Pfam, and PDB (Protein Data Bank). HHpred is fast and performs well for both single-domain and multi-domain query sequences, and it can be used to predict functional information about a protein from its homologs using various sequence-based search tools such as BLAST, FASTA, or PSI-BLAST [11, 12, 13].
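As a rough illustration of what a "profile" built from a multiple sequence alignment looks like, the sketch below computes per-column residue probabilities with a uniform pseudocount. It is not HHpred's method: a real profile HMM also models insertions, deletions, and transition probabilities, and HHpred aligns two such HMMs against each other. The function name `column_profile`, the pseudocount value, and the toy alignment are assumptions made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment, pseudocount=1.0):
    """Return one {residue: probability} dict per alignment column.

    Gaps and non-standard symbols are ignored; a uniform pseudocount
    (an illustrative choice) keeps unseen residues above zero.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


# Hypothetical three-sequence alignment; "-" marks a gap.
toy_alignment = ["MKV-L", "MKVAL", "MRVAL"]
profile = column_profile(toy_alignment)
```

Each entry of `profile` is a probability distribution over the 20 standard amino acids for one column, which is the kind of position-specific information that profile-based methods compare instead of raw sequences.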

Disease Prediction and Drug Development

Published in Arvind Kumar Bansal, Javed Iqbal Khan, S. Kaisar Alam, Introduction to Computational Health Informatics, 2019

Arvind Kumar Bansal, Javed Iqbal Khan, S. Kaisar Alam

For example, SIFT is a tool based on multiple-sequence alignment that identifies variations in conserved regions. MAPP is a multiple-sequence alignment tool that identifies aberrations based on variation in the biochemical properties of conserved domains. PANTHER uses multiple-sequence alignment and HMM-based statistical analysis to derive the variation in the conservation score. Parepro uses a support vector machine (SVM) to study the detrimental effect of a base-pair change on the overall functionality of a protein. PhD-SNP also uses an SVM-based classifier to predict whether a change in a local base pair will have a deleterious effect on protein function. PolyPhen combines sequence-based analysis with 3D-structure matching of a protein sequence.
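The idea of scoring how conserved each alignment column is can be sketched with Shannon entropy. This is a generic illustration only and is not the scoring scheme of SIFT, MAPP, or PANTHER; the function name `column_conservation`, the normalisation by log2(20), and the toy alignment are assumptions made for this example.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each column from 0 (maximally variable) to 1 (fully conserved).

    The score is 1 minus the column's Shannon entropy divided by log2(20),
    the maximum entropy for 20 amino acids. Gaps ("-") are ignored.
    """
    length = len(alignment[0])
    max_entropy = math.log2(20)
    scores = []
    for col in range(length):
        residues = [row[col] for row in alignment if row[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column: treated as uninformative
            continue
        counts = Counter(residues)
        n = len(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical alignment: columns 1 and 2 are invariant, 3 and 4 vary.
toy_alignment = ["ACDE", "ACDF", "ACGF", "ACHW"]
scores = column_conservation(toy_alignment)
```

A substitution at a column with a score near 1 is more likely to matter than one at a variable column, which is the intuition behind the conservation-based predictors listed above.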

Mechanisms of Fibril Formation and Cellular Response

Published in Martha Skinner, John L. Berk, Lawreen H. Connors, David C. Seldin, XIth International Symposium on Amyloidosis, 2007

Martha Skinner, John L. Berk, Lawreen H. Connors, David C. Seldin

Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22 (22): 4673-80.

Gut microbiota diversity but not composition is related to saliva cortisol stress response at the age of 2.5 months

Published in Stress, 2021

Anniina Keskitalo, Anna-Katariina Aatsinki, Susanna Kortesluoma, Juho Pelto, Laura Korhonen, Leo Lahti, Minna Lukkarinen, Eveliina Munukka, Hasse Karlsson, Linnea Karlsson

The raw sequences from the 16S rRNA gene sequencing were preprocessed with the DADA2 pipeline (version 1.14) to infer exact amplicon sequence variants (ASVs) (Callahan et al., 2016). The reads were truncated to a length of 225, and reads with more than two expected errors were discarded (maxEE = 2). The SILVA taxonomy database (version 138) (McLaren, 2020; Quast et al., 2013; Yilmaz et al., 2014) and the RDP Naive Bayesian Classifier algorithm (Wang et al., 2007) were used for the taxonomic assignment of the ASVs. The package DECIPHER (Wright, 2016) was used for multiple sequence alignment. The phylogenetic tree was constructed with the package phangorn (Schliep, 2011) using a maximum likelihood approach (Felsenstein, 1981) and a general time reversible model. The resulting samples had between 14k and 255k reads per sample (mean = 83k, sd = 57k).
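The truncation and expected-error filter described above can be sketched in a few lines. The expected number of errors in a read is the sum of the per-base error probabilities implied by its Phred quality scores, 10^(-Q/10). This is a minimal illustration of that rule, not the DADA2 implementation (which is an R package); the function names and the toy reads are assumptions made for this example.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def truncate_and_filter(reads, trunc_len=225, max_ee=2.0):
    """Truncate reads to trunc_len, then drop reads exceeding max_ee.

    `reads` is a list of (sequence, phred_scores) pairs. Reads shorter
    than trunc_len are discarded, mirroring the documented behaviour of
    DADA2's truncLen option.
    """
    kept = []
    for seq, quals in reads:
        if len(seq) < trunc_len:
            continue
        seq, quals = seq[:trunc_len], quals[:trunc_len]
        if expected_errors(quals) <= max_ee:
            kept.append((seq, quals))
    return kept


# Hypothetical reads: high quality, low quality, and too short.
toy_reads = [
    ("A" * 250, [30] * 250),  # 225 bases at Q30: about 0.225 expected errors
    ("A" * 250, [10] * 250),  # 225 bases at Q10: about 22.5 expected errors
    ("A" * 100, [40] * 100),  # shorter than the truncation length
]
kept = truncate_and_filter(toy_reads)
```

Only the first toy read survives: the second exceeds the expected-error limit after truncation, and the third is shorter than the truncation length.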

Distinct microbial communities colonize tonsillar squamous cell carcinoma

Published in OncoImmunology, 2021

Angelina De Martin, Mechthild Lütge, Yves Stanossek, Céline Engetschwiler, Jovana Cupovic, Kirsty Brown, Izadora Demmer, Martina A. Broglie, Markus B. Geuking, Wolfram Jochum, Kathy D. McCoy, Sandro J. Stoeckli, Burkhard Ludewig

Cutadapt was used for quality trimming and removal of the primer and adaptor sequences [47]. Further processing and analysis were run in R version 3.6.1. Trimmed reads were filtered and truncated based on quality scores and read length using the R/Bioconductor package dada2 v1.14.0 [48], with 230 bp as the length for forward reads and 210 bp for reverse reads. After dereplication, error rates were estimated based on a parametric error model as implemented in the dada2 package, and sequence variants were generated using the ‘dada’ function. Chimeric reads were removed using the ‘removeBimeraDenovo’ function. Finally, taxonomy was assigned based on the RDP trainset 16/release 11.5 [49] and the HOMD v.15.1 [24] database by applying the ‘assignTaxonomy’ function from the dada2 package, which uses the RDP Naive Bayesian Classifier described by Wang et al. [50], with kmer size = 8 and 100 bootstrap replicates. To further infer phylogenetic relationships, the R/Bioconductor packages msa v.1.18.0 [51] and phangorn v.2.5.5 [52] were used to build multiple sequence alignments and construct a phylogenetic tree. Before downstream analysis, samples were rarefied to an even sampling depth of 500 reads using the R/Bioconductor package phyloseq v.1.32.0 [53]. Samples with fewer than 500 reads were removed from further analysis, resulting in a final dataset of 192 samples. The low read counts compared to typical fecal samples were expected, as the DNA was isolated from tissue biopsies.
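Rarefying to an even depth means randomly subsampling every sample's reads down to the same total and dropping samples that fall below it. The sketch below shows the idea in Python; it is not the phyloseq implementation. Sampling without replacement, the fixed seed, and the names `rarefy`, `rarefy_samples`, and the toy counts are all assumptions made for this example.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample a {taxon: count} table to exactly `depth` reads.

    Reads are drawn without replacement (an assumption of this sketch).
    Returns None when the sample holds fewer than `depth` reads.
    """
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rarefied = {}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


def rarefy_samples(samples, depth=500, seed=0):
    """Rarefy every sample and drop those below the target depth."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    result = {}
    for name, counts in samples.items():
        rarefied = rarefy(counts, depth, rng)
        if rarefied is not None:
            result[name] = rarefied
    return result


# Hypothetical count tables for two biopsies.
samples = {
    "biopsy_a": {"asv1": 400, "asv2": 300},  # 700 reads: kept and subsampled
    "biopsy_b": {"asv1": 120, "asv3": 80},   # 200 reads: dropped
}
rarefied = rarefy_samples(samples)
```

After rarefaction every remaining sample has the same total read count, so diversity measures are not driven by differences in sequencing depth.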

Application of bioinformatics and molecular dynamics simulation approaches for identification of fibroblast growth factor 10 analogues with potentially improved thermostability

Published in Growth Factors, 2020

Ali Akbar Alizadeh, Behzad Jafari, Siavoush Dastmalchi

Multiple sequence alignment was conducted as described elsewhere, with some minor changes (Dvořák et al. 2017). The FGF10 protein sequence (UniProt ID: O15520-1) was used as a query for a position-specific iterated BLAST (PSI-BLAST) search against the non-redundant protein sequences (nr) database of NCBI. PSI-BLAST was performed with E-value thresholds of 10⁻¹ for the initial BLAST search and 10⁻⁵ for inclusion of a sequence in the position-specific matrix. After 3 iterations of PSI-BLAST, the sequences were collected and clustered with the USEARCH software (Edgar 2010) at the 90% identity threshold. From the clustered sequences, two of the datasets that contained protein sequences of the FGF family were aligned with the MUSCLE program (Edgar 2004). The alignment was manually refined, and all incomplete or divergent sequences were removed using the BioEdit program (Hall 1999). The final alignment dataset, containing 81 sequences, was used to estimate the level of conservation of individual sites within the FGF10-related proteins. The empirical Bayesian method and the Whelan and Goldman (WAG) model of evolution were used to calculate the relative evolutionary rates for individual positions with the Rate4Site program (Pupko et al. 2002). ConSurf software was then used to convert the evolutionary rates to a conservation scale (Ashkenazy et al. 2016). Chimera software (Pettersen et al. 2004) was used to visualise the protein coloured by ConSurf scores.
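Clustering at a 90% identity threshold can be illustrated with a simple greedy centroid scheme: each sequence joins the first existing cluster whose representative it matches at or above the threshold, and otherwise starts a new cluster. This is a simplified sketch, not the USEARCH algorithm, which uses fast heuristic search and proper pairwise alignment. Here identity is computed position by position on equal-length sequences, which is an assumption of the example, as are the function names and the toy sequences.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold=0.90):
    """Greedy centroid clustering.

    Sequences are processed in input order. Each one is added to the
    first cluster whose centroid it matches at >= threshold identity;
    otherwise it becomes the centroid of a new cluster.
    """
    clusters = []  # list of (centroid, members) pairs
    for seq in sequences:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Hypothetical 10-residue sequences.
toy_sequences = [
    "ACDEFGHIKL",
    "ACDEFGHIKV",  # 9/10 identical to the first: joins its cluster
    "ACDEFGHAAA",  # 7/10 identical to the first: new cluster
    "MMMMMMMMMM",  # unrelated: new cluster
]
clusters = greedy_cluster(toy_sequences)
```

Keeping one representative per cluster reduces redundancy among near-identical homologs before alignment, so that over-represented subfamilies do not dominate the conservation estimates.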