Explore chapters and articles related to this topic
Bioinformatics Tools and Software in Clinical Research
Published in Rishabha Malviya, Pramod Kumar Sharma, Sonali Sundram, Rajesh Kumar Dhanaraj, Balamurugan Balusamy, Bioinformatics Tools and Big Data Analytics for Patient Care, 2023
Deepika Bairagee, Nitu Singh, Neelam Jain, Urvashi Sharma
Sean Eddy created HMMER, a free and widely used software suite for sequence analysis. Its primary use is to identify homologous protein or nucleotide sequences and to perform sequence alignments. It detects homology by comparing a profile-HMM with either a single sequence or a sequence database. Sequences that score significantly higher against the profile-HMM than against a null model are considered homologous to the sequences used to build the profile-HMM [27]. Within the HMMER package, the hmmbuild program creates profile-HMMs from a multiple sequence alignment. HMMER's profile-HMM implementation is based on the work of Krogh and colleagues. HMMER is a command-line tool that has been ported to Linux, Windows, and macOS. Pfam and InterPro are two well-known protein databases that rely heavily on HMMER. Several other bioinformatics tools, including UGENE [27,28], also use HMMER.
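To make the comparison against a null model concrete, the following is a minimal sketch of log-odds scoring, assuming a toy profile with match states only. The emission probabilities, the uniform null model, and the function name `log_odds_bits` are illustrative assumptions. A real HMMER profile-HMM also has insert and delete states and is scored with dynamic programming rather than this position-by-position sum.

```python
import math

# Hypothetical 3-column profile: per-position emission probabilities.
# Toy values for illustration, not taken from any real Pfam model.
PROFILE = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# Assumed null model: every residue is equally likely at every position.
NULL = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_bits(seq, profile=PROFILE, null=NULL):
    """Sum of log2(P(residue | profile column) / P(residue | null model))."""
    if len(seq) != len(profile):
        raise ValueError("toy model has match states only; lengths must agree")
    return sum(math.log2(col[res] / null[res]) for col, res in zip(profile, seq))


if __name__ == "__main__":
    for candidate in ("ACG", "TTT"):
        print(candidate, round(log_odds_bits(candidate), 2))
```

A sequence that matches the profile's preferred residues scores above zero, while one that the null model explains better scores below zero. This is the intuition behind treating high-scoring sequences as homologous.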
Unravelling the Oil and Gas Microbiome Using Metagenomics
Published in Kenneth Wunch, Marko Stipaničev, Max Frenzel, Microbial Bioinformatics in the Oil and Gas Industry, 2021
Assembled metagenomic data (i.e. contigs and MAGs) can subsequently be taken forward for annotation. Metagenome annotation refers to the identification of genomic features such as protein coding genes (CDS) as well as ribosomal RNA (rRNA) and transfer RNA (tRNA) genes on the assembled metagenomic data. Protein coding genes can be predicted with tools like Prodigal (Hyatt et al., 2010), whereas barrnap (Seemann, 2013) and RNAmmer (Lagesen et al., 2007) can identify rRNA genes. For the identification of tRNA genes, tRNAscan-SE (Chan and Lowe, 2019) or ARAGORN (Laslett and Canback, 2004) can be used. Functional annotation of the predicted protein coding genes can be performed against databases such as KEGG, UniProt, NCBI, Pfam, and InterPro, from which functions can be inferred based on orthologous proteins and on functional motifs and domains. Standalone tools such as Prokka (Seemann, 2014), DFAST (Tanizawa et al., 2018) and MetaErg (Dong and Strous, 2019) combine the gene finding and functional annotation processes described above. In addition, web-based bioinformatics pipelines like IMG/M (Chen et al., 2019) and MG-RAST (Meyer et al., 2008) can provide gene finding and functional annotation capability as well as other advanced metagenome analysis tools.
An Efficient Protein Structure Prediction Using Genetic Algorithm
Published in Abdel-Badeeh M. Salem, Innovative Smart Healthcare and Bio-Medical Systems, 2020
Mohamad Yousef, Tamer Abdelkader, Khaled El-Bahnasy
HHpred is a web-based server for protein homology detection and structure prediction. Its input can be an amino acid sequence or a multiple sequence alignment. It uses a novel approach based on pairwise alignment of profile hidden Markov models (HMMs) and searches a variety of databases such as SCOP (Structural Classification of Proteins), Pfam, and PDB (Protein Data Bank). HHpred is fast and performs well for both single-domain and multi-domain query sequences. Like sequence-based search tools such as BLAST, FASTA, or PSI-BLAST, it can be used to infer functional information about a protein from its homologs [11, 12, 13].
A Systematic Review of Deep Learning Approaches for Natural Language Processing in Battery Materials Domain
Published in IETE Technical Review, 2022
Geetanjali Singh, Namita Mittal, Satyendra Singh Chouhan
The growth in available chemical data, together with machine learning and DL architectures, has enabled advances in cheminformatics and the discovery of important materials, supporting the representation and exploration of new material formation. Traditional methods have been used to discover new materials, but they suffer from a bottleneck: interactions between molecules remain difficult to predict, and the whole process is time consuming. A few text-based alternatives for representing chemical structure are the chemical formula, the International Chemical Identifier (InChI) developed by IUPAC [50], and the simplified molecular input line entry system (SMILES) [51]. These text-based representations of chemical entities are readily available to researchers on the internet, so NLP techniques can be applied to them to help uncover unstructured or hidden knowledge. Several chemical databases are used for this purpose, such as UniProt, PDB, Pfam, PROSITE, PubChem, and DrugBank [52]. Carrera et al. [53] discovered six novel guanidinium salts using a QSPR model based on CPG neural networks, which can capture the structural relationships of the guanidinium cations.
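As a small illustration of why text-based representations lend themselves to NLP, the sketch below splits a SMILES string into tokens that a language model could consume. The regular expression and the function name `tokenize_smiles` are assumptions made for this example. The pattern covers only common SMILES syntax (bracket atoms, Cl and Br, ring-bond digits, bonds, and branches) and is not a full SMILES parser.

```python
import re

# Assumed token pattern: bracket atoms first, then two-letter halogens,
# two-digit ring closures (%NN), single-letter atoms, digits, and symbols.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; reject anything left unmatched."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    print(tokenize_smiles("CC(=O)Cl"))   # acetyl chloride
    print(tokenize_smiles("c1ccccc1"))   # benzene
```

Checking that the joined tokens reproduce the input guards against silently dropping characters the pattern does not recognize.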
Rapid communication: effects of cadmium exposure on the growth-related genes of Daphnia magna
Published in Journal of Toxicology and Environmental Health, Part A, 2022
Finally, the transcribed genes were searched to determine how many were related to growth. This was done by searching wFleaBase (http://server7.wfleabase.org/genome/Daphnia_magna/openaccess/genes/Proteins/) for the primary and alternate isoform protein sequences of the Daphnia magna gene loci in the set of differentially expressed genes, which were identified in the database using the gene IDs provided by Orsini et al. (2016). With this information, homologous protein sequences were identified using HMMER version 3.1b2 (2015) and release 32 of the Pfam protein family database (Mistry et al. 2020). All genes whose functional annotation indicated protein families associated with exoskeleton formation were classified as growth-related.
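A search of this kind typically yields a per-sequence hit table. The sketch below filters such a table, assuming the whitespace-delimited layout that HMMER 3 writes with `--tblout` when protein queries are scanned against Pfam. The excerpt does not state which HMMER program or E-value threshold was used, so the `1e-5` default, the function name `parse_tblout`, and the sample rows (including the family names and accessions) are hypothetical.

```python
# Hypothetical --tblout rows; names, accessions and scores are made up.
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias  ...
Toy_cuticle PF99991.1 gene_0001 - 1.2e-20 70.1 0.3 2.0e-20 69.4 0.3 1.2 1 0 0 1 1 1 1 Hypothetical cuticle domain
Toy_kinase PF99992.1 gene_0002 - 0.5 5.0 0.0 0.8 4.4 0.0 1.1 1 0 0 1 1 0 0 Hypothetical kinase domain
"""


def parse_tblout(text, max_evalue=1e-5):
    """Return hits whose full-sequence E-value is at or below max_evalue."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 fixed columns, then a free-text description that may hold spaces.
        fields = line.split(maxsplit=18)
        evalue = float(fields[4])
        if evalue <= max_evalue:
            hits.append({
                "family": fields[0],
                "accession": fields[1],
                "query": fields[2],
                "evalue": evalue,
                "score": float(fields[5]),
                "description": fields[18] if len(fields) > 18 else "",
            })
    return hits


if __name__ == "__main__":
    for hit in parse_tblout(SAMPLE):
        print(hit["query"], hit["family"], hit["evalue"])
```

The family descriptions kept for each gene could then be matched against terms of interest, such as exoskeleton-related families, to flag growth-related genes.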
Transcriptome analysis revealing molecular mechanisms of enhanced pigment yield by succinic acid and fluconazole
Published in Preparative Biochemistry & Biotechnology, 2022
Jie Qiao, Xuanxuan He, Chengtao Wang, Huilin Yang, Zeng Xin, Binyue Xin, Junnan Wang, Ruoyun Dong, Huawei Zeng, Feng Li
Gene function was annotated using the following databases: NR (NCBI non-redundant protein sequences), NT (NCBI non-redundant nucleotide sequences), Pfam (Protein family), KOG/COG (Clusters of Orthologous Groups of proteins), Swiss-Prot (a manually annotated and reviewed protein sequence database), KO (KEGG Ortholog database) and GO (Gene Ontology). BLAST was applied with an E-value cutoff of 10⁻⁵.
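The E-value cutoff can be applied as a simple filter over BLAST results. The sketch below assumes BLAST+ tabular output (`-outfmt 6`), in which the E-value is the 11th column and the bit score the 12th. The excerpt does not say which output format was used or whether the cutoff was inclusive, so both are assumptions here, as are the function name `filter_blast_tab` and the sample rows.

```python
# Hypothetical tab-separated rows in the default -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
SAMPLE_ROWS = [
    "q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380",
    "q2\tsubjB\t35.0\t60\t39\t2\t5\t64\t10\t69\t0.002\t32.1",
]


def filter_blast_tab(lines, max_evalue=1e-5):
    """Keep (query, subject, evalue, bitscore) for hits at or below the cutoff."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        evalue = float(cols[10])
        if evalue <= max_evalue:
            kept.append((cols[0], cols[1], evalue, float(cols[11])))
    return kept


if __name__ == "__main__":
    for hit in filter_blast_tab(SAMPLE_ROWS):
        print(hit)
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the sample list.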