Explore chapters and articles related to this topic
Introduction to Genomics
Published in Altuna Akalin, Computational Genomics with R, 2020
Ensembl: This is another online browser, maintained by the European Bioinformatics Institute and the Wellcome Trust Sanger Institute in the UK (http://www.ensembl.org). As with the UCSC browser, users can visualize genes or genomic coordinates from multiple species, and it also comes with auxiliary data. Ensembl is associated with the BioMart tool, which is similar to the UCSC Table Browser and can be used to download genome data, including all the auxiliary data sets, in multiple formats.
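As a rough illustration of how a BioMart download can be requested programmatically, the sketch below builds the XML query document that BioMart services accept and encodes it into a request URL. It only constructs the request and does not contact the server. The endpoint path, the dataset name `hsapiens_gene_ensembl`, and the attribute and filter names are assumptions that should be checked against the live Ensembl BioMart; `build_biomart_query` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Ensembl BioMart documentation.
MART_URL = "http://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset, attributes, filters=None, formatter="TSV"):
    """Return a BioMart XML query string for a single dataset.

    Hypothetical helper: it only assembles the XML, it does not send it.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,      # output format, e.g. TSV or CSV
        "header": "1",               # include a header row in the result
        "uniqueRows": "1",           # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes are the columns of the downloaded table.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Assumed dataset, attribute, and filter names, chosen for illustration.
xml_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name", "chromosome_name"],
    filters={"chromosome_name": "21"},
)
request_url = MART_URL + "?" + urlencode({"query": xml_query})
print(request_url)
```

Fetching `request_url` with any HTTP client should return a tab-separated table if the assumed names are valid. Changing `formatter` is how a different output format would be requested.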
An overview of technologies for MS-based proteomics-centric multi-omics
Published in Expert Review of Proteomics, 2022
Andrew T. Rajczewski, Pratik D. Jagtap, Timothy J. Griffin
When integrating proteomics with transcriptomic, genomic, metabolomic, or other data, there are several challenges that must be considered and addressed. Annotation of corresponding genes and their protein products is one such challenge; for example, unsynchronized annotations of proteomic and transcriptomic data make comparisons between coding regions and their expressed protein products difficult [64]. As a solution, the UniProt database [65] provides a well-curated repository of characterized proteins from diverse organisms. Entries contain annotations for proteins including unique UniProt identifiers cross-referenced with coding gene names, and other identifiers (e.g. RefSeq, Ensembl IDs, etc.) useful for matching proteins to corresponding genomic or transcriptomic sequences. In addition, computational tools, such as biomaRt, can be used to automatically map protein sequences to common genome or transcriptome sequence coordinates [66].
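The identifier matching described above can be sketched in a few lines. The example assumes a cross-reference table from UniProt accessions to Ensembl gene IDs has already been obtained (for instance from UniProt entries or a BioMart export). The two accession/gene pairs are real identifiers for TP53 and BRCA1; the abundance and expression values, the `UNMAPPED_1` placeholder, and the function name are invented for illustration.

```python
# Cross-reference table: UniProt accession -> Ensembl gene ID.
uniprot_to_ensembl = {
    "P04637": "ENSG00000141510",  # TP53
    "P38398": "ENSG00000012048",  # BRCA1
}

# Made-up measurements keyed by each platform's native identifier.
protein_abundance = {"P04637": 8.2, "P38398": 5.1, "UNMAPPED_1": 3.3}
transcript_expression = {"ENSG00000141510": 11.4, "ENSG00000012048": 7.9}


def pair_protein_with_transcript(proteins, transcripts, xref):
    """Pair protein and transcript values through a shared gene identifier.

    Returns (matched, unmatched): matched maps gene ID to a
    (protein_value, transcript_value) tuple; unmatched lists accessions
    with no cross-reference or no transcript measurement.
    """
    matched, unmatched = {}, []
    for accession, abundance in proteins.items():
        gene_id = xref.get(accession)
        if gene_id is None or gene_id not in transcripts:
            unmatched.append(accession)
            continue
        matched[gene_id] = (abundance, transcripts[gene_id])
    return matched, unmatched


matched, unmatched = pair_protein_with_transcript(
    protein_abundance, transcript_expression, uniprot_to_ensembl
)
print(matched)
print(unmatched)
```

Reporting the unmatched accessions explicitly makes unsynchronized annotations visible instead of silently dropping them. This simple version also assumes one protein per gene; several accessions mapping to the same gene would overwrite each other.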
Peptidomics and proteogenomics: background, challenges and future needs
Published in Expert Review of Proteomics, 2021
Rui Vitorino, Manisha Choudhury, Sofia Guedes, Rita Ferreira, Visith Thongboonkerd, Lakshya Sharma, Francisco Amado, Sanjeeva Srivastava
Bioinformatics software such as CanProVar 2.0 has been developed to help search for cancer-related variations in genomes. CanProVar 2.0 is a workflow that uses shotgun proteomics to discover cancer-related peptides or proteins [109]. It incorporates data from databases such as COSMIC, TCGA, OMIM, HPI, and BioMart. The workflow identifies variations in proteins from cancer patients using a heat map, visualizes cancer-associated hot chromosomal bands, and can provide information about proteins that belong to a particular biological pathway and are differentially expressed in cancer [109]. Independent component analysis (ICA) has been applied to proteogenomics data of breast cancer to gain insights into the disease mechanism and decipher molecular markers of the associated pathways [98]. Integrative biology allows different domains to be combined; for example, a Bayesian hierarchical model (BEHAVIOR) has been developed that uses proteogenomic and clinical data to identify cancer markers [110]. This tool has also been applied to the pan-cancer proteogenomic data from The Cancer Genome Atlas (TCGA) to find prognostic markers. A computational proteogenomic workflow was established to evaluate lncRNAs (long non-coding RNAs) as cancer biomarkers using LC-MS/MS data [111]. The use of quantitative proteomics data and profiles as prognostic markers for breast cancer, prostate cancer, head and neck squamous cell carcinoma (HNSCC), and glioblastoma multiforme (GBM) has been elaborated using a combination of systems biology tools [112–116]. The integrated proteogenomic analysis workflow (IPAW) is a platform for the discovery, curation, and validation of new peptides [117]. This workflow can successfully identify proteins translated from pseudogenes or suspected non-coding regions of the genome.
Gene expression profiles and cytokine environments determine the in vitro proliferation and expansion capacities of human hematopoietic stem and progenitor cells
Published in Hematology, 2022
Roberto Dircio-Maldonado, Rosario Castro-Oropeza, Patricia Flores-Guzman, Alberto Cedro-Tanda, Fredy Omar Beltran-Anaya, Alfredo Hidalgo-Miranda, Hector Mayani
Microarray analyses were performed on freshly isolated cells, before any culture. For the HSC population, 12,432 ± 5,934 cells were obtained with 98.6% purity; for the MPC population, 35,282 ± 16,795 cells were obtained with 97.9% purity; for the EPC population, 54,530 ± 17,780 cells were obtained with 98.6% purity. The quality of the RNA obtained from the cells was evaluated by capillary electrophoresis (Agilent 2100 Bioanalyzer; Agilent Technologies). Only RNA samples with an RNA Integrity Number greater than 6 were processed for microarray studies. One hundred picograms of RNA from each experimental condition were evaluated on the Human Transcriptome Array 2.0 gene chip (HTA 2.0; Affymetrix, Santa Clara, CA), which includes both coding (44,699) and non-coding (22,829) transcripts. The cDNA synthesis, amplification, and gene expression profiling were done with the WT Pico Reagent Kit for fresh samples (Affymetrix). Wash and stain processes were performed with the GeneChip Hybridization Wash and Stain Kit in the GeneChip Fluidics Station 450 (Affymetrix). The probe arrays were scanned using the GeneChip Scanner 3000 7G (Affymetrix). Array signal intensities were analyzed with the Affymetrix Expression Console. Raw probe data were normalized using Signal Space Transformation-Robust Multichip Analysis (SST-RMA), which applies background correction and quantile normalization. To define the differential expression profile within the different cell populations, Affymetrix Transcriptome Analysis Console (TAC) software was used. Genes with a fold change greater than 2 or lower than −2, p-value <0.05, and FDR <0.05 were considered significantly altered between the three cell populations included in this study (MPCs vs HSCs, EPCs vs HSCs, and MPCs vs EPCs). All data were uploaded to GEO (ID: GSE107497). To obtain high-quality gene annotation and to identify transcripts not annotated by Affymetrix, the Bioconductor biomaRt package was used to enrich the biological data.
Differentially expressed genes were employed for further analysis.
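The significance criteria stated above (fold change greater than 2 or lower than −2, p-value <0.05, FDR <0.05) amount to a simple three-way filter. The sketch below assumes the fold change is reported as a signed linear value, so a single absolute-value test covers both directions. The gene names and statistics are invented for illustration, and `GeneResult` and `is_significant` are hypothetical names rather than part of the TAC software.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    fold_change: float  # signed linear fold change (assumed convention)
    p_value: float
    fdr: float


def is_significant(r, fc_cutoff=2.0, p_cutoff=0.05, fdr_cutoff=0.05):
    """Apply all three thresholds; a gene must pass every one."""
    return (
        abs(r.fold_change) > fc_cutoff
        and r.p_value < p_cutoff
        and r.fdr < fdr_cutoff
    )


# Made-up results for one pairwise comparison (e.g. MPCs vs HSCs).
results = [
    GeneResult("GENE_A", 3.1, 0.001, 0.01),    # passes all thresholds
    GeneResult("GENE_B", -2.6, 0.004, 0.03),   # passes (down-regulated)
    GeneResult("GENE_C", 1.4, 0.0005, 0.004),  # fails: fold change too small
    GeneResult("GENE_D", -4.0, 0.03, 0.20),    # fails: FDR too high
]

significant = [r.gene for r in results if is_significant(r)]
print(significant)  # ['GENE_A', 'GENE_B']
```

In a study with three pairwise comparisons, the same filter would be applied to each comparison's result table separately.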