Explore chapters and articles related to this topic
Secure computation outsourcing of genome-wide association studies using homomorphic encryption
Published in Shin-ya Nishizaki, Masayuki Numao, Jaime Caro, Merlin Teodosia Suarez, Theory and Practice of Computation, 2019
Angelica Khryss Yvanne C. Ladisla, Richard Bryann L. Chua
In most cases, GWAS is based on a case-control design in which SNPs across the human genome are genotyped in case and control groups and subjected to statistical correlation analyses (Naveed et al., 2015; Steen, 2015). Before statistical association methods are applied, genetic markers must undergo quality control procedures to avoid potential false-negative and false-positive associations (Tabangin, Woo, & Martin, 2009). Quality control is carried out by computing the minor allele frequency (MAF) (New SNP Attribute, n.d.) and testing for Hardy-Weinberg equilibrium (HWE) (Lauter et al., 2015). After the quality control procedures, the selected SNPs are tested for association using a statistical association test. Studies on qualitative traits can use the χ² test, the Cochran-Armitage trend test, or logistic regression, while linear regression can be used for quantitative traits (Smith & Newton-Cheh, 2009). Association testing may go through a series of replications and validations before an interpretation can be made (Steen, 2015).
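The two quality control quantities named above can be illustrated with a minimal sketch. It assumes a biallelic SNP summarized by three genotype counts, and it uses the Pearson chi-square form of the HWE test with one degree of freedom, which is one common choice and not necessarily the one used in the chapter. The function names and the example counts are hypothetical.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB) at a biallelic SNP."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is whichever of A and B is less frequent.
    return min(freq_a, 1 - freq_a)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic and p-value (1 df) for deviation from HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic SNP: HWE test is undefined")
    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square survival function with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, for illustration only.
maf = minor_allele_frequency(81, 18, 1)
stat, p_value = hwe_chi_square(81, 18, 1)
```

A SNP would typically be dropped when its MAF falls below a chosen threshold or when the HWE p-value is very small; the thresholds are study-specific and are not given in the excerpt.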
Two New Nonparametric Models for Biological Networks
Published in K Hemachandran, Shubham Tayal, Preetha Mary George, Parveen Singla, Utku Kose, Bayesian Reasoning and Gaussian Processes for Machine Learning Applications, 2022
Deniz Seçilmiş, Melih Ağraz, Vilda Purutçuoğlu
The second dataset is the human gene expression pathway data, which was gathered by Stranger et al. [48] and is described by Bhadra and Mallick [49] and Chen et al. [50]. This dataset was collected to measure gene expression in the B-lymphocyte cells of people of Northern and Western European ancestry from Utah (CEU). The data comprise 60 unrelated individuals and 100 probes, so the dataset has dimension 60 × 100. Here, the focus is on the 3,125 single nucleotide polymorphisms that are found in the 5′ UTR (untranslated region) of mRNA (messenger RNA) with a minor allele frequency of 0.1. Since the UTR of mRNA has an important role in the regulation of gene expression, the inference of this system was performed in a previous study [20] via the copula GGM. Unlike the first real dataset, here the true precision matrix of the gene expression pathway is unknown, so we cannot compute accuracy measures to compare the RFA outputs and we have no prior knowledge of the network topology. Therefore, for this human gene expression dataset, we run RFA without calculating the accuracy measures based on TP, TN, FP, and FN, and instead only record the interactions that are found in the resulting precision matrices. The results show that RFA can detect new interactions as well as capture interactions validated against the databases STRING and GeneMANIA, which describe the true structure of protein–protein interactions and gene interactions, respectively. Table 7.6 lists the interactions between molecules that are detected via RFA. Among these, the interactions between HMOX1 and IL8, between RPS4Y1 and EIF1AY, and between DDX3Y and KDM5D are validated by both the STRING and GeneMANIA databases (DB), while the interaction between TNFRSF19 and LEPREL1 is validated by the GeneMANIA database only.
Effects of tumor necrosis factor (TNF) gene polymorphisms on the association between smoking and lung function among workers in swine operations
Published in Journal of Toxicology and Environmental Health, Part A, 2021
Zhiwei Gao, James A. Dosman, Donna C. Rennie, David A. Schwartz, Ivana V. Yang, Jeremy Beach, Ambikaipakan Senthilselvan
There are some limitations to our study. This is a cross-sectional study, and consequently it is not possible to infer causation. The lack of objective measures of respiratory hazards in the swine operations limited our ability to further examine the associations with different exposure levels (i.e., dose-response relationship). Further gender-stratified analysis was limited because of the small sample size (n = 374), the small number of female workers (36%) and the low minor allele frequencies (A allele of rs361525: 6.1%, T allele of rs1799724: 10.7%, A allele of rs1800629: 14.8%) among the workers. This study defined a nonsmoker as a person who had smoked fewer than 400 cigarettes in his or her lifetime, which differs from the Centers for Disease Control and Prevention (CDC) definition of a nonsmoker (fewer than 100 cigarettes in his or her lifetime). Because of this discrepancy in the definition of a nonsmoker, caution is needed in comparing the results of this study with other studies using the CDC definition.
Association between ALOX15 gene polymorphism and brick-tea type skeletal fluorosis in Tibetans, Kazaks and Han, China
Published in International Journal of Environmental Health Research, 2021
Yanru Chu, Yang Liu, Ning Guo, Qun Lou, Limei Wang, Wei Huang, Liaowei Wu, Jian Wang, Meichen Zhang, Fanshuo Yin, Yanhui Gao, Yanmei Yang
Four SNPs in the ALOX15 gene were genotyped. We observed only a few AA and AG genotypes for rs743646 in our study, so we excluded rs743646 from the subsequent statistical analyses. Testing for deviation from HWE was performed within the control participants, and the hypothesis of Hardy-Weinberg equilibrium could not be rejected for any of the three remaining SNPs (all p > 0.05). We used the control participants to calculate the minor allele frequency (MAF) for the three SNPs, and the genotypes at each ALOX15 SNP were divided into two groups by the presence or absence of the low-frequency allele. There were significant differences in genotype frequencies among the participants for SNP rs7220870 (p < 0.001) and SNP rs2664593 (p = 0.001), but no difference was observed for rs11078528 (p = 0.096). The association between ALOX15 genotypes and skeletal fluorosis is summarized in Table 3. The ALOX15 rs11078528 genotype was significantly associated with skeletal fluorosis (χ² = 4.752, p = 0.029); however, the association was not seen after adjusting for age, sex, ethnicity, IF and UF (Table 3). After stratification by ethnicity, there were no statistically significant differences in skeletal fluorosis for rs7220870, rs2664593 and rs11078528 (Table 3). We also examined the interactions between ALOX15 genotypes and potential risk factors and found that only the rs7220870 AC/AA genotype was associated with a significantly increased skeletal fluorosis risk, in participants aged ≥65 (OR = 2.058, 95%CI 1.008–4.200, p = 0.047).
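As an illustration of the kind of statistics reported above, the following sketch computes an unadjusted χ² statistic and an odds ratio with a Wald 95% confidence interval from a 2 × 2 table of allele carriers versus non-carriers in cases and controls. The article's own estimates may come from adjusted regression models, so this is not a reproduction of its analysis; the function names and the counts are hypothetical. The reported interval is at least consistent with a log-scale symmetric interval, since the geometric mean of its bounds, √(1.008 × 4.200), is about 2.058.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    a: carrier cases,     b: carrier controls,
    c: non-carrier cases, d: non-carrier controls.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Chi-square survival function with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only (not taken from the article).
stat, p_value = chi_square_2x2(20, 10, 10, 20)
odds_ratio, lower, upper = odds_ratio_wald_ci(20, 10, 10, 20)
```

Both functions assume that every cell count is positive; tables with zero cells need a different treatment, such as an exact test.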
Comment: Ridge Regression, Ranking Variables and Improved Principal Component Regression
Published in Technometrics, 2020
Nam-Hee Choi, Kerby Shedden, Gongjun Xu, Xuefei Zhang, Ji Zhu
For the predictor data X, we considered data simulated to match SNP data from human subjects. Our simulated SNP dataset was generated using the GWASimulator program (Li and Li, 2008) which simulates biallelic SNP genotypes that have mean and local correlation structures that match a given set of phased measured genotypes. As the input set for GWASimulator, we used phased genotypes from the HapMap project (The International HapMap Consortium, 2003) for 60 individuals (120 phased chromosomes) in the HapMap CEU sample (Utah residents with ancestry from northern and western Europe). We then selected from the GWASimulator output the data for the 22,518 SNPs on chromosome 1 that were assayed on the Illumina platform at the Sanger Institute. The data were partitioned into 148 non-overlapping blocks of adjacent SNPs of size 150. SNPs were eliminated if the minor allele frequency was below 0.05. An iterative procedure was applied to remove SNP pairs with correlation greater than 0.9: at each step in the procedure, the pair with greatest correlation was identified and one SNP in the pair was selected at random and dropped; the procedure continued until no SNP pairs with correlation greater than 0.9 remained. If the final block was shorter than the length of β, it was discarded. Otherwise the initial segment of the block with length equal to the length of β was used. Finally, each SNP was standardized to have zero mean and unit variance. For each X matrix, a single Y vector was generated following each of the 38 population models.
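The filtering steps described above (MAF threshold, iterative removal of highly correlated pairs, standardization) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes genotypes coded as 0/1/2 allele counts, uses the signed Pearson correlation as the text literally states, and leaves out the block partitioning and the trimming to the length of β. The function and parameter names are hypothetical.

```python
import numpy as np

def filter_prune_standardize(genotypes, maf_min=0.05, r_max=0.9, seed=0):
    """Filter by MAF, prune correlated SNP pairs, then standardize.

    genotypes: (n_individuals, n_snps) array of allele counts coded 0/1/2
    (an assumed coding). Returns (X, kept), where kept holds the original
    column indices of the surviving SNPs.
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(genotypes, dtype=float)

    # Frequency of the counted allele; the MAF is the smaller of the two.
    freq = g.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    kept = np.flatnonzero(maf >= maf_min)

    # Repeatedly find the most correlated pair; stop once it is <= r_max.
    # Constant columns (e.g. all heterozygous) are not handled here.
    while kept.size > 1:
        corr = np.corrcoef(g[:, kept], rowvar=False)
        np.fill_diagonal(corr, -np.inf)  # ignore self-correlation
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] <= r_max:
            break
        drop = rng.choice([i, j])  # drop one SNP of the pair at random
        kept = np.delete(kept, drop)

    x = g[:, kept]
    x = (x - x.mean(axis=0)) / x.std(axis=0)  # zero mean, unit variance
    return x, kept

# Tiny hypothetical example: columns 0 and 1 are identical, column 2 is monomorphic.
example = np.array([
    [0, 0, 0, 2],
    [1, 1, 0, 0],
    [2, 2, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [2, 2, 0, 2],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
])
X, kept = filter_prune_standardize(example)
```

Recomputing the full correlation matrix at every step keeps the sketch short but costs O(m²) per iteration for m SNPs; for blocks of 150 SNPs, as in the text, this is cheap.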