Enron Corpus – Knowledge and References

Use of Machine Learning and a Natural Language Processing Approach for Detecting Phishing Attacks

Published in Brojo Kishore Mishra, Raghvendra Kumar, Natural Language Processing in Artificial Intelligence, 2020

Chandrakanta Mahanty, Devpriya Panda, Brojo Kishore Mishra

Here the author uses the Scikit-learn Python library [12] and its MultinomialNB() function. SEAHound produces a prediction label for every (verb-direct object) pair, along with a confidence score (between 0 and 1) for the prediction. They used sets of 1000 phishing and non-phishing emails, drawn from Nazario [13] and the Enron Corpus [14] respectively, as the training set. They tested on all 5014 phishing and 5000 non-phishing messages from the Nazario and Enron Corpus email sets. To extract all (verb-direct object) pairs from all sentences, they used the Stanford typed dependency parser, after which the machine learning classifier is applied.
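The classification step described above can be illustrated with a minimal sketch. It assumes the (verb-direct object) pairs have already been extracted by a dependency parser and are given as strings such as "click|link"; the toy emails, the pair encoding, and the variable names are hypothetical and are not taken from the chapter. Only the use of Scikit-learn's MultinomialNB() comes from the text.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: each email is a list of pre-extracted
# verb|direct-object pairs (the parsing step is not shown here).
train_emails = [
    ["click|link", "verify|account"],       # phishing
    ["update|password", "click|link"],      # phishing
    ["schedule|meeting", "review|report"],  # non-phishing
    ["send|agenda", "schedule|meeting"],    # non-phishing
]
train_labels = [1, 1, 0, 0]  # 1 = phishing, 0 = non-phishing

# The analyzer returns each document unchanged, so every pair is one feature.
vectorizer = CountVectorizer(analyzer=lambda pairs: pairs)
X_train = vectorizer.fit_transform(train_emails)

classifier = MultinomialNB()  # default Laplace smoothing (alpha=1.0)
classifier.fit(X_train, train_labels)

# Classify an unseen email and read off a confidence score in [0, 1].
X_new = vectorizer.transform([["click|link"]])
prediction = int(classifier.predict(X_new)[0])
confidence = float(classifier.predict_proba(X_new)[0].max())
print(prediction, round(confidence, 2))
```

Here the confidence is simply the largest class probability returned by predict_proba(); whether SEAHound derives its score in the same way is not stated in the excerpt.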

Change detection in a dynamic stream of attributed networks

Published in Journal of Quality Technology, 2018

Mostafa Reisi Gahrooei, Kamran Paynabar

In this section, to show how our proposed method can be applied to real problems, we model and monitor the Enron email communication network. The Enron corpus consists of about 500,000 email communications among 184 employees of the Enron Corporation from 1998 to 2002 (Priebe et al. 2005). This data set can be represented by a sequence of directed networks, where each network represents one week of email communications. A directed edge is placed between nodes i and j at time t if at least one email has been sent from employee i to employee j during week t. The role of each employee within the company is available and is used to set the attributes of the networks. The roles consist of CEO, president, vice president, manager, director, trader, and employee. For simplicity, we focus on the emails sent among the president (P), managers and directors (MR), and the CEO. The combination of these roles results in a categorical attribute for each edge, with nine possible values (i.e., CEO-CEO, CEO-P, CEO-MR, etc.) that are represented by dummy variables. Note that in these networks, given the attributes (employee roles), the existence of an edge between two nodes is independent of the existence of other edges.
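The network construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the toy employees, the use of ISO calendar weeks to define "week t", and the full nine-column one-hot encoding of the edge attribute are all hypothetical choices.

```python
from collections import defaultdict
from datetime import date

# Assumed mapping from job role to the three groups used in the text.
ROLE_GROUP = {"CEO": "CEO", "president": "P", "manager": "MR", "director": "MR"}

# The nine possible edge attribute values: CEO-CEO, CEO-P, CEO-MR, P-CEO, ...
ATTRIBUTE_VALUES = [f"{a}-{b}" for a in ("CEO", "P", "MR") for b in ("CEO", "P", "MR")]


def weekly_networks(emails, roles):
    """Build one directed network per week.

    emails: iterable of (sender, recipient, date sent)
    roles:  dict mapping employee -> job role
    Returns {(year, week): {(sender, recipient): attribute}}.
    """
    networks = defaultdict(dict)
    for sender, recipient, sent_on in emails:
        src = ROLE_GROUP.get(roles.get(sender))
        dst = ROLE_GROUP.get(roles.get(recipient))
        if src is None or dst is None:
            continue  # skip roles outside the CEO / P / MR groups
        iso = sent_on.isocalendar()
        week = (iso[0], iso[1])  # assumption: weeks follow the ISO calendar
        # One email is enough to create the edge; repeats do not add edges.
        networks[week][(sender, recipient)] = f"{src}-{dst}"
    return dict(networks)


def dummy_encode(attribute):
    """Represent a categorical edge attribute as 0/1 dummy variables."""
    return [1 if attribute == value else 0 for value in ATTRIBUTE_VALUES]


roles = {"alice": "CEO", "bob": "president", "carol": "manager", "dave": "trader"}
emails = [
    ("alice", "bob", date(2001, 10, 1)),
    ("alice", "bob", date(2001, 10, 3)),  # same week, same edge
    ("bob", "carol", date(2001, 10, 9)),
    ("dave", "alice", date(2001, 10, 9)),  # trader: outside the selected roles
]
networks = weekly_networks(emails, roles)
```

With these toy inputs, the two emails from alice to bob collapse into a single CEO-P edge in one weekly network, and the email from the trader is dropped because that role is outside the three groups considered.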

Email Sentiment Analysis Through k-Means Labeling and Support Vector Machine Classification

Published in Cybernetics and Systems, 2018

Sisi Liu, Ickjai Lee

Datasets for empirical results in this research are extracted from the Enron corpus, an enormous source of Email data made public for text mining and document classification tasks. Available at https://www.cs.cmu.edu/~./enron/, the original Enron Email corpus is composed of over half a million raw Email messages retrieved from more than 100 users (Klimt and Yang 2004). A database version (Liu and Lee 2015) (available at http://www.ahschulz.de/pub/R/data/enron-mysqldump_v5.sql.gz) that separates Email headers from content is chosen, because the proposed scheme focuses on identifying and classifying sentiments from Email content.
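To illustrate what separating Email headers from content means for a raw message, the following minimal sketch uses Python's standard email module. The sample message and the function name are hypothetical, and the sketch does not reproduce the schema of the database version cited above.

```python
from email import message_from_string

# Hypothetical raw message in the usual header block / blank line / body layout.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly numbers

The report looks great, thanks for the quick turnaround.
"""


def split_headers_and_body(raw):
    """Return (headers, body) for a single-part plain-text message."""
    msg = message_from_string(raw)
    headers = dict(msg.items())  # header name -> header value
    body = msg.get_payload()     # text after the blank separator line
    return headers, body


headers, body = split_headers_and_body(RAW_MESSAGE)
print(headers["Subject"])
print(body.strip())
```

Only the body would then be passed on to sentiment labeling and classification, so that header fields such as addresses and subject lines do not affect the result.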