Natural Language Processing
Published in Rakesh M. Verma, David J. Marchette, Cybersecurity Analytics, 2019
Rakesh M. Verma, David J. Marchette
In password guessing attacks, researchers have used Probabilistic Context-Free Grammars (PCFG) [482]. A PCFG is an extension of a Context-Free Grammar (CFG) in which the rules have probabilities associated with them. The goal of a PCFG is to obtain a probability distribution over derivations, and hence also over parse trees. PCFGs have been applied to the parsing problem in NLP. In password guessing attacks, the goal is to generate passwords in order, from highly probable passwords to less likely ones. A database of publicly available passwords can be used for learning the probabilities. For example, we may introduce four nonterminals: P for password, L for letter, D for digit, and S for symbol. At the top level, we have the rules: P → LP | SP | DP. Of course, passwords usually have lower and upper limits on length, which can be incorporated into the rules as well. For example, instead of the nonterminal P we may use P7 to indicate a password that is seven characters long, and we may have rules such as P7 → L6D1 | L7 | L6S1. In practice, we do not introduce all the possible rules; rather, we learn the rules and their probabilities from a training data set.
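As a rough sketch of this idea, the following toy generator enumerates passwords for the P7 structures named above and ranks them by derivation probability. All rule and terminal probabilities here are invented for illustration; a real attack would learn them from a leaked-password corpus and would enumerate guesses lazily rather than building the full list.

```python
import itertools
import re

# Hypothetical learned probabilities (illustrative numbers only).
structure_rules = {   # P7 -> structure
    "L6D1": 0.5,      # six letters then one digit
    "L7":   0.3,      # seven letters
    "L6S1": 0.2,      # six letters then one symbol
}
terminal_probs = {    # toy per-class character distributions
    "L": {"a": 0.6, "e": 0.4},
    "D": {"1": 0.7, "2": 0.3},
    "S": {"!": 0.8, "#": 0.2},
}

def expand(structure):
    """Split a structure such as 'L6D1' into per-position terminal classes."""
    seq = []
    for cls, n in re.findall(r"([LDS])(\d+)", structure):
        seq.extend([cls] * int(n))
    return seq

def ranked_guesses():
    """Return (password, probability) pairs, most probable first."""
    out = []
    for structure, p_struct in structure_rules.items():
        choices = [list(terminal_probs[c].items()) for c in expand(structure)]
        for combo in itertools.product(*choices):
            password = "".join(ch for ch, _ in combo)
            prob = p_struct
            for _, q in combo:
                prob *= q       # probability of the full derivation
            out.append((password, prob))
    out.sort(key=lambda pair: pair[1], reverse=True)
    return out

print(ranked_guesses()[0][0])  # most probable guess: 'aaaaaa1'
```

Because each structure's terminal choices form a probability distribution, the guess probabilities sum to 1 over all derivations, which is exactly the distribution-over-derivations property the PCFG provides.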
Natural Language Understanding
Published in Richard E. Neapolitan, Xia Jiang, Artificial Intelligence, 2018
Richard E. Neapolitan, Xia Jiang
Each rule in a PCFG has a probability associated with it. The probabilities of the rules for a given category sum to 1. For example, there are three rules for Sentence, namely Rules 1, 2, and 3, and $$ P(\text{Rule 1}) + P(\text{Rule 2}) + P(\text{Rule 3}) = .6 + .25 + .15 = 1. $$
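This sum-to-1 constraint is easy to check mechanically. The snippet below mirrors the three-rule Sentence example; the probabilities (.6, .25, .15) come from the text, but the right-hand sides are illustrative placeholders, not the actual rules of the grammar.

```python
import math

# Toy PCFG fragment: category -> list of (right-hand side, probability).
# Right-hand sides are placeholders; probabilities follow the example.
rules = {
    "Sentence": [
        ("NounPhrase VerbPhrase", 0.60),                # Rule 1
        ("NounPhrase VerbPhrase PrepPhrase", 0.25),     # Rule 2
        ("VerbPhrase", 0.15),                           # Rule 3
    ],
}

def is_consistent(rules):
    """Check that the rule probabilities for each category sum to 1."""
    return all(
        math.isclose(sum(p for _, p in prods), 1.0)
        for prods in rules.values()
    )

print(is_consistent(rules))  # True
```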
Assisting academics to identify computer generated writing
Published in European Journal of Engineering Education, 2022
El-Sayed Abd-Elaal, Sithara H.P.W. Gamage, Julie E. Mills
Amancio (2015) compared the topological properties of complex networks (machine-writing networks) of manuscripts artificially generated by SCIgen against authentic papers. Also, Nguyen Minh and Labbé (2018) investigated the similarity of sentence structure in order to detect sentences that were created using a Probabilistic Context-Free Grammar. Although these techniques helped detect many fake publications, and in particular caught more than 120 published articles, they are ineffective against article generators that utilise other writing systems, such as Markov chains or Recurrent Neural Networks (Nguyen Minh and Labbé 2018). Nguyen-Son et al. (2017) reported a significant development of AAGs, in that recent AAGs can use natural language techniques that produce wording very close to human-crafted phrases. Consequently, such developments reduce the ability to detect AI writing by checking the grammatical structure of sentences.
The Use of Context-Free Probabilistic Grammar to Anonymise Statistical Data
Published in Cybernetics and Systems, 2020
In this section, we will present a proprietary method of anonymising individual data using the properties of context-free grammar. Two basic definitions will be used when discussing this method.

A context-free grammar is a formal grammar of type 2 in Chomsky's hierarchy, i.e. an ordered four-tuple (T, N, P, S), where:

T is a finite set of terminal symbols,
N is a finite set of nonterminal symbols,
P is a finite set of production rules L → R, where L ∈ N and R ∈ (T ∪ N)*,
S ∈ N is a distinguished initial symbol.

A probabilistic context-free grammar (Probabilistic Context-Free Grammar – PCFG) is a context-free grammar that includes the probabilities of its production rules. Production probabilities are assigned subject to the constraint that the sum of the probabilities of rules with the same predecessor (left-hand side) is 1.
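The four-tuple definition above can be encoded directly. The toy grammar below is invented for illustration, with a small check that the tuple is well-formed under the definition: the start symbol is a nonterminal, every rule's predecessor is in N, and every right-hand-side symbol belongs to T ∪ N.

```python
# A minimal encoding of the four-tuple (T, N, P, S); the grammar
# itself is a toy example, not the paper's anonymisation grammar.
T = {"a", "b"}                 # terminal symbols
N = {"S", "A"}                 # nonterminal symbols
P = {                          # rules L -> R, with symbols space-separated
    "S": ["A A", "a"],
    "A": ["a b", "b"],
}
start = "S"

def well_formed(T, N, P, start):
    """Check the grammar against the definition of a type-2 grammar."""
    if start not in N:
        return False
    for left, rights in P.items():
        if left not in N:
            return False                    # predecessor must be nonterminal
        for right in rights:
            if any(sym not in T | N for sym in right.split()):
                return False                # RHS must lie in (T ∪ N)*
    return True

print(well_formed(T, N, P, start))  # True
```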