Probability, Random Variables, and Stochastic Processes
Published in Erchin Serpedin, Thomas Chen, Dinesh Rajan, Mathematical Foundations for Signal Processing, Communications, and Networking, 2012
Exercise 7.7.14. In source coding, a prefix code (or instantaneous code) is one in which no codeword is a prefix of any other codeword. Consider the following prefix code: the symbols {a, b, c, d} are mapped to the codewords 0, 10, 110, 111, respectively. Consider an i.i.d. sequence of symbols with symbol probabilities Pr(a) = 0.5, Pr(b) = 0.3, Pr(c) = Pr(d) = 0.1. Find the average length of the binary representation of the code. Let Bn represent the sequence of binary symbols that results from the encoding. Find the PMF of Bn.
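As a quick numerical sketch (in Python; not part of the original exercise, and the 100,000-symbol simulation size is an arbitrary choice), the average codeword length can be computed exactly and the marginal PMF of a bit in the encoded stream estimated by simulation:

```python
import random

# Codebook and symbol probabilities from the exercise.
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
probs = {'a': 0.5, 'b': 0.3, 'c': 0.1, 'd': 0.1}

# Exact average codeword length: sum over symbols of Pr(s) * len(code[s]).
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(f"average codeword length = {avg_len} bits")  # 0.5*1 + 0.3*2 + 0.1*3 + 0.1*3 = 1.7

# Monte Carlo estimate of Pr(Bn = 1): encode a long i.i.d. symbol
# stream and count the fraction of 1s among all emitted bits.
random.seed(0)
stream = random.choices(list(probs), weights=list(probs.values()), k=100_000)
bits = ''.join(code[s] for s in stream)
print(f"estimated Pr(Bn = 1) = {bits.count('1') / len(bits):.3f}")
```

The estimate should land near 0.8/1.7 ≈ 0.47, since each symbol contributes 0.3·1 + 0.1·2 + 0.1·3 = 0.8 ones on average out of 1.7 bits on average.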
Learning deterministic models
Published in Richard E. Neapolitan, Xia Jiang, Artificial Intelligence, 2018
Richard E. Neapolitan, Xia Jiang
The binary codes in Figure 5.8 are prefix codes. In a prefix code, no code word for one character constitutes the beginning of the code word for another character. For example, if 10 is the code word for HT, then 101 cannot be the code word for TH. An optimal binary prefix code for a probability distribution is a binary prefix code that yields the minimal expected value of the number of bits needed to report the outcome of the experiment. The binary code in Figure 5.8(a) is an optimal binary prefix code if P(H) = 1/2, and the one in Figure 5.8(b) is an optimal binary prefix code if P(H) = 3/4. Huffman’s algorithm, which appears in a standard algorithms text such as [Neapolitan, 2015], produces an optimal binary prefix code.
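The excerpt cites but does not reproduce Huffman's algorithm; the following minimal Python sketch of the standard greedy construction (the function name, tie-breaking counter, and example distribution are our own choices) repeatedly merges the two least probable subtrees and reads codewords off the resulting tree:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build an optimal binary prefix code by repeatedly merging
    the two least probable subtrees (Huffman's algorithm)."""
    tiebreak = count()  # keeps heap comparisons off the tree nodes
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    # Read codewords off the tree: left edges are 0, right edges are 1.
    codes, stack = {}, [(heap[0][2], '')]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + '0'))
            stack.append((node[1], prefix + '1'))
        else:
            codes[node] = prefix or '0'  # degenerate one-symbol alphabet
    return codes

# Pairs of flips of a coin with P(H) = 3/4, as in the excerpt's setting:
print(huffman_code({'HH': 9/16, 'HT': 3/16, 'TH': 3/16, 'TT': 1/16}))
# e.g. {'HH': '1', 'TH': '00', 'TT': '010', 'HT': '011'} (ties may vary)
```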
Trees
Published in Jonathan L. Gross, Jay Yellen, Mark Anderson, Graph Theory and Its Applications, 2018
Jonathan L. Gross, Jay Yellen, Mark Anderson
In a prefix code that uses its shorter codewords to encode the more frequently occurring symbols, the messages will tend to require fewer bits than in a code that does not. For instance, in a prefix code for ordinary English prose, it would make sense to use a short codeword to represent the letter “e” and a longer codeword to represent “x.”
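To make this concrete, here is a small illustration (the distribution and codeword assignments are hypothetical): the same four codewords are assigned once with the shortest codeword on the most frequent symbol, and once reversed, and the expected bits per symbol are compared.

```python
# Hypothetical skewed symbol frequencies, loosely in the spirit of
# English text, where 'e' is common and 'x' is rare.
probs = {'e': 0.6, 't': 0.2, 'a': 0.1, 'x': 0.1}

frequent_short = {'e': '0', 't': '10', 'a': '110', 'x': '111'}
frequent_long = {'e': '111', 't': '110', 'a': '10', 'x': '0'}

def expected_bits(code):
    # Expected codeword length: sum of Pr(symbol) * codeword length.
    return sum(probs[s] * len(code[s]) for s in probs)

print(expected_bits(frequent_short))  # 0.6*1 + 0.2*2 + 0.1*3 + 0.1*3 = 1.6
print(expected_bits(frequent_long))   # 0.6*3 + 0.2*3 + 0.1*2 + 0.1*1 = 2.7
```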
Feature Selection for Supervised Learning and Compression
Published in Applied Artificial Intelligence, 2022
Phillip Taylor, Nathan Griffiths, Vince Hall, Zhou Xu, Alex Mouzakitis
Lossless compression aims to compress the data in such a way that the uncompressed version is indistinguishable from the original (Salomon and Motta 2010). Typically, lossless compression inspects the frequencies of symbols and looks for repeating symbols or sequences of symbols in the data stream. Perhaps the simplest method of compression is run-length encoding, in which symbols are encoded along with their number of consecutive repetitions. For example, the string ‘AAAABBA’ can be encoded as ‘A4B2A1.’ Two other notable lossless compression algorithms are LZ77 dictionary encoding (Ziv and Lempel 1977) and Huffman coding (Huffman 1952). LZ77 uses a sliding window and searches for repeating sequences, each of which is encoded as the length and location of an earlier occurrence in the window. Huffman coding produces a variable-length prefix code in which each codeword is the path from the root to the encoded symbol in a Huffman tree. Symbols that occur with higher frequencies are located closer to the root of the tree and thus have shorter Huffman codes. Taken together, LZ77 and Huffman coding make up the DEFLATE compression algorithm (Deutsch 1996), which is the basis of the ZIP file format.
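A minimal Python sketch of the run-length scheme described above, matching the ‘AAAABBA’ → ‘A4B2A1’ example (it assumes symbols are non-digit characters, since run lengths are written as decimal digits):

```python
import re
from itertools import groupby

def rle_encode(s):
    # Group consecutive identical symbols; emit each symbol
    # followed by the length of its run.
    return ''.join(f'{ch}{len(list(run))}' for ch, run in groupby(s))

def rle_decode(s):
    # Each token is one non-digit symbol followed by its run length.
    return ''.join(ch * int(n) for ch, n in re.findall(r'(\D)(\d+)', s))

assert rle_encode('AAAABBA') == 'A4B2A1'
assert rle_decode('A4B2A1') == 'AAAABBA'
```

DEFLATE itself is available through Python's standard zlib module, so the full LZ77-plus-Huffman pipeline need not be reimplemented by hand.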