Computation in Practice
Published in A Computational Approach to Statistical Learning, 2019
Taylor Arnold, Michael Kane, Bryan W. Lewis
The technique known as feature hashing—or the hashing trick—offers a solution to these potential problems, particularly when the number of possible categories is large and unbounded [173]. A hash function is a deterministic map from any string to an integer in a fixed range. As a simple example, consider mapping the letters a through z to the numbers 1 through 26. A hash function can then be defined by mapping each letter in a string to its corresponding number, summing these numbers, and taking the result modulo the desired size of the hash. Better hash functions exist that map more uniformly into the target space and handle capital letters, punctuation, and other characters. We can use a hash function φ to convert categorical data into a data matrix by picking a hash size H and mapping each category v to column φ(v) of the data matrix. This allows the data matrix to be constructed without prior knowledge of the categories.
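A minimal Python sketch of this idea (an illustration, not the authors' code): the toy letter-sum hash described above, used to place each category into one of H columns. The hash size H, the function names, and the example categories are all assumptions chosen for demonstration.

```python
import numpy as np

H = 8  # assumed hash size (number of columns in the data matrix)

def phi(category, size=H):
    """Toy hash: map letters a-z to 1-26, sum them, reduce modulo the hash size."""
    total = sum(ord(ch) - ord("a") + 1 for ch in category.lower() if ch.isalpha())
    return total % size

def hash_categories(values, size=H):
    """Build an n-by-size indicator matrix without knowing the categories in advance."""
    X = np.zeros((len(values), size))
    for i, v in enumerate(values):
        X[i, phi(v, size)] = 1.0
    return X

# Example: unseen categories can be added at any time without changing the matrix shape.
X = hash_categories(["red", "green", "blue", "red"])
print(X)
```

Because the column index is computed from the category string itself, new categories encountered later hash into the same fixed set of H columns, at the cost of occasional collisions.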
The Bitwise Hashing Trick for Personalized Search
Published in Applied Artificial Intelligence, 2019
A common method for representing and comparing items such as documents or listing titles is a vector created by feature hashing, also known as "the hashing trick" (Attenberg et al. 2009; Weinberger et al. 2009). The vector is initialized to zero. Each feature is hashed to an index modulo the vector length, and the vector element at that index is incremented or, in some implementations, decremented based on a second sign hash. Items are then compared pairwise using a similarity function such as the cosine between the vectors.
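A minimal sketch of this signed hashed-vector representation (an illustration, not the paper's implementation). The vector length, the use of MD5 digests for the index and sign hashes, and the whitespace tokenizer are all assumptions.

```python
import hashlib
import math

DIM = 16  # assumed hashed vector length

def _hash(s):
    """Deterministic string hash via MD5 (an assumption; any good hash works)."""
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def hash_vector(text, dim=DIM):
    """Hash each whitespace token to an index modulo dim, then increment or
    decrement that element according to a second sign hash."""
    vec = [0.0] * dim
    for token in text.lower().split():
        index = _hash(token) % dim
        sign = 1.0 if _hash("sign::" + token) % 2 == 0 else -1.0
        vec[index] += sign
    return vec

def cosine(u, v):
    """Cosine similarity between two hashed vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Example: compare two listing titles via their hashed vectors.
a = hash_vector("vintage wooden coffee table")
b = hash_vector("antique wooden side table")
print(round(cosine(a, b), 3))
```

The sign hash makes collisions tend to cancel rather than accumulate, which keeps inner products between hashed vectors close to those of the original feature vectors.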