Computation in Practice
Published in A Computational Approach to Statistical Learning, 2019
Taylor Arnold, Michael Kane, Bryan W. Lewis
The technique known as feature hashing—or the hashing trick—offers a solution to these potential problems, particularly when the number of possible categories is large and unbounded [173]. A hash function is a deterministic map from any string to an integer in a fixed range. As a simple example, consider mapping the letters a through z to the numbers 1 through 26. A hash function can then be defined by mapping each letter in a string to its corresponding number, summing these numbers, and taking the result modulo the desired size of the hash. Better hash functions exist that map more uniformly into the target space and handle capital letters, punctuation, and other characters. We can use a hash function φ to convert categorical data into a data matrix by picking a hash size H and mapping each category v to column φ(v) of the data matrix. This allows the data matrix to be constructed without prior knowledge of the categories.
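A minimal Python sketch of this idea (an illustration, not the authors' code): the toy letter-sum hash described above, used to place each category into one of H columns. The hash size H, the function names, and the example categories are all assumptions chosen for demonstration.

```python
import numpy as np

H = 8  # assumed hash size (number of columns in the data matrix)

def phi(category, size=H):
    """Toy hash: map letters a-z to 1-26, sum them, reduce modulo the hash size."""
    total = sum(ord(ch) - ord("a") + 1 for ch in category.lower() if ch.isalpha())
    return total % size

def hash_categories(values, size=H):
    """Build an n-by-size indicator matrix without knowing the categories in advance."""
    X = np.zeros((len(values), size))
    for i, v in enumerate(values):
        X[i, phi(v, size)] = 1.0
    return X

# Example: unseen categories can be added at any time without changing the matrix shape.
X = hash_categories(["red", "green", "blue", "red"])
print(X)
```

Because the column index is computed from the category string itself, new categories encountered later hash into the same fixed set of H columns, at the cost of occasional collisions.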
The Bitwise Hashing Trick for Personalized Search
Published in Applied Artificial Intelligence, 2019
A common method for representing and comparing items such as documents or listing titles is a vector created by feature hashing, also known as "the hashing trick" (Attenberg et al. 2009; Weinberger et al. 2009). The vector is initialized to zero. Each feature is hashed to an index modulo the vector length, and the vector element at that index is incremented or, in some implementations, decremented based on a second sign hash. Items are then compared pairwise using a similarity function such as the cosine between the vectors.
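A minimal sketch of this signed hashed-vector representation (an illustration, not the paper's implementation). The vector length, the use of MD5 digests for the index and sign hashes, and the whitespace tokenizer are all assumptions.

```python
import hashlib
import math

DIM = 16  # assumed hashed vector length

def _hash(s):
    """Deterministic string hash via MD5 (an assumption; any good hash works)."""
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def hash_vector(text, dim=DIM):
    """Hash each whitespace token to an index modulo dim, then increment or
    decrement that element according to a second sign hash."""
    vec = [0.0] * dim
    for token in text.lower().split():
        index = _hash(token) % dim
        sign = 1.0 if _hash("sign::" + token) % 2 == 0 else -1.0
        vec[index] += sign
    return vec

def cosine(u, v):
    """Cosine similarity between two hashed vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Example: compare two listing titles via their hashed vectors.
a = hash_vector("vintage wooden coffee table")
b = hash_vector("antique wooden side table")
print(round(cosine(a, b), 3))
```

The sign hash makes collisions tend to cancel rather than accumulate, which keeps inner products between hashed vectors close to those of the original feature vectors.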