A feature-based intelligent deduplication compression system with extreme resemblance detection
Published in Connection Science, 2021
Xiaotong Wu, Jiaquan Gao, Genlin Ji, Taotao Wu, Yuan Tian, Najla Al-Nabhan
Delta compression encodes a target chunk (TC) as its difference from a similar source chunk (SC), which serves as the base, or reference, for the encoding. Only that difference is stored after compression. FIDCS-ERD implements delta encoding with Xdelta (MacDonald, 2000), one of the most effective approaches for compressing highly similar data streams. Xdelta uses a Copy/Insert algorithm: a string matching technique first finds the matching offsets in SC and TC, and the encoder then generates a Copy instruction for every matching range and Insert instructions to cover the unmatched regions. Xdelta is an approximate, hash-based implementation of the greedy algorithm. It runs in linear time and space, at the cost of sub-optimal compression.
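The Copy/Insert idea can be illustrated with a minimal Python sketch. This is not Xdelta's actual implementation or instruction format: the names `encode`, `decode`, and `BLOCK`, the tuple-based instructions, and the choice to index every source offset in a dictionary are assumptions made for illustration. The sketch hashes fixed-length windows of the source chunk, scans the target chunk for windows found in that index, extends each hit forward greedily, and emits Copy instructions for matches and Insert instructions for the bytes in between.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical instruction shapes, chosen for this sketch only.
Copy = Tuple[str, int, int]   # ("copy", source_offset, length)
Insert = Tuple[str, bytes]    # ("insert", literal_bytes)
Instruction = Union[Copy, Insert]

BLOCK = 4  # assumed seed length for a match; not a value taken from Xdelta


def encode(source: bytes, target: bytes, block: int = BLOCK) -> List[Instruction]:
    """Encode target as Copy/Insert instructions against source."""
    # Hash table of every block-length window in the source chunk.
    # Only the first offset per window is kept, which is one reason
    # the result can be sub-optimal.
    index: Dict[bytes, int] = {}
    for i in range(len(source) - block + 1):
        index.setdefault(source[i:i + block], i)

    delta: List[Instruction] = []
    literal = bytearray()  # unmatched bytes waiting to be emitted
    pos = 0
    while pos < len(target):
        src = None
        if pos + block <= len(target):
            src = index.get(target[pos:pos + block])
        if src is None:
            # No match starts here: this byte goes into an Insert.
            literal.append(target[pos])
            pos += 1
            continue
        # Greedily extend the match forward as far as both chunks agree.
        length = block
        while (src + length < len(source)
               and pos + length < len(target)
               and source[src + length] == target[pos + length]):
            length += 1
        if literal:
            delta.append(("insert", bytes(literal)))
            literal.clear()
        delta.append(("copy", src, length))
        pos += length
    if literal:
        delta.append(("insert", bytes(literal)))
    return delta


def decode(source: bytes, delta: List[Instruction]) -> bytes:
    """Rebuild the target chunk from the source chunk and the delta."""
    out = bytearray()
    for ins in delta:
        if ins[0] == "copy":
            _, offset, length = ins
            out += source[offset:offset + length]
        else:
            out += ins[1]
    return bytes(out)


if __name__ == "__main__":
    sc = b"the quick brown fox jumps over the lazy dog"
    tc = b"the quick red fox jumps over the lazy cat"
    delta = encode(sc, tc)
    assert decode(sc, delta) == tc
    for instruction in delta:
        print(instruction)
```

Because each target position is looked up once and a match is taken as soon as it is found, the sketch never goes back to search for a longer alternative. That mirrors the trade-off described above: a fast, hash-driven pass that may miss the best possible encoding.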