Reassembling the remnants of destroyed (shredded or ripped) documents enables recovery of valuable information in forensic investigations and archival research. In the field of information security, reassembling documents can also help us understand the limits of our defenses against adversaries' attempts to gain access to information. A computerized system that automatically restores fragmented archives to their complete digital form is not only a significant time saver, but also a valuable tool for converting these physical ruins into machine-readable digital representations for subsequent machine-assisted analysis.
We are developing fragment matching algorithms and metrics to support the local alignment computation of document pieces. We are also developing efficient composition algorithms to find a global solution that best reassembles all the fragments.
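The two stages described above (a local pairwise metric, then a global composition) can be illustrated with a minimal sketch. This is not the word-path metric or the solver from our paper: it assumes each stripe is a list of pixel rows with grayscale values, uses a plain border-pixel difference as a stand-in matching cost, and composes the stripes with a generic greedy edge-selection heuristic. The names `edge_cost`, `pairwise_costs`, and `greedy_compose` are hypothetical.

```python
from typing import List, Sequence

Stripe = Sequence[Sequence[float]]  # assumed layout: rows of grayscale pixel values


def edge_cost(left: Stripe, right: Stripe) -> float:
    """Stand-in local metric: mean absolute difference between the last
    pixel column of `left` and the first pixel column of `right`."""
    total = sum(abs(row_l[-1] - row_r[0]) for row_l, row_r in zip(left, right))
    return total / len(left)


def pairwise_costs(stripes: List[Stripe]) -> List[List[float]]:
    """cost[i][j] is the cost of placing stripe j directly to the right of i."""
    n = len(stripes)
    cost = [[float("inf")] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                cost[i][j] = edge_cost(stripes[i], stripes[j])
    return cost


def greedy_compose(cost: List[List[float]]) -> List[int]:
    """Greedy global composition: accept the cheapest remaining (i -> j) link
    as long as i has no right neighbour, j has no left neighbour, and the
    link does not close a cycle. Returns stripe indices from left to right."""
    n = len(cost)
    if n == 0:
        return []
    parent = list(range(n))  # union-find over partial chains, for cycle checks

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    right_of, has_left = {}, set()
    candidates = sorted(
        (cost[i][j], i, j) for i in range(n) for j in range(n) if i != j
    )
    for _, i, j in candidates:
        if i in right_of or j in has_left:
            continue
        root_i, root_j = find(i), find(j)
        if root_i == root_j:  # i and j already share a chain: would form a loop
            continue
        right_of[i] = j
        has_left.add(j)
        parent[root_i] = root_j
        if len(right_of) == n - 1:
            break

    # Walk the finished chain from its only stripe without a left neighbour.
    current = next(k for k in range(n) if k not in has_left)
    order = [current]
    while current in right_of:
        current = right_of[current]
        order.append(current)
    return order
```

Calling `greedy_compose(pairwise_costs(stripes))` returns one left-to-right ordering of the stripe indices. A greedy pass like this is fast but can lock in an early wrong link, which is why the quality of the pairwise metric matters so much for the global result.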
We maintain a public benchmark dataset of digital document stripe puzzles for comparative studies.
Reassembling Shredded Document Stripes Using Word-path Metric and Greedy Composition Optimal Matching Solver
Yongqing Liang and Xin Li
IEEE Transactions on Multimedia, 2019
The DocDataset contains:
Click here to download the DocDataset. Unzip the package and copy “gt” and “stripes” into the “/data/” folder of the repository.
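The unzip-and-copy step can also be scripted. The sketch below is only an illustration under stated assumptions: that “gt” and “stripes” are directories somewhere inside the archive, and that “/data/” means a `data` folder at the root of your local clone. The function name `install_dataset` and the archive path you pass to it are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def install_dataset(archive_path, repo_root, parts=("gt", "stripes")):
    """Unpack the dataset archive and copy the listed folders into <repo_root>/data.

    Assumes each name in `parts` is a directory inside the archive, either at
    its root or nested under a top-level folder.
    """
    data_dir = Path(repo_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Extract into a scratch folder first so only the wanted parts are kept.
    scratch = data_dir / "_unpacked"
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(scratch)

    installed = []
    for name in parts:
        matches = [p for p in scratch.rglob(name) if p.is_dir()]
        if not matches:
            shutil.rmtree(scratch)
            raise FileNotFoundError(f"no '{name}' folder found in {archive_path}")
        source = min(matches, key=lambda p: len(p.parts))  # shallowest match
        target = data_dir / name
        shutil.copytree(source, target, dirs_exist_ok=True)  # Python 3.8+
        installed.append(target)

    shutil.rmtree(scratch)
    return installed
```

For example, `install_dataset("DocDataset.zip", ".")` run from the repository root would leave `data/gt` and `data/stripes` in place, assuming the downloaded file is saved under that name.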
The following is a demo of our Stripe Document Reassembly pipeline (IEEE Trans. Multimedia 2019 paper). This video is also available at https://youtu.be/dBlXH8XppoQ