🤖 AI Summary
This study addresses the challenge of reconstructing fragmented multimedia files in digital forensics when filesystem metadata is unavailable. To overcome this limitation, the authors propose a generative file carving approach that applies a byte-level generative Transformer (bGPT) to multimedia file reconstruction for the first time. Rather than relying solely on classification strategies, the method predicts the subsequent bytes of BMP image fragments to synthesize missing data. The quality of the generated content is evaluated using multiple metrics: cosine similarity, the structural similarity index (SSIM), chi-square distance, and Jensen–Shannon divergence. Experimental results demonstrate that the proposed technique effectively captures byte-level patterns and significantly improves the accuracy of fragment matching within unallocated disk space.
📝 Abstract
Digital forensic investigations often face significant challenges when recovering fragmented multimedia files that lack file system metadata. While traditional file carving relies on signatures and discriminative deep learning models for fragment classification, these methods cannot reconstruct or predict missing data. We propose a generative approach to multimedia carving using bGPT, a byte-level Transformer designed for next-byte prediction. By feeding partial BMP image data into the model, we simulate the generation of likely fragment continuations. We evaluate the fidelity of these predictions using four metrics: cosine similarity, the structural similarity index (SSIM), chi-square distance, and Jensen–Shannon divergence (JSD). Our findings demonstrate that generative models can effectively predict byte-level patterns to support fragment matching in unallocated disk space.
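The paper does not publish its evaluation code, but three of the four metrics it names (cosine similarity, chi-square distance, and Jensen–Shannon divergence) can be computed directly from normalized byte-value histograms of a predicted continuation versus the true fragment; SSIM additionally requires decoding the bytes into a 2D image, so it is omitted here. A minimal stdlib-only Python sketch, with hypothetical `truth`/`pred` byte strings standing in for a real fragment and a model prediction:

```python
import math
from collections import Counter

def byte_histogram(data: bytes) -> list[float]:
    """Normalized 256-bin histogram of byte values (sums to 1)."""
    counts = Counter(data)
    n = len(data)
    return [counts.get(b, 0) / n for b in range(256)]

def cosine_similarity(p: list[float], q: list[float]) -> float:
    # 1.0 for identical distributions, 0.0 for disjoint support.
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def chi_square(p: list[float], q: list[float]) -> float:
    # Symmetric chi-square distance; 0.0 for identical histograms.
    return sum((x - y) ** 2 / (x + y) for x, y in zip(p, q) if x + y > 0)

def js_divergence(p: list[float], q: list[float]) -> float:
    # Jensen-Shannon divergence in bits; bounded in [0, 1] with log base 2.
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical example: score a predicted continuation against ground truth.
truth = bytes(range(256)) * 4   # stand-in for the true BMP fragment bytes
pred = bytes(range(256)) * 4    # stand-in for the model's predicted bytes
p, q = byte_histogram(truth), byte_histogram(pred)
print(cosine_similarity(p, q))  # -> 1.0 (identical distributions)
print(chi_square(p, q))         # -> 0.0
print(js_divergence(p, q))      # -> 0.0
```

Higher cosine similarity and lower chi-square/JSD indicate a closer match, which is the basis on which the paper ranks candidate fragments in unallocated space. Note that histogram-based metrics compare byte-value distributions, not byte order, so they complement rather than replace positional checks.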