🤖 AI Summary
This work addresses the challenge of digitizing handwritten notes by proposing the first end-to-end, unpaired "derendering" method that reconstructs arbitrary offline handwritten images, such as smartphone-captured paper-and-pen notes, into editable online digital ink. The approach integrates two priors, reading (text recognition) and writing (pen-tip dynamics), in a multimodal generative framework that jointly models geometric structure and semantic content without requiring paired offline–online training data. Evaluated on the HierText benchmark, the method produces valid ink traces for 87% of inputs, with 67% judged visually comparable to human-written trajectories, substantially outperforming existing geometry-only methods. It also demonstrates, for the first time, strong generalization to real-world handwritten notes and sketch-like drawings.
📝 Abstract
Digital note-taking is gaining popularity, offering a durable, editable, and easily indexable way of storing notes in vectorized form, known as digital ink. However, a substantial gap remains between this way of note-taking and traditional pen-and-paper note-taking, a practice still favored by the vast majority. Our work, InkSight, aims to bridge this gap by empowering physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting), a process we refer to as derendering. Prior research on the topic has focused on the geometric properties of images, resulting in limited generalization beyond the training domains. Our approach combines reading and writing priors, allowing a model to be trained without large amounts of paired samples, which are difficult to obtain. To our knowledge, this is the first work that effectively derenders handwritten text in arbitrary photos with diverse visual characteristics and backgrounds. Furthermore, it generalizes beyond its training domain to simple sketches. Our human evaluation reveals that 87% of the samples produced by our model on the challenging HierText dataset are considered a valid tracing of the input image, and 67% look like a pen trajectory traced by a human.