🤖 AI Summary
This study addresses the challenge of automatically detecting software plagiarism and copyright infringement, a task complicated by the diversity of digital artifacts. The authors review the legal and technical landscape and propose a classification of detection challenges based on artifact type. Building on this classification, they survey several similarity detection paradigms, including fingerprinting, software birthmarks, and code embeddings, and show how a subset of them can be applied within a unified, open-source platform named Project Martial. The system supports code plagiarism detection across artifact types and demonstrates, through real-world case studies, that combining complementary techniques improves both detection accuracy and applicability. Project Martial thus provides a reproducible tool to support both academic research and forensic practice in software copyright enforcement.
📝 Abstract
This paper explores the complexities of automatically detecting software similarity, with attention to the unique challenges posed by digital artifacts, and introduces Project Martial, an open-source solution for detecting code similarity. We enumerate existing approaches to countering software plagiarism by examining both the academic and legal landscapes, including notable lawsuits and court rulings that have shaped the understanding of software copyright infringement in commercial applications. Furthermore, we categorize the classes of detection challenges according to the available artifacts, survey techniques previously studied in the literature, including solutions based on fingerprinting, software birthmarks, or code embeddings, and illustrate how a subset of them can be applied in the context of Project Martial.
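To make the fingerprinting family named in the abstract concrete, the following is a minimal sketch of one well-known variant: k-gram hashing with winnowing, compared by Jaccard similarity. It is an illustration only and is not taken from Project Martial. The function names (`normalize`, `kgram_hashes`, `winnow`, `fingerprint`, `jaccard`) and the parameters `k` and `window` are hypothetical choices made for this example, and the normalization step is deliberately crude (lowercasing and whitespace removal, with no tokenization or identifier renaming).

```python
import hashlib
import re


def normalize(source: str) -> str:
    """Lowercase and drop all whitespace so pure formatting changes are ignored."""
    return re.sub(r"\s+", "", source.lower())


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every contiguous substring of length k with a stable (non-salted) hash."""
    return [
        int.from_bytes(hashlib.sha1(text[i:i + k].encode("utf-8")).digest()[:4], "big")
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes: list[int], window: int) -> set[int]:
    """Keep the minimum hash of each sliding window as the document fingerprint."""
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def fingerprint(source: str, k: int = 5, window: int = 4) -> set[int]:
    """Assumed defaults for k and window; real tools tune these per language."""
    return winnow(kgram_hashes(normalize(source), k), window)


def jaccard(a: set[int], b: set[int]) -> float:
    """Set overlap in [0, 1]; two empty fingerprints are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "int sum(int a, int b) { return a + b; }"
    reformatted = "int  sum(int a,int b)\n{\n  return a + b;\n}"
    print(jaccard(fingerprint(original), fingerprint(reformatted)))
```

Because this sketch only normalizes case and whitespace, it is robust to reformatting but not to identifier renaming or statement reordering. That limitation is one reason a survey of this kind also covers complementary approaches such as software birthmarks and code embeddings.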