🤖 AI Summary
This paper identifies "prompt-hacking" as a novel threat to research integrity: researchers strategically manipulating prompts to induce large language models (LLMs) to generate outputs aligned with preconceived hypotheses, analogous to p-hacking in statistics. Method: Drawing on critical analysis of LLMs' non-determinism, opacity, and systemic biases, and integrating perspectives from philosophy of science and research methodology, the study systematically conceptualizes and defines prompt-hacking and establishes its theoretical link to ethical misconduct in research. Contribution/Results: The paper argues that LLMs must not supplant rigorous statistical or qualitative analytical methods. It proposes three core safeguards: transparent, auditable prompt documentation; clearly defined boundaries for LLM use in empirical research; and human-supervised, traceable application protocols. Together these constitute a first conceptual framework and set of practical guidelines for the responsible integration of LLMs in scholarly inquiry.
📝 Abstract
As Large Language Models (LLMs) become increasingly embedded in empirical research workflows, their use as analytical tools raises pressing concerns for scientific integrity. This opinion paper draws a parallel between "prompt-hacking", the strategic tweaking of prompts to elicit desirable outputs from LLMs, and the well-documented practice of "p-hacking" in statistical analysis. We argue that the inherent biases, non-determinism, and opacity of LLMs make them unsuitable for data analysis tasks demanding rigor, impartiality, and reproducibility. We emphasize how researchers may inadvertently, or even deliberately, adjust prompts to confirm hypotheses, thereby undermining research validity. We advocate for a critical view of LLM use in research, transparent prompt documentation, and clear standards for when LLM use is appropriate. We discuss whether LLMs can replace traditional analytical methods and recommend that they be used only with caution, oversight, and justification.