Can Machines Imitate Humans? Integrative Turing Tests for Vision and Language Demonstrate a Narrowing Gap

📅 2022-11-23
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study investigates whether current AI systems can emulate human behavior in language and vision tasks well enough to pass a Turing-test-style discrimination task. To this end, the authors introduce a large-scale cross-modal Turing test benchmark spanning six tasks: image captioning, word association, dialogue, object detection, color estimation, and saliency (attention) prediction. Answers were collected from 549 human participants and 26 AI models, then evaluated in 25,650 Turing-like tests by 1,126 human judges and 10 AI judges under a double-blind, randomized design. Key contributions include: (1) the first systematic joint visual–linguistic Turing test; (2) empirical evidence that human-likeness correlates only weakly with conventional metrics (e.g., BLEU, mAP); (3) the finding that lightweight AI discriminators distinguish human from AI answers more accurately (62–71% accuracy) than human judges (35–48% error rate, i.e., 52–65% accuracy); (4) a formalization of human-likeness as an independent evaluation dimension; and (5) an open-source release of the benchmark datasets and a standardized evaluation protocol.
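Judges in a Turing-like test can be scored by accuracy or by error rate; the two are complements (error rate = 1 − accuracy), so either scale supports a direct comparison. The minimal Python sketch below assumes a hypothetical per-trial record holding the true source of an answer and the judge's verdict, and computes accuracy, error rate, and the fraction of AI answers judged human. The record type, function names, and toy trials are illustrative assumptions, not the paper's evaluation code or data.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One Turing-like judgment (hypothetical record format)."""
    answer_is_human: bool  # ground truth: was the answer produced by a human?
    judged_human: bool     # the judge's verdict: "this answer is human"


def accuracy(trials: list[Trial]) -> float:
    """Fraction of trials in which the judge identified the source correctly."""
    correct = sum(t.answer_is_human == t.judged_human for t in trials)
    return correct / len(trials)


def error_rate(trials: list[Trial]) -> float:
    """Complement of accuracy: fraction of misidentified answers."""
    return 1.0 - accuracy(trials)


def fooled_rate(trials: list[Trial]) -> float:
    """Fraction of AI-generated answers that the judge labeled as human."""
    ai_trials = [t for t in trials if not t.answer_is_human]
    if not ai_trials:
        raise ValueError("no AI-generated answers among the trials")
    return sum(t.judged_human for t in ai_trials) / len(ai_trials)


# Toy trials for illustration only (not data from the paper).
trials = [
    Trial(True, True), Trial(True, False), Trial(False, True),
    Trial(False, False), Trial(False, True), Trial(True, True),
]
print(accuracy(trials), error_rate(trials), fooled_rate(trials))
```

A judge who guesses at random sits near 0.5 accuracy, so accuracies well above that indicate a judge that reliably separates human from machine answers.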
📝 Abstract
As AI algorithms increasingly participate in daily activities, it becomes critical to ascertain whether the agents we interact with are human or not. To address this question, we turn to the Turing test and systematically benchmark current AIs in their abilities to imitate humans in three language tasks (Image captioning, Word association, and Conversation) and three vision tasks (Object detection, Color estimation, and Attention prediction). The experiments involved 549 human agents plus 26 AI agents for dataset creation, and 1,126 human judges plus 10 AI judges, in 25,650 Turing-like tests. The results reveal that current AIs are not far from being able to impersonate humans in complex language and vision challenges. While human judges were often deceived, simple AI judges outperformed human judges in distinguishing human answers from AI answers. The results of imitation tests are only minimally correlated with standard performance metrics in AI. Thus, evaluating whether a machine can pass as a human constitutes an important independent test to evaluate AI algorithms. The curated, large-scale, Turing datasets introduced here and their evaluation metrics provide new benchmarks and insights to assess whether an agent is human or not and emphasize the relevance of rigorous, systematic, and quantitative imitation tests in these and other AI domains.
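To make "minimally correlated with standard performance metrics" concrete, the sketch below computes a Spearman rank correlation between a per-model human-likeness score and a conventional metric. The scores are made-up placeholder values and the function names are hypothetical; which correlation statistic the paper itself uses is not stated here, so this is one plausible choice rather than the authors' procedure.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Placeholder per-model scores (hypothetical, not results from the paper).
standard_metric = [0.20, 0.25, 0.28, 0.30, 0.35]  # e.g., a BLEU-like score
human_likeness = [0.40, 0.45, 0.42, 0.38, 0.44]   # e.g., fraction judged human
print(f"{spearman(standard_metric, human_likeness):.2f}")
```

A rank-based statistic is used here because it only asks whether the two scores order the models the same way; a value near 0 means ranking models by the standard metric says little about how human-like their outputs appear.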
Problem

Research questions and challenges this paper addresses.

Assessing how well AI imitates humans in language and vision tasks
Benchmarking AI agents against human performance in Turing tests
Evaluating human-likeness as independent AI performance criterion
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI imitation tests across language and vision tasks
Large-scale Turing datasets for human-likeness benchmarking
Independent evaluation metrics beyond conventional performance measures
Mengmi Zhang
Assistant professor and PI of Deep NeuroCognition Lab, Nanyang Technological University, Singapore
neuroscience-inspired AI, computer vision, computational neuroscience, cognitive science
Giorgia Dellaferrera
IBM Research - Zurich, Rueschlikon, Switzerland
Ankur Sikarwar
College of Computing and Data Science, Nanyang Technological University, Singapore
M. Armendáriz
Children’s Hospital, Harvard Medical School, USA
Noga Mudrik
Biomedical Engineering, Johns Hopkins University, USA
Prachi Agrawal
Birla Institute of Technology and Science, Pilani, India
Spandan Madan
School of Engineering and Applied Sciences, Harvard University, USA
Andrei Barbu
MIT, CSAIL
Haochen Yang
Harvard University, USA
T. Kumar
Harvard University, USA
Meghna Sadwani
Jawaharlal Nehru Medical College, India
Stella Dellaferrera
University of Turin, Italy
Michele Pizzochero
School of Engineering and Applied Sciences, Harvard University, USA
H. Pfister
School of Engineering and Applied Sciences, Harvard University, USA
Gabriel Kreiman
Professor, Harvard Medical School and Children's Hospital
Artificial Intelligence, Computational Biology, Computational Neuroscience