Author of nine peer-reviewed publications (five as first author) at CVPR, ECCV, NeurIPS, and related venues. Contributed to projects such as 'Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training' and 'WildCAT3D: Appearance-Aware Multi-View Diffusion in the Wild'.
Research Experience
Currently leading a vision–language reasoning team at Meta Superintelligence Labs. Previously developed a complete text-to-3D generation stack that combines diffusion models with neural rendering techniques (NeRFs, Gaussian splatting, and mesh optimization) to create high-fidelity, editable 3D assets from natural-language prompts; a minimal sketch of the underlying idea follows.
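To make the text-to-3D pipeline concrete, here is a minimal, self-contained sketch of score distillation sampling (SDS), the standard recipe for letting a frozen text-conditioned diffusion model supervise a differentiable 3D representation. It is illustrative only, not the stack described above: `TinyField`, `DummyDenoiser`, and the simplified noising step are hypothetical stand-ins for a real NeRF/Gaussian renderer and a pretrained diffusion model.

```python
# Sketch of score distillation sampling (SDS): optimize a 3D representation so
# that its renders score well under a (frozen) text-conditioned diffusion model.
# All classes here are illustrative placeholders, not a real production stack.
import torch
import torch.nn as nn

class TinyField(nn.Module):
    """Stand-in for a NeRF / Gaussian-splat scene representation."""
    def __init__(self, res: int = 32):
        super().__init__()
        # Learnable RGB volume; a real system would use an MLP field or 3D Gaussians.
        self.voxels = nn.Parameter(torch.rand(3, res, res, res))

    def render(self) -> torch.Tensor:
        # Fake "render": average along one axis to get a (3, res, res) image.
        # A real renderer would ray-march or splat from a sampled camera pose.
        return self.voxels.mean(dim=1)

class DummyDenoiser(nn.Module):
    """Placeholder for a frozen, pretrained text-conditioned diffusion model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, noisy: torch.Tensor, t: torch.Tensor, prompt_emb: torch.Tensor):
        # Predicts the noise added at timestep t, conditioned on the prompt.
        return self.net(noisy)

def sds_step(field, denoiser, prompt_emb, optimizer):
    image = field.render().unsqueeze(0)      # (1, 3, H, W), differentiable
    t = torch.randint(1, 1000, (1,))         # random diffusion timestep
    noise = torch.randn_like(image)
    noisy = image + noise                    # schedule weighting omitted for brevity
    with torch.no_grad():
        pred = denoiser(noisy, t, prompt_emb)  # frozen model scores the render
    # SDS surrogate loss: its gradient w.r.t. the image is (pred - noise),
    # so the 2D diffusion prior steers the 3D parameters through the renderer.
    loss = ((pred - noise) * image).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

if __name__ == "__main__":
    field, denoiser = TinyField(), DummyDenoiser()
    prompt_emb = torch.randn(1, 77, 768)     # stand-in for a text encoder output
    opt = torch.optim.Adam(field.parameters(), lr=1e-2)
    for _ in range(10):
        sds_step(field, denoiser, prompt_emb, opt)
    print("optimized voxel grid:", field.voxels.shape)
```

The key design choice is that the diffusion model stays frozen; its noise-prediction error serves only as a gradient signal on rendered images, which is what allows a 2D image prior to supervise 3D parameters without any 3D training data.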
Education
Ph.D. from UCL, supervised by Iasonas Kokkinos; published three NLP papers as an undergraduate at NTUA and won an NAACL/SemEval sentiment competition.
Background
Multimodal GenAI researcher at Meta. Research interests include multimodal and generative models, reinforcement learning, text-to-3D diffusion, neural rendering (NeRFs, Gaussian splatting), and 3D geometry & reconstruction.
Miscellany
Open to collaboration on foundational multimodal research. Believes that the future of AI is inherently multimodal, with text being only the beginning.