The Book of Life approach: Enabling richness and scale for life course research

📅 2025-07-02
🤖 AI Summary
Life course research has long faced a trade-off between qualitative depth and quantitative breadth. This paper introduces the “book of life” approach. It transforms heterogeneous, multi-source log data (e.g., healthcare, education, and employment records) into textual representations of individual life trajectories that span multiple domains, unfold over time, and are placed in social context. The resulting text is intended for analysis with large language models (LLMs), which have strong pattern recognition capabilities on plain text. The authors release BOLT, an open-source book of life toolkit. Using Dutch population-scale registry data, they wrote over 100 million books of life covering many facets of life, demonstrating that the approach is feasible at scale and flexible across domains. The work proposes a new data representation for life course research within computational social science.

📝 Abstract
For over a century, life course researchers have faced a choice between two dominant methodological approaches: qualitative methods that analyze rich data but are constrained to small samples, and quantitative survey-based methods that study larger populations but sacrifice data richness for scale. Two recent technological developments now enable us to imagine a hybrid approach that combines some of the depth of the qualitative approach with the scale of quantitative methods. The first development is the steady rise of “complex log data,” behavioral data that is logged for purposes other than research but that can be repurposed to construct rich accounts of people's lives. The second is the emergence of large language models (LLMs) with exceptional pattern recognition capabilities on plain text. In this paper, we take a necessary step toward creating this hybrid approach by developing a flexible procedure to transform complex log data into a textual representation of an individual's life trajectory across multiple domains, over time, and in context. We call this data representation a “book of life.” We illustrate the feasibility of our approach by writing over 100 million books of life covering many different facets of life, over time and placed in social context using Dutch population-scale registry data. We open source the book of life toolkit (BOLT), and invite the research community to explore the many potential applications of this approach.
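
The abstract describes the core transformation only at a high level. Records from several domains become text that is ordered over time and placed in social context. As a rough illustration of that idea, here is a minimal sketch built on assumed inputs. It is not BOLT's API or the authors' procedure. The `LogRecord` schema, the `write_book_of_life` function, and the `relations` mapping are all hypothetical names introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LogRecord:
    """One row of log data (hypothetical schema, not BOLT's)."""
    person_id: str
    when: date
    domain: str  # e.g. "education", "employment", "health"
    event: str   # short description of the logged event


def write_book_of_life(person_id: str, records: list[LogRecord],
                       relations: dict[str, list[tuple[str, str]]]) -> str:
    """Render one person's records as chronological plain text.

    `relations` (hypothetical) maps a person to (relationship, other_id)
    pairs; events of related people are added as social context and
    tagged with the relationship.
    """
    by_year: dict[int, list[str]] = defaultdict(list)
    # The person's own trajectory across all domains.
    for r in records:
        if r.person_id == person_id:
            by_year[r.when.year].append(f"{r.when.isoformat()} [{r.domain}] {r.event}")
    # Social context: events logged for related people.
    for relationship, other_id in relations.get(person_id, []):
        for r in records:
            if r.person_id == other_id:
                by_year[r.when.year].append(
                    f"{r.when.isoformat()} [{r.domain}, {relationship}] {r.event}"
                )
    lines = [f"Book of life for person {person_id}"]
    for year in sorted(by_year):
        lines.append(f"\n{year}:")
        # Entries start with an ISO date, so string order is chronological.
        lines.extend(f"  - {entry}" for entry in sorted(by_year[year]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy records for illustration only; not real registry data.
    records = [
        LogRecord("A", date(2010, 9, 1), "education", "started secondary school"),
        LogRecord("A", date(2016, 7, 1), "employment", "started first job"),
        LogRecord("B", date(2012, 3, 15), "employment", "changed employer"),
    ]
    print(write_book_of_life("A", records, {"A": [("parent", "B")]}))
```

Grouping entries by year and sorting on the ISO-formatted date prefix keeps the narrative in chronological order. Tagging each related person's events with the relationship lets the text carry social context alongside the person's own multi-domain trajectory.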

Problem

Combining qualitative depth with quantitative scale in life course research
Transforming complex log data into textual life trajectory representations
Enabling large-scale analysis of rich life data using registry information

Innovation

Uses complex log data for rich life accounts
Produces textual life trajectories suited to analysis with LLMs
Creates scalable 'book of life' data representation

Mark D. Verhagen
Center for Information Technology Policy, Princeton University; Leverhulme Centre for Demographic Science, Oxford University; Amsterdam Health and Technology Institute
Benedikt Stroebl
Princeton University
AI agents, NLP, LLMs, reinforcement learning
Tiffany Liu
Department of Sociology, Princeton University; Office of Population Research, Princeton University
Lydia T. Liu
Assistant Professor of Computer Science, Princeton University
Machine Learning, Statistics, Decision Making, Algorithmic Fairness
Matthew J. Salganik
Department of Sociology, Princeton University
Social Networks, Computational Social Science, Quantitative Methods