🤖 AI Summary
This work proposes TLSQL, a declarative language with SQL-like syntax for specifying table learning tasks directly over relational databases, removing the need for explicit data export and extensive manual feature engineering. A lightweight Python library compiles TLSQL specifications into standard SQL queries, which the database engine executes natively, and structured learning task descriptions, which downstream table learning frameworks consume. Experiments on real-world datasets indicate that this approach lowers the barrier to integrating machine learning into database-centric workflows.
📝 Abstract
Table learning, which lies at the intersection of machine learning and modern database systems, has recently attracted growing attention. However, existing table learning frameworks typically require explicit data export and extensive feature engineering, creating a high barrier for database practitioners. We present TLSQL (Table Learning Structured Query Language), a system that enables table learning directly over relational databases via SQL-like declarative specifications. TLSQL is implemented as a lightweight Python library that translates these specifications into standard SQL queries and structured learning task descriptions. The generated SQL queries are executed natively by the database engine, while the task descriptions are consumed by downstream table learning frameworks. This design allows users to focus on modeling and analysis rather than low-level data preparation and pipeline orchestration. Experiments on real-world datasets demonstrate that TLSQL effectively lowers the barrier to integrating machine learning into database-centric workflows. Our code is available at https://github.com/rllm-project/tlsql/.
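The split described in the abstract, one declarative specification compiled into a SQL query for the database engine and a task description for a learning framework, can be illustrated with a small sketch. This is not TLSQL's actual syntax or API: `PredictSpec`, `compile_spec`, the `users` table, and its columns are all hypothetical names chosen for illustration, and SQLite stands in for the relational database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class PredictSpec:
    """Hypothetical stand-in for a declarative learning specification."""
    table: str
    target: str
    features: list[str]
    task_type: str  # e.g. "classification"


def compile_spec(spec: PredictSpec) -> tuple[str, dict]:
    """Split a spec into a plain SQL query and a task description."""
    columns = ", ".join([*spec.features, spec.target])
    sql = f"SELECT {columns} FROM {spec.table}"
    task = {
        "task_type": spec.task_type,
        "target": spec.target,
        "features": list(spec.features),
    }
    return sql, task


# Toy in-memory database standing in for the user's relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (age INTEGER, income REAL, churned INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(34, 52000.0, 0), (51, 31000.0, 1)],
)

spec = PredictSpec("users", "churned", ["age", "income"], "classification")
sql, task = compile_spec(spec)

# The SQL half runs natively in the database engine ...
rows = conn.execute(sql).fetchall()
# ... while the task half would be handed to a downstream learning framework.
print(sql)
print(task)
```

In this sketch the data never leaves the database until the generated query runs, and the task dictionary carries only metadata (target, features, task type). A real specification language would also need to cover details such as train/test splits and multi-table inputs, which the sketch leaves out.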