Research Vision

AI Database Systems

From the chaos of real-world data to a unified, agent-driven database for the AI era.

Figure: System overview (bottom to top). Heterogeneous data sources (CSV, images, JSON, text, logs, databases) are organized into a Semantic Knowledge Graph kept up to date by maintenance agents. Above it, a query engine (parse, optimize, execute) combines semantic and traditional operators and is driven by query agents.

01 — The Problem

Real-world data is wild.

Heterogeneous, unstructured, schema-less

Databases were designed for structured, schema-conforming data. Real-world data looks different. Scientific datasets, enterprise records, medical histories, and sensor streams each come in their own format and carry their own semantics. Moving and transforming data (ETL) is not enough. We need a layer that captures what the data means.

02 — Semantic Layer

Express meaning, not just data.

A Semantic Knowledge Graph as the foundation

We represent heterogeneous data through a Semantic Knowledge Graph. This unified structure captures the data itself along with its meaning and relationships. ETL pipelines only transform data. The graph makes the semantics explicit: which entities exist, how they relate, and what they represent across modalities.

📄 TurboLynx — VLDB 2026
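The Python sketch below makes the idea concrete. It builds a small semantic graph that merges facts about the same entity from different modalities and records which source each fact came from.

The following are hypothetical illustrations, not TurboLynx's data model:

- the classes `Entity` and `SemanticGraph`
- the `type:key` naming scheme
- the example sources

The sketch also assumes entity resolution has already happened, i.e. something else has decided that the clinical note refers to `patient:42`.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str                                     # hypothetical "type:key" identifier
    type: str                                   # semantic type, e.g. "Patient"
    props: dict = field(default_factory=dict)   # attributes merged from all sources
    sources: set = field(default_factory=set)   # provenance: raw sources mentioning it


class SemanticGraph:
    """Toy knowledge graph: typed entities plus labeled, directed edges."""

    def __init__(self):
        self.entities = {}             # entity id -> Entity
        self.edges = defaultdict(set)  # (src id, relation) -> set of dst ids

    def upsert(self, eid, etype, source, **props):
        # Merge into an existing entity rather than creating a duplicate per source.
        ent = self.entities.setdefault(eid, Entity(eid, etype))
        ent.props.update(props)
        ent.sources.add(source)
        return ent

    def relate(self, src, relation, dst):
        self.edges[(src, relation)].add(dst)

    def neighbors(self, src, relation):
        # .get avoids inserting empty entries into the defaultdict on lookup.
        return sorted(self.edges.get((src, relation), set()))


g = SemanticGraph()
# A row from a structured CSV file of admissions
g.upsert("patient:42", "Patient", "admissions.csv", age=67)
g.upsert("ward:icu", "Ward", "admissions.csv", name="ICU")
g.relate("patient:42", "admitted_to", "ward:icu")
# A fact extracted from a free-text clinical note about the same patient
g.upsert("patient:42", "Patient", "note_0193.txt", diagnosis="pneumonia")
# An image linked to the patient
g.upsert("image:xray_7", "Image", "xray_7.png")
g.relate("patient:42", "has_scan", "image:xray_7")

print(g.entities["patient:42"].props)  # {'age': 67, 'diagnosis': 'pneumonia'}
```

The CSV row and the note resolve to the same node. A lookup on `patient:42` therefore sees the age, the diagnosis, and the linked X-ray together, each traceable to its source.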

03 — Agents

Agents as graph architects.

Autonomous construction and maintenance

Constructing and maintaining a knowledge graph over constantly changing, heterogeneous data by hand does not scale. Autonomous agents continuously traverse, analyze, and update the semantic layer. They detect changes, resolve conflicts, and extend coverage as new data arrives.
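The following Python sketch shows a single maintenance pass. It rests on two assumptions:

- **Change detection:** changes are detected by hashing source content.
- **Conflict policy:** a less trusted source never overwrites a more trusted one, and ties go to the newest observation.

`MaintenanceAgent`, `Fact`, `toy_extract`, and the trust scores are hypothetical. In the envisioned system an agent (for example an LLM) would perform the extraction, and the `extract` callback stands in for it.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Fact:
    entity: str
    attr: str
    value: str
    source: str
    ts: int  # logical time at which the producing source version was processed


@dataclass
class MaintenanceAgent:
    extract: Callable[[str, str], list]         # stand-in for an LLM/agent extractor
    trust: dict = field(default_factory=dict)   # source name -> trust score (assumed)
    seen: dict = field(default_factory=dict)    # source name -> content hash
    facts: dict = field(default_factory=dict)   # (entity, attr) -> current Fact
    clock: int = 0

    def step(self, sources):
        """One pass over {source name: content}; returns the sources re-processed."""
        changed = []
        for name, content in sorted(sources.items()):
            digest = hashlib.sha256(content.encode()).hexdigest()
            if self.seen.get(name) == digest:
                continue                         # unchanged: skip costly extraction
            self.seen[name] = digest
            self.clock += 1
            changed.append(name)
            for entity, attr, value in self.extract(name, content):
                self._merge(Fact(entity, attr, value, name, self.clock))
        return changed

    def _merge(self, new):
        # Assumed policy: never let a less trusted source overwrite a more trusted one;
        # new facts always carry a later clock, so ties go to the newest observation.
        key = (new.entity, new.attr)
        old = self.facts.get(key)
        if old is None or self.trust.get(new.source, 0) >= self.trust.get(old.source, 0):
            self.facts[key] = new


def toy_extract(name, content):
    """Hypothetical extractor: each line 'entity attr=value' yields one fact."""
    facts = []
    for line in content.splitlines():
        entity, pair = line.split(" ", 1)
        attr, value = pair.split("=", 1)
        facts.append((entity, attr, value))
    return facts


agent = MaintenanceAgent(extract=toy_extract, trust={"ehr.log": 2, "notes.txt": 1})
data = {"ehr.log": "patient:42 status=admitted",
        "notes.txt": "patient:42 status=stable\npatient:42 ward=icu"}
first = agent.step(data)    # both sources are new; the EHR's status wins the conflict
second = agent.step(data)   # nothing changed, so no extraction runs
data["ehr.log"] = "patient:42 status=discharged"
third = agent.step(data)    # only ehr.log is re-extracted
print(agent.facts[("patient:42", "status")].value)   # discharged
```

Hashing lets the agent skip unchanged sources, so the expensive extraction step runs only on new or modified data. The sketch does not handle deleted sources or retracted facts.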

04 — Query Layer

Query the meaning, not just the bytes.

Full query engine for semantic data

On top of the semantic graph sits a full query engine that parses, plans, and executes queries over richly structured, multi-modal data. Agents issue queries on behalf of users and applications, navigating the full complexity of the knowledge layer.
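A toy conjunctive query over subject-predicate-object triples can illustrate the parse, plan, and execute stages. The following are assumptions for illustration, not the system's query language or optimizer:

- the one-pattern-per-line syntax
- the function names
- the greedy planner, which orders patterns by how many triples match their constants (a crude cardinality estimate)

```python
def parse(query):
    """Parse one 'subj pred obj' pattern per line; tokens starting with '?' are variables."""
    patterns = []
    for line in query.strip().splitlines():
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"expected 3 tokens, got: {line!r}")
        patterns.append(tuple(parts))
    return patterns


def is_var(tok):
    return tok.startswith("?")


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:   # variable already bound to something else
                return None
        elif p != t:                        # constant mismatch
            return None
    return new


def plan(patterns, triples):
    """Greedy plan: evaluate the pattern with the fewest matching triples first."""
    def estimate(pat):
        return sum(unify(pat, t, {}) is not None for t in triples)
    return sorted(patterns, key=estimate)


def execute(ordered, triples):
    """Nested-loop join: extend every partial binding with each matching triple."""
    bindings = [{}]
    for pat in ordered:
        extended = []
        for b in bindings:
            for t in triples:
                nb = unify(pat, t, b)
                if nb is not None:
                    extended.append(nb)
        bindings = extended
    return bindings


triples = [
    ("patient:42", "admitted_to", "ward:icu"),
    ("patient:7", "admitted_to", "ward:er"),
    ("patient:9", "admitted_to", "ward:icu"),
    ("ward:icu", "name", "ICU"),
    ("ward:er", "name", "ER"),
]
query = parse("""
?p admitted_to ?w
?w name ICU
""")
ordered = plan(query, triples)   # the selective "?w name ICU" pattern runs first
for row in execute(ordered, triples):
    print(row["?p"])             # patient:42, then patient:9
```

Starting from the most selective pattern keeps intermediate results small. A production optimizer would use maintained statistics and also weigh join connectivity rather than sorting patterns in isolation.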

05 — The Full Vision

The AI Database System.

Semantic Operators & Query Optimization

Some queries must reason over unstructured content, for example to classify text, match images, or extract relations. For these, we embed LLMs directly into query execution as Semantic Operators. LLM calls are expensive, so query optimization techniques such as proxy models and cardinality estimation are critical to making the system practical at scale.

📄 Semantic Op Optimization — SIGMOD 2027
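The Python sketch below illustrates both optimization ideas under simplifying assumptions:

- **`cascade_filter`** applies a semantic predicate with a model cascade. A cheap proxy model settles confident cases, and only uncertain rows reach the expensive oracle, which stands in for an LLM call.
- **`estimate_selectivity`** estimates a predicate's pass rate from a random sample.
- **`order_filters`** applies the classic rank rule cost / (1 - selectivity). This rule assumes independent predicates with a fixed per-row cost.

All names, thresholds, and the keyword-based proxy are hypothetical and are not drawn from the Semantic Op Optimization paper.

```python
import random


def cascade_filter(rows, proxy, oracle, lo=0.2, hi=0.8):
    """Semantic filter with a two-model cascade.

    proxy(row)  -> cheap estimate of P(predicate holds), in [0, 1]
    oracle(row) -> expensive, authoritative bool (stand-in for an LLM call)
    Rows scoring >= hi are accepted and rows scoring <= lo rejected without the oracle.
    """
    kept, oracle_calls = [], 0
    for row in rows:
        score = proxy(row)
        if score >= hi:
            kept.append(row)
        elif score > lo:              # uncertain: pay for the expensive call
            oracle_calls += 1
            if oracle(row):
                kept.append(row)
    return kept, oracle_calls


def estimate_selectivity(rows, oracle, k=20, seed=0):
    """Fraction of rows passing the predicate, estimated from a random sample of size k."""
    if not rows:
        return 0.0
    sample = random.Random(seed).sample(rows, min(k, len(rows)))
    return sum(map(oracle, sample)) / len(sample)


def order_filters(filters):
    """Order independent filters by rank = cost / (1 - selectivity), lowest rank first."""
    return sorted(filters, key=lambda f: f["cost"] / max(1e-9, 1.0 - f["sel"]))


def proxy(text):
    # Hypothetical proxy: keyword heuristics instead of a trained small model.
    return 0.9 if "fever" in text else (0.5 if "cough" in text else 0.1)


def oracle(text):
    # Hypothetical oracle: a rule standing in for an LLM's judgment.
    return "cough" in text and "mild" not in text


notes = ["chest pain and cough", "routine checkup", "severe cough, fever",
         "mild cough", "broken arm"]
kept, calls = cascade_filter(notes, proxy, oracle)
print(kept, calls)   # ['chest pain and cough', 'severe cough, fever'] 2

filters = [{"name": "llm_topic", "cost": 100.0, "sel": 0.5},
           {"name": "date_range", "cost": 1.0, "sel": 0.9}]
print([f["name"] for f in order_filters(filters)])   # ['date_range', 'llm_topic']
```

In this toy run the oracle is called for 2 of the 5 rows instead of all 5. The thresholds `lo` and `hi` trade cost against accuracy, because rows the proxy accepts or rejects on its own are never checked by the oracle.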