# Categorical Databases

## **Read This First**

The suggested introductory course of study includes:

- an introductory paper about CQL for computer scientists;
- introductory slides and a video about CQL from a functional programming perspective;
- introductory slides and a video about CQL from a mathematical perspective;
- introductory slides about category theory from a knowledge management perspective;
- a paper from NIST arguing that category theory is critical for IT interoperability;
- an introductory textbook on category theory; and
- a data science case study.

## Papers

- Ologs: A Categorical Framework for Knowledge Representation (2011) [Pre-print]
Describes how to use categories as database/ontology schemas.

- Functorial Data Migration (2013) [Pre-print]
Describes how to migrate data between categorical databases.

- Database Queries and Constraints via Lifting Problems (2013) [Pre-print]
Describes how certain common database queries and constraints can be encoded as topological lifts.

- Definition of CQL as a context-free grammar with equations (2015)
Describes a canonical syntax associated with CQL, as well as an axiomatic semantics.

- Relational Foundations for Functorial Data Migration (2015) [Pre-print] [slides]
Describes how to implement a fragment of CQL using SELECT/FROM/WHERE/UNION and a fresh-ID generator, and vice versa.

- QINL: Query-Integrated Languages (2015)
Describes the relationship between CQL and comprehension/monad syntax.

- Algebraic Model Management: A Survey (2016) [Pre-print]
Describes CQL entirely in terms of multi-sorted equational logic, and contrasts it with existing tools.

- Algebraic Databases (2017) [Pre-print]
Describes CQL in detail, and in particular, user-defined functions.

- Algebraic Data Integration (2017) [Pre-print] [slides] [Aggregation supplement] [video]
Describes CQL in detail, and in particular, how to implement it using automated theorem proving techniques and how to use it to integrate data.

- Informal Data Transformation Considered Harmful (2019)
Describes how CQL can preserve the data quality required to power machine learning algorithms through various data management tasks.

- Fast Left Kan Extensions Using the Chase (2022)
Describes CQL's sigma operation in detail, and in particular, how to implement it using a chase engine.

- Presenting Profunctors (2024)
Motivated by problems in categorical database theory, introduces and compares two notions of presentation for profunctors, uncurried and curried.

## Case Studies

- Using Category Theory to Facilitate Multiple Manufacturing Service Database Integration (2017) [Pre-print] [Companion Report]
Describes how CQL can be used for ontology-driven semantic search, and applied to commercial supply chains. Joint work with NIST, it extends the RDF-based approach previously pursued by NIST.

- Categorical Data Integration for Computational Science (2019) [Pre-print] [Code] [Slides]
Describes how CQL can be used to integrate scientific data sets, such as those in quantum chemistry. Joint work with Stanford.

- Compositional Models for Power Systems (2019) [Video] [Slides]
Describes how CQL can be used to integrate mathematical models of power grids in a modular way. Independent work by NIST.

- Financial Reporting Data Warehousing with CQL (2019)
Describes how CQL can be used to construct data warehouses suitable for client reporting in financial asset management.

- Algebraic Property Graphs (2019) [Coq code] [CQL code] [Java code]
Describes how CQL can be used to integrate enterprise knowledge graphs. Joint work with Uber.

- Relational to RDF Data Migration by Query Co-Evaluation (Draft, 2021)
Describes how CQL can be used to migrate relational data to RDF form, using the FIBO financial RDF ontology as an example.

- Consensus-Free Spreadsheet Integration (Draft, 2022)
Describes how CQL can be used to integrate spreadsheets. Joint work with Chevron.

## Presentations

- Boston Haskell (2011) [videos 1 2 3 4 5]
Gives an introduction to the math behind CQL.

- Oracle (2014)
Gives an introduction to the math behind CQL, tailored to database-centric audiences.

- Boston Haskell (2014) [video]
Gives an introduction to the math behind CQL, tailored to functional-programming audiences.

- Lambda Conf (2017) [video]
Gives an introduction to CQL for computer scientists.

- The Broad Institute (2017) [video]
Gives an introduction to the math behind CQL, and lessons learned from applying it in practice.

- Dataversity Architecture Summit (2017)
Gives an introduction to CQL tailored to data architects.

- Kensho (2019) [video]
Gives an introduction to CQL and the math behind it, and describes how it enables universal semantic IT interoperability.

## Related

- A Categorical Manifesto (1991)
Describes why category theory matters: how it is useful in computer science.

- A Model Theory for Generic Schema Management (2003)
Describes how category theory can be used to study schema mappings in the sense of traditional database theory.

- Formal Modelling and Application of Graph Transformations in the Resource Description Framework (2009)
Describes various categories of RDF graphs in functorial-semantics style.

- Allegories for Database Modeling (2013)
Describes a relational variation of the math behind CQL based on allegories rather than categories.

- Sketches as a Framework for Knowledge Management (2014)
Describes how CQL and related Sketch-based formalisms can be used to integrate knowledge representations.

- Entity-Attribute Sketches (2015)
Describes EASIK, a category-theoretic predecessor to CQL that now ships within, and interoperates with, CQL.

- Knowledge Representation in Bicategories of Relations (2017)
Describes the math behind the allegorical approach to database modeling.

- Category Theory Framework for Variability Models with Non-functional Requirements (2021)
Describes an application of CQL to software engineering. [video]

- Representing Knowledge and Querying Data using Double-Functorial Semantics (2024)
Describes how the abstract structure of a 'double category of relations' is a flexible and expressive language in which to represent knowledge.