Introduction to HTSQL

What is HTSQL?

HTSQL is a query language and web service for relational databases.

HTSQL is a Web Service

GET /school HTTP/1.1
output of /school query

HTSQL is a query language for the web. Queries are URLs that can be directly typed into a browser; the output could be returned in a variety of formats including HTML, CSV, JSON, etc.

HTSQL is a Relational Database Gateway

/school
SELECT "school"."code",
       "school"."name",
       "school"."campus"
FROM "ad"."school" AS "school"
ORDER BY 1 ASC

HTSQL wraps an existing relational database and translates incoming queries into SQL. The current version of HTSQL supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.

HTSQL is an Advanced Query Language

/school{name,
        count(program),
        count(department)}
SELECT "school"."name", COALESCE("program"."count", 0), COALESCE("department"."count", 0)
FROM "ad"."school" AS "school"
LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "program"."school_code" FROM "ad"."program" AS "program" GROUP BY 2) AS "program" ON ("school"."code" = "program"."school_code")
LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "department"."school_code" FROM "ad"."department" AS "department" GROUP BY 2) AS "department" ON ("school"."code" = "department"."school_code")
ORDER BY "school"."code" ASC

HTSQL is a compact, high-level navigational query language designed for data analysts and web developers.

Why not SQL?

Relational algebra is frequently inadequate for encoding business inquiries — elementary set operations do not correspond to meaningful data transformations. The SQL language itself is tedious, verbose, and provides poor means of abstraction. Yet, the relational database is an excellent tool for data modeling, storage and retrieval.

HTSQL reimagines what it means to query a database. The combination of a navigational model with data flows enables expressions that naturally reflect business inquiries. The HTSQL translator uses SQL as a target assembly language, which allows us to fix the query model and language while keeping current investment in relational systems.

To demonstrate this point, we walk through a set of business inquires expressed over a fictitious university schema.

This data model has two top-level tables, school and department, where department has an optional link to school. Subordinate tables, course and program, have mandatory links to their parents.

SQL Conflates Rows & Columns

“For each department, please show the department name and the corresponding school’s campus.”

This business inquiry clearly separates the requested rows (each department) and columns (department name and corresponding school’s campus), but this separation is lost when the query is encoded in SQL:

SELECT d.name, s.campus
FROM ad.department AS d
LEFT JOIN ad.school AS s
       ON (d.school_code = s.code);

In this SQL query, the FROM clause not only picks target rows, but also includes extra tables required to produce output columns. This conflation makes it difficult to determine business entities represented by each row of the output.

The HTSQL translation separates the row definition from the column selection. The linking is implicit, and correct. The encoded query can be read aloud as a verbal inquiry.

Conflating Projection with Aggregation

“How many departments by campus?”

This business inquiry asks for rows corresponding to each campus, and for each row, the number of correlated departments. In the schema, there isn’t a campus table, so we have to take distinct values of campus column from the school table. This operation is called projection.

SELECT s.campus, COUNT(d)
FROM ad.school AS s
LEFT JOIN ad.department AS d
  ON (s.code = d.school_code)
WHERE s.campus IS NOT NULL
GROUP BY s.campus;

For this SQL encoding, the GROUP BY clause combines two operations: projection and evaluating the aggregate COUNT(). This conflation causes a reader of the query some effort determining what sort of rows are returned and how the aggregate is related to those rows.

In the HTSQL query, we start with an explicit projection (the ^ operator), then we select correlated columns. This way, the aggregation is indicated separately as part of the column selector rather than being conflated with the row definition.

SQL Lacks Means of Encapsulation

“For each department, return the department name and the number of offered 100’s, 200’s, 300’s and 400’s courses.”

In this business inquiry, we are asked to evaluate the same statistic across multiple ranges.

SELECT d.name,
       COUNT(CASE WHEN c.no BETWEEN 100 AND 199 THEN TRUE END),
       COUNT(CASE WHEN c.no BETWEEN 200 AND 299 THEN TRUE END),
       COUNT(CASE WHEN c.no BETWEEN 300 AND 399 THEN TRUE END),
       COUNT(CASE WHEN c.no BETWEEN 400 AND 499 THEN TRUE END)
FROM ad.department AS d
LEFT JOIN ad.course AS c
       ON (c.department_code = d.code)
GROUP BY d.name;

This query is tedious to write and error prone to maintain since SQL provides no way to factor the repetitive expression COUNT(...).

The HTSQL translation avoids this duplication by defining a calculated attribute count_courses($level) on the department table and then evaluating it for each course level.

In SQL, Modest Complexity is Painful

“For each school with a degree program, return the school’s name, and the average number of high-credit (>3) courses its departments have.”

This business inquiry asks us to do the following:

  • pick records from the school table
  • keep only those with an associated degree program
  • for each school record, compute average of:
    • for each associated department, count:
      • associated courses with credits>3
SELECT s.name, o.avg_over_3
FROM ad.school AS s
JOIN ad.program AS p ON (p.school_code = s.code)
LEFT JOIN (
    SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
    FROM ad.department d
    LEFT JOIN (
        SELECT c.department_code, COUNT(c) AS over_3
        FROM ad.course AS c WHERE c.credits > 3
        GROUP BY c.department_code
    ) AS i ON (i.department_code = d.code)
    GROUP BY d.school_code
) AS o ON (o.school_code = s.code)
GROUP BY s.name, o.avg_over_3;

Not only is this SQL encoding is hard to read, it took several passes to get right — without the COALESCE you get results that look correct, but aren’t.

Each syntactic component of the HTSQL query is self-contained; when assembled, they form a cohesive translation of the business inquiry.

HTSQL in a Nutshell

HTSQL was designed from the ground up as a self-serve reporting tool for data analysts. With HTSQL, the easy stuff is truly easy; and, the complex stuff is easy too.

In this section we introduce the fundamentals of HTSQL syntax and semantics. For a more incremental approach, please read the HTSQL Tutorial.

Scalar Expressions

Literal values:

3.14159 ‘Hello World!’
3.14159 Hello World!

Algebraic expressions:

(3+4)*6
42

Predicate expressions:

(7<13)&(1=0|1!=0)
true

Filtering

Sieve operator produces records satisfying the specified condition:

code name campus
bus School of Business south
mus School of Music & Dance south

Sorting operator reorders records:

code name campus
ph Public Honorariums
sc School of Continuing Studies
eng School of Engineering north
art School of Art and Design old

Truncating operator takes a slice from the record sequence:

code name campus
art School of Art and Design old
bus School of Business south

Selection & Definition

Selection specifies output columns:

name campus
School of Art and Design old
School of Business south
College of Education old
School of Engineering north

Title decorator defines the title of an output column:

name # of Dept
School of Art and Design 2
School of Business 3
College of Education 2
School of Engineering 4

Calculated attributes factor out repeating expressions:

code num_dept
eng 4
la 5
mus 4
ns 4

References carry over values across nested scopes:

title credits
Financial Accounting 5
Audit 5
Accounting Internship 6
History of Art Criticism I 4

Aggregation

Aggregates convert plural expressions to singular values.

Scalar aggregates:

count(department)
27

Nested aggregates:

avg(school.count(department))
2.66666666667

Various aggregation operations:

name count(course) max(course.credits) sum(course.credits) avg(course.credits)
Accounting 12 6 42 3.5
Art History 20 6 70 3.5
Astronomy 22 5 66 3.0
Bioengineering 17 8 55 3.23529411765

Projection

Projection operator returns distinct values. This example returns distinct campus values from the school table:

campus
north
old
south

In the scope of the projection, school refers to all records from school table having the same value of campus attribute:

campus count(school)
north 1
old 4
south 2

Linking

Even though HTSQL provides automatic links inferred from foreign key constraints, arbitrary linking is also allowed:

name count(department)
School of Business 3
School of Engineering 4
School of Arts and Humanities 5
School of Music & Dance 4

This query returns schools with the number of departments above average among all schools.

What’s up Next?

We intend to add to HTSQL many more features in the future.

Hierarchical Output

HTSQL should not be limited to tabular output.

/school{name,
        /program{title},
        /department{name}}

This query is to generate a tree-shaped output: for each school, it produces the school name, a list of titles of associated programs, and a list of names of associated departments.

Analytical Processing

HTSQL should support OLAP cube operations.

/rollup(school^campus){campus, count(school.department)}

This query is to produce the number of departments per school’s campus followed by a total value for all campuses.

Recursive Queries

HTSQL should be able to construct hierarchies from parent-child relationships.

/program{title, /recurse(part_of){title}}

This query is to return programs together with a list of all dependent subprograms.

Table Of Contents

Previous topic

HTSQL Documentation

Next topic

HTSQL Tutorial