HTSQL is a query language and web service for relational databases.
HTSQL is a query language for the web. Queries are URLs that can be directly typed into a browser; the output could be returned in a variety of formats including HTML, CSV, JSON, etc.
/school
SELECT "school"."code",
"school"."name",
"school"."campus"
FROM "ad"."school" AS "school"
ORDER BY 1 ASC
HTSQL wraps an existing relational database and translates incoming queries into SQL. The current version of HTSQL supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.
/school{name,
count(program),
count(department)}
SELECT "school"."name", COALESCE("program"."count", 0), COALESCE("department"."count", 0)
FROM "ad"."school" AS "school"
LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "program"."school_code" FROM "ad"."program" AS "program" GROUP BY 2) AS "program" ON ("school"."code" = "program"."school_code")
LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "department"."school_code" FROM "ad"."department" AS "department" GROUP BY 2) AS "department" ON ("school"."code" = "department"."school_code")
ORDER BY "school"."code" ASC
HTSQL is a compact, high-level navigational query language designed for data analysts and web developers.
Relational algebra is frequently inadequate for encoding business inquiries — elementary set operations do not correspond to meaningful data transformations. The SQL language itself is tedious, verbose, and provides poor means of abstraction. Yet, the relational database is an excellent tool for data modeling, storage and retrieval.
HTSQL reimagines what it means to query a database. The combination of a navigational model with data flows enables expressions that naturally reflect business inquiries. The HTSQL translator uses SQL as a target assembly language, which allows us to fix the query model and language while keeping current investment in relational systems.
To demonstrate this point, we walk through a set of business inquires expressed over a fictitious university schema.
This data model has two top-level tables, school and department, where department has an optional link to school. Subordinate tables, course and program, have mandatory links to their parents.
“For each department, please show the department name and the corresponding school’s campus.”
This business inquiry clearly separates the requested rows (each department) and columns (department name and corresponding school’s campus), but this separation is lost when the query is encoded in SQL:
SELECT d.name, s.campus
FROM ad.department AS d
LEFT JOIN ad.school AS s
ON (d.school_code = s.code);
In this SQL query, the FROM clause not only picks target rows, but also includes extra tables required to produce output columns. This conflation makes it difficult to determine business entities represented by each row of the output.
name | campus |
---|---|
Accounting | south |
Art History | old |
Astronomy | old |
Bioengineering | north |
… | … |
The HTSQL translation separates the row definition from the column selection. The linking is implicit, and correct. The encoded query can be read aloud as a verbal inquiry.
“For each department, return the department’s name and number of courses having more than 2 credit hours.”
This business inquiry returns department records, and for each record summarizes associated courses meeting a particular criteria.
SELECT d.name, COUNT(SELECT TRUE FROM ad.course AS c
WHERE c.department_code = d.code
AND c.credits > 2)
FROM ad.department AS d;
For this SQL encoding, the WHERE clause of the subquery conflates the linking of course to department with the filter criteria.
SELECT d.name, COUNT(c)
FROM ad.department AS d
LEFT JOIN ad.course AS c
ON (c.department_code = d.code
AND c.credits > 2)
GROUP BY d.name;
In a common optimization, the correlated subquery is replaced with a GROUP BY projection. This encoding further obfuscates the business inquiry by conflating in two ways — row/column and link/filter.
name | count(course?credits>2) |
---|---|
Accounting | 11 |
Art History | 20 |
Astronomy | 16 |
Bioengineering | 15 |
… | … |
The HTSQL translation keeps the filter criteria separate from linking and the row definition separate from output columns. The query adheres the form of the original business inquiry.
“How many departments by campus?”
This business inquiry asks for rows corresponding to each campus, and for each row, the number of correlated departments. In the schema, there isn’t a campus table, so we have to take distinct values of campus column from the school table. This operation is called projection.
SELECT s.campus, COUNT(d)
FROM ad.school AS s
LEFT JOIN ad.department AS d
ON (s.code = d.school_code)
WHERE s.campus IS NOT NULL
GROUP BY s.campus;
For this SQL encoding, the GROUP BY clause combines two operations: projection and evaluating the aggregate COUNT(). This conflation causes a reader of the query some effort determining what sort of rows are returned and how the aggregate is related to those rows.
campus | count(school.department) |
---|---|
north | 4 |
old | 13 |
south | 7 |
In the HTSQL query, we start with an explicit projection (the ^ operator), then we select correlated columns. This way, the aggregation is indicated separately as part of the column selector rather than being conflated with the row definition.
“For each department, return the department name and the number of offered 100’s, 200’s, 300’s and 400’s courses.”
In this business inquiry, we are asked to evaluate the same statistic across multiple ranges.
SELECT d.name,
COUNT(CASE WHEN c.no BETWEEN 100 AND 199 THEN TRUE END),
COUNT(CASE WHEN c.no BETWEEN 200 AND 299 THEN TRUE END),
COUNT(CASE WHEN c.no BETWEEN 300 AND 399 THEN TRUE END),
COUNT(CASE WHEN c.no BETWEEN 400 AND 499 THEN TRUE END)
FROM ad.department AS d
LEFT JOIN ad.course AS c
ON (c.department_code = d.code)
GROUP BY d.name;
This query is tedious to write and error prone to maintain since SQL provides no way to factor the repetitive expression COUNT(...).
name | count_courses(1) | count_courses(2) | count_courses(3) | count_courses(4) |
---|---|---|---|---|
Accounting | 1 | 2 | 2 | 3 |
Art History | 2 | 5 | 4 | 3 |
Astronomy | 5 | 6 | 6 | 5 |
Bioengineering | 5 | 5 | 4 | 2 |
… | … | … | … | … |
The HTSQL translation avoids this duplication by defining a calculated attribute count_courses($level) on the department table and then evaluating it for each course level.
“For each school with a degree program, return the school’s name, and the average number of high-credit (>3) courses its departments have.”
This business inquiry asks us to do the following:
SELECT s.name, o.avg_over_3
FROM ad.school AS s
JOIN ad.program AS p ON (p.school_code = s.code)
LEFT JOIN (
SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
FROM ad.department d
LEFT JOIN (
SELECT c.department_code, COUNT(c) AS over_3
FROM ad.course AS c WHERE c.credits > 3
GROUP BY c.department_code
) AS i ON (i.department_code = d.code)
GROUP BY d.school_code
) AS o ON (o.school_code = s.code)
GROUP BY s.name, o.avg_over_3;
Not only is this SQL encoding is hard to read, it took several passes to get right — without the COALESCE you get results that look correct, but aren’t.
name | avg(department.count(course?credits>3)) |
---|---|
School of Art and Design | 9.0 |
School of Business | 3.66666666667 |
College of Education | 5.5 |
School of Engineering | 3.25 |
… | … |
Each syntactic component of the HTSQL query is self-contained; when assembled, they form a cohesive translation of the business inquiry.
HTSQL was designed from the ground up as a self-serve reporting tool for data analysts. With HTSQL, the easy stuff is truly easy; and, the complex stuff is easy too.
In this section we introduce the fundamentals of HTSQL syntax and semantics. For a more incremental approach, please read the HTSQL Tutorial.
Literal values:
3.14159 | ‘Hello World!’ |
---|---|
3.14159 | Hello World! |
Algebraic expressions:
[-]/(3+4)*6
(3+4)*6 |
---|
42 |
Predicate expressions:
(7<13)&(1=0|1!=0) |
---|
true |
Sieve operator produces records satisfying the specified condition:
code | name | campus |
---|---|---|
bus | School of Business | south |
mus | School of Music & Dance | south |
Sorting operator reorders records:
code | name | campus |
---|---|---|
ph | Public Honorariums | — |
sc | School of Continuing Studies | — |
eng | School of Engineering | north |
art | School of Art and Design | old |
… | … | … |
Truncating operator takes a slice from the record sequence:
code | name | campus |
---|---|---|
art | School of Art and Design | old |
bus | School of Business | south |
Selection specifies output columns:
name | campus |
---|---|
School of Art and Design | old |
School of Business | south |
College of Education | old |
School of Engineering | north |
… | … |
Title decorator defines the title of an output column:
name | # of Dept |
---|---|
School of Art and Design | 2 |
School of Business | 3 |
College of Education | 2 |
School of Engineering | 4 |
… | … |
Calculated attributes factor out repeating expressions:
code | num_dept |
---|---|
eng | 4 |
la | 5 |
mus | 4 |
ns | 4 |
References carry over values across nested scopes:
title | credits |
---|---|
Financial Accounting | 5 |
Audit | 5 |
Accounting Internship | 6 |
History of Art Criticism I | 4 |
… | … |
Aggregates convert plural expressions to singular values.
Scalar aggregates:
count(department) |
---|
27 |
Nested aggregates:
avg(school.count(department)) |
---|
2.66666666667 |
Various aggregation operations:
name | count(course) | max(course.credits) | sum(course.credits) | avg(course.credits) |
---|---|---|---|---|
Accounting | 12 | 6 | 42 | 3.5 |
Art History | 20 | 6 | 70 | 3.5 |
Astronomy | 22 | 5 | 66 | 3.0 |
Bioengineering | 17 | 8 | 55 | 3.23529411765 |
… | … | … | … | … |
Projection operator returns distinct values. This example returns distinct campus values from the school table:
campus |
---|
north |
old |
south |
In the scope of the projection, school refers to all records from school table having the same value of campus attribute:
campus | count(school) |
---|---|
north | 1 |
old | 4 |
south | 2 |
Even though HTSQL provides automatic links inferred from foreign key constraints, arbitrary linking is also allowed:
code | name | school_code |
---|---|---|
arthis | Art History | art |
stdart | Studio Art | art |
acc | Accounting | bus |
econ | Economics | bus |
… | … | … |
This query uses a linking operator to replicate an automatic link:
code | name | school_code |
---|---|---|
arthis | Art History | art |
stdart | Studio Art | art |
acc | Accounting | bus |
econ | Economics | bus |
… | … | … |
Forking operator links a table to itself by the given expression:
name | campus |
---|---|
School of Arts and Humanities | old |
School of Music & Dance | south |
School of Natural Sciences | old |
This query returns schools with the number of departments above average among all schools in the same campus. Using a linking operator, this query could be written as:
name | campus |
---|---|
School of Arts and Humanities | old |
School of Music & Dance | south |
School of Natural Sciences | old |
We intend to add to HTSQL many more features in the future.
Currently, the HTSQL processor is not quite user friendly. In the next major release we will focus on filling these gaps:
HTSQL should not be limited to tabular output.
/school{name,
/program{title},
/department{name}}
This query is to generate a tree-shaped output: for each school, it produces the school name, a list of titles of associated programs, and a list of names of associated departments.
HTSQL should support OLAP cube operations.
/rollup(school^campus){campus, count(school.department)}
This query is to produce the number of departments per school’s campus followed by a total value for all campuses.
HTSQL should be able to construct hierarchies from parent-child relationships.
/program{title, /recurse(part_of){title}}
This query is to return programs together with a list of all dependent subprograms.