SQL Pipe gives headaches but comes with benefits
14 March 2025
Databricks Runtime 16.2 introduced a new SQL pipe syntax that reverses the traditional SQL query structure. This feature lets you chain query operations with a pipe operator (|>), constructing queries step by step. While powerful, it initially feels awkward for those who have written SQL the same way for years.
In this article, I explore why the new syntax can be a headache for experienced SQL users, why it is still worth using (especially for dynamic queries), and how to leverage it in an object-oriented way with Python.
Classic SQL

Pipe SQL

The “Headache” Factor
SQL users may be surprised when they first encounter the pipe syntax: instead of the familiar SELECT … FROM … WHERE order, a pipe query starts with the FROM clause and chains the other parts after it. In other words, you write queries in execution order (FROM, then WHERE, then SELECT, and so on), not in the traditional textual order.
For example, a standard SQL query:
SELECT *
FROM users
WHERE age > 30;
would be written in pipe form as:
FROM users
|> WHERE age > 30;
For anyone used to writing the SELECT clause first, this feels unnatural.
One Reddit user humorously dubbed it the “weird inverted Yoda form” of SQL. Just as Yoda in Star Wars speaks with reversed grammar, the pipeline syntax forces us to write the query backward compared to everyday SQL habits. It does make logical sense — databases internally process the FROM and WHERE before SELECT, so the pipeline mirrors that execution order.
However, it conflicts with how humans have been trained to write SQL for decades, hence the initial headache. Long-time SQL developers might need to unlearn old habits to get comfortable writing FROM first and SELECT last.
Why is it worthwhile despite the headache?
You gain flexibility in constructing queries on the fly by breaking a query into chained logical pieces.
-- 1. Define variables for parts of the query
DECLARE base_table STRING DEFAULT 'users';
DECLARE age_filter STRING DEFAULT 'age > 30';
-- 2. Build the pipeline query string using concatenation
DECLARE pipeline_sql STRING;
SET VARIABLE pipeline_sql =
'FROM ' || base_table ||
' |> WHERE ' || age_filter ||
' |> SELECT name, age';
-- You can inspect the constructed query
SELECT pipeline_sql; -- FROM users |> WHERE age > 30 |> SELECT name, age
-- 3. Execute the dynamically constructed SQL query
EXECUTE IMMEDIATE pipeline_sql;
In the code above, we first declare some SQL variables: base_table holds the table name (in this case, 'users'), and age_filter holds the filter condition.
We then construct a variable pipeline_sql by concatenating the parts into one string. The result of the concatenation would be a query string like:
FROM users
|> WHERE age > 30
|> SELECT name, age
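The same assembly works just as well on the Python side. Here is a minimal sketch of a helper that joins a FROM clause and any number of pipe stages with the |> operator (the function name build_pipe_query is illustrative, not part of any Databricks API):

```python
def build_pipe_query(base_table: str, *stages: str) -> str:
    """Join a FROM clause and subsequent pipe stages with ' |> '."""
    return " |> ".join([f"FROM {base_table}", *stages])

query = build_pipe_query("users", "WHERE age > 30", "SELECT name, age")
print(query)  # FROM users |> WHERE age > 30 |> SELECT name, age
```

Because each stage is just a string, stages can be appended conditionally before the final join, which is exactly the flexibility the dynamic SQL example above relies on.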
Object-oriented query building in Python
Instead of manually concatenating parts of a SQL string (and worrying about spaces, commas, and where to insert clauses), you can represent the query as an object in your code and produce the pipeline SQL from that object.
Consider using a simple Python class to build an SQL pipe query. Each part of the query can be an attribute, and the class can have a method to create and execute the pipeline string.
Here is an example:
%python
class QueryBuilder:
    def __init__(self, table):
        self.table = table   # Table name for the FROM clause
        self.filter = None   # WHERE condition (optional)
        self.limit = None    # LIMIT value (optional)

    def set_filter(self, condition):
        """Specify a WHERE condition."""
        self.filter = condition

    def set_limit(self, n):
        """Specify a LIMIT for the query."""
        self.limit = n

    def build_pipeline_sql(self):
        """Assemble the SQL pipe string from the configured parts."""
        query = f"FROM {self.table}"
        if self.filter:
            query += f" |> WHERE {self.filter}"
        if self.limit:
            query += f" |> LIMIT {self.limit}"
        return query

    def execute(self):
        """Execute the built query using the Spark session (Databricks)."""
        sql_query = self.build_pipeline_sql()
        print(f"Executing: {sql_query}")
        return spark.sql(sql_query)
# Example usage:
qb = QueryBuilder("users")
qb.set_filter("age > 30")
qb.set_limit(10)
result_df = qb.execute()
result_df.show()
'''
Executing: FROM users |> WHERE age > 30 |> LIMIT 10
+-----+---+
| name|age|
+-----+---+
|Alice| 31|
|Cindy| 35|
+-----+---+
'''
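A small design variation worth considering: have the setters return self so calls can be chained fluently. The sketch below (class name FluentQueryBuilder is my own, and Spark execution is omitted) shows the idea:

```python
class FluentQueryBuilder:
    def __init__(self, table):
        self.parts = [f"FROM {table}"]  # stages collected in order

    def where(self, condition):
        self.parts.append(f"WHERE {condition}")
        return self  # returning self enables method chaining

    def limit(self, n):
        self.parts.append(f"LIMIT {n}")
        return self

    def build(self):
        return " |> ".join(self.parts)

sql = FluentQueryBuilder("users").where("age > 30").limit(10).build()
print(sql)  # FROM users |> WHERE age > 30 |> LIMIT 10
```

The fluent style mirrors the pipe syntax itself: each method call appends one stage, in the order it will execute.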
Conclusion
Databricks’s new SQL pipe syntax initially feels awkward, particularly for experienced SQL users who are used to writing queries in the classic style.
The FROM-first order can be a headache; it’s like learning to speak SQL in reverse. However, once the initial discomfort fades, the benefits become clear.
Pipe syntax enables more modular query construction, which is especially useful when queries must be generated at runtime.
Good news! You can download the code used in this article here.