Fork me on GitHub
#sql
<
2020-12-17
>
zilti12:12:39

I ran into an error today. The query is as follows:

["WITH distinct_location AS(
  SELECT MIN(city.city_id) AS city_id, location::TEXT as location_str FROM choicelist.city
   WHERE city.name ~ ?
   GROUP BY location::TEXT
)
SELECT * FROM choicelist.city AS city
 INNER JOIN distinct_location USING (city_id)
 ORDER BY city.city_id;" "EU Only)"]
Which gives this very concerning error message: org.postgresql.util.PSQLException: ERROR: invalid regular expression: parentheses () not balanced This sounds an awful lot like sql injection...

synthomat12:12:52

indeed, the input somehow leaked into the SQL query; what library is it?

zilti12:12:30

It's generated by HugSQL

zilti12:12:40

And the query indeed works fine as long as the parameter doesn't have unbalanced parentheses

zilti12:12:55

So, the chain is HugSQL -> jdbc.next -> PostgreSQL JDBC

zilti12:12:33

Yes, and according to that, value parameters get deferred "to the underlying database library to perform SQL parameter binding to prevent SQL injection issues" which would be jdbc.next in this case

curtis.summers12:12:41

In order to debug this, can you provide the original HugSQL query and the call to the generated HugSQL function?

zilti12:12:29

Sure, the original HugSQL query is:

-- :name get-city-by-name :? :1
WITH distinct_location AS(
  SELECT MIN(city.city_id) AS city_id, location::TEXT as location_str FROM choicelist.city
   WHERE city.name ~ :name
   GROUP BY location::TEXT
)
SELECT * FROM choicelist.city AS city
 INNER JOIN distinct_location USING (city_id)
 ORDER BY city.city_id;
And the call: (get-city-by-name db {:name "EU Only)"})

zilti12:12:29

(or, for the sake of getting that intermediate output, (get-city-by-name-sqlvec {:name "EU Only)"}) )

curtis.summers12:12:40

So, the error is in the regular expression syntax expected by Postgresql. invalid regular expression: parentheses () not balanced

zilti12:12:36

Ok, so no injection going on then

zilti12:12:41

That's a relief

zilti12:12:18

I would've been very surprised by the way, this would be such a basic thing, pretty sure this is exactly what the parametrized queries are for in the first place

zilti12:12:54

Now to find out why PostgreSQL wants to treat it as a regex... The database field type is CITEXT, a case-insensitive text field

zilti12:12:53

Ah lol, of course, because of the ~ operator

zilti12:12:14

Alright, thank you very much for helping me figure things out!

curtis.summers12:12:40

you’re welcome!