Fork me on GitHub
#sql
<
2017-05-12
>
josh.freckleton19:05:01

a|x
_|_
1|A
1|B

a|y
_|_
1|C
1|D

SELECT * FROM table1 LEFT JOIN table 2 ON table1.a = table2.a;

a| x | y
_|___|___
1| A | C
1| B | C
1| A | D
1| B | D
I have a left-join implemented in clojure code, and it doesn't join properly when multiple matches are in the left table, is the above code right, in that joining those tables should yield 4 rows?

josh.freckleton19:05:56

this seems to work in SQL, but my clojure conversion is struggling http://sqlfiddle.com/#!9/a18b6c/1

josh.freckleton19:05:42

I've implemented what I think is called the merge join which doesn't allow multiple matches in the left table

josh.freckleton19:05:56

since it iterates, the right table on matches

tanzoniteblack19:05:33

@josh.freckleton it looks like it’s doing it correctly; in terms of that for every row in table1 there will be a resulting column for every row in table2 where table1.a = table2.a; from what you showed there’s 2 rows in table1 with a = 1 and 2 rows in table2 with a = 1, so you’d expect 2*2 = 4 rows for every possible combination of table1.x and table2.y where a = 1

tanzoniteblack19:05:31

if you only want a single row returned, then you probably want to do a group by table1.a and use an aggregate function to define which x and y values should be returned per every unique value for a

josh.freckleton19:05:45

thanks @tanzoniteblack I do want 4 rows returned the details of the merge left join have you iterate down both lists simultaneously, and everytime there's a match, iterate to the next right-table-row. But, if there are multiple rows in the left table with the same key, well those don't get matched multiple times

josh.freckleton19:05:03

so my code should return 4 rows, but the second row in table 1 is getting left out

josh.freckleton19:05:13

is there an elegant pseudo code for left joining two tables (and one can assume that the rows come in sorted)

tanzoniteblack19:05:17

I don’t think I understand what you’re wanting, or what’s going wrong at least

josh.freckleton19:05:47

I just want to recreate a SQL-esque left join in clojure

josh.freckleton19:05:56

and my implementation is faulty

josh.freckleton19:05:10

I know why it's faulty, just not how to fix it (elegantly)

tanzoniteblack19:05:49

so you’re for something that would allow you to do:

(let [table1 [{:a 1 :x "a"}
              {:a 1 :x "b"}]
      table2 [{:a 1 :y "c"}
              {:a 1 :y "d"}]]
  (some-join-fn table1 table2 :a))
;; ==>
[{:a 1 :x "a" :y "c"}
 {:a 1 :x "b" :y "c"}
 {:a 1 :x "a" :y "d"}
 {:a 1 :x "b" :y "d"}]

josh.freckleton19:05:46

here's what I have, but notice, the cond (= 0 c) is when the keys match and then it iterates the right table, but imagine:

(left-join [{:a :a.1 :x :x.1} {:a :a.1 :x :x.2}]
              [{:a :a.1 :b :b.1}]
              {:a :a})
the right table doesn't have the extra rows, the left does, so my iterating of the right rows will miss out on left rows that should have been captured

josh.freckleton19:05:13

I could put some gnarly loopy thing in that cond for (= 0 c), but, that'd be ugly

josh.freckleton19:05:05

the keys to match on are the 3rd input to that fn, so {:left-tables-key :right-tables-key}

tanzoniteblack19:05:18

(let [table1 [{:a 1 :x "a"}
              {:a 1 :x "b"}]
      table2 [{:a 1 :y "c"}
              {:a 1 :y "d"}]
      test-key :a]
  (reduce (fn [coll m]
            (concat coll (map #(merge m %)
                              (filter #(= (get table2 test-key)
                                          (get m test-key))
                                      table2))))
          []
          table1))

tanzoniteblack19:05:55

it’s not efficient, since it will look over every item in table2 for every item in table1, but it does give you the answer you want

josh.freckleton20:05:09

wow, thanks for drafting that up!

josh.freckleton20:05:34

re efficiency, I am passing throw (up to) millions of rows...

josh.freckleton20:05:48

so I was hoping I could take advantage of the sorted nature of the incoming data

josh.freckleton20:05:57

by walking both lists at once

tanzoniteblack20:05:21

you probably could; if you re-wrote the reduce to be based off of loop/recur

hiredman20:05:37

there is a join function in clojure.set, but I ma not sure how well it would work

tanzoniteblack20:05:09

(loop [current-1 (first table1)
       table1 (rest table1)
       table2 ...
       out-coll []]
  (if (= (:key (first table2))
         (:key current-1))
    (recur current-1
           table1
           (rest table2)
           (conj out-coll(merge (first table2) current-1)))
    (recur (first table1)
           (rest table1)
           table2
           out-coll)))

tanzoniteblack20:05:52

something like that would cause you to only walk through the rows once if you know for certain they’re already sorted by the same key

tanzoniteblack20:05:05

(you might have to play with that to get it right…I didn’t actually test that one)

tanzoniteblack20:05:36

like…I forgot to end the loop

tanzoniteblack20:05:59

actually…my loop/recur wouldn’t work because you need to use the table2 multiple times

josh.freckleton20:05:51

@hiredman that gist looks inspiring i'm not familiar with that reducing lib, but it has monoids and stuff? cool! so I'm guessing this is even more general than using lists, i'll have to dissect more

hiredman20:05:36

the gist is 3 years old, I am not sure if I would bother with the reducers stuff today, the main thing I wanted from the reducers library was parallel folding, which I think that gist may not actually do a good job with.

hiredman20:05:17

but modula how the collection processing is actually done, I think it is a pretty classic hash join

josh.freckleton20:05:17

looking at the wiki, it has an advance left and advance right, that's the part that my implementation above needs

josh.freckleton20:05:23

I just have advance right