This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-05-12
Channels
- # bangalore-clj (1)
- # beginners (28)
- # boot (33)
- # chestnut (3)
- # cider (35)
- # cljs-dev (64)
- # cljsrn (16)
- # clojure (95)
- # clojure-android (6)
- # clojure-austin (1)
- # clojure-italy (5)
- # clojure-korea (1)
- # clojure-russia (55)
- # clojure-sg (1)
- # clojure-spec (25)
- # clojure-uk (57)
- # clojurescript (120)
- # code-reviews (7)
- # community-development (2)
- # core-async (3)
- # cursive (6)
- # data-science (3)
- # datascript (10)
- # datomic (12)
- # devcards (1)
- # emacs (9)
- # gsoc (7)
- # hoplon (18)
- # lumo (2)
- # off-topic (10)
- # om (24)
- # onyx (17)
- # pedestal (46)
- # powderkeg (1)
- # protorepl (7)
- # re-frame (31)
- # ring-swagger (34)
- # spacemacs (10)
- # specter (9)
- # sql (39)
- # unrepl (9)
- # untangled (3)
- # utah-clojurians (1)
table1:
a | x
--+--
1 | A
1 | B
table2:
a | y
--+--
1 | C
1 | D
SELECT * FROM table1 LEFT JOIN table2 ON table1.a = table2.a;
a | x | y
--+---+--
1 | A | C
1 | B | C
1 | A | D
1 | B | D
I have a left join implemented in Clojure, and it doesn't join properly when there are multiple matches in the left table
is the above code right, in that joining those tables should yield 4 rows? this seems to work in SQL, but my Clojure conversion is struggling: http://sqlfiddle.com/#!9/a18b6c/1
I've implemented what I think is called the merge join, which doesn't allow multiple matches in the left table, since it iterates the right table on matches
@josh.freckleton it looks like it’s doing it correctly, in the sense that for every row in table1 there will be a resulting row for every row in table2 where table1.a = table2.a
from what you showed there are 2 rows in table1 with a = 1 and 2 rows in table2 with a = 1, so you’d expect 2*2 = 4 rows, one for every possible combination of table1.x and table2.y where a = 1
if you only want a single row returned, then you probably want to do a GROUP BY on table1.a and use an aggregate function to define which x and y values should be returned for each unique value of a
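The 2*2 reasoning above can be sketched directly in Clojure as a for comprehension over the example data (an illustration of the cross product of matching rows, not code from the thread):

```clojure
;; Illustration only: every left row pairs with every right row
;; that shares the same join-key value, giving 2*2 = 4 results.
(def table1 [{:a 1 :x "A"} {:a 1 :x "B"}])
(def table2 [{:a 1 :y "C"} {:a 1 :y "D"}])

(for [l table1
      r table2
      :when (= (:a l) (:a r))]
  (merge l r))
;; => ({:a 1, :x "A", :y "C"} {:a 1, :x "A", :y "D"}
;;     {:a 1, :x "B", :y "C"} {:a 1, :x "B", :y "D"})
```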
thanks @tanzoniteblack, I do want 4 rows returned. The details of the merge left join have you iterate down both lists simultaneously, and every time there's a match, iterate to the next right-table row. But if there are multiple rows in the left table with the same key, those don't get matched multiple times
so my code should return 4 rows, but the second row in table1 is getting left out
is there elegant pseudocode for left-joining two tables (one can assume that the rows come in sorted)?
I don’t think I understand what you’re wanting, or what’s going wrong at least
I just want to recreate a SQL-esque left join in clojure
and my implementation is faulty
I know why it's faulty, just not how to fix it (elegantly)
ohhhhhh
so you’re looking for something that would allow you to do:
(let [table1 [{:a 1 :x "a"}
              {:a 1 :x "b"}]
      table2 [{:a 1 :y "c"}
              {:a 1 :y "d"}]]
  (some-join-fn table1 table2 :a))
;; ==>
[{:a 1 :x "a" :y "c"}
 {:a 1 :x "b" :y "c"}
 {:a 1 :x "a" :y "d"}
 {:a 1 :x "b" :y "d"}]
perfect!
here's what I have, but notice the cond (= 0 c) is when the keys match, and then it iterates the right table, but imagine:
(left-join [{:a :a.1 :x :x.1} {:a :a.1 :x :x.2}]
           [{:a :a.1 :b :b.1}]
           {:a :a})
the right table doesn't have the extra rows, the left does, so my iterating of the right rows will miss left rows that should have been captured
I could put some gnarly loopy thing in that cond for (= 0 c), but that'd be ugly
the keys to match on are the 3rd input to that fn, so {:left-tables-key :right-tables-key}
(let [table1 [{:a 1 :x "a"}
              {:a 1 :x "b"}]
      table2 [{:a 1 :y "c"}
              {:a 1 :y "d"}]
      test-key :a]
  (reduce (fn [coll m]
            (concat coll (map #(merge m %)
                              (filter #(= (get % test-key)
                                          (get m test-key))
                                      table2))))
          []
          table1))
it’s not efficient, since it will look over every item in table2 for every item in table1, but it does give you the answer you want
wow, thanks for drafting that up!
re efficiency, I am passing through (up to) millions of rows...
so I was hoping I could take advantage of the sorted nature of the incoming data
by walking both lists at once
you probably could; if you re-wrote the reduce
to be based off of loop/recur
(loop [current-1 (first table1)
       table1 (rest table1)
       table2 ...
       out-coll []]
  (if (= (:key (first table2))
         (:key current-1))
    (recur current-1
           table1
           (rest table2)
           (conj out-coll (merge (first table2) current-1)))
    (recur (first table1)
           (rest table1)
           table2
           out-coll)))
something like that would cause you to only walk through the rows once if you know for certain they’re already sorted by the same key
(you might have to play with that to get it right…I didn’t actually test that one)
like…I forgot to end the loop
actually…my loop/recur wouldn’t work because you need to use the table2 multiple times
@hiredman that gist looks inspiring. I'm not familiar with that reducing lib, but it has monoids and stuff? cool! so I'm guessing this is even more general than using lists, I'll have to dissect it more
the gist is 3 years old, I am not sure if I would bother with the reducers stuff today, the main thing I wanted from the reducers library was parallel folding, which I think that gist may not actually do a good job with.
but modulo how the collection processing is actually done, I think it is a pretty classic hash join
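For reference, a classic hash left join can be sketched in a few lines of Clojure: build a hash index over the right table once with group-by, then probe it for each left row. The name `left-hash-join` and its signature are illustrative, not taken from the gist:

```clojure
;; Sketch of a hash left join.  Build cost is O(|right|); each left
;; row then probes the index in (amortized) constant time.
(defn left-hash-join
  [left right left-key right-key]
  (let [index (group-by right-key right)]
    (mapcat (fn [l]
              (if-let [matches (get index (left-key l))]
                (map #(merge l %) matches)   ; one output row per match
                [l]))                        ; no match: keep the left row
            left)))

(left-hash-join [{:a 1 :x "A"} {:a 1 :x "B"}]
                [{:a 1 :y "C"} {:a 1 :y "D"}]
                :a :a)
;; => ({:a 1, :x "A", :y "C"} {:a 1, :x "A", :y "D"}
;;     {:a 1, :x "B", :y "C"} {:a 1, :x "B", :y "D"})
```

Unlike the filter-based reduce above, this only walks the right table once, but it does hold the whole right-side index in memory.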
looking at the wiki, it has an advance left and an advance right, and that's the part that my implementation above needs
I just have advance right
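A sorted merge left join with both an advance-left and an advance-right step might look like the following sketch. It assumes both inputs are already sorted by their join keys; all names (`merge-left-join` and its parameters) are illustrative, not from the thread. The key trick for duplicate left keys is to take the whole run of equal-key right rows with take-while, so the run can be replayed for the next left row:

```clojure
;; Sketch of a sorted merge left join.  Both tables must be sorted
;; (ascending) by their join keys.
(defn merge-left-join
  [left right left-key right-key]
  (loop [ls left, rs right, out []]
    (if-let [l (first ls)]
      (let [k   (left-key l)
            ;; advance right: skip right rows whose key is still too small
            rs  (drop-while #(neg? (compare (right-key %) k)) rs)
            ;; grab the whole run of right rows with an equal key, so a
            ;; duplicate left key can replay the same run
            run (take-while #(= (right-key %) k) rs)]
        (recur (rest ls)   ; advance left
               rs          ; keep the run available for the next left row
               (into out (if (seq run)
                           (map #(merge l %) run)
                           [l]))))           ; unmatched left row survives
      out)))
```

Each right row is dropped at most once across the whole loop; only the runs of equal keys are re-scanned, once per duplicate left key, which is what the advance-right-only version was missing.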