This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-12-29
Did I do a good job here: https://github.com/RoelofWobben/clojure_ground_up/blob/main/src/ground_up/chapter7.clj
looks pretty good. a few thoughts:
• it seems like most-duis could be written in terms of most-prevalent
• most-prevalent receives a file and immediately calls load-json. I would separate the data processing from the data loading. it's very common to want to load data from different sources and reuse the data processing functions
• calculate_prevalence should probably be calculate-prevalence
• sort-by requires loading the full data set into memory. since you only need the 10 most prevalent values, most-prevalent could be re-written to keep at most 10 values in memory so that a larger than memory data set could be processed
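A rough sketch of the second point, separating processing from loading. The names summarize-county and most-prevalent-in are hypothetical and not from the reviewed code; the :county lookup is left out so the example runs without the JSON files.

```clojure
;; Hypothetical helper names; only the idea (pure processing, loading at the
;; edge) comes from the review comment above.
(defn summarize-county
  "Builds the report row for one county map."
  [field-name county]
  (let [population (:county_population county)
        reports    (field-name county)]
    {:prevalance   (if (zero? population) 0.0 (double (/ reports population)))
     :report-count reports
     :population   population}))

(defn most-prevalent-in
  "Pure data processing: takes any sequence of county maps and returns the
  n rows with the highest prevalence, highest first."
  [field-name n counties]
  (->> counties
       (map #(summarize-county field-name %))
       (sort-by :prevalance)
       (take-last n)
       (reverse)))

;; Loading stays with the caller, for example:
;; (most-prevalent-in :auto_thefts 10 (load-json "2008.json"))
```

Because most-prevalent-in only sees a sequence, it can be tried at the REPL with a few hand-written maps, or fed from a source other than a file.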
I think rewriting it so that the memory consumption is O(1) rather than O(n) is a good exercise.
if no solution comes to mind, I would try first figuring out 1. how to find the maximum element of a lazy sequence with O(1) memory 2. how to find the 2 largest elements ... 3. finally, how to find the n largest elements ...
the same techniques would apply in just about any language
are you familiar with reduce?
no problem. have a good night
I can answer a few questions
the map in most-prevalent is fine
maybe just try to write a function, find-max, that takes a sequence as an argument and returns the maximum value
with respect to most-prevalent, the goal is to replace:
(sort-by :prevalance)
(take-last 10)
eventually with something like:
(map :prevalance)
(find-largest-elements 10)
max would work, but the next goal is to extend find-max to return the two largest elements rather than just the largest
looks great
constant memory, and if given a lazy sequence, it can process larger than memory data sets
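The find-max being praised here is not shown in the log. A minimal sketch of what such a function could look like, assuming a non-empty sequence of numbers:

```clojure
(defn find-max
  "Returns the largest number in coll. Only the running maximum is kept,
  so memory use does not grow with the length of coll.
  Assumes coll is non-empty."
  [coll]
  (reduce (fn [best x] (if (> x best) x best))
          coll))

(find-max [3 9 2])       ; => 9
(find-max (range 1000))  ; => 999
```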
now, can you improve the function to find the two largest elements?
(defn top-two [[big1 big2 :as acc] x]
(cond
(> x big1) [x big1]
(> x big2) [big1 x]
:else acc))
(defn find-two-max [sequence]
(reduce top-two [0 0] sequencel))
looks like there's a bug
this seems to work
(ns clojure.examples.hello
(:gen-class))
(defn top-two [[big1 big2 :as acc] x]
(cond
(> x big1) [x big1]
(> x big2) [big1 x]
:else acc))
(defn find-two-max [sequence]
(reduce top-two [0 0] sequence))
(print (find-two-max '(1 2 3 )))
try (top-two [1 2] 2)
the answer should be [2 2]
(defn top-two [[big1 big2 :as acc] x]
(cond
(= x big2) [big2 big2 ]
(> x big1) [x big1]
(> x big2) [big1 x]
:else acc))
(defn find-two-max [sequence]
(reduce top-two [0 0] sequence))
(print (top-two [1 2] 2))
now gives the right answer
there's still an issue:
(top-two [1 3] 2) ;; [2 1]
(defn top-two [[big1 big2 :as acc] x]
(cond
(> x big1) [x big1]
(> x big2) [big1 x]
:else acc))
the structure is fine, but you should double check the returns
which cond branch is the test case taking?
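One way to read the two failing test cases: top-two compares x against big1 first, so it only gives the expected answer when acc is already ordered largest-first. The pairs [1 2] and [1 3] are not in that order. A hedged sketch of a variant that makes no assumption about the order of acc (top-two-any-order is a hypothetical name, not from the conversation):

```clojure
(defn top-two-any-order
  "Returns the two largest of the values in acc plus x, largest first,
  whatever order acc arrives in."
  [acc x]
  (->> (conj acc x)
       (sort >)
       (take 2)
       (vec)))

(top-two-any-order [1 2] 2)  ; => [2 2]
(top-two-any-order [1 3] 2)  ; => [3 2]
```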
ok, now how would you extend find-two-max to find-n-max?
but then it is possible that too many or too few numbers are in it because its size is not fixed
that's ok. sometimes it takes a little bit of time
to come up with an answer
especially if it's not similar to other problems you've worked on
I get the feeling I'm close but I'm missing a few pieces or do not know how the pieces fit together
it might make it easier if acc was sorted
I was referring to acc in top-two
I guess top-two would need to be top-n
so acc is the collection we're accumulating into that will contain the top n items. if acc was sorted, is there a way to check if a new value, x, should be added, replace a value, or not added?
when I have (1 2) and the number 3, I can compare it to the 2, keep the 2, and add the 3
when I have (1 5 9) and the number 8, I can compare it to the last in the collection and that is not true, so nothing is changed
then I can compare it to the second-to-last one and that is true, so I replace that number with the given number
when I have (1 5 7) and an 8, I can compare it again with the last one; that is true, so I replace it with the 8
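One possible answer to the add / replace / not-added question, sketched under the assumption that acc is a vector kept sorted smallest-first. It compares x with the smallest kept value rather than the last one, which is a different route from the reasoning above. top-n-step is a hypothetical name.

```clojure
(defn top-n-step
  "One reduce step. acc is a vector of at most n values, sorted ascending."
  [n acc x]
  (cond
    ;; fewer than n values so far: x is added
    (< (count acc) n) (vec (sort (conj acc x)))
    ;; nothing to keep when n is zero
    (zero? n)         acc
    ;; x beats the smallest kept value: drop that one and keep x
    (> x (first acc)) (vec (sort (conj (subvec acc 1) x)))
    ;; otherwise x is not added
    :else             acc))

(top-n-step 3 [1 2] 3)                       ; => [1 2 3]
(top-n-step 3 [1 5 9] 8)                     ; => [5 8 9]
(reduce (partial top-n-step 2) [] [1 5 9 8]) ; => [8 9]
```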
there are often built-in data structures that will remain sorted as you add values. I can't think of one for clojure off the top of my head, but there's probably something if I looked hard enough
I was thinking of something simple like: (->> (conj acc x) (sort) (take n))
it's not the most efficient implementation, but since acc is small, it's probably fine
ok, but it could be very very big when someone wanted to hold the highest 50 or 100 or 1000 items
right. it depends on the use case
if I wanted to prepare for that use case, I would probably look for a data structure on https://www.clojure-toolbox.com/ (under data structures) to see if there's already an efficient data structure for that purpose
I would also investigate sorted-map and then run benchmarks
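For larger n, one option not mentioned in the conversation is a bounded min-heap from the Java standard library, java.util.PriorityQueue. The sketch below is an assumption about how that could be wired up, not a benchmarked recommendation; find-top-n-heap is a hypothetical name.

```clojure
(import 'java.util.PriorityQueue)

(defn find-top-n-heap
  "Returns the n largest items of coll, largest first. The heap holds at
  most n + 1 items at any moment."
  [n coll]
  ;; clojure.core/compare is passed as the Comparator so that mixed
  ;; number types (for example Long 0 and Float values) can be compared.
  (let [pq (PriorityQueue. (int (max 1 n)) compare)]
    (doseq [x coll]
      (.add pq x)
      ;; once the heap grows past n, evict the smallest item
      (when (> (.size pq) n)
        (.poll pq)))
    (sort (fn [a b] (compare b a)) (seq pq))))

(find-top-n-heap 2 [1 5 9 8])  ; => (9 8)
```

Each insertion costs O(log n) instead of re-sorting the whole accumulator, which only starts to matter once n is in the hundreds or thousands.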
hmm. not what I expect when trying this in an online repl
(ns clojure.examples.hello
(:gen-class))
(defn top-n [[collection :as acc] x n]
(->> (conj acc x) (sort) (take n)))
(print (top-n [1 2 ] 2 2))
and when i do this :
(ns clojure.examples.hello
(:gen-class))
(defn top-n [[collection :as acc] x n]
(->> (conj acc x) (sort) (take n)))
(print (top-n [1 2 ] 2 1))
I get 1
where I expect 2
(ns clojure.examples.hello
(:gen-class))
(defn top-n [[collection :as acc] x n]
(->> (conj acc x)
(sort)
(reverse)
(take n)
(reverse)
))
(print (top-n [1 5 9 ] 8 2))
gives me a (8 9)
:thumbsup:
have a good night 😄
yep, it's definitely in a unique position compared to other languages
I think this chapter : https://aphyr.com/posts/352-clojure-from-the-ground-up-polymorphism
you wrote this :
I would separate the data processing from the data loading. it's very common to want to load data from different sources and reuse the data processing functions
but I do that: file is only the filename. the loading and parsing to json is done in the load-json method
right, but most-prevalent is doing both loading and processing
and there's no way to do just the data processing
so if you had a different data source, you'd have to copy most of the code from most-prevalent
so you were talking about this part
(map (fn [county]
{:county (fips (fips-code county)),
:prevalance (calculate_prevalance county field-name)
:report-count (field-name county)
:population (:county_population county)
}))
(sort-by :prevalance)
and I thought you were talking about this part
(->> file
load-json
(defn top-n [[collection :as acc] x n]
(->> (conj acc x)
(sort)
(reverse)
(take n)
(reverse)
))
(defn most-prevalent
"Given a JSON filename of UCR crime data for a particular year, finds the
counties with the most DUIs."
[file field-name]
(->> file
load-json
(map (fn [county]
{:county (fips (fips-code county)),
:prevalance (calculate_prevalance county field-name)
:report-count (field-name county)
:population (:county_population county)
}))
(map :prevalance)
(top-n 10)
(reverse)))
(clojure.pprint/print-table (most-prevalent "2008.json" :auto_thefts))
Error:
; Execution error (ArityException) at ground-up.chapter7/most-prevalent (form-init17204185006522729914.clj:70).
; Wrong number of args (2) passed to: ground-up.chapter7/top-n
you need a reduce somewhere
you need to have a find-top-n that uses reduce and top-n
and you'll want find-top-n to have the collection as the last argument so that it works with ->> in most-prevalent
happy new year over there!
i've still got 9 more hours of 2020 😕
I think I'm giving up on this. I thought about this:
(def find-top-n [n]
(reduce ((top-n n))))
(defn most-prevalent
"Given a JSON filename of UCR crime data for a particular year, finds the
counties with the most DUIs."
[file field-name]
(->> file
load-json
(map (fn [county]
{:county (fips (fips-code county)),
:prevalance (calculate_prevalance county field-name)
:report-count (field-name county)
:population (:county_population county)
}))
(map :prevalance)
(find-top-n 10)
(reverse)))
(defn find-top-n [n coll]
(reduce (fn [acc x] (top-n acc x n)
coll)))
I haven't tested, but I think something like that should work
; Execution error (ArityException) at ground-up.chapter7/find-top-n (form-init4053567356004708660.clj:58).
; Wrong number of args (1) passed to: clojure.core/reduce
that's what I get for not typing into a repl
(defn find-top-n [n coll]
(reduce (fn [acc x] (top-n acc x n))
coll))
do those parens match?
i'm blind without my repl
now a weird error message:
; Execution error (UnsupportedOperationException) at ground-up.chapter7/top-n (form-init704639222239290302.clj:49).
; nth not supported on this type: Float
ok, have a good night
I can answer some questions
what's the code look like now?
(ns ground-up.chapter7 (:require [cheshire.core :as json] [clojure.pprint]))
(defn load-json
"Given a filename, reads a JSON file and returns it, parsed, with keywords."
[file]
(json/parse-string (slurp file) true))
(def fips
"A map of FIPS codes to their county names."
(->> "fips.json"
load-json
:table
:rows
(into {})))
(defn fips-code
"Given a county (a map with :fips_state_code and :fips_county_code keys),
returns the five-digit FIPS code for the county, as a string."
[county]
(str (:fips_state_code county) (:fips_county_code county)))
(defn calculate_prevalance
[county field-name]
( if (zero? (:county_population county))
0
(float (/ (field-name county) (:county_population county)))))
(defn most-duis
"Given a JSON filename of UCR crime data for a particular year, finds the
counties with the most DUIs."
[file]
(->> file
load-json
(map (fn [county]
{:county (fips (fips-code county)),
:prevalance (calculate_prevalance county :driving_under_influence)
:report-count (:driving_under_influence county)
:population (:county_population county )
}))
(sort-by :prevalance)
(take-last 10)
(reverse)))
(clojure.pprint/print-table (most-duis "2008.json"))
(defn top-n [[acc] x n]
(->> (conj acc x)
(sort)
(reverse)
(take n)
(reverse)
))
(defn find-top-n [n coll]
(reduce (fn [acc x] (top-n acc x n))
coll))
(defn most-prevalent
"Given a JSON filename of UCR crime data for a particular year, finds the
counties with the most DUIs."
[file field-name]
(->> file
load-json
(map (fn [county]
{:county (fips (fips-code county)),
:prevalance (calculate_prevalance county field-name)
:report-count (field-name county)
:population (:county_population county)
}))
(map :prevalance)
(find-top-n 10)
(reverse)))
(clojure.pprint/print-table (most-prevalent "2008.json" :auto_thefts))
and it produces this error :
; Execution error (UnsupportedOperationException) at ground-up.chapter7/top-n (form-init704639222239290302.clj:49).
; nth not supported on this type: Float
if you type *e it should print the full stack trace
lots of sequence functions use nth under the hood
oh, I think I see it:
(defn top-n [[acc] x n]
(->> (conj acc x)
(sort)
(reverse)
(take n)
(reverse)
))
[acc] should just be acc
tricky
nope, then I get another error :
; Execution error (ClassCastException) at ground-up.chapter7/top-n (form-init3018563272993570646.clj:50).
; class java.lang.Float cannot be cast to class clojure.lang.IPersistentCollection (java.lang.Float is in module java.base of loader 'bootstrap'; clojure.lang.IPersistentCollection is in unnamed module of loader 'app')
it seems like the issue is in top-n. do you have a guess as to what might be causing the error?
this bug is also kinda tricky
it's because reduce isn't called with an initial state
(defn find-top-n [n coll]
(reduce initial-val
(fn [acc x] (top-n acc x n))
coll))
with initial-val being the starting value for the reduce state
do you know what the initial val should be?
what type should acc be?
it should probably just be []
; Execution error (ArityException) at ground-up.chapter7/find-top-n (form-init3018563272993570646.clj:58).
; Wrong number of args (2) passed to: clojure.lang.PersistentVector
oh whoops
args in the wrong order
(defn find-top-n [n coll]
(reduce (fn [acc x] (top-n acc x n))
[]
coll))
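The effect of the initial value can be seen in isolation. A small sketch, with conj standing in for top-n and reduce-without-init as a hypothetical helper name:

```clojure
;; With an initial value, acc starts as [] and conj works on every step.
(reduce conj [] [0.5 0.25])  ; => [0.5 0.25]

;; Without one, reduce uses the first element (0.5) as acc, so the first
;; call is (conj 0.5 0.25) and a number is treated as a collection.
(defn reduce-without-init []
  (try
    (reduce conj [0.5 0.25])
    (catch ClassCastException _
      :class-cast-exception)))

(reduce-without-init)  ; => :class-cast-exception
```

This is the same kind of ClassCastException as the one reported for top-n above, where a Float ended up in the acc position.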
; Execution error (IllegalArgumentException) at ground-up.chapter7/eval22508 (form-init3018563272993570646.clj:80).
; Don't know how to create ISeq from: java.lang.Float
(ns ground-up.chapter7 (:require [cheshire.core :as json] [clojure.pprint]))
(defn load-json
"Given a filename, reads a JSON file and returns it, parsed, with keywords."
[file]
(json/parse-string (slurp file) true))
(def fips
"A map of FIPS codes to their county names."
(->> "fips.json"
load-json
:table
:rows
(into {})))
(defn fips-code
"Given a county (a map with :fips_state_code and :fips_county_code keys),
returns the five-digit FIPS code for the county, as a string."
[county]
(str (:fips_state_code county) (:fips_county_code county)))
(defn calculate_prevalance
[county field-name]
( if (zero? (:county_population county))
0
(float (/ (field-name county) (:county_population county)))))
(defn most-duis
"Given a JSON filename of UCR crime data for a particular year, finds the
counties with the most DUIs."
[file]
(->> file
load-json
(map (fn [county]
{:county (fips (fips-code county)),
:prevalance (calculate_prevalance county :driving_under_influence)
:report-count (:driving_under_influence county)
:population (:county_population county )
}))
(sort-by :prevalance)
(take-last 10)
(reverse)))
(clojure.pprint/print-table (most-duis "2008.json"))
(defn top-n [acc x n]
(->> (conj acc x)
(sort)
(reverse)
(take n)
(reverse)
))
(defn find-top-n [n coll]
(reduce (fn [acc x] (top-n acc x n))
[]
coll))
(defn most-prevalent
"Given a JSON filename of UCR crime data for a particular year, finds the
counties with the most DUIs."
[file field-name]
(->> file
load-json
(map (fn [county]
{:county (fips (fips-code county)),
:prevalance (calculate_prevalance county field-name)
:report-count (field-name county)
:population (:county_population county)
}))
(map :prevalance)
(find-top-n 10)
(reverse)))
(clojure.pprint/print-table (most-prevalent "2008.json" :auto_thefts))
it's not showing a line number for the error
is there a way to do something like eval-buffer?
usually that will fix the exception not showing a proper file and line number
I use emacs
line 80 might be it
what's line 80?
(0.0051150895 0.004174161 0.0036036037 0.0030321407 0.00243309 0.002319513 0.0018750526 0.0016433854 0.0015932024 0.0015475558)
nil
success!?
print-table wants this :
Prints a collection of maps in a textual table. Prints table headings
ah, I get it
the extra info is discarded before find-top-n is called
do you see where?
right, because find-top-n expects a list of comparables (like a list of numbers)
then I see this:
; Execution error (ClassCastException) at java.util.TimSort/countRunAndMakeAscending (TimSort.java:355).
; class clojure.lang.PersistentArrayMap cannot be cast to class java.lang.Comparable (clojure.lang.PersistentArrayMap is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
you'll need to update find-top-n and top-n to accept a key function to sort with
check out the docs for sort-by and see if you can think of a way to update find-top-n and top-n to also accept a keyfn
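A sketch of what the keyfn versions might look like, assuming the argument order below. top-n-by and find-top-n-by are hypothetical names, and the sample maps are made up for illustration.

```clojure
(defn top-n-by
  "One reduce step: adds x to acc and keeps the n items with the largest
  (keyfn item), sorted ascending by that key."
  [keyfn n acc x]
  (->> (conj acc x)
       (sort-by keyfn)
       (take-last n)
       (vec)))

(defn find-top-n-by
  "Returns the n items of coll with the largest (keyfn item), ascending.
  coll comes last so the function fits into a ->> pipeline."
  [keyfn n coll]
  (reduce (fn [acc x] (top-n-by keyfn n acc x))
          []
          coll))

;; Made-up sample data
(find-top-n-by :prevalance 2
               [{:county "A" :prevalance 0.1}
                {:county "B" :prevalance 0.3}
                {:county "C" :prevalance 0.2}])
;; => [{:county "C", :prevalance 0.2} {:county "B", :prevalance 0.3}]
```

Since whole maps are kept, a caller such as print-table still receives every column, and a final reverse puts the highest value first.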
that's a good start
then I see this :
({:county TX, Kenedy, :prevalance 0.0051150895, :report-count 2, :population 391} {:county NM, Grant, :prevalance 0.004174161, :report-count 123, :population 29467} {:county OR, Gilliam, :prevalance 0.0036036037, :report-count 6, :population 1665} {:county OR, Sherman, :prevalance 0.0030321407, :report-count 5, :population 1649} {:county TX, Hudspeth, :prevalance 0.00243309, :report-count 8, :population 3288} {:county TX, Hall, :prevalance 0.002319513, :report-count 8, :population 3449} {:county MO, Jackson, :prevalance 0.0018750526, :report-count 1514, :population 807444} {:county TX, Refugio, :prevalance 0.0016433854, :report-count 12, :population 7302} {:county AK, Prince of Wales-Outer Ketchikan, :prevalance 0.0015932024, :report-count 3, :population 1883} {:county MD, Baltimore city, :prevalance 0.0015475558, :report-count 982, :population 634549})
nil
| :county | :prevalance | :report-count | :population |
|--------------+--------------+---------------+-------------|
| AL, Autauga | 2.861667E-4 | 15 | 52417 |
| AL, Baldwin | 1.8142852E-4 | 32 | 176378 |
| AL, Barbour | 1.4354926E-4 | 4 | 27865 |
| AL, Bibb | 9.219989E-5 | 2 | 21692 |
| AL, Blount | 1.9146418E-4 | 11 | 57452 |
| AL, Bullock | 0.0 | 0 | 10705 |
| AL, Butler | 3.988036E-4 | 8 | 20060 |
| AL, Calhoun | 5.771285E-4 | 67 | 116092 |
| AL, Chambers | 4.6220067E-4 | 16 | 34617 |
| AL, Cherokee | 8.107998E-5 | 2 | 24667 |
when I do this :
(defn top-n [acc x n]
(->> (conj acc x)
(sort-by :prevelance)
(reverse)
(take n)
(reverse)
))
this list is different:
({:county TX, Kenedy, :prevalance 0.0051150895, :report-count 2, :population 391} {:county NM, Grant, :prevalance 0.004174161, :report-count 123, :population 29467} {:county OR, Gilliam, :prevalance 0.0036036037, :report-count 6, :population 1665} {:county OR, Sherman, :prevalance 0.0030321407, :report-count 5, :population 1649} {:county TX, Hudspeth, :prevalance 0.00243309, :report-count 8, :population 3288} {:county TX, Hall, :prevalance 0.002319513, :report-count 8, :population 3449} {:county MO, Jackson, :prevalance 0.0018750526, :report-count 1514, :population 807444} {:county TX, Refugio, :prevalance 0.0016433854, :report-count 12, :population 7302} {:county AK, Prince of Wales-Outer Ketchikan, :prevalance 0.0015932024, :report-count 3, :population 1883} {:county MD, Baltimore city, :prevalance 0.0015475558, :report-count 982, :population 634549})
and is actually sorted on prevalance
where's the other list coming from?
the list sorted by names?
| :county | :prevalance | :report-count | :population |
|-------------------------------------+--------------+---------------+-------------|
| TX, Kenedy | 0.0051150895 | 2 | 391 |
| NM, Grant | 0.004174161 | 123 | 29467 |
| OR, Gilliam | 0.0036036037 | 6 | 1665 |
| OR, Sherman | 0.0030321407 | 5 | 1649 |
| TX, Hudspeth | 0.00243309 | 8 | 3288 |
| TX, Hall | 0.002319513 | 8 | 3449 |
| MO, Jackson | 0.0018750526 | 1514 | 807444 |
| TX, Refugio | 0.0016433854 | 12 | 7302 |
| AK, Prince of Wales-Outer Ketchikan | 0.0015932024 | 3 | 1883 |
| MD, Baltimore city | 0.0015475558 | 982 | 634549 |
nil
:thumbsup:
this one does it right :
(defn most-duis
"Given a JSON filename of UCR crime data for a particular year, finds the
counties with the most DUIs."
[file]
(->> file
load-json
(map (fn [county]
{:county (fips (fips-code county)),
:prevalance (calculate_prevalance county :driving_under_influence)
:report-count (:driving_under_influence county)
:population (:county_population county )
}))
(sort-by :prevalance)
(take-last 10)
(reverse)))
and this one not :
(defn most-prevalent
"Given a JSON filename of UCR crime data for a particular year, finds the
counties with the most DUIs."
[file field-name]
(->> file
load-json
(map (fn [county]
{:county (fips (fips-code county)),
:prevalance (calculate_prevalance county field-name)
:report-count (field-name county)
:population (:county_population county)
}))
(find-top-n 10)
(reverse)
))
what makes you think 8.70322E-4 is not a float?
I think there are just fewer responses because it's around the holidays
I think it's just printing it out differently
how it's formatted depends on how you're printing it and what formatter is being used
if you care how it's formatted, you should explicitly format it
¯\(ツ)/¯
> (type (float 8.70322E-4))
java.lang.Float
> (float 8.70322E-4)
8.70322E-4
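Explicit formatting could look like the sketch below. The six-decimal format and the name format-prevalence are assumptions chosen for illustration; the locale is fixed so the decimal separator does not depend on the machine.

```clojure
(defn format-prevalence
  "Formats a number as fixed-point text with six decimals."
  [x]
  ;; (double x) also covers the Long 0 returned for zero-population counties
  (String/format java.util.Locale/US
                 "%.6f"
                 (into-array Object [(double x)])))

(format-prevalence (float 8.70322E-4))  ; => "0.000870"
(format-prevalence 0)                   ; => "0.000000"
```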