Fork me on GitHub
#beginners
<
2019-12-19
>
FiVo12:12:24

Data structure question. What is the idiomatic way in clojure to have a map where I can do binary search over the keys?

FiVo12:12:56

Essentially I want the following 3 operations:

FiVo12:12:14

1. Insert Key/value's in log n

FiVo12:12:37

2. Find a key less than some other in log n

FiVo12:12:42

3 Once found, access for example the 3 largest keys less than my search key in O(1)

FiVo13:12:03

https://github.com/clojure/data.avl seems to be doing most of what I need.

andy.fingerhut18:12:29

@finn.volkel I am pretty sure a sorted-map or sorted-set should do all of those things. Implemented using red-black trees under the hood

💯 4
👍 4
andy.fingerhut18:12:24

IIRC, some combination of subseq and rseq and other sequence operations should do what you want, although it does not keep "pointers" into the ordered sequence of keys, so starting with a key and then finding smaller/larger ones will probably always be O(log N) for the first, then O(1) for each one afterwards as long as you keep going.

FiVo22:12:59

@andy.fingerhut thanks, I wasn't aware of the subseq and that it runs in O(log n). So you are saying something like (first (rsubseq (sorted-map .....) <= x))) in only O(log n) even if the subsequence returned by rsubseq might be quite large?

FiVo23:12:44

If so, is this documented somewhere because it's not clear to me.

andy.fingerhut23:12:56

It is O(log n) to find the first element, then I believe it is O(1) per element that you actually traverse after that. If you traverse m elements, it cannot be faster than O(log n) + linear in m

andy.fingerhut23:12:47

Clojure rarely provides core functions that do not have good performance characteristics, in general. Very little of the built-in documentation in doc strings gives big-oh notation running time -- a few will mention constant or logarithmic time, but not many.

andy.fingerhut23:12:59

A "sequence" in Clojure is some abstract thing that promises to implement first and rest. Many of them are implemented lazily, so do not realize long things unless you traverse them.

andy.fingerhut02:12:31

@finn.volkel ^^^ not sure if you saw that, so pinging you via @ in case not. Sorry for the bother if you have seen it already.

FiVo08:12:01

@andy.fingerhut thanks, that cleared it up. I thought that if it's lazy it's mentioned explicitly. So in absence of the mention lazy I always assumed the sequence is evaluated eagerly.

andy.fingerhut08:12:25

If a function returns a Clojure map, vector, or set, it is fully evaluated before returning -- there are no such lazy objects in Clojure. Sequences are not always returned as lazy, but usually are. Sometimes the docs say so explicitly, but apparently sometimes not.

andy.fingerhut08:12:15

If you look at the definition of subseq e.g. by doing (source subseq) in a REPL session, you will see that the final expressions returned are from (take-while ...) , and take-while is documented to return a lazy sequence

noisesmith18:12:17

I wonder if you could use a spliterator on the underlying structure via interop

Emanuele20:12:10

it's more of a Cursive question rather than a Clojure question. If I write tests about another clj file, do I need to reload the file in the REPL before running the test?

seancorfield20:12:57

@dierre_spam In general, yes. Not sure whether Cursive has any automated reload-on-save logic. But it's a really good habit to get used to always evaluating each top-level form you change as you are editing. It's going to be just a hot key to press and after a while it will become second nature. Edit, eval, edit, eval. Then your REPL always reflects the state of your code.

seancorfield20:12:11

And if you evaluate forms as you edit them, then you don't even need to save the file each time. I often edit, eval, edit, eval, switch to the test file, run tests (another hot key), switch back to source file, edit, eval, ... save.

seancorfield20:12:31

You'll hear people talk about a REPL-Driven Development workflow but what they really mean is a workflow where your REPL is always up-to-date: you eval every single small change as you make it, so you can always test your code as you work. It's often helpful to have "Rich Comment Forms" in your source code that contain expressions that exercise your functions so you can easily eval those and check your results:

(comment
  (my-func 123))

seancorfield20:12:36

The ideal workflow is never typing directly into the REPL -- type into source/test files and evaluate code from there instead. And use (comment ,,,) forms for "scratch"/test code so you don't accidentally leave code exploration at the top-level of a file.

Michael J Dorian21:12:58

Hey friends, I find that when I write code I end up with a big stack of declare statements at the tops of my files to avoid problems with forward declaration. Is this expected, or am I maybe organizing things oddly?

bfabry21:12:01

generally the declaration order required by clj means that clj files are organised from highest level at the bottom to lowest level at the top

bfabry21:12:39

I'm guessing you're striving to order it the opposite way (as recommended in clean code etc) and ending up with this

bfabry21:12:57

I'd recommend just embracing reading clojure from bottom to top

Michael J Dorian21:12:20

Ok. I guess I organize by subject, so I have a collection of functions for each subdomain in my project. I don't seem to be able to write those collections entirely independent from one another, a function that was written to interpret json data might also be useful for the game ai

bfabry22:12:27

I'd expect game ai and json reading to be in different namespaces

Michael J Dorian22:12:05

I'll give that a try then, thanks simple_smile the project is getting a lot bigger every week so it's starting to make more sense to split stuff up

bfabry22:12:47

your "subject" organisation sounds like the perfect divider for namespaces btw

Michael J Dorian22:12:41

and then within those, you have kind of smaller, building block funcs at the top? I guess if I start wanting to share those between files I can put them in their own namespace

bfabry22:12:02

many advantages to that. source control, extra names, when you use them from a different file you get to call them by something like subject/function-name etc. "Namespaces are one honking great idea—let's do more of those"

😂 4
RodgerDodger21:12:31

Hi all. Question: I am trying to map a 1d vector to a 2d vector so that [1 2 3 4] and [[5] [6] [7] [8]] produces [[1 5] [2 6] [3 7] [4 8]]. Any suggestions?

noisesmith21:12:58

(cmd)(user=> (map cons  [1 2 3 4] [[5] [6] [7] [8]])
((1 5) (2 6) (3 7) (4 8))
(ins)user=> (map (comp vec cons)  [1 2 3 4] [[5] [6] [7] [8]]) ; in case vector is mandatory
([1 5] [2 6] [3 7] [4 8])

RodgerDodger22:12:42

@noisesmith Perfect! Thank you, sir!

noisesmith22:12:56

map takes any number of cols (they become the respective args on each call of the f passed), and cons adds a new element at the front of some other coll to make a list

Gulli22:12:19

Is there something similar to Javadoc for Clojure? Main benefit being your editor can give your internal references and info

bfabry22:12:49

any form that takes a doc-string that doc-string should show up in most editors etc

seancorfield22:12:28

If you want to produce good documentation for others who use your (open source) code, take a look at http://cljdoc.org which combines your docstrings with additional docs in the GitHub repo -- and also supports Markdown in docstrings and linking to other functions and docs etc.

👍 8
Gulli22:12:20

Well, at this stage it's more so I can understand my own code :face_with_cowboy_hat:

jumar13:12:10

Also for example cider can jump into definitions of symbols in docstrings at least if you wrap them in backticks

FiVo22:12:59

@andy.fingerhut thanks, I wasn't aware of the subseq and that it runs in O(log n). So you are saying something like (first (rsubseq (sorted-map .....) <= x))) in only O(log n) even if the subsequence returned by rsubseq might be quite large?

Brandon Olivier23:12:15

I’m reading about transducers from this site: https://labs.uswitch.com/transducers-from-the-ground-up-the-practice/ And it seems to read like transducers are applied in the “wrong” direction when using comp. I double checked in the repl, and their output seems correct. Why aren’t they applied right-to-left?

Alex Miller (Clojure team)00:12:15

It’s a stack, kind of like middleware

noisesmith23:12:01

each transducer takes another transducer as an arg, returning a new transducer

noisesmith23:12:09

a different arity is used for the actual transform

Brandon Olivier23:12:37

(comp (map inc) (filter odd?))

noisesmith23:12:54

filter odd is passed the context

Brandon Olivier23:12:54

that seems to inc first, then filter

Brandon Olivier23:12:03

which seems backwards to me

noisesmith23:12:09

then map inc is passed the result of filter odd acting on the context

noisesmith23:12:20

this returns a transducer which when used, maps then filters

noisesmith23:12:28

see my doc link above

noisesmith23:12:08

each one takes the previous transform as an arg, and decides whether to call it on the data it sees

noisesmith23:12:00

> Composition of the transformer runs right-to-left but builds a transformation stack that runs left-to-right (filtering happens before mapping in this example).

noisesmith23:12:14

the middleware pattern might be helpful to think of here if you are familiar

Brandon Olivier23:12:39

okay, that’s helping me

Brandon Olivier23:12:52

I’m not totally confident about how it fits in my brain, but I’m still reading some of these docs

Brandon Olivier23:12:03

at least it’s documented somewhere and I’m not totally crazy 😅

noisesmith23:12:36

from the "creating transducers" section:

(fn [rf]
  (fn ([] ...)
      ([result] ...)
      ([result input] ...)))
rf is the "continuation" or "target" - the transducing context, which wraps the previously composed transducers if any

noisesmith23:12:12

the transducer should call rf on its input (unless you are filtering, then you skp calling rf for this data)

noisesmith23:12:26

and of course you might also transform the input before calling rf on it

Brandon Olivier23:12:35

Okay, I think I got it now

Brandon Olivier23:12:48

Thanks for the help @noisesmith! 🙂

noisesmith23:12:59

glad I could assist