#beginners
2019-07-14
ec09:07:07

In functional langs I find myself doing these pipeline-y things a lot, which leads to computing everything before moving to the next operation. Is there a function that can turn this into an actual pipeline computation? Since everything is immutable, when filter finds an item the next stage (map, etc.) could already compute on it. In other langs I would probably stick all the computation into one function, then loop over the initial sequence, maybe fire up some threads.

dmaiocchi10:07:14

not sure if I understood, but if you are looking for a way to do side effects and force evaluation of lazy maps, look at run!

dmaiocchi10:07:17

if you want a way to group this pipeline together, you should have a look at transducers
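(For reference, a minimal sketch of what grouping the stages as a transducer could look like; the stage functions here are just illustrative:)

;; comp fuses the stages into a single pass over the input;
;; no intermediate lazy sequences are built between them
(def xf (comp (map inc)
              (filter even?)))

(into [] xf (range 10))
;;=> [2 4 6 8 10]

Each element flows through all the stages before the next element is touched, which gives the "one iteration" behaviour instead of one pass per stage.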

dmaiocchi10:07:17

filter/map/reduce are lazy by default, so actually there is no computation/eval until the result is consumed
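(A quick way to see the laziness at the REPL; note that chunked seqs like range realize up to 32 elements at a time:)

;; defining xs prints nothing: no element has been computed yet
(def xs (map (fn [x] (println "computing" x) (inc x))
             (range 5)))

(first xs)
;; prints "computing 0" ... "computing 4" (one whole chunk) and returns 1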

dmaiocchi10:07:38

hope it helps!

ec10:07:27

Thx, but what I want is something like this: when filter finds an element that satisfies the predicate, that element immediately starts getting processed by the functions below it, which are running on a separate thread. Sort of like every map/filter inside ->> has its own queue, and the map/filter above sends items to it.

ec11:07:40

idk if it makes sense, but: pipelining functions over sequences, concurrently
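(What ec describes, a queue per stage with items flowing downstream as soon as they are produced, is roughly what core.async's pipeline provides. A hedged sketch, with made-up stage functions:)

(require '[clojure.core.async :as a])

(let [in  (a/to-chan (range 100))   ; source channel fed from the seq
      out (a/chan 10)]              ; buffered output queue
  ;; run the composed transducer over 4 threads; each result is
  ;; pushed onto `out` as soon as it is ready
  (a/pipeline 4
              out
              (comp (filter odd?)
                    (map #(* % %)))
              in)
  (a/<!! (a/into [] out)))
;;=> [1 9 25 ... 9801] (pipeline preserves input order)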

Crispin11:07:24

try this. Do an experiment. Make a pipeline of your functions: map to filter to map to processor funcs. Just put it all in one single execution block. Give it endless input data. Run the program. Open the process list with the top command. Look at how much CPU it's using. Look at how many cores it's running on.

Crispin11:07:39

Do you really need to farm it out to some execution task pool? If you are a beginner, you may be trying to solve a problem that exists in the language you came from, with the approach you would use over there, when you don't even need it here because it's trivial to utilise all your power.

Crispin11:07:53

there's also pmap 🙂

ec11:07:42

Thx for the tip. To be honest, I'm just asking these things to learn more about concurrency the Clojure way, not because I really need it. pmap is nice, but it would require me to wrap all the processing into one function and then pmap it over the initial sequence to get the pipelining effect. Works, tho.
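(The wrap-it-all-in-one-function pattern ec mentions, sketched with a made-up processing step:)

;; compose the per-stage functions into one, then let pmap run
;; that composite over the elements in parallel
(defn process [x]
  (* 2 (inc x)))

(pmap process (range 10))
;;=> (2 4 6 8 10 12 14 16 18 20)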

joshkh13:07:54

each function in your pipeline iterates over the previous function's output. with transducers you can (in theory) iterate just once.

joshkh13:07:37

i remember reading that pmap is good for transforming data so long as each transformation is independent of the data already transformed. map works on each individual bit of data in a collection, whereas reduce folds in the results -- you can implement map with reduce, but not reduce with map.
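(The "map with reduce" direction, for illustration:)

;; map expressed as a fold: each step conjoins the transformed
;; element onto the accumulated result
(defn map-via-reduce [f coll]
  (reduce (fn [acc x] (conj acc (f x))) [] coll))

(map-via-reduce inc [1 2 3])
;;=> [2 3 4]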

Crispin14:07:51

I have an 8 core machine. Running heavy processing but just standard "single threaded" clojure code uses 8 cores at 100%.

Crispin14:07:24

code that you may think of as single-threaded will not actually be single-threaded when it runs

Crispin14:07:45

try something like (doall (map count (partition 2 (filter identity (map even? (map inc (range))))))) It appears "single-threaded", but fire up top, evaluate the line, and watch the JVM use every core running endless numbers through that pipeline.

Crispin14:07:42

but try (loop [] (recur)) (an imperative loop) and watch it stay bound to 100% of a single CPU core (i.e. it actually is single-threaded)

Crispin14:07:54

there are many ways in clojure to aid the parallelism of your code

Crispin14:07:12

but it may be sufficiently parallel already without you realising

ec21:07:14

@UKH2HDSQH didn't know that, will check it out, thx!

jayesh-bhoot12:07:56

Hi. I was checking the Clojure source code to look into what happens under the hood for (def x 1). From what I understand through Symbol.java and Var.java, a Symbol doesn't bind to a Var; rather, a Var binds itself to a Symbol: Var(Namespace ns, Symbol sym, Object root), where root is probably the Object with value 1. An unbound Var (def x) is probably set through Var(Namespace ns, Symbol sym), where root goes undefined. Also, lookup via symbol, like typing x at the REPL, occurs through public static Var find(Symbol nsQualifiedSym), which then goes on to call namespace.findInternedVar. Is this understanding correct?
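(A small REPL sketch of that symbol → Var → root-value chain, for anyone following along:)

(def x 1)

;; the symbol itself is just a name with no binding of its own
'x                      ;=> x

;; the namespace interns a Var under that symbol; the Var holds the root value
(resolve 'x)            ;=> #'user/x
(var-get (resolve 'x))  ;=> 1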

jayesh-bhoot14:07:08

I read the article. While a lot of the content is advanced for me currently, I think it concurs with my basic understanding articulated above.

jayesh-bhoot14:07:18

Thanks for the link!

valtteri14:07:49

No problem! Your statements above sound correct to me. However, I don't have a very deep understanding of Clojure internals.

jayesh-bhoot14:07:50

No problem. Your pointer was helpful!

bartuka22:07:51

I am using toucan, which is a very simple layer of indirection on top of jdbc. I like it; however, I'm facing some problems testing *stateful* functions that interact with my database. Let me describe the current scenario. I'm using midje, leveraging with-state-changes to perform tests on stateful fns. This is a mock example of how the midje setup works:

(midje/with-state-changes [(before :facts (do
                                            (println "Connect with database")
                                            (db/set-default-db-connection! postgresql-test-config)
                                            (db/set-default-quoting-style! :postgresql)
                                            (println "Guarantee that my database is clear for this test facts")
                                            (reset-with-migrations)))
                           (after :facts (reset-with-migrations))]
  (midje/fact "Inserindo registros"
              (sut/register-legacy-project {:gitlabid 1
                                            :name "teste"
                                            :description "testando o midje"}) => {:gitlab 1 :message "success" :status true}))
Basically, I'm using a test database that is *reset* at the beginning and the end of each fact sequence. This works just fine, but there's something odd about this approach from my point of view. Isn't it too risky, since people can forget and run these tests against the production database? Would the most promising way be to build safeguards so the production database can't be touched, or to drop the current approach and try something different? I know I could use the functions midje provides to stub out the functions that interact with my database. Would that be a better approach?

Crispin02:07:44

How are you resetting the database? I know nothing about these libraries you are using, but if you were to start a DB transaction at the beginning, do the test inside it, and then issue a rollback at the end, it couldn't 'reset' your prod database even if it were accidentally run against it.
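(A hedged sketch of that rollback idea using plain clojure.java.jdbc; toucan sits on top of jdbc, so the details may differ, and db-spec and the :projects table are made up:)

(require '[clojure.java.jdbc :as jdbc])

(jdbc/with-db-transaction [tx db-spec]
  ;; mark the transaction rollback-only up front, so nothing the
  ;; test writes can ever be committed
  (jdbc/db-set-rollback-only! tx)
  (jdbc/insert! tx :projects {:gitlabid 1 :name "test"})
  ;; ... run assertions against tx here ...
  )
;; the transaction rolls back on exit and the database is untouched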

seancorfield02:07:26

That's a good suggestion. I'd also suggest looking at environment libraries so that running code on a specific machine auto-selects the config, such as DB info, so you can't pick up the production DB on another tier -- and you can have your tests feature-flagged too so they cannot be run on production machines.

seancorfield02:07:19

We package up apps on CI so only source code ever gets onto production machines -- no tests -- and the production DB is never selected on any tier except production.

bartuka02:07:58

thanks for the suggestions, guys. Today I'm using yogthos/config to handle my config files. It's similar to having it in env vars, but I have different profiles. Even this way, though, I am one var away from spinning this up on the prod profile.

bartuka02:07:35

testing against databases is something I haven't fully grasped yet. All the options seem bad to me 😕 @U04V70XH6 do you mind sharing a bit about your CI pipeline?

seancorfield03:07:22

@UBSREKQ5Q What questions do you have?

seancorfield03:07:25

We have things set up so that the server/machine on which the code runs determines the configuration (via EDN files on each tier). So if your production credentials aren't in the EDN files, that process can't talk to the production system.
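(A hypothetical sketch of that tier-selection idea; the file path, keys, and guard below are all invented for illustration:)

(require '[clojure.edn :as edn])

;; each machine carries a local EDN file naming its tier and the
;; only credentials that machine is allowed to use, e.g.
;; {:tier :ci, :db {:dbtype "postgresql" :dbname "test_db"}}
(def config
  (edn/read-string (slurp "/etc/myapp/config.edn")))

;; destructive test code can then refuse to run anywhere unsafe
(when-not (#{:dev :ci} (:tier config))
  (throw (ex-info "refusing to run destructive tests on this tier"
                  {:tier (:tier config)})))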

seancorfield03:07:04

We also have the idea of "tiers" baked in so specific machines/servers are identified as "safe" by default unless they're in those EDN files as something else.

seancorfield03:07:46

So various destructive processes can only run on machines identified as dev/CI.

seancorfield03:07:31

Of course, you can still screw up the EDN files on a given machine, but at least the defaults are fairly safe.