#clojure-switzerland
2020-12-14
David Pham03:12:51

@mpenet How dare you turn down your hometown?! Especially during « the Escalade »... :) I had hoped Facebook Libra would bring something to Geneva. There are Sixsq and RTS using Clojure in Geneva. I think there are a lot in the biotech scene there too. HPC could be a cool field.

paul.legato05:12:55

@sam What sort of environment do you find 30 min from the center of Zurich? By that I mean, is it a small rural town, a contiguous suburb of Zurich, something else?

mpenet06:12:32

I forgot sixsq! Great bunch. I didn't know rts is using clojure now, is this just for a few projects (rtslab?) or is it somewhat official?

sam07:12:33

A rural village in wine country in my case. But you can live in small towns, larger towns (Winterthur), suburbs, you name it. A lot of people do, not only for the quiet, but because Zurich itself has expensive rent. Job-wise I really think it's one of the top places to live. Salaries are high, and taxes moderate, which you can truly benefit from if you manage your living costs accordingly (these can be very high).

David Pham08:12:11

@mpenet I think there are even comments on the official webpage of Clojure! https://clojure.org/stories/rts

David Pham08:12:02

@paul.legato 30 mins from Zurich can be the middle of literally nowhere (especially to the south of Zurich, where you end up in the middle of the mountains).

David Pham08:12:37

I am still trying to check who uses Clojure in ZH though.

sam09:12:31

@neo2551 sorry, not entirely sure if it was mentioned or you are aware, but http://deep-impact.ch does their non-AI backends in Clojure (I used to work there).

sam09:12:02

They have also done some work with ClojureScript, but the major investment is in the backends.

sam09:12:52

I hope to use Clojure once my company starts to grow 🙂

mpenet09:12:54

they are also behind http://paper.li, which is also mostly clojure

David Pham09:12:36

@sam It would be good to research AI with Clojure haha, this is where I am mostly interested 🙂 @mpenet Great, I did not know them! Maybe once I have to go back to Geneva, I will think about them.

David Pham09:12:07

@sam Where do you work now?

sam09:12:37

I have my own company, doing consulting and whatever code the client needs, http://saydo.co (very customer-oriented for now)

David Pham09:12:21

Did you advise any company on using Datomic?

sam09:12:02

The companies I have been working for up to now have not been in the context of being able to switch to Clojure or any special DB (in fact, I’m staring at some PHP code right now 😅). We did use Datomic at DI.

sam09:12:05

I’m quite cautious about recommending major technical movement 🙂 So much value in existing code!

mpenet09:12:37

there is also someone from lambdaforge in geneva, @grischoun 🙂

mpenet09:12:43

so not datomic, but datahike

sam09:12:25

@mpenet interesting, you saw datahike use in production?

sam09:12:52

@mpenet @neo2551 btw, where do you work currently?

mpenet09:12:15

They mentioned to me they have production uses. One of which is a project for the Swedish gov

sam09:12:40

Wow, not bad

mpenet09:12:44

I live close to Malmö atm, Sweden

sam09:12:40

Ah ok! That’s a country I still need to visit 🙂 I nearly went to Linköping for an exchange year, but life got in the way.

mpenet09:12:07

I don't really like it here.

mpenet09:12:16

we're planning to move back to ch or germany

sam09:12:25

too cold?

mpenet09:12:46

no, other reasons. The weather is not great but it's not the worst of it

mpenet09:12:44

We just don't want to remain here for too long, we have small children and we want them to grow up somewhere else

sam09:12:08

Ah! Yes, that is always a source of big questions being asked. My kids are also a major reason I am currently living here.

David Pham09:12:38

I am working at Vontobel

David Pham09:12:04

Yeah, this is also a reason why I am reluctant to move as well. Education is great in CH.

sam09:12:13

Not that the education system is the best, but when I combine all factors, it is hard to find a better place for a family than where I am..

sam09:12:26

Yes, it is good! Just not the best 🙂

David Pham09:12:31

I am also hesitating between crux and datahike.

David Pham09:12:39

Where do you think education is better?

sam09:12:52

@neo2551 Vontobel is doing Clojure?

mpenet09:12:56

ch is a great place wrt to education

David Pham09:12:01

EPFL was fun, right?

David Pham09:12:07

I do remember Satellite time

sam09:12:13

I think the higher education is incredible here, yes!

sam09:12:22

Totally, many hours spent there 😄

sam09:12:43

Basic education is however a bit rooted in the past.

sam09:12:05

It depends a lot on the school, and we are fairly happy with the school now.

David Pham09:12:18

Yes I agree, but I believe parents also have a big role in education, at least in CH I had this feeling.

sam09:12:28

Absolutely. I think there is a mentality of heavy guidance, and not as much free learning as modern pedagogy seems to advise. But this is arguing at a quite high level.

David Pham09:12:51

So to answer the Clojure question: basically, I am a mathematician doing AI/ML/statistics for finance. I was frustrated with our tools, so I started to build small webapps with Python Dash/R Shiny, and my big boss asked me to make client-facing quality web apps. I told him it was impossible to meet his short-term deadline with traditional tools, and that he should make the bet to let me use whatever I wanted. I already knew Clojure, so doing ClojureScript was a blessing for me 🙂

David Pham09:12:46

Since then, we crushed most metrics of effectiveness / costs, so I can still use ClojureScript, and now they want me to build the Kafka infra for our team, so I am still going to do it with Clojure.

David Pham09:12:09

It is so much faster to learn JVM tools with Clojure.

sam09:12:16

Wow, very cool!

David Pham09:12:37

Thanks, but nothing crazy.

David Pham09:12:47

Just leveraging all the knowledge and kindness of our community.

David Pham09:12:17

The trick is our requirements are fairly small compared to real world constraints.

David Pham09:12:56

My audience will use a chrome based browser, they will have decent computers, they will probably be in Switzerland or sufficiently near.

sam09:12:07

Yes but still, congrats on the success of bringing in that degree of innovation to an enterprise env. I have heard a couple of stories that ended differently in that regard.

David Pham09:12:09

And in banks, anything bigger than an Excel is big data.

sam09:12:21

True 😄

mpenet09:12:26

yeah about edu, sweden is full of montessori/waldorf, but they just use the name, it's rare to have teachers with actual education/certificates in these

David Pham09:12:37

I think I embraced Rich's talk for the 10 years of Clojure: I focused all the arguments on the artefacts and not the constructs: is what I am doing working by my boss's definition? Can I change? Modify? Debug? Yes to all of this, and at crazy speed.

David Pham09:12:57

Nothing, absolutely nothing can beat live reloading while maintaining the state of your app in the UI.
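[Editor's note: a minimal sketch of the hot-reload pattern described above, assuming a Reagent app built with shadow-cljs; the namespace and element id are illustrative. `defonce` is only evaluated on first load, so the state atom survives every code reload while the rendering code is swapped in place.]

```clojure
(ns example.core
  (:require [reagent.core :as r]
            [reagent.dom :as rdom]))

;; defonce: evaluated once on first load, untouched on later reloads,
;; so the UI keeps its data across hot reloads.
(defonce app-state (r/atom {:count 0}))

(defn counter []
  [:button {:on-click #(swap! app-state update :count inc)}
   (str "Clicked " (:count @app-state) " times")])

;; shadow-cljs calls functions tagged ^:dev/after-load after every
;; reload; re-rendering picks up the new code, app-state is untouched.
(defn ^:dev/after-load mount! []
  (rdom/render [counter] (.getElementById js/document "app")))
```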

mpenet10:12:13

I am sold on attributes at the bottom, datomic, spec & all, took me a while to accept/understand it but I really think he's onto something. I wish we could use datomic at work, but licensing makes it a bit tricky atm
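[Editor's note: a small illustration of the "attributes at the bottom" idea mentioned above, using clojure.spec; the attribute names are made up for the example. Namespaced attributes are specced once, and entities are just open maps of them, as in Datomic's schema model.]

```clojure
(require '[clojure.spec.alpha :as s])

;; Each attribute gets a global, namespaced definition...
(s/def :user/email (s/and string? #(re-matches #".+@.+" %)))
(s/def :user/name string?)

;; ...and entity-level specs just compose attributes.
;; s/keys validates by attribute; extra keys are allowed (open maps).
(s/def :app/user (s/keys :req [:user/email] :opt [:user/name]))

(s/valid? :app/user {:user/email "a@b.ch" :user/name "David"})
;; => true
```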

David Pham10:12:45

Maybe you could try Datomic Free?

mpenet10:12:53

datahike is still young, and there's always the hope that datomic on-prem gets an open-source version at some point

sam10:12:01

Especially when you have a lot of that state. @mpenet re education, if you want actual montessori/waldorf you need a private school here, but the ones I heard about are really good. There are a lot of private Steiner schools, which are less expensive.

David Pham10:12:11

My biggest problem with Datomic is I fear I would max out the table size for on-prem.

mpenet10:12:27

sam: it's free in sweden, my daughter is in a montessori preschool

sam10:12:41

Yes, that is not available here!

mpenet10:12:43

in ch it would be a mountain of money

mpenet10:12:54

but quality would be much better for sure too

mpenet10:12:32

@neo2551 yes, I think with datomic you have to think very early how to shard data(bases) efficiently

mpenet10:12:55

I guess it depends on the volumes and use cases too

mpenet10:12:29

but for instance nubank has multiple dbs per user

David Pham10:12:43

The problem could be (IMHO) circumvented if we could have separate databases on a single machine without losing too much time when creating the indices.

mpenet10:12:45

since you can query cross-db that's also ok'ish
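[Editor's note: a hedged sketch of the cross-database querying mentioned above, using the Datomic Peer API; the connection names and attributes are illustrative. Multiple db values are passed as separate, named data sources and joined on a shared identifier inside one datalog query.]

```clojure
(require '[datomic.api :as d])

;; Join entities from two databases on a shared :user/id value.
;; users-conn and orders-conn are assumed, pre-existing connections.
(d/q '[:find ?email ?total
       :in $users $orders
       :where [$users  ?u :user/id      ?id]
              [$users  ?u :user/email   ?email]
              [$orders ?o :order/user-id ?id]
              [$orders ?o :order/total   ?total]]
     (d/db users-conn)
     (d/db orders-conn))
```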

David Pham10:12:58

yes, exactly.

David Pham10:12:14

but then Datomic does not recommend having multiple DBs per transactor...

sam10:12:21

Datomic is even harder to push for in enterprise envs than Clojure in my experience. Customers often have pretty orthodox but strong opinions about their data.

David Pham10:12:45

I argue that the data is stored in Postgres or Kafka

David Pham10:12:02

(then I use Datahike or Crux haha)

mpenet10:12:11

crux doesn't scale right now

refset19:12:14

Hi @mpenet 🙂 so when you say "scale" here I assume you only mean "unpredictable horizontal scaling of queries"? Otherwise I have to disagree that Crux does not scale when compared to ~similar alternatives. I think there are many dimensions of scaling other than "unpredictable horizontal scaling of queries" where Crux is arguably very scalable (compared with the mentioned alternatives). For instance:
• sustained write throughput (vs. writing to distributed indexes, which are inevitably slow)
• total index size (RocksDB is very happy handling 10s of TBs and queries remain fast without the need for gigantic in-memory caches)
• snapshot-based scaling avoids many cold-cache issues
• handling of large unstructured maps (tested with 10-100 MBs)
^ a benchmark report will be published in Q1 to put all this in context. To mitigate the local-index requirement we have added a snapshotting feature that allows new nodes to come online and sync quickly (i.e. they only have to replay the recent transactions since the last snapshot). Large values are never stored in the indexes. And to further mitigate the main concern of index duplication, I have proven the concept of using Redis as a shared index store which can in principle support ~infinite nodes running in Lambdas. Such a Lambda-based setup is only viable with Crux because the query engine is lazy and therefore has low memory requirements (i.e. it doesn't compute the clauses/joins one-at-a-time, which is a memory-intensive approach): https://github.com/crux-labs/crux-redis (the main trade-off is that queries will be a little slower due to network hops)

mpenet19:12:39

The statement was a bit too broad, sure. I meant the size of the full dataset/index. Sure, you can have boxes with 10TB and put your index on them, but that's not very convenient for a few reasons (cost and operations among others) and you'll still be bound by limits on storage size per box.

mpenet19:12:15

but my crux knowledge is quite outdated, so maybe I'm missing the big picture

mpenet19:12:15

write throughput surely is fast, I guess it's essentially benchmarking kafka; the write/read roundtrip is probably a different story, but even then I expect it's fast enough in most cases

mpenet19:12:16

but the one reason I am not considering/using crux right now is because of the sharding issue, it feels like a pain point for long running clusters

mpenet19:12:21

about redis: interesting POC, tho redis is also problematic in its own ways, it's also one more dependency to manage/operate

mpenet19:12:54

the "doesn't scale" was only for the index issue. Another problem with that would be that spreading write load on indexes is not possible, If you have a lot of writes, surely kafka will keep up but all your nodes will also be busy writing index data. And in turn that would impact querying on all these nodes.

mpenet19:12:49

crux is way more mature than some of the other solutions I mentioned tho, both in tooling and the dev behind it, major props on that. I prefer the datomic design personally, but that's also a matter of different use cases.

mpenet19:12:17

@U899JBRPF I am actually wondering if you are working on sharding/partitioning at all, other than via redis?

mpenet19:12:54

it's a tough nut to crack but there are plenty of existing solutions to get inspiration from

refset19:12:16

Thanks for the responses! It's certainly a subtle and complex discussion. We have it on our backlog to try Crux out with this "cloud-native" fork of RocksDB (by the Rockset team of ex-RocksDB Facebookers): https://github.com/rockset/rocksdb-cloud this is an insightful post https://rockset.com/blog/remote-compactions-in-rocksdb-cloud/ there are a few presentations on it also. Clearly Rockset are using this fork heavily themselves in their cloud offering and I think it would be a good bet for us, if it works like we think it will. This would go a very long way to solving all the issues you mention without needing to change the nature of the sorted-kv index structure of Crux itself. > Sure you can have boxes with 10tb and put your index on them, but that's not very convenient for a few reasons (costs, operational among others) and you'll still be bound by limits in storage size per box. Cost is definitely an important dimension for analysis and critique, and I believe was the biggest motivator for Rockset's fork. I think if "limits in storage size per box" are truly an issue then Crux is probably not the right fit for the problem at hand (e.g. EBS volumes go up to 16TB, and i3en has 60TB of local SSD). > I am actually wondering if you are working on sharding/partitioning at all We are certainly thinking about it often. Did you see Håkan's short segment about the roadmap in our Re:Clojure video? https://youtu.be/JkZfQZGLPTA?t=1325

mpenet19:12:13

Interesting. No, I didn't have time to watch the videos yet, and I missed part of the Q&A when it was live

👍 3
mpenet10:12:17

all your data has to fit on every node

mpenet10:12:29

you get replication sure, but no way to scale out for now

David Pham10:12:31

I thought they solved it with the SQL store

mpenet10:12:41

not as far as I know

David Pham10:12:04

What about S3 data store?

sam10:12:08

I gotta head back into my code, was good talking @neo2551 @mpenet! 👋

mpenet10:12:10

dunno about this

David Pham10:12:19

@sam Thanks to you! See you again.

sam10:12:43

See you!

David Pham10:12:58

@mpenet https://opencrux.com/reference/20.09-1.12.1/kafka.html They say you can use S3 for the document store. So I guess it does scale?
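[Editor's note: a rough sketch of the node topology being discussed, in the style of the opencrux 20.09 configuration docs linked above; module names and keys are from memory of those docs and should be checked against the reference. The point below is that only the tx-log and document store are remote (Kafka/S3), while the query indexes stay local on every node.]

```clojure
;; Illustrative Crux node config: shared tx-log and document store,
;; but the index store (RocksDB) remains per-node, on local disk.
{:crux/tx-log {:crux/module 'crux.kafka/->tx-log
               :kafka-config {:bootstrap-servers "localhost:9092"}}
 :crux/document-store {:crux/module 'crux.s3/->document-store
                       :bucket "my-crux-docs"}
 :crux/index-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                               :db-dir "/var/lib/crux/indexes"}}}
```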

David Pham10:12:27

I might be wrong though, I am still wondering what is the best solution.

mpenet10:12:56

indices have to be stored in lmdb or rocksdb

mpenet10:12:07

and iirc these need to be on every node

David Pham10:12:16

ok, makes sense.

mpenet10:12:29

and indices are big

David Pham10:12:50

Did you check Datomic Cloud then? Does it solve that problem?

mpenet10:12:52

personally I prefer the design of datomic/datahike

mpenet10:12:03

too expensive, not open enough 🙂

David Pham10:12:07

haha agreed.

David Pham10:12:17

I can't even try it at work 😞

mpenet10:12:04

datahike is promising, we'll see

David Pham10:12:26

yep. On the other hand, at 5k per year per transactor (less if you don't update them)

David Pham10:12:33

I think it could still be acceptable.

mpenet10:12:01

not always

mpenet10:12:03

I work for a cloud provider: they wouldn't agree to pay to have something hosted on AWS, plus there's the question of data protection given it's a US entity

David Pham10:12:19

I was thinking of Datomic on Prem

mpenet10:12:43

sure, I guess we could, but it's not in my hands 🙂

grischoun12:12:36

Hi @mpenet, hi guys! Good to hear from Swiss clojurers. As @mpenet mentioned, I live in Geneva and work with lambdaforge.

David Pham12:12:23

So cool! So you are from the team that works on datahike?

alpox12:12:21

Hi all! It's nice to see some activity in here. I'm from close to Winterthur and working for a startup in Zürich. I'm interested in Clojure but sadly didn't find an opportunity to work with it as of yet. It's good to hear that there are some companies in Switzerland using it!