#off-topic
2020-05-26
Drew Verlee00:05:00

What's the state of the art for machine-assisted code review? For Clojure? In general? I feel like at the heart of it, we're asking whether two programs are equal, which I know runs into well-researched issues. I'm just not sure there aren't improvements to be made over reading GitHub pull requests as the primary way of communicating change, which seems to be the industry standard.

phronmophobic00:05:01

I'm not sure what you mean by: > I feel like at the heart of it although it sounds interesting. I think if you ask the question from the opposite direction, "what would be the optimal way of communicating changes?", it's clear there's room for improvement. Some interesting directions:
1. literate programming is a subject that I don't think has been fully explored http://www.literateprogramming.com/knuthweb.pdf
2. I can imagine a future where a pull request is accompanied by a sub-program that illustrates the code change (maybe something like an IPython notebook). The notebook or sub-program would provide an interactive way to see the change in action, in-line.
3. integrating mediums other than writing (e.g. images, sketches, diagrams, interactive examples, audio, etc.) has always been awkward. It's not that it's impossible, but it's annoying enough that it's very uncommon

Drew Verlee01:05:33

I see what you mean. In my experience those are things you can do in addition to code changes. I'm brainstorming how tools dedicated to understanding code changes could be better. Day to day I see most shops doing pull requests and looking at git diffs on GitHub. I know we can do more, but can we do less? E.g. can those diffs be more Clojure-aware?

Drew Verlee01:05:09

Fair warning, I'm most of the way through this growler, so the above is mostly musings.

🍻 4
phronmophobic01:05:36

Currently, code diffs provide no context other than the text itself; you could overlay higher-level concepts onto the diffs. Metrics like:
• algorithmic complexity (when it can be determined statically)
• coupling to other code
• cohesion
• various measures of complexity
• benchmarks
• comparisons against diffs in the past (e.g. 1. do certain parts of the program that live in totally different files always get changed together? 2. is this a problematic part of the codebase that gets bug fixes frequently?)
• an overlay of the overall code flow
https://docs.microsoft.com/en-us/visualstudio/code-quality/code-metrics-values?view=vs-2019

phronmophobic01:05:20

Other than that, I'm not sure what other types of examples you might be considering.

phronmophobic01:05:18

Not sure if I'm helping or just going on about something unrelated.

Drew Verlee01:05:56

It's related. Good ideas. My original thought was around something like Lambda Island's diff, but for all Clojure code, not just collections (I'm not actually sure what the limits are).

phronmophobic02:05:23

oh yeah. structural diffing would definitely be a win over text diffing

martinklepsch11:05:21

@U0DJ4T5U1 I worked on a structural diffing GitHub Action at some point but didn’t quite finish it https://github.com/martinklepsch/autochrome-action — I feel like it might be close to being useful but isn’t quite there just yet

martinklepsch11:05:41

There’s an example of a diff here, once this is all running a bit more reliably I’d love to overhaul the styling a bit https://storage.cloud.google.com/autochrome-service.appspot.com/diffs/Lv6Ad4PDs6U2cVTBHOjd-.html

martinklepsch11:05:23

That link currently doesn’t work because I’m uploading these files using the wrong API or something. But here’s a screenshot:

👀 4
vemv14:05:27

"machine-assisted code review" brings to mind the notion of domain-specific linters. Linters like Eastwood or Kondo are super useful, but generally they only cover well-understood, uncontroversial aspects. While code reviews are often a bit more opinionated and/or specific to a given project. As I see it, to date still a good chunk of a code review is merely the mechanical application of some rules of thumb. So, I'd love to see that automated. Sadly atm crafting your own linters is bit of a hacky journey.

Drew Verlee14:05:39

Thanks, very fascinating. What's defined as a structural difference? I feel like this is a topic that I would be excited to start collecting information about. I assume it's been explored in the past, but given the current state of things, it's not clear any satisfactory improvements ever took hold. I feel like a good place to start would be putting something together like Parinfer did for structural editing. https://github.com/shaunlebron/history-of-lisp-parens (not sure that's the most up-to-date record).

phronmophobic22:05:51

I’m not sure structural diffing has a common definition, but the way I was using it was as opposed to text diffing. For example:
• text diffing: 4 characters were removed at line 12, offset 4, and then the characters “(+ foo 10)” were inserted at line 12, offset 4
• structural diffing: a list with the contents +, foo, 10 was inserted in the 4th top-level form at the associative path [4 :foo 2]
Currently, git diffs focus on text diffing rather than structural diffing. Obviously, there would be better ways to visualize this than describing it.
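As a tiny taste of the "forms as data" flavor, clojure.data/diff from core Clojure compares two data structures and reports what differs positionally. It is nowhere near a code-aware diff tool like autochrome, but it illustrates the direction:

```clojure
;; Compare two defn forms as data rather than as text.
(require '[clojure.data :as data])

(data/diff '(defn f [x] (inc x))
           '(defn f [x] (+ foo 10)))
;; returns [things-only-in-a things-only-in-b things-in-both]; here the
;; difference shows up at index 3 (the body), roughly:
;; [[nil nil nil [inc x]] [nil nil nil [+ foo 10]] [defn f [x]]]
```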

jumar05:05:06

[disclaimer: I work on CodeScene] CodeScene has an interesting approach to code analysis in general and to analysing code changes / pull requests in particular. You can check (somewhat obsolete) info here: https://empear.com/blog/codescene-ci-cd-quality-gates/ We have the same thing in our cloud version integrated with GitHub PR checks - it may look like this: https://github.com/jumarko/poptavka/pull/1/checks?check_run_id=412967808 The basic idea is that you can identify the riskiest PRs and the riskiest changes (files, functions, ...) in those PRs and focus your review effort on them.

Drew Verlee00:05:12

Thanks @U06BE1L6T, I'll get around to reading this tomorrow.

Balaji Sivaramgari02:05:43

Need your suggestions for integrating Clojure (Leiningen) with the Black Duck scanning tool.

Balaji Sivaramgari02:05:51

Do you have any suggestions for integrating Leiningen with the Black Duck security scanning tool?

jacklombard05:05:48

Hey all, how does HTTP/CDN caching work for pathom/transit?

jacklombard05:05:11

Can it be done?

jacklombard05:05:45

If yes, is purging the cache straightforward?

orestis07:05:00

You said Netlify was easy, but I didn’t understand how easy. Clojure is installed, JVM is installed, what magic is this.

🎉 24
dominicm14:05:40

I'm looking for event queues which support something like kafka's retention period. The basic idea is that I want to be able to replay the last 30 days worth of events. I'm struggling to find the terminology to navigate the message-oriented world effectively. Preferably something with dead letter queues built in.

Cory14:05:48

NATS Streaming is basically the only alternative I'm aware of. The difference in terminology is commit log (retention with offset) vs queue (ack with timeout).

dominicm14:05:48

Hmm. Feels like I want a hybrid. I'm in a position where I can replay old messages and delete my last 30 days of data. But I really want to treat it like a queue otherwise.

dominicm14:05:17

Maybe I need to treat the input to the queue as something with a commit log, and then feed that into the queue.

Cory14:05:39

In Kafka, the topic will present itself and behave as a queue to all consumers in the same consumer group unless the offset is reset.
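For reference, "resetting the offset" to replay a time window can be done with the plain Kafka Java client's offsetsForTimes and seek. A rough sketch (the consumer setup is assumed, not taken from this thread):

```clojure
(import '(org.apache.kafka.clients.consumer KafkaConsumer)
        '(java.time Instant Duration))

(defn seek-back!
  "Rewind `consumer` on its currently assigned partitions to the first offset
  at or after the Instant `from`."
  [^KafkaConsumer consumer ^Instant from]
  (let [wanted (into {} (map (fn [tp] [tp (.toEpochMilli from)])
                             (.assignment consumer)))]
    (doseq [[tp offset-ts] (.offsetsForTimes consumer wanted)]
      (when offset-ts
        (.seek consumer tp (.offset offset-ts))))))

(comment
  ;; after subscribing and polling once, so that partitions are assigned:
  (seek-back! consumer (.minus (Instant/now) (Duration/ofDays 30))))
```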

dominicm15:05:08

When I say queue, I really mean along with all the features of a queue. Like dead letter queues.

dominicm15:05:33

That's really the only feature I'm missing. Well, kinda. I also want to distribute my messages across many consumers.

dominicm15:05:50

So I guess what I want is a queue with "last N days replay" built-in.

dominicm15:05:34

https://cloud.google.com/pubsub/docs/replay-overview looks like this might be supported by Google's pubsub

Cory15:05:33

Yeah, that's the closest solution I know of to what you're looking for.

Cory15:05:05

I think that's only supported up to 7 days though.

dominicm15:05:02

Yeah. It is.

Cory15:05:43

If you use the Connect framework with Kafka you get a DLQ for free.

dharrigan15:05:24

We use Kafka with a DLQ. It's just another topic that things get dumped into if an exception (business or otherwise) is raised.

dharrigan15:05:37

i.e., foo topic and foo-dlq topic

dominicm15:05:42

@U11EL3P9U how do events get onto the dlq? Does the application forward them?

Daniel Tan15:05:00

I think you can use Datomic for something like this as well?

dominicm15:05:19

I don't think datomic would be usable for this use-case.

Cory15:05:20

Yeah. https://medium.com/@sannidhi.s.t/dead-letter-queues-dlqs-in-kafka-afb4b6835309 covers your three options in dealing with this in Kafka.

dominicm15:05:04

I'm deeply suspicious of doing this in software from exceptions. I can't put my finger on why, though. It just seems like something that could still get lost, e.g. if you shut down the server between the exception being thrown and requeuing the event onto the DLQ.

dominicm15:05:23

Yeah, that's the problem. Data consistency for cases like that.

dharrigan15:05:40

I have a Thread/setDefaultUncaughtExceptionHandler registered that catches everything and then shoves it into the topic. Of course, there could be the case that the issue is with Kafka itself, but I haven't observed that yet.
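A rough sketch of that kind of setup; the producer config and the foo-dlq topic name are illustrative assumptions, not dharrigan's actual code:

```clojure
(import '(org.apache.kafka.clients.producer KafkaProducer ProducerRecord)
        '(java.util Properties))

(def dlq-producer
  (KafkaProducer.
    (doto (Properties.)
      (.put "bootstrap.servers" "localhost:9092")
      (.put "key.serializer" "org.apache.kafka.common.serialization.StringSerializer")
      (.put "value.serializer" "org.apache.kafka.common.serialization.StringSerializer"))))

(Thread/setDefaultUncaughtExceptionHandler
  (reify Thread$UncaughtExceptionHandler
    (uncaughtException [_ thread ex]
      ;; publish a description of the failure to the DLQ topic
      (.send dlq-producer
             (ProducerRecord. "foo-dlq" (.getName thread) (str ex))))))
```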

Cory15:05:19

If you haven't committed it'll restart with the same message. As long as you commit as a final step, this is fault tolerant. It is at risk for a dev doing it wrong, so I'd probably create a library layer for my devs to use personally.

dominicm15:05:50

ah, so there's a commit thing that handles the case I'm thinking of.

dominicm15:05:22

the example in the article doesn't seem to cover commits.

Cory15:05:50

Step 1: do the thing, or on error write to the DLQ
Step 2: commit
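A sketch of that ordering using the plain Kafka Java client from Clojure; the topic, group id, and the process! stub are illustrative assumptions:

```clojure
(import '(org.apache.kafka.clients.consumer KafkaConsumer)
        '(org.apache.kafka.clients.producer KafkaProducer ProducerRecord)
        '(java.time Duration)
        '(java.util Properties))

(defn props [m]
  (doto (Properties.) (.putAll m)))

(def consumer
  (doto (KafkaConsumer.
          (props {"bootstrap.servers"  "localhost:9092"
                  "group.id"           "foo-processor"
                  "enable.auto.commit" "false"
                  "key.deserializer"   "org.apache.kafka.common.serialization.StringDeserializer"
                  "value.deserializer" "org.apache.kafka.common.serialization.StringDeserializer"}))
    (.subscribe ["foo"])))

(def producer
  (KafkaProducer.
    (props {"bootstrap.servers" "localhost:9092"
            "key.serializer"    "org.apache.kafka.common.serialization.StringSerializer"
            "value.serializer"  "org.apache.kafka.common.serialization.StringSerializer"})))

(defn process! [record]
  ;; business logic goes here; throw to signal failure
  nil)

(defn run-once []
  (doseq [record (.poll consumer (Duration/ofMillis 1000))]
    (try
      (process! record)                     ;; step 1: do the thing...
      (catch Exception _
        ;; ...or on error forward the original message to the DLQ
        (.send producer (ProducerRecord. "foo-dlq" (.key record) (.value record))))))
  ;; step 2: commit only after every record in the batch has been handled;
  ;; a crash before this line just replays the batch (which can also produce
  ;; duplicates in the DLQ, as noted below).
  (.commitSync consumer))
```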

dominicm15:05:18

Hmm, you could get dupes in the dlq that way

dominicm15:05:03

Feels wrong to be thinking at this level. I really want to buy a prepackaged solution & not design something custom here. I have enough things to be thinking about without spending time modelling all the different ways in which I haven't handled errors correctly.

dominicm15:05:49

This also gets trickier if I want retries as well. Then I have to write to a replay queue first...

Cory15:05:40

All of your concerns are perfectly sane and reasonable. There isn't anything that fits all of your requirements out of the box though, at least to my knowledge.

dominicm15:05:02

Yeah, makes sense. It's a shame.

dominicm15:05:32

I wonder if I can do something simple by going the other way, use a queue and write to it via S3.

dominicm15:05:40

Then to "replay" I just retrigger the s3 events somehow.

dominicm15:05:44

Something like that anyway.

dominicm15:05:37

https://docs.aws.amazon.com/sns/latest/dg/sns-fork-pipeline-as-subscriber.html#sns-fork-event-replay-pipeline looks like AWS has a build-your-own solution using this approach of having a durable store feed into a queue (although using SNS)

Cory17:05:28

oh nice find

dominicm18:05:30

@UCHV4JZ7A do you use it? I'd love to know how you're finding it :)

adamfeldman18:05:29

@U09LZR36F I haven’t used it in production 🙂. I’ve been working to see if there are any operationally-simpler alternatives to Pulsar that would fit my needs (I’m not sure if I should be wary of running a few Zookeeper clusters myself, but at the moment I’m wary despite having some Kubernetes experience).

Pulsar excites me because it decouples message storage from the message brokers (Pulsar’s data is stored by Apache BookKeeper’s distributed ledger, which can use a mix of block and object storage, e.g. EBS + S3). Decoupling the brokers from the storage makes it easier to retain events indefinitely, since you aren’t limited to the maximum size of disks that can be attached to a single VM. You can have Pulsar store a minimal amount of events on disk and automatically offload all but the “working set” to S3/GCS, saving on disk storage costs. To me Pulsar just seems much more flexible and feature-ful than Kafka (having not personally used either in production). FYI: https://pulsar.apache.org/docs/en/concepts-messaging/#dead-letter-topic

But Confluent seems to be undertaking a major project to evolve Kafka in the same direction (https://softwareengineeringdaily.com/2020/04/07/ksqldb-kafka-streaming-interface-with-michael-drogalis, https://www.confluent.io/blog/project-metamorphosis-elastic-kafka-clusters-in-confluent-cloud/).

At first glance, I’m assuming my use-case is different than yours: I’m building a greenfield app. I’ve been doing a research project for a while to evaluate tools for designing entire multi-region systems that are built around an event log (a la https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying, https://www.youtube.com/watch?v=B1-gS0oEtYc). So far I’m planning to build around Crux, and I’m working on figuring out what datastores to use with it, with a focus on minimizing operational complexity. Crux typically uses Kafka as a core component, and I don’t want to use Kafka. (I’m currently evaluating GCP’s multi-region serverless datastores with Crux, e.g. Google Cloud Firestore (in Datastore mode), Cloud Pub/Sub, etc.)

dominicm19:05:19

Crux has been on my mind, but they've recently decoupled the document storage anyway.

dominicm19:05:30

We're greenfield. I'm looking to process external events (e.g. Slack messages, emails, webhooks, whatever), update an analytical set, and then not have to store the events indefinitely, but hold onto them long enough to make corrections.

dominicm19:05:35

We could copy the Crux idea with some work: store references in Kafka, store events in S3, and periodically expire S3.

adamfeldman20:05:34

👍 Exactly, Crux recently gained an S3 document store implementation https://github.com/juxt/crux/tree/master/crux-s3. If you’re already using something that supports JDBC, Crux can also keep its transaction log in there instead of Kafka (assuming your event volume doesn’t require Kafka for the transaction log).

dominicm20:05:56

I guess I still don't want to build something to expire s3, stuff like that. Boring stuff is better.

dominicm20:05:44

https://ksqldb.io/ this looks like it's exactly what I'm trying to achieve.

👍 8
Daniel Tan01:05:33

isn’t this similar to datomic?

Daniel Tan01:05:50

stream of events, immutable transaction logs etc

dominicm06:05:33

Datomic doesn't give you access to that stream of events. Your stream of events is in terms of Datomic operations, rather than in terms of my events.

lilactown15:05:57

that bytepack service is very exciting

lilactown15:05:21

I have a lot of ideas for software tools that I can’t justify the time/$$$ of building purely open source, but I would love a way to do a split license: open source, plus a paid license if you use it for work.

p-himik15:05:53

To be honest, I still don't really understand why any service is needed here. Split licenses (or whatever the correct term is) have been around for quite some time. Nothing prevents you from using them by yourself. The only hiccup is that you will have to set up some way for the customers to pay you.

lilactown16:05:29

yeah that’s the thing that I think bytepack is offering

lilactown16:05:38

it’s an easy payment processor + distribution channel

p-himik16:05:34

So I guess it would be useful for something like Cursive or even IDEA. But not as useful for something like Highcharts. Right?

lilactown16:05:53

that conclusion doesn’t seem supported by the website at all

lilactown16:05:32

it’s an integrated package repository + payment processor and a bunch more

p-himik16:05:44

I don't understand. Both IDEA and Cursive have licensing servers and payment processors. So two services. Highcharts has just one. At least, it was that way the last time I checked. There were no license keys, there was just a license to use it commercially, that's all. And you could buy it. So just one service for payment. Now if Highcharts decided to switch to Bytepack, it would still be one single service. And I see no benefit. They would still use a CDN. The library would still be open source and would still be packaged via NPM. So what would Bytepack be able to improve here?

p-himik16:05:39

OK, I guess I was thinking about it from the perspective of a company and not an individual.

lilactown16:05:23

I’m just guessing at the benefits based on their landing page, but I imagine it would help with things like:
• I want to have a lead time where paying customers get new features sooner than OSS users
• I want the ability to build custom features for enterprise customers
• I want to deploy a lib that isn’t OSS at all and not have to run my own npm/maven infra or whatever

lilactown16:05:39

but I think that the landing page motivates it enough even if those things aren’t there at first

lilactown16:05:12

yes I’m coming at this from the perspective of a single developer building a tool that businesses could pay me money for, and I think that the landing page is from that perspective as well

p-himik16:05:31

I see, thanks!

lilactown16:05:02

these problems are less burdensome if you are a larger company that can afford to handle more of this on their own ofc

lilactown15:05:58

that is, an easy way to do it - dealing with licensing servers, payment processors, 💫

dangercoder16:05:45

Has anyone tried to build a CMS with Clojure(script)? I haven't tried it yet but I'm guessing Clojure would be a great tool for that.

Prakash18:05:21

I have actually worked on a couple of CMS applications using a Clojure backend and a ClojureScript frontend. It was fun and much more manageable than a similar system I had worked on before that was written in Groovy/Java.

Diego Bassani19:05:54

sdfsdfewf

🐈 32
🐕 20