#pathom
2022-04-26
sheluchin13:04:37

I'm looking at ways to manage my ETL pipeline. A common solution from libraries like https://domino-clj.github.io/ and https://github.com/commsor/titanoboa is to define the parts of the pipeline in a dependency graph and follow that graph to perform the steps of the pipeline. I think there is some potential to use Pathom in this way. We already define a data dependency graph when using it. On the other hand, I know Pathom adds some overhead, and this might make it difficult to handle the very large number of records that ETL often involves, but there is that ::pco/final that may be of some help. Does anyone know if using Pathom for ETL like this has been attempted somewhere?

wilkerlucio14:04:05

I've considered that. I believe it's possible to leverage the planner in Pathom to have the schematics on what to run, but the runner would have to do something very different, maybe generating spark statements or something

sheluchin20:04:41

I'm not familiar with spark statements but I'll read up about it. Indeed, maybe just using the planner could be a good step in the right direction.

λustin f(n)16:04:59

I am updating from pathom3 version 2022.02.01-1-alpha to 2022.03.17-alpha, 2022.04.20-alpha , but it is causing some existing mutation unit tests to fail with the message

ERROR in (create-comment-test) (planner.cljc:474)
Uncaught exception, not in assertion.
expected: nil
  actual: java.lang.AssertionError: Assert failed: Tried to remove node 24 that still contains references pointing to it. Move
      the run-next references from the pointer nodes before removing it. Also check if
      parent is branch and trying to merge.
(if node-parents (every? (fn* [p1__44092#] (not= node-id (get-node graph p1__44092# :com.wsscode.pathom3.connect.planner/run-next))) node-parents) true)
Is this a bug, or is it simply the new planner catching something odd we were doing from before that I could fix? How could I start digging in deeper to debug this?

λustin f(n)16:04:06

Ah. I should just use a more updated version, preferably non-alpha 😅

λustin f(n)16:04:03

Ah. Everything is still alpha. This same error happens for me on version 2022.04.20-alpha

wilkerlucio19:04:16

hello @U7Y7601B2, yep, all alpha still 😅 can you give me a repro? it's possibly a regression, but I need an example to check

wilkerlucio21:04:56

(anyway, this error should never happen, its presence means there is something wrong in the planner algorithm)

λustin f(n)04:04:46

Ah. Sounds worth extracting out a simple repro example from our system then.

👍 1
λustin f(n)18:04:05

Ok, I narrowed it down as much as I could.

λustin f(n)18:04:27

(ns repro
  (:require [clojure.test :refer :all]
            [com.wsscode.pathom3.connect.indexes :as pci]
            [com.wsscode.pathom3.interface.eql :as p.eql]
            [com.wsscode.pathom3.connect.operation :as pco]
            [com.wsscode.pathom3.connect.built-in.resolvers :as pbir]))

(pco/defresolver get-comment []
  {:comment/author {:user/id "user-id"}})

(def aliases (pbir/equivalence-resolver :comment/author :user))

(pco/defresolver user-resolver
  [{id :user/id}]
  {:user/avatar-filename "avatar-filename"})

(pco/defresolver avatar
  [{user :user}]
  {::pco/input [{:user [:user/avatar-filename]}]}
  {:user/avatar user})

(pco/defresolver user-object-resolver
  []
  {::pco/output [{:user [:user/id]}]}
  {:user {:user/id "user-id"}})

(deftest repro-test
  (is
    (thrown?
      AssertionError
      (p.eql/process
        (pci/register [get-comment
                       aliases
                       user-resolver
                       user-object-resolver
                       avatar])
        {}
        [{:user
          [:user/avatar]}]))))

wilkerlucio19:04:02

thanks, can you please open an issue in Pathom 3 repo (https://github.com/wilkerlucio/pathom3/issues)?

👍 1
wilkerlucio19:04:47

@U7Y7601B2 I think I already understand the bug: it's a situation where a node must be removed, but the algorithm wasn't expecting the node to have parents in this case, and your repro demonstrates a case where it does happen

wilkerlucio19:04:16

Pathom is trying to remove the node author->user-alias because it notices that this path can't fulfill the nested requirements

wilkerlucio19:04:02

but a node can't have parents when it's removed, which is correct, so I think the way to go here is to remove the whole ancestor chain as well. This would translate into the user-object-resolver being the only valid path in this scenario

λustin f(n)19:04:22

Ah. Well I finished submitting the case for you 😅 https://github.com/wilkerlucio/pathom3/issues/136

wilkerlucio19:04:17

@U7Y7601B2 pushed a fix to main, can you try so I can confirm the fix?

λustin f(n)20:04:28

I am unfamiliar with how to bring in dependencies from an active branch rather than a released :mvn/version. Could you point me to some docs or show me what a deps.edn import would look like?

wilkerlucio20:04:41

sure, one sec

wilkerlucio20:04:27

you can use this to import Pathom 3:

com.wsscode/pathom3        {:git/url ""
                              :sha     "28956c7f5d6dd259effc09567829c096932714a7"}

λustin f(n)20:04:02

Oof, I was so close to doing it right. Thanks!

λustin f(n)20:04:01

@U066U8JQJ That fixed it. Unfortunately for me, there are 2 other unit tests of ours that fail on updating pathom3 that are apparently unrelated. Ones with less obvious errors thrown in my face. Looks like I need to make more repros

wilkerlucio20:04:49

thanks for bringing those, happy to keep debugging and tackling those with you

nigel20:04:04

Hi. I’m just getting started with pathom and have a pretty basic question. I have the following code:

(def env
  (pci/register
   [(pbir/constantly-resolver :products
                              [{:product/id 1 :product/slug "test1"}
                               {:product/id 2 :product/slug "test2"}])]))
(p.eql/process env [{:products [:product/slug]}])
;; => {:products [#:product{:slug "test1"} #:product{:slug "test2"}]}
(p.eql/process env [{:products [:product/id]}])
;; => {:products [#:product{:id 1} #:product{:id 2}]}
(p.eql/process env [{[:product/id 1] [:product/slug]}])
;; => {[:product/id 1] {}}
The first two queries above work as expected, but the last query doesn’t. In order to process
(p.eql/process env [{[:product/id 1] [:product/slug]}])
do I need to create another resolver to match the :product/id?

λustin f(n)20:04:38

Yes, you need a resolver that explicitly knows how to look up a :product/slug from a :product/id. In your example this could be a resolver that takes :product/id and :products, then filters the products for one that matches.

🙌 1
λustin f(n)21:04:04

The resolver you have provides :products, but for all pathom knows there is no relation between :product/id and :product/slug (other than them both being present in :products). They could be completely random data, or magic labels, or whatever.

👍 1
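
A minimal sketch of the resolver described above, assuming the same constantly-resolver for :products as in the question. The names find-product and product-by-id and the example.product-lookup namespace are hypothetical, chosen only for illustration. This is one possible way to connect :product/id to :product/slug, not the only one.

```clojure
(ns example.product-lookup
  (:require [com.wsscode.pathom3.connect.built-in.resolvers :as pbir]
            [com.wsscode.pathom3.connect.indexes :as pci]
            [com.wsscode.pathom3.connect.operation :as pco]
            [com.wsscode.pathom3.interface.eql :as p.eql]))

;; Hypothetical pure helper: first product whose :product/id equals id,
;; or nil when there is no match.
(defn find-product [products id]
  (first (filter #(= id (:product/id %)) products)))

;; Hypothetical resolver: takes :product/id plus the whole :products list
;; and exposes :product/slug for the matching product.
(pco/defresolver product-by-id
  [{id :product/id products :products}]
  {::pco/input  [:product/id :products]
   ::pco/output [:product/slug]}
  ;; select-keys on nil yields {}, so an unknown id simply provides nothing.
  (select-keys (find-product products id) [:product/slug]))

(def env
  (pci/register
    [(pbir/constantly-resolver :products
                               [{:product/id 1 :product/slug "test1"}
                                {:product/id 2 :product/slug "test2"}])
     product-by-id]))

;; The ident query from the question, now with a path from :product/id
;; to :product/slug available in the index.
(p.eql/process env [{[:product/id 1] [:product/slug]}])
```

Keeping the lookup in a plain function makes it easy to test without Pathom. The resolver only declares which attributes go in and which come out.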
nigel21:04:46

Excellent, thanks.