#xtdb
2020-04-09
kongeor19:04:22

Hi! I'm on an older version of crux (19.07-1.1.1-alpha) using standalone config. Can I switch to latest version with the jdbc backend without losing data? Is there a way to migrate? Thanks!

refset21:04:30

Hello! :) Yes you can definitely migrate without losing data. There have been a couple of changes since then that you might need to account for though:
1) eviction no longer works at a document level
2) put and delete ops with a "valid time end" argument behave differently
If you weren't using either of those features then you should be okay to migrate without doing any manual transformations. The simplest method is to read the successful transactions out into a file via tx-log and then submit them back into the new jdbc node. Worth noting that this won't preserve transaction times. I don't have a code snippet handy right now but I'll give it a go tomorrow!

refset21:04:24

You can validate it by checking the attribute-stats output matches across both versions

kongeor12:04:45

hey! thanks for the help. It's exclusively put txs so it's fine. It's also a toy project which just happened to run for quite a while now and I would be sad losing all it had collected.

kongeor12:04:49

what I did was

kongeor12:04:52

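(let [data (crux/tx-log sys (crux/new-tx-log-context sys) nil true)]
  ;; with-open closes the writer so the whole log is flushed to disk
  (with-open [w (clojure.java.io/writer "data.edn")]
    (clojure.pprint/pprint data w)))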

kongeor12:04:12

and with a bit of hacking putting it back:

kongeor12:04:14

(doall
  (for [tx (clojure.edn/read-string (slurp "data.edn"))]
    (let [ops  (:crux.api/tx-ops tx)
          ;; drop the second element of each op, keeping [op doc]
          ops' (mapv (fn [[op _ doc]] [op doc]) ops)]
      (when (seq ops')
        (crux/submit-tx sys ops')))))
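The op rewrite in the snippet above can be tried in isolation. This is a minimal sketch, assuming each exported op has the three-element shape `[op eid doc]` implied by the destructuring; `drop-eid`, `sample-ops` and the sample document are hypothetical names for illustration and are not part of the Crux API.

```clojure
(defn drop-eid
  "Turns an exported [op eid doc] vector into the [op doc] form that is resubmitted."
  [[op _eid doc]]
  [op doc])

(def sample-ops
  ;; Hypothetical exported ops; real ones would come from (:crux.api/tx-ops tx).
  [[:crux.tx/put :user-1 {:crux.db/id :user-1 :display_name "example"}]])

(mapv drop-eid sample-ops)
;; => [[:crux.tx/put {:crux.db/id :user-1, :display_name "example"}]]
```

An op with nothing in third position would come out of this function as `[op nil]`, so the rewrite only suits put-only logs like the one described here.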

kongeor12:04:27

I have just a small offset in attribute stats which I'm not sure why is happening but it's an increase so it's probably fine 🙂

kongeor12:04:37

{:kino.album/artist-ids 38,
 :kino.album/release-date 36,
 :kino.track/artist-ids 53,
 :kino.album/images 108,
 :kino.album/total-tracks 36,
 :kino.album/name 36,
 :kino.track/number 47,
 :type 1,
 :kino.play/played-at 50,
 :kino.play/user-id 50,
 :kino.user/refresh-token 1,
 :kino.play/track-id 50,
 :kino.artist/name 42,
 :kino.track/explicit 47,
 :kino.track/name 47,
 :crux.db/id 176,
 :display_name 1,
 :kino.track/album-id 47}

kongeor12:04:41

I'm getting to

kongeor12:04:47

{:kino.album/artist-ids 41,
 :kino.album/release-date 39,
 :kino.track/artist-ids 55,
 :kino.album/images 117,
 :kino.album/total-tracks 39,
 :kino.album/name 39,
 :kino.track/number 49,
 :type 1,
 :kino.play/played-at 50,
 :kino.play/user-id 50,
 :kino.user/refresh-token 1,
 :kino.play/track-id 50,
 :kino.artist/name 41,
 :kino.track/explicit 49,
 :kino.track/name 49,
 :crux.db/id 180,
 :display_name 1,
 :kino.track/album-id 49}
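Comparing two attribute-stats maps by eye is error-prone, so a small helper can list only the attributes whose counts differ. This is a sketch in plain Clojure with no Crux calls; `stats-diff` is a hypothetical name, and it assumes both inputs are simple attribute-to-count maps like the two shown above.

```clojure
(defn stats-diff
  "Returns a sorted map of attribute -> [before after] for every
  attribute whose count differs between the two stats maps.
  An attribute missing from one side is treated as a count of 0."
  [before after]
  (into (sorted-map)
        (for [k (distinct (concat (keys before) (keys after)))
              :let [b (get before k 0)
                    a (get after k 0)]
              :when (not= b a)]
          [k [b a]])))

(stats-diff {:a 1 :b 2} {:a 1 :b 3 :c 1})
;; => {:b [2 3], :c [0 1]}
```

Applied to the two maps above, it would report for example `:crux.db/id` as `[176 180]` and `:kino.artist/name` as `[42 41]`, so not every attribute moved in the same direction.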

kongeor12:04:55

in any case I'm not going crazy over this as I can see my data are there

kongeor12:04:31

thanks so much again! Take care!

kongeor12:04:19

btw I just tried the above with the existing version but I guess it will work when I upgrade too

refset12:04:37

Nice! That difference is weird though...if you're happy to live not knowing, then no problem, but equally I can definitely help get to the bottom of it 🙂

refset12:04:58

I don't see why that script would be doing anything weird

kongeor12:04:33

well, I thought I could ... but now, having someone willing to help, I cannot 🙂 Should I grab the tx logs for a single key like :kino.track/name and see the diff before and after? Or is there a more efficient way to tackle this?

kongeor15:04:53

Not sure if the following makes sense:

kongeor15:04:13

the stats for the following attr are:

kongeor15:04:14

:kino.track/name 49

kongeor15:04:40

(count
      (crux/q
        (crux/db (-> system :db :db))
        '{:find [e]
          :where [[e :kino.artist/name ?]]}))

kongeor15:04:11

not sure if this is expected

kongeor15:04:22

also, checking the history for each of those docs:

kongeor15:04:24

(->>
      (crux/q (crux/db (-> system :db :db))
        '{:find [e]
          :where [[e :kino.artist/name ?]]})
      (map first)
      (map #(crux/history sys %)))

kongeor15:04:47

also gives me a single history record for each one of those - I guess that's expected as this is after the import

refset15:04:38

Ahhh, that's even weirder then! Good idea to check that the history of each only contains one entry. It's possible that there's something we've changed with ingestion that's caused attribute-stats to go haywire, like a race condition of some kind (it's run async after the main index-tx thread runs). I will try to repro on a vanilla data set.

kongeor16:04:58

Please note this is on the older version

kongeor16:04:02

I'll give this a go too on the latest version

kongeor08:04:23

hey again! I gave it a go on 20.04-1.8.1-alpha and attribute-stats were consistent this time: they were the same before and after the export, and they matched the db key counts I did. This is with the jdbc backend tho. Will try now with rocks.

kongeor08:04:54

{:crux.node/topology '[crux.standalone/topology
                       crux.kv.rocksdb/kv-store]
 :crux.kv/db-dir "data"}

kongeor08:04:13

everything looks all right with this config as well, so I guess we can close the file on this one 🙂

kongeor08:04:00

as this was a fun process I'm thinking of writing a blog post. Someone may find this useful.

refset08:04:10

Cool, okay well I'm glad you're happy :) and a blog post would be awesome! We'll crack on with some more rigorous migration tooling also 😊