This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-12-15
Channels
- # adventofcode (80)
- # beginners (94)
- # biff (19)
- # cider (74)
- # clj-kondo (11)
- # cljs-dev (7)
- # clojure (110)
- # clojure-austin (3)
- # clojure-australia (1)
- # clojure-belgium (1)
- # clojure-china (1)
- # clojure-europe (83)
- # clojure-filipino (1)
- # clojure-hk (1)
- # clojure-indonesia (1)
- # clojure-japan (1)
- # clojure-korea (1)
- # clojure-my (1)
- # clojure-nl (1)
- # clojure-norway (4)
- # clojure-sg (1)
- # clojure-taiwan (1)
- # clojure-uk (2)
- # cursive (3)
- # data-science (8)
- # datalevin (8)
- # emacs (18)
- # etaoin (5)
- # graalvm (1)
- # holy-lambda (3)
- # honeysql (1)
- # jackdaw (9)
- # java (10)
- # jobs (3)
- # luminus (9)
- # malli (106)
- # off-topic (88)
- # polylith (8)
- # portal (2)
- # re-frame (50)
- # reagent (11)
- # reitit (74)
- # remote-jobs (1)
- # shadow-cljs (46)
- # tools-deps (26)
- # xtdb (49)
Morning! Any (recent) talk suggestions that I can safely listen to during a car drive?
morning
Hey guys. I have a 150MB SQL dump from some random database. I have no idea how the tables are set up or even what’s really in this data. How do I get from hugedump.sql to an overview of what’s in this data? My colleague has been working on linking two datasets for several months in 2021, but she never bothered to check if she could export the data from the tool she was using. Now I have this dump, but most of it just says _binary and then lists a bunch of binary data.
So it’s a dump as SQL DDL and Insert statements, but most of the data is serialized binaries?
I think I need to visualise the schemas and get some specific data out (if it’s in there).
Ah, so no DDL and the inserts depend on column order?
not sure if I know enough SQL to answer that. It has a bunch of CREATE TABLE statements in the beginning.
In this case, I’d just try and run the whole script against a dockerized, uninitialized MySQL Database and see what happens
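A minimal sketch of that approach; the container name and image tag are assumptions, and hugedump.sql is the file from the earlier message:

```shell
# Hypothetical sketch: load the dump into a throwaway MySQL container.
docker run --name dump-probe -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -d mysql:8

# Give the server a moment to initialize, then feed the dump to the mysql
# client inside the container (if the dump lacks a CREATE DATABASE, create
# one first and name it on the command line).
docker exec -i dump-probe mysql < hugedump.sql

# Poke around: list whatever the script created.
docker exec -it dump-probe mysql -e 'SHOW DATABASES;'
```

When done, `docker rm -f dump-probe` throws the whole experiment away.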
Seeing a Flyway reference in there is worrying; it’s like Migratus in the Clojure world and hints that there might be additional DDL files in a program’s source code somewhere
Ah, nevermind, if it’s a dump it shouldn’t matter
sorry 😄
I’d use MySQL to be sure that the collation et al. work; the binary format especially might not survive in another DB. The bigger question might be what the binary blobs represent. If they are serialized Java objects and you don’t have the source code anymore, it might get tricky
I’ve been trying to get her to provide me with the export for 2 years… she finally delivers and it can’t be used. Now all I have is this dump 😛 will be interesting to see whether her work is completely wasted.
After a quick search, there seem to be tools that can restore serialized Java objects even if the class files are missing. Archeology time!
tried a couple of the leading bytes from the binaries against - https://en.wikipedia.org/wiki/List_of_file_signatures - no matches... which hopefully rules out serialized java objects (which start with AC ED)
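For reference, that kind of signature check can be done with standard tools; the blob file name here is made up, and the sample bytes are written in octal for portability:

```shell
# Create a sample blob whose first two bytes are 0xAC 0xED (octal 254 355),
# the magic number that opens a serialized Java object stream.
printf '\254\355\000\005' > blob.bin

# Print the leading bytes as hex and compare against known file signatures.
head -c 2 blob.bin | od -An -tx1
# prints: ac ed
```

Anything that does *not* start with `ac ed` is at least not a plain serialized Java object.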
This is the stuff that they don’t tell you about when they’re trying to recruit you with fancy tutorials for using pure functions and making beautiful GUIs in a few keystrokes.
For some reason I always want to find out what’s in a binary data representation if I see one 🙂
So I managed to poke around the data for a bit and as feared all of the IDs I was looking for were NULL. So my colleagues have altogether spent probably a year wasting their time...
They can’t re-export in a different manner?
No this is the raw data. The data simply isn't there. If they had exported a sample of the stuff in 2020 like I asked, this wouldn't have been an issue...
Or, reshape the data in a manner that ensures that you get what you want upon export.
Welp, that’s terrible.
eek - how do you work for a year without ever checking that you are recording what needs to be recorded?
the work they have done is that of manually linking two wordnets, but the software they used uses its own binary IDs to link the wordnet entries. There is a table that translates this makeshift binary ID to one we know for one wordnet dataset (the English one), but all of the IDs for the Danish data are NULL. We would have discovered this early if they had produced a sample, but this sample never materialised for 3 years...
Does the original data with that software still exist before export? Are the ids there correct or were they corrupted from the get go?
BTW I'd use a GUI to explore this. I personally use TablePlus on Mac - sometimes a GUI helps you see data you'd otherwise miss.
@UK0810AQ2 There was never any data corruption, AFAIK, just a lack of due diligence.
What a sucky situation @U4P4NREBY - hope my joke works for you.
Thank you @U6L5FFL95. 🥲
well this job turned out to be bigger than i guesstimated 🙈
no, i've been refactoring a library which has accreted functionality over 7 years, to give it a nice clean interface...
hopefully doing good work - it's open-sourced, and a distillation of all the learnings we've had working with a streams+promises system
remember the rule of spring cleaning - empty the drawer onto the kitchen floor and put back the things you want to keep, rather than trying to remove things you don't like from the drawer. i.e. in code: make a new namespace and copy stuff over bit by bit, rather than trying to cut stuff away from existing code. it doesn't always work out, but i find it's a better way to start. it also has the benefit that you can do multiple iterations and compare
i did exactly that on the main namespace @U0509NKGK , then ditched the old namespace once done
thank you 😊
it's definitely something
a memoriam to the first observable bug
Does anyone have an example of a git precommit hook that runs (for example) cljfmt?
There used to be one in this library; this project.clj installs it: https://github.com/javahippie/clj-test-containers/blob/36f3820f4fe89cd68c399fa2ccbcbfd34c5e2d0e/project.clj