#beginners
2019-07-29
iagwanderson01:07:59

hi ppl, I am writing code to validate data from several different sources and formats. The main function has the same structure for every source, so I want to define a protocol with the methods each source must implement in order to have its data validated.

chetan.falcon01:07:59

Would you have one method per field to validate, or one method that validates a whole row from a given source?

iagwanderson01:07:09

the methods operate on the rows entirely

iagwanderson01:07:26

something like validate, report, send, save
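A minimal sketch of what such a protocol could look like, assuming one record type per source. The protocol name `Validatable`, the `CsvRow` record and its fields are hypothetical; only the method names come from the message above. Since `send` already exists in `clojure.core` (for agents), the namespace excludes it before defining a protocol method with that name.

```clojure
(ns example.validation
  ;; `send` is already clojure.core/send (used with agents), so exclude it
  ;; before defining a protocol method with the same name.
  (:refer-clojure :exclude [send])
  (:require [clojure.string :as string]))

(defprotocol Validatable
  "Operations every data source has to implement (hypothetical protocol)."
  (validate [row] "Return a vector of error messages; empty when the row is valid.")
  (report [row] "Return a human-readable summary of the validation result.")
  (send [row] "Ship the row somewhere else (stubbed here).")
  (save [row] "Persist the row (stubbed here)."))

;; One record per source; CsvRow and its fields are made up for the example.
(defrecord CsvRow [id amount]
  Validatable
  (validate [_]
    ;; Collect one message per failed check.
    (cond-> []
      (nil? id)              (conj "missing :id")
      (not (number? amount)) (conj ":amount is not a number")))
  (report [row]
    (let [errors (validate row)]
      (if (empty? errors)
        (str "row " id " OK")
        (str "row " id " failed: " (string/join ", " errors)))))
  ;; Stubs: a real implementation would talk to a queue or a database.
  (send [row] {:sent row})
  (save [row] {:saved row}))

(validate (->CsvRow 1 10.5))    ; []
(report (->CsvRow nil "x"))     ; "row  failed: missing :id, :amount is not a number"
```

The shared main function can then call `validate`, `report`, `send` and `save` on any row without knowing which source it came from.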

chetan.falcon02:07:06

Sorry, I was thinking of single-purpose functions, one for each type of data row you want to validate, each returning a boolean depending on whether the row validated. Then I saw your next comment about records.
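For comparison, the plain-function approach described here might look like the sketch below: one predicate per kind of row, picked by source. The predicate names, the `validators` map and the row keys are all made up for illustration.

```clojure
;; Hypothetical predicates, one per kind of row; each takes a row map
;; and returns true or false.
(defn valid-csv-row? [row]
  (and (some? (:id row))
       (number? (:amount row))))

(defn valid-xlsx-row? [row]
  (and (string? (:name row))
       (pos-int? (:qty row))))

;; Choose the predicate by source, then split the rows into valid and invalid.
(def validators {:csv valid-csv-row?, :xlsx valid-xlsx-row?})

(defn split-rows [source rows]
  (let [valid? (get validators source)]
    {:valid   (filterv valid? rows)
     :invalid (filterv (complement valid?) rows)}))

(split-rows :csv [{:id 1 :amount 2} {:id nil :amount 3}])
;; => {:valid [{:id 1, :amount 2}], :invalid [{:id nil, :amount 3}]}
```

This stays simpler than a protocol when validation is the only operation; a protocol pays off once several operations (validate, report, send, save) have to vary together per source.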

iagwanderson02:07:06

no problem, I just found my solution. I need to use (map->MyRecord my-original-map)

iagwanderson01:07:11

the idea sounds good to me; however, my records are going to have a lot of fields. For example, in an XLSX file I want to model each row as a custom record, and I have some files with 30+ columns.

iagwanderson01:07:52

are there better ways to achieve this than typing the name of each column as a field of the record? Each row is already a map.
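A small sketch of the `map->MyRecord` route mentioned in the thread: `defrecord` generates a `map->Name` factory, and keys that are not declared fields are kept in the record as extra entries. So the record only needs to declare the fields the code actually relies on. `XlsxRow` and the row data are hypothetical.

```clojure
;; Declare only the fields the validation code touches (here just :id);
;; XlsxRow is a hypothetical name.
(defrecord XlsxRow [id])

;; A row as it might come out of the spreadsheet reader (made-up data).
(def row {:id 7, :name "widget", :qty 3, :price 9.99})

;; map->XlsxRow is generated by defrecord. Keys that are not declared
;; fields are kept too, so a 30-column row needs no 30-field record.
(def rec (map->XlsxRow row))

(:id rec)    ; 7, a declared field
(:qty rec)   ; 3, an extra key, still there
```

Declared fields get fast field access; the extra keys behave like ordinary map entries.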

chetan.falcon01:07:14

Hi folks, I have a HugSQL question; it would be great if someone could point me in the right direction. My select statement returns data like this:

{:remarks nil, :timeprocessing "1994-01-01 00:00:00.000", :devicetype 3M, :staffcode 0M, :timedeletion "1994-01-01 00:00:00.000", :occurred 1M, :syseventdesig "Ticket not valid", :time "2019-01-14 09:45:22.340", :syseventno 490M, :devicedesig "201 Suncorp EXIT RHS", :devicenoprocessing 0M, :component nil, :carparkabbr "MAIN", :deviceabbr "201 SUNEXR", :week_id 201903M, :carparkdesig "Public Car Park", :dss_update_time #inst "2019-01-14T15:48:46.000000000-00:00", :loaded_new_yn "N", :operatorfirstname nil, :quantity 1M, :systemeventregno 5209M, :pdi_batch_id 697410M, :carparkno 0M, :deviceno 40M, :deleted_yn "N", :centre_no "0056000", :carpark_no "0056001", :operatorsurname "Unattended", :operatornoprocessing 0M}
I get several tens of thousands of these as a list of maps (each record is a Clojure map from a HugSQL select query that reads rows from my table). This stalls my application and sometimes even crashes it (JVM out of memory?). Is there a way to read the results in chunks in code, without modifying my select statement, and write each chunk to, e.g., a CSV file to load into Cloud Pub/Sub? I also see some issues with the values in the Clojure map: some fields are not inside double quotes, like :devicetype 3M, which is just 3 in my DB, and some fields have extra stuff, like :dss_update_time #inst "2019-01-14T15:48:46.000000000-00:00", while dss_update_time in my DB looks like 14/Jan/19.

dpsutton01:07:33

In the database is it a numeric type 3 or a string “3”?

chetan.falcon01:07:28

it's just 3 like an int.

dpsutton01:07:38

Then you don’t want that inside quotes as a string. It’s the value 3 rather than the string “3”.

chetan.falcon01:07:31

Right, I was actually wondering why it's 3M instead of just 3 in the map.

dpsutton01:07:33

The M suffix indicates this is a java.math.BigDecimal. The #inst is a tagged literal for a date time. See https://clojure.org/reference/reader
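A quick REPL check of what those printed forms are: in Clojure the `M` suffix reads as a `java.math.BigDecimal`, `N` as a `clojure.lang.BigInt`, and `#inst` as a `java.util.Date` with the default reader settings. The numbers and date below are taken from the example row.

```clojure
;; What the reader produces for the literals seen in the query results.
(class 3M)   ; java.math.BigDecimal (M suffix)
(class 3N)   ; clojure.lang.BigInt  (N suffix)
(class 3)    ; java.lang.Long       (plain integer literal)

;; #inst is read into a java.util.Date by default.
(class #inst "2019-01-14T15:48:46.000-00:00")   ; java.util.Date

;; A BigDecimal is still a number; coerce it if a plain long is wanted.
(== 3M 3)    ; true: numeric equality across types
(= 3M 3)     ; false: = also requires the same number category
(long 3M)    ; 3
```

The `=` vs `==` difference is the one place the `M` tends to surprise people when comparing DB values against plain literals.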

dpsutton01:07:03

There’s a section on tagged literals

chetan.falcon01:07:45

Ah I see, thx @dpsutton. So that means I would have to clean these values out from the map in order to get to my desired data

chetan.falcon02:07:06

Any idea how I could handle huge numbers of rows and split them into chunks in Clojure?

dpsutton02:07:20

I think they are already types you can easily work with

dpsutton02:07:41

you can see this in action by evaluating (java.util.Date.)

chetan.falcon02:07:58

Thx, would you be able to say how I could just map it back to the dd/mmm/yy format?

dpsutton02:07:05

(java.math.BigDecimal. 3)

dpsutton02:07:36

do you want a string or a datetime object? what do you ultimately want to do with this data

chetan.falcon02:07:47

I want it to be a string, but in the original dd/mmm/yy format. I want to push these to a BigQuery table in Google Cloud, with each field going to its own column in the table.
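If a dd/MMM/yy string is still needed somewhere (for example for display), a sketch using `java.text.SimpleDateFormat` could look like this. Two things are assumptions: that the dates should be rendered in UTC, and that English month abbreviations are wanted. Note that in Java patterns the month name is capital `MMM`; lowercase `mm` means minutes.

```clojure
(import '(java.text SimpleDateFormat)
        '(java.util Locale TimeZone))

(defn format-dd-mmm-yy
  "Format a java.util.Date as e.g. \"14/Jan/19\". Capital MMM is the
  month name; lowercase mm would be minutes."
  [^java.util.Date d]
  (let [fmt (doto (SimpleDateFormat. "dd/MMM/yy" Locale/ENGLISH)
              ;; Assumption: render in UTC rather than the JVM's default zone.
              (.setTimeZone (TimeZone/getTimeZone "UTC")))]
    (.format fmt d)))

(format-dd-mmm-yy #inst "2019-01-14T15:48:46.000-00:00")   ; "14/Jan/19"
```

`SimpleDateFormat` is not thread-safe, which is why this sketch creates a new one per call instead of sharing one.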

dpsutton02:07:13

you probably want to push a datetime or date type object into the cloud, right?

chetan.falcon02:07:54

Hmm, probably yes you're right.. in that case how could I achieve it with the value I have?

dpsutton02:07:07

it is the value you want currently

dpsutton02:07:50

it's a datetime object. it is displayed as a tagged literal but that's just a convenience for you. it's (most probably) a java.util.Date object

chetan.falcon02:07:07

Ok, so after I map it to my cloud table, it would fall back to a datetime object?

chetan.falcon02:07:22

Cool, thx. I'll give that a try and see

dpsutton02:07:03

i think you're missing the point using language like "fall back". It is a java.util.Date object

dpsutton02:07:12

Try (.getYear #inst "2019-07-29"). The representation #inst "2019-07-29" is just a way of reading and printing a Java object.
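Expanding on the `.getYear` call: it works because the value really is a `java.util.Date`, but `Date.getYear` is deprecated and returns years since 1900, so `java.time` is the safer way to ask for the calendar year. The zone chosen below (UTC) is an assumption. A JDBC driver may also hand back a `java.sql.Timestamp`, which is a subclass of `java.util.Date`, so the same calls apply.

```clojure
(def d #inst "2019-07-29")

;; It really is a java.util.Date.
(instance? java.util.Date d)   ; true

;; Date.getYear is deprecated and counts years from 1900.
(.getYear d)                   ; 119

;; java.time gives the calendar year directly (UTC assumed here).
(.getYear (.atZone (.toInstant d) java.time.ZoneOffset/UTC))   ; 2019
```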

chetan.falcon02:07:00

Ah ok, that's clear now thank you 🙏

dpsutton02:07:59

contrast that with the display of a class that doesn't have a reader literal for convenience:

(to-array [1 2])
#object["[Ljava.lang.Object;" 0x5fa47fea "[Ljava.lang.Object;@5fa47fea"]

chetan.falcon04:07:13

Thank you for all your help!

dpsutton02:07:32

much nicer 🙂

dpsutton02:07:35

as to your number-of-records problem: if you get all the rows into memory without printing them, do you still have a problem? It's possible the data set fits in memory, but if you're calling a function at the REPL, printing 10k records may be what's killing your process rather than the data itself.

dpsutton02:07:05

if not, check out the helpful docs http://clojure-doc.org/articles/ecosystem/java_jdbc/using_sql.html about reducible-result-sets. But i don't do a ton of data manip like this so maybe ask the helpful folks in #sql

seancorfield02:07:45

Happy to help if you ask in #sql -- both clojure.java.jdbc and next.jdbc can stream results from the DB and let you process them with reduce etc. So you can process a data set that is much larger than memory. But you have to do a bunch of DB-specific stuff to make it happen, and that can be different for each different DB you work with.
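A sketch of the "process with reduce" idea, batching rows so only one chunk is handled at a time. It works on any reducible source; the lazy seq of made-up rows stands in for a streaming query result, since the real one needs a database. With next.jdbc, `plan` returns such a reducible, but per its docs the rows it yields should be realized (for example with `select-keys`) before they are kept outside the reduction; treat that wiring, and the DB-specific streaming settings Sean mentions, as assumptions to check in the #sql channel or the library docs. `row->csv-line` and `process-in-batches!` are hypothetical helpers.

```clojure
(require '[clojure.string :as string])

(defn row->csv-line
  "Hypothetical helper: join the given columns of a row map into one CSV
  line. No quoting or escaping, so only safe for simple values."
  [columns row]
  (string/join "," (map #(str (get row %)) columns)))

(defn process-in-batches!
  "Reduce over any reducible source of row maps, passing `batch-size` rows
  at a time to `handle-batch!`. Returns the number of batches handled."
  [rows batch-size handle-batch!]
  (transduce (partition-all batch-size)
             (completing (fn [n batch]
                           (handle-batch! batch)
                           (inc n)))
             0
             rows))

;; Stand-in for a streaming query result: 2500 made-up rows.
(def fake-rows (map (fn [i] {:deviceno i :quantity 1}) (range 2500)))

(process-in-batches! fake-rows 1000
                     (fn [batch]
                       ;; e.g. write (map #(row->csv-line [:deviceno :quantity] %) batch)
                       ;; to a file, or publish it somewhere
                       (println "batch of" (count batch))))
;; prints "batch of 1000" twice and "batch of 500", returns 3
```

The `partition-all` transducer flushes the final partial batch when the reduction completes, so the last 500 rows are not lost.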

gagan.chohan04:07:46

hi, can't find it anywhere, this errors out

(defn po [vol {:keyA [id name age] }])
but if i replace :keyA with :keys, it doesn't

jumar04:07:16

That's expected. :keyA doesn't make any sense in "destructuring" syntax

lennart.buit05:07:58

you can do it the other way around tho: if you have a map with :keyA that you wish to destructure to another name, e.g. keyB, you can do this: (defn po [vol {keyB :keyA}]). Destructuring takes a bit of getting used to; here's the guide: https://clojure.org/guides/destructuring
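A few map destructuring forms side by side. The `po-*` function names and data are made up; which form the original `{:keyA [id name age]}` was aiming for depends on what `:keyA` holds, which the thread doesn't say, so both the vector and the nested-map cases are shown as guesses.

```clojure
;; :keys binds locals named after the keys :id, :name and :age.
(defn po-keys [vol {:keys [id name age]}]
  [vol id name age])

;; {local :key} binds the value under :keyA to a local called keyB.
(defn po-rename [vol {keyB :keyA}]
  [vol keyB])

;; If :keyA holds a vector like [7 "Ann" 30], destructure it positionally.
(defn po-vector [vol {[id name age] :keyA}]
  [vol id name age])

;; If :keyA holds a map, nest a :keys binding inside.
(defn po-nested [vol {{:keys [id name age]} :keyA}]
  [vol id name age])

(po-vector 1 {:keyA [7 "Ann" 30]})   ; [1 7 "Ann" 30]
```

In every map binding the local (or nested pattern) goes on the left and the key it reads from goes on the right, which is why `{:keyA [...]}` is rejected.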

lennart.buit05:07:05

ah; hiredman also linked the same guide in a different channel

markgdawson06:07:04

I'm using CIDER and Emacs as my editor. I often find myself building up functionality one form at a time and progressively nesting the forms, with the intention of putting them in a defn or let. I deal with this by using def in the namespace to set up "variables" with the same names as those found in the inner scope, so that I can evaluate forms as if I were in the scope/context of the defn/let. How do others manage this? Is there an easy way to create a throwaway scope to work within (like a let expression), which I can discard once the function is written? Or can I somehow tell CIDER that I now want to evaluate forms inside a given scope?

chrisjswanson06:07:14

@markgdawson you may get some use out of clj-refactor.el

chrisjswanson06:07:25

i don't use it personally but i was looking at it a while back for similar stuff

chrisjswanson06:07:36

just never got to learning it

markgdawson06:07:21

Thanks @chrisjswanson, I'll take a look. How do you currently do this without clj-refactor? I'm sure there must be some obvious tricks that I'm missing here. Surely it can't be the case that littering scope with def forms is the accepted way of doing this 🙂

crispin06:07:20

I use the comment macro to wrap some inline tests after the function. Then move into that comment area and c-x c-e the parts as needed, some defs, and run functions
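A sketch of the `comment` workflow described here, reusing the `read-file` example from this thread. `comment` expands to `nil`, so nothing inside it runs when the namespace is loaded; you evaluate individual forms from the editor (C-x C-e in CIDER). The `let` variant keeps the scratch value local instead of defining a var.

```clojure
(defn read-file [filename]
  (slurp filename))

;; A "rich comment" block: the reader still parses it, but `comment`
;; expands to nil, so nothing inside runs when the namespace is loaded.
;; Put the cursor on a form and evaluate it by hand.
(comment
  (def filename "name-of-file")   ; scratch var, created only if you evaluate it
  (read-file filename)

  ;; Or keep the scratch value local instead of def-ing it:
  (let [filename "name-of-file"]
    (read-file filename)))
```

Loading the file defines only `read-file`; the scratch `filename` var exists only if you choose to evaluate that form.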

chrisjswanson06:07:36

you can just build the form in the scope as you currently do, then have it automatically extract the current form to a function

chrisjswanson06:07:11

again i don't actually use it so i'm just guessing it might help, but i got the idea it was intended for just such a case as you mention

chrisjswanson06:07:55

yea i do the comment and c-x c-e thing a lot myself

chrisjswanson06:07:20

also getting familiar with paredit so you can move around forms easily and copy stuff back and forth from the repl helps a lot too

markgdawson06:07:37

I see, that's definitely useful for saving typing, but I was hoping for something that would (ideally) do two things. First, make it easier for me to build up the scope of a function just by calling it, something like the breakpoints in CIDER, which stop and let me evaluate forms in the current scope, but where I can do that in a buffer (where the code will eventually live), not in the minibuffer. Second, the idea of littering my namespace with def expressions, i.e. binding symbols solely for development in my REPL session, which will not be there in a production run, makes me uncomfortable. Having said that, maybe I'm missing "the clojure way" or something like that. For example, let's say I want to write a function:

(defn read-file [filename]
    (slurp filename))
I would start by setting up something like the scope of the function in the current namespace, e.g.
(def filename "name-of-file")
Then develop and test some forms, by doing things like:
(slurp filename)
And evaluate as I go along (e.g. C-x C-e). After developing a whole bunch of nested forms, I wrap them in a function. I now have a bunch of symbol bindings in my namespace which are not there in production. So later on, I could (potentially) trip myself up by writing, for example:
(defn do-something []
  (do-something-with filename))
As well as that, let's say I want to write code in nested let expressions or scopes, then I end up binding all the symbols for the context in which I want to develop. Like I said, I'm probably just missing "the clojure way" here, but I feel there should be a better way.

chrisjswanson06:07:43

For that type of thing, I usually build let forms in the repl and once i'm happy with it, copy the relevant sections into defn's

chrisjswanson06:07:26

there's less of a feel of littering the code file - the repl is temporary and building a let form keeps everything in scope while you're getting the logic right

chrisjswanson06:07:13

maybe there is a better way though, if so i'm not aware of it

chrisjswanson06:07:10

like so (in the repl):

(let [filename "name-of-file"]
  (do-something-with filename))
and then once it works, just copy the form to the code file and change let to defn and remove the "name-of-file" string

chrisjswanson06:07:46

you can also use letfn in a similar way
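A sketch of the `letfn` variant: local helper functions defined in one throwaway REPL form, the same way `let` keeps scratch values local. The `parse-line`/`parse-lines` helpers are made up for illustration.

```clojure
(require '[clojure.string :as string])

;; Scratch work at the REPL: helpers exist only inside this letfn form.
(letfn [(parse-line [line] (string/split line #","))
        (parse-lines [text] (mapv parse-line (string/split-lines text)))]
  (parse-lines "a,b\nc,d"))
;; => [["a" "b"] ["c" "d"]]

;; Once it works, each letfn binding becomes a top-level defn.
(defn parse-line [line] (string/split line #","))
(defn parse-lines [text] (mapv parse-line (string/split-lines text)))
```

Unlike `let` with `fn`, the bindings in `letfn` can refer to each other, so helpers can call one another before they are promoted to `defn`s.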

markgdawson06:07:25

That seems better than what I outlined for sure. I've found the following in the CIDER debugger docs, that seems to suggest that what I'm thinking of exists: " Additionally, all the usual evaluation commands such as C-x C-e or C-c M-: will be scoped to the current lexical context while the debugger is active, allowing you to access local variables. "

markgdawson07:07:31

This is exactly what I would like, but I can't seem to write new code after the CIDER breakpoint is hit because cider hijacks keys (e.g. i, n, h etc) when in debug mode. That would be very useful if I could somehow find a way to use this on new code....

chrisjswanson07:07:12

yea i don't know how.. actually i pretty much don't use debug, i just write a lot of small "pure" functions and use the repl a really lot, and so far i've been pretty productive that way. not having to step debug is actually one of the great benefits of clojure and repl driven development in general

chrisjswanson07:07:54

if i'm really trying to figure out what's going on in some section of code that i can't easily test in the repl, sometimes i'll capture values with a log atom
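One way a "log atom" can look, as a sketch: a helper that records a labelled value in an atom and returns the value unchanged, so it can be dropped into a pipeline and inspected afterwards at the REPL. `debug-log`, `log!` and `total-price` are hypothetical names.

```clojure
;; A scratch atom that collects values seen while the code runs.
(defonce debug-log (atom []))

(defn log!
  "Record [label value] in debug-log and return value unchanged, so a
  call can be wrapped around an expression without changing the result."
  [label value]
  (swap! debug-log conj [label value])
  value)

(defn total-price [items]
  (->> items
       (map :price)
       (log! :prices)      ; capture the intermediate seq
       (reduce +)))

(total-price [{:price 1} {:price 2} {:price 3}])   ; 6
@debug-log                                         ; [[:prices (1 2 3)]]
```

Because `log!` takes the value last, it fits in `->>` pipelines; `(reset! debug-log [])` clears it between experiments.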

chrisjswanson07:07:30

maybe i'm missing the point of what you're trying to accomplish though , not sure

chrisjswanson07:07:56

i have no doubt lots of people are making great use of the debugger

markgdawson07:07:59

Thanks @chrisjswanson, that's useful to hear. As a relative newcomer, maybe what I'm looking for is in part just baggage from other languages...!

chrisjswanson07:07:39

it's always possible. i came to clojure from java and ruby, and definitely spent the first 6 months trying to figure out why it wouldn't do things in ways that i was familiar with

chrisjswanson07:07:49

now i see it as protecting us from anti-patterns

chrisjswanson07:07:23

gotta run, glad to help. hope you enjoy your clojure journey

markgdawson07:07:41

Thanks, much appreciated.

jaihindhreddy11:07:04

agents can't exert back-pressure right?

jaihindhreddy11:07:55

If so, a program can go OOM because too many actions are queued on the agent. Is that possible?

alexmiller12:07:57

There is no back pressure for agents

alexmiller12:07:09

The queues can be arbitrarily long
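Since agents themselves provide no back pressure, one common workaround (a swapped-in technique, not an agent feature) is to put a `java.util.concurrent.Semaphore` in front of `send`: take a permit before sending and release it when the action finishes, so callers block once a fixed number of actions are pending. The limit of 100 and the names `permits`, `counter` and `bounded-send` are assumptions for the sketch.

```clojure
(import 'java.util.concurrent.Semaphore)

(def permits (Semaphore. 100))   ; at most 100 queued or running actions
(def counter (agent 0))

(defn bounded-send
  "Like send, but blocks the caller while 100 actions are already pending."
  [a f & args]
  (.acquire ^Semaphore permits)
  (send a (fn [state]
            (try
              (apply f state args)
              ;; Release even if the action throws.
              (finally
                (.release ^Semaphore permits))))))

(dotimes [_ 1000]
  (bounded-send counter inc))
(await counter)
@counter   ; 1000
```

Don't call `bounded-send` from inside an agent action, since blocking there can stall the agent thread pool. In a standalone script, call `(shutdown-agents)` at the end so the JVM can exit.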

auroraminor19:07:21

is there a form that would be more idiomatic between these choices given the potential of an empty string: (when-not (empty? _)) or (when (not-empty _)) or (when (seq _)) or something else?

seancorfield19:07:37

You should prefer (when (seq _) ,,,) over any sort of not empty (from an idiom point of view).

seancorfield20:07:58

Sometimes, when you are computing the potentially empty sequence (or string), it is worth doing

(when-let [s (not-empty (some-complex-expression))]
  ,,, s ,,,)
to avoid recomputing the expression.

alexmiller21:07:51

for strings, I would use clojure.string/blank?
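A side-by-side check of the three idioms on string inputs. The `describe` helper is made up for illustration. The difference that matters: `seq` and `not-empty` only treat `nil` and `""` as empty, while `clojure.string/blank?` also treats whitespace-only strings as blank.

```clojure
(require '[clojure.string :as string])

(defn describe [s]
  {:seq?      (boolean (seq s))   ; false only for nil or ""
   :not-empty (not-empty s)       ; s itself, or nil when empty
   :blank?    (string/blank? s)}) ; also true for whitespace-only strings

(describe "")     ; {:seq? false, :not-empty nil, :blank? true}
(describe "  ")   ; {:seq? true, :not-empty "  ", :blank? true}
(describe nil)    ; {:seq? false, :not-empty nil, :blank? true}
```

So `(when (seq s) ...)` would still run its body for `"  "`, while `(when-not (string/blank? s) ...)` would not.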

jarvinenemil19:07:59

how can i get MyClass.class to work in Clojure? I am using Jackson's ObjectMapper for some interop. Do I need to define a record or a deftype, and then do (class MyType)?

seancorfield20:07:50

@jarvinenemil In Clojure MyClass is the same as Java's MyClass.class.

seancorfield20:07:08

Classes evaluate to themselves, i.e., an Object of type Class
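A quick illustration of that: a bare class name evaluates to its `java.lang.Class` object, so it can be passed wherever a Java API expects `Foo.class`. `Person` is a made-up record; this only shows how to obtain the `Class` object and does not try Jackson itself.

```clojure
;; Person is a made-up record; any Java class works the same way.
(defrecord Person [name age])

;; Java's Person.class and String.class are just Person and String here.
(class Person)                          ; java.lang.Class
(= Person (class (->Person "Ann" 30)))  ; true
(.getSimpleName String)                 ; "String"
```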

jarvinenemil20:07:18

Perfect. I'll use records since i don't need to mutate stuff. Thanks Sean!

jarvinenemil20:07:24

i ended up creating a json-string of my map and then reading it as a string. much simpler. 🙂