#beginners
2023-10-21
CarnunMP14:10:58

Is it possible to overwrite just the last line in a text file (say, a big csv) without overwriting the entire file? Or, for that matter, just the nth line?

andy.fingerhut14:10:06

There are file I/O APIs that let you "jump" to arbitrary byte offsets in a file and then read/write at that position. Unless you somehow know the length of every line, though, there is no magic way to know where the N-th line begins and ends without reading through all bytes looking for the line terminators.

👍 1
💡 1
andy.fingerhut14:10:49

The methods I am thinking of also do not insert/delete bytes in the middle of the file, so you can replace bytes, but if you want to reduce or increase the number of bytes of a line in the middle of the file, those APIs would still require you to overwrite the rest of the file after that increase/decrease-size point in the file.
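
To make the "jump to a byte offset and replace bytes" idea concrete, here is a minimal sketch in Python (chosen only for illustration; the thread names no specific API). The helper name `overwrite_at` and the demo file contents are assumptions. Note that the replacement has the same length as the bytes it overwrites, which is why nothing after it has to move.

```python
import os
import tempfile


def overwrite_at(path: str, offset: int, new_bytes: bytes) -> None:
    """Overwrite len(new_bytes) bytes starting at byte `offset`.

    Hypothetical helper: bytes are replaced in place, never inserted or deleted.
    """
    with open(path, "r+b") as f:  # read/write mode, existing content is kept
        f.seek(offset)            # jump to the byte offset
        f.write(new_bytes)        # replace bytes at that position


# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"id,name\n1,aaa\n2,bbb\n")

overwrite_at(demo_path, 10, b"zzz")  # bytes 10..12 hold "aaa"

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

If `new_bytes` were longer or shorter than the text it replaces, the following line would be partly clobbered or stale bytes would be left behind, which is the limitation described above.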

CarnunMP14:10:41

Huh. Fiddly!

CarnunMP14:10:37

I was hoping at least the former case would be as simple as something like 'enter append mode, jump back one line, start writing'. 😅
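
The last-line case can be done without rewriting the whole file, though not in pure append mode. A rough sketch, assuming Python and `\n` line terminators: scan backward from the end for the previous newline, truncate there, and write the new line. The function name `replace_last_line` and the chunk size are illustrative choices, not anything from the thread.

```python
import os
import tempfile


def replace_last_line(path: str, new_line: bytes, chunk_size: int = 4096) -> None:
    """Replace the final line of a file, touching only the tail of the file."""
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        # Skip a single trailing newline so we look for the start of the last line.
        if end > 0:
            f.seek(end - 1)
            if f.read(1) == b"\n":
                end -= 1
        line_start = 0  # default: the file has only one line
        pos = end
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            idx = chunk.rfind(b"\n")
            if idx != -1:
                line_start = pos + idx + 1
                break
        f.seek(line_start)
        f.truncate()               # drop the old last line
        f.write(new_line + b"\n")  # write the replacement


# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"a\nb\nccc\n")

replace_last_line(demo_path, b"dd")

with open(demo_path, "rb") as f:
    last_line_result = f.read()
os.remove(demo_path)
```

This only reads backward until it finds one newline, so the cost depends on the length of the last line rather than the size of the file.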

CarnunMP14:10:55

Hmm. Can you really only jump an arbitrary number of bytes @U0CMVHBL2? Not lines?

andy.fingerhut14:10:37

I mean, there might be custom libraries / file formats that store an index of where each line of the file begins, but if they did not, how would you know where the N-th line of the file was?

💯 1
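
As a sketch of what such an index could look like (Python for illustration; `build_line_index` and `read_line` are made-up names): one linear scan records the byte offset where each line starts, after which any line can be reached with a single seek. The index has to be rebuilt or updated whenever line lengths change.

```python
import os
import tempfile


def build_line_index(path: str) -> list[int]:
    """Scan the file once and record the byte offset at which each line begins."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:  # binary iteration splits on b"\n"
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_line(path: str, offsets: list[int], n: int) -> bytes:
    """Jump straight to the n-th line (0-based) using the precomputed index."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline()


# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"alpha\nbeta\ngamma\n")

index = build_line_index(demo_path)
third = read_line(demo_path, index, 2)
os.remove(demo_path)
```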
andy.fingerhut14:10:48

A straightforward approach that doesn't involve fancy work is to read the original file, copying it to another file as you go, counting lines, replacing any parts in the original with different parts in the new file you are writing. When you are done with producing the output file, then you can decide whether you want to do file renaming, or Unix hard or symbolic link manipulation, to make the new file appear in the file system with the name of the original file. Obviously that requires temporary space, and a linear time scan through the whole file.

👍 2
andy.fingerhut14:10:14

For "small enough" files, though, it is very reliable and does not rely on any special libraries.
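
The copy-and-rename approach described above can be sketched as follows. This is one possible shape in Python, not the only way to do it; `replace_nth_line` is a hypothetical helper, and the temporary file is created in the same directory as the original so that the final rename stays on one file system.

```python
import os
import tempfile


def replace_nth_line(path: str, n: int, new_line: str) -> None:
    """Copy `path` to a temp file, swapping line n (0-based), then rename over the original."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as dst, \
                open(path, "r", encoding="utf-8", newline="") as src:
            for i, line in enumerate(src):  # linear scan, counting lines
                dst.write(new_line + "\n" if i == n else line)
        os.replace(tmp_path, path)  # make the new file appear under the old name
    except Exception:
        os.remove(tmp_path)  # clean up the temp file if anything failed
        raise


# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
    f.write("a\nb\nc\n")

replace_nth_line(demo_path, 1, "B")

with open(demo_path, encoding="utf-8", newline="") as f:
    copied_result = f.read()
os.remove(demo_path)
```

The replacement line may be any length here, because every byte after it is rewritten anyway.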

andy.fingerhut14:10:06

Database systems avoid this by having those extra indices all of the time, which the database system implements for you. Even there, many string types you have to declare as a particular maximum length when you create the database schema.

👍 1
andy.fingerhut14:10:08

Analogy: If I gave you a normal book, you can quickly jump to a particular page, and then a line number within that page, and then a character number within that line. But what if I gave you a book and asked you to find the 1257-th paragraph?

👌 3
CarnunMP15:10:42

Really appreciate the ELI5 @U0CMVHBL2! Thank you. :))

nitin02:10:25

Why not add an n+1th line instead, as it would be the next nth line? i.e. simply append new data to the existing file? Is that slow/memory heavy too?

andy.fingerhut12:10:48

@nitin Are you asking about adding a new line to the end of the file? If so, yes, Linux/Unix supports efficiently appending to the end of an existing file, without having to copy the entire thing.

yes 1
andy.fingerhut12:10:34

If you are asking about adding an n+1th line in the middle of a file that has more than n lines to begin with, that is not efficiently supported, in any way that I know of, with plain Linux system calls.

Fredrik Meyer14:10:46

Something I’ve sometimes wondered. Say I have installed Clojure with brew, for example. What happens when I run a project with a specified Clojure version in its deps.edn?

practicalli-johnny14:10:10

The project deps.edn Clojure dependency version takes precedence over the Clojure dependency provided by the Clojure CLI install, e.g. the brew install. Clojure is a library, so any version could be used by adding it to the project deps.edn file or specifying the Clojure dependency on the command line when starting a REPL: https://practical.li/clojure/clojure-cli/#precedence-order

seancorfield17:10:07

When you "install Clojure with brew" you are installing the CLI, not Clojure-the-language. When you run clojure it figures out what version of Clojure-the-language you want and fetches that into the ~/.m2 local maven cache if necessary. You can use any version of Clojure-the-language all the way back to 1.0 with any version of clojure the CLI.

til 4
👍 1
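
For illustration, a project-level deps.edn that pins the language version might look like the fragment below. The version number is only an example; any released version of `org.clojure/clojure` could go there, and the CLI would fetch it into the local Maven cache as described above.

```clojure
;; deps.edn in the project root (example version, chosen arbitrarily)
{:deps {org.clojure/clojure {:mvn/version "1.11.1"}}}
```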
Ben Quigley17:10:39

that's a really helpful explanation to me too coming from Python where we don't have such a rigorous (/ any) distinction between the interpreter/compiler and the CLI.

Fredrik Meyer18:10:14

Thanks for all the answers!

seancorfield18:10:33

One "trick" that I've found very helpful for quickly testing code across multiple versions of Clojure is that I have aliases in my user deps.edn file for every version: https://github.com/seancorfield/dot-clojure/blob/develop/deps.edn#L92-L116 And then I can do stuff like:

> for v in 0 1 2 3; do clojure -M:1.$v -e '(clojure-version)'; done
"1.0.0-"
"1.1.0"
"1.2.1"
"1.3.0"
> for v in 2 3 10; do clojure -M:1.$v -e '(type (* 10000000000 10000000000))'; done
java.math.BigInteger
Exception in thread "main" java.lang.ArithmeticException: integer overflow
        at clojure.lang.Numbers.throwIntOverflow(Numbers.java:1374)
        at clojure.lang.Numbers.multiply(Numbers.java:1738)
        at user$eval1.invoke(NO_SOURCE_FILE:1)
        at clojure.lang.Compiler.eval(Compiler.java:6465)
        at clojure.lang.Compiler.eval(Compiler.java:6431)
        at clojure.core$eval.invoke(core.clj:2795)
        at clojure.main$eval_opt.invoke(main.clj:296)
        at clojure.main$initialize.invoke(main.clj:315)
        at clojure.main$null_opt.invoke(main.clj:348)
        at clojure.main$main.doInvoke(main.clj:426)
        at clojure.lang.RestFn.invoke(RestFn.java:421)
        at clojure.lang.Var.invoke(Var.java:405)
        at clojure.lang.AFn.applyToHelper(AFn.java:163)
        at clojure.lang.Var.applyTo(Var.java:518)
        at clojure.main.main(main.java:37)
Execution error (ArithmeticException) at user/eval1 (REPL:1).
integer overflow

Full report at:
/tmp/clojure-13435902839890701181.edn
The latter shows the performance-related change in 1.3 compared to 1.2: overflow used to auto-promote, but 1.3 switched to faster non-promoting math (so you get an exception). We can also compare the error reporting between 1.3 (a stacktrace with a bunch of internal stuff) and 1.10 (a clearer error pointing at user code).

Hasan Ahmed18:10:39

Hello, my app uses datomic as a data store. I usually have a different format/schema for my api (so it's better for devs using other languages), so I end up creating two functions for each entity: one to transform from the "datomic" schema to a "user-friendly" schema, and the other to transform data in the opposite direction. For instance, I convert this: {:company/name "something" :company.contact/first-name "Hasan"} to this: {:name "something" :contact {:first_name "Hasan"}} As the number of entities grows, I find myself writing too many of those transformation functions, so I think it's time for an abstraction or to change the way I am dealing with this (I feel that how I am handling this is cumbersome for no reason). Any libraries for bidirectional transformation (that you'd recommend) or tips for an idiomatic solution or a change in current approach?

phill21:10:54

A mapping between single-keyword and get-in/assoc-in path could let a single pair of functions translate back and forth according to your present (patternless) arrangement, e.g., `{:company/name [:name], :company.contact/first-name [:contact :first_name]}`. But it seems a bit irregular that "contact" should be a distinct entity in one scheme but not the other. If that irregularity were resolved, then a lexical translation might be possible. Howwwever, it is sometimes handy to have a layer of fudge between a public API and a database schema, lest reasons to evolve one of them cause unwelcome ripples in the other.

🙌 1
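
The single-mapping idea can be sketched like this. It is written in Python with plain dicts and string keys standing in for Clojure maps and keywords (an assumption made purely for illustration); `PATHS`, `to_api`, and `to_db` are hypothetical names. One table drives both directions, so adding an entity means adding rows to the table rather than writing two more functions.

```python
# Flat attribute name -> nested path in the API shape (mirrors the example above).
PATHS = {
    "company/name": ["name"],
    "company.contact/first-name": ["contact", "first_name"],
}


def to_api(entity: dict, paths: dict = PATHS) -> dict:
    """Flat database-style map -> nested API map (the assoc-in direction)."""
    out: dict = {}
    for attr, path in paths.items():
        if attr in entity:
            node = out
            for key in path[:-1]:
                node = node.setdefault(key, {})  # create intermediate maps
            node[path[-1]] = entity[attr]
    return out


def to_db(doc: dict, paths: dict = PATHS) -> dict:
    """Nested API map -> flat database-style map (the get-in direction)."""
    out = {}
    for attr, path in paths.items():
        node = doc
        for key in path:
            if not isinstance(node, dict) or key not in node:
                break  # path is absent in this document, skip the attribute
            node = node[key]
        else:
            out[attr] = node
    return out


db_entity = {"company/name": "something", "company.contact/first-name": "Hasan"}
api_doc = to_api(db_entity)
round_trip = to_db(api_doc)
```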
ghadi12:10:05

an underutilized but powerful strategy is to annotate your datomic schema attributes with directives about how to transform, and then make your transform automatic in userspace

🙌 1
ghadi12:10:00

remember datomic attribute definitions are themselves entities

🙌 1
ghadi12:10:49

this can be as sophisticated as you want it to be

🙌 1
Hasan Ahmed16:10:22

@U0HG4EHMH thanks, I tried to give a simple example to make the idea clear. I am actually trying out polylith and the layer of fudge might be the components interface. I had not considered the interface to be that layer yet, but now I am 🙂 I am new to all of this, and one thing that drew my attention in your reply was "patternless". Do you have time to elaborate on this further? Like how to design better to allow straightforward translation

Hasan Ahmed16:10:58

@U050ECB92 this sounds a bit complex for me at this stage but I am in love with the idea. If you can point me to a resource or dish me a simple example, I'd be grateful.

Michael Brennan19:10:01

How do I get out of a loop in a REPL?

Bob B20:10:10

if you mean something like an infinite loop in a command-line REPL, ctrl + c should do it. If it's something like calva, there's an "interrupt running evaluations" command that should end whatever's currently executing

daveliepmann20:10:42

in CIDER there's cider-interrupt, C-c C-b