Fork me on GitHub
#clojure
<
2018-10-08
>
kingcode00:10:26

What is the fastest way to tell whether one vector is a subsequence of another?

kingcode00:10:52

The two vectors beginning are compared… would comparing subvectors of the size of the smallest be fast enough?

andy.fingerhut01:10:33

This is very similar to determining whether one string is a substring of another. If you want to do that in guaranteed linear time, even when the substring and the one you are checking to see if it is a substring of have repetitive patterns in them, you need to do some kind of preprocessing. e.g. imagine if your substring to check if it was inside of another was "01010101012", and the string you were checking inside of had many occurrences of "010101011". The most straightforward approach would find out that there was a mismatch starting at the first "01", then again at the second "01", and at the third, and then the fourth. Each of those 2nd through 4th checks wastes time, that if you were cleverer in your algorithm you could skip past all of those without rechecking.

andy.fingerhut01:10:51

That is what the Boyer-Moore string matching algorithm does -- be clever about jumping ahead after a partial match failed.

❤️ 8
andy.fingerhut01:10:14

If you don't care about guaranteed linear time, and/or if repeated elements are not common in the vectors you are dealing with, then just scanning the larger vector for occurrences of the first element of the smaller one, and wherever you find it, compare the subverter of the larger one to the smaller one, would be linear time.

andy.fingerhut01:10:56

(clarifying -- if the vectors are guaranteed to have no repeated elements, then the simple approach is linear time. If they can have repeated elements, the simple algorithm can take quadratic time in the worst case).

noprompt02:10:22

The Z-Algorithm is another approach too although I think it’s a bit less sophisticated than Boyer Moore.

noprompt02:10:15

The cool thing about these algorithms is that you can find matches in sublinear time.

roklenarcic12:10:11

Use Knuth-Morris-Pratt for short search elements

roklenarcic12:10:21

Boyer-Moore for long ones

roklenarcic12:10:44

similarly for "alphabet" sizes

djtango13:10:00

hey folks does anyone know why clojure.core/map would have different performance characteristics from my-map if the source is identical?

Andrea Imparato13:10:47

I can't probably help you because I'm a clojure newbie

djtango13:10:38

as in my own copy of exactly the same function that I personally defined

djtango13:10:53

(which is a copy-paste of the clojure.core/map source)

lsenjov13:10:46

Possibilities: aot from a library, or it's been running longer so the jvm has had a chance to optimise it more

Andrea Imparato14:10:49

I'd say the performance should be identical but I'm not a clojure expert @djtango 🙂

Andrea Imparato14:10:07

btw, why do you want to have a copy of the original function?

Alex Miller (Clojure team)14:10:30

the core one is AOT compiled and direct-linked

djtango14:10:32

thanks Alex!

borkdude14:10:30

Am I right that a running future doesn’t throw when cancelled?

bronsa15:10:17

user=> (def a (future (loop [] (recur))))
#'user/a
user=> (future-cancel a)
true
user=> @a
Evaluation error at java.util.concurrent.FutureTask.report (FutureTask.java:121). CancellationException null

borkdude15:10:42

@bronsa I have some future that somehow continues running, but I can’t reproduce it in this example:

(do
    (def f (future
             (try
               (while true
                 (Thread/sleep 100)
                 (debug "interrupted?"
                        (Thread/interrupted))))
             10))
    (Thread/sleep 1000)
    (future-cancel f)
    (println (future-cancelled? f)))
Here Thread/interrupted always returns false. In my other future, I see (Thread/interrupted) is true after I cancelled it. When do you have to check Thread/interrupted

bronsa15:10:20

@borkdude cancelling a future doesn't cause interruption of the thread, you indeed have to manually check Thread/interrupted and stop the thread manually

bronsa15:10:32

not sure what you're trying to do with that example

bronsa15:10:14

(do (def f (future (while (not (Thread/interrupted)) (Thread/sleep 100) (binding [*out* *err*] (println "interrupted?" (Thread/interrupted)))))) (Thread/sleep 500) (future-cancel f))

borkdude15:10:15

@bronsa will Thread/interrupted return false after future-cancel or am I wrong here?

borkdude15:10:43

and it will throw during some actions like Thread/sleep, so it depends I guess

bronsa15:10:32

yeah possibly no printing is to be expected

lgillette18:10:34

@seancorfield @noisesmith Thanks for the help the other day regarding connecting to Redshift! I got it to work, it is now simple, short and elegant code as Clojure solutions usually are. Turns out I overlooked the port number, and had the wrong one. Also, I removed the extraneous connection parameters you mentioned. Thanks a ton!

seancorfield18:10:06

Cool! Thank you for reporting back on your success!

seancorfield18:10:40

Also, let me know -- as the maintainer of clojure.java.jdbc -- if there are things I could do to make interaction with RedShift easier (since it's not a system I use myself).

lgillette18:10:29

@seancorfield Will do, I'm just getting my feet wet with Redshift interaction via Clojure. I'm hoping to move much of the ETL type work I do at my company into pure Clojure. I am hoping Clojure and REPL will a good environment for me and potentially others to work in interactively to quickly test things out with the data, vs all the clunky hodgepodge of shell and SQL scripts we are using now. Also, creating easily reproducible pipelines is a goal. I'm looking at Drake and Honey SQL libraries as well in the effort.

lgillette18:10:49

While we are on this subject, I'm a bit disappointed that there seems to be no idiomatic way to create a persistent db connection object. I know there's with-* functions etc, but when working at the REPL all day it would be nice to be able to just keep the db connection alive (much like I do with my very fast psql cli prompt) and be able to pass it into jdbc functions, e.g., (jdbc/query persistent-db-conn "select foo from bar"). Anyone have suggestions for such a workflow?

seancorfield18:10:54

At work we use c3p0 to create a connection pool. I gather HikariCP is another good option.

seancorfield18:10:13

I would expect you could use a connection pool with RedShift?

lgillette18:10:58

Cool, I'll take a look and see. I glanced at c3p0 just earlier today but haven't tried it out yet.

seancorfield18:10:35

Wrap it in a Component. Then you can

(defonce system nil)
...
(alter-var-root #'system component/start)
... work with (:db-spec system) ...
(alter-var-root #'system component/stop)
...

seancorfield18:10:10

(assuming start creates the connection pool and assocs into the Component as :db-spec)

noisesmith18:10:46

yeah, I found using c3p0 directly via interop and putting it in a component Just Works ™️

seancorfield18:10:56

Write it once and forget it, yes. We have a DataSource Component that hides all the Java interop but still provides optional dials and knobs to tune the connection pool if needed.

ska19:10:09

Hi. Is anyone using a library to generate Markdown from a Clojure datastructure similar how hiccup does it? I'm aware of merkki and marge both of which look more like a proof of concept than a library really being used. But maybe I'm wrong? Happy to learn more and to contribute. https://github.com/jarcane/merkki https://github.com/markwoodhall/marge

markwoodhall20:10:51

I've not seen merkki before, I started playing around with marge because of a side project I used it on. I know of a couple of organisations that have given feedback about using it.