#clojure-europe
2023-03-21
schmalz08:03:09

Morning all.

simongray08:03:42

today’s task: web scraping a long list of URLs from an Excel document using Etaoin and docjure

simongray08:03:10

I wonder if the document part of the HTML file can/should be converted into markdown…? the point is to create a dataset for NLP

borkdude09:03:59

cc @UE21H2HHD Look, an etaoin user :)

simongray09:03:08

I like it so far, but it almost does too much 🙂

simongray10:03:15

I have discovered that combining etaoin and pmap destroys my CPU 😮

borkdude10:03:56

maybe you should not be using pmap ;)

borkdude10:03:12

pmap for side effects isn't recommended

simongray10:03:24

but I need speeeeeeeed :gotta_go_fast:

simongray10:03:05

seems to work OK with a try-catch in the mapped function… aside from the throttling

orestis10:03:00

Try claypoole instead, which gives you pmap-like stuff with knobs

🙏 2
orestis10:03:32

We’re using pdoseq

simongray12:03:47

claypoole was a good solution
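
For readers following along, here is a rough sketch of what the claypoole suggestion above might look like. It assumes the `com.climate/claypoole` dependency is on the classpath; `scrape-url`, the URL list, and the pool size of 4 are hypothetical placeholders, not what was actually used in this thread.

```clojure
;; Assumption: com.climate/claypoole is available as a dependency.
(require '[com.climate.claypoole :as cp])

;; Hypothetical stand-in for the real Etaoin scraping function.
;; The try/catch mirrors the "try-catch in the mapped function" mentioned above.
(defn scrape-url [url]
  (try
    {:url url :status :ok}
    (catch Exception e
      {:url url :status :error :message (ex-message e)})))

;; Placeholder URLs; the real list came from an Excel document.
(def urls ["https://example.com/a" "https://example.com/b"])

;; A fixed pool caps how many tasks run at once (4 is an arbitrary choice),
;; which is the "knob" that clojure.core/pmap does not expose.
(def results
  (cp/with-shutdown! [pool (cp/threadpool 4)]
    ;; doall forces every result before the pool is shut down.
    (doall (cp/pmap pool scrape-url urls))))

;; When only the side effects matter, pdoseq runs the body in parallel
;; and returns nil once all tasks have finished.
(cp/with-shutdown! [pool (cp/threadpool 4)]
  (cp/pdoseq pool [url urls]
    (scrape-url url)))
```

Keeping the pool small is one way to avoid launching more browser sessions than the machine can handle.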

lread13:03:11

Glad you are making use of Etaoin @U4P4NREBY! If you run into problems, or make delightful discoveries, please do drop by #C7KDM0EKW and share!

🙏 2
ray09:03:40

Good morning

🌱 8
😍 2
Ben Sless16:03:25

Linux + custom emacs build = pain

borkdude16:03:19

who told you to use linux + custom emacs build?

borkdude16:03:58

or are you just venting about emacs in general? :)

Ben Sless16:03:25

You don't compile your own Emacs? 😁

Ben Sless16:03:47

Two things, one easy, the other hard:
- The glob pattern needs to find a .so file
- The LD path needs to include Emacs's libs (harder)

borkdude16:03:20

@UK0810AQ2 The above isn't about dynamic modules

👍 2
borkdude16:03:45

That is #C04V4LQF6V7

👍 2