#clojure-europe
2020-10-28
dharrigan06:10:48

Good Morning!

synthomat06:10:16

Good morning :spock-hand:

slipset07:10:18

Good morning

javahippie09:10:33

What algorithm would you choose to compare postal addresses? My current project is using Jaro-Winkler, and it’s horrible 😕

javahippie09:10:57

I believe every approach that uses a string distance is a bad one for structured addresses. Leave out a ZIP code, and the distance explodes. Have similar words with different meanings, and you get a false match

javahippie09:10:30

Normalizing the data is, of course, not possible

Ben Hammond09:10:45

equivalence of eszetts (ß) and double s is a personal favourite

Ben Hammond09:10:07

similarly for umlauts

javahippie09:10:48

Yeah. Or “P.O. Box”, “Post Box”, “P.O”… “Street”, “Str.”, … Just blindly applying a function to a string won’t do any good, you’d need to intelligently tokenize and normalize. Oh, and all of the addresses are global, of course. Germany, China, U.S.A and Chile alone don’t have comparable address formats in general
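
A minimal sketch of that tokenize-then-normalize idea, in Python for brevity. Everything named here is an assumption for illustration: the abbreviation table, the P.O. box pattern, the umlaut map, and the choice of token-set (Jaccard) overlap in place of Jaro-Winkler. Real global addresses would need per-country rules that this does not attempt.

```python
import re

# Hypothetical, deliberately tiny lookup tables (not from any real library).
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})
ABBREVIATIONS = {"str": "street"}
# Matches "p.o. box", "po box", "p.o", "post box" as a whole unit.
PO_BOX = re.compile(r"\b(?:p\.?\s*o\.?(?:\s*box)?|post\s*box)(?=\W|$)")


def normalize_tokens(address):
    """Turn a free-text address into a set of normalized tokens."""
    text = address.casefold()            # lower-cases and maps "ß" to "ss"
    text = text.translate(UMLAUTS)       # "ü" -> "ue", etc.
    text = PO_BOX.sub(" pobox ", text)   # collapse P.O. box spellings
    tokens = re.findall(r"\w+", text)    # split on punctuation and spaces
    return frozenset(ABBREVIATIONS.get(t, t) for t in tokens)


def similarity(a, b):
    """Jaccard overlap of the two token sets, between 0.0 and 1.0."""
    ta, tb = normalize_tokens(a), normalize_tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)
```

With this, `similarity("P.O. Box 12, Straße", "Post Box 12, Strasse")` is 1.0, and leaving out the ZIP code in `"Main Str. 5, 12345 Springfield"` versus `"Main Street 5, Springfield"` only drops the score to 0.8 instead of blowing up. It still says nothing about similar words with different meanings.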

javahippie09:10:08

The main issue is that every time the software matches something wrong, somebody creates an issue I have to investigate 🙈

😱 3
synthomat09:10:29

that algorithm sounds like a case for a new SaaS business

javahippie09:10:35

:thinking_face:

dominicm09:10:18

Bucks and Buckinghamshire is a fun one too

RAMart09:10:36

> that algorithm sounds like a case for a new SaaS business
Good luck with the GDPR compliance...

javahippie09:10:35

The issue is that the customer cannot understand that “I want to find a business partner with a similar address in my database of 3.000.000 addresses” is not an easy problem to solve.

synthomat09:10:28

what’s wrong with gdpr? as far as I understood it’s only about addresses and not names

RAMart09:10:44

The address sent to your SaaS could be anything. Including names, persons and the like.

synthomat09:10:29

@javahippie “similar address” is very broad 🙂 does it need to go by street?

javahippie09:10:45

It’s “Company, Street/P.O Box, Zip Code, City, Country”. But it can be anything somebody in an office enters. Things that also appear sometimes: District, building, floor, office number…

javahippie09:10:10

Never said that to a customer but.. I believe they need an AI 😄

RAMart09:10:05

Good luck debugging the AI when the customer reports the next "wrong match". Hm... I use this "good luck" phrase too often. 🙈

ordnungswidrig12:10:00

insert universal greeting

borkdude12:10:02

Howdy! Made this script to detect code using some spec pattern: https://gist.github.com/borkdude/a391146ad81a06c28fb97ccdc1f64d44 I'm considering building this out into a library.

borkdude12:10:44

Not sure if spec would be the way to go or malli. I guess that's a typical 2020 Clojure problem. As of now, it would be spec, but in the future malli might be more flexible

slipset12:10:54

@borkdude while you're here. I was thinking about something we've probably discussed before: clj-find-usages

slipset12:10:39

Which would be something I could invoke (a bit like clj-kondo) from emacs which would statically analyze my project and find the usages of some symbol.

slipset12:10:01

Basically, my problem is that the find-usages in Cider is not Working(TM)

borkdude12:10:02

I think some plugins already do this. It's possible using clj-kondo's analysis output.

borkdude12:10:33

@slipset One example is https://github.com/didibus/anakondo which provides completions, but could in theory also jump to definition. Maybe it can be extended with usages as well

borkdude12:10:58

I see it's on their roadmap. Maybe you could help @didibus

borkdude12:10:50

But for the spec tool I'm considering, I possibly want to support patterns using fully qualified symbols so alias usage will match on that as well

borkdude12:10:37

e.g. (require '[foo :as f]) (f/dude) and searching with foo/dude will give you the match for (f/dude)

raymcdermott19:10:39

there are also higher level language bindings though it seems like the Java lib hasn't been touched for a while

javahippie19:10:54

That’s nice! Will suggest this, it should be possible to wire it into the pipeline. Thanks!

otfrom20:10:12

as I saw on the #announcements woah

otfrom20:10:26

I think we all know kung fu now

otfrom20:10:26

and congrats 🙂

dominicm21:10:16

Mailing lists: a cool way to receive updates or kinda lame? Thinking for software projects.

otfrom21:10:45

mailing lists are cool

otfrom21:10:54

everyone has email