#datomic
2017-09-28
apsey15:09:11

Has anyone compared transactor performance across backend storages, such as Cassandra vs DynamoDB?

apsey15:09:28

I am mostly interested in transaction functions, indexing jobs, and write parallelism

mkvlr18:09:42

should datomic strings always be utf8? We’re seeing umlauts like ü come out as ? after transacting them and reading them again…

mkvlr18:09:38

same is true when importing from a utf8-encoded postgres table. I’ve read recent jdbc drivers should pick up the encoding automatically.

mkvlr18:09:04

maybe there’s also something with pedestal that we’re missing and it’s all correct in datomic… We are setting the charset=utf-8 in the Content-Type header though

favila18:09:11

@mkvlr In a scenario involving a peer, encoding is not an issue since strings are shared in a type-safe way. Your problem is at some higher layer

favila18:09:20

you can confirm by interacting with the peer in a repl

favila18:09:03

(the datomic data in postgres is stored as a blob--it is opaque to the sql server so things like column encoding don't matter)

mkvlr18:09:24

@favila yes, that’s true for the datomic side, but we’re migrating our data from actual postgres tables into datomic using a script called from the repl

favila18:09:11

the strings you transact may be decoded incorrectly

favila18:09:31

or you may be encoding them incorrectly in the http response

favila18:09:06

(d/pull db [:string-attr] suspect-id) will tell you if it is http's fault

mkvlr18:09:12

alright, so maybe more on the #pedestal side

favila18:09:28

if you see bad characters, then the problem is with what prepared the string for transacting

favila18:09:53

if you see good characters, it's pedestal's fault

mkvlr18:09:57

any gotchas with the repl? do I have to set utf8 encodings there?

favila18:09:59

(or something)

favila18:09:24

I've never had to set encodings with repls, but I use nrepl all the time

favila18:09:34

maybe with other repl types that is a concern

favila18:09:28

test by seeing what "\u00DC" prints I guess

mkvlr18:09:25

@favila hmm, on staging (through telnet) I only see ?, locally it works, alright, so it’s not datomic, thanks! 🙏

mkvlr18:09:17

might it be a JVM thing?

favila18:09:35

what is your staging repl? repl socket server? something else?

mkvlr18:09:23

@favila yes just clojure.core.server/repl

favila19:09:23

ugh, it uses the default charset, and it's not configurable

favila19:09:33

what does (Charset/defaultCharset) say in your telnet repl? @mkvlr

favila19:09:03

if it mismatches locale charmap in your terminal you will have problems

favila19:09:26

and all of this is different from what encoding of strings pedestal may do

mkvlr19:09:29

user=> (Charset/defaultCharset)
CompilerException java.lang.RuntimeException: No such namespace: Charset, compiling:(NO_SOURCE_PATH:1:1) 

favila19:09:48

(java.nio.charset.Charset/defaultCharset)? @mkvlr

mkvlr19:09:10

nothing good:

mkvlr19:09:13

user=> (java.nio.charset.Charset/defaultCharset)
#object[sun.nio.cs.US_ASCII 0x56a76e18 "US-ASCII"]

favila19:09:20

ah, so ascii

favila19:09:31

anything not 7-bit can't be encoded and gets replaced with ?

favila19:09:07

so that's why you have ?
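
The replacement behaviour described above can be reproduced at any REPL without touching the JVM default. This is a minimal sketch, not what the socket REPL does internally: `round-trip` is a hypothetical helper introduced here for illustration, and it uses `String.getBytes` with an explicit charset to show the same substitution of unmappable characters.

```clojure
(import '(java.nio.charset Charset StandardCharsets))

(defn round-trip
  "Hypothetical helper: encode s to bytes with charset, then decode
  those bytes with the same charset."
  [^String s ^Charset charset]
  (String. (.getBytes s charset) charset))

;; U+00FC (u with umlaut) has no US-ASCII encoding, so the encoder
;; substitutes the charset's replacement byte, which is '?'.
(println (round-trip "\u00FC" StandardCharsets/US_ASCII))

;; UTF-8 can represent the character, so it survives the round trip.
(println (= "\u00FC" (round-trip "\u00FC" StandardCharsets/UTF_8)))
```

If the first call prints ? on a machine where the terminal otherwise shows umlauts correctly, the loss is happening at encoding time rather than in the stored data.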

mkvlr19:09:18

should that be fixed by setting export LC_ALL=en_US.UTF-8?

favila19:09:00

that affects locale too, so I'm not sure

favila19:09:22

to just alter default encoding I think starting java with -Dfile.encoding=UTF-8 will do

favila19:09:05

locale affects other stuff in addition (like number formatting, money signs, etc), so if there's code that relies on the defaults it will break

favila19:09:16

but maybe that's what you actually need to do

favila19:09:01

IMO any API where charset/locale/etc is an optional variable is broken

favila19:09:06

unfortunately that's every api

mkvlr19:09:30

alright, will try LC_ALL and java -Dfile.encoding=UTF-8

favila19:09:52

you should only need one or the other I think

mkvlr19:09:24

yep, will try LC_ALL first

mkvlr20:09:14

@favila can confirm LC_ALL is working, thanks again! 🙏
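
To check from inside the REPL whether either setting took effect, something like the following could be evaluated after restarting the JVM. It is a sketch under the assumption that UTF-8 is the desired default; `utf8-default?` is a hypothetical name, and the result depends on how the JVM was started.

```clojure
(import '(java.nio.charset Charset StandardCharsets))

(defn utf8-default?
  "Hypothetical helper: true when the JVM default charset is UTF-8."
  []
  (= StandardCharsets/UTF_8 (Charset/defaultCharset)))

;; The default charset the JVM resolved at startup.
(println (str (Charset/defaultCharset)))

;; The system property that -Dfile.encoding=UTF-8 sets explicitly.
(println (System/getProperty "file.encoding"))

(println (utf8-default?))
```

Printing "\u00DC" in the same session, as suggested earlier in the thread, then confirms that the terminal and the JVM agree.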