#datomic
2017-10-05
uwo15:10:47

while a single thread in a single process is responsible for writing transactions, the transactor can still take advantage of multiple cores for other purposes, right?

favila16:10:46

datomic peer can really light up the cores IME

favila16:10:15

until it's io bound

neverfox19:10:39

I’m trying to get Datomic working in k8s with Cassandra as the storage service, but my peers cannot find the transactor after connecting to storage. I’ll describe the setup:

neverfox19:10:35

Here’s the transactor config:

protocol=cass
host=0.0.0.0
alt-host=datomic
port=4334
license-key=<redacted>
cassandra-table=datomic.datomic
cassandra-host=cassandra
cassandra-port=9042
memory-index-threshold=32m
memory-index-max=512m
object-cache-max=1g

favila19:10:46

@roman host and alt-host are used by peers to find the transactor

neverfox19:10:46

The transactor is running in its own pod and there’s a service called datomic with port 4334 that other pods can connect to

neverfox19:10:11

yes, I know but…

neverfox19:10:35

the peer reports that it’s trying to connect to the transactor at localhost and alt-host nil

neverfox19:10:44

despite what the config says

neverfox19:10:36

It’s my understanding that the transactor stores its connection info in storage and the peer picks it up there. But it appears not to get the configured information. Is that not correct? Why does the peer not try datomic:4334?

neverfox19:10:44

It’s as if the transactor’s config is not applied to storage.

favila19:10:44

a different transactor or config is running; peer is connecting to wrong storage; peer cannot resolve "datomic" to an address?
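
The third possibility (the peer cannot resolve or reach "datomic") can be checked directly from inside the peer's pod. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration, and the host "datomic" and port 4334 are the alt-host and port from the transactor config quoted above.

```python
import socket


def can_resolve(host: str) -> bool:
    """Return True if this machine can resolve `host` to at least one address."""
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except socket.gaierror:
        return False


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers resolution failures, refused connections, and timeouts.
        return False


# Example calls to run from the peer's pod:
#   can_resolve("datomic")
#   can_connect("datomic", 4334)
#   can_connect("cassandra", 9042)
```

If `can_resolve("datomic")` is false in the peer's pod, the problem is in service discovery rather than in Datomic; if it resolves but `can_connect` fails, the service or port mapping is the more likely suspect.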

neverfox19:10:50

There is without a doubt no other transactor and no other Cassandra, because in both cases I’m launching fresh pods in fresh clusters. As for the peer not resolving it, it’s not even trying it:

neverfox19:10:02

clojure.lang.ExceptionInfo: Error communicating with HOST localhost on PORT 4334 {:alt-host nil, :peer-version 2, :password "ckGYIzP97L6l+ERbbcnZuzCiSB/v3S1HfzUZdyIFLdE=", :username "f8HVTTNhf8bmknxqrO2TINx2xmPH5KAJLwQ5q/Cs0J8=", :port 4334, :host "localhost", :version "0.9.5561.59", :timestamp 1507230053031, :encrypt-channel true}

neverfox19:10:15

It believes alt-host is nil for one thing.

favila19:10:24

that could be because it didn't resolve

favila19:10:43

what is the connection string peer is using?

favila19:10:58

transactor log will echo the connection string on startup

neverfox19:10:13

datomic:cass://cassandra:9042/datomic.datomic/<redacted>

neverfox19:10:35

fyi, there’s a service called cassandra and the peer connects to it (clear from logs)

neverfox19:10:41

But that’s interesting what you said about nil meaning it didn’t resolve.

neverfox19:10:55

Does that host need all three datomic ports to work?

neverfox19:10:22

i.e. 4334, 4335, and 4336?

favila19:10:56

I think not. I think 4335 is dev transactor storage access, 4336 is the h2 GUI console when using dev storage

neverfox19:10:09

That’s what I thought

neverfox19:10:16

so that’s not the problem then

favila19:10:26

what if you don't use 0.0.0.0 as the host

favila19:10:40

does anything change?

neverfox19:10:54

I first tried with localhost

neverfox19:10:05

do you mean leave it out?

favila19:10:13

I mean use something routeable

neverfox19:10:35

hmm, given that the IP is dynamic, how?

favila19:10:45

generate the transactor.properties on startup
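
One way to generate the file on startup is a small script that runs before the transactor launches and fills in the pod's current IP. This is only a sketch: `POD_IP` and `DATOMIC_LICENSE_KEY` are hypothetical environment variable names (Kubernetes can expose a pod's IP to a container as an environment variable, but the name is up to the pod spec), and the remaining settings are copied from the config quoted earlier.

```python
import os
from pathlib import Path

# Settings copied from the transactor config quoted earlier in this log.
STATIC_SETTINGS = {
    "protocol": "cass",
    "alt-host": "datomic",
    "port": "4334",
    "cassandra-table": "datomic.datomic",
    "cassandra-host": "cassandra",
    "cassandra-port": "9042",
    "memory-index-threshold": "32m",
    "memory-index-max": "512m",
    "object-cache-max": "1g",
}


def render_properties(host: str, license_key: str) -> str:
    """Build the properties text with a routable host in place of 0.0.0.0."""
    settings = {"host": host, "license-key": license_key, **STATIC_SETTINGS}
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_properties(path: Path, env=os.environ) -> None:
    """Write the properties file using values from the environment.

    POD_IP and DATOMIC_LICENSE_KEY are hypothetical variable names.
    The license key is written as-is, so it must already be in the
    form the properties file expects.
    """
    text = render_properties(env["POD_IP"], env["DATOMIC_LICENSE_KEY"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)


# Example call from a container entrypoint, before starting the transactor:
#   write_properties(Path("config/transactor.properties"))
```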

favila19:10:27

I'm mostly suspicious that the wrong config file is pulled or another transactor is running against that storage

neverfox19:10:28

But here’s what’s strange. I’ve run this in k8s before just fine with this setup when it was the dev transactor without having to do anything that complex.

neverfox19:10:42

only when the storage is separate am I running into this.

favila19:10:10

dev transactor's "storage" is the transactor, so your connection string routes to that transactor already

neverfox19:10:13

There’s only one transactor pod. I don’t know where it’s even possible for there to be another.

favila19:10:25

outside the k8s cluster

neverfox19:10:28

that’s true, good point

favila19:10:30

maybe a forgotten test

neverfox19:10:35

the cluster is locked down

neverfox19:10:54

there’s no way in without port forwarding and I’m the sole person using the cluster

favila19:10:11

maybe you could kill all transactors you know about, try to connect a peer, see if you get a different error

neverfox19:10:14

but you’re right that it’s a mystery

favila19:10:22

if you get same error, then there's definitely a transactor somewhere

neverfox19:10:57

Same error, but that doesn’t make any sense. This is a completely fresh Minikube with nothing on it but Cassandra. Is it possible that a transactor that is no longer running but was once running connected to Cassandra and left its connection info there and it’s just not getting updated?

neverfox19:10:08

when I launch a fresh one

neverfox19:10:58

nothing but cassandra and the peer, that is

favila19:10:41

I don't know. it's not what I would expect to happen

neverfox19:10:48

I appreciate your help however

favila19:10:00

ps axf | grep datomic doesn't show anything? you mentioned port forwarding. maybe you tried a transactor outside the cluster earlier and forgot about it?

favila19:10:17

you can also inspect the cassandra table itself, see if it's getting written to

favila19:10:46

transactors write at least once a second to heartbeat
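
Given that heartbeat rate, the age of a record says whether a live transactor wrote it. The exception map above carries `:timestamp 1507230053031`; the sketch below assumes that value is an epoch-millisecond write time (the log does not confirm what the field means) and converts it to a UTC time and an age. The function names and the 5-second freshness threshold are hypothetical choices for illustration.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(timestamp_ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)


def age_seconds(timestamp_ms: int, now_ms: Optional[int] = None) -> float:
    """Seconds elapsed between the record's timestamp and `now_ms` (default: now)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return (now_ms - timestamp_ms) / 1000


def is_fresh(timestamp_ms: int, now_ms: Optional[int] = None,
             max_age_seconds: float = 5.0) -> bool:
    """True if the record is recent enough to have come from a live heartbeat.

    The 5-second default is an arbitrary margin, not a Datomic setting.
    """
    return age_seconds(timestamp_ms, now_ms) <= max_age_seconds


# Example: to_utc(1507230053031) shows when the record the peer read was written.
```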

neverfox19:10:50

That’s reasonable but I’m not currently port-forwarding

neverfox19:10:01

I just mean in theory the only way in is such

favila19:10:10

you can also double-check your paths for your config file

neverfox19:10:18

That’s a good idea

favila19:10:20

(for the transactor startup)

neverfox19:10:40

Wouldn’t it have failed to start though if that had been wrong?

neverfox19:10:45

Because of the license

favila19:10:58

maybe you have two?

favila19:10:06

some editing shuffle

neverfox19:10:19

these are good suggestions

favila19:10:20

or forgot to save

favila19:10:26

just covering bases

neverfox19:10:35

no, I appreciate it

favila19:10:43

localhost and alt-host nil are suspicious

neverfox19:10:52

I know, right?

neverfox19:10:44

the ps checks out

favila19:10:44

can also confirm in the logs that the transactor did actually start up and connect to cassandra. maybe it never did and the settings in there are from an earlier test, like you suggested

favila19:10:06

after that, I'm out of ideas

neverfox19:10:41

well, here are the logs:

neverfox19:10:50

Launching with Java options -server -Xms4g -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=50
Starting datomic: ...
System started datomic:

neverfox19:10:52

that’s it

favila19:10:13

that's the systemd logs, not the transactor logs

neverfox19:10:02

that’s just what is produced to stdout

neverfox19:10:08

I should be looking for a file then?

favila19:10:56

I'm trying to find the defaults

neverfox19:10:14

does the name of the properties file matter, i.e. the config file?

favila19:10:17

as long as it matches what was supplied, no

neverfox19:10:01

Right, and I’m giving it config/transactor.properties and that’s the only file at that location

neverfox19:10:35

next up is examining the table

favila19:10:06

bin/logback.xml has the log config

neverfox19:10:53

duh it’s just the log dir

neverfox19:10:09

everything in there seems clean

neverfox19:10:03

it discovers and connects to the Cass cluster nodes

neverfox19:10:35

also datomic.transactor - {:event :transactor/start, :args {:cassandra-port 9042, :cassandra-table "datomic.datomic", :log-dir "log", :alt-host "datomic", :protocol :cass, :rest-alias "cass", :memory-index-max "512m", :cassandra-host "cassandra", :port 4334, :memory-index-threshold "32m", :data-dir "data", :object-cache-max "1g", :host "0.0.0.0", :version "0.9.5561.59", :encrypt-channel true}, :pid 1, :tid 12}

neverfox19:10:46

so it knows the alt-host here

favila19:10:50

so something is wrong with the peer or the networking setup for the peer

neverfox19:10:01

this is what’s in the Cass table: {:key "[\\"0.0.0.0\\" \\"datomic\\" 4334 \\"XnH1k0PQkm/Hz/Y4ISZpE7fpHcBMP7ui8pz8wwNcPXk=\\" \\"afmWR0zouI8Bee4gld/5zM48H4pecIzmHeNjeCODSfI=\\" 1507233334931 \\"0.9.5561.59\\" true 2]"}
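
To compare that record against what the peer reported, the vector can be pulled apart. This is a rough sketch, not Datomic's own format definition: it assumes the first three elements are host, alt-host, and port, which is inferred only from lining them up with the `:transactor/start` event above. `Endpoint` and `parse_endpoint` are names invented here, and the credential strings are replaced with placeholders.

```python
import shlex
from typing import NamedTuple


class Endpoint(NamedTuple):
    host: str
    alt_host: str
    port: int


def parse_endpoint(vector_text: str) -> Endpoint:
    """Extract (host, alt-host, port) from the stored vector.

    Assumes the field order host, alt-host, port, ... which is an
    inference from the log, not documented behavior.
    """
    inner = vector_text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a bracketed vector")
    # shlex handles the double-quoted strings and whitespace separators.
    tokens = shlex.split(inner[1:-1])
    return Endpoint(tokens[0], tokens[1], int(tokens[2]))


# The record from the Cassandra table, unescaped, with credentials replaced.
SAMPLE = ('["0.0.0.0" "datomic" 4334 "<redacted>" "<redacted>" '
          '1507233334931 "0.9.5561.59" true 2]')

# parse_endpoint(SAMPLE) gives host "0.0.0.0", alt_host "datomic", port 4334.
```

Read this way, storage holds `0.0.0.0` and `datomic`, while the peer's exception reported `localhost` and `nil`, so the peer was not acting on this record when it failed.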

neverfox19:10:12

I think you’re right.

favila19:10:41

yeah that all looks good, transactor is definitely connecting

favila19:10:27

you can look in the peer (java) logs too for more info

favila19:10:36

you should see events like :peer/get-connection :coord/lookup-transactor-endpoint, and :peer/hornet-connect and :peer/hornet-connect-failed

favila19:10:51

these will tell you more than the exception
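
Once peer logging is turned back on, those events can be picked out of a noisy log with a small filter. This sketch assumes the peer log writes events in the same `:event <keyword>` shape as the transactor log line quoted above, which the conversation does not confirm; the function name and the log path in the usage comment are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Event names as listed in the message above.
PEER_EVENTS = (
    ":peer/get-connection",
    ":coord/lookup-transactor-endpoint",
    ":peer/hornet-connect",
    ":peer/hornet-connect-failed",
)

# Captures the keyword that follows ":event" in a log line.
EVENT_RE = re.compile(r":event\s+(:[A-Za-z0-9_./-]+)")


def matching_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, line) for each log line whose :event is one of PEER_EVENTS."""
    for line in lines:
        match = EVENT_RE.search(line)
        if match and match.group(1) in PEER_EVENTS:
            yield match.group(1), line.rstrip("\n")


# Example usage against a hypothetical peer log file:
#   with open("peer.log") as log:
#       for event, line in matching_events(log):
#           print(event, line)
```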

neverfox20:10:17

I’m currently suppressing them in logback but that’s easy to change

neverfox20:10:35

now it’s just working

favila20:10:42

hah, that's great

timgilbert20:10:46

I'm having some trouble trying to start a dev transactor inside of docker-compose. Well, it's not trouble exactly...

timgilbert20:10:47

I'm able to start the transactor and connect to it OK and it seems to work, but every time I connect to it I get bunches of these tracebacks in the peer:

ERROR 16:54:30.264 o.a.activemq.artemis.core.client: AMQ214016: Failed to create netty connection
java.nio.channels.UnresolvedAddressException: null
	at sun.nio.ch.Net.checkAddress(Net.java:123)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
	at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:208)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:203)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1226)
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:549)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:534)
	at io.netty.handler.ssl.SslHandler.connect(SslHandler.java:438)
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:549)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:534)
	at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:549)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:534)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:516)
	at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:970)
	at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:215)
	at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:402)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
	at java.lang.Thread.run(Thread.java:745)

timgilbert20:10:03

My transactor props looks like this:

protocol=dev
host=0.0.0.0
alt-host=dev-datomic
port=4334
...and the docker-compose bit looks like this:
services:
  dev-datomic:
    image: ""
    ports:
      - "4334-4336:4334-4336"
    volumes:
      - "datomic-data:/opt/datomic/data"

timgilbert20:10:27

From the peer I'm connecting to datomic: