#datalevin
2023-04-24
vlad_poh00:04:45

How do i create a datalevin docker image for Apple Silicon?

Huahai01:04:03

There’s a way to create a multi-platform docker image, but I have not gotten around to doing it. https://docs.docker.com/build/building/multi-platform/

Huahai01:04:10

If you have done it, PR is welcome to close this: https://github.com/huahaiy/docker-datalevin/issues/2

vlad_poh02:04:09

I copied the dockerfile and tried to build it locally, but I kept getting the following error when I try to run it

2023-04-24 02:29:26,191 INFO spawned: 'datalevin' with pid 84
Loading initial Timbre config from: :default
Exception in thread "main" 2023-04-24 02:29:27,257 INFO success: datalevin entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
java.lang.UnsatisfiedLinkError: liblmdb.so: cannot open shared object file: No such file or directory
Library names
[lmdb]
Search paths:
[/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib, /usr/local/lib, /usr/local/lib/aarch64-linux-gnu, /lib/aarch64-linux-gnu, /usr/lib/aarch64-linux-gnu, /opt/java/openjdk/lib, /opt/java/openjdk/lib/server]
        at jnr.ffi.provider.jffi.NativeLibrary.loadNativeLibraries(NativeLibrary.java:111)
        at jnr.ffi.provider.jffi.NativeLibrary.getNativeLibraries(NativeLibrary.java:85)
        at jnr.ffi.provider.jffi.NativeLibrary.getSymbolAddress(NativeLibrary.java:64)
        at jnr.ffi.provider.jffi.NativeLibrary.findSymbolAddress(NativeLibrary.java:74)
        at jnr.ffi.provider.jffi.AsmLibraryLoader.generateInterfaceImpl(AsmLibraryLoader.java:141)
        at jnr.ffi.provider.jffi.AsmLibraryLoader.loadLibrary(AsmLibraryLoader.java:87)
and here is the Dockerfile
FROM  eclipse-temurin:17.0.6_10-jre-focal


RUN echo "#!/bin/sh\nexit 0" > /usr/sbin/policy-rc.d

RUN \
  echo "===> install Datalevin ..."  && \
  apt-get update && \
  apt-get install -y supervisor unzip wget && \
  wget  && \
  unzip dtlv-0.8.12-macos-latest-aarch64.zip -d /usr/bin/ && \
  rm dtlv*.zip &&  \
  wget -O /opt/datalevin.jar  && \
  apt-get clean && \
  rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

COPY ./docker-entrypoint.sh /

ENV DATALEVIN_ROOT=/data DATALEVIN_PORT=8898

VOLUME ["/data"]

EXPOSE 8898

ENTRYPOINT ["/docker-entrypoint.sh"]

CMD ["supervisord"]
I only changed the zip file.

Huahai03:04:48

From the error, I think it is on aarch64 linux

Huahai03:04:55

docker is linux, so this is reasonable

Huahai03:04:41

you can do some investigation into how to tell docker what platform to use when it is running on apple silicon

Huahai03:04:30

we have an issue to support aarch64 linux, https://github.com/juji-io/dtlvnative/issues/1

Huahai03:04:20

as far as I know, amd64 linux docker works fine on apple silicon

Huahai03:04:55

BTW, what’s your planned usage? run it as a server? as command line? these are two cases that are supported differently in datalevin docker

Huahai03:04:25

the server is running the JVM datalevin, the command line is running GraalVM datalevin. They are completely different

😮 2
Huahai03:04:11

it seems that you were trying to run the server, it should just work. I have colleagues that run it for local development.

Huahai04:04:30

the amd64 linux docker image is enabled on Apple silicon by Rosetta, so it should work, no need to build your own image

Huahai04:04:03

docker run --rm -ti --platform linux/amd64 huahaiy/datalevin

Huahai04:04:10

that’s it

👍 2
denik18:04:10

@huahaiy is it normal for heap memory usage to grow over time even after GC?

denik18:04:35

running this code repeatedly leads to ever-increasing heap usage. heap dump attached. version "0.8.12"

(defonce test-conn
  (d/create-conn nil))

;; run this many times while watching heap
;; in VisualVM and running GC
(dotimes [i 10000]
  (d/transact! test-conn [{:foo i}]))

denik18:04:31

each eval of the dotimes expression adds about 6 MB of heap that won’t get cleared up in later GC cycles

denik18:04:22

this leads to the JVM increasing the heap until ultimately failing with an OutOfMemoryError

Huahai19:04:59

increasing heap should be normal, as the cache is getting bigger

Huahai19:04:55

each run adding 6 MB sounds excessive though; if that were the case, our datascript benchmark wouldn’t run successfully, as it transacts 100k datoms one at a time, which is more than what the above code does

denik19:04:21

hmm, it also happens with this code where each run should replace previous values

(defonce test-conn2
  (d/create-conn nil {:dev-id {:db/valueType :db.type/string
                               :db/unique    :db.unique/identity}}))

(dotimes [i 10000]
  (d/transact! test-conn2 [{:dev-id (str i)}]))

denik19:04:29

that shouldn’t bloat cache, right?

denik19:04:01

re: https://clojurians.slack.com/archives/C01RD3AF336/p1682363155793199?thread_ts=1682362090.719989&cid=C01RD3AF336 100k tx also works. depending on the hardware this can run for a while until running out of memory

denik19:04:14

hmm, is there a way to disable cache to see if that is the issue?

Huahai19:04:24

I will need a reproducible test case to investigate

Huahai19:04:07

as far as I can tell, transacting 10k datoms shouldn’t be a problem, as the benchmark does more than that and actually does it more than 10 times

denik19:04:07

requires visualVM to observe heap

Huahai19:04:36

i use visualvm to optimize code,

Huahai19:04:00

basically, this is how i optimize code: run datascript bench, and watch visualvm

Huahai19:04:36

mostly i focus on cpu at the moment, but it never went out of memory for me

Huahai19:04:05

there must be something else going on

denik19:04:13

I assume the hardware can handle the benchmark. However, it does seem that the cache never clears unless, of course, the app is restarted

Huahai19:04:55

cache is thrown out for every new transaction

Huahai19:04:21

it’s an immutable cache

Huahai19:04:59

basically it’s tonsky’s cache, which I found to be more performant than a mutable one that I wrote, so I kept his
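
As an illustration only (this is not Datalevin’s actual code), the idea of an immutable cache that is simply replaced on each transaction can be sketched like this:

;; sketch: the cache lives next to the data; a transaction swaps in a
;; fresh empty cache, so entries built for the old state become garbage
;; and get reclaimed by a later GC
(defonce db-state (atom {:datoms [] :cache {}}))

(defn cached-count [attr]
  ;; look up in the cache, or compute and remember the result
  (or (get-in @db-state [:cache attr])
      (let [n (count (filter #(= attr (first %)) (:datoms @db-state)))]
        (swap! db-state assoc-in [:cache attr] n)
        n)))

(defn transact* [new-datoms]
  ;; new datoms plus an empty cache: nothing cached for the old state survives
  (swap! db-state (fn [{:keys [datoms]}]
                    {:datoms (into datoms new-datoms)
                     :cache  {}})))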

Huahai19:04:44

i think there’s something else going on, file an issue with code, I will investigate. I have found dotimes to be problematic before

Huahai19:04:43

I have not looked at the source code of dotimes, but here’s my observation: it’s faster than other looping structures in clojure, but it is problematic. I had a hard time getting the concurrent write to work correctly with it; changing to another structure, no problem
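
For instance (the message above does not say which structure was actually used), a plain doseq over a range does the same work as the dotimes repro:

;; assumes the same d alias and test-conn as in the earlier repro
(doseq [i (range 10000)]
  (d/transact! test-conn [{:foo i}]))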

denik19:04:46

also interesting, the search engine code is active even though the schema does not imply it

Huahai19:04:02

that’s interesting

denik19:04:44

adding issue

denik19:04:03

side note, I only used dotimes for a simple repro

denik19:04:57

this shows up in all kinds of other ways, e.g. transacting a large vector many times, in my app

Huahai19:04:00

To disable cache, call datalog-index-cache-limit with 0
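
For example, assuming datalog-index-cache-limit lives in datalevin.core and takes the db value followed by the new limit (worth double-checking the exact signature), disabling the cache for the repro above could look like:

;; assumed arity: db value + new limit; 0 disables the Datalog index cache
(d/datalog-index-cache-limit @test-conn 0)

;; then re-run the repro and watch the heap in VisualVM
(dotimes [i 10000]
  (d/transact! test-conn [{:foo i}]))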

Huahai22:04:46

I tried the code. Cannot reproduce the behavior you describe.

Huahai22:04:45

I even ran this 100 times in a dotimes; the heap memory stays flat, with a tiny increase, at about 1GB. So I don’t know what’s going on with your setup.
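
That is, roughly:

;; the 10k-transaction repro from above, repeated 100 times
(dotimes [_ 100]
  (dotimes [i 10000]
    (d/transact! test-conn [{:foo i}])))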

Huahai22:04:17

this is after clicking the “Perform GC” button:

Huahai22:04:48

So it does reclaim heap.

Huahai22:04:28

The search engine is not active, it just takes a fixed 160MB of memory without using it. We could make it smaller in the future, or not initialize it if there are no :fulltext attributes.
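
For reference, a schema that would actually exercise full-text search might look like this (assuming the usual :db/fulltext flag):

(def schema {:notes {:db/valueType :db.type/string
                     :db/fulltext  true}})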

👌 2
Huahai22:04:00

Filed an issue for this.

Huahai22:04:21

this is after clicking the “Perform GC” button again:

Huahai22:04:02

Looks well behaved to me. Closing the issue.

Huahai00:04:00

It does have many more instances of various objects that shouldn’t be there. Will investigate.

Huahai02:04:26

Fixed. Will be in the next release. It turns out that we stored many versions of “store” in caches because their hash values are different.
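
As an aside, the general failure mode can be illustrated with a small sketch (not Datalevin’s actual code): when part of a cache key does not hash by value, logically identical keys never collide, so the cache keeps accumulating entries.

;; deftype uses identity-based hashCode/equals by default, so every
;; instance is a distinct map key even when it represents the same store
(deftype Store [path])

(def cache (atom {}))

(dotimes [_ 3]
  (swap! cache assoc [(Store. "/tmp/db") :some-query] :cached-result))

(count @cache) ;=> 3, although all three keys describe the same store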

👌 2
🚀 2
Huahai02:04:34

Thanks for reporting. This is an important catch.

🚀 2
denik04:04:58

I was running various tests today trying to better reproduce the issue. It’s a tough one since heap growth is often minimal. Anyway glad you were able to identify the underlying issue and resolve it!

👍 2
Mark Wardle21:04:26

I see lmdbjava has switched to using zig to make it easier to cross-compile the required binaries for distribution. I must admit I haven’t seen how you do this for datalevin, as I seem to recall you switched away from using the lmdbjava binaries in favour of a different approach, but you might be interested? I hadn’t really looked at zig before, but it looks as if it makes it very easy to cross-compile. https://github.com/lmdbjava/lmdbjava/commit/288bb09f00fdfe8f4d2d062aabc33e5dcbef8af8

Huahai21:04:37

zig is something I am keeping an eye on. As to cross-compiling, it is of course nice. But the whole thing still needs to be tested on the real platforms to find out all the corner cases, as there are so many things that can go wrong, not just the native lib. So I don’t think cross-compiling the native lib buys too much.

Huahai21:04:39

Maybe for the JVM version of datalevin we can take the cross-compiling approach, so the library can be used on more platforms, but I don’t think it works for GraalVM datalevin.

Mark Wardle13:04:21

It looks as if it might make glibc issues easier to deal with as well.

Huahai23:04:52

Sounds good