#datascript
2018-02-27
thedavidmeister01:02:34

datascript doesn't let you query old versions anyway, because it's all in memory

thedavidmeister01:02:44

why can't you just transact new maps into entities as they become available?

devn03:02:36

@thedavidmeister the data is basically a dump of a couple database tables. right now, it is just a snapshot of the tables at a point in time. the difference between the current version and the previous version might include updated entities, and without just having the delta of updated and created entities, we'd need to look at every map and create-or-update, no?

thedavidmeister03:02:56

if you have no sense of "identity" for each map

thedavidmeister03:02:02

every list of maps is a new list

thedavidmeister03:02:19

there is no meaningful concept of "update" 😕

devn03:02:25

they have IDs 🙂

thedavidmeister03:02:38

well then can you map the IDs to :db/id in datascript?

devn03:02:51

i don't know why i didn't think of that

devn03:02:56

but lol yes, i could

devn03:02:06

i was adding tempids to everything like a bozo
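(A minimal sketch, not from the chat, of the idea above: reusing stable external ids as `:db/id` so that re-transacting the same rows upserts instead of creating duplicates. `rows` and the attribute names are hypothetical.)

```clojure
(require '[datascript.core :as d])

(def conn (d/create-conn {}))

(defn rows->tx [rows]
  ;; each row already carries an integer :id from the source table;
  ;; promote it to :db/id so datascript treats it as the entity id
  (map #(assoc % :db/id (:id %)) rows))

(d/transact! conn (rows->tx [{:id 1 :name "a"} {:id 2 :name "b"}]))
;; transacting again with a changed value updates entity 1 in place,
;; and unchanged datoms are no-ops
(d/transact! conn (rows->tx [{:id 1 :name "a2"}]))
```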

thedavidmeister03:02:33

i mean, i don't know if datascript supports your ids

devn03:02:39

they're ints

thedavidmeister03:02:41

but if they're numbers i don't see why it wouldn't work

thedavidmeister03:02:00

although it's probably technically not supported

devn03:02:03

what about across different tables though? if one of them has a colliding ID

devn03:02:29

🙂 yeah

thedavidmeister03:02:32

whatever your definition of identity is

thedavidmeister03:02:41

if your definition of identity is the ID, and the IDs collide, then they are the same entity, by definition

devn03:02:59

i suppose i could get clever and do something like use an offset for each table to prevent collision

devn03:02:04

table 1: (- (:id {:id 1}) 10000), table 2: (- (:id {:id 1}) 20000)

thedavidmeister03:02:21

datascript does something like that internally to differentiate between eids and txn ids
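(A sketch of the per-table offset idea, assuming addition rather than subtraction so the eids stay positive; negative `:db/id` values are treated as tempids. The table names and offsets are made up.)

```clojure
;; give each source table its own non-overlapping id range
(def table-offsets {:users 10000000 :orders 20000000})

(defn eid
  "Derive a collision-free datascript eid from a table name and row."
  [table row]
  (+ (table-offsets table) (:id row)))

(eid :users {:id 1})   ;; => 10000001
(eid :orders {:id 1})  ;; => 20000001
```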

thedavidmeister03:02:53

but really, this is pretty contextual at this point

thedavidmeister03:02:03

there's probably no perfect answer

devn03:02:08

though i do have 100s of thousands of some of these maps, so it might need to be more clever than that

thedavidmeister03:02:28

that should still be fine

devn03:02:35

yes, basically im looking to try and fit a whole lot of batched data into a tight, queryable package

thedavidmeister03:02:36

js max int is something like 10^15

thedavidmeister03:02:52

can it be two different dbs?

devn03:02:13

per table, or are you talking about splitting and querying across dbs?

devn03:02:33

i will need to do joins occasionally across the imported data

thedavidmeister03:02:43

this is getting to where i'm not 100% sure

thedavidmeister03:02:00

i have a feeling it's possible to stick multiple dbs into a query

thedavidmeister03:02:03

but i haven't done it

devn03:02:21

yes, i knew this was possible in datomic

thedavidmeister03:02:24

not sure if datascript follows that
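(Datascript does follow datomic here: a query can take several database inputs, each named with a `$`-prefixed source var in `:in`. A sketch with hypothetical `users-db` / `orders-db` values and made-up attributes:)

```clojure
(require '[datascript.core :as d])

;; join across two separate dbs by naming them in the :in clause
(d/q '[:find ?name ?total
       :in $users $orders
       :where
       [$users  ?u :user/name   ?name]
       [$orders ?o :order/user  ?u]
       [$orders ?o :order/total ?total]]
     users-db orders-db)
```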

devn03:02:29

any thoughts on speeding up the import?

devn03:02:28

right now im just doing a (d/db-with (d/empty-db) (concat stuff more-stuff)) in a background thread, and swapping the value
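(A sketch of that rebuild-and-swap approach: build a fresh db value off the main thread, then replace the conn's value wholesale. `stuff` and `more-stuff` stand in for the batches from the chat.)

```clojure
(require '[datascript.core :as d])

(def conn (d/create-conn {}))

(defn reload! [stuff more-stuff]
  ;; build a brand-new immutable db value from scratch...
  (let [db (d/db-with (d/empty-db) (vec (concat stuff more-stuff)))]
    ;; ...then swap it in, notifying any listeners on the conn
    (d/reset-conn! conn db)))
```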

thedavidmeister03:02:01

so you're building a new db from scratch every time

devn03:02:19

yes, and so i guess what you mentioned above would be worth trying.

thedavidmeister03:02:19

why not just (transact! conn [...])

thedavidmeister03:02:38

provided you have the db/id bit working, it should ignore anything that is the same

thedavidmeister03:02:56

i have no idea which is faster

thedavidmeister03:02:05

but transact! certainly seems more idiomatic

devn04:02:58

finding out now 🙂

devn04:02:26

loading 150k maps takes ~42 seconds

devn04:02:45

basically the same with d/db-with

thedavidmeister04:02:53

it's probably doing similar things under the hood

thedavidmeister04:02:04

what about incremental updates?

thedavidmeister04:02:11

is it any faster than the initial load?

devn04:02:23

i only added one record, but it didn't appear so no

thedavidmeister04:02:41

it took 42 seconds to add 1 record?

devn04:02:19

oh, no, what im saying is i had a collection, and i changed a record in it but left its ID the same, and then ran transact! over the whole collection again

devn04:02:00

i was curious to see if it'd be able to quickly tell it didn't need to do anything with most of the maps

thedavidmeister04:02:18

i wonder how long it takes to compare those maps outside datascript

thedavidmeister04:02:34

maybe you can do the comparison yourself and just transact the diff

devn04:02:05

i also wonder if the types on the value side of a map have anything to do with performance

thedavidmeister04:02:15

like, clojure.set/difference or something
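(A sketch of diffing the batches outside datascript with `clojure.set/difference`: taking new minus old keeps exactly the maps that are new or whose values changed, and only those need to be transacted. The example data is made up.)

```clojure
(require '[clojure.set :as set])

(defn changed-rows
  "Rows present in new-rows that have no identical map in old-rows."
  [old-rows new-rows]
  (set/difference (set new-rows) (set old-rows)))

(changed-rows [{:id 1 :v 1} {:id 2 :v 2}]
              [{:id 1 :v 1} {:id 2 :v 3}])
;; => #{{:id 2 :v 3}}
;; this delta can then go to d/transact! with :db/id set from :id
```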

devn04:02:23

like are instants more expensive?

thedavidmeister04:02:59

maybe, i'd imagine iso8601 strings are faster to compare, but i don't know

devn04:02:56

I can probably slim some of these maps down.

devn04:02:07

I bet that would have some appreciable impact

thedavidmeister04:02:07

yeah i mean, at some point you just run into the limits of comparing lots of data

devn06:02:40

From the datascript tutorial I'm reading: > Datom has (-hash) and (-equiv) methods overloaded so only [e a v] take part in comparisons. It helps to guarantee that each datom can be added to the database only once. Does this mean that a retracted datom cannot be added after retraction?

devn06:02:32

Are retraction and excision synonymous?

devn06:02:09

> When datom is removed from a DB, there’s no trace of it anywhere. Retracted means gone.

devn06:02:07

That makes it sound like excision and retraction are the same, but perhaps I'm just confused by the role of :added in light of the comment about e a v comparison operations.

thedavidmeister07:02:52

datascript has no history

thedavidmeister07:02:21

i think a lot of the datascript API is to line up with datomic API rather than achieve a specific goal for datascript itself

thedavidmeister07:02:54

if you wanted to sync datascript with datomic via transaction reports you'd need :added i think

rauh07:02:47

@devn There is also conn-from-datoms in case you want to manually import. It's pretty fast. I use it to bootstrap my DB on CLJS.
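(A sketch of the `conn-from-datoms` route mentioned above: it builds a conn directly from raw datoms, bypassing the transaction machinery, which is why it is fast — the datoms are assumed to already be valid. The example datoms are made up.)

```clojure
(require '[datascript.core :as d])

(def conn
  (d/conn-from-datoms
    [(d/datom 1 :name "a")
     (d/datom 2 :name "b")]
    {}))  ;; schema map
```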

fellshard16:02:57

Datomic maintains history; you could rewind and look at the database at a given time, meaning all datoms retracted between that previous time and now would be visible again. Retraction is basically adding new information: "This datom no longer holds true; it used to up 'til now, but now it does not." Excision is much more dangerous, because it removes any trace that the datom was ever there, both currently and historically. Datascript maintains no history, as thedavidmeister said, so it doesn't necessarily make that distinction.
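(In datascript terms, a retraction just removes the datom from the single current db value; since there is no history db to rewind to, nothing stops the same [e a v] from being asserted again later. A minimal sketch:)

```clojure
(require '[datascript.core :as d])

(def conn (d/create-conn {}))

(d/transact! conn [{:db/id 1 :name "a"}])
;; retract the datom; in datascript this is the end of its story
(d/transact! conn [[:db/retract 1 :name "a"]])
;; ...but it can be re-asserted freely afterwards
(d/transact! conn [{:db/id 1 :name "a"}])
```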