Fork me on GitHub
#xtdb
<
2023-11-21
>
Soumil Shekdar10:11:02

Business Intelligence Tools for XT? We were wondering what the best methods to enable BI on XT are. Currently we use looker which accesses data ETL'd primarily from Relational DBs into Amazon Redshift. However with XT, and even document stores generically had a few questions on what solutions could be used to enable BI: 1. Does looker support defining views or model on non relational databases? - it would be possible for us to just form a relational DB with fields + schema we primarily care about, but I would assume that sorta beats the point of Documents 2. Is there a different suit of BI tools for Document stores? 3. How would we represent timestamps? i.e. would valid time v.s. system time just be values in the document itself or would "valid time" be the primary timestamp? Just examples + thoughts on how to ETL XT + enable BI (ideally through looker) would be appreciated 🙂

refset15:11:16

Hey @U05EY098T09 I've not used Looker before, but I think you have quite a few options to consider. For instance, are you storing anything more complex in XTDB than JSON types? If not, you could ETL everything into a Redshift "documents" table. I'm aware that there's a Mongo integration for Looker, and that probably showcases the state of the art in what's possible (given Mongo is also a document database) although I don't know how realistic it would be to do something similar with XT directly - like you probably are better off ETL'ing the data into Redshift in one form or another than trying to construct a direct integration. But I'd be happy to discuss the options in depth if you like?

refset15:11:46

How fresh do you need the data to be available for BI?

seancorfield17:11:49

I'm curious if XT 2.x and SQL would be a possibility, depending on the timeline of the project?

refset17:11:00

> I'm curious if XT 2.x and SQL would be a possibility, depending on the timeline of the project? definitely! even 1.x with the Calcite-based SQL module could be a possibility 🙂

1
Soumil Shekdar11:11:02

> anything more complex in XTDB than JSON types What do you mean by complex? Our documents do have a defined schema, but there are many different schemas. So for example lets say we were storing car information. We have schemas for:

type car {
  wheels string
}

type suv extends car {
  all_wheel_drive bool
}

type ev extends car {
  battery_capacity float
}
I could think of two ways to ETL: 1. Create tables for suv, ev, car based on fields in the schema - so the transform layer will query documents and form rows 2. Replicate the entire document in redshift, and let the BI tool define tables, views, etc. for analysis > although I don't know how realistic it would be to do something similar with XT directly - like you probably are better off ETL'ing the data into Redshift in one form or another Bit confused here - so what you would recommend is ETL to redshift as documents, and then looker interfaces with documents in redshift right?

Soumil Shekdar11:11:06

> But I'd be happy to discuss the options in depth if you like? Yes, also okay with a quick chat if simpler (although think having a log in slack would be useful for folks) I was also curios what might be the best way to enable lets say a lambda or glue job to have low latency (local network) access to our XT instance? since we would ideally not want to expose the XT instance as an external server, and then make API calls for Query right?

refset11:11:47

> What do you mean by complex? ah I just mean you will need to have a good understanding of the CASTs needed to handle the non-JSON types (e.g. the float in your schema) on read - per https://www.googlecloudcommunity.com/gc/Modeling/Defining-JSON-Objects-as-LookML-Dimensions/m-p/579279 > 2. Replicate the entire document in redshift, and let the BI tool define tables, views, etc. for analysis I've no concrete idea how well this works in practice, but it's probably what I would try/research first > so what you would recommend is ETL to redshift as documents, and then looker interfaces with documents in redshift right? 👍 > the best way to enable lets say a lambda or glue job to have low latency (local network) access to our XT instance? is XT running embedded in a JVM app, or are you attempting to do everything over HTTP? If it's the former, then you might want to expose curated endpoints. And if it's the latter than you might want to write some kind of transient service that connects via HTTP, queries whatever it needs, and serves a static file response. In terms of networking/permissions/IAM etc. I would treat it like a regular app server regardless

thanks3 1
Soumil Shekdar17:11:12

Thank you - this is helpful will complete this thread with our architecture + learnings once imlpemented

🙏 1