#beginners
2022-05-13
leifericf07:05:40

I feel like sharing some thoughts from my recent experience using Clojure to talk to Google’s APIs.

Using https://developers.google.com/api-client-library/java via Clojure has been a sobering experience. Not because Clojure is flawed, but because the Java library is clunky and difficult to use. More specifically, I struggled to figure out which classes to import and instantiate and how to combine them to achieve the desired result. I primarily worked with Java documentation and Java code examples, “translating” these to Clojure. It’s as if the Java library is its own “mini domain-specific language.”

After struggling for a few days, I wound up using https://github.com/googleapis/google-api-python-client for authentication and simple HTTP requests (as opposed to the Python API client library) for the rest of the stuff. I have the most experience with Python. As a dynamically typed and interpreted language, using Google’s client library for authentication was way more straightforward than using the equivalent Java library.

My main takeaways from this experience:
• Clojure supports Java libraries, and the syntax for doing so is very concise and consistent!
  ◦ Two things that tripped me up were / (as opposed to .) for accessing static fields and methods and $ for accessing inner classes.
• It’s relatively easy to pull Java libraries via deps.edn and use them.
• The Java libraries themselves can be pretty complicated and unintuitive to use.
  ◦ More specifically, it’s challenging to figure out which classes to import from the Java library and how to combine them.
  ◦ Layers upon layers of classes and special objects… Try to use one; it requires three more. And so on.
• Clojure does not “fix” poorly designed Java libraries, but it makes it easier to explore and play around with said libraries in the REPL to learn how they work.
• The resulting code is way shorter after porting it from Java to Clojure (less code).
• Interacting with APIs via Clojure’s REPL is smoother than re-running scripts with print statements in Python.
• Using simple HTTP requests and basic data structures requires less effort than client libraries (even from Python).
• Trying to learn new programming languages and methods for real-world business projects is exceedingly stressful with impatient business stakeholders breathing down your neck.

Bear in mind that I have no prior experience with Java whatsoever. But I have some experience with C# .NET, which is quite similar. I also toyed with the idea of using Google’s API client for Python via Clojure 😅 But I have limited time to experiment with this project (impatient business stakeholders).

When I’m done with this data science project in Python, I might spend some time porting it to Clojure and writing a blog post to highlight the differences. Perhaps it could serve as a “reference” for an end-to-end data science project, starting from ingesting data from an external API, storing and preparing the data for machine learning, training some machine learning models, using said models for inference, and visualizing the data to business stakeholders.

This project involves retrieving 260.000 ratings and reviews for 450 physical stores (from 4 different Google accounts) and doing some NLP-ish stuff to figure out what customers are talking about (topic modeling) and their experience (sentiment analysis). Then visualize this information to the people responsible for our in-store shopping experience. The end goal is to find ways to improve the customer experience based on their actual feedback and measure improvements.
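For anyone else tripped up by the / and $ syntax mentioned in the takeaways, here is a minimal sketch using only JDK classes. It is not taken from the Google client library; the var names and the "rating" entry are made up for illustration.

```clojure
;; `$` names a nested class: this is java.util.AbstractMap.SimpleEntry in Java.
(import '(java.util AbstractMap$SimpleEntry))

;; `/` reaches static members: a static field and a static method.
(def max-int Integer/MAX_VALUE)
(def abs-three (Math/abs -3))

;; `.` reaches instance members on an object.
(def upper (.toUpperCase "clojure"))

;; Trailing `.` on the class name calls a constructor (illustrative values).
(def entry (AbstractMap$SimpleEntry. "rating" 5))

(def entry-key (.getKey entry))
(def entry-val (.getValue entry))
```

Evaluating these forms one at a time in the REPL is a quick way to check which of the three a given Java snippet needs.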

gratitude 7
🙏 3
plexus08:05:50

Sorry to hear this has been such a struggle. The Google Java libraries are a perfect example of what happens when you take totally reasonable HTTP APIs and try to wrap them into something that feels familiar to Java developers. There's a layer underneath where it's all just data and very close to the philosophy of Clojure, but then there's this OO layer on top which is pretty awkward and tedious to work with from Clojure. I don't know if you're aware because it's not very well advertised, but there's a project https://github.com/ComputeSoftware/gcp-api, inspired by Cognitect's aws-api, that provides a much more natural interface for using GCP APIs from Clojure. Sadly it's quite common that you have to peel away a few layers when using Java APIs to get to the simpler stuff underneath.

👀 2
leifericf08:05:56

Oh, cool! I was not aware of that project. I’ll check it out! I think I’ll stick to using simple HTTP requests, as I only need three methods (accounts, locations, and reviews). The latter method is not publicly available and must be enabled through an application process for an organization. The setup in Google Cloud Console was also quite cumbersome (enabling the right APIs, creating the right users, granting the correct roles for access, and so on).

After mucking about for a while, I discovered that the client library was only necessary to get the OAuth 2.0 credentials and an access token. The rest can be done with simple HTTP requests. I used the Python client library to get the OAuth 2.0 credentials and serialized the object to disk for reuse.

According to Google support, their reviews API does not support service users, so I had to use a “personal” JSON access token and the “installed app flow.” This means that a human being must refresh the credentials when needed, so retrieving the data from Google’s API cannot be fully automated for an enterprise setup. Their documentation does not specify this, so it took a while to figure out why their API was not returning any data or any error message (just an empty list).

Such is the life of a “full-stack industrial data scientist” in retail. We’re expected (unrealistically) to function as back-end developers, data engineers, machine learning engineers, statisticians, business analysts, data visualization experts, and front-end developers. 😂 This is why I think Python and Clojure, as general-purpose languages with a wealth of libraries, are fantastic tools for this kind of “hacky/ad hoc” work.
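A rough sketch of what "simple HTTP requests with an access token" could look like from Clojure, using only the JDK's built-in java.net.http client (Java 11+). The URL, the token string, and the function name authorized-get are placeholders invented for illustration, not Google's actual endpoints; the request is built but not sent.

```clojure
(import '(java.net URI)
        '(java.net.http HttpRequest))

(defn authorized-get
  "Builds (but does not send) a GET request carrying an OAuth 2.0 bearer token."
  ^HttpRequest [^String url ^String access-token]
  (-> (HttpRequest/newBuilder (URI/create url))
      (.header "Authorization" (str "Bearer " access-token))
      (.GET)
      (.build)))

;; Hypothetical URL and token, for illustration only.
(def ^HttpRequest request
  (authorized-get "https://example.com/v1/accounts" "my-access-token"))
```

Sending it would then be a matter of passing the request to `(.send (java.net.http.HttpClient/newHttpClient) request (java.net.http.HttpResponse$BodyHandlers/ofString))` and parsing the JSON body into plain maps and vectors.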

❤️ 1
Bart Kleijngeld12:05:22

Interesting to hear about your journey. I would definitely be interested in reading that blog post. If you end up writing it, please share!

👍 2
1
seancorfield16:05:00

Google's API docs are horrendous IMO and their Java libraries are all over the map! We use several Google cloud APIs and they're all different. The "main" API we use has "deprecated" notices all over it -- but it's what nearly all Google's own docs link to and the "new" version that the API's docs link to... we never managed to get it to work at all. Several of their APIs claim you need to use specific auth workflows but we've found that the GOOGLE_APPLICATION_CREDENTIALS env var pointing to a .json credentials file we generated for one service actually works for (nearly?) all of their Java SDKs, despite what the various docs say. It's a giant mess!

👍 2
JohnJ16:05:54

some java libs are hell, makes me want to quit clojure hehe

😅 1
leifericf16:05:01

I feel like I’ve done myself (and Clojure) a disservice by choosing this as my first real project 😂

didibus17:05:44

Welcome to Java 😂 That's why Java devs love Clojure, to get away as much as possible from just that and make it easier with interactive REPL and less code. I'd be interested in your blog post. I'd also enjoy you talking about trying to use the Python API using libpython-clj from Clojure and compare the pain points.

👍 2
kennytilton14:05:05

Tbh, @UL05W6AEM, you have merely become a true Lisper: when a Lisper wraps a foreign library, they don't just wrap it, they also create a much more accessible library.

popeye16:05:35

I have a function which has a map inside a map, as below. Is there any way we can improve this?

(map (fn [row]
       (map (fn [name]
              logic) ; placeholder for the per-value logic
            (vals row)))
     rows)

seancorfield16:05:42

(for [row rows name (vals row)] logic)

popeye16:05:55

for is causing a performance issue; the large amount of data is freezing my page

popeye16:05:18

Is it good to have a map inside a map?

seancorfield17:05:56

The two expressions above should be pretty much equivalent. No idea what you mean about "lot of data is freezing my page"

popeye17:05:09

rows has lots of data! So is it better to use map or for?

seancorfield17:05:37

They are pretty much identical.

seancorfield17:05:54

Is this Clojure or ClojureScript?

popeye17:05:04

in clojurescript

seancorfield17:05:33

Ah, I wonder if for is different in cljs? I don't do any cljs.

seancorfield17:05:06

I would expect them to be the same though. Maybe ask in #clojurescript?

popeye17:05:35

Since map returns a lazy sequence, I think it is better to make use of map. Any thoughts?

yuhan17:05:57

I think the original code should be translated as

(for [row rows]
  (for [name (vals row)]
    logic))
Not sure if that's causing your issues

🙌 1
yuhan17:05:16

for also returns lazy sequences

seancorfield17:05:35

For is also lazy. They are basically identical.

seancorfield17:05:48

@UCPS050BV ah, yes, there is a difference in nesting - my bad.

seancorfield17:05:39

My version produces a single sequence of results. The nested map and nested for versions produce a sequence of sequences. Sorry.

🙌 1
yuhan17:05:03

Yup, having multiple sequences in a for binding vector is equivalent to outer mapcats
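To make the nesting difference concrete, here is a small sketch with made-up data, where inc stands in for the "logic" placeholder and rows is a two-row sample rather than the real data set:

```clojure
;; Sample data for illustration only.
(def rows [{:a 1 :b 2} {:a 3 :b 4}])

;; Nested map: one inner sequence per row.
(def nested-map
  (map (fn [row] (map inc (vals row))) rows))

;; Nested for: the same sequence of sequences.
(def nested-for
  (for [row rows]
    (for [v (vals row)]
      (inc v))))

;; A single for with two bindings: one flat sequence.
(def flat-for
  (for [row rows
        v   (vals row)]
    (inc v)))

;; The flat version written with an outer mapcat.
(def flat-mapcat
  (mapcat (fn [row] (map inc (vals row))) rows))

;; nested-map and nested-for => ((2 3) (4 5))
;; flat-for and flat-mapcat  => (2 3 4 5)
```

All four are lazy, so the choice between them is about the shape of the result rather than speed.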

yuhan17:05:14

I'm not sure if cljs has `*print-length*`, which could help with performance
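For reference, a minimal sketch of what `*print-length*` does in JVM Clojure; whether it helps in the ClojureScript page above depends on whether printing is what realizes the data, which is only a guess here. The var names are illustrative.

```clojure
;; Printing a lazy sequence realizes it; *print-length* caps how many
;; items the printer asks for, so even an infinite range prints safely.
(def first-three
  (binding [*print-length* 3]
    (pr-str (range))))
;; => "(0 1 2 ...)"

;; The cap applies at every nesting level.
(def nested-preview
  (binding [*print-length* 2]
    (pr-str [[1 2 3] [4 5 6] [7 8 9]])))
;; => "[[1 2 ...] [4 5 ...] ...]"
```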