#aws
2023-05-10
Audy 20:05:59

I’m getting this error and having trouble authenticating to access DynamoDB 🧵

{:__type com.amazon.coral.service#UnrecognizedClientException, :message The security token included in the request is invalid., :cognitect.anomalies/category :cognitect.anomalies/incorrect}

Audy 20:05:22

I have a role with STS and DynamoDB permissions, as well as an AWS_WEB_IDENTITY_TOKEN_FILE. I’m trying to use those to fetch credentials.

Audy 20:05:28

This is deployed in k8s on EKS, if that matters.
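
For context, a minimal sketch that is not from the thread: with IAM Roles for Service Accounts (IRSA) on EKS, the pod receives AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE as environment variables. The hypothetical irsa-config below reads both from any map (so it can be exercised without a pod) and fails fast at startup if either is missing. It keeps the token file's path rather than its contents, since the file can change on disk while the pod runs.

```clojure
(ns example.irsa)

(defn irsa-config
  "Reads the IRSA settings from env, a map of variable name to value
   (pass (System/getenv) in production). Throws if either is missing."
  [env]
  (let [role-arn   (get env "AWS_ROLE_ARN")
        token-file (get env "AWS_WEB_IDENTITY_TOKEN_FILE")]
    (when (or (nil? role-arn) (nil? token-file))
      (throw (ex-info "IRSA environment variables are not set"
                      {:AWS_ROLE_ARN                role-arn
                       :AWS_WEB_IDENTITY_TOKEN_FILE token-file})))
    ;; keep the path, not the contents: the token file is rewritten over time
    {:role-arn role-arn :token-file token-file}))
```

In the pod this would be called as (irsa-config (System/getenv)); Clojure's get works directly on the java.util.Map that System/getenv returns.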

Audy 20:05:33

Here is some code that shows how it’s fetching credentials:

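(ns example.credentials
  ;; namespace name is illustrative; these are the requires/imports the code below uses
  (:require [cognitect.aws.client.api :as aws]
            [cognitect.aws.credentials :as credentials])
  (:import (com.amazonaws.auth AWSCredentials DefaultAWSCredentialsProviderChain)))

(defn- default-credentials-provider []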
  (let [provider          (DefaultAWSCredentialsProviderChain.)
        credentials       ^AWSCredentials (.getCredentials provider)
        access-key-id     (.getAWSAccessKeyId credentials)
        secret-access-key (.getAWSSecretKey credentials)]
    (credentials/basic-credentials-provider
     {:access-key-id     access-key-id
      :secret-access-key secret-access-key})))

(defn assumed-role-credentials-provider
  "make a credentials provider that can assume a role"
  [role-arn web-identity-token]
  (let [sts (aws/client {:api :sts
                         :credentials-provider (default-credentials-provider)})]
    (credentials/cached-credentials-with-auto-refresh
     (reify credentials/CredentialsProvider
       (fetch [_]
         (when-let [creds (:Credentials
                           (aws/invoke sts
                                       {:op      :AssumeRoleWithWebIdentity
                                        :request {:RoleArn          role-arn
                                                  :WebIdentityToken web-identity-token
                                                  :RoleSessionName  (str (gensym "some-session-"))}}))]
           {:aws/access-key-id     (:AccessKeyId creds)
            :aws/secret-access-key (:SecretAccessKey creds)
            :aws/session-token     (:SessionToken creds)
            ::credentials/ttl      (credentials/calculate-ttl creds)}))))))

(defn create-provider
  []
  (assumed-role-credentials-provider (System/getenv "AWS_ROLE_ARN")
                                     (slurp (System/getenv "AWS_WEB_IDENTITY_TOKEN_FILE"))))

Audy 20:05:45

This is very similar to the assume_role_example in the docs, https://github.com/cognitect-labs/aws-api/blob/main/examples/assume_role_example.clj, but I’m going the AssumeRoleWithWebIdentity route.

Audy 20:05:34

Then, when trying to access DynamoDB:

(def dynamodb-client
  (aws/client {:api :dynamodb
               :region "us-east-1"
               :credentials-provider (create-provider)}))

Audy 20:05:43

with a simple ListTables call:

(defn list-tables
  []
  (aws/invoke dynamodb-client {:op :ListTables}))

I get the error in the original post. It seems like it fetched credentials, but the token is invalid. I’ve also seen the error "Unable to fetch credentials. See log for more details.", so this is at least a step past that.

Audy 00:05:32

This may not be a Clojure issue but rather a deployment issue, where there may be a disconnect or something that is preventing me from authenticating to AWS ¯\_(ツ)_/¯ but if someone has experienced this, let me know.

Audy 02:05:37

It was a dependency issue picard-facepalm: aws-api needs a different version of data.xml than the one my project pinned.

dchelimsky 16:05:29

Which version of aws-api, and which version of data.xml?

Audy 16:05:43

I am using [com.cognitect.aws/api "0.8.656"] and had [org.clojure/data.xml "0.0.8"], but I just removed the data.xml dependency, which I wasn’t really using directly.

dchelimsky 22:05:24

So you're all set at this point?

Audy 15:05:29

Yes, I am able to access DynamoDB 👍

Audy 19:05:40

Now that I’m authenticating, I’m having trouble with one of the operations, TransactGetItems:

(def c (aws/client {:api :dynamodb
                    :region "us-east-1"}))

(aws/invoke c
            {:op :TransactGetItems
             :request {:TransactItems [{:Get {:TableName "my-table"
                                              :Key {:pk {:S "some-id"}}}}]}})

I’m pretty sure I have the shape of the request correct, but I’m still getting this error. pk is the name of the attribute for my partition key, so that :Key value seems right:

{:__type "com.amazonaws.dynamodb.v20120810#TransactionCanceledException", :CancellationReasons [{:Code "ValidationError", :Message "The provided key element does not match the schema"}], :Message "Transaction cancelled, please refer cancellation reasons for specific reasons [ValidationError]", :cognitect.anomalies/category :cognitect.anomalies/incorrect}

Audy 23:05:34

Looks like with this op you need the sk (sort key) as well… I think what I want here is BatchGetItem, so disregard the above.
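
For anyone landing here with the same ValidationError: when a table's key schema has both a partition key and a sort key, GetItem, the Get inside TransactGetItems, and BatchGetItem all need the complete primary key for each item. To fetch everything under one partition key without knowing the sort key, Query with a KeyConditionExpression is the usual tool. A sketch of the request maps, assuming the attribute names pk and sk from this thread and string-typed keys; the helper names are hypothetical.

```clojure
(ns example.dynamodb-requests)

(defn primary-key
  "Full primary key for a table keyed on pk (partition) + sk (sort), both strings."
  [pk sk]
  {:pk {:S pk}
   :sk {:S sk}})

(defn transact-get-request
  "TransactGetItems request for a single item; :Key includes the sort key."
  [table pk sk]
  {:op      :TransactGetItems
   :request {:TransactItems [{:Get {:TableName table
                                    :Key       (primary-key pk sk)}}]}})

(defn query-partition-request
  "Query request for every item sharing one partition key; no sort key needed."
  [table pk]
  {:op      :Query
   :request {:TableName                 table
             :KeyConditionExpression    "#pk = :pk"
             ;; placeholder names sidestep any clash with DynamoDB reserved words
             :ExpressionAttributeNames  {"#pk" "pk"}
             :ExpressionAttributeValues {":pk" {:S pk}}}})
```

Either map would be passed straight to aws/invoke, e.g. (aws/invoke c (query-partition-request "my-table" "some-id")). Query results are paginated, so a large partition needs follow-up calls using the returned :LastEvaluatedKey.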

Audy 22:07:09

While I got this to work on EKS, I’m now seeing "The security token included in the request is invalid" after the pods have been running for ~24h. Maybe the way I’m building the credentials provider is incorrect? What can cause this to stop working after some time? I’ve verified that the AWS_WEB_IDENTITY_TOKEN_FILE got refreshed and has a new token…

gordon 17:07:54

Take a look at this gist to see if it's helpful to you, I wrote it a couple years ago to solve a similar problem: https://gist.github.com/gws/130ad8bfec5495c25c3dbc0ed2a69d42

Audy 04:07:55

Hey gordon, thanks for the reply! I will definitely take a look. What is baffling is that after the pods had been running for 2d7h, I hit my API expecting it to fail with the error, but it didn’t… I’m scratching my head over here thinking-face

Audy 18:07:35

I took a look at the gist and I’m basically doing the same thing, except I’m explicitly calling the AssumeRoleWithWebIdentity API instead of going through chain-credentials-provider. Would it be beneficial to go through the different credentials provider options, knowing that the web token refreshes correctly and is a valid way of obtaining temporary creds to access AWS services? I can definitely try that, as it might be my last option left. It did fail again btw 😢 It’s just odd that it only fails after running for a long time, which tells me something is caching and not refreshing.

Audy 18:07:37

The only thing I can think of is that cached-credentials-with-auto-refresh may not be refreshing?

gordon 19:07:23

I'm pretty sure you're pulling the token out of AWS_WEB_IDENTITY_TOKEN_FILE once, when you construct the provider

gordon 19:07:33

You need to be slurping that again on each refresh

gordon 19:07:27

Note the location of slurp in the gist I linked

Audy 14:08:12

Hmm ok, interesting. I’ll change it up similarly to how it’s done in the gist and hope this one works. 🤞

scottbale 16:08:27

Sorry to be late to this thread. @Audy did @gordon’s suggestion help? (Thanks @gordon!)

Audy 16:08:07

Hey scott, thanks for replying. So far, so good. My pods have been running for ~36h and are still accessing AWS services. The real test will be tomorrow, to see if it’s still working. I’m curious, though, why the placement of slurp in gordon’s code, versus the code snippet I posted, makes a difference.

scottbale 17:08:51

@Audy
> I’ve verified that the AWS_WEB_IDENTITY_TOKEN_FILE got refreshed and has a new token…

So the key thing, then, is that the AWS_WEB_IDENTITY_TOKEN_FILE file has been updated. But your program snippet only slurps the AWS_WEB_IDENTITY_TOKEN_FILE once, at the start, whereas @gordon’s example re-slurps it every time the periodic refresh triggers a new fetch of credentials from the STS service. That fetch of a new temporary credential from STS needs the up-to-date web identity token.

Audy 18:08:31

Yeah, that makes sense. I would’ve thought that the slurp in the top-level function, when it gets called, would suffice. For example, when accessing a service like Dynamo, creating the client calls the create-provider fn, where the AWS_WEB_IDENTITY_TOKEN_FILE gets slurped:

(defn create-provider
  []
  (assumed-role-credentials-provider (System/getenv "AWS_ROLE_ARN")
                                     (slurp (System/getenv "AWS_WEB_IDENTITY_TOKEN_FILE"))))

(def dynamodb-client
 (aws/client {:api :dynamodb
              :region "us-east-1"
              :credentials-provider (create-provider)}))

Audy 18:08:44

then the token gets passed down to the :AssumeRoleWithWebIdentity operation

scottbale 19:08:33

It may be that your DynamoDB client just hasn't happened to fail yet; I think the same situation applies. The slurp brings a copy of the AWS_WEB_IDENTITY_TOKEN_FILE into memory, like a snapshot of the file at that moment. But if that file is later updated (and I'm assuming that's happening through some mechanism outside the scope of the program you've posted), then the in-memory value is out of date and another slurp needs to happen.
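
The snapshot behaviour can be seen with nothing but clojure.core and a temp file. In this minimal sketch (function names are hypothetical), snapshot-reader slurps once when it is called, the way the original create-provider does, while fresh-reader slurps on every call, the way a slurp inside fetch would.

```clojure
(ns example.snapshot)

(defn snapshot-reader
  "Reads file once, right now, and returns a fn that always yields
   that same string, like passing (slurp ...) into create-provider."
  [file]
  (let [contents (slurp file)]
    (fn [] contents)))

(defn fresh-reader
  "Returns a fn that re-reads file on every call, so it always sees
   the current contents, like calling slurp inside fetch."
  [file]
  (fn [] (slurp file)))
```

If the file is rewritten after both readers are created, (snap) still returns the old token while (fresh) returns the new one.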

Audy 20:08:35

I guess I was assuming that every request that comes in would call create-provider and re-slurp the token file, regardless of where slurp stores the token. It seems that’s not the case if @gordon’s code still works tomorrow. So in a way, having the slurp in the entry function that creates a provider is almost like caching the token for all incoming requests when the app initializes, while having the slurp in the low-level operation that fetches creds will actually re-read the token file…

Audy 20:08:00

What is also weird is that in the early stages of developing this application, I had a delay in the clients:

(def dynamodb-client
 (delay
   (aws/client {:api :dynamodb
                :region "us-east-1"
                :credentials-provider (create-provider)})))

And I noticed that it would fail somewhere between 48h and 55h, which tells me that it slurped the new token.

Audy 20:08:25

and the invocation/operation would look like:

(defn query
  []
  (aws/invoke @dynamodb-client
              {:op      :Query
               :request {…}}))

I had it working and kept developing the application, redeploying with each new change, and it wasn’t until I had a stable version that the security token issue arose.

scottbale 20:08:42

> I guess I’m assuming that every request that comes in, would call the create-provider ...

The create-provider function is only called once, at the time the client is created. That function returns an object that reifies the CredentialsProvider protocol, and it's the fetch function of that protocol that gets invoked with every request.

scottbale 20:08:45

In other words, the credential provider is an object that repeatedly fetches new credentials...but the provider itself was created once and initialized with data (the role-arn and the web-identity-token) whose values do not change (in the current implementation).

scottbale 20:08:56

The workaround that @gordon suggested is: instead of passing around a web-identity-token value, pass around a java.io.File reference to the AWS_WEB_IDENTITY_TOKEN_FILE and re-read that file (via slurp) whenever fetch is invoked, so your program always has the most recent contents of that file.
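
A minimal sketch of that workaround, adapted from the original snippet rather than copied from gordon's gist: the provider holds the token file, and the slurp moves inside fetch (via the hypothetical web-identity-request helper), so every refresh sends the current token to STS. It assumes aws-api and its STS service artifact are on the classpath; the STS client is passed in so it can be built the same way as before, and the namespace and function names are illustrative.

```clojure
(ns example.web-identity
  (:require [clojure.java.io :as io]
            [cognitect.aws.client.api :as aws]
            [cognitect.aws.credentials :as credentials]))

(defn web-identity-request
  "Builds the AssumeRoleWithWebIdentity invoke map. token-file is
   re-read on every call, so a rotated token is always picked up."
  [role-arn token-file]
  {:op      :AssumeRoleWithWebIdentity
   :request {:RoleArn          role-arn
             :WebIdentityToken (slurp token-file)
             :RoleSessionName  (str (gensym "some-session-"))}})

(defn web-identity-credentials-provider
  "sts is an aws-api STS client, built however it is built today."
  [sts role-arn token-file]
  (credentials/cached-credentials-with-auto-refresh
   (reify credentials/CredentialsProvider
     (fetch [_]
       ;; runs on every refresh, so the token file is read each time
       (when-let [creds (:Credentials
                         (aws/invoke sts (web-identity-request role-arn token-file)))]
         {:aws/access-key-id     (:AccessKeyId creds)
          :aws/secret-access-key (:SecretAccessKey creds)
          :aws/session-token     (:SessionToken creds)
          ::credentials/ttl      (credentials/calculate-ttl creds)})))))

(defn create-provider
  [sts]
  (web-identity-credentials-provider
   sts
   (System/getenv "AWS_ROLE_ARN")
   ;; pass the File itself, not its contents
   (io/file (System/getenv "AWS_WEB_IDENTITY_TOKEN_FILE"))))
```

The DynamoDB client would then be created as (aws/client {:api :dynamodb :region "us-east-1" :credentials-provider (create-provider sts)}). The only real change is where the read happens: values captured at construction stay fixed for the provider's lifetime, while anything inside fetch runs again on each refresh.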

Audy 20:08:52

Gotcha! That makes total sense! Man, I’ve been banging my head trying to figure out where it was being “cached”, because that’s what it seemed like it was doing.

scottbale 20:08:55

I think the use of delay would only postpone the problem, the problem being that sooner or later a fetch is attempted with a web identity token value that no longer matches what's in the AWS_WEB_IDENTITY_TOKEN_FILE file.
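
A small illustration of that point using only clojure.core (the function name is hypothetical): a delay runs its body once, on the first deref, and caches the result. That moves the slurp from startup to the first request, which may explain why the delayed version lasted longer, but every deref after that still returns the same token.

```clojure
(ns example.delay-demo)

(defn lazy-token
  "Wraps the read in a delay: nothing is read until the first deref,
   and every later deref returns that first result."
  [file]
  (delay (slurp file)))
```

A token rotated before the first deref is picked up; one rotated after it is not.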

Audy 17:08:51

Just want to update and maybe close the loop on this: my pods have been running for 2d13h, and everything is nice and healthy. Thank you @gordon and @scottbale for chiming in on this issue.
