#data-science
2022-08-30
Kamuela 11:08:35

When using tablecloth, can you select columns by index rather than by name? Something like (tc/select-columns #{1 2 3 4 5} ds)

genmeblog 11:08:31

I don't think so. A dataset is a hashmap (column name -> column).

Kamuela 11:08:04

I found this in the docs, and note the "not recommended":
> Select one column using an index (not recommended)
> (nth (tc/columns DS :as-seq) 2) ;; as column (iterable)

genmeblog 11:08:19

Certain operations change the order of columns.

genmeblog 11:08:44

I need to verify in the source code though

genmeblog 13:08:12

OK, columns are stored in a list. So when you call (tc/columns ds :as-seq) you get a java.util.List and you can use nth on it. I cannot guarantee that the column order is preserved after a series of operations. The API provides selection by name only.

jsa-aerial 19:08:43

I'm unclear as to why you would want to do this, but an obvious trick would be to have your own slim wrapper function over tc/select-columns for this plus a map of int->colname. This would work even if order is not preserved after some set of operations.
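
A minimal sketch of the wrapper idea above, assuming tablecloth is on the classpath. The names index->name and select-columns-by-index are hypothetical helpers, not part of tablecloth. The int->colname map is captured once, so later lookups do not depend on the current column order.

```clojure
(require '[tablecloth.api :as tc])

;; Hypothetical helper: snapshot of position -> column name,
;; taken from the dataset as it is at the time of the call.
(defn index->name
  [ds]
  (zipmap (range) (tc/column-names ds)))

;; Hypothetical helper: translate positions to names through the snapshot,
;; then defer to the by-name tc/select-columns. Unknown positions are skipped.
(defn select-columns-by-index
  [ds idx->name idxs]
  (tc/select-columns ds (vec (keep idx->name idxs))))

;; Small example dataset, for illustration only.
(def ds (tc/dataset {:a [1 2] :b [3 4] :c [5 6]}))

;; Capture the mapping up front, before any operation that might reorder columns.
(def positions (index->name ds))

;; Select the first and third columns by their original positions.
(def selected (select-columns-by-index ds positions [0 2]))
```

Since the selection itself still goes through names, this stays within what the API supports.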

Kamuela 20:08:22

@U06C63VL4 it's a situation where the tabular data coming in doesn't have a header row, so I'm trying to figure out the best way to add one with tc

jsa-aerial 20:08:15

There are two ways to handle that:
1. Go into the data and add a column header row, assuming this is csv/tsv.
2. Specify the :header-row? false option. This will cause each column to get a name of the form column-x, where x is an integer. So, column-1, column-2, etc. You can then use tc/rename-columns to rename these to whatever you want.
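
A rough sketch of the second option, assuming tablecloth is on the classpath. The file name, its contents, and the new column names are made up for illustration. The rename map is built from whatever names the loader generated, so the sketch does not rely on their exact spelling.

```clojure
(require '[tablecloth.api :as tc])

;; Hypothetical headerless input, written out so the example is self-contained.
(spit "no-header.csv" "1,alice,9.5\n2,bob,7.0\n")

;; Treat the first row as data; column names are generated automatically.
(def raw (tc/dataset "no-header.csv" {:header-row? false}))

;; Hypothetical names to assign, in column order.
(def new-names [:id :name :score])

;; Map each generated name to its replacement and rename in one call.
(def ds
  (tc/rename-columns raw (zipmap (tc/column-names raw) new-names)))
```

Calling (tc/column-names raw) first is also a quick way to see which names were actually generated.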

Kamuela 19:08:28

Whoa, that second option is amazing!