#data-science
2023-09-24
zane17:09:52

For tablecloth.api/select-rows, what are the valid options?

tablecloth.api/select-rows
 [ds]
 [ds rows-selector]
 [ds rows-selector options]
  Select rows using:

  - row id
  - seq of row ids
  - seq of true/false
  - fn with predicate

genmeblog17:09:22

There are four:

  - :select-keys
  - :pre
  - :result-type
  - :parallel?

zane17:09:40

Is this documented somewhere?

zane17:09:44

(Thanks!)

genmeblog17:09:56

Nope, will fix it in the next release.

genmeblog17:09:10

Only :pre is described in the docs.

genmeblog17:09:35

Wait, also :select-keys

genmeblog17:09:42

:parallel? can be set to true to process a grouped dataset in parallel.

genmeblog17:09:47

:result-type can be set to :as-indexes to get the ids of the rows instead of the actual rows.

genmeblog17:09:50

(defonce flights (tc/dataset ""))

(tc/select-rows flights #(> (get % "dep_delay") 1000))
;; =>  [8 11]:
;;    | year | month | day | dep_delay | arr_delay | carrier | origin | dest | air_time | distance | hour |
;;    |-----:|------:|----:|----------:|----------:|---------|--------|------|---------:|---------:|-----:|
;;    | 2014 |     2 |  15 |      1003 |       994 |      DL |    JFK |  DEN |      242 |     1626 |   12 |
;;    | 2014 |     2 |  21 |      1014 |      1007 |      DL |    JFK |  MCO |      139 |      944 |    8 |
;;    | 2014 |     4 |  15 |      1241 |      1223 |      AA |    JFK |  BOS |       39 |      187 |   13 |
;;    | 2014 |     6 |  13 |      1071 |      1064 |      AA |    EWR |  DFW |      175 |     1372 |   10 |
;;    | 2014 |     6 |  16 |      1022 |      1073 |      AA |    EWR |  DFW |      178 |     1372 |    7 |
;;    | 2014 |     7 |  14 |      1087 |      1090 |      DL |    EWR |  ATL |       97 |      746 |    8 |
;;    | 2014 |     9 |  12 |      1056 |      1115 |      AA |    EWR |  DFW |      198 |     1372 |    6 |
;;    | 2014 |    10 |   4 |      1498 |      1494 |      AA |    EWR |  DFW |      200 |     1372 |    7 |

(tc/select-rows flights #(> (get % "dep_delay") 1000) {:result-type :as-indexes})
;; => (32306 37131 82591 131877 134039 158830 211512 230042)

(tc/select-rows flights '(32306 37131 82591 131877 134039 158830 211512 230042))
;; =>  [8 11]:
;;    | year | month | day | dep_delay | arr_delay | carrier | origin | dest | air_time | distance | hour |
;;    |-----:|------:|----:|----------:|----------:|---------|--------|------|---------:|---------:|-----:|
;;    | 2014 |     2 |  15 |      1003 |       994 |      DL |    JFK |  DEN |      242 |     1626 |   12 |
;;    | 2014 |     2 |  21 |      1014 |      1007 |      DL |    JFK |  MCO |      139 |      944 |    8 |
;;    | 2014 |     4 |  15 |      1241 |      1223 |      AA |    JFK |  BOS |       39 |      187 |   13 |
;;    | 2014 |     6 |  13 |      1071 |      1064 |      AA |    EWR |  DFW |      175 |     1372 |   10 |
;;    | 2014 |     6 |  16 |      1022 |      1073 |      AA |    EWR |  DFW |      178 |     1372 |    7 |
;;    | 2014 |     7 |  14 |      1087 |      1090 |      DL |    EWR |  ATL |       97 |      746 |    8 |
;;    | 2014 |     9 |  12 |      1056 |      1115 |      AA |    EWR |  DFW |      198 |     1372 |    6 |
;;    | 2014 |    10 |   4 |      1498 |      1494 |      AA |    EWR |  DFW |      200 |     1372 |    7 |

genmeblog18:09:37

:pre and :parallel? are meaningful only for a grouped dataset. :select-keys can be used to limit the keys passed to the selecting function, which can help when processing a dataset with a lot of columns.

genmeblog20:09:08

good to call @UDRJMEFSN here

chrisn23:09:04

What is the rest of the stack trace?

zane00:09:43

@UDRJMEFSN See the last message in the thread.

chrisn00:09:52

Got it - yep - looks like an issue with TMD - will be in touch