data-science

luke 2024-12-06T17:12:46.410789Z

Getting started using dtype-next and tech.ml.dataset... really awesome stuff, thank you @chris441! One question I can't find answered in the docs: is there any kind of optimized support for columns whose values are arrays (i.e., the whole column is a tensor)? Or is an object column containing arrays the best approach at this time?

genmeblog 2024-12-06T17:44:07.974049Z

There is a tensor datatype. https://cnuernber.github.io/dtype-next/tech.v3.tensor.html

luke 2024-12-06T17:54:06.514749Z

Yes, but you can't use an instance of it as a column:

Execution error (IllegalArgumentException) at tech.v3.dataset.io.mapseq-colmap/column-map->dataset$fn (mapseq_colmap.clj:122).
No matching clause: :tensor

genmeblog 2024-12-06T17:55:48.956959Z

Indeed, you can't. But you can convert a dataset to a tensor and back.
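
A minimal sketch of the dataset/tensor conversion mentioned above, assuming the `tech.v3.dataset.tensor` namespace with its `tensor->dataset` and `dataset->tensor` functions. The `ds-tens` alias and the `tens-ds` / `round-trip` names are illustrative only, and the exact behavior (column naming, default datatype of the result) should be checked against the installed version:

```clojure
(require '[tech.v3.datatype :as dtype]
         '[tech.v3.tensor :as dtt]
         '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.tensor :as ds-tens]) ; alias is an assumption

;; A 12x3 tensor of longs.
(def tens (dtt/->tensor (partition 3 (range 36)) :datatype :int64))

;; Each tensor column becomes one dataset column,
;; so a [12 3] tensor yields 12 rows and 3 columns.
(def tens-ds (ds-tens/tensor->dataset tens))

(ds/row-count tens-ds)
(ds/column-count tens-ds)

;; And back again: dataset columns become tensor columns.
;; The element datatype of the result may differ from the input.
(def round-trip (ds-tens/dataset->tensor tens-ds))

(dtype/shape round-trip)
```

This spreads the tensor across several scalar columns rather than storing one tensor per row, so it answers a slightly different question than a true tensor-valued column would.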

chrisn 2024-12-06T22:26:26.429259Z

That is a bug - mapseq-colmap should support the tensor argtype 🙂 - for its purposes it should interpret that as a reader.

chrisn 2024-12-06T22:27:27.306509Z

Well - I guess it depends which dimension you want the dataset to expose as the column rows.

chrisn 2024-12-06T22:29:51.498289Z

user> (def tens (dtt/->tensor (partition 3 (range 36))))
#'user/tens
user> tens
#tech.v3.tensor<object>[12 3]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]
 [15 16 17]
 [18 19 20]
 [21 22 23]
 [24 25 26]
 [27 28 29]
 [30 31 32]
 [33 34 35]]
user> (ds/->dataset {:a tens})
_unnamed [1 1]:

|                            :a |
|-------------------------------|
| #tech.v3.tensor<object>[12 3] |
| [[ 0  1  2]                   |
|  [ 3  4  5]                   |
|  [ 6  7  8]                   |
|  [ 9 10 11]                   |
|  [12 13 14]                   |
|  [15 16 17]                   |
|  [18 19 20]                   |
|  [21 22 23]                   |
|  [24 25 26]                   |
|  [27 28 29]                   |
|  [30 31 32]                   |
|  [33 34 35]]                  |
user> (ds/->dataset {:a (dtt/rows tens)})
_unnamed [12 1]:

|                         :a |
|----------------------------|
| #tech.v3.tensor<object>[3] |
| [0 1 2]                    |
| #tech.v3.tensor<object>[3] |
| [3 4 5]                    |
| #tech.v3.tensor<object>[3] |
| [6 7 8]                    |
| #tech.v3.tensor<object>[3] |
| [9 10 11]                  |
| #tech.v3.tensor<object>[3] |
| [12 13 14]                 |
| #tech.v3.tensor<object>[3] |
| [15 16 17]                 |
| #tech.v3.tensor<object>[3] |
| [18 19 20]                 |
| #tech.v3.tensor<object>[3] |
| [21 22 23]                 |
| #tech.v3.tensor<object>[3] |
| [24 25 26]                 |
| #tech.v3.tensor<object>[3] |
| [27 28 29]                 |
| #tech.v3.tensor<object>[3] |
| [30 31 32]                 |
| #tech.v3.tensor<object>[3] |
| [33 34 35]                 |
user> (ds/->dataset {:a (dtt/columns tens)})
_unnamed [3 1]:

|                                 :a |
|------------------------------------|
| #tech.v3.tensor<object>[12]        |
| [0 3 6 9 12 15 18 21 24 27 30 33]  |
| #tech.v3.tensor<object>[12]        |
| [1 4 7 10 13 16 19 22 25 28 31 34] |
| #tech.v3.tensor<object>[12]        |
| [2 5 8 11 14 17 20 23 26 29 32 35] |
user> 

luke 2024-12-07T03:56:13.100969Z

When you say it's a bug, do you mean it'd be helpful for me to put together a minimal test case and submit a report (and possibly try to debug it myself)? Or do you have a pretty good idea of what's going on? (Using dtt/rows is a good workaround though, at least semantically... I don't have a big enough brain to reason about whether there are perf gains to be had from using a true tensor.)

2024-12-10T14:09:25.991549Z

Creating an issue here that points to this thread might already help: https://github.com/techascent/tech.ml.dataset/issues

respatialized 2024-12-06T17:29:41.681699Z

data analysis and visualization in this report brought to you by tablecloth, vega-lite, and clerk: https://hellgatenyc.com/nypd-shotspotter-data-report/

🖤 3
Nick McAvoy 2024-12-06T18:56:51.913579Z

Do you have a way to unlock the article so we can read it?

✔️ 1
respatialized 2024-12-06T19:16:22.508389Z

https://bds.org/assets/files/Brooklyn-Defenders-ShotSpotter-Report.pdf

❤️ 1
Nick McAvoy 2024-12-06T19:18:32.548859Z

Thanks, and great work! Loved your Conj talk.

🙏🏻 1
Harold 2024-12-08T16:53:11.681679Z

NICE!