#data-science
2023-09-12
respatialized14:09:54

I am wondering about custom indexing in tech.ml.dataset. The repo history has a PR (https://github.com/techascent/tech.ml.dataset/pull/214) that references the ability to add custom indices (including my use case, spatial indices) to a column, but as far as I can tell the functions implementing those capabilities were removed (https://github.com/techascent/tech.ml.dataset/commit/bd47ec2f10e1f92e45a48d0e93b5d21c6d6f63bb) when the project migrated over to HAMF. Is there a new way of using custom indices for a column?

genmeblog14:09:33

I think the approach I described here (https://github.com/techascent/tech.ml.dataset/pull/214#issuecomment-804790512) is still valid. You can create an external index and use it to select data. For spatial data, an STRtree structure may be helpful.
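
To make the "external index" idea concrete, here is a minimal, library-agnostic sketch in plain Python. It is not the tech.ml.dataset API and not the code from the linked comment: the table is a hypothetical dict of columns, and a uniform grid stands in for an STRtree purely to keep the example self-contained. All names (`build_grid_index`, `query_rows`, `select_rows`, `CELL`) are made up for illustration. The point is only the shape of the pattern: the index lives outside the table, a query returns row numbers, and those row numbers select the data.

```python
from collections import defaultdict
from math import floor

# Hypothetical column-oriented table: every column has one value per row.
table = {
    "name": ["a", "b", "c", "d"],
    "x": [0.5, 1.5, 5.2, 5.9],
    "y": [0.5, 0.7, 5.1, 4.8],
}

CELL = 1.0  # grid cell size, an arbitrary choice for this sketch


def build_grid_index(xs, ys, cell=CELL):
    """Map each grid cell to the row numbers of the points that fall in it."""
    index = defaultdict(list)
    for row, (x, y) in enumerate(zip(xs, ys)):
        index[(floor(x / cell), floor(y / cell))].append(row)
    return index


def query_rows(index, xs, ys, xmin, ymin, xmax, ymax, cell=CELL):
    """Return row numbers of points inside the bounding box, using the index."""
    rows = []
    for cx in range(floor(xmin / cell), floor(xmax / cell) + 1):
        for cy in range(floor(ymin / cell), floor(ymax / cell) + 1):
            for row in index.get((cx, cy), []):
                # Cells only narrow the candidates; check the exact bounds too.
                if xmin <= xs[row] <= xmax and ymin <= ys[row] <= ymax:
                    rows.append(row)
    return sorted(rows)


def select_rows(tbl, rows):
    """Pick the given row numbers out of every column."""
    return {col: [vals[r] for r in rows] for col, vals in tbl.items()}


# The index is built once, kept next to the table, and reused across queries.
index = build_grid_index(table["x"], table["y"])
hits = query_rows(index, table["x"], table["y"], 5.0, 4.5, 6.0, 5.5)
nearby = select_rows(table, hits)
```

Because the index holds only row numbers, it has to be rebuilt (or updated) whenever the table is filtered or reordered; that is the main cost of keeping it external.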

respatialized14:09:19

Yes, it seems like the approach you describe in this gist (https://gist.github.com/genmeblog/18d6ed84224cbbef656adac4d85cc7ec) can still work. It seems much better than what I was doing before, which was storing the index object itself as Column metadata.

👍 1
respatialized17:09:31

I guess the fact that joins aren't there is a useful exercise for the reader (me) 🙂

genmeblog17:09:07

Do you need joins on indexes?

respatialized17:09:58

For stuff like point-in-polygon joins between point and areal data, yeah
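
As a rough illustration of what a point-in-polygon join involves, the sketch below pairs each point with every area that contains it. It is plain Python with hypothetical data and function names, not anything provided by tech.ml.dataset. It uses a ray-casting containment test with a bounding-box pre-filter; with real data, a spatial index such as an STRtree would presumably replace the inner loop over all areas.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, not closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(ring):
    """Bounding box of a ring as (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*ring)
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon_join(points, areas):
    """Inner join: one (point id, area name) pair per containing area."""
    boxes = {name: bbox(ring) for name, ring in areas.items()}
    joined = []
    for pid, (x, y) in points.items():
        for name, ring in areas.items():
            xmin, ymin, xmax, ymax = boxes[name]
            # Cheap bounding-box filter before the exact containment test.
            if xmin <= x <= xmax and ymin <= y <= ymax and point_in_polygon(x, y, ring):
                joined.append((pid, name))
    return joined


# Hypothetical point and areal data.
points = {"p1": (1.0, 1.0), "p2": (5.0, 1.0), "p3": (9.0, 9.0)}
areas = {
    "west": [(0, 0), (3, 0), (3, 3), (0, 3)],
    "east": [(4, 0), (7, 0), (7, 3), (4, 3)],
}
pairs = point_in_polygon_join(points, areas)
```

Points that fall in no area (`p3` here) are dropped, which matches inner-join semantics; a left join would keep them with an empty area instead. Points lying exactly on a boundary are not handled consistently by this simple test.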

respatialized17:09:36

One could also imagine a version of asof-join based on distance, etc

👍 1
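
One possible reading of a distance-based asof-join, sketched in plain Python: for each row on the left, take the single closest row on the right, but only if it lies within a tolerance. The function name `nearest_join`, the `max_distance` parameter, and the data are all hypothetical, and the brute-force scan is a stand-in for a nearest-neighbour query against a spatial index.

```python
from math import hypot


def nearest_join(left, right, max_distance):
    """Pair each left point with its closest right point within max_distance, else None."""
    joined = []
    for lid, (lx, ly) in left.items():
        best_id, best_d = None, max_distance
        for rid, (rx, ry) in right.items():
            d = hypot(rx - lx, ry - ly)
            # Keep the closest candidate seen so far that is within tolerance.
            if d <= best_d:
                best_id, best_d = rid, d
        joined.append((lid, best_id))
    return joined


# Hypothetical data: two sensors matched against two stations.
sensors = {"s1": (0.0, 0.0), "s2": (10.0, 10.0)}
stations = {"a": (0.0, 1.0), "b": (3.0, 4.0)}
matches = nearest_join(sensors, stations, max_distance=2.0)
```

This mirrors the way a time-based asof-join matches on the nearest key within a tolerance rather than on exact equality, with Euclidean distance taking the place of the time difference.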