#announcements
2020-06-24
Crispin 05:06:35

Spire 0.1.0-alpha.14 released https://github.com/epiccastle/spire/releases/tag/v0.1.0-alpha.14
Fixed
- download of directory without :recurse prints no error or warning
- download module progress bar malformed
- remove flashing output
- user module does not create user's home directory
- user module does not alter :groups if user already exists
Changed
- rewrote :default output module
- remove download module :flat. module should always be flat downloads
- remove old spire.transfer namespace

💯 15
🎉 6
chrisn 18:06:54

https://github.com/techascent/tech.ml.dataset is now at a full 2.0+ release. It is out of beta. tmd is a data frame library for clojure with efficient cpu and https://gist.github.com/cnuernber/26b88ed259dd1d0dc6ac2aa138eecf37 performance characteristics. We have great support for csv, tsv, xlsx, sequences of maps, parquet, arrow, and nippy, which is extremely fast, especially for large things. We also have https://github.com/techascent/tech.ml, which provides bindings to xgboost (https://xgboost.readthedocs.io/en/latest/jvm/) and Smile (https://haifengl.github.io/). If none of that means anything to you then, put another way, you can load really, really large csv files (millions of rows), work with them interactively, and then save them out to nippy files or postgres. We make that very easy and support most of the normal clojure.core paradigms (filter, sort, group-by) along with a whole host of special things related to dataset processing from pandas or R's data.table. You can also just take a sequence of maps and make a dataset out of them, do some work, and then get a sequence of maps back to do your next thing 🙂. https://github.com/techascent/tech.ml.dataset
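
The "sequence of maps in, sequence of maps out" round trip described above might look roughly like the sketch below. It assumes tech.ml.dataset 2.x is on the classpath and that `filter` and `sort-by` take the dataset as their last argument, as in the 2.x series (later `tech.v3.dataset` releases changed this to dataset-first). The sample rows are invented for illustration.

```clojure
;; Sketch only; assumes techascent/tech.ml.dataset 2.x is on the classpath.
(require '[tech.ml.dataset :as ds])

;; Hypothetical sample rows, invented for illustration.
(def rows [{:lang "clojure" :stars 75}
           {:lang "r"       :stars 27}
           {:lang "python"  :stars 12}])

(def result
  (->> (ds/->dataset rows)              ; sequence of maps -> dataset
       (ds/filter #(> (:stars %) 20))   ; predicate sees a map of column name -> value
       (ds/sort-by :stars)              ; ascending by the :stars column
       (ds/mapseq-reader)))             ; dataset -> sequence of maps again
```

`result` can then be handed to ordinary clojure.core functions like any other sequence of maps.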

👍 75
💯 27
📊 6
💥 12
gunar 18:06:25

thanks for the dumbed down explanation

chrisn 18:06:12

My pleasure - jargon is no fun for any of us 🙂.

martinklepsch 21:06:07

Yeah, love this style of announcement 👏

jsa-aerial 21:06:08

Also, if you like a nice dplyr-like API over tmd, @U1EP3BZ3Q has created a very comprehensive piece of work: https://scicloj.github.io/tablecloth/index.html. Some of this is still evolving with tmd. I can vouch for all of this as being seriously great stuff! Data science in Clojure is making some serious advances.

bananadance 6
respatialized 19:07:54

very exciting to see. I wasn't able to tell from my cursory look at the source code - is sparse support back on the table for this or is that still waiting on upstream changes from tech.datatype?

respatialized 19:07:00

extremely interested in destroying R on its own turf for iterated matrix operations distributed across many cores.

chrisn 20:07:18

I think destroying R in anything that R cares about is going to be damn hard, but you can find the linear algebra libraries that support sparse operations here: https://java-matrix.org/. Given how far along these systems are, I would recommend getting your data into those libs and giving it a shot. Are you working on recommendation systems? Where is the need for sparse coming from?

respatialized 21:07:58

The label propagation algorithms I've worked with often benefit from sparse data (gigantic distance matrices, etc).

seancorfield 18:06:18

seancorfield/next.jdbc {:mvn/version "1.0.478"} -- database access via JDBC -- lots of enhancements since my last #announcements post! The highlights of recent releases include:
- A new namespace next.jdbc.types with functions to "type-hint" values so parameters are set with a specific SQL type -- see https://cljdoc.org/d/seancorfield/next.jdbc/1.0.478/api/next.jdbc.types
- A new function next.jdbc/with-options wraps a connectable with default options so you don't have to provide those options to every single SQL operation on that connectable (a much-requested feature!)
- Official support for the jTDS driver (for SQL Server), for MariaDB, and for PostgreSQL 12.2.0 (previously PostgreSQL 10.11 was the targeted test version)
- A new namespace next.jdbc.datafy to provide datafy/nav over various JDBC data types, making it possible to lazily navigate through the metadata of your database -- see https://cljdoc.org/d/seancorfield/next.jdbc/1.0.478/api/next.jdbc.datafy
- Access to row-number, column-names, and metadata when reducing over the result of plan, as well as Indexed access to column values
- Built-in support for Stuart Sierra's Component library (if it is on the classpath) that provides a connection pooled datasource with the start/stop lifecycle
- Improved support for timeouts (and a whole new section of docs that talk about them)
- Lots of documentation improvements!
Follow-up in #sql
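
As a rough illustration of two of the highlights above (with-options, and row-number while reducing over plan), here is a minimal sketch. It assumes next.jdbc 1.0.478 and the H2 driver (com.h2database/h2) are on the classpath. The in-memory database name, the table, and the rows are hypothetical and not from the announcement. A single connection is held open because an in-memory H2 database is normally discarded when its last connection closes.

```clojure
;; Sketch only; assumes seancorfield/next.jdbc 1.0.478 and com.h2database/h2 on the classpath.
(require '[next.jdbc :as jdbc]
         '[next.jdbc.result-set :as rs])

;; Hypothetical in-memory H2 database, used purely for illustration.
(def ds (jdbc/get-datasource {:dbtype "h2:mem" :dbname "announce_demo"}))

(def results
  (with-open [con (jdbc/get-connection ds)]
    ;; Wrap the connection once; operations on `con-opts` pick up the
    ;; builder without the option being passed on every call.
    (let [con-opts (jdbc/with-options con {:builder-fn rs/as-unqualified-lower-maps})]
      (jdbc/execute! con-opts ["create table lib (id int primary key, name varchar(32))"])
      (jdbc/execute! con-opts ["insert into lib (id, name) values (?, ?), (?, ?)"
                               1 "spire" 2 "tmd"])
      {:rows     (jdbc/execute! con-opts ["select * from lib order by id"])
       ;; While reducing over `plan`, each row can report its row number.
       :numbered (reduce (fn [acc row]
                           (conj acc [(rs/row-number row) (:name row)]))
                         []
                         (jdbc/plan con-opts ["select * from lib order by id"]))})))
```

The wrapper only saves repeating the options; individual calls can still pass their own options map as the last argument.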

thanks2 75
parrot 30
bananadance 3