#announcements
2020-07-15
chrisn15:07:34

We have released version 3.06 of https://github.com/techascent/tech.ml.dataset. tech.ml.dataset is a system for dataset processing similar to Python's pandas or R's data.table, where we aim to combine the performance and concision of DataFrame processing with functional paradigms and deep integration into the Clojure language. DataFrame processing is jargon for operating on rectangles of data such as an SQL table or a CSV or XLSX file. We include those pathways but extend the concept to encompass our favorite Clojure datastructure, a sequence of maps. We store that data columnwise rather than as rowwise records, which leads to substantial memory savings and some different performance tradeoffs.
The focus of these past few releases (and the reason for bumping to 3.X) is that we are carefully integrating the core types deeper into the Clojure language. The base dataset object now implements `clojure.lang.IPersistentMap` and `java.util.Map`, which means that clojure.core functions meant to work on maps now also work on datasets: keys, vals, get, assoc, dissoc, and destructuring are all supported. If you destructure a dataset you get back the columns. Columns themselves efficiently implement `clojure.lang.Indexed`, which means `nth` works well on them, and they also implement `clojure.lang.IFn`, so they are functions of their indexes.
Further notes:
• We have a few timeseries-specific functions: `left-join-asof`, which is a take on Pandas' `merge-asof`, and `fill-range-replace`, which is a take on Pandas' `reindex`.
• The tablecloth library (https://github.com/scicloj/tablecloth) has come a long way. We really suggest you check it out if you want a very high level interface to working with data.
• Nippy serialization (https://github.com/techascent/tech.ml.dataset/blob/master/docs/nippy-serialization-rocks.md) means you can efficiently load/save not only datasets themselves but also heterogeneous datastructures that may have datasets in leaf nodes.
• Documentation is now up on https://cljdoc.org/d/techascent/tech.ml.dataset/CURRENT/doc/readme thanks to some very much appreciated prodding from @metasoarous.
• We have an API quick reference (https://github.com/techascent/tech.ml.dataset/blob/master/docs/quick-reference.md) for operations we found to be very commonly used.
Enjoy!
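
To make the columnwise idea and the map-like behaviour concrete, here is a minimal sketch in plain Clojure. It does not use tech.ml.dataset at all: a plain map of column name to vector stands in for a dataset, and `rows->columns`, the `:id`/`:value` keys, and the sample values are all made up for illustration. The real library stores columns far more compactly than Clojure vectors.

```clojure
;; Illustration only: plain maps and vectors, NOT tech.ml.dataset itself.
;; `rows`, `rows->columns` and the sample data are hypothetical.
(def rows
  [{:id 1 :value 10}
   {:id 2 :value 20}
   {:id 3 :value 30}])

(defn rows->columns
  "Turns a sequence of maps (rowwise) into a map of
  column name -> vector of values (columnwise).
  Assumes every row has the same keys as the first row."
  [rows]
  (let [col-names (keys (first rows))]
    (into {}
          (map (fn [k] [k (mapv #(get % k) rows)]))
          col-names)))

(def columns (rows->columns rows))

;; Map functions operate on whole columns.
(get columns :value)                  ; => [10 20 30]
(keys (assoc columns :flag [true false true]))

;; Destructuring hands back columns, not rows.
(let [{:keys [id value]} columns]
  (mapv + id value))                  ; => [11 22 33]

;; Vectors are Indexed and are functions of their indexes,
;; which mirrors what the announcement says about columns.
(nth (:value columns) 1)              ; => 20
((:value columns) 1)                  ; => 20
```

With the real library the same clojure.core calls are, per the announcement, meant to work directly on a dataset object.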

👍 54
9
🤯 18
👏 12
🎉 9
🚀 3
cljdoc 3
metasoarous16:07:05

Kick ass! Thanks for the announcement, and all your work on this @chrisn!

genmeblog19:07:06

I've just updated tablecloth to catch up with tech.ml.dataset

djanus18:07:06

http://Soup.io, a repository for lolcontent, will be discontinued in a few days. I took the opportunity to use Clojure to download local copies of my friends' soups. Here's the resulting tool: https://github.com/nathell/soupscraper

erwinrooijakkers19:07:15

For those who use Logback and want to scrub sensitive data from logs, there is https://github.com/mediquest-nl/logback-masking-pattern-layouts. It also includes an encoder that masks logs written to stdout in GCP Stackdriver format. It is used from Logback's XML configuration, with default regexes (for example for passwords and names) or custom ones:

<layout class="nl.mediquest.logback.MaskingPatternLayout">
  <pattern>%.-250msg</pattern>
  <useDefaultMediquestReplacements>true</useDefaultMediquestReplacements>
  <regex>some-xml-encoded-regex</regex>
  <replacement>a-replacement</replacement>
  <regex>some-xml-encoded-regex</regex>
  <replacement>a-replacement</replacement>
</layout>
MaskingPatternLayout is created using gen-class's :extends, :exposes-methods and :methods 🙂 so that at the end of the ceremony the logged message and regexes can be fed to string/replace.
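
The final step described above, feeding the logged message and the regexes to string/replace, might look roughly like the sketch below. This is not the library's actual code: `mask`, `replacements`, and the two example patterns are hypothetical, and the real layout reads its regex/replacement pairs from the Logback XML configuration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical regex/replacement pairs, standing in for the
;; <regex>/<replacement> elements of the XML configuration.
(def replacements
  [[#"(?i)(password=)\S+" "$1****"]
   [#"\b\d{4}-\d{4}-\d{4}-\d{4}\b" "<card>"]])

(defn mask
  "Applies each [regex replacement] pair to the message in order."
  [message replacements]
  (reduce (fn [msg [re replacement]]
            (str/replace msg re replacement))
          message
          replacements))

(mask "login user=jan password=hunter2" replacements)
;; => "login user=jan password=****"
```

Applying the pairs with `reduce` keeps the order of the configured replacements, so a later pattern sees the output of the earlier ones.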

Huahai20:07:05

I ported Datascript to LMDB. Give Datalevin a try: https://github.com/juji-io/datalevin

👀 39
🎉 39
metal 3
👏 9
David Pham04:07:45

Any examples apart from the tests?

Huahai18:07:31

README now contains examples, and API doc is linked as well

🎉 3