FWIW, Tidyverse is fine, but not necessary for R.
Help me out here, please: from my dated understanding of the R ecosystem, that's where the tidyverse came up and lives, isn't it?
Could you rephrase the question? The Tidyverse, written by the amazing and prolific Hadley Wickham, is a set of R packages that enhances the functionality of base R for data processing. It includes ggplot2, lubridate, dplyr and quite a few other packages. IMHO, ggplot2 (which actually predates the notion of a tidyverse) is the most useful and offers in-depth, composable plotting capabilities. The original concept of a data frame comes from base R, but some users found it confusing to use, so Hadley wrote several new packages oriented around data analysis for tables (tibbles) of data, with an easier (i.e. more verbose) syntax that has become popular and is now part of the "tidyverse". I myself prefer another library named data.table, which is more similar in syntax to base R and also more performant, though less verbose. Like Clojure, the syntax of data.table has a learning curve, but your code will be the same regardless of package versions. Tidyverse functions, on the other hand, are great but often less performant and may require syntax tweaks across versions over time.
Ah, now I understand your message. I read it as "it's not necessary for R" as if tidyverse was available (as-is) in another language. But you mean it's "not necessary for R", as in, you can do without it, in R. Thanks for clarifying!
Everyone is different, and when some people refer to R itself they mean R + Tidyverse. I rarely use Tidyverse unless I am adding to someone else's code that already uses it. Personally, I prefer data.table. YMMV. https://github.com/Rdatatable/data.table
Thanks for this discussion. Regarding the lecture miniseries we are organizing, do you find any topics or packages in the R ecosystem (tidy or not) that you think would inspire us to build Clojure equivalents?
I do. The data.table paradigm is quite powerful and worth emulating:
# FROM[WHERE, SELECT, GROUP BY]
# DT  [i,     j,      by]
could become: (from where select groupby)
where the input is a datatable and so is the output, a different datatable
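To make the DT[i, j, by] mapping concrete, here is a minimal sketch on a made-up table. The table `dt` and its column names are assumptions for illustration only, not data from this thread.

```r
library(data.table)

# Made-up data for illustration only.
dt <- data.table(
  symbol = c("A", "A", "B", "B", "B"),
  price  = c(10, 20, 5, 15, 25)
)

# FROM dt WHERE price > 5 SELECT count and mean GROUP BY symbol
# i = row filter, j = what to compute, by = grouping columns
res <- dt[price > 5, .(n = .N, mean_price = mean(price)), by = symbol]

# The result is itself a data.table, so further [i, j, by] calls can be chained.
print(res)
```

The input `dt` is left unchanged here; `res` is a new data.table with one row per group.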
neatRanges might be interesting to implement as it has useful methods commonly needed when working with date ranges https://github.com/arg0naut91/neatRanges
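As an example of the kind of date-range operation meant here, below is a rough base R sketch that merges overlapping date ranges. `collapse_date_ranges` is a hypothetical helper written for this illustration; it is not the neatRanges API and makes no claim to match its behavior.

```r
# Hypothetical helper, NOT part of neatRanges: merge date ranges that
# share at least one day. Uses base R only.
collapse_date_ranges <- function(start, end) {
  stopifnot(length(start) == length(end), length(start) > 0, all(start <= end))
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  # Running maximum of the end dates seen so far, as days since 1970-01-01.
  run_end <- cummax(as.numeric(end))
  # A new group starts when a range begins after all earlier ranges have ended.
  new_group <- c(TRUE, as.numeric(start[-1]) > run_end[-length(run_end)])
  # The last range of each group carries that group's latest end date.
  is_last <- c(new_group[-1], TRUE)
  data.frame(
    start = start[new_group],
    end   = as.Date(run_end[is_last], origin = "1970-01-01")
  )
}

# Made-up ranges: the first two overlap, the third stands alone.
s <- as.Date(c("2020-01-01", "2020-01-04", "2020-02-01"))
e <- as.Date(c("2020-01-05", "2020-01-10", "2020-02-03"))
merged <- collapse_date_ranges(s, e)
print(merged)
```

Ranges that are merely adjacent (one ends the day before the next begins) are kept separate in this sketch; a real implementation would likely make that gap configurable.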
Nice. Would you be interested in discussing data.table in a meetup?
Sure. Depending on when it is
Wonderful. The best time will be around the current hour, but on a Friday a few weeks away. That is the main time we are assigning to the R4Clj meetings. But if this does not work, we can always set up another hour for a special meeting with your presentation.
Okay, lmk when you have a hard calendar date.
Thanks. Maybe Feb 14 or 28? Or later? Would you like to give a long or short talk?
Nice, many thanks. Yes, let us leave the date open for now and talk again a little later, as the series of meetups continues.
There are also several excellent Matt Dowle talks... e.g. I like this one but there are more recent ones too https://m.youtube.com/watch?v=qLrdYhizEMg
Here are a few examples of using data.table. I used the same datasets used in the 100-walkthrough for tech.ml.dataset:
```r
options(width = 300)
library(data.table)
## install.packages("remotes")
## remotes::install_github("HenrikBengtsson/R.utils")
## Let's try to mirror the analysis for tmd:
d <- data.table::fread("")
head(d)
str(d)
mycols <- c("SalePrice", "1stFlrSF", "2ndFlrSF")
d[1:5, mycols, with = FALSE]
## remotes::install_github("ycphs/openxlsx")
library(openxlsx)
d_xls <- as.data.table(openxlsx::read.xlsx(""))
class(d_xls)
str(d_xls)
data.table::setnames(d_xls, old = c("Date"), new = c("Date_string"))
d_xls[, date := as.Date(Date_string)]
str(d_xls)
d[, c("Id", "OverallQual", "SalePrice")]
d_stocks <- data.table::fread("")
## MSFT price moments
d_stocks[symbol == 'MSFT',
         .(N = .N, min_price = min(price), mean_price = mean(price),
           median_price = median(price), max_price = max(price))]
d_stocks[symbol == 'MSFT',
         .(N = .N, min_price = min(price), mean_price = mean(price),
           median_price = median(price), max_price = max(price)),
         by = "symbol"]
d_stocks[symbol == 'MSFT',
         .(N = .N, min_price = min(price), mean_price = mean(price),
           median_price = median(price), max_price = max(price)),
         by = c("symbol", "date")]
d_stocks[symbol == 'MSFT'][
  , year := year(as.Date(date, c('%b %d %Y')))][
  , .(N = .N, min_price = min(price), mean_price = mean(price),
      median_price = median(price), max_price = max(price)),
  by = c("symbol", "year")]
d_stocks[, year := year(as.Date(date, c('%b %d %Y')))][
  , .(N = .N, min_price = min(price), mean_price = mean(price),
      median_price = median(price), max_price = max(price)),
  by = c("symbol", "year")]
data.table::fwrite(d_stocks, file = "the-stocks.tsv.gz", sep = "\t", compress = "gzip")
## Same moments for all symbols
d_stocks[, .(N = .N, min_price = min(price), mean_price = mean(price),
             median_price = median(price), max_price = max(price)),
         by = "symbol"]
## Sorting
d_stocks[, .(N = .N, min_price = min(price), mean_price = mean(price),
             median_price = median(price), max_price = max(price)),
         by = "symbol"][order(-mean_price)]
```
Hope that is helpful
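Since the file paths in the calls above are not shown, here is a small self-contained sketch of the same chaining pattern on made-up rows. The table `toy` and its values are assumptions for illustration. One thing worth knowing: `:=` modifies a data.table in place, so the unfiltered `d_stocks[, year := ...]` call above adds a `year` column to `d_stocks` itself, and the later `fwrite` would include it. Wrapping the table in `copy()` is one way to avoid that. Also, `%b` parses month abbreviations according to the current locale, so this assumes an English or C locale.

```r
library(data.table)

# Made-up rows for illustration; dates follow the "%b %d %Y" pattern.
toy <- data.table(
  symbol = c("MSFT", "MSFT", "MSFT", "AAPL"),
  date   = c("Jan 1 2000", "Feb 1 2000", "Jan 1 2001", "Jan 1 2000"),
  price  = c(40, 50, 30, 25)
)

# copy() keeps `toy` unchanged; without it, := would add `year` to `toy`.
# Chain: derive year -> summarise per symbol and year -> sort descending.
by_year <- copy(toy)[, year := year(as.Date(date, format = "%b %d %Y"))][
  , .(N = .N, mean_price = mean(price)), by = c("symbol", "year")][
  order(-mean_price)]

print(by_year)
print(names(toy))  # still symbol, date, price
```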
I can give a short explanation of how to use it. It is a great package if you use R, though lately I am using Python because employers demand it. Your original question was "Would you be interested in discussing data.table in a meetup". Feb 14 or 28 are okay, but in my time zone these are work hours, so I won't be sure until the dates get closer. Evenings or weekends have less risk of being preempted. I would also need to get set up on whatever platforms you are using, which may or may not work on my Pop!_OS system.