#data-science
2017-11-28
aria42 02:11:34

Wanted to let everyone know about "flare", a dynamic neural net library in Clojure. It can currently handle some pretty complex models, and on a simple bidirectional LSTM it is 3x faster than PyTorch, with a clean Clojure interface: https://github.com/aria42/flare

seonhokim 02:11:08

Hi, great to hear that. As a general question, how is it positioned differently from Cortex? https://github.com/thinktopic/cortex

aria42 17:11:57

Cortex is very much like Keras. It's a really useful library, but it works at a higher level than, say, TF or PyTorch. Flare is similar to PyTorch in that you can construct new tensor operations instead of composing layers. My personal interest is being able to build novel models. I also personally think the Flare API is a little cleaner, but that's very subjective.
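To illustrate the distinction above (building a model from tensor operations rather than stacking prebuilt layers), here is a minimal sketch of the dynamic, "define-by-run" style in plain Python. It assumes scalar values instead of tensors, and every name in it (`Value`, `rnn_step`, `run`) is hypothetical; it is not Flare's or PyTorch's API. Each operation records its inputs as it executes, so a new operation is just a function built from primitives, and the graph is rebuilt on every call to match the input length.

```python
import math

class Value:
    """A scalar node in a computation graph recorded while the code runs."""
    def __init__(self, data, parents=(), backward_fn=None):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        self.backward_fn = backward_fn  # pushes this node's grad to its parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.backward_fn = backward_fn
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def backward_fn():
            self.grad += (1 - t * t) * out.grad
        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the graph that the forward pass recorded.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, accumulating gradients.
        for node in reversed(order):
            if node.backward_fn:
                node.backward_fn()

def rnn_step(h, x, w_h, w_x):
    # A hypothetical "new operation": just a function composed from primitives.
    return (h * w_h + x * w_x).tanh()

def run(xs, w_h, w_x):
    # The graph's shape depends on len(xs); it is rebuilt on every call.
    h = Value(0.0)
    for x in xs:
        h = rnn_step(h, Value(x), w_h, w_x)
    return h

w_h, w_x = Value(0.5), Value(1.0)
out = run([1.0, 2.0, 3.0], w_h, w_x)
out.backward()
print(out.data, w_h.grad, w_x.grad)
```

Because the graph is recorded during the forward pass, nothing about the sequence length has to be declared up front; a sequence of a different length simply produces a differently shaped graph on the next call.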

seonhokim 13:12:27

Thank you so much. I’d like to try Flare 🙂

bbss 16:11:52

https://deepmind.com/blog/population-based-training-neural-networks/ Interesting stuff: using evolutionary algorithms to find hyperparams.
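The idea in the linked post can be sketched on a toy problem. The following is a minimal, hypothetical population-based training loop in plain Python, not DeepMind's implementation: each member trains with its own learning rate, and after every round the worst quarter copies the weights and learning rate of a top-quarter member (exploit) and then perturbs that learning rate by a factor of 0.8 or 1.2 (explore). The toy loss `(theta - 3)^2`, the population size, the perturbation factors, and the 0.99 cap are all assumptions chosen for illustration.

```python
import random

def loss(theta):
    # Hypothetical toy objective with its minimum at theta = 3.
    return (theta - 3.0) ** 2

def train_step(theta, lr):
    # One gradient-descent step; d/dtheta (theta - 3)^2 = 2 * (theta - 3).
    return theta - lr * 2.0 * (theta - 3.0)

def pbt(pop_size=8, rounds=20, steps_per_round=5, seed=0):
    rng = random.Random(seed)
    # Each member has weights (theta) and one hyperparameter (learning rate).
    population = [{"theta": 0.0, "lr": rng.uniform(0.001, 0.9)}
                  for _ in range(pop_size)]
    quarter = max(1, pop_size // 4)
    for _ in range(rounds):
        # Train every member for a few steps with its current learning rate.
        for member in population:
            for _ in range(steps_per_round):
                member["theta"] = train_step(member["theta"], member["lr"])
        # Rank members, best (lowest loss) first.
        population.sort(key=lambda m: loss(m["theta"]))
        for loser in population[-quarter:]:
            winner = rng.choice(population[:quarter])
            # Exploit: copy the winner's weights and hyperparameter.
            loser["theta"] = winner["theta"]
            # Explore: perturb the copied learning rate; the 0.99 cap keeps
            # gradient descent on this toy loss from diverging.
            loser["lr"] = min(0.99, winner["lr"] * rng.choice([0.8, 1.2]))
    return min(population, key=lambda m: loss(m["theta"]))

best = pbt()
print(f"lr={best['lr']:.3f}  loss={loss(best['theta']):.2e}")
```

The point of the design is that hyperparameters change during a single training run instead of being fixed per trial, which is what separates this from a plain random or grid search.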

Ryan Radomski 17:11:43

What are some ways you all are able to build up large databases without breaking the bank? I find Clojure is really well suited to data aggregation, which is fantastic, but web scraping appears to be a legal gray area.

gigasquid 17:11:16

Here are some datasets that are already out there that might be good

gigasquid 17:11:13

There is also this Google doc, "Datasets for Machine Learning", that I bookmarked, but I can’t remember where it came from

dhirensr 18:11:25

Hey, does anyone here use OpenNLP? Or any other text mining library?

gigasquid 18:11:47

@dhirensr I’ve used Stanford NLP

gigasquid 18:11:17

I wonder how it compares

gigasquid 18:11:09

of course someone has already looked into that 🙂

Ryan Radomski 18:11:22

@gigasquid Thanks much! I'm also really interested in aggregating novel data for people to use. Does anybody know if building datasets like these is a viable part-time job, or is that unheard of?

Ryan Radomski 18:11:54

I found quite a few postings on Indeed and Upwork to get myself started, and I think that, technically speaking, I could knock them out of the park, but most of what people were asking for violated the terms of service of various companies.

aaelony 18:11:57

@radomski Isn't that how MaxMind got started, by collecting datasets part-time? (https://www.maxmind.com/en/geoip2-databases)

Ryan Radomski 18:11:32

Maybe I should shoot them an email. I asked Semantics3 and they didn't have an answer for me.

aaelony 18:11:57

Factual too?

Ryan Radomski 18:11:26

Do you happen to remember where you learned that about those two companies? They are two valuable leads for me. If you have more information, I would appreciate it.

aaelony 18:11:15

General industry knowledge from 5+ years ago

aaelony 18:11:43

I'm sure there are more

aaelony 18:11:09

Lots of businesses sell data products

aaelony 18:11:31

I suppose perhaps the best way would be to put yourself in the shoes of someone looking to purchase such data and see if it already exists...