2024-05-10
Is this the main AI channel on Clojurians at this point? Last I checked there was another one that was partly about helping to improve how well Clojure is represented in tools like Copilot. I don't see that now, though.

Anyway! For anyone interested in interpretability, aka 'WTF is actually going on inside these models', I wanted to point out these two new overview papers. Obviously academic papers aren't the most accessible way to learn about stuff (& I can point to more accessible intros to most of it), but it's cool to have some broad overviews of the field.

https://arxiv.org/abs/2404.14082 is relatively high-level, and focuses specifically on mechanistic interpretability, aka 'figuring out an actual bottom-up deterministic / mechanical understanding of some small feature.'

https://arxiv.org/abs/2405.00208 goes into greater technical depth, broadens the scope to interpretability techniques in general (i.e. not just mech interp), and has an interesting section on what those techniques have actually allowed us to learn.

They're both ~30 pages (plus references, etc.). I haven't read all of either one, but skipping around in them, they both seem pretty good!

PS -- I've shifted my primary focus this year to doing AI safety / alignment research. If anyone's interested in knowing more about that area, reply in-thread or PM me; I'm happy to share info. The field needs more smart people, especially programmers!
Many thanks for these updates about mech interp; I hope to take a look 🙂. #C054XC5JVDZ is probably the channel you remembered, and I think it would appreciate this kind of update.
> If anyone's interested
I'm curious to hear more. At some point, it'd be nice to explore using tools like https://github.com/TransformerLensOrg/TransformerLens from the Clojure REPL.
Oh right, #C054XC5JVDZ, thanks! My eyes went right past it because they were looking for 'AI'. I'll post it there as well.
> I'm curious to hear more.
For sure! I'm totally happy to provide info & answer questions; like I said, I'd love to see more people involved 🙂

For getting started playing with mech interp, there's a pretty wide consensus that the https://www.arena.education/ notebooks are the best place to start, in particular https://arena3-chapter1-transformer-interp.streamlit.app/~/+/. I also personally like Neel Nanda's https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/main/demos/Exploratory_Analysis_Demo.ipynb, which picks a smallish problem in mech interp and uses TransformerLens to start exploring it.

It's a nice feature of the interpretability landscape that people tend to publish notebooks, so you can start by just duplicating & running them and then tweaking from there; https://www.lesswrong.com/posts/iSJrd3TE6Pd3ctyaD/useful-starting-code-for-interpretability is a list I put together of such notebooks. They're all Python, sigh, but I don't think it would be too hard to rewrite one in Clojure (see the sketch below for one way to get started). If you're considering doing something with TransformerLens, you may also want to look at http://nnsight.net/, which a lot of people have been liking lately -- I haven't tried it myself. Just LMK if there's other info that would be of use!
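On the 'from the Clojure REPL' idea: here's a rough, untested sketch of what calling TransformerLens through libpython-clj might look like. The `HookedTransformer.from_pretrained`, `to_tokens`, and `run_with_cache` calls are real TransformerLens entry points; the libpython-clj wiring is my best guess and assumes `transformer-lens` is pip-installed in the Python environment that libpython-clj picks up:

```clojure
;; deps.edn: clj-python/libpython-clj
;; Python side: pip install transformer-lens
(require '[libpython-clj2.python :as py]
         '[libpython-clj2.require :refer [require-python]])

(py/initialize!)  ; attach to the local Python environment

(require-python '[transformer_lens :refer [HookedTransformer]])

;; GPT-2 small; weights are downloaded on first use.
(def model (py/py. HookedTransformer from_pretrained "gpt2"))

;; Tokenize a prompt, then run the model while caching every
;; intermediate activation (run_with_cache returns [logits cache]).
(def tokens (py/py. model to_tokens "Mechanistic interpretability is"))

(let [[logits cache] (py/py. model run_with_cache tokens)]
  ;; The cache is keyed by hook name, e.g. layer-0 attention patterns:
  (println (py/get-attr (py/get-item cache "blocks.0.attn.hook_pattern")
                        :shape)))
```

From there the cache exposes the same hook points the Python notebooks poke at, so most of the ARENA-style exercises should translate fairly directly.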
That is so helpful! I haven't looked into this area for a few months and am looking to come back. Many thanks, @U077BEWNQ.
Entirely my pleasure! Don't hesitate to hit me up with questions. A couple of other things that could be of use:
• Neel Nanda has some great youtube videos on various mech interp topics, and is also fairly entertaining as a bonus.
• There's a mech interp slack where a lot of the core folks hang out (https://join.slack.com/t/opensourcemechanistic/shared_invite/zt-2ieq5rm9j-QQnffO4iQsn4kRjFJitxwg), and also a discord that seems to have a more hobbyist vibe, although I haven't spent much time there (https://discord.gg/JjpMH5rf).
• Dictionary learning with sparse autoencoders is very much the new hotness; because of polysemanticity, LLM neurons often aren't that interpretable, but using SAEs to learn features (in an unsupervised way) results in much more interpretability (see the toy sketch after this list). There's some debate about how faithful SAE features are, but I'd say most researchers are pretty convinced. If that's new to you, take a look at https://transformer-circuits.pub/2023/monosemantic-features/index.html (and https://transformer-circuits.pub/2022/toy_model/index.html if you want more background); they're both long but pretty readable IMHO.
• https://www.neuronpedia.org/ is the coolest toy ever for mech interp -- you can look at neurons or SAE features in GPT-2-small, see what texts most activate a neuron/feature, see what neurons/features are most activated by a particular text, etc. It's cool enough that some research can be done using just Neuronpedia, although of course there's lots of additional benefit to digging deeper into circuits and doing experiments to check causality.
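Since this thread lives in Clojure-land: the SAE idea above is compact enough to sketch in plain Clojure. This is just the forward pass and training objective (reconstruction error plus an L1 sparsity penalty), not real training code -- real SAEs are trained in PyTorch on millions of cached activations, with an overcomplete dictionary (many more features than input dimensions). All the names here (`encode`, `decode`, `sae-loss`, `W-enc`, etc.) are made up for illustration:

```clojure
;; Toy sparse autoencoder in plain Clojure:
;;   f     = relu(W-enc x + b-enc)   ; sparse feature activations
;;   x-hat = W-dec f + b-dec         ; reconstruction of x
;;   loss  = ||x - x-hat||^2 + lambda * ||f||_1
(defn dot [u v] (reduce + (map * u v)))

(defn mat-vec [m v] (mapv #(dot % v) m))  ; m is a vector of row vectors

(defn relu [v] (mapv #(max 0.0 %) v))

(defn encode [{:keys [W-enc b-enc]} x]
  (relu (mapv + (mat-vec W-enc x) b-enc)))

(defn decode [{:keys [W-dec b-dec]} f]
  (mapv + (mat-vec W-dec f) b-dec))

(defn sae-loss [params x lambda]
  (let [f     (encode params x)
        x-hat (decode params f)
        recon (reduce + (map (fn [a b] (let [d (- a b)] (* d d))) x x-hat))
        l1    (reduce + (map #(Math/abs (double %)) f))]
    (+ recon (* lambda l1))))

;; 2-d activations, 3 dictionary features (real SAEs are overcomplete):
(sae-loss {:W-enc [[1.0 0.0] [0.0 1.0] [1.0 1.0]]
           :b-enc [0.0 0.0 0.0]
           :W-dec [[1.0 0.0 0.5] [0.0 1.0 0.5]]
           :b-dec [0.0 0.0]}
          [0.5 -0.25]
          0.01)
```

The L1 term is what pushes most features to zero on any given input, and that sparsity is where the interpretability comes from.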
you may be interested in reading our approach to solving the AI safety problem: https://www.stardog.com/blog/safety-rag-improving-ai-safety-by-extending-ais-data-reach/
SRAG seems interesting, though 'hallucinations are bad for business and threaten AI acceptance' is definitely not the sort of safety I'm thinking of 🙂
Unrelatedly:
> Dictionary learning with sparse autoencoders is very much the new hotness
Fascinating new paper out today from Anthropic where they apply this technique to a production model, Claude 3 Sonnet (the middle-sized of their Claude 3 models). Summarized here: https://www.anthropic.com/research/mapping-mind-language-model