Welcome to the newsletter!
Here we’ll be doing a monthly news roundup inclusive of my biased and wrong takes, plus maybe a remark about a non-current paper I happened to have read this month.
In this welcome edition, I won’t attempt to summarize all the news I missed before the newsletter existed, just the most recent notes.
There are no benefits to subscribing; the free tier has it all. But I enabled paid subscriptions just in case people want to signal support, to check how it affects my motivation to write these, and to see what happens.
Let me know what changes I can make to maximize the newsletter’s impact by commenting here or emailing quinndougherty92 <at> gmail.
Thanks to Ben Goldhaber for the baseball metaphor idea and reviewing this post, and thanks to Steve Omohundro for suggesting the newsletter’s name, but all demerits go to me (the points go to me too btw).
Where are we? What is guaranteed safe AI?
Guaranteed-safe AI is a collection of research agendas to make the technological transition go well. There’s a video explainer here, but the main resource is the preprint position paper Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems.
Personally, I had a few years of telling people that formal verification is not the droid we’re looking for before the latest push for GSAI, so you might be wondering if my worldview has changed. In my earlier writing, I was mostly responding to a very strong claim, roughly “just capture alignment properties in ‘specs’ and prove the implementation ‘correct’”. I’m attracted to GSAI proposals because they have a swiss-cheese, matter-of-degree mindset that I don’t usually see when people get optimistic about that stronger version. Separately, proof assistants have played a big role in my career, so I’m biased toward theories of change that might leverage my experience.
Atlas Computing talk
The monthly seminar featured Evan Miyazono of Atlas Computing; the talk is available on youtube. The GSAI position papers are fine, but they need to be injected with a roadmap, blueprint, or next steps. Ideally, such a roadmap would go all the way down to estimates of headcount needed for different tactics. Wouldn’t it be great if someone were doing this? Anyways, major thanks to Atlas for doing exactly that! Expect a document from Atlas soon; it’s been incubating as a google doc all month.
Podcast episode
Nora Ammann and Ben Goldhaber appeared on the Cognitive Revolution podcast to discuss GSAI. The junction of world models, specifications, and assured implementations is so lofty and hairy that good science communication isn’t easy, so we need to see more of this.
QAISI launch with job description
The Quantitative Safety AI Initiative has a website up with a job posting for an AIS Research Engineer.
The job says it will mostly focus on the Tegmark & Omohundro preprint, even though two additional authors of the mammoth-author-list position paper are research leads for the org. Dawn Song, the one QAISI research lead not on that position paper, does a lot of security and cryptography work, but is no stranger to AI safety (she co-advised Dan Hendrycks, off the top of my head). Excited to see what the new org gets up to!
Guarantees-based mechanistic interpretability team preprint
This paper isn’t 100% relevant, since GSAI tends more toward formally verifying artifacts of models than formally verifying models themselves. However, it represents a natural thought you’d have when you try smashing formal verification into machine learning, so I’m including it. This is not a commitment to make the newsletter a big tent for every FV/ML crossover episode, but it can be sometimes. Gross et al. lay the groundwork for formal proofs of model performance (lower bounds on accuracy). They accomplish this with mechinterp tricks like ablations, but since we’re dealing in proof, we need pessimal ablations, i.e. ablating with worst-case values, rather than mean or random ablations.
I say “groundwork” because while the proofs in the current paper are “formal” by many standards, they are not yet formal in the sense of being machine-checked. But one step at a time: the current work seems like a big leap forward to me, and I’m skeptical of the marginal value-add of putting those proofs in Coq.
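To make the pessimal-ablation point concrete, here’s a minimal sketch under toy assumptions that are mine rather than the paper’s: suppose the model’s output decomposes as f(x) = g(x) + h(x), where h is the component being ablated, and suppose we have a finite candidate set guaranteed to contain h(x). Mean ablation gives an estimate; pessimal ablation gives a bound.

```python
import numpy as np

# Illustrative sketch only; the decomposition f(x) = g(x) + h(x), the finite
# candidate set `h_candidates`, and all names here are my toy assumptions,
# not the paper's actual construction.

def mean_ablation_accuracy(g, h, xs, labels):
    """Replace h's contribution with its mean: an estimate, not a bound."""
    h_mean = np.mean([h(x) for x in xs], axis=0)
    preds = [int(np.argmax(g(x) + h_mean)) for x in xs]
    return float(np.mean([p == y for p, y in zip(preds, labels)]))

def pessimal_ablation_accuracy(g, h_candidates, xs, labels):
    """Count an input as correct only if the prediction survives *every*
    candidate value the ablated component could take. Provided the true
    h(x) always lies in h_candidates, this lower-bounds the full model's
    accuracy."""
    correct = 0
    for x, y in zip(xs, labels):
        if all(int(np.argmax(g(x) + h_val)) == y for h_val in h_candidates):
            correct += 1
    return correct / len(xs)
```

A real proof would handle the worst case analytically rather than by enumerating candidates; the sketch is only meant to show the asymmetry between an estimate and a lower bound.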
See this explainer / companion piece by the authors, plus this supplemental by Jacob Hilton of ARC (which funded the work).
Progress in automating mathematics
This is a very active area with either tangential or moonshot safety stories, so I’ll never be exhaustive in covering the latest happenings, but how about a tasting.
AlphaProof
Deepmind: AI achieves silver-medal standard solving International Mathematical Olympiad problems. Coupling a language model specializing in Lean with the AlphaZero RL algorithm, AlphaProof is competitive with IMO silver medalists, and very close to gold. The blog doesn’t link to a preprint, omits details about how the IMO data was Leanified, and omits the proportion of synthetic and self-play data vs other data. So it remains to be seen how significant this will be. I don’t have a strong inside view about benchmark hacking, but I know enough about the problem area to be skeptical of everything.
Remember that proofs are programs. We should expect some framing of the automated mathematics problem to do the conceptual heavy lifting for the automated software engineering problem. One reason to hesitate before buying up all the yes right now is that IMO problems in Lean are not that framing. Lean (especially Mathlib) is, contra your assumptions when you walk into a dependent type theory, optimized for classical mathematics. Without an analysis of how constructive AlphaProof’s proofs actually are, you shouldn’t assume they are programs; machine learning may not walk down the halls of Curry-Howard just yet.
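To illustrate the Curry-Howard worry with a toy Lean sketch (my own example, not anything AlphaProof produced): a constructive existence proof carries a witness you can actually run, while a classically obtained witness is noncomputable.

```lean
-- Toy illustration of the Curry-Howard point; nothing here is from AlphaProof.

-- Constructive: the proof term carries an explicit witness, so it doubles as
-- a program that Lean can evaluate.
def constructiveDouble (n : Nat) : { m : Nat // m = n + n } :=
  ⟨n + n, rfl⟩

#eval (constructiveDouble 21).val  -- 42

-- Classical: Classical.choose only asserts that some witness exists. The
-- definition is noncomputable, so no program falls out of the proof.
noncomputable def classicalDouble (n : Nat) : Nat :=
  Classical.choose (⟨n + n, rfl⟩ : ∃ m, m = n + n)
```

Most of Mathlib sits on the classical side, which is the sense in which IMO-in-Lean doesn’t automatically buy you programs.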
GSAI paper club?
I’m considering a monthly meetup to eat through papers. Comment here or email quinndougherty92 <at> gmail to express interest.
Shortlist of papers we could cover (not all of them timely)
Feel free to suggest others. Part of me would like to do a centralized-summarizer format (with the summarizer often being me), but we could instead do a longer meetup where we read together on the call before discussing.
Funding opportunities (jk)
ARIA is currently closed, and the deadline for SFF (which has some GSAI folks as panelists this year) was a few weeks ago.
Neel Nanda, who leads a Manifund budget, informed me over discord that he’s pessimistic about GSAI and is unlikely to make grants in the area. I don’t know what other Manifund regrantors think, but there’s still a million or so in play for AI on Manifund, so it could be worth a shot.
Coming up
You can sign up for August’s GSAI seminar here, given by the authors of this post. We’ll have to see what progress has been made on any prototyping they’ve done; I found the post to be pretty pie-in-the-sky.
See you next month
Like and subscribe, etc. Also, I am not of the twitterati, so I don’t know where to hang out to make this newsletter better, but it’d be cool to cite other people’s quick takes from time to time. To do this, I need your twitter account recommendations, so shoot those over.