This links section is inspired by the ones from my favourite bloggers, such as gwern, guzey, or nintil. It presents a roughly up-to-date list of my most interesting reads of the last few months.
June 2023
- Large Language Models can Simulate Everything
- https://kliu.io/post/llms-can-simulate-everything/
- It might be time to build a General LLM Company — a virtual company of LLMs, with each “employee” specialized in a particular task.
- Large Language Models as Tool Makers
- https://arxiv.org/abs/2305.17126
- In a similar fashion to the recent Voyager paper
- Blockwise Parallel Transformer for Long Context Large Models:
- https://arxiv.org/abs/2305.19370
- Created the urge in me to go full Tri Dao et al. and write a custom kernel for this neat trick of applying blockwise computation to the feed-forward network as well
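As a toy illustration of that trick (my own NumPy sketch with assumed names and shapes, not the paper's fused kernel): since the feed-forward network acts on every sequence position independently, you can process the sequence in blocks and never materialize the full `(seq_len, d_ff)` intermediate activation at once.

```python
import numpy as np

def ffn(x, w1, w2):
    """Standard transformer feed-forward block (ReLU variant)."""
    return np.maximum(x @ w1, 0.0) @ w2

def blockwise_ffn(x, w1, w2, block_size=128):
    """Apply the FFN to the sequence in chunks of `block_size`
    positions, so only a (block_size, d_ff) intermediate lives
    in memory at any time -- the saving blockwise computation exploits."""
    outs = [ffn(x[i:i + block_size], w1, w2)
            for i in range(0, x.shape[0], block_size)]
    return np.concatenate(outs, axis=0)
```

Because the FFN is position-wise, the blockwise result is exactly equal to the full computation; the win is memory, not math.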
May 2023
- Jason Wei’s response to the “emergent abilities of LLMs are a mirage” arguments: https://www.jasonwei.net/blog/common-arguments-regarding-emergent-abilities
April 2023
- Scaffolded LLMs are not just cool toys but actually the substrate of a new type of general-purpose natural language computer
March 2023
- Is ChatGPT 175 Billion Parameters? Technical Analysis
- https://orenleung.super.site/is-chatgpt-175-billion-parameters-technical-analysis
- Interesting counterarguments in the comments: https://twitter.com/O42nl/status/1631820805972668416
- A step towards self-improving LLMs
- Alexey Guzey’s Lifehacks: https://guzey.com/lifehacks/
- huge L for Chomsky: https://scottaaronson.blog/?p=7094
- “like the Jesuit astronomers declining to look through Galileo’s telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn’t predict and that doesn’t fit their worldview.”
- The Waluigi Effect of LLMs: https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
- I stopped myself from reading the Waluigi post until today because I don’t really think it’s beneficial for the space to make up words that no one outside the LW sphere understands (even though this term is quite self-explanatory). But I have to admit it’s a really good post. Go check it out.
- Could you train a ChatGPT-beating model for $85,000 and run it in a browser?
July 2022
- The effective altruist work ethic and the spirit of utilitarianism https://www.dwarkeshpatel.com/p/ea-billionaires?r=2jrlm&s=w&utm_campaign=post&utm_medium=web
- The Track Record of Futurists Seems ... Fine https://www.cold-takes.com/the-track-record-of-futurists-seems-fine/
- Balaji’s new book, The Network State: https://thenetworkstate.com/
June 2022
- Gwern’s GPT-3 2nd Anniversary predictions: https://www.reddit.com/r/mlscaling/comments/uznkhw/gpt3_2nd_anniversary/
- Check here for a summary: https://twitter.com/johannes_hage/status/1530898189162782721
April 2022
- DeepMind releases new scaling laws that contradict the ones from OpenAI: https://arxiv.org/abs/2203.15556
- PaLM - 540B parameter model by Google AI: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html
- DALL-E-2 by OpenAI: https://openai.com/dall-e-2/
- Second-order effects of the rise of large language models: https://twitter.com/russelljkaplan/status/1513128005828165634
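The headline result of the DeepMind scaling-laws paper above (Chinchilla) is often summarized by a rule of thumb: for compute-optimal training, scale data and parameters together, at roughly 20 training tokens per parameter, rather than growing the model much faster than the dataset. A toy calculator under that rule-of-thumb reading (function names are my own):

```python
def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
    # Rule-of-thumb reading of Hoffmann et al. 2022:
    # scale data linearly with model size, ~20 tokens per parameter.
    return n_params * tokens_per_param

def training_flops(n_params, n_tokens):
    # Common approximation: ~6 FLOPs per parameter per training token.
    return 6 * n_params * n_tokens

# e.g. a 70B-parameter model (Chinchilla's size) wants ~1.4T tokens
tokens = chinchilla_optimal_tokens(70e9)
```

Chinchilla itself (70B parameters, 1.4T tokens) matches this ratio, which is what made the earlier "scale parameters, keep data fixed" recipes look badly undertrained.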
March 2022
- George Hotz - Ride or Die: https://return.life/2022/03/07/george-hotz-comma-ride-or-die/
- Super realistic AI takeoff scenario by gwern, based on current models and scaling effects: https://www.lesswrong.com/posts/a5e9arCnbDac9Doig/it-looks-like-you-re-trying-to-take-over-the-world
- Examples of barbell strategies for everyday life: https://dwarkeshpatel.com/barbell-strategies/
- Deep Neural Nets - 33 years ago and 33 years from now → Andrej Karpathy reimplemented one of the first neural net papers by LeCun from 1989 and analysed if we made any fundamental progress: https://karpathy.github.io/2022/03/14/lecun1989/
- Directory of all Large Language Models: https://docs.google.com/spreadsheets/d/1gc6yse74XCwBx028HV_cvdxwXkmXejVjkO-Mz2uwE0k/edit#gid=0
- BigScience published several interesting blog posts this month on how they are training their 176B parameter language model: https://bigscience.huggingface.co/blog
February 2022
- Slate Star Codex analysis of AGI timelines: https://astralcodexten.substack.com/p/biological-anchors-a-trick-that-might
- Theses on Sleep by Alexey Guzey: https://guzey.com/theses-on-sleep/
- Why Tyler Cowen’s Emergent Ventures has been so successful at curating talent, and why the first batches of YC and the Thiel Fellowship were too: https://www.highmodernism.com/blog/talentcuration
January 2022
- Motivation for the roaring 20s. We choose to solve problems like alignment (+aging) not because they are easy but because they are hard!
December 2021
- Cool newsletter by Sonia Joseph: https://mirror.xyz/soniajoseph.eth/AsFhFt-JOjqdyb6GCVhcNCIK6whBIZpcu_XbSvpc6W8
- Sequence to understand the relationship between Progress Studies and Effective Altruism: https://www.highmodernism.com/sequence
- New AGI Workshop at Mila Quebec with a bunch of videos: https://sites.google.com/mila.quebec/scaling-laws-workshop/schedule
- WebGPT: Improving the factual accuracy of language models through web browsing: https://openai.com/blog/improving-factual-accuracy/
- Gopher: DeepMind’s 280B parameter model with new SOTAs across the board: https://deepmind.com/blog/article/language-modelling-at-scale
- First published work by Aleph Alpha: https://arxiv.org/pdf/2112.05253.pdf
- New developments on nuclear reactors in Wyoming! https://www.terrapower.com/natrium-demo-kemmerer-wyoming/
October 2021
- Deep Learning Diminishing Returns → Important piece, but I don't agree with a lot of it
- Whole Brain Emulation → No Progress on C. elegans After 10 Years:
- https://www.lesswrong.com/posts/mHqQxwKuzZS69CXX5/whole-brain-emulation-no-progress-on-c-elgans-after-10-years
- Also check out the Reddit discussion about the WBE topic:
- For fellow ML Engineers - How to Train Really Large Models on Many GPUs?
- Your chance to invest in a longevity company that went through YC with a reasonable valuation:
- How to Train Large Deep Learning Models as a Startup
- The Vitalik Buterin Fellowships in AI Existential Safety:
- 530B parameter language model by Microsoft + NVIDIA: https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
- Also check out this awesome thread about the model: https://twitter.com/BlancheMinerva/status/1447560921530896389
- State of AI Report 2021: https://www.stateof.ai/
- Bryan Johnson measuring all his 70+ organs to maximally reverse the quantified biological age of each:
- Awesome video about the Scaling Laws: https://www.youtube.com/watch?v=StLtMcsbQes
- Just Ask for Generalization by Eric Jang: https://evjang.com/2021/10/23/generalization.html
September 2021
- In What Sense is Matter 'Programmable'? → A lot of interesting ideas, inspired by David Deutsch’s The Beginning of Infinity: https://jaredtumiel.github.io/blog/2021/08/14/programmable-matter.html
- Neuralink co-founder Max Hodak is building a new company called Science: https://maxhodak.com/nonfiction/2021/09/03/science.html
- Summary of Sam Altman Q&A on AGI predictions and GPT-4: https://www.lesswrong.com/posts/aihztgJrknBdLHjd2/sam-altman-q-and-a-gpt-and-agi
August 2021
- New chip cluster that will make 120 trillion parameter models possible (almost 100x from GPT-3): https://www.wired.com/story/cerebras-chip-cluster-neural-networks-ai/
- Scott Alexander on AGI risks, including his super interesting answers in the comments: https://astralcodexten.substack.com/p/highlights-from-the-comments-on-acemoglu
- We need founder-led Biotech companies: https://www.pillar.vc/news/the-future-of-biotech-is-founder-led/
- If Einstein Had The Internet: An Interview With Balaji Srinivasan:
July 2021
- Must read on the arguments for a slow AGI takeoff: https://sideways-view.com/2018/02/24/takeoff-speeds/
- Prompt design of neural networks (learning how to talk to an AI) will be a superpower in the future. Adding "dramatic atmospheric ultra high definition free desktop wallpaper" to the prompt for CLIP produces much more realistic images: https://ai-weirdness.ghost.io/the-art-of-asking-nicely/
- There will be a dope Ethereum documentary: https://ethereumfilm.mirror.xyz/3SV8gLXHIW8Ot45h3RL7aOgDINxN2hjLfFVOvyatB2A
- Funniest AI blog post I've ever read: https://blog.eleuther.ai/year-one/
- How to apply for an insane amount of free TPUs as an ML Engineer: https://blog.gpt4.org/jaxtpu
- Building Europe’s AGI: https://www.aleph-alpha.de/
- Putting the power of AlphaFold into the world’s hands: https://deepmind.com/blog/article/putting-the-power-of-alphafold-into-the-worlds-hands
June 2021
- https://breakthroughinitiatives.org/ → Research initiative that wants to reach Alpha Centauri by 2060 via ultra-light uncrewed spacecraft travelling at 20% of the speed of light
- https://jacobjackson.com/cross-entropy/ → Inexperienced engineers tend to undervalue simplicity → a justification of the cross-entropy loss
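For reference, the loss the post is justifying fits in a few self-contained lines (a minimal sketch with my own naming, using the standard log-sum-exp stabilization):

```python
import numpy as np

def cross_entropy(logits, target):
    """Negative log-probability of the target class under a
    softmax over the logits."""
    z = logits - logits.max()                  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())    # log-softmax
    return -log_probs[target]
```

The simplicity is the point: one subtraction, one log-sum-exp, one lookup, and you have a proper scoring rule for classification.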