This links section is inspired by the ones from my favourite bloggers, such as gwern, guzey, or nintil. It presents a roughly up-to-date list of my most interesting reads of the last few months.
June 2023
- Large Language Models can Simulate Everything
- https://kliu.io/post/llms-can-simulate-everything/
- It might be time to build a General LLM Company — a virtual company of LLMs, with each “employee” specialized in a particular task.
- Large Language Models as Tool Makers
- https://arxiv.org/abs/2305.17126
- In a similar fashion to the recent Voyager paper
- Blockwise Parallel Transformer for Long Context Large Models:
- https://arxiv.org/abs/2305.19370
- Created the urge in me to go full Tri Dao et al. and write a custom kernel for this neat trick of applying blockwise computation to the feed-forward network as well
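As a toy illustration of that trick (my own NumPy sketch with assumed names and shapes, not the paper's fused kernel): since the feed-forward network acts on every sequence position independently, you can process the sequence in blocks and never materialize the full `(seq_len, d_ff)` intermediate activation at once.

```python
import numpy as np

def ffn(x, w1, w2):
    """Standard transformer feed-forward block (ReLU variant)."""
    return np.maximum(x @ w1, 0.0) @ w2

def blockwise_ffn(x, w1, w2, block_size=128):
    """Apply the FFN to the sequence in chunks of `block_size`
    positions, so only a (block_size, d_ff) intermediate lives
    in memory at any time -- the saving blockwise computation exploits."""
    outs = [ffn(x[i:i + block_size], w1, w2)
            for i in range(0, x.shape[0], block_size)]
    return np.concatenate(outs, axis=0)
```

Because the FFN is position-wise, the blockwise result is exactly equal to the full computation; the win is memory, not math.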
May 2023
- Jason Wei’s response to the “emergent abilities of LLMs are a mirage” arguments: https://www.jasonwei.net/blog/common-arguments-regarding-emergent-abilities
April 2023
- Scaffolded LLMs are not just cool toys but actually the substrate of a new type of general-purpose natural language computer
March 2023
- Is ChatGPT 175 Billion Parameters? Technical Analysis
- https://orenleung.super.site/is-chatgpt-175-billion-parameters-technical-analysis
- Interesting counterarguments in the comments: https://twitter.com/O42nl/status/1631820805972668416
- A step towards self-improving LLMs
- Alexey Guzey’s Lifehacks: https://guzey.com/lifehacks/
- huge L for Chomsky: https://scottaaronson.blog/?p=7094
- “like the Jesuit astronomers declining to look through Galileo’s telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn’t predict and that doesn’t fit their worldview.”
- The Waluigi Effect of LLMs: https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
- I stopped myself from reading the Waluigi post until today because I don’t really think it’s beneficial for the space to make up words that no one outside the LW sphere understands (even though this term is quite self-explanatory). But I have to admit it’s a really good post. Go check it out.
- Could you train a ChatGPT-beating model for $85,000 and run it in a browser?
July 2022
- The effective altruist work ethic and the spirit of utilitarianism https://www.dwarkeshpatel.com/p/ea-billionaires?r=2jrlm&s=w&utm_campaign=post&utm_medium=web
- The Track Record of Futurists Seems ... Fine https://www.cold-takes.com/the-track-record-of-futurists-seems-fine/
- Balaji’s new book, The Network State: https://thenetworkstate.com/
June 2022
- Gwern’s GPT-3 2nd Anniversary predictions: https://www.reddit.com/r/mlscaling/comments/uznkhw/gpt3_2nd_anniversary/
- Check here for a summary: https://twitter.com/johannes_hage/status/1530898189162782721
April 2022
- DeepMind releases new scaling laws that contradict the ones from OpenAI: https://arxiv.org/abs/2203.15556
- PaLM - 540B parameter model by Google AI: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html
- DALL-E-2 by OpenAI: https://openai.com/dall-e-2/
- Second-order effects of the rise of large language models: https://twitter.com/russelljkaplan/status/1513128005828165634
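The headline result of the DeepMind scaling-laws paper above (Chinchilla) is often summarized by a rule of thumb: for compute-optimal training, scale data and parameters together, at roughly 20 training tokens per parameter, rather than growing the model much faster than the dataset. A toy calculator under that rule-of-thumb reading (function names are my own):

```python
def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
    # Rule-of-thumb reading of Hoffmann et al. 2022:
    # scale data linearly with model size, ~20 tokens per parameter.
    return n_params * tokens_per_param

def training_flops(n_params, n_tokens):
    # Common approximation: ~6 FLOPs per parameter per training token.
    return 6 * n_params * n_tokens

# e.g. a 70B-parameter model (Chinchilla's size) wants ~1.4T tokens
tokens = chinchilla_optimal_tokens(70e9)
```

Chinchilla itself (70B parameters, 1.4T tokens) matches this ratio, which is what made the earlier "scale parameters, keep data fixed" recipes look badly undertrained.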
March 2022
- George Hotz - Ride or Die: https://return.life/2022/03/07/george-hotz-comma-ride-or-die/
- Super realistic AI takeoff scenario by gwern, based on current models and scaling effects: https://www.lesswrong.com/posts/a5e9arCnbDac9Doig/it-looks-like-you-re-trying-to-take-over-the-world
- Examples of barbell strategies for everyday life: https://dwarkeshpatel.com/barbell-strategies/
- Deep Neural Nets - 33 years ago and 33 years from now → Andrej Karpathy reimplemented one of the first neural net papers by LeCun from 1989 and analysed if we made any fundamental progress: https://karpathy.github.io/2022/03/14/lecun1989/
- Directory of all Large Language Models: https://docs.google.com/spreadsheets/d/1gc6yse74XCwBx028HV_cvdxwXkmXejVjkO-Mz2uwE0k/edit#gid=0
- BigScience published several interesting blog posts this month on how they are training their 176B parameter language model: https://bigscience.huggingface.co/blog
February 2022
- Slate Star Codex analysis of AGI timelines: https://astralcodexten.substack.com/p/biological-anchors-a-trick-that-might
- Theses on Sleep by Alexey Guzey: https://guzey.com/theses-on-sleep/
- Why Tyler Cowen’s Emergent Ventures has been so successful at curating talent, and why the first batches of YC and the Thiel Fellowship were too: https://www.highmodernism.com/blog/talentcuration
January 2022
- Motivation for the roaring 20s. We choose to solve problems like alignment (+aging) not because they are easy but because they are hard!
December 2021
- Cool newsletter by Sonia Joseph: https://mirror.xyz/soniajoseph.eth/AsFhFt-JOjqdyb6GCVhcNCIK6whBIZpcu_XbSvpc6W8
- Sequence to understand the relationship between Progress Studies and Effective Altruism: https://www.highmodernism.com/sequence
- New AGI Workshop at Mila Quebec with a bunch of videos: https://sites.google.com/mila.quebec/scaling-laws-workshop/schedule
- WebGPT: Improving the factual accuracy of language models through web browsing: https://openai.com/blog/improving-factual-accuracy/
- Gopher: DeepMind’s 280B parameter model with new SOTAs across the board: https://deepmind.com/blog/article/language-modelling-at-scale
- First published work by Aleph Alpha: https://arxiv.org/pdf/2112.05253.pdf
- New developments on nuclear reactors in Wyoming! https://www.terrapower.com/natrium-demo-kemmerer-wyoming/
October 2021
- Deep Learning Diminishing Returns → Important piece, but I don't agree with a lot of it
- Whole Brain Emulation → No Progress on C. elegans After 10 Years:
- https://www.lesswrong.com/posts/mHqQxwKuzZS69CXX5/whole-brain-emulation-no-progress-on-c-elgans-after-10-years
- Also check out the Reddit discussion about the WBE topic:
- For fellow ML Engineers - How to Train Really Large Models on Many GPUs?
- Your chance to invest in a longevity company that went through YC with a reasonable valuation:
- How to Train Large Deep Learning Models as a Startup
- The Vitalik Buterin Fellowships in AI Existential Safety:
- 530B parameter language model by Microsoft + NVIDIA: https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
- Also check out this awesome thread about the model: https://twitter.com/BlancheMinerva/status/1447560921530896389
- State of AI Report 2021: https://www.stateof.ai/
- Bryan Johnson measuring all his 70+ organs to maximally reverse the quantified biological age of each:
- Awesome video about the Scaling Laws: https://www.youtube.com/watch?v=StLtMcsbQes
- Just Ask for Generalization by Eric Jang: https://evjang.com/2021/10/23/generalization.html
September 2021
- In What Sense is Matter 'Programmable'? → A lot of interesting ideas, inspired by David Deutsch’s The Beginning of Infinity: https://jaredtumiel.github.io/blog/2021/08/14/programmable-matter.html
- Neuralink co-founder Max Hodak is building a new company called Science: https://maxhodak.com/nonfiction/2021/09/03/science.html
- Summary of Sam Altman Q&A on AGI predictions and GPT-4: https://www.lesswrong.com/posts/aihztgJrknBdLHjd2/sam-altman-q-and-a-gpt-and-agi
August 2021
- New chip cluster that will make 120 trillion parameter models possible (almost 100x from GPT-3): https://www.wired.com/story/cerebras-chip-cluster-neural-networks-ai/
- Scott Alexander on AGI risks, including his super interesting answers in the comments: https://astralcodexten.substack.com/p/highlights-from-the-comments-on-acemoglu
- We need founder-led Biotech companies: https://www.pillar.vc/news/the-future-of-biotech-is-founder-led/
- If Einstein Had The Internet: An Interview With Balaji Srinivasan:
July 2021
- Must read on the arguments for a slow AGI takeoff: https://sideways-view.com/2018/02/24/takeoff-speeds/
- Prompt design of neural networks (learning how to talk to an AI) will be a superpower in the future. Adding "dramatic atmospheric ultra high definition free desktop wallpaper" to the prompt for CLIP produces much more realistic images: https://ai-weirdness.ghost.io/the-art-of-asking-nicely/
- There will be a dope Ethereum documentary: https://ethereumfilm.mirror.xyz/3SV8gLXHIW8Ot45h3RL7aOgDINxN2hjLfFVOvyatB2A
- Funniest AI blog post I've ever read: https://blog.eleuther.ai/year-one/
- How to apply for an insane amount of free TPUs as an ML Engineer: https://blog.gpt4.org/jaxtpu
- Building Europe’s AGI: https://www.aleph-alpha.de/
- Putting the power of AlphaFold into the world’s hands: https://deepmind.com/blog/article/putting-the-power-of-alphafold-into-the-worlds-hands
June 2021
- https://breakthroughinitiatives.org/ → Research initiative that wants to reach Alpha Centauri by 2060 via ultra-light uncrewed spacecraft travelling at 20% of the speed of light
- https://jacobjackson.com/cross-entropy/ → Inexperienced engineers tend to undervalue simplicity → a justification of the cross-entropy loss
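For reference, the loss the post is justifying fits in a few self-contained lines (a minimal sketch with my own naming, using the standard log-sum-exp stabilization):

```python
import numpy as np

def cross_entropy(logits, target):
    """Negative log-probability of the target class under a
    softmax over the logits."""
    z = logits - logits.max()                  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())    # log-softmax
    return -log_probs[target]
```

The simplicity is the point: one subtraction, one log-sum-exp, one lookup, and you have a proper scoring rule for classification.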