Dive into Deep Learning
  • Preface
  • Installation
  • Notation
  • 1  Introduction
  • Preliminaries
    • 2  Preliminaries
    • 2.1  Data Manipulation
    • 2.2  Data Preprocessing
    • 2.3  Linear Algebra
    • 2.4  Calculus
    • 2.5  Automatic Differentiation
    • 2.6  Probability and Statistics
    • 2.7  Documentation
  • Linear Neural Networks
    • 3  Linear Neural Networks for Regression
    • 3.1  Linear Regression
    • 3.2  Object-Oriented Design for Implementation
    • 3.3  Synthetic Regression Data
    • 3.4  Linear Regression Implementation from Scratch
    • 3.5  Concise Implementation of Linear Regression
    • 3.6  Generalization
    • 3.7  Weight Decay
    • 4  Linear Neural Networks for Classification
    • 4.1  Softmax Regression
    • 4.2  The Image Classification Dataset
    • 4.3  The Base Classification Model
    • 4.4  Softmax Regression Implementation from Scratch
    • 4.5  Concise Implementation of Softmax Regression
    • 4.6  Generalization in Classification
    • 4.7  Environment and Distribution Shift
  • Multilayer Perceptrons
    • 5  Multilayer Perceptrons
    • 5.1  Multilayer Perceptrons
    • 5.2  Implementation of Multilayer Perceptrons
    • 5.3  Forward Propagation, Backward Propagation, and Computational Graphs
    • 5.4  Numerical Stability and Initialization
    • 5.5  Generalization in Deep Learning
    • 5.6  Dropout
    • 5.7  Predicting House Prices on Kaggle
  • Deep Learning Computation
    • 6  Builders’ Guide
    • 6.1  Layers and Modules
    • 6.2  Parameter Management
    • 6.3  Parameter Initialization
    • 6.4  Lazy Initialization
    • 6.5  Custom Layers
    • 6.6  File I/O
    • 6.7  GPUs
  • Convolutional Neural Networks
    • 7  Convolutional Neural Networks
    • 7.1  From Fully Connected Layers to Convolutions
    • 7.2  Convolutions for Images
    • 7.3  Padding and Stride
    • 7.4  Multiple Input and Multiple Output Channels
    • 7.5  Pooling
    • 7.6  Convolutional Neural Networks (LeNet)
    • 8  Modern Convolutional Neural Networks
    • 8.1  Deep Convolutional Neural Networks (AlexNet)
    • 8.2  Networks Using Blocks (VGG)
    • 8.3  Network in Network (NiN)
    • 8.4  Multi-Branch Networks (GoogLeNet)
    • 8.5  Batch Normalization
    • 8.6  Residual Networks (ResNet) and ResNeXt
    • 8.7  Densely Connected Networks (DenseNet)
    • 8.8  Designing Convolutional Network Architectures
  • Recurrent Neural Networks
    • 9  Recurrent Neural Networks
    • 9.1  Working with Sequences
    • 9.2  Converting Raw Text into Sequence Data
    • 9.3  Language Models
    • 9.4  Recurrent Neural Networks
    • 9.5  Recurrent Neural Network Implementation from Scratch
    • 9.6  Concise Implementation of Recurrent Neural Networks
    • 9.7  Backpropagation Through Time
    • 10  Modern Recurrent Neural Networks
    • 10.1  Long Short-Term Memory (LSTM)
    • 10.2  Gated Recurrent Units (GRU)
    • 10.3  Deep Recurrent Neural Networks
    • 10.4  Bidirectional Recurrent Neural Networks
    • 10.5  Machine Translation and the Dataset
    • 10.6  The Encoder–Decoder Architecture
    • 10.7  Sequence-to-Sequence Learning for Machine Translation
    • 10.8  Beam Search
  • Attention Mechanisms and Transformers
    • 11  Attention Mechanisms and Transformers
    • 11.1  Queries, Keys, and Values
    • 11.2  Attention Pooling by Similarity
    • 11.3  Attention Scoring Functions
    • 11.4  The Bahdanau Attention Mechanism
    • 11.5  Multi-Head Attention
    • 11.6  Self-Attention and Positional Encoding
    • 11.7  The Transformer Architecture
    • 11.8  Transformers for Vision
    • 11.9  Large-Scale Pretraining with Transformers
  • Optimization
    • 12  Optimization Algorithms
    • 12.1  Optimization and Deep Learning
    • 12.2  Convexity
    • 12.3  Gradient Descent
    • 12.4  Stochastic Gradient Descent
    • 12.5  Minibatch Stochastic Gradient Descent
    • 12.6  Momentum
    • 12.7  Adagrad
    • 12.8  RMSProp
    • 12.9  Adadelta
    • 12.10  Adam
    • 12.11  Learning Rate Scheduling
  • Computational Performance
    • 13  Computational Performance
    • 13.1  Compilers and Interpreters
    • 13.2  Asynchronous Computation
    • 13.3  Automatic Parallelism
    • 13.4  Hardware
    • 13.5  Training on Multiple GPUs
    • 13.6  Concise Implementation for Multiple GPUs
    • 13.7  Parameter Servers
  • Computer Vision
    • 14  Computer Vision
    • 14.1  Image Augmentation
    • 14.2  Fine-Tuning
    • 14.3  Object Detection and Bounding Boxes
    • 14.4  Anchor Boxes
    • 14.5  Multiscale Object Detection
    • 14.6  The Object Detection Dataset
    • 14.7  Single Shot Multibox Detection
    • 14.8  Region-based CNNs (R-CNNs)
    • 14.9  Semantic Segmentation and the Dataset
    • 14.10  Transposed Convolution
    • 14.11  Fully Convolutional Networks
    • 14.12  Neural Style Transfer
    • 14.13  Image Classification (CIFAR-10) on Kaggle
    • 14.14  Dog Breed Identification (ImageNet Dogs) on Kaggle
  • Natural Language Processing
    • 15  Natural Language Processing: Pretraining
    • 15.1  Word Embedding (word2vec)
    • 15.2  Approximate Training
    • 15.3  The Dataset for Pretraining Word Embeddings
    • 15.4  Pretraining word2vec
    • 15.5  Word Embedding with Global Vectors (GloVe)
    • 15.6  Subword Embedding
    • 15.7  Word Similarity and Analogy
    • 15.8  Bidirectional Encoder Representations from Transformers (BERT)
    • 15.9  The Dataset for Pretraining BERT
    • 15.10  Pretraining BERT
    • 16  Natural Language Processing: Applications
    • 16.1  Sentiment Analysis and the Dataset
    • 16.2  Sentiment Analysis: Using Recurrent Neural Networks
    • 16.3  Sentiment Analysis: Using Convolutional Neural Networks
    • 16.4  Natural Language Inference and the Dataset
    • 16.5  Natural Language Inference: Using Attention
    • 16.6  Fine-Tuning BERT for Sequence-Level and Token-Level Applications
    • 16.7  Natural Language Inference: Fine-Tuning BERT
  • Advanced Topics
    • 17  Reinforcement Learning
    • 17.1  Markov Decision Process (MDP)
    • 17.2  Value Iteration
    • 17.3  Q-Learning
    • 18  Gaussian Processes
    • 18.1  Introduction to Gaussian Processes
    • 18.2  Gaussian Process Priors
    • 18.3  Gaussian Process Inference
    • 19  Hyperparameter Optimization
    • 19.1  What Is Hyperparameter Optimization?
    • 19.2  Hyperparameter Optimization API
    • 19.3  Asynchronous Random Search
    • 19.4  Multi-Fidelity Hyperparameter Optimization
    • 19.5  Asynchronous Successive Halving
    • 20  Generative Adversarial Networks
    • 20.1  Generative Adversarial Networks
    • 20.2  Deep Convolutional Generative Adversarial Networks
    • 21  Recommender Systems
    • 21.1  Overview of Recommender Systems
    • 21.2  The MovieLens Dataset
    • 21.3  Matrix Factorization
    • 21.4  AutoRec: Rating Prediction with Autoencoders
    • 21.5  Personalized Ranking for Recommender Systems
    • 21.6  Neural Collaborative Filtering for Personalized Ranking
    • 21.7  Sequence-Aware Recommender Systems
    • 21.8  Feature-Rich Recommender Systems
    • 21.9  Factorization Machines
    • 21.10  Deep Factorization Machines
  • Appendix
    • 22  Appendix: Mathematics for Deep Learning
    • 22.1  Geometry and Linear Algebraic Operations
    • 22.2  Eigendecompositions
    • 22.3  Single Variable Calculus
    • 22.4  Multivariable Calculus
    • 22.5  Integral Calculus
    • 22.6  Random Variables
    • 22.7  Maximum Likelihood
    • 22.8  Distributions
    • 22.9  Naive Bayes
    • 22.10  Statistics
    • 22.11  Information Theory
    • 23  Appendix: Tools for Deep Learning
    • 23.1  Using Jupyter Notebooks
    • 23.2  Using Amazon SageMaker
    • 23.3  Using AWS EC2 Instances
    • 23.4  Using Google Colab
    • 23.5  Selecting Servers and GPUs
    • 23.6  Contributing to This Book
    • 23.7  Utility Functions and Classes
    • 23.8  The d2l API Document
  • References

Dive into Deep Learning

Aston Zhang

Zachary C. Lipton

Mu Li

Alexander J. Smola

2026-05-24

Interactive deep learning, with code, math, and discussions — implemented in PyTorch, JAX, TensorFlow, and MXNet.

By Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola · Adopted at 500+ universities in 70+ countries · Published by Cambridge University Press.


One book, four frameworks

Every example runs end-to-end in your framework of choice. Switch tabs to see the same idea in idiomatic code for each.

  • PyTorch
  • JAX
  • TensorFlow
  • MXNet

What you get

Code you can run

Every concept ships with executable Jupyter notebooks. Tweak hyperparameters and see the effect immediately.

Math grounded in intuition

Derivations stay close to the code. Equations, figures, and prose are interwoven, not relegated to appendices.

Truly multi-framework

The same chapter, the same explanations, in PyTorch, JAX, TensorFlow, and MXNet. Pick your framework, keep the book.

Runs anywhere

Local Jupyter, Google Colab, Amazon SageMaker Studio Lab, or your own GPU box. No paywalls, no setup hurdles.

Classroom-tested

Used as a primary or supplementary text at 500+ universities. Slide decks, exercises, and a discussion forum included.

Always free, always evolving

The book is fully open-source. New chapters and corrections land continuously, in step with the field.

Authors

Aston Zhang (AWS)
Zachary C. Lipton (Carnegie Mellon University)
Mu Li (Boson AI · AWS)
Alexander J. Smola (Boson AI · CMU)

Chapter contributors

Specialist authors who led the writing of individual chapters in the second volume.

Reinforcement Learning

Pratik Chaudhari (UPenn · Amazon)
Rasool Fakoor (Amazon)
Kavosh Asadi (Amazon)

Gaussian Processes

Andrew Gordon Wilson (NYU · Amazon)

Hyperparameter Optimization

Aaron Klein (Amazon)
Matthias Seeger (Amazon)
Cedric Archambeau (Amazon)

Recommender Systems

Shuai Zhang (Amazon)
Yi Tay (Google)

Mathematics for Deep Learning

Brent Werness (Amazon)
Rachel Hu (Amazon)

Framework adaptation leads

Driving the per-framework code: porting every example to PyTorch, JAX, and TensorFlow.

Anirudh Dagar: PyTorch & JAX (Amazon)
Yuan Tang: TensorFlow (Akuity)

Adopted at universities worldwide

500+ universities in 70+ countries teach with Dive into Deep Learning.

Abasyn University Islamabad Campus; Ain Shams University; Alexandria University; Amity University; Andhra University; Anna University; Ateneo de Naga University; Australian Catholic University; Australian National University; Bar Ilan University; Barnard College; Beijing Institute of Mathematical Sciences and Applications; Beykoz University; Birla Institute of Technology and Science, Hyderabad; Birla Institute of Technology and Science Pilani; BITS Pilani; BML Munjal University; Boston College; Boston University; Brac University; Brandeis University; Brown University; Bursa Uludag University; Cairo University; Cankaya University; Carnegie Mellon University; Chennai Mathematical Institute; China Ocean University; Chinese University of Hong Kong; Chinese University of Hong Kong Shenzhen; Chongqing University of Science and Technology; Christ University Bengaluru; Chulalongkorn University; City University of Hong Kong; College of Engineering Pune; Colorado State University; Columbia University; Concordia University; Cornell University; Cyprus Institute; Dalian University of Technology; Dayananda Sagar University; Deakin University; DePaul University; Diponegoro University; Duke University; Durban University of Technology; East China Normal University; Eastern Mediterranean University; Ecole Nationale Superieure d'Informatique; Emory University; Eötvös Loránd University (ELTE), Budapest; EPFL; Escuela Politecnica Nacional; Escuela Superior Politecnica del Litoral; Ewha Womans University; Federal University Lokoja; Fenerbahce University; Feng Chia University; FPT University; Fudan University; Gayatri Vidya Parishad College of Engineering (Autonomous); Gazi Universitesi; George Mason University; Georgetown University; Georgia Institute of Technology; Goa University; Golden Gate University; Great Lakes Institute of Management; Guangdong University of Technology; Gwangju Institute of Science and Technology; Habib University; Hamad Bin Khalifa University; Hangzhou Dianzi University; Hankuk University of Foreign Studies; Harare Institute of Technology;
Harbin Institute of Technology; Harokopio University; Harvard University; Hasso Plattner Institut; Hebrew University of Jerusalem; Heilongjiang University of Science and Technology; Heinrich Heine Universitat Dusseldorf; Hertie School; Hiroshima University; Ho Chi Minh City University of Foreign Languages and Information Technology; Hochschule Bremen; Hochschule fur Technik und Wirtschaft; Hong Kong Polytechnic University; Hong Kong University of Science and Technology; Huazhong University of Science and Technology; IIT Bombay; IIT Delhi; IIT Gandhinagar; IIT Guwahati; IIT Jodhpur; IIT Kanpur; IIT Madras; Imperial College London; IMT Mines Ales; Indian Institute of Information Technology Lucknow; Indian Institute of Information Technology Una; Indian Institute of Science Bangalore; Indian Institute of Science IISc Bangalore; Indian Institute of Technology Bombay; Indian Institute of Technology Delhi; Indian Institute of Technology Gandhinagar; Indian Institute of Technology Guwahati; Indian Institute of Technology Hyderabad; Indian Institute of Technology Jodhpur; Indian Institute of Technology Kanpur; Indian Institute of Technology Kharagpur; Indian Institute of Technology Madras; Indian Institute of Technology Mandi; Indian Institute of Technology Patna; Indian Institute of Technology Ropar; Indira Gandhi National Open University; Indraprastha Institute of Information Technology, Delhi; Information Technology University Lahore; Institut catholique d'arts et metiers (ICAM); Institut de recherche en informatique de Toulouse; Institut Superieur d'Informatique et des Techniques de Communication; Institut Superieur De L'electronique Et Du Numerique; Institut Teknologi Bandung; Instituto Politécnico Nacional; Instituto Tecnologico Autonomo de Mexico; Instituto Tecnologico de Buenos Aires; Islamic University of Medina; Istanbul Atlas University; Istanbul Teknik Universitesi; Istinye University; IT Universitetet i Kobenhavn; Jeonbuk National University; Johns Hopkins University; Kansas State University; Keio University;
Kennesaw State University; King Abdullah University of Science and Technology; King Fahd University of Petroleum and Minerals; King Faisal University; Kongu Engineering College; Korea Aerospace University; KPR Institute of Engineering and Technology; KU Leuven; Kyung Hee University; Kyungpook National University; Lahore University of Management Sciences; Lancaster University; Leading University; Leibniz Universitat Hannover; Leuphana University of Luneburg; Liuzhou Institute of Technology; London School of Economics & Political Science; London School of Economics and Political Science; Make School; Mar Baselios College of Engineering and Technology; Marmara University; Masaryk University; Massachusetts Institute of Technology; McGill University; MEF University; Menoufia University; Michigan State University; Middlebury College; Milwaukee School of Engineering; Minia University; Mohammed V University in Rabat; Monash University; Multimedia University; Murdoch University; Nagoya University; Nanchang Hangkong University; Nanjing Normal University Zhongbei College; Nanjing University; Nanjing University of Finance and Economics; National Chengchi University; National Chung Hsing University; National Institute of Technical Teachers Training & Research; National Institute of Technology, Warangal; National Institute of Technology Delhi; National Institute of Technology Kurukshetra; National Institute of Technology Rourkela; National Sun Yat-sen University; National Taiwan University; National Technical University of Athens; National Technical University of Ukraine; National United University; National University of Sciences and Technology; National University of Singapore; Nazarbayev University; New Jersey Institute of Technology; New York University; Newman University; Noida Institute of Engineering and Technology; North Carolina State University; North Ossetian State University; Northeastern University; Northwestern University; NRI Institute of Technology; Ohio University; Oita University; Ontario Tech University; Pakuan University;
Peking University; Pennsylvania State University; Pohang University of Science and Technology; Politecnico di Milano; Pomona College; Pontificia Universidad Catolica de Chile; Pontificia Universidad Catolica del Peru; Portland State University; Prasad V. Potluri Siddhartha Institute of Technology; Purdue University; Quaid e Azam University; Queen's University; Radboud Universiteit; Rensselaer Polytechnic Institute; Rikkyo University; Rowan University; Rutgers, The State University of New Jersey; RV University Bengaluru; Sant Longowal Institute of Engineering Technology; Santa Clara University; Sanya University; Sapienza Universita di Roma; Seoul National University; Seoul National University of Science and Technology; Shahid Beheshti University; Shandong University; Shanghai Jiao Tong University; Shanghai University of Electric Power; Shanghai University of Finance and Economics; ShanghaiTech University; Sharif University of Technology; Shenzhen University; Shivaji University Kolhapur; Simon Fraser University; Singapore University of Technology and Design; Skolkovo Institute of Science and Technology; Sogang University; Sookmyung Women's University; Southern New Hampshire University; Southern Utah University; St. Polten University of Applied Sciences;
Stanford University; State University of New York at Binghamton; Stellenbosch University; Stevens Institute of Technology; Stony Brook University; Sungkyunkwan University; Technion - Israel Institute of Technology; Technische Universitat Berlin; Technische Universiteit Delft; Tecnológico de Monterrey, Campus Guadalajara; Tekirdag Namik Kemal Universitesi; Texas A&M University; Texas Christian University; Thapar Institute of Engineering and Technology; Tsinghua University; Tufts University; Tunghai University; Umea University; United International University; Universidad Carlos III de Madrid; Universidad de Chile; Universidad de Ibagué; Universidad de Ingeniería y Tecnología; Universidad de Salamanca; Universidad de Zaragoza; Universidad del Norte; Universidad Icesi; Universidad Militar Nueva Granada; Universidad Nacional Agraria La Molina; Universidad Nacional Autonoma de Mexico; Universidad Nacional de Colombia Sede Manizales; Universidad Nacional de Tierra del Fuego; Universidad Politécnica Salesiana, Cuenca; Universidad Rafael Landívar; Universidad Rey Juan Carlos; Universidad San Francisco de Quito; Universidad Tecnológica Nacional; Universidad Tecnologica de Pereira; Universidade Catolica de Brasilia; Universidade de Coimbra; Universidade Estadual de Campinas; Universidade Federal de Goiás; Universidade Federal de Minas Gerais; Universidade Federal de Ouro Preto; Universidade Federal de Pernambuco; Universidade Federal de São Carlos; Universidade Federal de Viçosa; Universidade Federal do Pampa; Universidade Federal do Rio Grande; Universidade Lusófona; Universidade NOVA de Lisboa; Universidade Presbiteriana Mackenzie; Universidade Tecnológica Federal do Paraná; Università degli Studi di Firenze; Università degli Studi di Pavia; Universita Bocconi; Universita degli Studi di Bari Aldo Moro; Universita degli Studi di Brescia; Universita degli Studi di Catania; Universita degli Studi di Padova; Università degli Studi di Roma Tor Vergata; Università degli Studi di Bologna; Universitas Andalas, Padang;
Universitas Indonesia; Universitas Negeri Yogyakarta; Universitas Udayana; Universitat de Barcelona; Universitat Heidelberg; Universitat Politecnica de Catalunya; Universitat Tubingen; Universitatea Babes Bolyai; Universitatea de Vest din Timisoara; Universite Cote d'Azur; Universite de technologie de Compiegne; Universite Paris Saclay; Universiteit Leiden; Universiteit van Amsterdam; University of Arkansas; University of Augsburg; University of Baghdad; University of Bamberg; University of British Columbia; University of Cagliari; University of California, Berkeley; University of California, Irvine; University of California, Los Angeles; University of California, San Diego; University of California, Santa Barbara; University of California, Santa Cruz; University of Cambridge; University of Canberra; University of Catania; University of Central Florida; University of Chinese Academy of Sciences; University of Cincinnati; University of Colorado Denver; University of Connecticut; University of Copenhagen; University of Florence; University of Florida; University of Ghana; University of Groningen; University of Hamburg; University of Hull; University of Iceland; University of Idaho; University of Illinois at Chicago; University of Illinois at Urbana Champaign; University of Illinois Urbana Champaign; University of International Business and Economics; University of Juba; University of Jyväskylä; University of Kentucky; University of Klagenfurt; University of Liege; University of Louisville; University of Maryland; University of Maryland Baltimore County; University of Melbourne; University of Michigan; Università degli Studi di Milano-Bicocca; University of Minnesota, Twin Cities; University of Moratuwa; University of Mosul; University of Nebraska-Lincoln; University of Nevada, Reno; University of New Hampshire; University of New South Wales; University of Newcastle; University of North Carolina at Chapel Hill; University of North Texas; University of Northern Philippines; University of Nottingham; University of Oklahoma;
University of Oslo; University of Pennsylvania; University of Pittsburgh; University of Rhode Island; University of Rochester; University of Sao Paulo; University of Science and Technology of China; University of South Australia; University of Southern California; University of Southern Maine; University of St Andrews; University of Sydney; University of Szeged; University of Technology Sydney; University of Tehran; University of Texas at Austin; University of Texas at Dallas; University of Toronto; University of Virginia; University of Warsaw; University of Washington; University of Waterloo; University of Wisconsin; University of Wisconsin Madison; Univerzita Komenskeho v Bratislave; Van Yuzuncu Yil University; Vardhaman College of Engineering; Vardhman Mahaveer Open University; Vietnamese German University; Virginia Tech; VIT (Vellore Institute of Technology), Vellore; VNU-HCM University of Science; Wageningen University; West Virginia University; Western University; Xavier University Bhubaneswar; Xi'an Jiaotong Liverpool University; Xiamen University; Yale University; Yeshiva University; Yildiz Technical University; Yonsei University; Yunnan University; Zhejiang University; Zhengzhou Tourism College

What people are saying

In a way that strikes the perfect balance between hands-on learning and mathematical rigor, this book is the most accessible and resourceful guide to deep learning we currently have.

Course adopter, R1 university

The notebooks make it easy to get students from zero to a working model in a single lecture. The math is there when you want it and stays out of the way when you don't.

Instructor, graduate ML course

I switched from PyTorch to JAX mid-semester and didn't have to switch textbooks. That alone is unheard of.

Researcher, industry lab

Cite the book

@book{zhang2023dive,
  title     = {Dive into Deep Learning},
  author    = {Zhang, Aston and Lipton, Zachary C. and Li, Mu and Smola, Alexander J.},
  publisher = {Cambridge University Press},
  note      = {\url{https://D2L.ai}},
  year      = {2023}
}

Resources

  • Read the book: start with the preface
  • PDF (PyTorch): single-file download
  • Source on GitHub: notebooks & library
  • Courses: slides & videos
  • Discussion forum: per-chapter Q&A
  • Chinese edition: 中文版

Global university adoption

World map showing universities teaching Dive into Deep Learning