Why do deep networks need shortcuts?

Residual connections helped very deep networks learn useful changes without forcing every layer to rebuild the whole signal.

On this page

  • Why simply stacking layers can fail
  • How residual learning changes the training problem
  • Why ResNet mattered beyond one benchmark

Introduction

As neural networks became deeper, researchers expected performance to keep improving. Instead, they discovered a surprising problem: simply adding more layers could make a network harder to train and sometimes even less accurate. Skip connections, popularised by the 2015 ResNet (Residual Network) architecture, solved much of this optimisation difficulty by giving information and gradients a direct path through the network. Rather than forcing every layer to completely transform its input, skip connections allowed layers to focus on learning only the useful changes. This seemingly simple architectural adjustment made it practical to train networks with dozens, hundreds, and eventually even more than a thousand layers, helping unlock many later advances in artificial intelligence. [arXiv]

Why simply stacking layers can fail

The intuitive idea behind deep learning is that more layers should allow a network to learn more complex representations. However, early experiments revealed a problem known as the degradation problem. Researchers found that beyond a certain depth, adding layers could increase training error rather than reduce it. Importantly, this was not merely a matter of overfitting; even on the training data itself, deeper networks could become harder to optimise. [arXiv; cv-foundation.org]

One reason is that information and error signals must travel through many transformations during training. As gradients are propagated backwards, they can become weak or unstable, making it difficult for earlier layers to learn effectively. Deep networks also face a practical optimisation challenge: each additional layer increases the complexity of the function that training must discover. [arXiv; GeeksforGeeks]

A useful thought experiment illustrates the issue. Suppose a 20-layer network already performs well. A 30-layer version should, in theory, be able to achieve at least the same result by letting the extra ten layers do nothing. Yet standard architectures often failed to find this simple solution during training. The deeper model could therefore perform worse even though it contained all the capabilities of the shallower one. This observation became one of the central motivations for residual learning. [arXiv]
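The thought experiment can be made concrete with a small sketch. The code below is an illustration only: the two-layer "shallow" network, its weights, and the helper names (`relu`, `linear`, `forward`) are made up for this example and do not come from the ResNet paper. It shows that a deeper network containing the same layers plus ten identity layers computes exactly the same output, so the good solution exists; the difficulty described above is that training a plain network often fails to find it.

```python
# Pure-Python sketch (no external libraries). All weights are hypothetical.

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def linear(weights, v):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def identity_matrix(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def forward(layers, v):
    """Apply each layer as relu(W · v), in order."""
    for weights in layers:
        v = relu(linear(weights, v))
    return v

# A small "shallow" network with arbitrary example weights.
shallow = [
    [[0.5, -1.0], [1.5, 2.0]],
    [[1.0, 0.25], [-0.5, 1.0]],
]

# A "deeper" network: the same layers followed by ten identity layers.
# Activations after a ReLU are non-negative, so relu(I · v) leaves them unchanged.
deep = shallow + [identity_matrix(2) for _ in range(10)]

x = [1.0, 2.0]
print(forward(shallow, x))  # [1.375, 5.5]
print(forward(deep, x))     # identical to the shallow output
```

Here the identity weights are written in by hand. A plain network would have to discover them through gradient descent, which is the step that tended to fail in practice.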

How residual learning changes the training problem

The key insight behind ResNet was that layers do not always need to learn an entirely new representation. Often, they only need to make a small adjustment to what already exists.

Instead of asking a stack of layers to learn a complete mapping from input to output, residual learning asks them to learn the difference between the desired output and the original input. In the ResNet formulation, a shortcut path carries the input forward unchanged while the main path learns a residual function. The final output is obtained by combining the two. [arXiv]

A simplified view of a residual block is:

y = F(x) + x

Here, the shortcut contributes the original signal x, while the learned component F(x) only has to represent what should change. If the best action is effectively to leave the signal alone, the residual function can approach zero and the block behaves close to an identity mapping. This is far easier for optimisation algorithms to discover than forcing several layers to learn the identity transformation from scratch. [arXiv; ICML]
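A minimal sketch of this behaviour follows, assuming a residual branch of the form F(x) = W2 · relu(W1 · x). The function names and weights are hypothetical, and the block omits details of real ResNet blocks such as convolutions, normalisation, and the activation applied after the addition. With all residual weights at zero, the residual block returns its input unchanged, while the same layers without a shortcut collapse the signal to zero.

```python
# Pure-Python sketch of y = F(x) + x. Names and weights are illustrative only.

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def linear(weights, v):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def plain_block(w1, w2, x):
    """Two stacked layers with no shortcut: returns F(x) only."""
    return linear(w2, relu(linear(w1, x)))

def residual_block(w1, w2, x):
    """The same two layers plus an identity shortcut: returns F(x) + x."""
    fx = plain_block(w1, w2, x)
    return [f + a for f, a in zip(fx, x)]

zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [3.0, -1.0]

print(residual_block(zeros, zeros, x))  # [3.0, -1.0]: the block acts as the identity
print(plain_block(zeros, zeros, x))     # [0.0, 0.0]: the input signal is lost
```

In the residual form, "do nothing" corresponds to weights near zero, which is an easy point for an optimiser to reach. In the plain form, the same behaviour would require the weights to reproduce an identity mapping through a nonlinearity.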

The shortcut connection also creates a more direct route for gradients during backpropagation. Error signals can flow through the identity path without being repeatedly distorted by every weight layer, making training more stable as depth increases. Later theoretical analyses linked this behaviour to improved gradient preservation and more reliable optimisation in very deep networks. [arXiv; abhik.ai]
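The gradient argument can be illustrated with a deliberately simplified scalar model. Assume each layer is a scalar function f with local slope f'(h). For a plain chain h -> f(h), the chain rule multiplies the slopes together. For a residual chain h -> h + f(h), each factor becomes 1 + f'(h). The slope value of 0.01 and the depth of 50 below are arbitrary choices for illustration, not measurements from any real network.

```python
# Scalar chain-rule sketch. The slope values are hypothetical.

def plain_gradient(local_slopes):
    """Gradient through a chain of layers h -> f(h): product of the local slopes."""
    grad = 1.0
    for slope in local_slopes:
        grad *= slope
    return grad

def residual_gradient(local_slopes):
    """Gradient through a chain of layers h -> h + f(h): product of (1 + slope)."""
    grad = 1.0
    for slope in local_slopes:
        grad *= 1.0 + slope
    return grad

# Fifty layers whose learned parts each have a small local slope.
slopes = [0.01] * 50

plain = plain_gradient(slopes)        # shrinks towards zero
residual = residual_gradient(slopes)  # stays on the order of 1

print(f"plain chain gradient:    {plain:.3e}")
print(f"residual chain gradient: {residual:.3f}")
```

The constant 1 contributed by the shortcut keeps the product from collapsing when the learned slopes are small. This sketch does not show that residual networks are immune to unstable gradients: if the slopes were large and positive, the residual product would grow instead, which is one reason real architectures also rely on careful initialisation and normalisation.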

What the shortcut is actually doing

A common misunderstanding is that skip connections allow the network to skip learning. In reality, they make learning more selective.

The residual branch can still learn complex transformations when they are useful. The shortcut simply ensures that information already present in the input is not unnecessarily destroyed or reconstructed. Layers can therefore concentrate on refining features rather than repeatedly rebuilding them. [cv-foundation.org; arXiv]

This changes the optimisation landscape. Instead of every layer carrying the burden of preserving useful information while simultaneously creating new features, the shortcut preserves the baseline signal and the learned branch focuses on improvement. The result is a network that is easier to train even as depth grows dramatically. [cv-foundation.org]

Why ResNet mattered beyond one benchmark

The original ResNet work demonstrated that residual learning could successfully train networks far deeper than those that had previously been practical. The architecture achieved leading results in the ImageNet image-recognition competition and showed that depth could continue to provide benefits when paired with the right optimisation strategy. [cv-foundation.org]

Its influence extended far beyond image classification. Residual connections became a standard design pattern across deep learning because they addressed a general optimisation problem rather than a task-specific one. Variants of the idea appeared in later computer vision systems, language models, reinforcement-learning systems, and scientific AI applications. Surveys of modern architectures consistently identify skip connections as one of the foundational innovations that enabled extremely deep neural networks. [arXiv]

Perhaps the most important legacy of skip connections is conceptual. They showed that improving neural networks is not only about adding more layers or more parameters. Sometimes the decisive breakthrough comes from changing the way information flows through a model. By allowing layers to learn useful corrections instead of complete reconstructions, residual learning turned depth from a liability into an advantage and made very deep networks practical for modern artificial intelligence. [arXiv; cv-foundation.org]

Further Reading

Deep Learning, by Ian Goodfellow, Yoshua Bengio et al. Covers optimisation challenges, gradient flow, and deep architectures.

Endnotes

  1. Source: arxiv.org
    Link: https://arxiv.org/abs/1512.03385
    Source snippet

    arXiv[1512.03385] Deep Residual Learning for Image RecognitionDecember 10, 2015 — by K He · 2015 · Cited by 315493 — We present a residua...

    Published: December 10, 2015

  2. Source: cv-foundation.org
    Title: He Deep Residual Learning CVPR 2016 paper
    Link: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf
    Source snippet

    We explicitly reformulate...Read more...

  3. Source: arxiv.org
    Link: https://arxiv.org/pdf/1512.03385
    Source snippet

    arXiv:1512.03385v1 [cs.CV] 10 Dec 2015December 10, 2015 — by K He · 2015 · Cited by 320310 — In this paper, we address the degradati...

    Published: December 10, 2015

  4. Source: arxiv.org
    Title: ResNet: Enabling Deep Convolutional Neural Networks
    Link: https://arxiv.org/pdf/2510.24036
    Source snippet

    ResNet: Enabling Deep Convolutional Neural Networks...October 28, 2025 — by X Liu · 2025 · Cited by 4 — Surprisingly, the addition...

    Published: October 28, 2025

  5. Source: geeksforgeeks.org
    Title: GeeksforGeeks: Residual Networks (ResNet)
    Link: https://www.geeksforgeeks.org/deep-learning/residual-networks-resnet-deep-learning/
    Source snippet

    Residual Networks (ResNet) - Deep Learning, May 12, 2026: Eases training of deep networks by allowing direct gradient flow through skip co...

    Published: May 12, 2026

  6. Source: arxiv.org
    Title: arXiv Norm-Preservation: Why Residual Networks Can Become Extremely Deep?
    Link: https://arxiv.org/abs/1805.07477

  7. Source: icml.cc
    Title: icml2016 tutorial deep residual networks kaiminghe
    Link: https://icml.cc/2016/tutorials/icml2016_tutorial_deep_residual_networks_kaiminghe.pdf
    Source snippet

    Deep Residual Networks“Deep Residual Learning for Image Recognition”. CVPR 2016. • If identity were optimal, easy to set weights as 0. •...

  8. Source: abhik.ai
    Title: skip connections
    Link: https://www.abhik.ai/concepts/deep-learning/skip-connections
    Source snippet

    in Neural Networks1 Apr 2024 — Skip Connections in Neural Networks. Summary: Learn how skip connections and residual learning enable trai...

  9. Source: arxiv.org
    Link: https://arxiv.org/html/2405.01725v1
    Source snippet

    Development of Skip Connection in Deep Neural Networks...2 May 2024 — This survey provides a comprehensive summary and outlook on t...

    Published: May 2024

  10. Source: resnet.us
    Link: https://www.resnet.us/about/us/
    Source snippet

    United States.Read more...

  11. Source: web.cs.ucdavis.edu
    Link: https://web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-winter2018/resnet.pdf
    Source snippet

    Page 2... Identity Mapping. If the “extra” layers are identity functions. The network...Read more...

  12. Source: people.csail.mit.edu
    Title: cvpr2016 deep residual learning kaiminghe
    Link: https://people.csail.mit.edu/kaiming/cvpr16resnet/cvpr2016_deep_residual_learning_kaiminghe.pdf
    Source snippet

    arXiv 2016. Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”.Read more...

  13. Source: mohitjain.me
    Link: https://mohitjain.me/2018/06/13/resnet/
    Source snippet

    Deep Residual Learning for Image Recognition (ResNet)13 Jun 2018 — The paper analysed what was causing the accuracy of deeper networks to...

  14. Source: community.deeplearning.ai
    Link: https://community.deeplearning.ai/t/resnet-identity-mapping/48515
    Source snippet

    identity mapping - Convolutional Neural NetworksOct 11, 2021 — Residual learning framework to ease the training of networks that are subs...

Additional References

  1. Source: irejournals.com
    Link: https://www.irejournals.com/formatedpaper/1703688.pdf
    Source snippet

    Deep Residual Learning for Image RecognitionDegradation Problem: The deeper networks at times visually showcase higher... Skip connectio...

  2. Source: semanticscholar.org
    Link: https://www.semanticscholar.org/paper/Deep-Residual-Learning-for-Image-Recognition-He-Zhang/2c03df8b48bf3fa39054345bafabfeff15bfd11d
    Source snippet

    Deep Residual Learning for Image RecognitionThis work presents a residual learning framework to ease the training of networks that are su...

  3. Source: huggingface.co
    Link: https://huggingface.co/learn/computer-vision-course/en/unit2/cnns/resnet
    Source snippet

    ResNet (Residual Network)ResNets introduce a concept called residual learning, which allows the network to learn the residuals (i.e., the...

  4. Source: medium.com
    Link: https://medium.com/%40kdwaMachineLearning/resnet-explained-how-skip-connections-saved-deep-learning-faed41c36418
    Source snippet

    ResNet Explained: How Skip Connections Saved Deep...Researchers introduced residual connections that skip one or more layers and add the...

  5. Source: medium.com
    Link: https://medium.com/%40tnodecode/resnet-e7e0cba19e04
    Source snippet

    ResNet. How skip connections enabled very deep…How skip connections enabled very deep networks and tackled problems like vanishing gradie...

  6. Source: researchgate.net
    Link: https://www.researchgate.net/publication/286512696_Deep_Residual_Learning_for_Image_Recognition
    Source snippet

    Deep Residual Learning for Image RecognitionWe present a residual learning framework to ease the training of networks that are substantia...

  7. Source: medium.com
    Link: https://medium.com/deepreview/review-of-identity-mappings-in-deep-residual-networks-ad6533452f33
    Source snippet

    Review of Identity Mappings in Deep Residual NetworksIn this paper, the authors investigate the nature of residual networks and the impac...

  8. Source: medium.com
    Link: https://medium.com/%40zilliz_learn/deep-residual-learning-for-image-recognition-0025592e3910

  9. Source: merriam-webster.com
    Link: https://www.merriam-webster.com/dictionary/residual
    Source snippet

    RESIDUAL Definition & Meaning1. of, relating to, or being a residue 2. leaving a residue that is effective for some time afterwardRead more...

  10. Source: viso.ai
    Link: https://viso.ai/deep-learning/resnet-residual-neural-network/
    Source snippet

    It is an innovative neural network architecture that was first introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and...Read more...
