Jürgen Schmidhuber

Jürgen Schmidhuber (born 17 January 1963)[1] is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland.[2] He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.[3]

Jürgen Schmidhuber
Schmidhuber speaking at the AI for GOOD Global Summit in 2017
Born: 17 January 1963[1]
Alma mater: Technical University of Munich
Known for: Long short-term memory, Gödel machine, artificial curiosity, meta-learning
Scientific career
Fields: Artificial intelligence
Institutions: Dalle Molle Institute for Artificial Intelligence Research
Website: people.idsia.ch/~juergen

He is best known for his foundational and highly cited[4] work on long short-term memory (LSTM), a type of recurrent neural network architecture that became the dominant technique for various natural language processing tasks in research and commercial applications in the 2010s.

Career

Schmidhuber completed his undergraduate studies (1987) and PhD (1991) at the Technical University of Munich in Munich, Germany.[1] His PhD advisors were Wilfried Brauer and Klaus Schulten.[5] He taught there from 2004 until 2009. From 2009[6] until 2021, he was a professor of artificial intelligence at the Università della Svizzera Italiana in Lugano, Switzerland.[1]

He has served as the director of the Dalle Molle Institute for Artificial Intelligence Research (IDSIA), a Swiss AI lab, since 1995.[1]

In 2014, Schmidhuber formed a company, Nnaisense, to work on commercial applications of artificial intelligence in fields such as finance, heavy industry and self-driving cars. Sepp Hochreiter, Jaan Tallinn, and Marcus Hutter are advisers to the company.[2] Sales were under US$11 million in 2016; however, Schmidhuber states that the current emphasis is on research and not revenue. Nnaisense raised its first round of capital funding in January 2017. Schmidhuber's overall goal is to create an all-purpose AI by training a single AI in sequence on a variety of narrow tasks.[7]

Research

In the 1980s, backpropagation did not work well for deep learning with long credit assignment paths in artificial neural networks. To overcome this problem, Schmidhuber (1991) proposed a hierarchy of recurrent neural networks (RNNs) pre-trained one level at a time by self-supervised learning.[8] The hierarchy uses predictive coding to learn internal representations at multiple self-organizing time scales, which can substantially facilitate downstream deep learning. The hierarchy can be collapsed into a single RNN by distilling a higher-level chunker network into a lower-level automatizer network.[8][9] In 1993, a chunker solved a deep learning task whose depth exceeded 1000.[10]
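The history-compression principle behind this hierarchy can be illustrated with a minimal sketch (toy Python/numpy code under assumed, much-simplified conditions; the learned lower-level RNN is replaced here by a trivial fixed predictor): only inputs that the lower level fails to predict are passed up, so the higher level receives a compressed description of the sequence.

    import numpy as np

    rng = np.random.default_rng(0)
    sequence = rng.integers(0, 2, size=20)             # toy binary input stream

    def low_level_predict(prev):
        # trivial stand-in for the lower-level network ("automatizer"):
        # predict that the next symbol repeats the previous one
        return prev

    unexpected = []                                    # events passed up to the higher-level "chunker"
    for t in range(1, len(sequence)):
        if low_level_predict(sequence[t - 1]) != sequence[t]:
            unexpected.append((t, int(sequence[t])))   # only prediction failures reach the higher level

    print(len(sequence), "inputs ->", len(unexpected), "unexpected events for the higher level")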

In 1991, Schmidhuber published adversarial neural networks that contest with each other in the form of a zero-sum game, where one network's gain is the other network's loss.[11][12][13] The first network is a generative model that models a probability distribution over output patterns. The second network learns by gradient descent to predict the reactions of the environment to these patterns. This principle was called "artificial curiosity". In 2014, it was used in a generative adversarial network (GAN), where the environmental reaction is 1 or 0 depending on whether the first network's output is in a given set; such networks can be used to create realistic deepfakes.
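A minimal sketch of this zero-sum setup (toy Python/numpy code with assumed shapes and random weights, not the original 1991 or 2014 implementations): one network generates a pattern, the second predicts the environment's binary reaction to it, and the predictor's loss serves as the generator's reward.

    import numpy as np

    rng = np.random.default_rng(0)

    def generator(z, W):                 # first network: maps noise to an output pattern
        return np.tanh(W @ z)

    def predictor(x, v):                 # second network: predicts probability of reaction 1
        return 1.0 / (1.0 + np.exp(-(v @ x)))

    W = rng.normal(size=(4, 2))          # toy generator weights
    v = rng.normal(size=4)               # toy predictor weights

    z = rng.normal(size=2)
    x = generator(z, W)
    p = predictor(x, v)

    reaction = 1.0 if x.sum() > 0 else 0.0                        # "environment": is the pattern in a given set?
    predictor_loss = -(reaction * np.log(p) + (1 - reaction) * np.log(1 - p))
    generator_reward = predictor_loss                             # zero-sum: one network's loss is the other's gain
    print(f"predictor loss {predictor_loss:.3f} = generator reward {generator_reward:.3f}")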

Schmidhuber may be best known for his early research on the long short-term memory (LSTM), a type of recurrent neural network. The LSTM was developed by his student Sepp Hochreiter and initially reported in Hochreiter's 1991 diploma thesis, which analyzed and overcame the famous vanishing gradient problem.[14] The name LSTM was introduced in a tech report (1995), leading to the most cited LSTM publication (1997), co-authored by Hochreiter and Schmidhuber.[15] With Hochreiter, his other students, including Felix Gers, Fred Cummins, and Alex Graves, and others, Schmidhuber published increasingly sophisticated versions of the LSTM.

The standard LSTM architecture, which is used in almost all current applications, was introduced in 2000.[16] Today's "vanilla LSTM" using backpropagation through time was published in 2005,[17][18] and its connectionist temporal classification (CTC) training algorithm[19] in 2006. CTC enabled end-to-end speech recognition with LSTM. By the 2010s, the LSTM had become the dominant technique for a variety of natural language processing tasks including speech recognition and machine translation, and was widely implemented in commercial technologies such as Google Translate and Siri.[20] LSTM has become the most cited neural network of the 20th century.[9]
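A minimal sketch of one step of such an LSTM cell with input, forget, and output gates (toy Python/numpy code with assumed layer sizes and random weights, not the authors' original implementation):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        # W, U, b hold the stacked parameters for the i, f, o gates and the cell update g
        z = W @ x + U @ h_prev + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c_prev + i * g           # forget gate scales the old cell state, input gate adds new content
        h = o * np.tanh(c)               # output gate exposes the cell state
        return h, c

    rng = np.random.default_rng(0)
    n_in, n_hid = 3, 5                   # assumed toy sizes
    W = rng.normal(size=(4 * n_hid, n_in))
    U = rng.normal(size=(4 * n_hid, n_hid))
    b = np.zeros(4 * n_hid)
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x in rng.normal(size=(7, n_in)): # unroll the cell over a short input sequence
        h, c = lstm_step(x, h, c, W, U, b)
    print(h.round(3))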

In 2015, Rupesh Kumar Srivastava, Klaus Greff, and Schmidhuber used LSTM principles to create the Highway network, a feedforward neural network with hundreds of layers, much deeper than previous networks.[21][22] Seven months later, the ImageNet 2015 competition was won with an open-gated or gateless Highway network variant called the Residual neural network.[23] This has become the most cited neural network of the 21st century.[9]
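A minimal sketch of one Highway layer (toy Python/numpy code with assumed sizes and random weights, not the authors' implementation): a transform gate T decides, per unit, how much of the nonlinear transform H(x) to use and how much of the input x to carry through unchanged; the residual block corresponds to the gateless case y = H(x) + x.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def highway_layer(x, Wh, bh, Wt, bt):
        H = np.tanh(Wh @ x + bh)          # candidate transform of the input
        T = sigmoid(Wt @ x + bt)          # transform gate; the carry gate is 1 - T
        return H * T + x * (1.0 - T)      # gated mix lets information pass unchanged through many layers

    d = 8
    x = rng.normal(size=d)
    for _ in range(100):                  # stack many layers; gating keeps information flowing
        Wh, Wt = rng.normal(size=(d, d)) / np.sqrt(d), rng.normal(size=(d, d)) / np.sqrt(d)
        bh, bt = np.zeros(d), np.full(d, -2.0)   # negative gate bias initially favors carrying the input
        x = highway_layer(x, Wh, bh, Wt, bt)
    print(x.round(3))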

Since 2018, transformers have overtaken the LSTM as the dominant neural network architecture in natural language processing[24] through large language models such as ChatGPT. As early as 1992, Schmidhuber published an alternative to recurrent neural networks[25] which is now called a Transformer with linearized self-attention[26][27][9] (save for a normalization operator). It learns internal spotlights of attention:[28] a slow feedforward neural network learns by gradient descent to control the fast weights of another neural network through outer products of self-generated activation patterns FROM and TO (which are now called key and value for self-attention).[26] This fast weight attention mapping is applied to a query pattern.
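The correspondence can be illustrated with a minimal sketch (toy Python/numpy code with assumed dimensions; the learned slow network is replaced by random key/value patterns): outer products of key and value patterns are summed into a fast weight matrix, which is then applied to a query, yielding linearized self-attention without the softmax normalization of the standard Transformer.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 4
    keys = rng.normal(size=(6, d))          # "FROM" patterns emitted by the slow network
    values = rng.normal(size=(6, d))        # "TO" patterns emitted by the slow network
    query = rng.normal(size=d)              # query pattern

    W_fast = np.zeros((d, d))
    for k, v in zip(keys, values):
        W_fast += np.outer(v, k)            # fast weights programmed by outer products of key/value pairs

    attended = W_fast @ query               # equals sum_i (k_i . query) * v_i: linear attention without softmax
    print(attended.round(3))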

In 2011, Schmidhuber's team at IDSIA, with his postdoc Dan Ciresan, achieved dramatic speedups of convolutional neural networks (CNNs) on fast parallel computers called GPUs. An earlier CNN on GPU by Chellapilla et al. (2006) was 4 times faster than an equivalent implementation on CPU.[29] The deep CNN of Dan Ciresan et al. (2011) at IDSIA was already 60 times faster[30] and achieved the first superhuman performance in a computer vision contest in August 2011.[31] Between 15 May 2011 and 10 September 2012, their fast and deep CNNs won no fewer than four image competitions.[32][33] They also significantly improved on the best performance in the literature for multiple image databases.[34] The approach has become central to the field of computer vision.[33] It is based on CNN designs introduced much earlier by Yann LeCun et al. (1989),[35] who applied the backpropagation algorithm to a variant of Kunihiko Fukushima's original CNN architecture called the neocognitron,[36] later modified by the max-pooling method of J. Weng et al.[37][33]
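The two basic operations combined in these architectures can be shown in a minimal sketch (toy Python/numpy code, not the GPU implementations discussed above): a 2-D convolution of an image with a small filter, followed by max-pooling, which downsamples by keeping the strongest response in each window.

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.normal(size=(8, 8))       # toy single-channel input
    kernel = rng.normal(size=(3, 3))      # toy learned filter

    def conv2d_valid(img, k):
        kh, kw = k.shape
        out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)   # sliding dot product
        return out

    def max_pool(fmap, size=2):
        h, w = fmap.shape[0] // size, fmap.shape[1] // size
        return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

    features = conv2d_valid(image, kernel)   # 8x8 image -> 6x6 feature map
    pooled = max_pool(features)              # 6x6 -> 3x3 after 2x2 max-pooling
    print(features.shape, pooled.shape)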

Credit disputes

Schmidhuber has controversially argued that he and other researchers have been denied adequate recognition for their contributions to the field of deep learning, in favour of Geoffrey Hinton, Yoshua Bengio and Yann LeCun, who shared the 2018 Turing Award for their work in deep learning.[2][20][38] He wrote a "scathing" 2015 article arguing that Hinton, Bengio and LeCun "heavily cite each other" but "fail to credit the pioneers of the field".[38] In a statement to the New York Times, Yann LeCun wrote that "Jürgen is manically obsessed with recognition and keeps claiming credit he doesn't deserve for many, many things... It causes him to systematically stand up at the end of every talk and claim credit for what was just presented, generally not in a justified manner."[2] Schmidhuber replied that LeCun did this "without any justification, without providing a single example,"[39] and published details of numerous priority disputes with Hinton, Bengio and LeCun.[40] Some have suggested that Schmidhuber's accomplishments have been downplayed because of his personality.[20]

Recognition

Schmidhuber received the Helmholtz Award of the International Neural Network Society in 2013,[41] and the Neural Networks Pioneer Award of the IEEE Computational Intelligence Society in 2016[42] for "pioneering contributions to deep learning and neural networks."[1] He is a member of the European Academy of Sciences and Arts.[43][6]

References

  1. Schmidhuber, Jürgen. "Curriculum Vitae".
  2. John Markoff (27 November 2016). "When A.I. Matures, It May Call Jürgen Schmidhuber 'Dad'". The New York Times. Accessed April 2017.
  3. "Jürgen Schmidhuber". cemse.kaust.edu.sa. Archived from the original on 13 March 2023. Retrieved 9 May 2023.
  4. "Juergen Schmidhuber". scholar.google.com. Retrieved 20 October 2021.
  5. "Jürgen H. Schmidhuber". The Mathematics Genealogy Project. Retrieved 5 July 2022.
  6. Dave O'Leary (3 October 2016). The Present and Future of AI and Deep Learning Featuring Professor Jürgen Schmidhuber. IT World Canada. Accessed April 2017.
  7. "AI Pioneer Wants to Build the Renaissance Machine of the Future". Bloomberg.com. 16 January 2017. Retrieved 23 February 2018.
  8. Schmidhuber, Jürgen (1992). "Learning complex, extended sequences using the principle of history compression (based on TR FKI-148, 1991)" (PDF). Neural Computation. 4 (2): 234–242. doi:10.1162/neco.1992.4.2.234. S2CID 18271205.
  9. Schmidhuber, Juergen (2022). "Annotated History of Modern AI and Deep Learning". arXiv:2212.11279 [cs.NE].
  10. Schmidhuber, Jürgen (1993). Habilitation Thesis (PDF).
  11. Schmidhuber, Jürgen (1991). "A possibility for implementing curiosity and boredom in model-building neural controllers". Proc. SAB'1991. MIT Press/Bradford Books. pp. 222–227.
  12. Schmidhuber, Jürgen (2010). "Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010)". IEEE Transactions on Autonomous Mental Development. 2 (3): 230–247. doi:10.1109/TAMD.2010.2056368. S2CID 234198.
  13. Schmidhuber, Jürgen (2020). "Generative Adversarial Networks are Special Cases of Artificial Curiosity (1990) and also Closely Related to Predictability Minimization (1991)". Neural Networks (preprint arXiv/1906.04493). 127: 58–66. arXiv:1906.04493. doi:10.1016/j.neunet.2020.04.008. PMID 32334341. S2CID 216056336.
  14. Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (diploma thesis). Technical University of Munich, Institute of Computer Science.
  15. Sepp Hochreiter; Jürgen Schmidhuber (1997). "Long short-term memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
  16. Felix A. Gers; Jürgen Schmidhuber; Fred Cummins (2000). "Learning to Forget: Continual Prediction with LSTM". Neural Computation. 12 (10): 2451–2471. CiteSeerX 10.1.1.55.5709. doi:10.1162/089976600300015015. PMID 11032042. S2CID 11598600.
  17. Graves, A.; Schmidhuber, J. (2005). "Framewise phoneme classification with bidirectional LSTM and other neural network architectures". Neural Networks. 18 (5–6): 602–610. CiteSeerX 10.1.1.331.5800. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.
  18. Klaus Greff; Rupesh Kumar Srivastava; Jan Koutník; Bas R. Steunebrink; Jürgen Schmidhuber (2015). "LSTM: A Search Space Odyssey". IEEE Transactions on Neural Networks and Learning Systems. 28 (10): 2222–2232. arXiv:1503.04069. Bibcode:2015arXiv150304069G. doi:10.1109/TNNLS.2016.2582924. PMID 27411231. S2CID 3356463.
  19. Graves, Alex; Fernández, Santiago; Gomez, Faustino (2006). "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks". In Proceedings of the International Conference on Machine Learning, ICML 2006: 369–376. CiteSeerX 10.1.1.75.6306.
  20. Vance, Ashlee (15 May 2018). "This Man Is the Godfather the AI Community Wants to Forget". Bloomberg Business Week. Retrieved 16 January 2019.
  21. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].
  22. Srivastava, Rupesh K; Greff, Klaus; Schmidhuber, Juergen (2015). "Training Very Deep Networks". Advances in Neural Information Processing Systems. Curran Associates, Inc. 28: 2377–2385.
  23. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.
  24. Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus. 151 (2): 127–138. doi:10.1162/daed_a_01905. S2CID 248377870.
  25. Schmidhuber, Jürgen (1 November 1992). "Learning to control fast-weight memories: an alternative to recurrent nets". Neural Computation. 4 (1): 131–139. doi:10.1162/neco.1992.4.1.131. S2CID 16683347.
  26. Schlag, Imanol; Irie, Kazuki; Schmidhuber, Jürgen (2021). "Linear Transformers Are Secretly Fast Weight Programmers". ICML 2021. Springer. pp. 9355–9366.
  27. Choromanski, Krzysztof; Likhosherstov, Valerii; Dohan, David; Song, Xingyou; Gane, Andreea; Sarlos, Tamas; Hawkins, Peter; Davis, Jared; Mohiuddin, Afroz; Kaiser, Lukasz; Belanger, David; Colwell, Lucy; Weller, Adrian (2020). "Rethinking Attention with Performers". arXiv:2009.14794 [cs.CL].
  28. Schmidhuber, Jürgen (1993). "Reducing the ratio between learning complexity and number of time-varying variables in fully recurrent nets". ICANN 1993. Springer. pp. 460–463.
  29. Kumar Chellapilla; Sid Puri; Patrice Simard (2006). "High Performance Convolutional Neural Networks for Document Processing". In Lorette, Guy (ed.). Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft.
  30. Ciresan, Dan; Ueli Meier; Jonathan Masci; Luca M. Gambardella; Jurgen Schmidhuber (2011). "Flexible, High Performance Convolutional Neural Networks for Image Classification" (PDF). Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two. 2: 1237–1242. Retrieved 17 November 2013.
  31. "IJCNN 2011 Competition result table". OFFICIAL IJCNN2011 COMPETITION. 2010. Retrieved 14 January 2019.
  32. Schmidhuber, Jürgen (17 March 2017). "History of computer vision contests won by deep CNNs on GPU". Retrieved 14 January 2019.
  33. Schmidhuber, Jürgen (2015). "Deep Learning". Scholarpedia. 10 (11): 1527–54. CiteSeerX 10.1.1.76.1541. doi:10.1162/neco.2006.18.7.1527. PMID 16764513. S2CID 2309950.
  34. Ciresan, Dan; Meier, Ueli; Schmidhuber, Jürgen (June 2012). Multi-column deep neural networks for image classification. 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York, NY: Institute of Electrical and Electronics Engineers (IEEE). pp. 3642–3649. arXiv:1202.2745. CiteSeerX 10.1.1.300.3283. doi:10.1109/CVPR.2012.6248110. ISBN 978-1-4673-1226-4. OCLC 812295155. S2CID 2161592.
  35. LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (1989). "Backpropagation Applied to Handwritten Zip Code Recognition". AT&T Bell Laboratories.
  36. Fukushima, Kunihiko (1980). "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position". Biological Cybernetics. 36 (4): 193–202. doi:10.1007/bf00344251. PMID 7370364. S2CID 206775608.
  37. Weng, J; Ahuja, N; Huang, TS (1993). "Learning recognition and segmentation of 3-D objects from 2-D images". Proc. 4th International Conf. Computer Vision: 121–128.
  38. Oltermann, Philip (18 April 2017). "Jürgen Schmidhuber on the robot future: 'They will pay as much attention to us as we do to ants'". The Guardian. Retrieved 23 February 2018.
  39. Schmidhuber, Juergen (7 July 2022). "LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015". IDSIA, Switzerland. Archived from the original on 9 February 2023. Retrieved 3 May 2023.
  40. Schmidhuber, Juergen (30 December 2022). "Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award. Technical Report IDSIA-77-21". IDSIA, Switzerland. Archived from the original on 30 December 2022. Retrieved 3 May 2023.
  41. INNS Awards Recipients. International Neural Network Society. Accessed December 2016.
  42. Recipients: Neural Networks Pioneer Award. Piscataway, NJ: IEEE Computational Intelligence Society. Accessed January 2019.
  43. Members. European Academy of Sciences and Arts. Accessed December 2016.