Time Flies
Basic Information
Name: Su Jianlin (苏剑林)
Birthday: 1993, month X, day Y
Master's: School of Mathematics, Sun Yat-sen University
Bachelor's: School of Mathematical Sciences, South China Normal University
Location: Guangzhou, Guangdong
Hometown: Yunfu, Guangdong
Hobbies: reading, research, tinkering
Idol: Richard Feynman
Rambling East and West
A graduate student in pure mathematics at Sun Yat-sen University, with a bachelor's degree from South China Normal University. I emigrated to Earth from the Oort Cloud in '93; having forgotten the way home, I took to gazing at the stars, hoping to find a road back through spacetime.
I love every kind of science and am fond of drilling into dead ends, so I often hit walls, but once in a while the drill goes clean through, and that is its own delight. I lean toward physics, astronomy, and computing; I like to think, and I keep trying to crack open the nutshell of science. Though good at rational analysis, I am also prone to acting on feeling, and I worship Feynman. In idle hours I read Jin Yong to feign refinement, while away slow days over Chinese chess, now and then get the urge to simmer a pot of "boiled-water cabbage", and occasionally, when my hands itch, fire up the data-mining excavator and look up to Lanxiang.
Though supposedly studying pure mathematics, I keep straying from the proper path, indulging in neural networks and daydreaming about artificial intelligence, yet I have not published many papers at venues such as ACL, AAAI, CVPR, or ICLR. At present I focus on natural language processing, trying to unravel the mysteries of language. I enjoy writing and often spin tall tales on my blog, and fortunately my readers have not yet disowned me. Scientific Spaces (科学空间, https://kexue.fm) awaits your visit; sincere or not, do drop by.
Micro-musings
-
2025-11-13 13:00
Mom: "Why do kids these days have so many illnesses?"
Son: "It's not that there are more illnesses, it's that more illnesses can be treated; in the old days those kids simply didn't make it." What an incisive answer; lesson learned!
Source: comments section of https://www.zhihu.com/question/1926923396882621109/answer/1970943451643224638
-
2025-11-10 17:06
More than two years into the job, and my second visit to the Beijing headquarters.
-
2025-10-24 16:34
Recommended papers:
Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization
AlphaFlow: Understanding and Improving MeanFlow Models
Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets
Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
On residual network depth
On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization
Optimal Scaling Needs Optimal Norm
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
Who Said Neural Networks Aren't Linear?
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
https://papers.cool/arxiv/2510.04988,2510.20771,2510.04327,2510.02300,2510.06954,2510.03470,2510.19953,2510.03871,2510.11354,2510.08570,2510.04212
-
2025-10-06 11:07
Mad-dog logic: certain people do differ greatly from mad dogs, but as long as I decide those differences do not matter, those people are mad dogs.
-
2025-10-03 23:08
I am a rather slow-witted person: I can only grind through derivations step by step, I have little intuition, and I usually cannot understand anything beyond what can be derived.
-
2025-10-02 21:23
Recommended papers:
Conda: Column-Normalized Adam for Training Large Language Models Faster
DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
Efficient Hyperparameter Tuning via Trajectory Invariance Principle
Muon Outperforms Adam in Tail-End Associative Memory Learning
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws
https://papers.cool/arxiv/2509.24218,2509.26469,2509.25049,2509.26030,2505.13738,2509.19189
-
2025-09-16 11:04
Recommended papers:
Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching
Attention as an Adaptive Filter
Causal Attention with Lookahead Keys
Depth-Aware Initialization for Stable and Efficient Neural Network Training
Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models
Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
Limitations of Normalization in Attention Mechanism
Predicting the Order of Upcoming Tokens Improves Language Modeling
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Scaled-Dot-Product Attention as One-Sided Entropic Optimal Transport
The Optimiser Hidden in Plain Sight: Training with the Loss Landscape's Induced Metric
Transition Models: Rethinking the Generative Learning Objective
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Understanding Transformers through the Lens of Pavlovian Conditioning
https://papers.cool/arxiv/2509.00336,2509.04154,2509.07301,2509.05018,2508.21106,2509.10384,2508.17821,2508.19228,2305.17212,2508.08369,2509.03594,2509.04394,2508.18756,2508.08289
-
2025-08-10 23:52
Recommended papers:
Accelerating Newton-Schulz Iteration for Orthogonalization via Chebyshev-type Polynomials
Zero-Variance Gradients for Variational Autoencoders
https://papers.cool/arxiv/2506.10935,2508.03587
-
2025-08-09 19:20
Note: 80k followers on Zhihu.
-
2025-07-14 20:11
Recommended papers:
AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling
Analysis of Muon's Convergence and Critical Batch Size
Conformal Transformations for Symmetric Power Transformers
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
https://papers.cool/arxiv/2507.08567,2507.01598,2503.03269,2506.22049
Selected Works
title: Variational Inference: A Unified Framework of Generative Models and Some Revelations
author: Su Jianlin
journal: arXiv preprint arXiv:1807.05936
year: 2018
title: Using deep Residual Networks to search for galaxy-Ly $\alpha$ emitter lens candidates based on spectroscopic selection
author: Li Rui; Shu Yiping; Su Jianlin; Feng Haicheng; Zhang Guobao; Wang Jiancheng; Liu Hongtao
journal: Monthly Notices of the Royal Astronomical Society
volume: 482
number: 1
pages: 313--320
year: 2018
publisher: Oxford University Press
title: f-VAEs: Improve VAEs with Conditional Flows
author: Su Jianlin; Wu Guang
journal: arXiv preprint arXiv:1809.05861
year: 2018
title: Training Generative Adversarial Networks Via Turing Test
author: Su Jianlin
journal: arXiv preprint arXiv:1810.10948
year: 2018
title: GAN-QP: A Novel GAN Framework without Gradient Vanishing and Lipschitz Constraint
author: Su Jianlin
journal: arXiv preprint arXiv:1811.07296
year: 2018
title: Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification
author: Ren Hao; Su Jianlin; Lu Hong
journal: arXiv preprint arXiv:1901.10112
year: 2019
title: Artist Style Transfer Via Quadratic Potential
author: Bhalley Rahul; Su Jianlin
journal: arXiv preprint arXiv:1902.11108
year: 2019
title: O-GAN: Extremely Concise Approach for Auto-Encoding Generative Adversarial Networks
author: Su Jianlin
journal: arXiv preprint arXiv:1903.01931
year: 2019
title: Rectified Exponential Units for Convolutional Neural Networks
author: Ying Yao; Su Jianlin; Shan Peng; Miao Ligang; Wang Xiaolian; Peng Silong
journal: IEEE Access
year: 2019
publisher: IEEE
title: A Novel Cascade Binary Tagging Framework for Relational Triple Extraction
author: Zhepei Wei; Jianlin Su; Yue Wang; Yuan Tian; Yi Chang
journal: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
year: 2020
publisher: ACL
title: Whitening Sentence Representations for Better Semantics and Faster Retrieval
author: Jianlin Su; Jiarun Cao; Weijie Liu; Yangyiwen Ou
journal: arXiv preprint arXiv:2103.15316
year: 2021
title: RoFormer: Enhanced Transformer with Rotary Position Embedding
author: Jianlin Su; Yu Lu; Shengfeng Pan; Bo Wen; Yunfeng Liu
journal: arXiv preprint arXiv:2104.09864
year: 2021
The Past Like Smoke
Su Jianlin, sixteen years old this year (2009), living in a small village in Yunfu, Guangdong.
I have been interested in science since I was little; mathematics is my strong suit, and since ninth grade chemistry has joined it.
I first touched a computer in 2006.09 and got online in 2007.01; looking back, that was fairly quick progress (before the computer I knew nothing at all). In 2007.04 I discovered BBS forums and later set up an IT-themed BBS of my own, but IT then pulled me away from science for a while. From 2008.09 onward I refocused on science, and with some effort this blog was born.
Now (July 2012) I have graduated from high school. I have been through a lot and matured a great deal; I feel I have learned to cherish things more, and I have picked up all sorts of things I like. I used to be very introverted and shy, but I have become much more outgoing and now know how to goof around and go wild with friends. My passion for science has only grown, though my interests have shifted somewhat: mathematics remains my core, I love physics and am enchanted by astronomy, while chemistry and biology have become side hobbies. ^_^ I hope to keep sharing my life in science with every reader here at Scientific Spaces.
At present (January 2018) I am a second-year graduate student at Sun Yat-sen University, majoring in pure mathematics (direction: applied mathematics in biology), but I spend much of my time on machine learning, especially natural language processing. I want to learn everything and get to the bottom of everything, but the spirit is willing and the strength falls short~ Keep at it, one more small step forward.
Now (July 2019) I have finally graduated and have fallen completely into the machine learning pit. I am currently doing odd jobs in the machine learning algorithms team at Zhuiyi Technology (追一科技)~
(Unfinished, but let's not say "to be continued"~)