Basic Information

Name: Su Jianlin (苏剑林)

Born: 1993 (month X, day Y)

Master's: School of Mathematics, Sun Yat-sen University

Bachelor's: School of Mathematical Sciences, South China Normal University

Location: Guangzhou, Guangdong

Hometown: Yunfu, Guangdong

Hobbies: reading, research, tinkering

Idol: Richard Feynman

Email: bojone@spaces.ac.cn

Homepage: https://jianlin.su

Weibo: https://weibo.com/bojone

X (Twitter): https://x.com/Jianlin_S

Code: https://github.com/bojone

Academic: Google Scholar

Ramblings

A graduate student in pure mathematics at Sun Yat-sen University; undergraduate at South China Normal University. Emigrated from the Oort Cloud to Earth in '93 and, having forgotten the way home, now gazes at the stars in hope of finding a route back through spacetime.

Fond of every branch of science and keen on drilling into the tip of the ox horn, so he often hits a wall; but once in a while the horn gets drilled clean through, and that alone is a joy. Partial to physics, astronomy, and computers; likes to think, and aims to crack open the nutshell of science. Good at rational analysis yet prone to acting on emotion; worships Feynman. In idle hours he reads Jin Yong to feign refinement, plays Chinese chess when slacking off, now and then gets the urge to simmer a pot of "boiled-water cabbage", and occasionally fires up the data-mining excavator with an eye on Lanxiang.

Supposed to be studying pure mathematics, yet forever straying from his proper duties: obsessed with neural networks, harbouring delusions of artificial intelligence, and having even managed to publish a few papers at venues such as ACL, AAAI, CVPR, and ICLR. Currently focused on natural language processing, plotting to crack the secrets of language. Fond of writing, he often spins tall tales on his blog, and fortunately the readers have not yet turned away. Scientific Spaces (https://kexue.fm) awaits your visit; drop by even if you come with no serious business.

Micro-musings

  • 2025-11-13 13:00

    Mom: "Why do kids these days have so many illnesses?"
    Son: "It's not that there are more illnesses; it's that more of them can be treated now. In the old days those kids simply didn't survive."

    What an incisive answer. Duly noted!

    Source: the comment section of https://www.zhihu.com/question/1926923396882621109/answer/1970943451643224638

  • 2025-11-10 17:06

    A little over two years into the job, and my second visit to the Beijing headquarters.

  • 2025-10-24 16:34

    Recommended papers:

    Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization

    AlphaFlow: Understanding and Improving MeanFlow Models

    Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets

    Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models

    From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics

    On residual network depth

    On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization

    Optimal Scaling Needs Optimal Norm

    Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks

    Who Said Neural Networks Aren't Linear?

    Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

    https://papers.cool/arxiv/2510.04988,2510.20771,2510.04327,2510.02300,2510.06954,2510.03470,2510.19953,2510.03871,2510.11354,2510.08570,2510.04212

  • 2025-10-06 11:07

    Mad-dog logic: certain people may differ greatly from a mad dog, but as long as I deem those differences unimportant, then those people are mad dogs.

  • 2025-10-03 23:08

    I'm a fairly slow-witted person: I can only grind through derivations step by step, and I have little intuition, so I usually cannot understand anything beyond what the derivations yield.

  • 2025-10-02 21:23

    Recommended papers:

    Conda: Column-Normalized Adam for Training Large Language Models Faster

    DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick

    Efficient Hyperparameter Tuning via Trajectory Invariance Principle

    Muon Outperforms Adam in Tail-End Associative Memory Learning

    Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training

    Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws

    https://papers.cool/arxiv/2509.24218,2509.26469,2509.25049,2509.26030,2505.13738,2509.19189

  • 2025-09-16 11:04

    Recommended papers:

    Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching

    Attention as an Adaptive Filter

    Causal Attention with Lookahead Keys

    Depth-Aware Initialization for Stable and Efficient Neural Network Training

    Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models

    Flow Straight and Fast in Hilbert Space: Functional Rectified Flow

    Limitations of Normalization in Attention Mechanism

    Predicting the Order of Upcoming Tokens Improves Language Modeling

    Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

    Scaled-Dot-Product Attention as One-Sided Entropic Optimal Transport

    The Optimiser Hidden in Plain Sight: Training with the Loss Landscape's Induced Metric

    Transition Models: Rethinking the Generative Learning Objective

    UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

    Understanding Transformers through the Lens of Pavlovian Conditioning

    https://papers.cool/arxiv/2509.00336,2509.04154,2509.07301,2509.05018,2508.21106,2509.10384,2508.17821,2508.19228,2305.17212,2508.08369,2509.03594,2509.04394,2508.18756,2508.08289

  • 2025-08-10 23:52

    Recommended papers:

    Accelerating Newton-Schulz Iteration for Orthogonalization via Chebyshev-type Polynomials

    Zero-Variance Gradients for Variational Autoencoders

    https://papers.cool/arxiv/2506.10935,2508.03587

  • 2025-08-09 19:20

    Milestone: 80,000 followers on Zhihu.

  • 2025-07-14 20:11

    Recommended papers:

    AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling

    Analysis of Muon's Convergence and Critical Batch Size

    Conformal Transformations for Symmetric Power Transformers

    GPAS_ Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

    https://papers.cool/arxiv/2507.08567,2507.01598,2503.03269,2506.22049

Selected Works

title: Variational Inference: A Unified Framework of Generative Models and Some Revelations
author: Su Jianlin
journal: arXiv preprint arXiv:1807.05936
year: 2018

title: Using deep Residual Networks to search for galaxy-Ly $\alpha$ emitter lens candidates based on spectroscopic selection
author: Li Rui; Shu Yiping; Su Jianlin; Feng Haicheng; Zhang Guobao; Wang Jiancheng; Liu Hongtao
journal: Monthly Notices of the Royal Astronomical Society
volume: 482
number: 1
pages: 313--320
year: 2018
publisher: Oxford University Press

title: f-VAEs: Improve VAEs with Conditional Flows
author: Su Jianlin; Wu Guang
journal: arXiv preprint arXiv:1809.05861
year: 2018

title: Training Generative Adversarial Networks Via Turing Test
author: Su Jianlin
journal: arXiv preprint arXiv:1810.10948
year: 2018

title: GAN-QP: A Novel GAN Framework without Gradient Vanishing and Lipschitz Constraint
author: Su Jianlin
journal: arXiv preprint arXiv:1811.07296
year: 2018

title: Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification
author: Ren Hao; Su Jianlin; Lu Hong
journal: arXiv preprint arXiv:1901.10112
year: 2019

title: Artist Style Transfer Via Quadratic Potential
author: Bhalley Rahul; Su Jianlin
journal: arXiv preprint arXiv:1902.11108
year: 2019

title: O-GAN: Extremely Concise Approach for Auto-Encoding Generative Adversarial Networks
author: Su Jianlin
journal: arXiv preprint arXiv:1903.01931
year: 2019

title: Rectified Exponential Units for Convolutional Neural Networks
author: Ying Yao; Su Jianlin; Shan Peng; Miao Ligang; Wang Xiaolian; Peng Silong
journal: IEEE Access
year: 2019
publisher: IEEE

title: A Novel Cascade Binary Tagging Framework for Relational Triple Extraction
author: Zhepei Wei; Jianlin Su; Yue Wang; Yuan Tian; Yi Chang
journal: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
year: 2020
publisher: ACL

title: Whitening Sentence Representations for Better Semantics and Faster Retrieval
author: Jianlin Su; Jiarun Cao; Weijie Liu; Yangyiwen Ou
journal: arXiv preprint arXiv:2103.15316
year: 2021

title: RoFormer: Enhanced Transformer with Rotary Position Embedding
author: Jianlin Su; Yu Lu; Shengfeng Pan; Bo Wen; Yunfeng Liu
journal: arXiv preprint arXiv:2104.09864
year: 2021

Bygone Days

Su Jianlin, just turned 16 this year (2009), living in a small village in Yunfu, Guangdong Province.

I have been interested in science since I was little. Mathematics is my strong suit, though since the third year of junior high I would have to add chemistry to that list.

I first touched a computer in September 2006 and got online in January 2007; looking back, that was fairly quick progress (before the computer I knew nothing at all). In April 2007 I discovered BBS forums and later set up an IT-themed BBS of my own, and for a while IT drew me away from science. From September 2008 onward I turned my focus back to science, and with some effort this blog was born.

Now (July 2012) I have graduated from high school. I have been through a lot and matured a great deal; I feel I have learned to cherish things more, and I have picked up all sorts of interests. I used to be very introverted and shy, but I have become much more outgoing and have learned to goof around and go wild with friends. My passion for science has only grown, though my interests have shifted: mathematics remains my core, I love physics and am enchanted by astronomy, while chemistry and biology have become side hobbies. ^_^ I hope to keep sharing my scientific life with readers here at Scientific Spaces.

At present (January 2018) I am a second-year graduate student at Sun Yat-sen University, majoring in pure mathematics (with a focus on mathematics applied to biology), though I spend most of my time on machine learning, especially natural language processing. I want to learn everything and get to the bottom of everything, but alas, the spirit is willing and the strength falls short~ Keep at it; push forward just a little more.

Now (July 2019) I have finally graduated and fallen completely down the machine-learning rabbit hole. These days I am doing odd jobs in the machine learning algorithms team at Zhuiyi Technology~

(Unfinished, but let's not write "to be continued"~)