Publications

Publications by category in reverse chronological order.

2025

  1. __ASPLOS__
    Design and Operation of Shared Machine Learning Clusters on Campus
    Kaiqiang Xu, Decang Sun, Hao Wang, Zhenghang Ren, Xinchen Wan, Xudong Liao, and 3 more authors
    In ACM ASPLOS, 2025
  2. __EuroSys__
    Achieving Fairness Generalizability for Learning-based Congestion Control with Jury
    Han Tian, Xudong Liao, Decang Sun, Chaoliang Zeng, Yilun Jin, Junxue Zhang, and 4 more authors
    In ACM EuroSys, 2025

2024

  1. __SIGCOMM__
    Fast, Scalable, and Accurate Rate Limiter for RDMA NICs
    Zilong Wang, Xinchen Wan, Luyang Li, Yijun Sun, Peng Xie, Xin Wei, and 3 more authors
    In ACM SIGCOMM, 2024
  2. __EuroSys__
    Astraea: Towards Fair and Efficient Learning-based Congestion Control
    Xudong Liao, Han Tian, Chaoliang Zeng, Xinchen Wan, and Kai Chen
    In ACM EuroSys, 2024
  3. ___NSDI___
    Accelerating Neural Recommendation Training with Embedding Scheduling
    Chaoliang Zeng, Xudong Liao, Xiaodian Cheng, Han Tian, Xinchen Wan, Hao Wang, and 1 more author
    In USENIX NSDI, 2024
  4. ___NSDI___
    Towards Domain-Specific Network Transport for Distributed DNN Training
    Hao Wang, Han Tian, Jingrong Chen, Xinchen Wan, Jiacheng Xia, Gaoxiong Zeng, and 4 more authors
    In USENIX NSDI, 2024

2023

  1. __APNet__
    Accurate and Scalable Rate Limiter for RDMA NICs
    Zilong Wang, Xinchen Wan, Chaoliang Zeng, and Kai Chen
    In ACM APNet, 2023
  2. __SIGMOD__
    Scalable and Efficient Full-Graph GNN Training for Large Graphs
    Xinchen Wan, Kaiqiang Xu, Xudong Liao, Yilun Jin, Kai Chen, and Xin Jin
    In ACM SIGMOD, 2023
  3. ___NSDI___
    SRNIC: A Scalable Architecture for RDMA NICs
    Zilong Wang, Layong Luo, Qingsong Ning, Chaoliang Zeng, Wenxue Li, Xinchen Wan, and 5 more authors
    In USENIX NSDI, 2023

2022

  1. ___ICNP___
    DGS: Communication-Efficient Graph Sampling for Distributed GNN Training
    Xinchen Wan, Kai Chen, and Yiming Zhang
    In IEEE ICNP, 2022

2021

  1. ___ArXiv___
    TACC: A Full-Stack Cloud Computing Infrastructure for Machine Learning Tasks
    Kaiqiang Xu, Xinchen Wan, Hao Wang, Zhenghang Ren, Xudong Liao, Decang Sun, and 2 more authors
    arXiv preprint arXiv:2110.01556, 2021

2020

  1. ___ArXiv___
    Domain-Specific Communication Optimization for Distributed DNN Training
    Hao Wang, Jingrong Chen, Xinchen Wan, Han Tian, Jiacheng Xia, Gaoxiong Zeng, and 4 more authors
    arXiv preprint arXiv:2008.08445, 2020
  2. __APNet__
    RAT: Resilient Allreduce Tree for Distributed Machine Learning
    Xinchen Wan, Hong Zhang, Hao Wang, Shuihai Hu, Junxue Zhang, and Kai Chen
    In ACM APNet, 2020