Publications

Publications by category in reverse chronological order.

2025

  1. __CoNEXT__
    Taurus: Towards A High-Performance and Generic Congestion Control Framework for Datacenter Networks
    Luyang Li, Heng Pan, Pengyi Zhang, Kai Lv, Zilong Wang, Xinchen Wan, and 11 more authors
    In Proceedings of the International Conference on Emerging Networking EXperiments and Technologies (CoNEXT), 2025
  2. __SIGCOMM__
    Coflow Scheduling for LLM Training
    Xinchen Wan, Xinyu Yang, Kaiqiang Xu, Xudong Liao, Yilun Jin, Yijun Sun, and 3 more authors
    In Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM Short Paper), 2025
  3. __SIGCOMM__
    MixNet: A Runtime Reconfigurable Optical-Electrical Fabric for Distributed Mixture-of-Experts Training
    Xudong Liao, Yijun Sun, Han Tian, Xinchen Wan, Yilun Jin, Zilong Wang, and 10 more authors
    In Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM), 2025
  4. __ATC__
    Towards Optimal Rack-scale μs-level CPU Scheduling through In-Network Workload Shaping
    Xudong Liao, Han Tian, Xinchen Wan, Chaoliang Zeng, Hao Wang, Junxue Zhang, and 3 more authors
    In Proceedings of the USENIX Annual Technical Conference (ATC), 2025
  5. __ASPLOS__
    Harmonia: A Unified Framework for Heterogeneous FPGA Acceleration in the Cloud
    Luyang Li, Heng Pan, Xinchen Wan, Kai Lv, Zilong Wang, Qian Zhao, and 6 more authors
    In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2025
  6. __INFOCOM__
    A Generic and Efficient Communication Framework for Message-level In-Network Computing
    Xinchen Wan, Luyang Li, Han Tian, Xudong Liao, Xinyang Huang, Chaoliang Zeng, and 7 more authors
    In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), 2025
  7. __ASPLOS__
    Design and Operation of Shared Machine Learning Clusters on Campus
    Kaiqiang Xu, Decang Sun, Hao Wang, Zhenghang Ren, Xinchen Wan, Xudong Liao, and 3 more authors
    In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2025
  8. __EuroSys__
    Achieving Fairness Generalizability for Learning-based Congestion Control with Jury
    Han Tian, Xudong Liao, Decang Sun, Chaoliang Zeng, Yilun Jin, Junxue Zhang, and 4 more authors
    In Proceedings of the 20th ACM European Conference on Computer Systems (EuroSys), 2025

2024

  1. __SIGCOMM__
    Fast, Scalable, and Accurate Rate Limiter for RDMA NICs
    Zilong Wang, Xinchen Wan, Luyang Li, Yijun Sun, Peng Xie, Xin Wei, and 3 more authors
    In Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM), 2024
  2. __EuroSys__
    Astraea: Towards Fair and Efficient Learning-based Congestion Control
    Xudong Liao, Han Tian, Chaoliang Zeng, Xinchen Wan, and Kai Chen
    In Proceedings of the 19th ACM European Conference on Computer Systems (EuroSys), 2024
  3. __NSDI__
    Accelerating Neural Recommendation Training with Embedding Scheduling
    Chaoliang Zeng, Xudong Liao, Xiaodian Cheng, Han Tian, Xinchen Wan, Hao Wang, and 1 more author
    In Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024
  4. __NSDI__
    Towards Domain-Specific Network Transport for Distributed DNN Training
    Hao Wang, Han Tian, Jingrong Chen, Xinchen Wan, Jiacheng Xia, Gaoxiong Zeng, and 4 more authors
    In Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024

2023

  1. __APNET__
    Accurate and Scalable Rate Limiter for RDMA NICs
    Zilong Wang, Xinchen Wan, Chaoliang Zeng, and Kai Chen
    In Proceedings of the 7th Asia-Pacific Workshop on Networking (APNet), 2023
  2. __SIGMOD__
    Scalable and Efficient Full-Graph GNN Training for Large Graphs
    Xinchen Wan, Kaiqiang Xu, Xudong Liao, Yilun Jin, Kai Chen, and Xin Jin
    In Proceedings of the ACM on Management of Data (SIGMOD), 2023
  3. __NSDI__
    SRNIC: A scalable architecture for RDMA NICs
    Zilong Wang, Layong Luo, Qingsong Ning, Chaoliang Zeng, Wenxue Li, Xinchen Wan, and 5 more authors
    In Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2023

2022

  1. __ICNP__
    DGS: Communication-Efficient Graph Sampling for Distributed GNN Training
    Xinchen Wan, Kai Chen, and Yiming Zhang
    In Proceedings of the 30th IEEE International Conference on Network Protocols (ICNP), 2022

2021

  1. __ArXiv__
    TACC: A full-stack cloud computing infrastructure for machine learning tasks
    Kaiqiang Xu, Xinchen Wan, Hao Wang, Zhenghang Ren, Xudong Liao, Decang Sun, and 2 more authors
    arXiv preprint arXiv:2110.01556, 2021

2020

  1. __ArXiv__
    Domain-specific communication optimization for distributed DNN training
    Hao Wang, Jingrong Chen, Xinchen Wan, Han Tian, Jiacheng Xia, Gaoxiong Zeng, and 4 more authors
    arXiv preprint arXiv:2008.08445, 2020
  2. __APNET__
    RAT - Resilient Allreduce Tree for Distributed Machine Learning
    Xinchen Wan, Hong Zhang, Hao Wang, Shuihai Hu, Junxue Zhang, and Kai Chen
    In Proceedings of the 4th Asia-Pacific Workshop on Networking (APNet), 2020