Publications
2023
- RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design
  Benjamin Reidys, Yuqi Xue, Yiqi Liu, Daixuan Li, Bharat Sukhwani, Wen-mei Hwu, Deming Chen, Sameh Asaad, and Jian Huang
  To appear in the 29th ACM Symposium on Operating Systems Principles (SOSP’23)
- Hardware-Assisted Virtualization for Neural Processing Units
  Yuqi Xue, Yiqi Liu, Lifeng Nai, and Jian Huang
  In the 1st Workshop on Hot Topics in System Infrastructure (HotInfra’23)
@inproceedings{npuvirt:hotinfra23, author = {Xue, Yuqi and Liu, Yiqi and Nai, Lifeng and Huang, Jian}, title = {Hardware-Assisted Virtualization for Neural Processing Units}, year = {2023}, booktitle = {1st Workshop on Hot Topics in System Infrastructure}, series = {HotInfra'23}, }
- System Virtualization for Neural Processing Units
  Yuqi Xue, Yiqi Liu, and Jian Huang
  In Proceedings of the 19th Workshop on Hot Topics in Operating Systems (HotOS’23)
Modern cloud platforms have been employing hardware accelerators such as neural processing units (NPUs) to meet the increasing demand for computing resources from AI-based application services. However, due to the lack of system virtualization support, the current way of using NPUs in cloud platforms suffers from either low resource utilization or poor isolation between multi-tenant application services. In this paper, we investigate system virtualization techniques for NPUs across the entire software and hardware stack, and present our NPU virtualization solution, NeuCloud. We propose a flexible NPU abstraction named vNPU that allows fine-grained NPU virtualization and resource management. We leverage this abstraction to design vNPU allocation, mapping, and scheduling policies that maximize resource utilization while achieving both performance and security isolation for vNPU instances at runtime.
@inproceedings{neucloud:hotos23, author = {Xue, Yuqi and Liu, Yiqi and Huang, Jian}, title = {System Virtualization for Neural Processing Units}, year = {2023}, isbn = {9798400701955}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3593856.3595912}, doi = {10.1145/3593856.3595912}, booktitle = {Proceedings of the 19th Workshop on Hot Topics in Operating Systems}, pages = {80--86}, numpages = {7}, keywords = {accelerator virtualization, hardware accelerator, cloud computing, neural processing unit}, location = {Providence, RI, USA}, series = {HotOS'23}, }
- V10: Hardware-Assisted NPU Multi-Tenancy for Improved Resource Utilization and Fairness
  Yuqi Xue, Yiqi Liu, Lifeng Nai, and Jian Huang
  In Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA’23)
Modern cloud platforms have deployed neural processing units (NPUs) such as Google Cloud TPUs to accelerate online machine learning (ML) inference services. To improve NPU resource utilization, they allow multiple ML applications to share the same NPU and have developed both time-multiplexed and preemption-based sharing mechanisms. However, our study with real-world NPUs reveals that these approaches suffer from surprisingly low utilization due to the lack of support for fine-grained hardware resource sharing in the NPU. Specifically, the NPU's separate systolic array and vector unit cannot be fully utilized at the same time, which calls for fundamental hardware assistance to support multi-tenancy.
In this paper, we present V10, a hardware-assisted NPU multi-tenancy framework that improves resource utilization while ensuring fairness for different ML services. We rethink the NPU architecture to support multi-tenancy. V10 employs an operator scheduler that enables concurrent operator execution on the systolic array and the vector unit, and offers flexibility for enforcing different priority-based resource-sharing mechanisms. V10 also enables fine-grained operator preemption and lightweight context switching in the NPU. To further improve NPU utilization, V10 develops a clustering-based workload collocation mechanism for identifying the best-matching ML services to share an NPU. We implement V10 with an NPU simulator. Our experiments with various ML workloads from the MLPerf AI Benchmarks demonstrate that, on average, V10 improves overall NPU utilization by 1.64×, increases aggregated throughput by 1.57×, and reduces the average and tail latency of ML services by 1.56× and 1.74×, respectively, compared with state-of-the-art NPU multi-tenancy approaches.
@inproceedings{v10:isca23, author = {Xue, Yuqi and Liu, Yiqi and Nai, Lifeng and Huang, Jian}, title = {V10: Hardware-Assisted NPU Multi-Tenancy for Improved Resource Utilization and Fairness}, year = {2023}, isbn = {9798400700958}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3579371.3589059}, doi = {10.1145/3579371.3589059}, booktitle = {Proceedings of the 50th Annual International Symposium on Computer Architecture}, articleno = {24}, numpages = {15}, keywords = {multi-tenancy, neural processing unit, ML accelerator}, location = {Orlando, FL, USA}, series = {ISCA'23}, }
2022
- Building A Trusted Execution Environment for In-Storage Computing
  Yuqi Xue, Luyi Kang, Weiwei Jia, Xiaohao Wang, Jongryool Kim, Changhwan Youn, Myeong Joon Kang, Hyung Jin Lim, Bruce Jacob, and Jian Huang
  arXiv preprint arXiv:2205.06361
@misc{https://doi.org/10.48550/arxiv.2205.06361, author = {Xue, Yuqi and Kang, Luyi and Jia, Weiwei and Wang, Xiaohao and Kim, Jongryool and Youn, Changhwan and Kang, Myeong Joon and Lim, Hyung Jin and Jacob, Bruce and Huang, Jian}, title = {Building A Trusted Execution Environment for In-Storage Computing}, doi = {10.48550/ARXIV.2205.06361}, keywords = {Cryptography and Security (cs.CR), Hardware Architecture (cs.AR), FOS: Computer and information sciences}, publisher = {arXiv}, year = {2022}, copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}, }
2021
- IceClave: A Trusted Execution Environment for In-Storage Computing
  Luyi Kang*, Yuqi Xue*, Weiwei Jia*, Xiaohao Wang, Jongryool Kim, Changhwan Youn, Myeong Joon Kang, Hyung Jin Lim, Bruce Jacob, and Jian Huang
  In Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’21)
  * Co-primary authors. A short version was presented at NVMW’22.
In-storage computing with modern solid-state drives (SSDs) enables developers to offload programs from the host to the SSD. It has proven to be an effective approach to alleviating the I/O bottleneck, and many frameworks have been proposed to facilitate it. However, few of them treat in-storage security as a first-class citizen. Specifically, since modern SSD controllers do not have a trusted execution environment, an offloaded (malicious) program could steal, modify, and even destroy the data stored in the SSD. In this paper, we first investigate the attacks that could be conducted by offloaded in-storage programs. To defend against these attacks, we build a lightweight trusted execution environment for in-storage computing, named IceClave. Using TrustZone extensions, IceClave enables security isolation between in-storage programs and flash management functions, which include flash address translation, data access control, and garbage collection. IceClave also achieves security isolation between in-storage programs by enforcing memory integrity verification of in-storage DRAM with low overhead. To protect data loaded from flash chips, IceClave develops a lightweight data encryption/decryption mechanism in flash controllers. We develop IceClave with a full-system simulator and evaluate it with a variety of data-intensive applications such as databases. Compared to state-of-the-art in-storage computing approaches, IceClave introduces only 7.6% performance overhead, while enforcing security isolation in the SSD controller with minimal hardware cost. IceClave still retains the performance benefit of in-storage computing, delivering up to 2.31× better performance than the conventional host-based trusted computing approach.
@inproceedings{iceclave:micro21, author = {Kang*, Luyi and Xue*, Yuqi and Jia*, Weiwei and Wang, Xiaohao and Kim, Jongryool and Youn, Changhwan and Kang, Myeong Joon and Lim, Hyung Jin and Jacob, Bruce and Huang, Jian}, title = {IceClave: A Trusted Execution Environment for In-Storage Computing}, year = {2021}, isbn = {9781450385572}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3466752.3480109}, doi = {10.1145/3466752.3480109}, booktitle = {Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture}, pages = {199--211}, numpages = {13}, keywords = {Trusted Execution Environment, Security Isolation, In-Storage Computing, ARM TrustZone}, location = {Virtual Event, Greece}, series = {MICRO'21}, }