[Paper Note] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models 09-11
[Paper Note] IMPRESS: An Importance-Informed Multi-Tier Prefix KV Storage System for Large Language Model Inference 09-06
[Paper Note] AttentionStore: Cost-Effective Attention Reuse Across Multi-Turn Conversations in Large Language Model Serving 07-02
[Paper Note] Understanding the Performance Implications of the Design Principles in Storage-Disaggregated Databases 08-15
[Paper Note] Efficient Exposure of Partial Failure Bugs in Distributed Systems With Inferred Abstract States 07-31
[Paper Note] Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management 07-03