[Paper Note] Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management
Background
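Many systems deployed on modern cloud platforms such as Kubernetes rely on operators instead of manual deployment. These operators usually lack thorough e2e tests, which seriously undermines the reliability of the distributed systems they manage.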
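Writing thorough e2e tests by hand is largely infeasible, for the following reasons:
- Developers struggle to construct good test cases in a huge state space. Hand-written e2e tests typically start from an ideal initial state and reach the final state in a single step (modifying the spec only once). Such tests cannot cover enough state transitions.
- Operator developers and the developers of the managed system are often different people, so operator developers rarely know the managed system well enough to write complete e2e tests.
- The operator's reconcile loop involves a large number of state transitions, some of which depend on details of the managed system.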
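To make the state-transition coverage gap concrete, here is a minimal toy sketch. It is not Acto's algorithm and does not touch Kubernetes: it assumes a hypothetical spec with a single field that can take three values, and simply counts which (old, new) transitions are exercised by single-step tests versus one chained run of spec changes. The names `VALUES` and `transitions` are made up for illustration.

```python
from itertools import permutations

# Hypothetical toy model: one spec field (say, "replicas") with three legal values.
VALUES = [1, 3, 5]


def transitions(sequence):
    """Return the set of (old, new) spec transitions exercised by applying
    the spec values in `sequence` one after another."""
    return {(a, b) for a, b in zip(sequence, sequence[1:]) if a != b}


# Every possible transition between two distinct values.
all_transitions = set(permutations(VALUES, 2))

# Hand-written style: each test starts from the ideal initial state (1)
# and changes the spec exactly once.
single_step = set()
for target in VALUES[1:]:
    single_step |= transitions([1, target])

# Chained style: one long run that keeps mutating the spec.
chained = transitions([1, 3, 5, 1, 5, 3, 1])

print(len(single_step), len(chained), len(all_transitions))
```

Even in this tiny model, the single-step tests only ever leave the initial state, while the chained run also exercises transitions that start from non-initial states. Real operators have many fields and far more states, so the gap grows quickly.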
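The paper builds Acto, a framework that automatically generates e2e tests for operators. Acto found a large number of bugs in the operators of popular systems, and some of those bugs were even caused by bugs in Kubernetes and the Go runtime.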
![Featured image for [Paper Note] Acto Automatic End-to-End Testing for Operation Correctness of Cloud System Management](/posts/acto-automatic-end-to-end-testing-for-operation-correctness-of-cloud-system-management/images/Acto-property-mapping.png)