
The obscurity of the subject, the brevity of life

[Paper Note] SGLang: Efficient Execution of Structured Language Model Programs

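SGLang aims to provide a complete, efficient framework for Language Model (LM) programs, covering both a frontend programming language and a backend runtime, rather than focusing on LLM inference alone. Co-designing the frontend and backend gives SGLang more room for optimization.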

Frontend

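An LM program is a program that interacts with a language model through code. Because language models are inherently non-deterministic, an LM program has to do a lot of work, such as complex string processing, before it can interact with an LM.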
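To make the "string processing" burden concrete, here is a minimal sketch in plain Python. It is not SGLang's API: `fake_lm`, `build_prompt`, `parse_answer`, and `ask` are hypothetical names, and `fake_lm` is a stub standing in for a real model call.

```python
import re
from typing import Optional

def fake_lm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns loosely formatted text."""
    return "Sure! The answer is:  42 .\nHope that helps."

def build_prompt(question: str) -> str:
    # Prompt templating: the program has to assemble the input string itself.
    return f"Question: {question}\nReply with 'The answer is: <number>'.\n"

def parse_answer(raw: str) -> Optional[int]:
    # Output parsing: the reply is free-form text, so extract the number defensively.
    match = re.search(r"answer is:\s*(-?\d+)", raw, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None

def ask(question: str, retries: int = 3) -> Optional[int]:
    # Retry loop: a non-deterministic model may not follow the format every time.
    for _ in range(retries):
        answer = parse_answer(fake_lm(build_prompt(question)))
        if answer is not None:
            return answer
    return None
```

Templating, parsing, and retrying are all glue code that the program author writes by hand here, which is the kind of work a dedicated frontend language is meant to absorb.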

KV Cache on SSD: Taking Twitter's Fatcache as an Example.

High-performance in-memory key-value caches are indispensable components of large-scale web architectures. However, the limited capacity and high power consumption of memory motivate researchers and developers to build key-value caches on SSD, where the SSD is treated as an extension of limited memory.

In this post, I will walk through the general ideas behind KV caches on SSD, using Twitter’s fatcache as the example, and then discuss the issues with this traditional approach.

Background

Twitter’s fatcache and many other modern memory allocators, such as Google tcmalloc and the Linux slab allocator, are based on the idea of the slab allocator. The full details are in the paper The Slab Allocator: An Object-Caching Kernel Memory Allocator. I will not delve into the finer details here, but overall, a slab allocator is a kind of segregated free-list allocator. A slab is a contiguous memory area and the basic management unit of the allocator. Each slab is further divided into slots of the same size, which store objects and other metadata. A slab also uses a freelist to keep track of which slots are allocated, which is the key to allocation and deallocation. All slabs with the same slot (object) size are grouped together, and the groups are organized into an array sorted by slot size. This lets the allocator use binary search to allocate an object from the best-fitting slab.
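The structure described above can be sketched in a few lines of Python. This is a toy model, not fatcache's code: the class names, the `(slot_size, slab_id, slot)` handle, and the default `slab_size` are assumptions made for illustration, and slots are tracked as indices rather than real memory.

```python
import bisect

class Slab:
    """One contiguous region split into equal-sized slots."""
    def __init__(self, slot_size: int, slab_size: int):
        self.slot_size = slot_size
        # Freelist of slot indices; every slot starts out free.
        self.free_slots = list(range(slab_size // slot_size))

    def alloc(self):
        return self.free_slots.pop() if self.free_slots else None

    def free(self, slot: int):
        self.free_slots.append(slot)

class SlabAllocator:
    def __init__(self, slot_sizes, slab_size=1024):
        self.slab_size = slab_size
        self.slot_sizes = sorted(slot_sizes)   # array of size classes, sorted by slot size
        self.slabs = {size: [] for size in self.slot_sizes}

    def alloc(self, nbytes: int):
        # Binary search for the smallest slot size that fits the object.
        i = bisect.bisect_left(self.slot_sizes, nbytes)
        if i == len(self.slot_sizes):
            raise ValueError("object larger than the largest slot size")
        size = self.slot_sizes[i]
        for slab_id, slab in enumerate(self.slabs[size]):
            slot = slab.alloc()
            if slot is not None:
                return size, slab_id, slot
        # No free slot in this size class: carve out a new slab.
        slab = Slab(size, self.slab_size)
        self.slabs[size].append(slab)
        return size, len(self.slabs[size]) - 1, slab.alloc()

    def free(self, handle):
        size, slab_id, slot = handle
        self.slabs[size][slab_id].free(slot)
```

For example, with size classes 64, 128, and 256, a 100-byte object lands in a 128-byte slot, and freeing it returns that slot to its slab's freelist for reuse.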

[Paper Note] ALPS: An Adaptive Learning, Priority OS Scheduler for Serverless Functions

Motivation

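FaaS platforms run a large number of short-lived functions, which are scheduled onto the OS as processes. Creating thousands of functions at once is routine. FaaS functions are typically short-lived: studies show that 99% of Azure Functions finish within 224 s. As a result, the OS scheduling policy has a major impact on the turnaround time of FaaS functions. However, Linux CFS does not perform well under FaaS workloads with many short-lived tasks.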
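A toy calculation shows why fair sharing hurts short tasks. The sketch below is neither ALPS's algorithm nor real CFS (it ignores time slices, vruntime, and context-switch cost). It assumes a single CPU and tasks that all arrive at time 0, and compares idealized equal sharing against shortest-job-first. Both function names are hypothetical.

```python
def fair_share_turnaround(service_times):
    """Idealized processor sharing: every unfinished task gets an equal CPU share.
    Returns completion times in order of increasing service time."""
    remaining = sorted(service_times)
    n = len(remaining)
    clock, received, done = 0.0, 0.0, []
    for i, s in enumerate(remaining):
        # n - i tasks still share the CPU while task i receives its last (s - received) units.
        clock += (s - received) * (n - i)
        received = s
        done.append(clock)
    return done

def sjf_turnaround(service_times):
    """Shortest-job-first: run tasks to completion, shortest first."""
    clock, done = 0.0, []
    for s in sorted(service_times):
        clock += s
        done.append(clock)
    return done

# Nine 1 s functions plus one 100 s function.
workload = [1] * 9 + [100]
fair = fair_share_turnaround(workload)
sjf = sjf_turnaround(workload)
mean_fair = sum(fair) / len(fair)
mean_sjf = sum(sjf) / len(sjf)
```

In this workload every 1 s function finishes at t = 10 under equal sharing, while under shortest-job-first they finish at t = 1 through 9. The long task finishes at t = 109 either way.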

[Paper Note] Demystifying and Checking Silent Semantic Violations in Large Distributed Systems

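This work is amazing. While reading Understanding, detecting and localizing partial failures in large system software, I was thinking about how to detect silent semantic violations. That paper says one difficulty is not knowing what the correct semantics are, and it occurred to me that an LLM might be able to infer them. I never expected they could be inferred in a way as simple as the one in this paper.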

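The idea of the paper is simple: start from the system's regression tests. Although these tests usually target specific bugs, they still encode the system's semantics. The paper derives these semantics from the regression tests and checks at runtime whether the system violates them.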
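As a rough illustration of the idea, the sketch below pairs a bug-specific regression test with a generalized rule checked over a runtime event trace. Everything here is hypothetical: `TinyStore` is a toy stand-in for a real system, the rule is written by hand, and the code does not show how the paper performs the inference automatically.

```python
class TinyStore:
    """Toy key-value store standing in for a distributed system."""
    def __init__(self):
        self.data, self.events = {}, []

    def put(self, key, value):
        self.data[key] = value
        self.events.append(("put", key))

    def delete(self, key, buggy=False):
        # buggy=True simulates a silent failure: the call "succeeds" but data remains.
        if not buggy:
            self.data.pop(key, None)
        self.events.append(("delete", key))

    def get(self, key):
        value = self.data.get(key)
        self.events.append(("get", key, value))
        return value

def test_deleted_key_is_gone():
    # A regression test written for one specific bug, with one specific key.
    store = TinyStore()
    store.put("k1", "v")
    store.delete("k1")
    assert store.get("k1") is None

def check_deleted_stays_deleted(events):
    """Generalized rule: after delete(k) with no later put(k), get(k) must return None.
    Returns the list of violating events."""
    deleted, violations = set(), []
    for event in events:
        op, key = event[0], event[1]
        if op == "delete":
            deleted.add(key)
        elif op == "put":
            deleted.discard(key)
        elif op == "get" and key in deleted and event[2] is not None:
            violations.append(event)
    return violations
```

The test only covers key "k1", but the rule applies to any key seen at runtime, so it flags a delete that silently fails on a different key without any crash or error.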

[Paper Note] Understanding the Performance Implications of the Design Principles in Storage-Disaggregated Databases

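Starting from the most basic monolithic database, this paper derives today's mainstream architectural designs step by step and analyzes the performance of each in detail. For a newcomer like me, following the authors' line of thought felt like an intellectual journey that opened a door.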

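  • What type of database does the paper target?

    storage-disaggregated OLTP database

  • Why don't storage-disaggregated OLAP databases use the log-as-the-database and shared-storage designs?

    OLAP usually serves read-intensive workloads, while these two designs address the pain points of write-intensive workloads.

  • How does log-as-the-database work, and what does it achieve?

    The compute node sends only the xlog, and the storage node obtains the data by replaying the xlog. This reduces network load and makes use of the storage node's CPU.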
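The log-as-the-database flow described in the last answer can be sketched as follows. This is a minimal single-process model under assumed names (`LogRecord`, `StorageNode`, `ComputeNode`, and their fields are all hypothetical); a real system ships records over the network and handles durability, batching, and failures.

```python
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    lsn: int        # log sequence number
    page_id: int
    key: str
    value: str

@dataclass
class StorageNode:
    pages: dict = field(default_factory=dict)   # page_id -> {key: value}
    applied_lsn: int = 0

    def receive(self, records):
        # Replay records in LSN order to materialize pages on the storage side,
        # spending the storage node's CPU rather than the compute node's.
        for rec in sorted(records, key=lambda r: r.lsn):
            if rec.lsn <= self.applied_lsn:
                continue                         # already replayed; keep replay idempotent
            self.pages.setdefault(rec.page_id, {})[rec.key] = rec.value
            self.applied_lsn = rec.lsn

    def read_page(self, page_id):
        return dict(self.pages.get(page_id, {}))

class ComputeNode:
    def __init__(self, storage: StorageNode):
        self.storage, self.next_lsn = storage, 1

    def write(self, page_id: int, key: str, value: str):
        # Ship only the small log record; the full page never crosses the network.
        rec = LogRecord(self.next_lsn, page_id, key, value)
        self.next_lsn += 1
        self.storage.receive([rec])
```

The compute node never writes pages back: the storage node's page contents are purely a function of the log it has replayed so far.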

[Paper Note] Understanding, Detecting and Localizing Partial Failures in Large System Software

Background

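A partial failure is a failure mode distinct from the full failure of the fail-stop model. In short, a partial failure means the system fails in part, but not so completely that it can no longer serve requests. The paper defines partial failure as follows: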

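Consider a server process P that contains many components and provides a set of services R. If a fault occurs in P that does not crash P but violates safety or liveness, or causes a performance problem, for some $R_f \subsetneq R$, that fault is a partial failure.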
