Van Jacobson’s network channels, from LWN.net.

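Van Jacobson presented his network channels idea at linux.conf.au at the end of January this year, and it set off a series of discussions. There are quite a few obstacles to overcome, but there should be a lot to learn from the design ;-)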

Van's slides [PDF] point out that the traditional networking stack, as implemented today, has become the "Standard Model". In the Linux kernel the implementation looks roughly like this (from Van's slides):

[Figure: net channel 1]

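When the network card receives a packet, the kernel gets an interrupt and calls the ISR, or the softirq handler registered for NET_RX_SOFTIRQ (usually this should be in the driver; it may also be a tasklet here). Following the way its own hardware works, the driver pulls the packet in and builds an skb, then calls net/core/dev.c:netif_receive_skb(). netif_receive_skb() checks the protocol type of the payload and demultiplexes on it. For example, IP packets should be sent on to net/ipv4/ip_input.c:ip_rcv(). The socket behind it (and higher layers such as UDP/TCP) also refers to this same skb, and as everyone knows, TCP even has to reassemble the data into a byte stream.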
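To make the demultiplexing step a bit more concrete, here is a minimal user-space sketch in C. It is not kernel code: `toy_skb`, `toy_packet_type`, `toy_netif_receive()` and the two handlers are hypothetical names I made up, and the real kernel keeps its protocol handlers in registered lists rather than a fixed table. The only point is the shape of the dispatch: look at the protocol type, hand the buffer to the matching handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* EtherType values, kept in host byte order here for simplicity. */
#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_ARP  0x0806

/* Hypothetical, heavily simplified stand-in for the kernel's skb. */
struct toy_skb {
    uint16_t protocol;      /* protocol type of the payload */
    const uint8_t *data;    /* packet bytes (unused in this sketch) */
    size_t len;
};

typedef int (*toy_handler_fn)(struct toy_skb *skb);

/* One entry per protocol that wants to receive packets. */
struct toy_packet_type {
    uint16_t type;
    toy_handler_fn func;
};

/* Handlers return a small tag so the caller can see which path ran. */
static int toy_ip_rcv(struct toy_skb *skb)  { (void)skb; return 4; }
static int toy_arp_rcv(struct toy_skb *skb) { (void)skb; return 6; }

static const struct toy_packet_type toy_handlers[] = {
    { ETHERTYPE_IPV4, toy_ip_rcv },
    { ETHERTYPE_ARP,  toy_arp_rcv },
};

/* Demultiplex: pass the skb to the handler registered for its protocol.
 * Returns the handler's result, or -1 if nobody claims the packet. */
static int toy_netif_receive(struct toy_skb *skb)
{
    assert(skb != NULL);
    for (size_t i = 0; i < sizeof(toy_handlers) / sizeof(toy_handlers[0]); i++) {
        if (toy_handlers[i].type == skb->protocol)
            return toy_handlers[i].func(skb);
    }
    return -1;
}
```

Calling `toy_netif_receive()` with `protocol` set to `ETHERTYPE_IPV4` ends up in `toy_ip_rcv()`, which is the role ip_rcv() plays for IP packets in the description above.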

A design like this can have some drawbacks. So as not to distort the original meaning, I quote the original text below:

  • Passing network packets through multiple layers of the kernel.
    When a packet arrives, the network card’s interrupt handler begins the task of feeding the packet to the kernel. The remainder of the work may well be performed at software interrupt level within the driver (in a tasklet, perhaps). The core network processing happens in another software interrupt. Copying the data (an expensive operation in itself) to the application happens in kernel context. Finally the application itself does something interesting with the data. The context changes are expensive, and if any of these changes causes the work to move from one CPU to another, a big cache penalty results. Much work has been done to improve CPU locality in the networking subsystem, but much remains to be done.
  • Locking is expensive.
    Taking a lock requires a cross-system atomic operation and moves a cache line between processors. Locking costs have led to the development of lock-free techniques like seqlocks and read-copy-update (RCU), but the networking stack (like the rest of the kernel) remains full of locks.
  • The networking code makes extensive use of queues implemented with doubly-linked lists.
    These lists have poor cache behavior since they require each user to make changes (and thus move cache lines) in multiple places.

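So, to improve networking scalability, the first thing to do is eliminate locking and shared data. Van applies the end-to-end principle to get there: hand the data over to the application as directly as possible, instead of letting it wait anywhere inside the kernel. To do that he designed the net channel, a circular buffer (presumably a circular FIFO queue implemented on an array), to replace the skb and the queues currently used in the networking stack. For example, where a softirq used to be needed (driver -> socket), a netchannel is used instead; the amount of locking drops noticeably, which in turn improves scalability.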
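As a rough illustration of what such a circular buffer could look like, here is a minimal single-producer/single-consumer ring in C11. This is my own sketch and not Van's code: the names (`toy_channel`, `toy_channel_put()`, `toy_channel_get()`), the slot count and the 64-byte cache-line size are all assumptions. It only works without locks because exactly one side writes `tail` and exactly one side writes `head`; putting the two indices on separate cache lines means the producer and consumer never write to the same line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CHAN_SIZE  8u    /* number of slots; must be a power of two */
#define CACHE_LINE 64    /* assumed cache-line size in bytes */

static_assert((CHAN_SIZE & (CHAN_SIZE - 1)) == 0,
              "CHAN_SIZE must be a power of two");

struct toy_channel {
    /* Written only by the producer (think: the driver). */
    alignas(CACHE_LINE) atomic_uint tail;
    /* Written only by the consumer (think: the socket side). */
    alignas(CACHE_LINE) atomic_uint head;
    /* The slots hold pointers to packets. */
    alignas(CACHE_LINE) void *slot[CHAN_SIZE];
};

static void toy_channel_init(struct toy_channel *c)
{
    atomic_init(&c->tail, 0);
    atomic_init(&c->head, 0);
}

/* Producer side: returns false if the channel is full. */
static bool toy_channel_put(struct toy_channel *c, void *pkt)
{
    unsigned tail = atomic_load_explicit(&c->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&c->head, memory_order_acquire);

    if (tail - head == CHAN_SIZE)   /* indices run freely and wrap */
        return false;
    c->slot[tail & (CHAN_SIZE - 1)] = pkt;
    /* Publish the slot only after it has been filled in. */
    atomic_store_explicit(&c->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns NULL if the channel is empty. */
static void *toy_channel_get(struct toy_channel *c)
{
    unsigned head = atomic_load_explicit(&c->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&c->tail, memory_order_acquire);

    if (head == tail)
        return NULL;
    void *pkt = c->slot[head & (CHAN_SIZE - 1)];
    /* Release the slot back to the producer after reading it. */
    atomic_store_explicit(&c->head, head + 1, memory_order_release);
    return pkt;
}
```

Compared with a doubly-linked list protected by a lock, each side here touches only its own index plus one array slot per packet. With more than one producer or more than one consumer this sketch is no longer safe as written.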

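However, the first problem this approach runs into is that sending data straight from packet to application in one go effectively removes the points where netfilter used to hook in. Adding netfilter support back in wipes out the advantage that was gained.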

One thing I don't quite get, though: why is a fairly simple circular buffer a "Cache aware, cache friendly queue"? Is it just because it's implemented with an array? :p
