We should forget about small efficiencies — Donald Knuth
9 Sep
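A while ago at work, Samuel came over and asked:
“Hey, when you wrote that server a while back, how did you make the program produce a core dump by itself?”
“With setrlimit(2), of course!”
“Then help me figure out why it doesn't work when I write it the same way..”
Little did I know that this was the start of a cruel nightmare. It took us quite a lot of effort to find out why.
On Linux, core dumps are usually not produced by default, and the way to enable them is the bash built-in ulimit command. My post from last year, How to enlarge Coredump Size and File Descriptor Limitations, happens to cover this as well.
Accidents do happen, though, so besides asking the system administrator to set ulimit -c unlimited, I also call setrlimit(2) inside the program itself. That way a core dump is produced at run time whether or not the administrator has set the core dump size with ulimit. When a program crashes, the core dump is a crucial clue. Just like in CSI: let the evidence speak.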
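As a rough sketch of that setrlimit(2) step on its own, the helper below raises the soft core-size limit as far as the hard limit allows. The function name raise_core_soft_limit is made up for illustration, and stopping at the hard limit (rather than asking for RLIM_INFINITY) is my assumption about what an unprivileged process can safely request.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-size limit up to the hard limit.
 * Returns 0 on success and -1 on failure (errno is set by the failing call). */
int raise_core_soft_limit(void)
{
    struct rlimit rl;

    /* read the current soft and hard limits for core files */
    if (getrlimit(RLIMIT_CORE, &rl) < 0)
        return -1;

    /* a process may always move its soft limit up to the hard limit */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling it once near the top of main() should be enough. If the hard limit itself is 0, the call still succeeds but no core file will be written.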
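But after reading Samuel's program, the two of us were puzzled. It should have worked: a program built from the same code had been running on a production server for some time without trouble. Then I suddenly remembered that the administrator who looks after those machines had once mentioned that core dumps sometimes fail to appear. “God! Could this be the same problem?”
What Samuel and I found is the reason for this post. If a program still cannot produce a core dump after following the steps above, check whether it calls setuid(2). We found that, by default, a program that has called setuid(2) or seteuid(2) cannot produce a core dump.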
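On Linux this behaviour is tracked by a per-process “dumpable” flag, which prctl(2) exposes through PR_GET_DUMPABLE. The sketch below simply reads and prints it. The helper names get_dumpable and report_dumpable are invented for illustration.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Hypothetical helper: return the calling process's dumpable flag.
 * 0 means no core dump will be written, non-zero means dumping is allowed,
 * and -1 means prctl() itself failed. */
int get_dumpable(void)
{
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Hypothetical helper: print the flag with a label, e.g. before and
 * after a setuid(2) call. */
void report_dumpable(const char *when)
{
    printf("%s: dumpable = %d\n", when, get_dumpable());
}
```

In a server started as root, I would expect report_dumpable("before setuid") and report_dumpable("after setuid") to show the flag dropping to 0 with default system settings, which would match what we observed.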
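As luck would have it, servers usually call setuid(2) or seteuid(2) for security reasons, so that they run with least privilege (the Windows counterpart is Run As). In other words, setuid(2) cannot simply be left out. There are two solutions.
The first affects only the program itself, and it uses prctl(2). It only applies when you can change the source and rebuild. Sample code follows: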
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* strerror() */
#include <errno.h>
#include <unistd.h>
#include <pwd.h>
#include <sys/resource.h>
#include <sys/prctl.h>

#define SU_USER "nobody"

int main(void)
{
    struct rlimit corelimit;
    struct passwd *pw = NULL;
    volatile char *cp = NULL;

    /* look up the account we want to switch to */
    errno = 0;
    if (NULL == (pw = getpwnam(SU_USER)))
    {
        /* getpwnam() leaves errno at 0 when the user simply does not exist */
        fprintf(stderr, "Cannot get uid from user(%s). Error: %s\n",
                SU_USER, errno ? strerror(errno) : "no such user");
        return 1;
    }

    /* try to switch to nobody */
    if ((setuid(pw->pw_uid) < 0) || (seteuid(pw->pw_uid) < 0))
    {
        fprintf(stderr, "Cannot switch to user(%s). Error: %s\n",
                SU_USER, strerror(errno));
        return 2;
    }

    /* re-enable core dumps; the uid switch above normally clears the flag */
    if (prctl(PR_SET_DUMPABLE, 1) < 0)
    {
        fprintf(stderr, "Cannot enable core dumping. Error: %s\n",
                strerror(errno));
        return 3;
    }

    /* set core size to unlimited; raising rlim_max above the current hard
     * limit needs privilege, so this can fail with EPERM after the switch */
    corelimit.rlim_cur = RLIM_INFINITY;
    corelimit.rlim_max = RLIM_INFINITY;
    if (setrlimit(RLIMIT_CORE, &corelimit) < 0)
    {
        fprintf(stderr, "Setrlimit failed! Error: %s\n",
                strerror(errno));
        return 4;
    }

    /* deliberately write through a NULL pointer to force a core dump */
    *cp = '1';

    return 0;
}
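Things to watch out for with this approach:
The second method is system-wide, meaning it affects every program running on the system. It uses /proc/sys/fs/suid_dumpable. Because this depends on the kernel version, it is safer to check man 5 proc. This is the trump card when you cannot touch the source code and the original program does not call prctl(2).
The caveats are the same as for the first method. Whether /proc/sys/fs/suid_dumpable is set to 1 or 2 determines who owns the core dump file. With 1, the owner is the user the program switched to (nobody in this example), just as with the first method. With 2, the owner is always root, but this seems to require kernel 2.6.13 or later? I am not entirely sure.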
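Before changing anything it helps to see what the system is currently set to. Here is a minimal sketch that reads a single integer from a file such as /proc/sys/fs/suid_dumpable. The helper name read_int_file is hypothetical, and treating any unreadable or unparsable file as -1 is my own convention.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a small text file such as
 * /proc/sys/fs/suid_dumpable. Returns the value, or -1 if the file
 * cannot be opened or does not start with a number. */
int read_int_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    int value = -1;

    if (fp == NULL)
        return -1;

    /* the proc file holds a single decimal number followed by a newline */
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;

    fclose(fp);
    return value;
}
```

A call such as read_int_file("/proc/sys/fs/suid_dumpable") returns the current setting. Changing it means writing a new value to the same file as root.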
I hope this post is a useful reference for everyone!
PS: This post took almost two weekends to write. That is a long time~
13 Aug
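While updating Gentoo Linux today with sudo emerge -uvDNat world, I noticed that php-5.2.3 was upgraded to php-5.2.4_pre200708051230 and expat-1.95.8 to expat-2.0.1. I had a bad feeling right away, and sure enough, apache stopped working after the upgrade. For example, running sudo /etc/init.d/apache2 stop printed: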
sojia [~] -pigfoot- sudo /etc/init.d/apache2 stop
/usr/sbin/apache2: error while loading shared libraries: libexpat.so.0: cannot open shared object file: No such file or directory
Sure enough: expat-1.95.8 builds libexpat.so.0, while expat-2.0.1 builds libexpat.so.1. I tried rebuilding apache, but that did not help either.
The right answer is:
sojia [~] -pigfoot- sudo revdep-rebuild
In other words, every package on the system that uses libexpat has to be rebuilt before things work again. I hope this helps someone.
13 Aug
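Last week, Sun finally unveiled its killer processor: UltraSPARC T2 (Niagara 2)!
Besides going from the T1's 32 threads per processor (8 cores, 4 threads/core) to 64 threads per processor (8 cores, 8 threads/core), the feature that stands out most is “Zero Cost” Security.
The usual reason people skip security protections is that they are slow. This is especially true in web server implementations, where the impact on throughput is very noticeable. So in T2 they added a cryptographic unit, which means time-consuming security functions can be handed to this hardware.
I thought supporting DES, 3DES, AES, MD5, and SHA-1 would already be impressive, but in the news on 2007/07/08 I heard a talk by Sun Staff Engineer Dr. Lawrence Spracklen, and what T2 supports is:
Later, on his blog (Lawrence Spracklen’s Blog), I read his post UltraSPARC T2 Crypto performance. It looks pretty good~
Next, I expect someone skilled will use this hardware to build SSL for T2. I am looking forward to the measured numbers~
PS: Here are the prices of the T1000 and T2000, which use the UltraSPARC T1 CPU. I wonder how much the servers with T2 will cost XD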
5 Jul
When we write a network server program, most of the system calls involved, such as socket(), bind(), and accept(), have self-explanatory parameters. The listen() system call is more interesting. Let’s look at its prototype:
int listen(int sockfd, int backlog);
Yes, the first parameter is obviously the socket fd. But what does the backlog number mean? Somebody might answer by quoting the LISTEN(2) manpage: “The backlog parameter defines the maximum length the queue of pending connections may grow to. If a connection request arrives with the queue full the client may receive an error with an indication of ECONNREFUSED or, if the underlying protocol supports retransmission, the request may be ignored so that retries succeed.”
From a robust server’s perspective, what is the largest value we should pass? At first, I passed a very large number such as 1,024 (and of course the listen system call still returned successfully). After reading the LISTEN(2) manpage on Linux, I realized I was wrong:
If the socket is of type AF_INET, and the backlog argument is greater than the constant SOMAXCONN (128 in Linux 2.0 & 2.2), it is silently truncated to SOMAXCONN.
It doesn’t mention kernel 2.6, but that’s fine. Let’s look into the Linux kernel source code.
In Linux kernel 2.6.20.1, the listen system call is implemented in net/socket.c at line 1306. As it shows, the backlog cannot be larger than sysctl_somaxconn, which is initialized to SOMAXCONN. Furthermore, SOMAXCONN is defined as 128 in include/linux/socket.h at line 226.
In my opinion, this means that from Linux 2.0 to 2.6 the backlog cannot exceed 128 by default. A larger value is silently truncated to SOMAXCONN, just as the manpage says.
How about FreeBSD? We can see the note of manpage LISTEN(2) in FreeBSD 6:
The listen() system call appeared in 4.2BSD. The ability to configure the maximum backlog at run-time, and to use a negative backlog to request the maximum allowable value, was introduced in FreeBSD 2.2.
I’m not very familiar with the FreeBSD kernel, but let me try to trace it. The starting point is sys/kern/uipc_syscalls.c at cvstag RELENG_6 in FreeBSD. There we can see that the listen system call invokes solisten(so, uap->backlog, td). Next, we go to sys/kern/uipc_socket.c to see the implementation of solisten(struct socket *so, int backlog, struct thread *td). As on Linux, the maximum value is somaxconn, which is set to SOMAXCONN by default. Finally, we can see that the value is defined in sys/sys/socket.h, and it is the same as on Linux: 128.
To put it another way, if you’re writing a server program for either Linux or FreeBSD, 128 is an appropriate value for backlog. On FreeBSD, however, you can also pass a negative backlog to request the maximum allowable value.
You may ask what backlog value popular modern servers use. Let’s check the source of Apache HTTP Server. As you can see in /server/listen.c, ap_listenbacklog is assigned DEFAULT_LISTENBACKLOG, which is defined as 511 in /include/mpm_common.h.
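To make the truncation rule concrete, here is a small model of it in C. It is only a sketch of the behaviour described above, not the kernel code. The function name effective_backlog is made up, and the negative-means-maximum branch reflects the FreeBSD manpage note only.

```c
/* Hypothetical model of how the kernel limits a listen() backlog.
 * requested: the value the program passes to listen()
 * somaxconn: the system limit (128 by default in the kernels discussed)
 * Returns the backlog that would actually take effect under this model. */
int effective_backlog(int requested, int somaxconn)
{
    /* too large: silently truncated to the limit;
     * negative: FreeBSD treats this as a request for the maximum */
    if (requested < 0 || requested > somaxconn)
        return somaxconn;

    return requested;
}
```

Under this model, effective_backlog(511, 128) gives 128, so Apache’s default of 511 would end up at the system limit unless somaxconn has been raised.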
31 Jan
According to Darryl Gove’s post yesterday, the UltraSPARC-T1 tuning guide has been updated to include information about the Cool Tools.
8 Jan
I learned this from a post on the CDPA board by tjs, the great elder of Jinshan.
On January 31st, FreeBSD 4.11 and FreeBSD 6.0 will have reached their End of Life dates and will no longer be supported by the FreeBSD Security Team.
Users of either of these FreeBSD releases are strongly encouraged to upgrade to FreeBSD 5.5, FreeBSD 6.1, or the upcoming FreeBSD 6.2 before that date.
| Branch | Release | Type | Release date | Estimated EoL |
|---|---|---|---|---|
| RELENG_4 | N/A | N/A | N/A | January 31, 2007 |
| RELENG_4_11 | 4.11-RELEASE | Extended | January 25, 2005 | January 31, 2007 |
| RELENG_5 | N/A | N/A | N/A | May 31, 2008 |
| RELENG_5_5 | 5.5-RELEASE | Extended | May 25, 2006 | May 31, 2008 |
| RELENG_6 | N/A | N/A | N/A | last release + 2y |
| RELENG_6_0 | 6.0-RELEASE | Normal | November 4, 2005 | January 31, 2007 |
| RELENG_6_1 | 6.1-RELEASE | Extended | May 9, 2006 | May 31, 2008 |
24 Dec
While reading the portage log of Gentoo Linux, I came across an interesting tool called ngrep - network grep (net-analyzer/ngrep). Here is the description from the official site:
ngrep strives to provide most of GNU grep’s common features, applying them to the network layer.
ngrep is a pcap-aware tool that will allow you to specify extended regular or hexadecimal expressions to match against data payloads of packets.
It currently recognizes IPv4/6, TCP, UDP, ICMPv4/6, IGMP and Raw across Ethernet, PPP, SLIP, FDDI, Token Ring and null interfaces, and understands BPF filter logic in the same fashion as more common packet sniffing tools, such as tcpdump and snoop.
We can visit the Usage Section and learn more about how ngrep works and can be leveraged to see all sorts of neat things.
Example: Debugging HTTP interactions:
# ngrep -W byline port 80
interface: eth0 (10.1.1.10/255.255.252.0)
filter: ip and ( port 80 )
####
T 10.1.1.10:42177 -> 64.90.164.74:80 [AP]
GET / HTTP/1.1.
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; X11; Linux i686) Opera …
Host: www.darkridge.com.
Accept: text/html, application/xml;q=0.9, application/xhtml+xml;q=0.9 …
Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1.
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0.
Cookie: SQMSESSID=5272f9ae21c07eca4dfd75f9a3cda22e.
Cookie2: $Version=1.
Cache-Control: no-cache.
Connection: Keep-Alive, TE.
TE: deflate, gzip, chunked, identity, trailers.
.
##
T 64.90.164.74:80 -> 10.1.1.10:42177 [AP]
HTTP/1.1 200 OK.
Date: Mon, 29 Mar 2004 00:47:25 GMT.
Server: Apache/2.0.49 (Unix).
Last-Modified: Tue, 04 Nov 2003 12:09:41 GMT.
ETag: "210e23-326-f8200b40".
Accept-Ranges: bytes.
Vary: Accept-Encoding,User-Agent.
Content-Encoding: gzip.
Content-Length: 476.
Keep-Alive: timeout=15, max=100.
Connection: Keep-Alive.
Content-Type: text/html; charset=ISO-8859-1.
Content-Language: en.
.
……….}S]..0.|………..H…8……..@..\….(…..Dw.%.,..;.k.. …
.;kw*U.j.< ...\0Tn.l.:......>Fs….’….h.’…u.H4..’.6.vIDI…….N.r …
..H..#..J….u.?..]….^..2…..e8v/gP…..].48…qD!……….#y…m …
####