
Bottleneck: file I/O



Database load, disk, CPU and memory usage are well-known resources on modern servers, but there is another one which is much more important: the I/O volume. It is limited and can't be raised as easily as all the others, it has no fixed limit and can't be measured (very well). It's a scarce (maybe the scarcest) good and shouldn't be wasted.

Once all available I/O capacity is used, each following disk access request gets queued, slowing down the issuing task. All following requests from other processes are queued as well, and the system response time might jump from "ok" to "disaster" within seconds.

You should care about your scripts' I/O usage before it's too late, because there are few weapons to fight the overuse once it appears.
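If you want a rough idea of how close you already are to that point, iostat (part of the sysstat package) reports per-device utilization and average wait times; a %util close to 100 together with a growing await column is exactly the queueing effect described above (5 is the refresh interval in seconds):

iostat -x 5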

 

I started looking at one webserver process using strace:

strace -tt -e trace=open,stat,execve -f -p 27402 -o /tmp/webserver.trace
27402 is the process id (PID) of the webserver process to be monitored (find it using the "ps" command).
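For example, assuming the webserver binary is called "apache2" (adjust the name to your setup):

ps aux | grep apache2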

The new file "/tmp/webserver.trace" contains most of the file I/O calls, but the list will be very long. Try to shorten it with grep step by step, starting with a first look:

head -n 50 /tmp/webserver.trace
Here is a sample from my trace file:
19:40:03.933350 open("/proc/27402/statm", O_RDONLY) = 235
19:40:03.942873 stat("/usr/share/zoneinfo/America/Sao_Paulo", {st_mode=S_IFREG|0644, st_size=2001, ...}) = 0
19:40:03.943073 open("/usr/share/zoneinfo/America/Sao_Paulo", O_RDONLY) = 235
19:40:03.953761 open("/proc/27402/statm", O_RDONLY) = 235
19:40:03.955536 open("/proc/27402/statm", O_RDONLY) = 124
19:40:03.958287 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
19:40:03.958436 open("/etc/localtime", O_RDONLY) = 124
19:40:03.959058 open("/var/log/fastcgi/request.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 124
19:40:04.897352 open("/proc/27402/statm", O_RDONLY) = 124
You'll notice calls to /proc/27402/statm, which is one of the process status files in the virtual /proc filesystem. These calls are free of cost because /proc is simulated by the Linux kernel and the requests never reach any physical disk. I don't care about them (currently, at least):
grep -v /proc/ /tmp/webserver.trace | head -n 50
The remaining calls are:
19:40:03.942873 stat("/usr/share/zoneinfo/America/Sao_Paulo", {st_mode=S_IFREG|0644, st_size=2001, ...}) = 0
19:40:03.943073 open("/usr/share/zoneinfo/America/Sao_Paulo", O_RDONLY) = 235
19:40:03.958287 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
19:40:03.958436 open("/etc/localtime", O_RDONLY) = 124
19:40:03.959058 open("/var/log/fastcgi/request.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 124
/etc/localtime is being used quite often. Perl's "localtime()" function uses this file (via the underlying C functions). Passing a TZ environment variable to the webserver (or FastCGI server process) will remove these calls. First, here is what happens without TZ:
$ strace -e trace=open,stat perl -le 'for (1..3) { localtime time; }'
[...]
open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 3
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
The first localtime call reads /etc/localtime and all following ones trigger a stat call on it. Not that expensive, but unnecessary if your scripts always run in the same timezone. Here is the proof:
$ TZ="Europe/Berlin" strace -e trace=open,stat perl -le 'for (1..3) { localtime time; }'
[...]
open("/usr/share/zoneinfo/Europe/Berlin", O_RDONLY|O_CLOEXEC) = 3
Only one access, directly to my timezone file, and nothing at all for the two following calls. Perfect! Did you expect a simple localtime() call to be that expensive?
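If you can't change the webserver's environment, setting TZ from within the Perl code works as well. A minimal sketch, assuming it runs before the first localtime() call (the timezone name is just an example):

use POSIX qw(tzset);
BEGIN {
    $ENV{TZ} = 'Europe/Berlin';   # example timezone, pick your own
    tzset();                      # make the C library re-read TZ
}
print scalar localtime time;      # reads the zoneinfo file once, no stat per call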

I filed a request to get the TZ variable set and excluded these calls from my trace file:

grep -v /proc/ /tmp/webserver.trace | grep -v /etc/localtime | head -n 50

The remaining calls are:

19:40:03.942873 stat("/usr/share/zoneinfo/America/Sao_Paulo", {st_mode=S_IFREG|0644, st_size=2001, ...}) = 0
19:40:03.943073 open("/usr/share/zoneinfo/America/Sao_Paulo", O_RDONLY) = 235
19:40:03.959058 open("/var/log/fastcgi/request.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 124

These are not 50 lines and my job is far from being done, but the example looks clear to me now. I assume that the timezone file reads are required and that the FastCGI request log must be written - but I should really check whether it must be opened for each single request.
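If it turns out the log is re-opened per request, a sketch of the fix, assuming the logging code is under my control (the path is taken from the trace above): open the handle once at startup and reuse it.

use IO::Handle;                    # for autoflush()

# Open once, e.g. at FastCGI startup, instead of once per request.
open my $log_fh, '>>', '/var/log/fastcgi/request.log'
    or die "cannot open request log: $!";
$log_fh->autoflush(1);             # don't buffer log lines forever

sub log_request {
    my ($line) = @_;
    print {$log_fh} $line, "\n";   # reuses the open file descriptor
}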

Go on with your trace file, check everything (I prefer to start with the calls seen most often) and exclude the handled lines from the result using grep -v.
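To see which files are hit most often, a small pipeline helps; it extracts the first quoted path from each trace line and counts the occurrences:

grep -v /proc/ /tmp/webserver.trace | cut -d'"' -f2 | sort | uniq -c | sort -rn | head -n 20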

I found an external tool being launched (an execve call) in about 80% of the requests - which is very bad (I think). We're running Gearman and Memcached - no need to launch external tools and (at least in this case) really no need to call them multiple times per second.
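A sketch of the in-process alternative, assuming the external tool was doing cache lookups (the Cache::Memcached module from CPAN and the server address are examples):

use Cache::Memcached;

# One persistent client per process instead of one execve per lookup.
my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });
$memd->set('some_key', 'some_value', 300);   # cache for 300 seconds
my $value = $memd->get('some_key');          # no process spawn, no disk I/O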

 

File I/O improvements usually don't show up as lower usage numbers or faster access times; they're just something you can do to delay your out-of-I/O problems - maybe even until the end of time.

The Linux kernel file cache will hide many problems with files that are used very often, but they might show up when memory gets low, the system starts swapping and file I/O gets really scarce - the worst time to notice.
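You can keep an eye on this with vmstat: the si/so columns show swap traffic and bi/bo the block I/O that is no longer absorbed by the cache (5 is the refresh interval in seconds):

vmstat 5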

 

 

 
