On Mindcraft's April 1999 Benchmark

From Yogi Central

I just compiled [Apache] with almost all of the modules disabled, and I'm using the highperformance-conf.dist config file from the distribution." Also, Karthik's post to linux-kernel (and its followups) is available. This sounds rather like the behavior Mindcraft reported ("After the restart, Apache performance climbed back to within 30% of its peak from a low of about 6% of the peak performance").

Kernel issue #2: Wake One vs. Thundering Herd



(Note: A "task-exclusive" wake-one patch was identified by the Linux Scalability Project in their paper on the thundering herd issue. However, Andrea says that as of 2.4.0-test10 it still wakes up processes in the same order they were put to sleep, which is not optimal for caching; the reverse order would be more efficient. See also Nov 2000 measurements by Andrew Morton ([email protected]); post 1, post 2, and Linus' reply.)

- Phillip Ezolt, 5 May 1999, in linux-kernel ( Overscheduling DOES occur with high web server loads. ): "When I ran a SPECWeb96 strobe on Alpha/linux, I found 18% of the time was spent in the scheduler when the CPU is pegged." (Russinovich spoke about something like this in his critique of Linux.) This post sparked a lively thread in linux-kernel (now in its second week). It looks like Apache (and the scheduler) are due for some changes.
- Rik van Riel, 6 May 1999, in linuxperf ( Re: [linuxperf] Possible fix for Mindcraft Apache problem ): ... The main problem with the web benchmarks remains. The way Apache and Linux "cooperate", there is a lot of trouble with the 'thundering herd' problem. When a signal is received, all processes get woken up, and the scheduler must then choose one of the dozens ... runnable processes. The real solution is to switch from wake-all semantics to wake-one semantics, to avoid the huge runqueues that Phillip Ezolt, the DEC guy, experienced. The good news is that it's a simple fix that can probably be done within a few days...
- Tony Gale, 6 May 1999, in linuxperf ( Re: [linuxperf] Possible fix for Mindcraft Apache problem ): Apache uses file locking to serialise access to the accept call. This can be very costly on some systems. I haven't yet found the time or the patience to run the numbers on Linux for all the possible server models. Check Stevens' UNPv1 2nd Edition Chapter 27 for details.
- Andrea Arcangeli, 12 May 1999, in linux-kernel ( [patch] wake_one for accept(2) [was: Re: Overscheduling DOES occur with high web server loads.] ): I released a new andrea/andrea.bz2 patch against 2.2.8. This new one contains my new, straightforward wake-one code for accept(2). However, to get the improvement, you must make sure that your apache tasks are sleeping in accept(2). A strace -p pidofapache should tell you that. The patch can be accessed from this link. David Miller's answer to the above: ... on every TCP connection, there are 2 spurious and unsolicited wakeups. These wakeups originate in the write_space socket callback. This is because we free up SYN frames and wake up listening socket sleepers. This is exactly the problem I have been trying to solve.
- Ingo Molnar, 13 May 1999, in linux-kernel ( Re: [RFT] 2.2.8_andrea1 wake-one [Re: Overscheduling DOES occur with high web server loads.] ): note that pre-2.3.1 already has a wake-one implementation for accept() ... and more coming up.
- Phillip Ezolt ([email protected]), 14 May 1999, in linux-kernel ( Great News!! Was: [RFT] 2.2.8_andrea1 wake-one ): I've been doing some more SPECWeb96 tests, and with Andrea's patch to 2.2.8 (ftp://ftp.suse.com/pub/people/andrea/kernel/2.2.8_andrea1.bz) **On identical hardware, I get web-performance nearly identical to Tru64!** ...

    Tru64      4ms
    2.2.5      100ms
    2.2.8      9ms
    2.2.8_a    4ms

  ... The time spent in schedule has decreased, as shown by this Iprobe data. The number of SPECWeb96 maxOps per second has increased as well. **Please add the wakeone patch to the 2.2.X kernel.**

Larry Sendlosky tried this patch, and says: Your 2.2.8 patch really helps apache performance on a single cpu system, but there is really no performance improvement on a 2 cpu SMP system.
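To make the difference between wake-all and wake-one concrete, here is a minimal sketch in Python. It is a toy counting model, not kernel or Apache code. It assumes every worker process is asleep waiting for a connection when each connection arrives, and that connections arrive one at a time; the function name and the worker and connection counts are made up for illustration.

```python
def count_wakeups(num_workers: int, num_connections: int, wake_one: bool) -> dict:
    """Count process wakeups in a toy model of a pre-forked server.

    Assumption (not from the posts above): all workers are idle when each
    connection arrives, and connections arrive one at a time.
    """
    total = 0
    wasted = 0
    for _ in range(num_connections):
        # Wake-all wakes every sleeping worker; wake-one wakes exactly one.
        woken = 1 if wake_one else num_workers
        total += woken
        # Only one worker can accept the connection; the rest woke for nothing
        # and go back to sleep after being scheduled.
        wasted += woken - 1
    return {"total": total, "wasted": wasted}


if __name__ == "__main__":
    for policy, flag in (("wake-all", False), ("wake-one", True)):
        stats = count_wakeups(num_workers=50, num_connections=1000, wake_one=flag)
        print(policy, stats)
```

In this model the wasted wakeups under wake-all grow with the number of idle workers, which is the runqueue growth the posts above complain about; a real system is messier, since some workers are busy when a connection arrives.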



below. Also see:

- Dimitris Michailidis, 14 May 1999, in linux-kernel ( [PATCH] scheduler fixes and improvements ) -- several improvements to the 2.2.8 scheduler.
- Andrea Arcangeli, [email protected], 21 May 1999, in linux-kernel ( Re: andrea buffer code (2.2.9-C.gz) ) -- update. There might be some SMP bottleneck fixes.

Kernel issue #3: SMP Bottlenecks in the 2.2 Kernel



Juergen Schmidt, 19 May 1999, in linux-kernel ( Bad apache performance with Linux SMP ), asked about Apache's poor performance under SMP. Andi Kleen replied that, most likely, the TCP data copy runs completely serialized. This can be fixed by replacing

    skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);

in tcp.c:tcp_do_sendmsg with

    unlock_kernel();
    skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);
    lock_kernel();

The patch does not violate any locking requirements in the kernel... [To fix your connection refused errors,] try:

    echo 32768 > /proc/sys/fs/file-max
    echo 65536 > /proc/sys/fs/inode-max

Overall it should be clear that the current Linux kernel doesn't scale to multiple CPUs for system load (user load is fine). I blame the Linux vendors for advertising it, although it is not true. ... Work to fix all these problems is underway [2.3 will be fixed first, then the changes will be backported to 2.2].

[Note: Andi's TCP unlocking fix appears to be in 2.2.9-ac3.]

Andrea Arcangeli responded, describing his own version of this fix ( ftp://ftp.suse.com/pub/people/andrea/kernel/2.3.3_andrea2.bz2 ) as less cluttered: If you look at my patch (the second one, in the first one I missed the reaquire_kernel_lock done before returning from schedule, woops :) then you'll see my approach to address the unlock-during-uaccess. My patch doesn't modify tcp/ip, ext2 etc... but it does touch uaccess.h as well as usercopy.c. I don't like unlock_kernel being all over the place.

Juergen Schmidt, 26 May 1999, in linux-kernel and new-httpd ( Linux/Apache, and SMP - My fault ), retracted his earlier problem report: I had reported "disastrous performance" for Linux and Apache on an SMP system. To doublecheck, I've downloaded a clean kernel source (2.2.8 and 2.2.9) and had to realize that those do *not* show the reported penalty when running on SMP systems. After seeing the first very poor results, I made the mistake of using the kernel sources that were already installed. These sources had already been modified before the machine arrived to me. They should have been thrown away in the first instance. Please excuse my confusion.

Others have reported modest performance gains (20% or so) with Andrea's SMP fix, but only when serving largish files (100 kilobytes).

Juergen has completed his testing. Unfortunately, he neglected to compile Apache with -DSINGLE_LISTEN_UNSERIALIZED_ACCEPT, which (according to Andrea) significantly hurt Apache performance. If Juergen missed this, the option is too hard to discover. To make it easier to get good performance in the future, we need the wake-one patch added to a stable kernel (say, 2.2.10), and we need Apache's configuration script to notice that the system is being compiled for 2.2.10 or later, and automatically select SINGLE_LISTEN_UNSERIALIZED_ACCEPT.

Performance problems reported by other Apache users



Mike Whitaker ([email protected]), 22 May 1999, in linuxperf ( High load under Apache1.3.3/mod_perl1.16/Linux2.2.7 SMP ), described a strange performance problem: Our typical webserver is a dual PII450 with 1G, and split httpd's: typically 300 static httpd's serving pages and 80-100 dynamic httpd's serving mod_perl ads. Unneeded modules are disabled and hostname lookups turned off, as any sensible person would. There's typically between one and three mod_perl hits/page on top of the usual dozen or so inline images... The kernel (2.2.7) has MAX_TASKS upped to 4090, and the unlock_kernel/lock_kernel around csum_and_copy_from_user() in tcp_do_sendmsg that Andi Kleen suggested. The performance is.. interesting. Load on the machine fluctuates between 10 and 120, while the user CPU goes from 15% (80% idle) to 180% (0% idle, machine *crawling*), about once every minute and a half. Vmstat shows that the number of processes in a run state ranges from 0 (when load has been low) to 30-40. The static servers can manage 60-70 peak hits/sec. Without the dynamic httpd's everything *flies*...

After being advised to try a kernel with wake-one support, he wrote back: We're up with 2.3.3 plus Andi Kleen's tcp_do_sendmsg patch plus Apache sleeping in accept() on one production server, and comparing it against a 2.2.7 plus tcp_do_sendmsg patch plus Apache sleeping in flock(). Identical systems (dual PII450 and 1G, two disk drivers). The wake-one patch is clearly doing its job. The 2.2.7 machine still has loads into three figures, while the 2.3.3 machine hasn't actually managed a load of 1. UNFORTUNATELY, observation suggests that the 2.3.3 machine/Apache combination is dropping/ignoring about one connection in ten, maybe more. (Network error: connection reset by peer.)

His next update, posted on 25 May, reads: More progress at the bleeding edge.
(Remember that the config is split static/mod_perl httpd's, with a very CPU-intensive mod_perl script serving advertisements as an SSI as the probable bottleneck.) Linux kernel version 2.2.9 plus the 2.2.9_andrea3 (wake-one) patch seems to work. It handles hits at a speed that suggests it's pushing the ad server close to its observed maximum. (As I said in a previous note, avoid 2.2.8 like the plague: it trashes HDs - see threads on linux-kernel for details.) However... When it *does* get overstressed, BOY does it get overstressed. When idle CPU is essentially zero while it is processing advert requests, spikes in demand are a possibility. Once you're in this situation, it can be difficult to get back out from under the load of a progressively larger backlog of requests. The fix is counterintuitive: you can *REDUCE* MaxClients and hope that the tcp listen queue can handle a load surge. This seems to work, according to experience. (Aside: This is a great case for Eddieware's load balancing DNS.)

- Eric Hicks, 26 May 1999, in linux-kernel ( Apache/kernel problem? ): ... I'm experiencing some major problems. It appears that a single PII 400MHz or a single AMD 400 will outrun a dual PII 450 at http requests to Apache. ... HTTP Server Data Tests: 100 1MByte MPEG files stored on local disks. Results:
  - AMD 400MHz K6, 128MB, Linux 2.0.36; handles 1000 simultaneous clients @ 57.6Kbits/sec.
  - PII 400MHz, 512MB, Linux 2.0.36; handles 1000 simultaneous clients @ 57.6Kbits/sec.
  - Dual PII 450MHz, 512MB, Linux 2.2.8 and Linux 2.0.36; handles far fewer than 300 simultaneous clients @ 57.6Kbits/sec.

I advised him to try 2.2.9_andrea3, and he said that he would try it and report back.

Kernel issue #4: Interrupt Bottlenecks



According to Zach, the Mindcraft benchmark's use of four Fast Ethernet cards and a quad SMP system exposes a bottleneck in Linux's interrupt processing; the kernel spent a lot of time in synchronize_bh(). This bottleneck would be stressed less if a single Gigabit Ethernet card were used instead. Mingo claims that TCP throughput scales better with more CPUs in 2.3.9 than in 2.2.10; however, he hasn't yet tried it with multiple Ethernet cards.

See also comments on reducing interrupts under heavy load by Steve Underwood and Steven Guo.

See also Linus's "State of Linux" talk at Usenix '99, where he talks about the Mindcraft benchmark and SMP scalability.

SCT's Jan 2000 comments on progress in scaling can be found here.

Softnet is on the horizon! Kernel 2.3.43 adds the new softnet networking changes. Softnet changes the interface to the network cards, so every driver must be updated. In return, network performance should be much better on large SMP systems. (For more information, see Alexey's softnet-howto or his February 15 post about how to convert old drivers to softnet.)

The Feb '00 thread Gigabit Ethernet Bottlenecks (especially its second week) has lots of interesting tidbits about what interrupt (and other) bottlenecks remain, and how they are being addressed in the 2.3 kernel. Ingo Molnar wrote a post on 27 February 2000 that explains the improvements to interrupt handling in the IA32 code in great detail. These will be integrated into the core kernel in 2.5, it appears.

Kernel issue #5: Mysterious network slowdown



This is a bug, not a scaling problem. Several users of 2.2 reported that their networking performance sometimes drops to 1 to 10% of normal. These slowdowns are often associated with high ping times. However, they were able to fix the problem temporarily by cycling the interface.

- Oystein Sigsen reported: after upgrading to 2.2, we experienced occasional slowdowns in TCP performance. When I take down the interface and reinsert the eepro100 module into the kernel, the performance returns to normal. After I've done that, the performance is fine for a couple of days or maybe weeks.
- David Stahl reported on 29 Jun 1999: I have 3 computers running 2.2.10 [with multiple] 3COM 905b PCI [cards] ... After approximately two days of uptime I will begin to notice ping times jump to 7-20 secs on the local network. There is no loss, just some very high latency. ... It seems to be dependent upon the network load -- lighter loads lead to longer periods between problems. It is also gradual. It will start at 4 seconds, then 7 seconds, then 30 minutes later it can go up to 12-20 seconds.
- Another eepro100 report. A tulip report. Less likely to happen again.
- David Stahl wrote on 13 July 1999: What DID fix the problem was a private reply from someone else (sorry about the lack of credit, but i'm not in the mood to sieve 10k emails right now), to try the alpha version of the latest 3c59x.c driver from Donald Becker (http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html). 3c59x.c:v0.99L 5/28/99 is the version that fixed it, from ftp://cesdis.gsfc.nasa.gov/pub/linux/drivers/test/3c59x.c
- On 23 Sep 1999, Alexey posted a one-line patch that clears up a similar mysterious slowdown. This patch has been applied to Red Hat 6.1 and 2.2.13. On three Red Hat 6.0 systems I know of with Masq support compiled in, connected to cable modems, this patch fixed a bug which caused very high pings after even short bursts of heavy TCP transfers to distant hosts.
- Rickard Cedergren, Michael Brown and others reported in linux-kernel on 21 October that Alexey's patch had greatly improved the problem, but that it is still not completely gone.
- Tony Hoyle is also experiencing long delays with 2.2.13.
- Jeremy Fitzhardinge reports another delay. The replies suggest that it is likely caused by the Tulip driver.

Kernel issue #6: 2.2.x/NT TCP slowdown



Petru Paler reported on 10 July 1999 in linux-kernel ( [BUG] TCP communications between Linux and NT ) that any type of TCP connection between Linux 2.2.10 and an NT Server 4 (Service Pack 5) slows to a crawl. The problem was much milder (6kbytes/sec) with 2.0.37. A tcpdump log of a slow connection allowed Andi Kleen to see that NT took a long time to ACK a particular data packet, which was causing Linux to stall.

Solved: false alarm! It wasn't Linux's fault at any point. It turned out that NT needed to be told not to use full duplex on the ethernet.

Kernel issue #7: Scheduler



Phil Ezolt, 22 January 2000, in linux-kernel ( Re: Interesting analysis on linux kernel threading from IBM ): When I run the SPECWeb96 tests here, I see both a large number of running processes and a lot of context switches. ... Here's a sample of the vmstat data:

     procs                  memory    swap        io     system       cpu
     r b w  swpd    free   buff   cache  si so   bi   bo    in    cs  us sy id
    ...
    24 0 0  2320 2066936 590088 1061464   0  0    0    0  8680  7402   3 96  1
    24 0 0  2320 2065752 590664 1061464   0  0    0 1095 11344 10920   3 95  1

Notice. There are 24 processes running and 7000 context switches. This is a lot. Each second, 7000*24 goodnesses are calculated. Not the (20*3) that a desktop system sees. This is a scalability problem. A better scheduler means better scalability. Don't tell me benchmark data is useless. Unless you can give me data from a real system showing where its faults are, benchmark data is all we have. SPECWeb96 pushes Linux until it bleeds. I'm telling you where it is bleeding. You have two options. It might not be what your system is seeing today, but it will be in the future. Would you rather fix it right away or wait until someone else does? ... Here's a juicy tip. During my runs, 98% contention is seen on the [2.2.14] kernel lock, and it is accessed A LOT. I don't have enough memory support to compare with 2.3.40. Andrea will probably be kind enough to give me a patch, and I'll be able to see if things have improved.

[Phil's data pertains to the web server that was the subject of the SPECWeb96 test. It is an ES40 4-CPU Alpha EV6 running Red Hat 6.0 with kernel 2.2.14 plus SGI speed patches; the interfaces taking the load are 2 ACENic gigabit Ethernet cards.]

Kernel issue #8: SMP Bottlenecks in 2.4 kernel



Manfred Spraul, 21 April 2000, in linux-kernel ( [PATCH] f_op->poll() without lock_kernel() ): [email protected] noticed that select() caused high contention for the kernel lock, so here is a patch that removes lock_kernel() from poll(). [tested] with version 2.3.99. Although there was some debate about whether this was a wise change at this late date, Linus and David Miller were enthusiastic. It looks like one more bottleneck is being removed.

On 26 April 2000, [email protected] posted benchmark results in linux-kernel with and without the lock_kernel() in poll(). Followups included a kernel patch that improved checksum performance and a patch to Apache 1.3 that forced it to align its buffers at 32-word boundaries. Dean Gaudet's patch drew praise from Linus, who passed on a rumor that it could speed up SPECWeb results by 3%. This was a very interesting thread.

Kernel issue #9: csum_partial_copy_generic



[email protected], 19 May 2000, in linux-kernel ( [PATCH] Fast csum_partial_copy_generic and more ), reports a 3% reduction in total CPU time compared to 2.3.99-pre8 on i686 by optimizing the cache behavior of csum_partial_copy_generic. ZD's WebBench was used for the workload. He adds: The benchmark we used has almost the same setting as the MINDCRAFT ones, but the apache setting is [changed] slightly not to use symlink checking. We used 24 independent clients, and the maximum number of apache processes was 16.

A four-way XEON system was used, and it ran about twice as fast as a single CPU. In ZD's benchmarks with 2.2.6, a four-way XEON system only achieved a 1.5x increase in speed over a single CPU. Kumon reports a > 2x speedup. This seems to be similar to the speedup NT 4.0sp3 achieved using 4 CPUs with 24 clients. It's encouraging that things may have improved in the 11 months since the 2.2.6 tests. Kumon indicated that the major improvement came between pre3 and pre5, from the poll optimization: Until pre4 (I forget the exact version), the kernel lock prevented performance improvement. If you can retrieve all l-k mails between Apr 20-25, the following mails should help you understand the background.

    Subject: namei() question
    Subject: [PATCH] f_op->poll() without lock_kernel()
    Subject: lockless poll() (was Re: namei() query)
    Subject: "movb" for spin-unlock
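One rough way to read the 1.5x and 2x figures above is to ask what serialized fraction of the work they would imply. The sketch below assumes Amdahl's law as the model, which is an assumption of this example and not something the posts use; the function name is made up for illustration.

```python
def serial_fraction(speedup: float, cpus: int) -> float:
    """Serialized fraction s implied by a measured speedup on `cpus` CPUs.

    Assumes Amdahl's law: speedup = 1 / (s + (1 - s) / cpus).
    Solving for s gives s = (cpus / speedup - 1) / (cpus - 1).
    """
    if cpus < 2:
        raise ValueError("need at least 2 CPUs to infer a serial fraction")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (cpus / speedup - 1.0) / (cpus - 1.0)


if __name__ == "__main__":
    # Speedups quoted in the text for a four-way system.
    for label, speedup in (("2.2.6, 4 CPUs", 1.5), ("2.3.99-pre8, 4 CPUs", 2.0)):
        s = serial_fraction(speedup, cpus=4)
        print(f"{label}: speedup {speedup}x implies serial fraction {s:.2f}")
```

Under this model, 1.5x on four CPUs corresponds to roughly 56% of the work being serialized, and 2x to roughly 33%. Real workloads do not follow the model exactly, so treat these as order-of-magnitude readings only.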

Kumon posted again on 4 September 2000, noting that his changes hadn't yet been merged into the kernel.

Kernel issue #10: getname() and poll() optimizations



Manfred Spraul posted a patch in linux-kernel on 22 May 2000 that optimized kmalloc(), getname(), and select() a little, speeding Apache up by 1.5% on 2.3.99-pre8.

Kernel issue #11



Alexander Viro posted a fix on 30 May 2000 to get rid of a lock in close_flip(). Kumon ran a benchmark and reported: I measured viro's ac6D patch using WebBench on a 4cpu Xeon computer. I applied it to 2.4.0-test1, not ac6. The patch reduced stext_lock time by 50% and OS time by 44%. ... do_select accounts for some of the kmalloc/kfree overhead. It can easily be eliminated by using a small array on the stack.

Kumon then posted a patch which avoids kmalloc/kfree for select() and poll() when the number of fd's is less than 64.

Kernel issue #12: Poor disk seek behavior in 2.2, new elevator code in 2.4



On 20 July 2000, Robert Cohen ([email protected]) posted a report in linux-kernel listing netatalk (appletalk file sharing) benchmarks comparing 2.0, 2.2, and several versions of 2.4.0-pre. The elevator code in 2.4 seems to help (some versions of 2.4 can handle 5 benchmark clients instead of 2) but ... The test4 and test5pre2 versions aren't as good. They manage 2 clients with a 128 Meg server without problems, which is better than 2.2. However, they struggle to handle 4 clients. Things have changed a lot since test1-ac22. Here's an update. The *only* 2.4 kernel versions that could handle 5 clients were 2.4.0-test1-ac22-riel and 2.4.0-test1-ac22-class 5+; everything before and after (up to 2.4.0-test5pre4) can only handle 2.

Robert Cohen posted an update on 26 September 2000, which included a program to demonstrate the problem. Jens Axboe, [email protected], replied that he and Andrea had a patch almost ready for 2.4.0-test9-pre5 to fix this problem.

Robert Cohen posted another update on 4 October 2000 with benchmark results for many kernels, showing that the issue still exists in 2.4.0-test9.

Kernel issue #13: Fast Forwarding / Hardware flow control



Jamal ([email protected]) posted a note in linux-kernel on 18 September 2000 describing proposed changes to the 2.4 kernel's driver interface. The changes add hardware flow control as well as several other refinements. He wrote: Robert Olson and I decided, after the OLS, that we would try to reach the 100Mbps (148.8Kpps) routing peak by year's end. I fear the bar has been raised. Robert is already hitting 148Kpps with 2.4.0-test7, using an ASUS motherboard carrying a PIII 700MHz Coppermine, with approximately 65% CPU utilization. I was able to get a consistent value in the 110Kpps range with a single PII-based Dell computer. As an example, I have attached a modified tulip driver (hacked by Alexey and modified by Robert over a period) to show how feedback values can be used. ... I believe we could have done better in the Mindcraft tests with these changes in 2.2 (and HW FC turned on). [update] BTW: I was informed that the Linux people were not allowed to modify the hardware during those tests, so I don't think they could have used these modifications even if they had been available then.

Kernel tuning issue: hitting TIME_WAIT
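The port-range tuning discussed in this section comes down to simple arithmetic: if each client-side port sits in TIME_WAIT for some interval after its connection closes, the size of the local port range caps how many new connections per second one client can open. The sketch below works this out. The 60-second TIME_WAIT interval, the narrow example range, and the function names are assumptions made for illustration; only the "1024 65535" range and the /proc path come from the text.

```python
import os
from typing import Tuple

PORT_RANGE_FILE = "/proc/sys/net/ipv4/ip_local_port_range"


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the 'low high' pair used by ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {low} {high}")
    return low, high


def max_new_connections_per_second(low: int, high: int,
                                   time_wait_seconds: float = 60.0) -> float:
    """Upper bound on sustained new outgoing connections per second.

    Assumption: every port stays unusable for time_wait_seconds after its
    connection closes, so at most (number of ports) connections can be
    opened per TIME_WAIT interval.
    """
    ports = high - low + 1
    return ports / time_wait_seconds


if __name__ == "__main__":
    # "1024 4999" is a hypothetical narrow range; "1024 65535" is the
    # range recommended in the text.
    for text in ("1024 4999", "1024 65535"):
        low, high = parse_port_range(text)
        rate = max_new_connections_per_second(low, high)
        print(f"range {text}: at most {rate:.1f} new connections/sec")
    # On a Linux machine, also report the range currently in effect.
    if os.path.exists(PORT_RANGE_FILE):
        with open(PORT_RANGE_FILE) as handle:
            print("current range:", parse_port_range(handle.read()))
```

With these assumptions, the full 1024-65535 range allows on the order of a thousand new connections per second per client machine, while a range of a few thousand ports allows well under a hundred. That gap is why a benchmark that simulates many clients from a few machines can stall on the client side.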



Takashi Richard Horikawa posted a report in linux-kernel on 30 March 2000 that listed SPECWeb96 results for both 2.2.14 and 2.3.41. Performance between a client and a server both running 2.2.14 was poor. This is because too small a range of local ports was in use, so ports were still in TIME_WAIT when their numbers were needed again for new connections. The lesson here is to tune clients and servers to use as wide a range of ports as possible, e.g. with

    echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range

to avoid bumping into this situation when trying to simulate large numbers of clients with a small number of client machines.

On 2 April 2000, Mr. Horikawa confirmed that the problem was solved by increasing the local port range using the above command.

Suggestions on future benchmarks



- Become familiar with the linux-kernel and Apache mailing lists, as well as the Linux newsgroups on Usenet (try DejaNews power searches in forums matching "*linux*").
- Post your proposed configuration and see whether people agree with it. Post intermediate results and be open about what you have done. You should probably expect to spend a week or so mulling over ideas with these mailing lists during the course of your tests.
- If possible, use a modern benchmark like SPECWeb99 rather than the simple ones used by Mindcraft.
- To better simulate conditions on the Internet, it might be worth injecting latency between the client and the server.
- If possible, benchmark both single and multiple CPUs, as well as single and multiple Ethernet interfaces. Be aware that networking performance in Linux kernel version 2.2.x scales poorly as you add more Ethernet cards and CPUs. This mostly applies to static pages and cached pages. Noncached dynamic pages take a lot of CPU time and should scale well as you add more CPUs. A cache can be used to save frequently generated pages, which brings dynamic page speeds closer to static page speeds.
- If you are testing dynamic content, don't use the old model that runs a separate process for every request. That is too slow. Use a modern interface to generate dynamic content (e.g. Apache's mod_perl).

Configuring Linux



Tuning problems probably accounted for less than a 20% performance decrease in Mindcraft's test, so as of 3 October 1999, most people will be happy with a stock 2.2.13 kernel or whatever comes with Red Hat 6.1. When the 2.4 kernel is available, it will improve SMP performance. Here are some notes if you want to see what people going for the utmost were trying in June:

- As of June 1, Linux kernel 2.2.9 plus 2.2.9_andrea3 has been mentioned as performing well on a dual-processor task (see above). (2.2.9_andrea3 seems to include both a wake-one scheduler fix and an SMP unlock_kernel fix.) (andrea3 only works on x86, I hear, so people with Alphas or PPCs will have to apply some other wake-one and tcp copy kernel_unlock patch.) Jan Gruber writes: "the 2.2.9_andrea3 patch doesn't work with SMP support disabled. Andrea told me to use ftp://ftp.suse.com/pub/people/andrea/kernel-patches/2.2.9_andrea-VM4.gz instead." Andrea Arcangeli contacted me on 7 June asking: If you are going for a bench, I would appreciate it if you would also bench the patch below. ftp://e-mind.com/pub/andrea/kernel-patches/2.2.9_andrea-perf1.gz
- On 11 Oct 1999, Andrea Arcangeli posted his list of pending 2.2.x patches, waiting to go into 2.2.13 or so. This includes several that might help the performance of SMP systems and systems undergoing heavy I/O. These might be worth considering if you encounter bottlenecks.
- For those who are truly brave, you might want to use the kernel-mode http server, khttpd, as a front end to Apache. It greatly accelerates static web page fetches. It is currently at version 0.1, so please be cautious.
- linux-kernel ( week 1, week 2 ) is currently (8 June 1999) discussing Apache benchmarking. Linus Torvalds is in principle bullish on using khttpd or something like it, and points out that NT is doing the same kind of thing.

Configuring Apache



- The usual optimizations should be applied (all unused modules should be left out when compiling, host name lookup should be disabled, and symbolic links should be followed; see http://www.apache.org/docs/misc/perf-tuning.html).
- Apache should be compiled to block in accept, e.g. env CFLAGS='-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT' ./configure
- The http://www.arctic.org/~dgaudet/apache/1.3/top_fuel.patch may be worth applying. PC Week used top_fuel in their recent benchmarks. (See also interesting comments by Dean Gaudet in linux-kernel and new-httpd.) According to some reports, mod_mmap_static and top_fuel.patch can reduce the number of syscalls per request from 18 to 9.
- For static file benchmarks, try compiling mod_mmap_static into Apache (see http://www.apache.org/docs/mod/mod_mmap_static.html) and configuring Apache to memory-map the static documents, e.g. by generating a config file that lists every document found by find /www/htdocs -type f -print, one mmapfile line per file, and including that mmap.conf in your Apache configuration file.
- According to several people, using Squid as a front end to Apache would speed up static page fetches.
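As a sketch of the mod_mmap_static step in the list above, the following Python walks a document root and writes one MMapFile directive per regular file. The function names and the output file name are illustrative assumptions; /www/htdocs is the document root used in the text. Paths containing whitespace are not handled, since the directive takes whitespace-separated file paths.

```python
import os
from typing import Iterator


def mmapfile_lines(docroot: str) -> Iterator[str]:
    """Yield one 'MMapFile <path>' directive per regular file under docroot."""
    for dirpath, dirnames, filenames in os.walk(docroot):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                yield f"MMapFile {path}"


def write_mmap_conf(docroot: str, conf_path: str) -> int:
    """Write the directives to conf_path and return how many were written."""
    lines = list(mmapfile_lines(docroot))
    with open(conf_path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")
    return len(lines)


if __name__ == "__main__":
    # Hypothetical invocation: adjust both paths for your installation.
    count = write_mmap_conf("/www/htdocs", "mmap.conf")
    print(f"wrote {count} MMapFile directives to mmap.conf")
```

The resulting file would then be pulled into the main configuration with an Include line. Files are mapped when the server starts, so the list needs regenerating, and the server restarting, whenever the documents change.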

Similar reading

- Usenet posts that show slowness in Apache or Linux:
  - "Apache does not perform as fast as people believe?", 1999/04/05, comp.infosystems.www.servers.unix: "...when we run WebBench to test the requests/sec and total throughput, Microsoft IIS 4 is 3 times faster for both Linux and Mac OS X."
  - "Re: Apache vs IIS 4: IIS 4 3 times faster", 1999/04/02, comp.infosystems.www.servers.unix: "Why are you surprised? It was well-known that Apache is slow. I haven't tested IIS but I did compare Apache to a few other servers last year. I found some that were three to four times faster."
- Ways to profile the kernel:
  - Kernel Spinlock Metering for Linux IA32, a tool to measure SMP contention. See also some test results comparing 2.2 to 2.3, and an example of someone using spinlock metering to find and fix kernel bottlenecks in 2.3.19.
  - Andrea Arcangeli's ikd kernel profiling patch, gprof (original announcement)
  - Ingo Molnar's ktracer -- for version 2.1.x. Example of ktracer output.
  - Christoph Lameter's perfstat patch, at Captech's Linux Performance, Stability and Scalability Project. Also, see their 25 Oct 99 post in linuxperf.
- Ways to profile user programs:
  - The old favourite: compile with -pg and analyze the resulting gmon.out with gprof.
  - Mikael Pettersson's x86 performance-monitoring counters patch. Supports 2.3.22 and 2.2.13. Includes a list of other related tools.
  - Hardware performance counters with Linux, by David Mentre
  - The Performance Counter Library -- supports many architectures.
  - Stephan Meyer's MSR patch -- only supports up to 2.2.6. No longer actively developed.
  - Richard Gooch's MSR and PTC patch -- only supports 2.2. Requires devfs.
- A few linux-kernel posts:
  - "2.2.5 optimizations for web benchmarks?", 16 Apr 1999 -- Karthik Prabhakar, about to do serious SPECWeb96 benchmarking, asks the right questions. The followups are interesting.
  - "Re: 2.2.5 optimizations for web benchmarks?", Dean Gaudet's reply, 16 April 1999 -- an Apache insider offers some interesting insights.
  - "[patch] new scheduler", 9 May 1999 -- the thread started by Rik van Riel about possible scheduler changes
- The smbtorture benchmark, which lets you test an SMB server like the big boys
- Rik van Riel's Linux Performance Tuning site
- The Linux Scalability Project
- The C10K problem - Why can't Johnny serve 10000 clients?
- Banga and Druschel's paper on web server benchmarking
- Linus's "State of Linux" talk at Usenix '99, where he talks about the Mindcraft benchmark and SMP scalability.
- My NT vs. Linux Server Benchmark Graphs page
- A post on comp.unix.bsd.freebsd.misc from June '99 which mentions that FreeBSD has SMP scaling properties similar to Linux's on tests like those run by Mindcraft.
- Mike Abbott of SGI's performance patches for Apache 1.3.9
- Note: Apache 2.0 supports sendfile(), which ought to help its flat file performance.