I have Linux servers running Apache Traffic Server, and I found that roughly 1 in 20 requests takes hundreds of milliseconds, much longer than expected (they should take 1-2 ms).
After tracing with SystemTap, I found that the time is spent in the kernel's copy_to_user() call (accept() -> inet_csk_accept() -> move_addr_to_user() -> copy_to_user()).
The server load is low (< 100 QPS), and copy_to_user() only copies 16 bytes (a struct sockaddr), yet it sometimes takes hundreds of milliseconds.
Since I am not familiar enough with SystemTap or kernel internals, I cannot investigate further. I have checked CPU usage and swap usage (output below), and neither looks abnormal.
Thanks for any advice.
Hardware:
- CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
- memory: 64 GB
- disk: 11 x 6 TB HDDs
Software:
- CentOS 6, kernel 2.6.32-696.el6.x86_64 #1 SMP Tue Mar 21 19:29:05 UTC 2017 x86_64 GNU/Linux
- Apache Traffic Server 5.3.2, configured with 10 GB of RAM
- nginx 1.10.2, proxying user requests to the file server
SystemTap script (I added the `global ts` declaration and matched the printf to the single delta actually shown in the output):

```
global ts

probe kernel.function("move_addr_to_user") {
    ts["move_addr_to_user", pid()] = gettimeofday_ms()
}

probe kernel.function("move_addr_to_user").return {
    printf("[%d]move_addr_to_user done:%d %s %d\n", gettimeofday_ms(),
           pid(), execname(), gettimeofday_ms() - ts["move_addr_to_user", pid()])
}

probe kernel.function("copy_to_user") {
    ts["copy_to_user", pid()] = gettimeofday_ms()
}

probe kernel.function("copy_to_user").return {
    printf("[%d]copy_to_user done:%d %s %d\n", gettimeofday_ms(),
           pid(), execname(), gettimeofday_ms() - ts["copy_to_user", pid()])
}
```

Output (note the 861 ms copy_to_user() for a 16-byte copy):

```
[1495630190767] 16 16 move_addr_to_user done:24145 [ACCEPT 0:8080] 0
[1495630191164]copy_to_user done:24145 [ACCEPT 0:8080] 0
[1495630191164] 16 16 move_addr_to_user done:24145 [ACCEPT 0:8080] 0
[1495630192172]copy_to_user done:24145 [ACCEPT 0:8080] 861
[1495630192172] 16 16 move_addr_to_user done:24145 [ACCEPT 0:8080] 861
[1495630192173]copy_to_user done:24145 [ACCEPT 0:8080] 0
[1495630192173] 16 16 move_addr_to_user done:24145 [ACCEPT 0:8080] 0
[1495630192173]copy_to_user done:24145 [ACCEPT 0:8080] 0
[1495630192173] 16 16 move_addr_to_user done:24145 [ACCEPT 0:8080] 0
[1495630192173]copy_to_user done:24145 [ACCEPT 0:8080] 0
```

`free -g`:

```
             total       used       free     shared    buffers     cached
Mem:            62         55          6          0          0         32
-/+ buffers/cache:         23         39
Swap:           15          0         15
```

`top`:

```
top - 20:57:39 up 24 days, 19:26,  2 users,  load average: 7.70, 9.43, 9.62
Tasks: 643 total,   1 running, 642 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  1.0%sy,  0.0%ni, 97.5%id,  1.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  65560992k total, 58525192k used,  7035800k free,   365084k buffers
Swap: 16777212k total,        0k used, 16777212k free, 33957572k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24145 traffics  20   0 21.7g  12g 6212 S 24.7 19.3 212:42.85 [ET_NET 0]
22173 root      20   0  677m 325m 1180 S  3.6  0.5   0:41.10 nginx
22161 root      20   0  677m 325m 1184 S  2.6  0.5   0:47.50 nginx
22168 root      20   0  677m 326m 2076 S  2.6  0.5   0:28.31 nginx
22150 root      20   0  677m 325m 1208 S  1.6  0.5   0:42.75 nginx
22165 root      20   0  677m 325m 1200 S  1.6  0.5   0:31.77 nginx
```
Update:
@employee of the month, thank you for your advice. After adding some probes to __do_page_fault(), I found that the time is spent in __do_page_fault() -> down_read(&mm->mmap_sem):
```
[1495677639202]copy_to_user done:24145 [ACCEPT 0:8080] 1
[1495677639202] 16 16 move_addr_to_user done:24145 [ACCEPT 0:8080] 1
[1495677639314]copy_to_user done:24145 [ACCEPT 0:8080] 0
[1495677639314] 16 16 move_addr_to_user done:24145 [ACCEPT 0:8080] 0
[1495677641329]do page fault done:24145 [ACCEPT 0:8080] 622
[1495677641329]copy_to_user done:24145 [ACCEPT 0:8080] 622
[1495677641329] 16 16 move_addr_to_user done:24145 [ACCEPT 0:8080] 622
```
@Ricardo Biehl Pasquali, thanks for the advice. Apache Traffic Server reads/writes the HDDs in separate threads: it has 1 thread to accept connections and 88 threads (8 per HDD) for blocking reads/writes (cache content). But I don't understand why blocking reads/writes in other threads could lead to such high __do_page_fault() latency.
`iostat -x` output:

```
Device:  rrqm/s  wrqm/s    r/s    w/s    rsec/s   wsec/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.00   16.00   0.00   4.00      0.00   160.00    40.00     0.05  13.50    0.00   13.50   7.75   3.10
sdi        0.00    0.00   0.00   0.00      0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sde        0.00    0.00   0.00   0.00      0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.00   0.00   0.00      0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdj        0.00    0.00   6.00   0.00   4326.00     0.00   721.00     0.06  10.50   10.50    0.00   6.83   4.10
sdc        0.00    0.00   2.00   0.00   1472.00     0.00   736.00     0.04  18.50   18.50    0.00   9.50   1.90
sdh        0.00    0.00   0.00   0.00      0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdg        0.00    0.00  14.00   0.00  10464.00     0.00   747.43     0.04   2.71    2.71    0.00   1.79   2.50
```
Update:
The problem is solved! The root cause is a periodic system() call, used to back up a file, in an ancient plugin we wrote for Traffic Server. system() forks the whole Traffic Server process; duplicating the page tables of a 20+ GB process takes hundreds of milliseconds, and while fork() does this it holds mmap_sem for write, so any other thread that page-faults in the meantime (such as the accept thread in copy_to_user()) blocks in down_read(&mm->mmap_sem).