Slow Samba writes

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
A client has a Suse 9.0 server running Samba 2 on ReiserFS. The workstations run Win2k and are connected at 100Mbps.

Copying an 86MB file from the server to a workstation takes about 8 seconds (~10MB/s). Copying the same file in the other direction takes a whopping 410 seconds (0.21MB/s), i.e. 50 times slower!
  • I tried a different workstation - same result.
  • QCheck transfers 100kB from an old workstation to the server at 80Mbps.
  • Hdparm confirms that options such as multiple-sector reads, 32-bit I/O, read ahead, etc. are enabled.
  • Hdparm says the server disk can supply data at 35MB/s (it's only a 20GB).
  • I restarted Samba - same result.
  • I can't see any difference between the settings on this server and on another one that works as expected.
  • I googled a post by one person with the same problem, but there were no replies. :cry:
Help!
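
The two rates quoted above follow directly from the file size and the copy times. A minimal sketch of the arithmetic (the helper name rate_mb_s is made up for illustration):

```shell
#!/bin/sh
# rate_mb_s SIZE_MB SECONDS -> prints throughput in MB/s with two decimals.
# Hypothetical helper, only here to show the arithmetic.
rate_mb_s() {
    LC_ALL=C awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", size / secs }'
}

rate_mb_s 86 8      # server -> workstation, prints 10.75
rate_mb_s 86 410    # workstation -> server, prints 0.21
```

The ratio of the two is roughly 51, which matches the "50 times slower" figure.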
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
22,297
Location
I am omnipresent
There are lots and lots of reasons samba might be performing poorly. Have you googled for and read about Samba performance tuning?
 

Fushigi

Storage Is My Life
Joined
Jan 23, 2002
Messages
2,890
Location
Illinois, USA
I've absolutely no experience with Linux to date. But, in general:

- Is write caching disabled on the server?
- Is the server's RAM usage high / is it busy managing a swap file?
- I'm assuming no RAID since you say it's a 20GB drive & not some kind of array. Otherwise I'd mention RAID5 overhead.
- How's the network? Hub or switch? If hub, any collisions during the transfer? Half/full duplex setting (for both workstation & server)?
 

blakerwry

Storage? I am Storage!
Joined
Oct 12, 2002
Messages
4,203
Location
Kansas City, USA
Website
justblake.com
I assume you have tried more than 1 workstation so that you have isolated the issue to something with the server?


I would try running a HDD benchmark that measures writes. Something like the following:

Code:
[user@server]$time dd if=/dev/zero of=/home/user/testFile bs=1M count=100

This measures how long it takes to write 100MB to the HDD. Divide 100MB by the "real" time reported at the end to get an estimate of the HDD's sustained sequential write speed.


100MB isn't all that big a file, so you might want to bump it up to 500 or 1000MB if you have the space and want more accurate results.
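
The same measurement can be scripted so that the division is done for you. This is only a rough sketch: it assumes GNU date (for the %N nanosecond format) and GNU dd, and the function name write_test and the example path are placeholders.

```shell
#!/bin/sh
# write_test FILE SIZE_MB -> writes SIZE_MB of zeros to FILE, deletes the
# file again, and prints the apparent write speed in MB/s.
# Assumes GNU date (%N) and GNU dd (bs=1M).
write_test() {
    file=$1
    size_mb=$2
    start=$(date +%s.%N)
    dd if=/dev/zero of="$file" bs=1M count="$size_mb" 2>/dev/null
    end=$(date +%s.%N)
    rm -f "$file"
    # Throughput = size / elapsed time.
    LC_ALL=C awk -v mb="$size_mb" -v s="$start" -v e="$end" \
        'BEGIN { printf "%.1f\n", mb / (e - s) }'
}

# Example: write_test /home/user/testFile 1000
```

With small files the number mostly reflects how fast the OS can copy into its write cache, so treat it as an upper bound.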
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
Mercutio said:
Have you googled for and read about Samba performance tuning?
Yeah, but AFAIK tuning buys you percentages, not a 50-fold difference?

Fushigi said:
- Is write caching disabled on the server?
- Is the server's RAM usage high / is it busy managing a swap file?
- I'm assuming no RAID since you say it's a 20GB drive & not some kind of array. Otherwise I'd mention RAID5 overhead.
- How's the network? Hub or switch? If hub, any collisions during the transfer? Half/full duplex setting (for both workstation & server)?
- Being an ATA drive, I'd expect drive write caching to be enabled (I've never disabled it on this box). But see my response to Merc.
- It has 512MB RAM and was idle (after hours) every time I've tested it.
- No, no RAID. I've also tried to copy the 86MB file just using the server desktop. It appears to complete instantly, but that might be a quirk of VNC. Anyone know of a Linux-based disk benchmark? Bonnie++ didn't want to play when I tried it.
- Most of the LAN is on a single switch, but the server connects through a second switch built into a router. There's also a workstation plugged into the router-switch (thereby bypassing the main switch) that displays the same performance. However, I'm relying on the fact that I can QCheck to the server at >80Mbps, and obviously the outbound transfers are full speed. I tried to sustain a stream, but QCheck tops out at 1Mbps for that test.

I am wondering if the onboard Realtek 8169 is somehow choking on sustained inbound traffic. Any ideas on how to find out?
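
One low-tech way to look for this on the Linux side is to snapshot the kernel's receive counters before and after a slow upload and see whether errors, drops, or FIFO overruns grow. A sketch, assuming the interface is called eth0 and that /proc/net/dev has the usual column layout; the function name is made up:

```shell
#!/bin/sh
# rx_counters IFACE [FILE] -> prints receive-side counters for IFACE.
# FILE defaults to /proc/net/dev; the second argument exists so the parser
# can be tried on a saved copy.
rx_counters() {
    awk -v ifc="$1" '
        { sub(/:/, " ") }                # "eth0:123" -> "eth0 123"
        $1 == ifc {
            printf "rx_packets=%s rx_errs=%s rx_drop=%s rx_fifo=%s rx_frame=%s\n",
                   $3, $4, $5, $6, $7
        }' "${2:-/proc/net/dev}"
}

# Example: run once before and once after a slow upload, then compare.
#   rx_counters eth0
```

If rx_errs, rx_drop, or rx_fifo climb during the upload, that points at the NIC or driver; if they stay flat, the problem is more likely elsewhere (duplex negotiation, for instance).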
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
22,297
Location
I am omnipresent
I have no idea what would make it THAT slow. I've had to delete lock files in the past, but this clearly isn't that problem.

It's happening on multiple clients, but did you try a transfer from a client in "Safe Mode with Networking" to rule out something running on both the client machines?

Drivers are a possible culprit, and certainly some people have said that the RTL8139 is kind of crummy NIC, but I don't know of any particular issues with it in Linux. Does your other server use the same card?
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
Mercutio said:
It's happening on multiple clients, but did you try a transfer from a client in "Safe Mode with Networking" to rule out something running on both the client machines?
Difficult - I'm doing all this remotely. However, I haven't noticed anything stealing cycles on any of the three workstations I've tried (I was watching Task Manager when the problem first became obvious).

Drivers are a possible culprit, and certainly some people have said that the RTL8139 is kind of crummy NIC, but I don't know of any particular issues with it in Linux. Does your other server use the same card?
This one is an RTL-8169 (GbE) - the other is an RTL-8139 (I think).

You're gonna love the fact that the slow box is an nForce2 whereas the good one is a KT600 ... ;)
 

blakerwry

Storage? I am Storage!
Joined
Oct 12, 2002
Messages
4,203
Location
Kansas City, USA
Website
justblake.com
I have an 8169 in my Linux server... it should have no problems with 20MB/sec, and I wouldn't be surprised if you can get up to 40MB/sec.

Please try my disk benchmark and let me know the results.

-Blake
 

The JoJo

Wannabe Storage Freak
Joined
Jan 25, 2002
Messages
1,490
Location
Finland, Turku
Website
www.thejojo.com
How does this line in your smb.conf file read?
Here's mine:
socket options = TCP_NODELAY SO_RCVBUF=16384 SO_SNDBUF=16384
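
To pull the active line out of a config file for comparison, something like the following should do. It is only a sketch: the default path /etc/samba/smb.conf is an assumption (it varies by distro), and the function name is made up.

```shell
#!/bin/sh
# show_socket_options [FILE] -> prints every uncommented "socket options"
# line in FILE. Lines starting with # or ; are comments and are skipped
# because the pattern only allows whitespace before the parameter name.
show_socket_options() {
    grep -iE '^[[:space:]]*socket options[[:space:]]*=' "${1:-/etc/samba/smb.conf}"
}

# Example: show_socket_options /etc/samba/smb.conf
```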

Have you tried bonnie for benchmarking, or hdparm? The results from that dd would be very informative, please try that.

What kernel and driver are you using for that RTL?
To verify the network speed is ok and that it's more samba related, you could install netperf on the win2k and linux machine, and test the network speed in both directions. This could rule out the RTL from the problem.
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
blakerwry said:
Please try my disk benchmark and let me know the results.
Sorry Blake, I took so long to finalize my post I didn't see your post until later.

What did you expect $time to do? It didn't do anything for me, so I ended up incorporating the dd command in a script with a call to date before and after.

Elapsed time was one second.

I decided it might be more meaningful if the source wasn't a virtual device, so I passed the resulting 100MB file through the dd tool and came up with six seconds. Allowing for the reads, that sounds like up to 30MB/s to me (with the OS write cache).

I can also confirm that ifconfig reports zero bad packets or overruns, and Samba CPU utilization doesn't seem to be significant when copying a file.

TCP_NODELAY is set in my config file, but I haven't changed SO_RCVBUF etc. Isn't 16384 actually above the optimum point?

I tried more tests with QCheck, transferring 1MB at a time to the server, and it hit around 86Mbps. I'd still like to see a bigger transfer, but I admit this angle is looking weak. I tried mii-tool, but it didn't recognize the network adaptor.
 

blakerwry

Storage? I am Storage!
Joined
Oct 12, 2002
Messages
4,203
Location
Kansas City, USA
Website
justblake.com
Sorry, the command does not have a dollar sign in front; that was part of the default Linux prompt. The time command times the command immediately following it.


The input file (if=/dev/zero) is virtual, so there are no reads, only writes (to the location specified by the output file, of=).


If it took 6 seconds to write 100MB, then you are getting about 16MB/sec.... doesn't sound great.
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
blakerwry said:
If it took 6 seconds to write 100MB, then you are getting about 16MB/sec.... doesn't sound great.
No, the 6 seconds (about 1 second according to the 'time' function) was reading and writing an actual file, so write speed works out at closer to 30MB/s. With the virtual file as the source, the time was far less than a second (meaningless since it's writing to cache anyway). I'm afraid the benchmark looks a little unreliable.

BTW, there was nothing useful in the Samba log. I also set up a share on the workstation that shares the router with the server, and it accepted uploads from another workstation at expected speeds.

Anyway, after much trawling, and exploring the misleading solutions that that threw up, I tried changing the speed/duplex settings - of a workstation (I couldn't find how to do it for the server). 100Mbps half duplex and 10Mbps full duplex didn't help, but I hit the jackpot with 10Mbps half duplex.

Uploading 86MB to the server took <80 seconds, which works out at >1MB/s, or 5 times faster than the result with full speed 100Mbps! So, I'm inclined to think that my suspicions that the Realtek 8169 is being overrun are most likely correct - comments?

There's a mention on the Net that some people had problems with the RTL Linux driver (hanging, I think) and there is some sort of patch out there, although I don't know if it's compatible with my older 2.4.* kernel. Any observations from your setup, Blake?
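
For context on the 10Mbps result: a 10Mbps link tops out at 1.25MB/s, so an 86MB upload cannot finish in much under 69 seconds at that speed. A quick check of the arithmetic (decimal megabytes, protocol overhead ignored; the helper name is made up):

```shell
#!/bin/sh
# min_seconds SIZE_MB LINK_MBPS -> fastest possible transfer time in seconds,
# ignoring Ethernet/TCP/SMB overhead (8 bits per byte, decimal megabytes).
min_seconds() {
    LC_ALL=C awk -v mb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", mb * 8 / mbps }'
}

min_seconds 86 10     # 10Mbps link, prints 68.8
min_seconds 86 100    # 100Mbps link, prints 6.9
```

So the <80 second upload at 10Mbps is close to wire speed for that setting, just as the 8 second download is close to wire speed at 100Mbps. That suggests the slowdown only bites when data arrives at the server faster than 10Mbps.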
 

Buck

Storage? I am Storage!
Joined
Feb 22, 2002
Messages
4,514
Location
Blurry.
Website
www.hlmcompany.com
time said:
Anyway, after much trawling, and exploring the misleading solutions that that threw up, I tried changing the speed/duplex settings - of a workstation (I couldn't find how to do it for the server). 100Mbps half duplex and 10Mbps full duplex didn't help, but I hit the jackpot with 10Mbps half duplex.

I had a full duplex/half duplex problem with a Windows NT Server that was resolved through hardware. The server used a Netgear 311 NIC and was connecting to a Cisco router in someone’s data center. The problem was resolved when I put a Netgear switch in between the Netgear NIC and Cisco router. The older Cisco router was unable to automatically negotiate the appropriate speeds for the network card.
 

blakerwry

Storage? I am Storage!
Joined
Oct 12, 2002
Messages
4,203
Location
Kansas City, USA
Website
justblake.com
time said:
blakerwry said:
If it took 6 seconds to write 100MB, then you are getting about 16MB/sec.... doesn't sound great.
No, the 6 seconds (about 1 second according to the 'time' function) was reading and writing an actual file, so write speed works out at closer to 30MB/s. With the virtual file as the source, the time was far less than a second (meaningless since it's writing to cache anyway). I'm afraid the benchmark looks a little unreliable.

BTW, there was nothing useful in the Samba log. I also set up a share on the workstation that shares the router with the server, and it accepted uploads from another workstation at expected speeds.

Anyway, after much trawling, and exploring the misleading solutions that that threw up, I tried changing the speed/duplex settings - of a workstation (I couldn't find how to do it for the server). 100Mbps half duplex and 10Mbps full duplex didn't help, but I hit the jackpot with 10Mbps half duplex.

Uploading 86MB to the server took <80 seconds, which works out at >1MB/s, or 5 times faster than the result with full speed 100Mbps! So, I'm inclined to think that my suspicions that the Realtek 8169 is being overrun are most likely correct - comments?

There's a mention on the Net that some people had problems with the RTL Linux driver (hanging, I think) and there is some sort of patch out there, although I don't know if it's compatible with my older 2.4.* kernel. Any observations from your setup, Blake?


Ah, if it writes the file in under a second you're not going to get reliable results. Cache, as you saw, was interfering. Try a 1000MB file.

For the NIC, like I said, there is no reason you shouldn't be able to get at least 20MB/sec out of it if it is running at 1Gbps.

I am using a 2.6 kernel and did experience the hanging, however, upgrading to a newer point release kernel resolved the issue. Unfortunately, the 8169 drivers included w/ the kernel seem to change with every 2.6 point release, not always for the better.

The hanging I have experienced is always a hard lock. The machine will not accept any keyboard input and all ethernet interfaces are dead. Resetting the motherboard is the only way to reboot it.

I was not aware that drivers for this NIC had been backported to the 2.4 kernel. However, the Linux drivers from Realtek's website work just fine.
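
Besides using a bigger file, another way to see how much the cache flatters the number is to run the write test twice, once as-is and once with dd told to flush to disk before it exits. A sketch, assuming GNU dd (conv=fsync) and GNU date (%N); the function names and the example path are placeholders:

```shell
#!/bin/sh
# timed_dd FILE SIZE_MB [extra dd args] -> writes zeros, removes the file,
# prints MB/s. Extra arguments are passed straight through to dd.
timed_dd() {
    file=$1
    size_mb=$2
    shift 2
    start=$(date +%s.%N)
    dd if=/dev/zero of="$file" bs=1M count="$size_mb" "$@" 2>/dev/null
    end=$(date +%s.%N)
    rm -f "$file"
    LC_ALL=C awk -v mb="$size_mb" -v s="$start" -v e="$end" \
        'BEGIN { printf "%.1f\n", mb / (e - s) }'
}

# compare_cached_vs_synced FILE SIZE_MB -> prints both figures.
# conv=fsync makes dd flush the file to disk before it exits, so the second
# timing includes the physical write and not just the copy into cache.
compare_cached_vs_synced() {
    echo "cached: $(timed_dd "$1" "$2") MB/s"
    echo "synced: $(timed_dd "$1" "$2" conv=fsync) MB/s"
}

# Example: compare_cached_vs_synced /home/user/testFile 1000
```

A large gap between the two lines means the first figure is mostly measuring RAM.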
 

blakerwry

Storage? I am Storage!
Joined
Oct 12, 2002
Messages
4,203
Location
Kansas City, USA
Website
justblake.com
Looks like I'm using a 2.6.8 kernel and it has worked great; earlier and later kernels are a mixed bag. However, in my experience the NIC performs at roughly the same speed regardless of kernel version.



Since your workstations are connected at 100Mbps, do you have a gigabit switch or a switch that offers a 1Gbps uplink? If not, you could always test a 100Mbps NIC in the server.


Unfortunately the NIC driver for the 8169 is still somewhat immature, so tools like 'mii-tool' and 'ethtool' do not work to switch your server between ethernet modes or try more advanced options.


For reference, do you think you could paste the output of the ifconfig command here, along with the results of my HDD write test using larger files?
 