Fastest way to share files between VM & Host?

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
I'm trying to rework my storage situation at the moment. My main home desktop has to be a Windows PC because I need Photoshop & Lightroom on it. Unfortunately, the storage management options in Windows have left me a bit disappointed.

What I'm thinking at the moment is that I'd like to have FreeNAS or OpenSolaris using ZFS, in a VM, managing the storage and sharing it back to the Windows host as quickly as possible. I used to have a separate file server, but performance, even over GbE, was not satisfactory. I'm wondering if I can get faster performance from local storage, even with the overhead and the barrier of the VM.
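For context, my understanding is that the ZFS/CIFS side of that plan is only a few commands on OpenSolaris, which has an in-kernel CIFS server. This is an untested sketch on my part, and the device names and share name are placeholders:

```shell
# Create a raidz pool from four whole disks (replace cXtYd0 with real devices)
zpool create tank raidz c7t1d0 c7t2d0 c7t3d0 c7t4d0

# A dataset for the photo library; mixed case sensitivity plays nicer with
# Windows clients, and it can only be set at creation time
zfs create -o casesensitivity=mixed tank/photos

# Enable the in-kernel CIFS server (with its dependencies) and share the dataset
svcadm enable -r smb/server
zfs set sharesmb=name=photos tank/photos
```

The question is how fast the host can actually read from that share through the virtual NIC.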


I don't want to use Windows software RAID (which is what I've been doing since moving from the separate box shared over GbE), because:
1) No online capacity expansion
2) I'm getting to the number of drives and amount of data where I'm pretty much guaranteed an unrecoverable error on a RAID 5 rebuild. I'd like ZFS' checksumming.


Is there a good way to do this? My understanding at the moment is that I'd be sharing files via CIFS to the host over a virtual network connection, and these virtual network connections don't actually appear to be very fast.



With all that said, questions:

1) Is there a faster way?

2) If I'm stuck with mapping a shared drive over a virtual network connection can I expect decent performance? (i.e. much better than GbE? Worth all the trouble?)

3) Should I try to team a bunch of GbE NICs between the Windows box and a ZFS box? Can you even team NICs between Windows and a different OS?
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
Loading large image files is a pain. Lightroom really doesn't like the latency and available bandwidth of a network share. In fact, it would actually crash a lot in v2 (which was a known issue). v3 is better in this respect, but it's still a big PITA.

I moved the files to a software RAID on the main machine, but losing LVM snapshots was a pain. Now I've run out of space and can't expand it (the wonders of Windows Software RAID), so I've decided that I'd like to come up with a higher-performing, safer (snapshots, and checksumming specifically) setup.


Some bandwidth numbers I looked up several months ago seemed to indicate that the speed of a virtualized network connection between host and VM was actually slower than a wired GbE connection, at least once CIFS overhead is thrown in. Apparently KVM has made some strides recently, but I'd like to see what my options are.

I have next to no experience with Virtualization, but I know several of the members here play with it on a regular basis. Picking brains as it were...
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
It's worth mentioning that I tried setting up some local space on Windows just for the files I'm working on at the moment, but keeping everything in sync and backing it up was a nightmare.

I'm kind of forgetful and end up with a lot of duplicates for fear of losing something. A massive amount of time spent cleaning house has made me realize everything should be in one place, and that place needs to be as local and quick as possible.
 

LunarMist

I can't believe I'm a
Joined
Feb 1, 2003
Messages
15,268
Location
USA
It's worth mentioning that I tried setting up some local space on Windows just for the files I'm working on at the moment, but keeping everything in sync and backing it up was a nightmare.

I'm kind of forgetful and end up with a lot of duplicates for fear of losing something. A massive amount of time spent cleaning house has made me realize everything should be in one place, and that place needs to be as local and quick as possible.
I gave up on LR long ago for some of those reasons.
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
I gave up on LR long ago for some of those reasons.
I'm glad someone else has noticed. It's actually shocking to me how difficult Lightroom makes it to avoid duplicates and back things up. I use it because I'm not aware of anything better, but there are a few simple improvements that would save me a lot of time.
 

LunarMist

I can't believe I'm a
Joined
Feb 1, 2003
Messages
15,268
Location
USA
I use Photoshop for working on images and my brain for organization. I have a primitive computer with a bunch of local (SATA and eSATA) drives and 1:1 backups via file sync. I only do photography rarely now, ~15-25K per year, and don't need quick access for commercial purposes.
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,293
Location
USA
I was not able to get good performance out of FreeNAS when I tested it. My cap was around 45-55 MB/sec on sustained file transfers when using the ZFS option. Using the exact same hardware, I was able to get over 100 MB/sec using OpenFiler with their XFS filesystem. If you're experimenting, take a look at it as another option. It also supports NIC teaming right out of the box. I am still considering doing this and then putting a dual NIC into my system and pairing it with an HP ProCurve GigE switch. I'll separate traffic using VLANs. I did not try OpenSolaris so I can't comment on its performance.

If you're looking to locally virtualize so that you can gain functionality over the local file system, you may end up in the exact same situation you're in now but with more complexity. If you go the route of a VMware product (free or paid), the VM you create will store its file system as one or more vmdk files on the local disk, which will likely be NTFS under Windows (or whatever you may use), so you will still be at the mercy of the local filesystem's size and expansion constraints. You can increase the size of a virtual drive at any time. If you decide to go with something like VMware's ESXi, it will completely replace the OS and use its own filesystem; you'll still have limitations on the size of the virtualized disk based on the block size you select for the VMFS.
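As a sketch of that last point (the path is made up): growing a vmdk offline is a one-liner with the vmware-vdiskmanager tool that ships with Workstation/Server, though the guest still has to expand its own filesystem afterwards.

```shell
# Grow a virtual disk to 500 GB; the VM must be powered off and have no
# snapshots. This only resizes the container, not the guest filesystem.
vmware-vdiskmanager -x 500GB "C:\VMs\storage\data.vmdk"
```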

There is truth to the overhead of virtualizing the network adapter. On the VMware front, they've tried to optimize their own drivers for an OS by way of VMware Tools. The tools include drivers to help improve performance, but the act of virtualizing a network adapter does have overhead. I know Intel has been working on ways to improve this with VT-c in some of the Pro/1000 ET cards, but I've never used them to compare any advantages they may offer.
 

Sol

Storage is cool
Joined
Feb 10, 2002
Messages
960
Location
Cardiff (Wales)
In theory, something like NFS over UDP might work better for what you're doing. (It just seems like it should, intuitively, so it seems worth a go, but I have nothing to really suggest it would be.)

Would an eSATA (or even USB 3) external RAID enclosure help solve this problem? This one would seem to suggest that it wouldn't be a cheap solution, and it wouldn't be substantially faster than a network, but it wouldn't be much slower either, and it should have lower latency. (I didn't look into it extensively, so there may be better, cheaper options readily available.)
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
I was not able to get good performance out of FreeNAS when I tested it. My cap was around 45-55 MB/sec on sustained transfer of files when using the ZFS option. Using the exact same hardware I was able to get over 100 MB/sec using OpenFiler with their XFS filesystem. If you're experimenting, take a look at it as another option.
Good to know. I used (still use) XFS on the Linux fileserver that used to handle these files. It's always performed tremendously well.


It also supports NIC teaming right out of the box. I am still considering doing this and then putting a dual NIC into my system and pairing it with an HP ProCurve GigE switch. I'll separate traffic using VLANs. I did not try OpenSolaris so I can't comment on its performance.
I don't think I can run the teamed NICs directly to teamed NICs in Windows though, right? Windows doesn't support NIC teaming, or, if it does, it's a different protocol? I looked into this a while back, and Linux-to-Linux was no problem, but Linux-to-Windows or BSD-to-Windows was a no-go.


If you're looking to locally virtualize so that you can gain functionality over the local file system, you may end up in the exact same situation you're in now but with more complexity. If you go the route of a VMware product (free or paid), the VM you create will store its file system as one or more vmdk files on the local disk, which will likely be NTFS under Windows (or whatever you may use), so you will still be at the mercy of the local filesystem's size and expansion constraints.
VirtualBox will do raw block device access and raw partition access. I can just assign a bunch of disks to the VM. At the moment, I'm planning to use VirtualBox, unless there's a compelling reason not to.
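For anyone curious, raw disk passthrough in VirtualBox works by generating a small vmdk wrapper on the host that points at the physical disk. A sketch for a Windows host (the paths and disk number are placeholders; run from an elevated prompt):

```shell
:: Create a VMDK wrapper that passes the whole physical disk through to the VM.
:: Get the disk number from Windows Disk Management first.
VBoxManage internalcommands createrawvmdk ^
  -filename "C:\VMs\zfs\rawdisk1.vmdk" -rawdisk \\.\PhysicalDrive1
```

The resulting vmdk is then attached to the VM like any other disk, and the guest sees the raw device.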


In theory using something like NFS over udp might work better for what you're doing. (It just seems like it should intuitively so it seems worth a go but I have nothing to really suggest it would be)
Last time I compared the performance of NFS vs CIFS/SMB, Linux-to-Linux and to Windows, CIFS/SMB was actually faster (~40 MB/s over GbE vs ~30 MB/s for NFS).

I think there's a free NFS driver for Windows, but is it going to give higher performance than CIFS?


Would an eSATA (or even USB 3) external RAID enclosure help solve this problem? This one would seem to suggest that it wouldn't be a cheap solution, and it wouldn't be substantially faster than a network, but it wouldn't be much slower either, and it should have lower latency. (I didn't look into it extensively, so there may be better, cheaper options readily available.)
The main issue isn't physically connecting the drives to the box. I've got room for 15 drives in the case itself (it's an older CM Stacker).

The main problems I'm trying to solve, that Windows won't give me natively are:


1. Online capacity expansion: I can add new pools of storage to a ZFS or LVM volume based on RAID-Z or RAID 5, respectively. I can't do that with a Windows RAID 5 volume so far as I know (other than JBOD, which wouldn't be so bad except for the other issues cited below).

2. Uptime: At the moment, for this data I'm using JBOD for local storage on the Windows box. If one disk goes down, the whole JBOD needs to be rebuilt, which will mean I'll have to carry my massive disk-filled tower to the off-site backup, connect them together and copy the data off. This is a PITA, and it's inevitably going to happen in the next year or two. I'd like to avoid this if possible by using parity.

3. Data Integrity: I'm at the point now where unrecoverable read errors are a statistical probability. I'd like to add checksumming to the filesystem to detect and recover from these errors. For several years, I ran a batch script that generated md5sums for every archived file, so I could check them after copies were made, compare backups, etc. Then I decided that, since I could detect an error but would never be able to fix it anyway, there wasn't a whole lot of point.

4. Speed: Already mentioned. Ideally I'd like local storage, but the host has to be Windows because Photoshop and Lightroom need native performance. Windows' dynamic disks can't deliver even the most basic requirements I have. This means the storage has to be either not local to the machine and served up via ethernet, or managed in a VM with raw block device access...
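For reference, the md5 manifest idea from point 3 boils down to something like this (a sketch, not my actual batch file, and the function names are mine):

```python
import hashlib
import os

def md5sum(path, bufsize=1024 * 1024):
    """Compute the MD5 of a file without loading it all into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        chunk = f.read(bufsize)
        while chunk:
            h.update(chunk)
            chunk = f.read(bufsize)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5sum(full)
    return manifest

def verify(root, manifest):
    """Return the relative paths whose current hash no longer matches."""
    return [rel for rel, digest in sorted(manifest.items())
            if md5sum(os.path.join(root, rel)) != digest]
```

It detects bit rot just fine; the problem is exactly what I said above: without parity or a second copy, detection is all you get.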



ZFS is the only filesystem at the moment that satisfies my uptime and data integrity wishes (I would much rather use something that runs on Linux). I know Btrfs is moving along, but it doesn't support parity and it's still experimental.




The big questions for me at the moment are:

1) How much of a performance hit do I take sharing a filesystem via CIFS over a virtualized NIC from the VM to the host? Will it be faster than GbE, or will I just complicate my life?

At first I assumed that, since it's just shuffling bits around in memory, it would be much faster than pumping them over a wire, but the more research I did, the more I began to doubt this assumption. I may just have to test it, but I figured someone here might have experience I can tap into. (Handruin's doesn't sound too encouraging, but it's what I expected.)


2) Is there a better way than CIFS over the virtualized NIC? VirtualBox has VirtualBox Shared Folders. These work through an extension added into the OS of the VM, and they do have support for OpenSolaris now. Has anyone used this file sharing method? Do other virtualization environments have similar functionality that might get around the massive overhead of converting everything to TCP/IP packets and back just to talk to other software running on the same machine?
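From what I can tell, a shared folder is defined on the host side with a one-liner like this (VM name, share name, and path are placeholders), though note the direction: it exposes a host directory into the guest, which is the opposite of what my VM-serves-the-host plan needs.

```shell
:: Expose host directory D:\photos to the guest VM "ZFS-VM" as a share
:: named "photos", auto-mounted by the Guest Additions inside the guest
VBoxManage sharedfolder add "ZFS-VM" --name photos --hostpath "D:\photos" --automount
```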
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
20,329
Location
I am omnipresent
Website
s-laker.org
The only comment I can make here is that I've likewise found NFS to be slower than SMB, regardless of implementation or platform. Windows, or at least Server 2008, does seem to be faster than Samba for file transfers.

NIC teaming on Windows is mostly a function of drivers rather than an OS feature. I sincerely doubt you'd get anywhere trying to make it work across a VM.

I suppose spending $900 on a pair of Intel AT2 10Gbps NICs is out of the question? :D
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
I suppose spending $900 on a pair of Intel AT2 10Gbps NICs is out of the question? :D
I looked up the prices of those yesterday, just in case ;) . $500-600 Canadian up here. I was considering it, but I can't devote those kinds of funds in the near future, and I unfortunately need to expand my storage in the next couple of weeks. That said, the hard-core geek side of me kind of wants to try it. Ironically, if I spent all that money, Solaris isn't on the list of supported OSes! (Which seems a bit odd, to be honest...)


Does anyone know if dynamic disks can be combined multiple times? I.e., a JBOD of RAID 5s? I Googled but couldn't find out, and I don't have extra disks to test it on at the moment.

With that I could at least get expandability with parity, if not checksumming.


I may end up just being stuck with a variation of my current situation: some temporary storage on the Windows PC, with the rest archived to a server. My life would be so much easier if I could just put it all in one box, on one volume, but that may not be practical given some of my concerns (namely expandability, parity, and checksums).
 

Sol

Storage is cool
Joined
Feb 10, 2002
Messages
960
Location
Cardiff (Wales)
Actually, I got the basic aim of the project; I just failed to mention how that product would help at all...

It has expandable capacity (add a drive, or replace a drive with a bigger one). Obviously you're not going to pack in as many drives as your Stacker case holds, but there are possibly larger units available.

It has some sort of data redundancy. I have no idea if it would do everything you want, but it will at least recover from drive failures.

It's directly attached via eSATA, so Windows should treat it as a normal native disk. I don't know if that actual device will live up to that expectation, but in theory it should be possible to find a device that could.

So it touches on most of your points. As I mentioned, I don't know much about the particular products, just that products exist in that space that might help.

You could also look into iSCSI. It matches what you're trying to do better than SMB or NFS, so, assuming sanity, it should do it better, right?
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
Actually, I got the basic aim of the project; I just failed to mention how that product would help at all...

It has expandable capacity (add a drive, or replace a drive with a bigger one). Obviously you're not going to pack in as many drives as your Stacker case holds, but there are possibly larger units available.
I see what you're getting at. If I bought and filled, say, a 5-drive bay at once, and added more boxes over time, I would have parity and expandability. Because they appear to Windows as a single drive, I could expand the volume via JBOD.

That's certainly an option. I've always been a bit afraid of the cheapo hardware/software RAID in those boxes, though. Despite that, it's certainly worth some thought.
 

Gilbo

Storage is cool
Joined
Aug 19, 2004
Messages
742
Location
Ottawa, ON
You could also look in to iSCSI. It matches what you're trying to do better than SMB or NFS so assuming sanity, it should do it better right?
The problem with iSCSI is that the target exports direct block access to the device to the initiator. This means I'm stuck with the initiator's OS, which, in this case, is Windows, with all the shitty storage-management limitations that entails.

Even if I export disks in a Linux box to the initiator via iSCSI, I'm actually using Windows filesystems and Windows volume management to manage them. In this particular (albeit unusual) circumstance, this is actually the worst of both worlds.

If I remember correctly, I can export a logical volume via iSCSI and get around the software RAID and volume management limitations of Windows, but I'm still using a Windows filesystem. This is better than nothing, because at least I get snapshots via LVM and parity via Linux software RAID. Still no checksumming, though.
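On Linux that would look roughly like this with the tgt userspace target (an untested sketch; the volume group, sizes, and IQN are placeholders):

```shell
# Carve a logical volume out of the RAID-backed volume group
lvcreate -L 2T -n photos vg0

# Export it as LUN 1 of a new iSCSI target, open to any initiator
tgtadm --lld iscsi --op new --mode target --tid 1 \
  -T iqn.2010-08.local.fileserver:photos
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
  -b /dev/vg0/photos
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
```

Windows would then attach it with the built-in iSCSI initiator and format it NTFS, which is exactly the catch: the filesystem on top is still a Windows one.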


Performance should be somewhat better, though, without the overhead of the CIFS/SMB or NFS protocols.

The more I think about it, the more I think I'll just move my fileserver to ZFS and either deal with the network latency and Lightroom's shitty performance, or try my Frankenstein virtualized-ZFS plan.
 

Sol

Storage is cool
Joined
Feb 10, 2002
Messages
960
Location
Cardiff (Wales)
If Apple hadn't shut down the OS X ZFS project, then getting a Mac (or, more realistically, just installing OS X on what you have) would have been the ideal solution (apart from the cost and the fact that Apple are now evil, etc.). Given the comparative state of ZFS on Linux and OS X, and your other options, it may still be a reasonable solution...

Gilbo said:
I see what you're getting at. If I bought and filled, say, a 5-drive bay at once, and added more boxes over time, I would have parity and expandibility. Because they appear to Windows as a single drive, I could expand the volume via JBOD.
Actually, I just meant you could buy a 5- or 10-slot drive bay and keep sticking drives in it until you ran out of slots, and then pull out the smallest drive and stick a bigger one in. I figure you can jam 10 TB into a 5-bay device (which I guess is ~6-7 TB with parity), and I didn't think you required more than that immediately. (The specific device I linked to supports that kind of ad-hoc expansion.)

Ultimately, the parity you get from that is no better than what you'd get by using iSCSI with a RAIDed logical volume, so it's probably not a lot of help.
 

Handruin

Administrator
Joined
Jan 13, 2002
Messages
13,293
Location
USA
I posted a little while back about how I tried Gluster Storage OS. Maybe this would fit what you're trying to do? It would be a way to have elastic storage where you can add nodes over time.

I'm also beginning to research whether the Ubuntu Cloud OS will offer a similar structure for storage as it has for compute resources. I'll let you know once I have more time to investigate it. If you could just grow storage without the headache of managing complex arrays and file systems, that seems ideal.
 

blakerwry

Storage? I am Storage!
Joined
Oct 12, 2002
Messages
4,203
Location
Kansas City, USA
Website
justblake.com
Did I miss something, or is there a reason I don't see the suggestion that you should get a nice RAID card?

1. Online capacity expansion: check
2. Uptime: check
3. Data Integrity: check
4. Speed: check
 