Hi all!
I will soon acquire a pretty beefy unit compared to my current setup (a 3-node server, each node with 16 cores, 512 GB RAM, and 32 TB of storage).
Currently I run TrueNAS and Proxmox on bare metal and most of my storage is made available to apps via SSHFS or NFS.
I recently started looking for “modern” distributed filesystems and found some interesting S3-like/compatible projects.
To name a few:
- MinIO
- SeaweedFS
- Garage
- GlusterFS
I like the idea of abstracting the filesystem to allow me to move data around, play with redundancy and balancing, etc.
My most important services are:
- Plex (Media management/sharing)
- Stash (Like Plex 🙃)
- Nextcloud
- Caddy with Adguard Home and Unbound DNS
- Most of the Arr suite
- Git, Wiki, File/Link sharing services
As you can see, a lot of downloading/streaming/torrenting of files across services. Smaller services are on a Docker VM on Proxmox.
Currently the setup is messy due to its organic evolution, but since I will be upgrading to brand-new metal, I was looking for suggestions on the pillars.
So far, I am considering installing a Proxmox cluster with the 3 nodes and host VMs for the heavy stuff and a Docker VM.
How do you see the file storage portion? Should I try a full/partial plunge into S3-compatible object storage? What architecture/tech would be interesting to experiment with?
Or should I stick with tried-and-true, boring solutions like NFS Shares?
Thank you for your suggestions!
I use Ceph/CephFS myself for my own 671TiB array (382TiB raw used, 252TiB-ish data stored) – I find it a much more robust and better-architected solution than Gluster. It supports distributed block devices (RBD), filesystems (CephFS), and object storage (RGW). NFS is pretty solid, though, for basic remote filesystem mounting.
NFS gives me the best performance. I’ve tried GlusterFS (not at home, for work), and it was kind of a pain to set up and maintain.
sshfs is somewhat unmaintained; only “high-impact issues” are being addressed: https://github.com/libfuse/sshfs
I would go for NFS.
And if you need to mount a directory over SSH, I can recommend rclone and its mount subcommand.
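Something along these lines works well over SFTP (the remote name and paths here are made up; the remote itself gets set up beforehand with rclone config):

```
# Assumes an SFTP remote named "homeserver" was already created via `rclone config`.
rclone mount homeserver:/srv/data /mnt/homeserver \
  --vfs-cache-mode writes \
  --daemon
```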
But NFS has mediocre snapshotting capabilities (unless his setup also includes >10g nics)
I assume you are referring to Filesystem Snapshotting? For what reason do you want to do that on the client and not on the FS host?
I have my NFS storage mounted via 2.5G and use qcow2 disks. It is slow to snapshot…
Maybe I understand your question wrong?
NFS doesn’t do snapshotting, which is what I assumed that you meant and I’d guess ShortN0te also assumed.
If you’re talking about qcow2 snapshots, that happens at the qcow2 level. NFS doesn’t have any idea that qemu is doing a snapshot operation.
On a related note: if you are invoking a VM using a filesystem image stored on an NFS mount, I would be careful, unless you are absolutely certain that this is safe for the version of NFS and the specific caching options for both NFS and qemu that you are using.
I’ve tried to take a quick look. There’s a large stack involved, and I’m only looking at it quickly.
To avoid data loss via power loss, filesystems – and thus the filesystem images backing VMs using filesystems – require write ordering to be maintained. That is, they need to have the ability to do a write and have it go to actual, nonvolatile storage prior to any subsequent writes.
At a hard disk protocol level, like for SCSI, there are BARRIER operations. These don’t force something to disk immediately, but they do guarantee that all writes prior to the BARRIER are on nonvolatile storage prior to writes subsequent to it.
I don’t believe that Linux has any userspace way for a process to request a write barrier. There is no fwritebarrier() call. This means that the only way to impose write ordering is to call fsync()/sync() or similar operations. These force data to nonvolatile storage and do not return until it is there. The downside is that this is slow: programs that frequently perform such synchronizations cannot issue writes very quickly, and are very sensitive to the latency of their nonvolatile storage.
From the qemu(1) man page:
By default, the cache.writeback=on mode is used. It will report data writes as completed as soon as the data is present in the host page cache. This is safe as long as your guest OS makes sure to correctly flush disk caches where needed. If your guest OS does not handle volatile disk write caches correctly and your host crashes or loses power, then the guest may experience data corruption. For such guests, you should consider using cache.writeback=off. This means that the host page cache will be used to read and write data, but write notification will be sent to the guest only after QEMU has made sure to flush each write to the disk. Be aware that this has a major impact on performance.
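To make that concrete: the -drive cache= shorthand maps onto those cache.* suboptions, and cache=writethrough corresponds to cache.writeback=off. A hypothetical invocation (the disk path is made up), trading performance for the safer behavior the man page describes:

```
# cache=writethrough: writes are reported to the guest as complete only after
# QEMU has flushed them through to the backing file.
qemu-system-x86_64 -m 4096 \
  -drive file=/mnt/nfs/vm-disk.qcow2,format=qcow2,if=virtio,cache=writethrough
```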
I’m fairly sure that this default is a rather larger red flag than it might appear, if one simply assumes that Linux must be doing things “correctly”.
Linux doesn’t guarantee that a write to position A goes to disk prior to a write to position B. That means that if your machine crashes or loses power, with the default settings, even for drive images stored on a filesystem on a local host, you can potentially corrupt a filesystem image.
https://docs.kernel.org/block/blk-mq.html
Note: Neither the block layer nor the device protocols guarantee the order of completion of requests. This must be handled by higher layers, like the filesystem.
POSIX does not guarantee that write() operations to different locations in a file are ordered.
https://stackoverflow.com/questions/7463925/guarantees-of-order-of-the-operations-on-file
So by default – which is what you might be doing, wittingly or unwittingly – if you’re using a disk image on a filesystem, qemu simply doesn’t care about write ordering to nonvolatile storage. It does writes; it does not care about the order in which they hit the disk. It is not calling fsync() or using analogous functionality (like O_DIRECT).
NFS entering the picture complicates this further.
https://www.man7.org/linux/man-pages/man5/nfs.5.html
The sync mount option
The NFS client treats the sync mount option differently than some other file systems (refer to mount(8) for a description of the generic sync and async mount options). If neither sync nor async is specified (or if the async option is specified), the NFS client delays sending application writes to the server until any of these events occur:
- Memory pressure forces reclamation of system memory resources.
- An application flushes file data explicitly with sync(2), msync(2), or fsync(3).
- An application closes a file with close(2).
- The file is locked/unlocked via fcntl(2).
In other words, under normal circumstances, data written by an application may not immediately appear on the server that hosts the file. If the sync option is specified on a mount point, any system call that writes data to files on that mount point causes that data to be flushed to the server before the system call returns control to user space. This provides greater data cache coherence among clients, but at a significant performance cost. Applications can use the O_SYNC open flag to force application writes to individual files to go to the server immediately without the use of the sync mount option.
So, strictly speaking, this doesn’t make any guarantees about what NFS does. It says that it’s fine for the NFS client to send nothing to the server at all on write(). The only time a write() to a file is guaranteed to make it to the server is when one of those listed events occurs, if you’re using the default NFS mount options. And if it’s not going to the server, it definitely cannot be flushed to nonvolatile storage.
Now, I don’t know this for a fact – would have to go digging around in the NFS client you’re using. But it would be compatible with the guarantees listed, and I’d guess that probably, the NFS client isn’t keeping a log of all the write()s and then replaying them in order. If it did so, for it to meaningfully affect what’s on nonvolatile storage, the NFS server would have to fsync() the file after each write being flushed to nonvolatile storage. Instead, it’s probably just keeping a list of dirty data in the file, and then flushing it to the NFS server at close().
That is, say you have a program that opens a file filled with all ‘0’ characters, and does:
- write ‘1’ to position 1.
- write ‘1’ to position 5000.
- write ‘2’ to position 1.
- write ‘2’ to position 5000.
At close() time, the NFS client probably doesn’t flush “1” to position 1, then “1” to position 5000, then “2” to position 1, then “2” to position 5000. It’s probably just flushing “2” to position 1, and then “2” to position 5000, because when you close the file, that’s what’s in the list of dirty data in the file.
The thing is that unless the NFS client retains a log of all those write operations, there’s no way to send the writes to the server in a way that avoids putting the file into a corrupt state if power is lost. It doesn’t matter whether it first writes the “2” at position 1 or the “2” at position 5000. In either case, it’s creating a situation where, for a moment, one of those two positions has a “0”, and the other has a “2”. If there’s a failure at that point – the server loses power, the network connection is severed – that’s the state in which the file winds up. That’s a state that is inconsistent and should never have arisen. And if the file is a filesystem image, then the filesystem might be corrupt.
So I’d guess that both of those two points in the stack – the NFS client writing data to the server, and the server’s block device scheduler – permit inconsistent state if there’s no fsync()/sync()/etc. being issued, which appears to be the default behavior for qemu. And running on NFS probably creates a larger window for a failure to induce corruption.
It’s possible that using qemu’s iSCSI backend avoids this issue, assuming that the iSCSI target avoids reordering. That’d avoid qemu going through the NFS layer.
I’m not going to dig further into this at the moment. I might be incorrect. But I felt that I should at least mention it, since filesystem images on NFS sounded a bit worrying.
Thanks for this “shallow” dig (lol – what you dug up is on par with my senior technician explaining a client’s full tech stack).
Anyway, I host the system disks on local storage, and the NFS storage acts as the mass storage for my VMs (like my media server for Jellyfin).
And I also do daily backups with Veeam Backup and Replication of both the most important files on my media server and the important VMs.
So in case of a data failure it should be more or less fine.
Wouldn’t the sync option also confirm that every write actually arrived on the disk? Because I did mount the NFS share (storage host: TrueNAS, hypervisor: Proxmox) in sync mode.
Wouldn’t the sync option also confirm that every write actually arrived on the disk?
If you’re mounting with the NFS sync option, that’ll avoid the “wait until close and probably reorder writes at the NFS layer” issue I mentioned, so that’d address one of the two issues, and the one that’s specific to NFS.
That’ll force each write to go, in order, to the NFS server, which I’d expect would avoid problems with the network connection being lost while flushing deferred writes. I don’t think that it actually forces it to nonvolatile storage on the server at that time, so if the server loses power, that could still be an issue, but that’s the same problem one would get when running with a local filesystem image with the “less-safe” options for qemu and the client machine loses power.
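For reference, that kind of mount looks something like this (hostname and paths are placeholders):

```
# /etc/fstab – "sync" makes each write() go to the NFS server before returning;
# it does not, by itself, guarantee the server has flushed it to nonvolatile storage.
truenas.lan:/mnt/tank/vmstore  /mnt/vmstore  nfs4  sync,hard,vers=4.2  0  0
```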
If I understand you correctly, your server is accessing the VM disk images via an NFS share?
That does not sound efficient at all.
It was the only easy option I could figure out.
I didn’t manage to understand iSCSI in the time I had patience for it, and I was desperate to finish the project and use my stuff.
Thus NFS.
I’m using ceph on my proxmox cluster but only for the server data, all my jellyfin media goes into a separate NAS using NFS as it doesn’t really need the high availability and everything else that comes with ceph.
It’s been working great. You can set everything up through the Proxmox GUI and it’ll show up like any other storage for the VMs. You need enterprise-grade NVMe drives for it though, or it’ll chew through them in no time. Also a separate network connection for Ceph traffic if you’re moving a lot of data.
Very happy with this setup.
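For anyone curious, the CLI route boils down to roughly this per node (a sketch – the GUI does the same thing; the network and device names are placeholders and details vary by PVE version):

```
pveceph install                        # install the Ceph packages
pveceph init --network 10.10.10.0/24   # dedicated Ceph network
pveceph mon create                     # a monitor per node (run on all 3 nodes)
pveceph osd create /dev/nvme1n1        # one OSD per data disk
```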
“Boring”? I’d be more interested in what works without causing problems. NFS is bulletproof.
NFS is bulletproof.
For it to be bulletproof, it would help if it came with security built in. Kerberos is a complex mess.
Yeah, I’ve ended up setting up VLANs in order to not deal with encryption.
You are 100% right, I meant for the homelab as a whole. I do it for self-hosting purposes, but the journey is a hobby of mine.
So exploring more experimental technologies would be a plus for me.
Most of the things you listed require some very specific constraints to even work, let alone work well. If you’re working with just a few machines, no storage array or high bandwidth networking, I’d just stick with NFS.
As a recently-former HPC/supercomputer dork: NFS scales really well. All this talk of encryption etc. is weird; you normally just do that at the link layer if you’re worried about security between systems. That, plus v4 to reduce some metadata chattiness, and you’re good to go. I’ve tried scaling Ceph and S3 for latency on 100/200G links; by far, NFS is easier than all the rest to scale. For a homelab? NFS and call it a day. All the clustering filesystems will make you do a lot more work than just throwing “hard” into your NFS mount options and letting clients block I/O while you reboot, which for home is probably easiest.
I agree as well. No reason not to use it. If there were a better way, an alternative would already exist.
What’s wrong with NFS? It is performant and simple.
NFS is fine if you can lock it down at the network level, but otherwise it’s Not For Security.
NFS + Kerberos?
But everything I read about NFS says the same: you deploy it on a dedicated storage LAN, not on your usual networking LAN.
I tried it once. NFSv4 isn’t simple like NFSv3 is. Fewer systems support it too.
By default it’s unencrypted and unauthenticated, and permissions rely on IDs the client can fake.
May or may not be a problem in practice, one should think about their personal threat model.
Mine are read-only and unauthenticated because they’re just media files, but I did add unneeded encryption via kTLS because it wasn’t too hard to add (I already had a valid certificate to reuse).
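If anyone wants to try something similar: recent kernels support RPC-with-TLS for NFS. I believe the moving parts are roughly the tlshd daemon (from ktls-utils) on both ends plus the xprtsec mount option on the client; hostname and paths below are made up, and the server-side export setup is omitted:

```
mount -t nfs4 -o ro,xprtsec=tls nas.example.lan:/export/media /mnt/media
```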
NFS is good for hypervisor level storage. If someone compromises the host system you are in trouble.
If someone compromises the host system you are in trouble.
Not only the host. You have to trust every client to behave, as @forbiddenlake already mentioned, NFS relies on IDs that clients can easily fake to pretend they are someone else. Without rolling out all the Kerberos stuff, there really is no security when it comes to NFS.
You misunderstand. The hypervisor is the client; stuff higher in the stack only sees raw storage. (By hypervisors I also mean Docker and Kubernetes.) From a security perspective you just set an IP allow list.
Sure, if you have exactly one client that can access the server and you can ensure physical security of the actual network, I suppose it is fine. Still, those are some severe limitations and show how limited the ancient NFS protocol is, even in version 4.
It is a pain to figure out how to give everyone the same user ID. I only have a couple of computers at home, and I’ve never figured out how to make LDAP work (including for laptops, which might not have network access when I’m on the road). Worse, some systems start with UID 1000, some 1001. NFS is a real mess – but I use it because I haven’t found anything better for Unix.
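One blunt workaround, if you don’t need per-user permissions on a share, is to squash everything to a single account on the server side. A hypothetical /etc/exports line (the subnet and IDs are made up):

```
# every client user is mapped to local UID/GID 1000 on the server
/mnt/tank/media  192.168.10.0/24(rw,all_squash,anonuid=1000,anongid=1000)
```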
Gotta agree. Even better if backed by zfs.
I’d only use sshfs if there’s no other alternative. Like if you had to copy over a slow internet link and sync wasn’t available.
NFS is fine for local network filesystems. I use it everywhere and it’s great. Learn to use autos and NFS is just automatic everywhere you need it.
*autofs
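For anyone who hasn’t used it, a minimal autofs direct map looks roughly like this (server, paths, and timeout are placeholders):

```
# /etc/auto.master.d/nfs.autofs
/-  /etc/auto.nfs  --timeout=300

# /etc/auto.nfs
/mnt/media  -fstype=nfs4,rw,hard  truenas.lan:/mnt/tank/media
```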
If you want to try something that’s quite new and mostly unexplored, look into NVMe over TCP. I really like the concept, but it appears to be too new to be production ready. Might be a good fit for your adventurous endeavors.
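For the curious, the client side with nvme-cli is roughly this (addresses and the NQN are placeholders; exporting a target from the storage box is a separate exercise):

```
modprobe nvme-tcp
nvme discover -t tcp -a 192.168.20.10 -s 4420
nvme connect  -t tcp -a 192.168.20.10 -s 4420 \
  -n nqn.2024-01.lan.example:storage1
# the namespace then appears as a regular /dev/nvmeXnY block device
```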
This is just block device over network, it will not allow the use cases OP is asking for. You will still need a filesystem and a file-serving service on top of that.
I agree, but it’s clear that OP doesn’t want a real solution, because those apparently are boring. Instead, they want to try something new. NVMe/TCP is something new. And it still allows for having VMs on one system and storage on another, so it’s not entirely off topic.
Your workload just won’t see much difference with any of them, so take your pick.
NFS is old, but if you add security constraints, it works really well. If you want to tune for bandwidth, try iSCSI; bonus points if you get ZFS-over-iSCSI working with a tuned block size. This last one is blazing fast if you have ZFS at each end and you do ZFS snapshots.
Beyond that, you’re getting into very tuned SAN territory, which people build their careers on; it’s a real rabbit hole.
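To make the “tuned block size” part concrete, the ZFS side is roughly this (pool and dataset names are placeholders; exporting the zvol as an iSCSI LUN is a separate step):

```
# a zvol with a volblocksize matched to the guest filesystem / workload
zfs create -V 200G -o volblocksize=16K tank/vm-disks/vm-101-disk-0
# cheap point-in-time snapshot of the VM disk
zfs snapshot tank/vm-disks/vm-101-disk-0@before-upgrade
```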
NFS with security does harm performance. For raw throughput it is best to use no encryption. Instead, use physical security.
I don’t know what you’re on about; I’m talking about segregating with VLANs and a firewall.
If you’re encrypting your SAN connection, your architecture is wrong.
That’s what I thought you were saying.
Oh, OK. I should have elaborated.
Yes, agreed. It’s so difficult to secure NFS that it’s best to treat it like a local connection and just lock it right down, physically and logically.
When I can, I use iSCSI, but tuned NFS is almost as fast. I have a much higher workload than OP, and I still can’t hit a bottleneck.
Have you ever used NFS in a larger production environment? Many companies coming from VMware have expensive SAN systems, and Proxmox doesn’t have great support for iSCSI.
Yes, I have. Same security principles in 2005 as today.
Proxmox iSCSI support is fine.
It really isn’t.
You can’t automatically create new disks with the create new VM wizard.
Also, I hope you aren’t using the same security principles as in 2005. The landscape has evolved immensely.
Gluster is ~~shit~~ really bad; Garage and MinIO are great. If you want something tested and insanely powerful, go with Ceph: it has everything. Garage is fine for smaller installations, but it’s very new and not that stable yet.
Ceph isn’t something you want to jump into without research.
go with ceph[:] it has everything
I heard running an object store as a filesystem was considered risky, but that’s not why it sometimes hoses your storage.
Last time I had a problem with ceph losing data was during 0.10, does it still happen?
Darn, Garage is the only one I successfully deployed a test cluster with.
I will dive more carefully into Ceph, the documentation is a bit heavy, but if the effort is worth it…
Thanks.
I had a great experience with Garage at first, but it crapped itself after a month. That was like half a year ago and the problem has since been fixed, but it still left me with a bit of anxiety.
You need to know what you are doing with Ceph. It can scale to exabyte levels, but you need to do it right.
I’ve used MinIO as the object store on both Lemmy and Mastodon, and in retrospect I wonder why. Unless you have clustered servers and a lot of data to move it’s really just adding complexity for the sake of complexity. I find that the bigger gains come from things like creating bonded network channels and sorting out a good balance in the disk layout to keep your I/O in check.
I preach this to people everywhere I go and seldom do they listen. There’s no reason for object storage for a non-enterprise environment. Using it in homelabs is just…mostly insane…
Generally yes, but it can be useful as a learning thing. A lot of my homelab use is for the purpose of practicing with different techs in a setting where, if it melts down, it’s just your stuff. At work they tend to take offense if you break prod.
I think you will need to have a mix, not everything is S3 compatible.
But I also like S3 quite a lot.
I think I am on the same page.
I will probably keep Plex/Stash out of S3, but Nextcloud could be worth it? (1TB with lots of documents and media.)
How would you go for Plex/Stash storage?
Keeping it as an LVM volume in Proxmox?
Fam, the modern alternative to SSHFS is literally SSHFS.
All that said, if your use case is mostly downloading and uploading files but not moving them between remotes, then overlaying WebDAV on whatever you feel comfy with (and that’s already what e.g. Nextcloud does, IIRC) should serve well.
What are you hosting the storage on? Are you providing this storage to apps, containers, VMs, proxmox, your desktop/laptop/phone?
Currently, most of the data is on a bare-metal TrueNAS.
Since the nodes will each come with 32TB of storage, this would be plenty for the foreseeable future (currently only using 20TB across everything).
The data should be available to Proxmox VMs (for their disk images) and self-hosted apps (mainly Nextcloud and the Arr apps).
A bonus would be to have a quick/easy way to “mount” some volume to a Linux Desktop to do some file management.
Proxmox supports ceph natively, and you can mount it from a workstation too, I think. I assume it operates in a shared mode, unlike iscsi.
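Mounting CephFS from a Linux desktop looks roughly like this with the kernel client (the monitor address, client name, and secret path are placeholders; ceph-fuse is an alternative):

```
mount -t ceph 192.168.10.21:6789:/ /mnt/cephfs \
  -o name=desktop,secretfile=/etc/ceph/desktop.secret
```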
If the apps are running on a VM in proxmox, then the underlying storage doesn’t matter to them.
NFS is probably the most mature option, but I don’t know if proxmox officially supports it.
Proxmox does support NFS
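For example, adding an NFS export as Proxmox storage is roughly a one-liner (server, export path, and storage ID are placeholders):

```
pvesm add nfs tank-nfs --server truenas.lan \
  --export /mnt/tank/vmstore --content images,backup
```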
But let’s say that I would like to decommission my TrueNAS and thus have the storage exclusively on the 3-node server – how would I layer Proxmox and storage then?
(Much appreciated btw)
I think the best option for distributed storage is ceph.
At least something that’s distributed and fail safe (assuming OP targets this goal).
And if Proxmox doesn’t support it natively, someone could probably still configure it locally on the underlying Debian OS.