I recently moved my files to a new ZFS pool and used the chance to configure my datasets properly.

This led me to discover ZFS deduplication.

Most of my storage is used by my Jellyfin library (~7-8 TB), which is mostly uncompressed Blu-ray rips, so I thought I might be able to save some space by using deduplication in addition to compression.

Has anyone here used that for similar files before? What was your experience with it?

I am not too worried about performance. The dataset in question rarely changes, basically only when I add more media every couple of months. I also overshot my CPU target when originally configuring my server, so there is a lot of headroom there. I have 32 GB of RAM, which is not fully utilized either (and I would not mind upgrading to 64 GB too much).
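For the RAM side, a back-of-the-envelope calculation is possible. The sketch below assumes the default 128 KiB recordsize and the often-quoted rule of thumb of roughly 320 bytes of memory per dedup-table entry. Both numbers are assumptions for illustration, not measurements from this pool, and the function name is made up for this example.

```python
import math

def ddt_ram_estimate(data_bytes, record_size=128 * 1024, bytes_per_entry=320):
    """Worst-case dedup-table size in bytes.

    Assumes every block is unique, so each block needs its own entry.
    record_size and bytes_per_entry are assumed values (see text above).
    """
    entries = math.ceil(data_bytes / record_size)
    return entries * bytes_per_entry

# 8 TB of data, expressed in bytes
estimate = ddt_ram_estimate(8 * 10**12)
print(f"{estimate / 10**9:.1f} GB")
```

Under those assumptions, 8 TB of mostly unique blocks would need a table of around 19.5 GB, which is a large share of 32 GB of RAM. A smaller recordsize would make the table proportionally larger.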

My main concern is that I am unsure whether it would be useful. I suspect that, given the amount of data and the similarity in type, there would statistically be a lot of block-level duplication, but I could not find any real-world data or experiences on that.
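One way to get real numbers before enabling anything would be to hash the library in fixed-size blocks and count how many are unique. The following is a minimal sketch of that idea, assuming a 128 KiB block size to mirror the default recordsize. The function names are hypothetical, and it only approximates what ZFS would see, since ZFS dedups records after compression and with its own checksum.

```python
import hashlib
import os

RECORD_SIZE = 128 * 1024  # assumed block size, mirroring the default recordsize

def block_hashes(path, record_size=RECORD_SIZE):
    """Yield a SHA-256 digest for each fixed-size block of one file."""
    with open(path, "rb") as f:
        while True:
            block = f.read(record_size)
            if not block:
                break
            yield hashlib.sha256(block).digest()

def estimate_dedup(root, record_size=RECORD_SIZE):
    """Return (total_blocks, unique_blocks) for all regular files under root."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and special files
            for digest in block_hashes(path, record_size):
                total += 1
                seen.add(digest)
    return total, len(seen)
```

Calling `estimate_dedup("/path/to/library")` (the path is a placeholder) returns the two counts, and `total / unique` gives a rough dedup ratio. A ratio close to 1.0 would suggest dedup is not worth it. Note that the set of digests itself needs a few GB of memory for a library of this size, and reading every file takes a while.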

    • friend_of_satan@lemmy.world
      10 months ago

      I was also going to link this. I started using ZFS 10-ish years ago and used dedup when it came out, and it was really not worth it except for archiving a bunch of stuff I knew had gigs of duplicate data. Performance was so poor.

    • Cousin Mose@lemmy.hogru.ch
      10 months ago

      I’m in almost the exact same situation as OP: 8 TB of raw Blu-ray dumps, except I’m on XFS. I ran duperemove and freed ~200 GB.

      • needanke@feddit.org (OP)
        10 months ago

        I think I was a bit unclear on that: by uncompressed rips I meant that I ripped the relevant media to uncompressed MKVs, I didn’t save the entire disc dump. Most of my library is such rips, but I also have a bit of media from other sources™ that is already compressed. So I suspect my results would be even worse.

        • Cousin Mose@lemmy.hogru.ch
          10 months ago

          I agree. Most of my duplicates came from the raw disc files. I too dump some content to MKV (mainly TV episodes) but those files likely have much less duplication, though I do recall some of the duplicates coming from The Office in MKV.

          (I do wonder if those The Office duplicates were something like the opening title, or scenes from the episode showing clips from previous episodes because it seems highly unlikely that the raw video streams were similar.)