I'm glad you're happy. I don't feel like it's my place to reach out to a software company/author for tools I won't be using, but I wanted to post some comments so that others know what to be wary of.
Others may want to investigate things, and if you found something that works for you, then that is fantastic. For myself, I've found a different solution to drive space shortages, and I haven't had the need (or the desire) to look at anything else, as my system, tailor-made for my environment, fits me like a glove.
-
I am just astounded that, for some people, it is so important to believe they already know something they clearly don't, even in light of overwhelming evidence to the contrary, and even when the cost of acquiring that new knowledge is relatively low. -
Revise the procedure to include that control benchmark, run the tests, and post the results. I can wait. -
My anecdotal evidence, using both a tablet and a massive rig with gumstick SSDs in RAID, has been to my satisfaction; both make it easy to believe the vendor's claims as illustrated by the official benchmarks on their site.
If you wish to validate these claims for yourself - or refute them - you are welcome to run your own benchmarks, and draw your own conclusions, of course.
As I wrote earlier, I would be interested in any new benchmarks - to clarify, those that actually involve DiskZIP, of course.
From a conceptual or theoretical perspective, there's really nothing left to discuss here. -
So I make the claim that running compression impacts CPU performance in a negative way, you disagree with that statement, we go back and forth about theory, you still disagree with said statement, I provide hard numbers to demonstrate how that theory applies, and you still disagree with that statement?
I’ve made my claim and proved it. You’ve made a counter-claim and have yet to prove it.
Pot, meet kettle. I’m not going to bother engaging with you further about performance until you provide your testing results. -
You did not test DiskZIP at all. -
-
Frankly, I'm not comfortable installing that specific software on my computer, given the content of its website, its request for personal information, and the prospect of running it over a TB of data and then removing it afterwards, just to argue with someone who's massively hyped about one very specific software package while ignoring nearly everything I've said about the compression process itself, regardless of which frontend is being used.
Clearly, you’re comfortable with the software and seem to be hell-bent on refuting my results. Again, I invite you to run your own tests to counter my tests and demonstrate that I am wrong.
Once again: I'll wait. Until then, farewell. -
-
Your personal experience notwithstanding, the compression ratios that people get will vary greatly. Typical VMs will compress tremendously because much of the space on the VM is actually free space (although many VM managers are careful about how much space they actually consume). Executables will also typically compress well, as will text files (human languages have a lot of redundancy). Images and videos stored in typical formats won't compress at all unless very specialized algorithms are used, which might be too expensive to use for compressing large amounts of data.

I have about 3 TB of images and videos, and I'd be very surprised if any compression at all would be achieved. No, I can't try DiskZIP because I'm not running Windows, but entropy is still entropy, and unless it's doing something like transcoding the files to drop the quality (which is cheating, since that's lossy compression), you're just not going to reduce the amount of space needed, unless you find duplicate images. Period. Your backup may have been mostly images and videos, but unless "mostly" is better defined, it's meaningless.

This is all basic information theory, patent (pending or otherwise) or not. It doesn't matter whether you're compressing files or an entire disk image; there's an irreducible minimum you can achieve by lossless compression, and the closer you get to that, the more CPU time is required to get there.
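To make the entropy point concrete, here is a quick sketch using Python's standard zlib (this has nothing to do with DiskZIP's own algorithm): redundant data shrinks dramatically, while high-entropy bytes, standing in for already-compressed images and video, barely budge.

```python
import os
import zlib

# Highly redundant data (like text or executables) compresses well;
# high-entropy data (like JPEG/MP4 payloads) essentially does not.
redundant = b"the quick brown fox jumps over the lazy dog\n" * 25_000
high_entropy = os.urandom(len(redundant))  # stand-in for already-compressed media

for label, data in (("redundant text", redundant), ("high-entropy bytes", high_entropy)):
    packed = zlib.compress(data, level=6)
    print(f"{label:20s} {len(data):>9,d} -> {len(packed):>9,d} bytes "
          f"({len(packed) / len(data):.1%} of original)")
```

As I understand information theory, no lossless compressor, whatever its frontend, can do fundamentally better on the second case.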
The "online disk compression" and "offline disk compression" aren't very well described. "Offline disk compression" sounds like it takes over your system while it's compressing the partition. So you're paying a big price while that's going on -- you can't use your computer at all. Sure, maybe you can arrange to do that when you don't care (again: YMMV), but that doesn't apply to everyone. So you may not pay an immediate penalty when writing to the disk -- maybe it writes data uncompressed -- but there will be a penalty when reading from it. That penalty might not be in the way of throughput, but it will certainly be in terms of CPU usage, as was demonstrated.
If the writes to disk aren't immediately compressed, then either you lose the benefit for anything you write after the initial compression (which I presume is what's meant by "online compression"), or you're going to pay that price in CPU consumption sooner or later. That cost may, again, be possible to conceal by being clever about when the work is done, but it can't be avoided.
But the really big concern I have, from that thread you pointed us at, was the remark "Just to let you all know our latest released versions fully support Windows 10 Fall Creators Update". What happens when Microsoft makes an incompatible change that breaks it (particularly if DiskZIP goes out of business)? How do you arrange it so that you haven't just lost all of your data, a la Stacker (other than a bootable recovery, which you'd better have set up ahead of time and which will likely be very time consuming to run)?

I'd want to be very, very careful indeed about anything third party that hooks into the filesystem and does things with it that render it inaccessible to the OEM kernel. That applies to Linux, too. At my previous company we were required to use a particular third party encryption tool (rather than LUKS, which is already built into Linux). I was one of a fair number of people who entirely lost my disk when something went wrong and the recovery image failed. For that matter, those for whom it worked weren't a lot better off; the recovery image was basically a DOS image that took about a week to decrypt a 500 GB disk. Most people simply reinstalled from scratch. -
Getting back to the issue at hand, I'm not picking sides. However, in the case of a claim that DiskZIP is better for reasons X, Y, and Z, the onus is on the person making the claim to come up with actual numbers to back it up. It isn't on the person who disagrees. And if there's no proof, then the best that can be said is that it worked for someone in their situation. -
Decompression Time(CPU) + Read Time(Compact Data) < Read Time (Uncompressed Data)
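As a rough sketch of when that inequality can and cannot hold, here is a back-of-the-envelope model. Every number in it is a hypothetical assumption for illustration only, not a measurement of DiskZIP or of any particular drive, and it naively serializes reading and decompression, which real implementations overlap.

```python
def effective_read_seconds(size_bytes, disk_mbps, ratio=1.0, decomp_mbps=None):
    """Naive serial model: time to get `size_bytes` of usable data off disk.

    ratio       -- compressed size / original size (1.0 = stored uncompressed)
    decomp_mbps -- decompression throughput in MB/s of output; None = no decompression
    """
    read = (size_bytes * ratio) / (disk_mbps * 1e6)
    decompress = 0.0 if decomp_mbps is None else size_bytes / (decomp_mbps * 1e6)
    return read + decompress

GB = 1e9
# Hypothetical figures: a 150 MB/s laptop hard disk, 2:1 compression,
# and a decompressor that streams 800 MB/s of output on one core.
plain = effective_read_seconds(10 * GB, disk_mbps=150)
packed = effective_read_seconds(10 * GB, disk_mbps=150, ratio=0.5, decomp_mbps=800)
print(f"uncompressed read:               {plain:5.1f} s")
print(f"compressed read + decompression: {packed:5.1f} s")
```

With the slow-disk figures above, the compressed path wins (roughly 46 s versus 67 s); substitute NVMe-class throughput (say disk_mbps=3000) and it loses, which is exactly why this argument keeps coming back to the storage subsystem.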
Jarhead hasn't tested this formula. He ran a completely unrelated test, while making it sound like he was actually testing DiskZIP. He wasn't; it's as simple as that.
His claim that a compressed disk would affect a CPU benchmark is also pure nonsense. He made that claim because, initially, he would not question the validity of DiskZIP's storage performance benchmarks. Of course, it is easier to simply question the validity of DiskZIP's storage benchmarks, which he has now figured out.
While there is something to what everyone else on this thread has said and contributed, Jarhead is just trolling and adds no value to the discussion whatsoever. -
Comparisons to 30 years ago aren't entirely relevant today due to hardware differences. Yes, CPUs are a lot faster (my laptop has at minimum better than 100x the raw CPU power of my 90 MHz Pentium of 20+ years ago), but storage also has very different performance characteristics in throughput, latency, and IO/sec. Modern SSDs achieve 300-500 MB/sec (SATA) all the way up to 3 GB/sec (NVMe) vs. the 1 or 2 MB/sec that that 90 MHz Pentium had -- that's more than the CPU performance improvement. Latency of floppies was measured in hundreds of milliseconds and spinning rust is in the range of 10 ms; spinning disks (SATA, at any rate, with command queuing) typically achieve maybe 150 IO/sec, while SATA SSDs can typically achieve maybe 80K IO/sec with 100 us latency and NVMe more like 400K IO/sec with 20 us latency (multithreaded in both cases -- single-thread numbers are considerably lower due to that latency).
The formula you expressed ("Decompression Time (CPU) + Read Time (Compact Data) < Read Time (Uncompressed Data)") is not some kind of mathematical law. It may or may not be the case in a given application. For example, a very specialized compressor I built for certain test data takes about 20 seconds to compress 100 MB or so of data down to 160 KB. That's obviously much more time than it would take to read the data from any remotely modern disk, but my purpose there is not to reduce read time but to reduce storage needs for purposes of test archiving. -
I was never burned by disk compression back in the '90s, which is maybe why I am also able to trust its modern incarnations easily.
I can understand that if you were burned back then, you would be more suspicious of a modern implementation in turn. Sometimes experience conditions us to be negative, even when that negativity may not ultimately be justified.
I couldn't hold any of this against anyone, but trolling really helps no one at all, and should be stopped.
Just like Moore's Law, this formula does hold today, as it did 30 years ago.
I agree it does not need to, but it is DiskZIP's success and claim to fame that it in fact does. That is what's interesting here.
Testing a completely unrelated compression algorithm in a completely unrelated environment does not in any way disprove this law. -
Not to mention that I'd sure be curious how it avoids CPU overhead. I've worked in large-system scalability for much of my career. It wasn't uncommon, as systems scaled up, to see no change in throughput microbenchmarks while CPU system time increased. For microbenchmarks that doesn't matter. For people trying to run databases, database apps, and analytics on those systems, that extra system time translates into less CPU time available to the user. Not to mention that, as things got even faster down the road, it might be an early warning of a problem to come. What's going to happen when we have Optane or even lower-latency storage technologies, for example? The answer might wind up being "don't use this with very fast storage", which is reasonable in this situation, but then people might have paid for something that isn't useful for long.
A better analogy would be Amdahl's Law, which states that the limit to parallel speedup is set by the fraction of work that cannot be parallelized. If 1% of your workload can't be parallelized, you're limited to a 100x speedup no matter how well you parallelize the remainder (say you have 4000 CPUs and enough parallelism to take advantage of them). The point is that you have to attack the worst bottleneck, wherever it may lie; the improvement you can get from everything else combined is limited to that which is outside the bottleneck. -
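For anyone who wants to see the arithmetic, here is a minimal sketch of Amdahl's Law (a generic illustration, nothing specific to DiskZIP):

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's Law: overall speedup with n_workers, given the
    fraction of the work that can be parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

# With 99% of the work parallelizable, the 1% serial portion caps the
# speedup near 1 / 0.01 = 100x no matter how many CPUs you throw at it.
for n in (4, 64, 4000, 1_000_000):
    print(f"{n:>9,d} workers -> {amdahl_speedup(0.99, n):6.1f}x speedup")
```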
Let's face it - the only way to ensure data longevity is to back it up. Multiple times. Your underlying hardware could die, for crying out loud - and even your backups could die.
You might end up blaming the software running on top, such as DiskZIP, for the defect, when it might actually be innocent.
I had a friend who was running mission critical software on a 4 disk RAID 1 array. All four of his drives failed over time. He didn't have backups. And yes, some people got very upset. Not a good situation. At least, it wasn't safety critical stuff.
DiskZIP made my life easier with backups as well. I just copy the compressed disk file, and that's it, for the whole volume. A simple file copy. I keep multiple copies of that file, spread out across weeks/months, which helps me check for issues introduced in an earlier image, but not discovered until later.
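For what it's worth, that routine could look something like the sketch below; the paths and file name are placeholders I made up, not anything DiskZIP prescribes.

```python
# Minimal sketch of the "just copy the compressed disk file" backup routine,
# keeping a handful of dated copies. All paths are hypothetical placeholders.
import shutil
from datetime import date
from pathlib import Path

source = Path(r"C:\diskzip\system.zip")   # hypothetical compressed-volume file
backup_dir = Path(r"E:\backups")          # an external drive, for example
backup_dir.mkdir(parents=True, exist_ok=True)

target = backup_dir / f"{source.stem}-{date.today():%Y-%m-%d}{source.suffix}"
shutil.copy2(source, target)              # one file copy = whole-volume backup

# Keep only the six most recent copies, spread out over weeks/months.
copies = sorted(backup_dir.glob(f"{source.stem}-*{source.suffix}"))
for old in copies[:-6]:
    old.unlink()
```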
I understand this software is not for everyone. I have no qualms with reasonable arguments, and I'm not on a mission to convert anyone. My interest in how many people actually convert is identical to my ownership stake in the business!
It's just disappointing to have so much prejudice. If we were talking about people instead of software, this amount of prejudice would be called racism.
Nobody here actually uses the software except me, by your own admissions, yet everybody here is making claims about what the software does and how it works. You may have just claimed, for example, that the underlying storage format used by DiskZIP is not supported by Microsoft!
Isn't it funny that I'm actually making the least claims here, and I am the only real user of the software here?
On the matter of the everlasting CPU overhead - of course there's CPU overhead! What's interesting is, the CPU overhead plus the time it takes to read compact data turns out to be less than the time it takes to read uncompressed data. That's the whole point of the exercise! -
In fact, DiskZIP recommends you use stronger compression algorithms on slower storage subsystems for maximum acceleration.
But of course, this may change at any time; although with CPUs constantly getting faster and more parallel, the law may well hold for the foreseeable future. -
Simply copying the compressed disk file is not a good way to do backups. It's a lot more time-consuming than incremental backups, and if you do upgrade and find that the new version of the OS doesn't play ball with DiskZIP, you have that recovery problem, right? And you can't very easily do partial restores from that kind of backup. What assurance do you have that DiskZIP is doing extensive QA on their file format, and that it's designed to be robust against failures so that the remainder of the data is accessible? That's an important part of filesystem design. It's not that hard to design something that works if everything goes right; the hard part is what happens if things go wrong.
Is DiskZIP storing data in a container file living within the conventional filesystem? That's one way to go about it, all right, but then you have to garbage collect that container file.
If you try to compress incompressible data, the result is that that data actually expands. Yes, you can compare compressed to uncompressed, and only store it compressed if it's compacted -- but you still need somewhere to record the fact that that piece of data is not compressed. Is that case handled?
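For reference, here is a generic sketch of how a block store can handle that case with a one-byte per-block flag, using standard zlib purely for illustration; this is not DiskZIP's actual on-disk format, which I have no visibility into.

```python
import os
import zlib

def pack_block(block: bytes) -> bytes:
    """Store a block compressed only when that actually saves space,
    and record which form was used in a one-byte flag."""
    packed = zlib.compress(block)
    if len(packed) < len(block):
        return b"\x01" + packed   # flag 1: stored compressed
    return b"\x00" + block        # flag 0: stored raw (data didn't compress)

def unpack_block(stored: bytes) -> bytes:
    flag, payload = stored[0], stored[1:]
    return zlib.decompress(payload) if flag == 1 else payload

# Incompressible input (random bytes standing in for already-compressed media)
# is passed through, so the worst case costs one flag byte per block instead
# of zlib's expansion overhead.
random_block = os.urandom(64 * 1024)
assert unpack_block(pack_block(random_block)) == random_block
assert len(pack_block(random_block)) == len(random_block) + 1
```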
Your claim that "the CPU overhead plus the time it takes to read compact data turns out to be less than the time it takes to read uncompressed data" is only partially true. It's not necessarily true if that CPU overhead could have been used for something else, and that something else is important. Say, for example, you're running SETI At Home in the background, soaking up all spare CPU cycles. CPU cycles used for decompression are cycles that SETI At Home can't use. So no, that overhead is not free of opportunity cost.
Yes, I know I'm asking a lot of "if" questions here. But those are the kinds of questions that someone using a filesystem should ask. -