I'm glad you're happy. I don't feel like it's my place to reach out to a software company/author for tools I won't be using, but I wanted to post some comments so that others know what to be wary of.
Others may want to investigate things, and if you found something that works for you, then that is fantastic. For myself, I've found a different solution to drive space shortages, and I haven't had the need (or the desire) to look at anything else, as my system, tailor-made for my environment, fits me like a glove.
-
I am just astounded that, for some people, it is so important to believe they already know something they clearly don't, even in light of overwhelming evidence to the contrary, and even when the cost of acquiring that new knowledge is relatively low. -
Revise the procedure to include that control benchmark, run the tests, and post the results. I can wait. -
My anecdotal evidence, using both a tablet and a massive rig with gumstick SSDs in RAID, has been to my satisfaction; both make it easy to believe the vendor's claims as illustrated by the official benchmarks on their site.
If you wish to validate these claims for yourself - or refute them - you are welcome to run your own benchmarks, and draw your own conclusions, of course.
As I wrote earlier, I would be interested in any new benchmarks - to clarify, those that actually involve DiskZIP, of course.
From a conceptual or theoretical perspective, there's really nothing left to discuss here. -
So I make the claim that running compression impacts CPU performance in a negative way, you disagree with that statement, we go back and forth about theory, you still disagree with said statement, I provide hard numbers to demonstrate how that theory applies, and you still disagree with that statement?
I’ve made my claim and proved it. You’ve made a counter-claim and have yet to prove it.
Pot, meet kettle. I’m not going to bother engaging with you further about performance until you provide your testing results. -
You did not test DiskZIP at all. -
-
Frankly, I'm not comfortable installing that specific software on my computer, given the content of its website, its request for personal information, and the prospect of running it over a TB of data and then removing it afterwards, just to argue with someone who's massively hyped about one very specific software package while ignoring nearly everything I've said about the compression process itself, regardless of which frontend is being used.
Clearly, you’re comfortable with the software and seem to be hell-bent on refuting my results. Again, I invite you to run your own tests to counter my tests and demonstrate that I am wrong.
Once again: I'll wait. Until then, farewell. -
-
Your personal experience notwithstanding, the compression ratios that people get will vary greatly. Typical VMs will compress tremendously because much of the space on the VM is actually free space (although many VM managers are careful about how much space they actually consume). Executables will also typically compress well, as will text files (human languages have a lot of redundancy). Images and videos stored in typical formats won't compress at all unless very specialized algorithms are used, which might be too expensive to use for compressing large amounts of data.

I have about 3 TB of images and videos, and I'd be very surprised if any compression at all would be achieved. No, I can't try DiskZIP because I'm not running Windows, but entropy is still entropy, and unless it's doing something like transcoding the files to drop the quality (which is cheating, since that's lossy compression), you're just not going to reduce the amount of space needed, unless you find duplicate images. Period. Your backup may have been mostly images and videos, but unless "mostly" is better defined, it's meaningless.

This is all basic information theory, patent (pending or otherwise) or not. It doesn't matter whether you're compressing files or an entire disk image; there's an irreducible minimum you can achieve by lossless compression, and the closer you get to that, the more CPU time is required to get there.
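To make the entropy point concrete, here is a quick sketch using Python's standard zlib (this has nothing to do with DiskZIP's own algorithm): redundant data shrinks dramatically, while high-entropy bytes, standing in for already-compressed images and video, barely budge.

```python
import os
import zlib

# Highly redundant data (like text or executables) compresses well;
# high-entropy data (like JPEG/MP4 payloads) essentially does not.
redundant = b"the quick brown fox jumps over the lazy dog\n" * 25_000
high_entropy = os.urandom(len(redundant))  # stand-in for already-compressed media

for label, data in (("redundant text", redundant), ("high-entropy bytes", high_entropy)):
    packed = zlib.compress(data, level=6)
    print(f"{label:20s} {len(data):>9,d} -> {len(packed):>9,d} bytes "
          f"({len(packed) / len(data):.1%} of original)")
```

As I understand information theory, no lossless compressor, whatever its frontend, can do fundamentally better on the second case.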
The "online disk compression" and "offline disk compression" aren't very well described. "Offline disk compression" sounds like it takes over your system while it's compressing the partition. So you're paying a big price while that's going on -- you can't use your computer at all. Sure, maybe you can arrange to do that when you don't care (again: YMMV), but that doesn't apply to everyone. So you may not pay an immediate penalty when writing to the disk -- maybe it writes data uncompressed -- but there will be a penalty when reading from it. That penalty might not be in the way of throughput, but it will certainly be in terms of CPU usage, as was demonstrated.
If the writes to disk aren't immediately compressed, then either you lose the benefit for anything you write after the initial compression (which I presume is what's meant by "online compression"), or you're going to pay that price in CPU consumption sooner or later. That cost may, again, be possible to conceal by being clever about when the work is done, but it can't be avoided.
But the really big concern I have, from that thread you pointed us at, was the remark "Just to let you all know our latest released versions fully support Windows 10 Fall Creators Update". What happens when Microsoft makes an incompatible change that breaks it (particularly if DiskZIP goes out of business)? How do you arrange it so that you haven't just lost all of your data, a la Stacker (other than a bootable recovery, which you'd better have set up ahead of time and which will likely be very time consuming to run)?

I'd want to be very, very careful indeed about anything third party that hooks into the filesystem and does things with it that render it inaccessible to the OEM kernel. That applies to Linux, too. At my previous company we were required to use a particular third party encryption tool (rather than LUKS, which is already built into Linux). I was one of a fair number of people who entirely lost my disk when something went wrong and the recovery image failed. For that matter, those for whom it worked weren't a lot better off; the recovery image was basically a DOS image that took about a week to decrypt a 500 GB disk. Most people simply reinstalled from scratch. -
Getting back to the issue at hand, I'm not picking sides. However, in the case of a claim that DiskZIP is better for reasons X, Y, and Z, the onus is on the person making the claim to come up with actual numbers to back it up. It isn't on the person who disagrees. And if there's no proof, then the best that can be said is that it worked for someone in their situation. -
Decompression Time(CPU) + Read Time(Compact Data) < Read Time (Uncompressed Data)
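As a rough sketch of when that inequality can and cannot hold, here is a back-of-the-envelope model. Every number in it is a hypothetical assumption for illustration only, not a measurement of DiskZIP or of any particular drive, and it naively serializes reading and decompression, which real implementations overlap.

```python
def effective_read_seconds(size_bytes, disk_mbps, ratio=1.0, decomp_mbps=None):
    """Naive serial model: time to get `size_bytes` of usable data off disk.

    ratio       -- compressed size / original size (1.0 = stored uncompressed)
    decomp_mbps -- decompression throughput in MB/s of output; None = no decompression
    """
    read = (size_bytes * ratio) / (disk_mbps * 1e6)
    decompress = 0.0 if decomp_mbps is None else size_bytes / (decomp_mbps * 1e6)
    return read + decompress

GB = 1e9
# Hypothetical figures: a 150 MB/s laptop hard disk, 2:1 compression,
# and a decompressor that streams 800 MB/s of output on one core.
plain = effective_read_seconds(10 * GB, disk_mbps=150)
packed = effective_read_seconds(10 * GB, disk_mbps=150, ratio=0.5, decomp_mbps=800)
print(f"uncompressed read:               {plain:5.1f} s")
print(f"compressed read + decompression: {packed:5.1f} s")
```

With the slow-disk figures above, the compressed path wins (roughly 46 s versus 67 s); substitute NVMe-class throughput (say disk_mbps=3000) and it loses, which is exactly why this argument keeps coming back to the storage subsystem.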
Jarhead hasn't tested this formula. He ran a completely unrelated test, while making it sound like he was actually testing DiskZIP. He wasn't; it's as simple as that.
His claim that a compressed disk would affect a CPU benchmark is also pure nonsense. He made that claim because, initially, he would not question the validity of DiskZIP's storage performance benchmarks. Of course, it is easier to simply question the validity of DiskZIP's storage benchmarks, which he has now figured out.
While there is something to what everyone else on this thread has said and contributed, Jarhead is just trolling and adds no value to the discussion whatsoever. -
Comparisons to 30 years ago aren't entirely relevant today due to hardware differences. Yes, CPUs are a lot faster (my laptop has at minimum better than 100x the raw CPU power of my 90 MHz Pentium of 20+ years ago), but storage also has very different performance characteristics in throughput, latency, and IO/sec. Modern SSDs achieve 300-500 MB/sec (SATA) all the way up to 3 GB/sec (NVMe) vs. the 1 or 2 MB/sec that that 90 MHz Pentium had -- that's more than the CPU performance improvement. Latency of floppies was measured in hundreds of milliseconds and spinning rust is in the range of 10 ms; spinning disks (SATA, at any rate, with command queuing) typically achieve maybe 150 IO/sec, while SATA SSDs can typically achieve maybe 80K IO/sec with 100 us latency and NVMe more like 400K IO/sec with 20 us latency (multithreaded in both cases -- single-thread numbers are considerably lower due to that latency).
The formula you expressed ("Decompression Time (CPU) + Read Time (Compact Data) < Read Time (Uncompressed Data)") is not some kind of mathematical law. It may or may not be the case in a given application. For example, a very specialized compressor I built for certain test data takes about 20 seconds to compress 100 MB or so of data down to 160 KB. That's obviously much more time than it would take to read the data from any remotely modern disk, but my purpose there is not to reduce read time but to reduce storage needs for purposes of test archiving. -
I was never burned by disk compression back in the '90s, which is maybe why I am also able to trust its modern incarnations easily.
I can understand that if you were burned back then, you would be more suspicious of a modern implementation in turn. Sometimes experience conditions us to be negative, even when that negativity may not ultimately be justified.
I couldn't hold any of this against anyone, but trolling really helps no one at all, and should be stopped.
Just like Moore's Law, this formula does hold today, as it did 30 years ago.
I agree it does not need to, but it is DiskZIP's success and claim to fame that it in fact does. That is what's interesting here.
Testing a completely unrelated compression algorithm in a completely unrelated environment does not in any way disprove this law. -
Not to mention that I'd sure be curious how it avoids CPU overhead. I've worked in large-system scalability for much of my career. It wasn't uncommon, as systems scaled up, to see no change in throughput microbenchmarks while CPU system time increased. For microbenchmarks that doesn't matter. For people trying to run databases, database apps, and analytics on those systems, that extra system time translates into less CPU time available to the user. Not to mention that, as things got even faster down the road, it might be an early warning of a problem to come. What's going to happen when we have Optane or even lower-latency storage technologies, for example? The answer might wind up being "don't use this with very fast storage", which is reasonable in this situation, but then people might have paid for something that isn't useful for long.
A better analogy would be Amdahl's Law, which states that the limit to parallel speedup is set by the fraction of work that cannot be parallelized. If 1% of your workload can't be parallelized, you're limited to a 100x speedup no matter how well you parallelize the remainder (say you have 4000 CPUs and enough parallelism to take advantage of them). The point is that you have to attack the worst bottleneck, wherever it may lie; the improvement you can get from everything else combined is limited to that which is outside the bottleneck. -
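For anyone who wants to see the arithmetic, here is a minimal sketch of Amdahl's Law (a generic illustration, nothing specific to DiskZIP):

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's Law: overall speedup with n_workers, given the
    fraction of the work that can be parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

# With 99% of the work parallelizable, the 1% serial portion caps the
# speedup near 1 / 0.01 = 100x no matter how many CPUs you throw at it.
for n in (4, 64, 4000, 1_000_000):
    print(f"{n:>9,d} workers -> {amdahl_speedup(0.99, n):6.1f}x speedup")
```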
Let's face it - the only way to ensure data longevity is to back it up. Multiple times. Your underlying hardware could die, for crying out loud - and even your backups could die.
You might end up blaming the software running on top, such as DiskZIP, for the defect, when it might actually be innocent.
I had a friend who was running mission critical software on a 4 disk RAID 1 array. All four of his drives failed over time. He didn't have backups. And yes, some people got very upset. Not a good situation. At least, it wasn't safety critical stuff.
DiskZIP made my life easier with backups as well. I just copy the compressed disk file, and that's it, for the whole volume. A simple file copy. I keep multiple copies of that file, spread out across weeks/months, which helps me check for issues introduced in an earlier image, but not discovered until later.
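For what it's worth, that routine could look something like the sketch below; the paths and file name are placeholders I made up, not anything DiskZIP prescribes.

```python
# Minimal sketch of the "just copy the compressed disk file" backup routine,
# keeping a handful of dated copies. All paths are hypothetical placeholders.
import shutil
from datetime import date
from pathlib import Path

source = Path(r"C:\diskzip\system.zip")   # hypothetical compressed-volume file
backup_dir = Path(r"E:\backups")          # an external drive, for example
backup_dir.mkdir(parents=True, exist_ok=True)

target = backup_dir / f"{source.stem}-{date.today():%Y-%m-%d}{source.suffix}"
shutil.copy2(source, target)              # one file copy = whole-volume backup

# Keep only the six most recent copies, spread out over weeks/months.
copies = sorted(backup_dir.glob(f"{source.stem}-*{source.suffix}"))
for old in copies[:-6]:
    old.unlink()
```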
I understand this software is not for everyone. I have no qualms with reasonable arguments, and I'm not on a mission to convert anyone. My interest in how many people actually convert is identical to my ownership stake in the business!
It's just disappointing to have so much prejudice. If we were talking about people instead of software, this amount of prejudice would be called racism.
Nobody here actually uses the software except me, by your own admissions, yet everybody here is making claims about what the software does and how it works. You may have just claimed, for example, that the underlying storage format used by DiskZIP is not supported by Microsoft!
Isn't it funny that I'm actually making the least claims here, and I am the only real user of the software here?
On the matter of the everlasting CPU overhead - of course there's CPU overhead! What's interesting is, the CPU overhead plus the time it takes to read compact data turns out to be less than the time it takes to read uncompressed data. That's the whole point of the exercise! -
In fact, DiskZIP recommends you use stronger compression algorithms on slower storage subsystems for maximum acceleration.
But of course, this may change at any time; although with CPUs constantly getting faster and more parallel, the law may well hold for the foreseeable future. -
Simply copying the compressed disk file is not a good way to do backups. It's a lot more time-consuming than incremental backups, and if you do upgrade and find that the new version of the OS doesn't play ball with DiskZIP, you have that recovery problem, right? And you can't very easily do partial restores from that kind of backup. What assurance do you have that DiskZIP is doing extensive QA on their file format, and that it's designed to be robust against failures so that the remainder of the data is accessible? That's an important part of filesystem design. It's not that hard to design something that works if everything goes right; the hard part is what happens if things go wrong.
Is DiskZIP storing data in a container file living within the conventional filesystem? That's one way to go about it, all right, but then you have to garbage collect that container file.
If you try to compress incompressible data, the result is that that data actually expands. Yes, you can compare compressed to uncompressed, and only store it compressed if it's compacted -- but you still need somewhere to record the fact that that piece of data is not compressed. Is that case handled?
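For reference, here is a generic sketch of how a block store can handle that case with a one-byte per-block flag, using standard zlib purely for illustration; this is not DiskZIP's actual on-disk format, which I have no visibility into.

```python
import os
import zlib

def pack_block(block: bytes) -> bytes:
    """Store a block compressed only when that actually saves space,
    and record which form was used in a one-byte flag."""
    packed = zlib.compress(block)
    if len(packed) < len(block):
        return b"\x01" + packed   # flag 1: stored compressed
    return b"\x00" + block        # flag 0: stored raw (data didn't compress)

def unpack_block(stored: bytes) -> bytes:
    flag, payload = stored[0], stored[1:]
    return zlib.decompress(payload) if flag == 1 else payload

# Incompressible input (random bytes standing in for already-compressed media)
# is passed through, so the worst case costs one flag byte per block instead
# of zlib's expansion overhead.
random_block = os.urandom(64 * 1024)
assert unpack_block(pack_block(random_block)) == random_block
assert len(pack_block(random_block)) == len(random_block) + 1
```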
Your claim that "the CPU overhead plus the time it takes to read compact data turns out to be less than the time it takes to read uncompressed data" is only partially true. It's not necessarily true if that CPU overhead could have been used for something else, and that something else is important. Say, for example, you're running SETI At Home in the background, soaking up all spare CPU cycles. CPU cycles used for decompression are cycles that SETI At Home can't use. So no, that overhead is not free of opportunity cost.
Yes, I know I'm asking a lot of "if" questions here. But those are the kinds of questions that someone using a filesystem should ask. -