
Why not space out memory allocations?

Problem Detail: 

In ext4 file system, the files are spaced out as far apart as reasonably possible to allow for efficient reallocation. Why do we not do this in memory?

Why not place one allocation at page 20 and the next large allocation at page 100, leaving room for expansion before and after each allocation? That seems to make sense for large buffers that frequently change in size. For smaller buffers, doing so would probably be a drain on memory, because we would have to dedicate a whole page to a handful of bytes (though perhaps malloc could still pack small allocations together within a block). It would also provide greater protection against memory corruption, allowing segfaults (or the Windows equivalent) to happen sooner, particularly for larger buffers.

Why don't we do this? I get it on modern 32-bit systems because of limited address space, but why not on 64-bit?

Asked By : JavaProphet
Answered By : D.W.

Why is memory different from a hard disk?

Because hard disks have different performance characteristics than RAM.

With hard disks, seeking is extremely slow. Sequential reads are much faster than random-access reads. In other words, it's much faster to read data that is stored consecutively than to read data that's scattered all around the disk. Therefore, there are benefits to keeping your files as contiguous as possible.

RAM doesn't have the same property. With RAM, there is no penalty for "seeking". You can pick any page and read from it, and the cost will be the same, no matter where the page is located. For instance, reading page X and then page Y will have the same cost regardless of whether X and Y are adjacent or not.

So, you end up with different algorithms that are optimized for hardware with different performance characteristics.

Why do hard disks have this strange property? Because they are based on a rotating magnetic platter. If you read block X on the hard disk and next want to read block Y, but Y is on the other side of the platter, you'll have to wait for the platter to rotate far enough for Y to pass under the read head. That takes time. RAM has no such constraint.

Interestingly, the differences are shrinking these days. SSDs (solid-state drives, based on flash storage) are becoming a popular replacement for hard disks. SSDs have their own performance characteristics, but they are more like RAM than like magnetic hard disks. This means the filesystem optimizations you'd make for an SSD-oriented filesystem are different from those for a hard-drive-oriented filesystem. Many of today's filesystems were designed decades ago, when magnetic hard disks were dominant, and they optimize for the performance characteristics of magnetic hard disks.

Why not harden software by deliberately leaving space between buffers?

You are implicitly proposing a scheme for making software more robust against out-of-bounds errors and buffer overruns: leave plenty of unallocated space between each pair of buffers allocated on the heap.

That's not a bad idea. It's been proposed before and studied in considerable depth. It does add some robustness against bugs. It can also make buffer overrun attacks harder, especially if you randomize the location of all heap buffers. It's also relatively easy to deploy in a way that is compatible with legacy software. So it does have some quite attractive advantages.

However, it also has some disadvantages. It imposes some performance overhead -- not a huge amount, but enough to be a bit painful. Why does it impose performance overhead? Largely because it reduces spatial locality. Consider two 16-byte allocations. Current allocators try to pack them together into a single page. In your scheme, we'd have to put each allocation onto its own page. That means we have a lot more pages, which means that we get a lower TLB hit rate.
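The page-count difference behind that TLB cost is easy to quantify. A small worked example, assuming 4 KiB pages (a typical size, but an assumption here):

```c
#include <stddef.h>

/* Pages needed when small allocations are packed together,
 * as current allocators do. */
size_t pages_packed(size_t nallocs, size_t alloc_size, size_t page_size) {
    size_t per_page = page_size / alloc_size;   /* e.g. 4096/16 = 256 */
    return (nallocs + per_page - 1) / per_page; /* round up */
}

/* Pages needed when every allocation sits on its own page. */
size_t pages_isolated(size_t nallocs) {
    return nallocs;
}
```

For 10,000 16-byte allocations, packing needs 40 pages while one-page-per-allocation needs 10,000 -- a 250x increase in the number of pages the TLB has to cover.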

There are different variants on this scheme, with different performance implications, but they all run up against the challenge: in exchange for the robustness and security benefits, there is some non-zero performance overhead. So that's one reason why people might not adopt it.

The other big reason is: we're using platforms and allocators that were designed decades ago. At the time, security wasn't as critical as it is today, so this sounded like a pretty unattractive proposal: I get to implement something more complicated and fiddly, and it'll make my entire system slower? No thanks.

Today, given the magnitude of the security problems we're facing, there is gradually increasing interest in these alternatives. But they still face an uphill battle because of the non-zero performance implications.

To learn more about the academic research on this subject, read the papers on the DieHard, DieHarder, and Archipelago systems. They discuss the design in a very accessible way and measure the performance and other costs of such a scheme. If you're still intrigued, you can explore the literature, where other researchers have examined other points in the design space and their implications.

Bottom line: you're not the first to think of this, and it is a pretty attractive idea, but it also comes with some significant costs.

Best Answer from StackOverflow
