C - Memory-mapped file - B-Tree

I am trying to create a memory mapping of a huge file (about 100 GB) to store a B-Tree with billions of key-value pairs. The memory is too small to hold all of the data, so I am mapping a file from disk, and instead of using malloc I return and advance a pointer into the mapped area.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define MEMORY_SIZE 300000000

    unsigned char *mem_buffer;
    void *start_ptr;

    void *my_malloc(int size)
    {
        unsigned char *ptr = mem_buffer;
        mem_buffer += size;
        return ptr;
    }

    void *my_calloc(int size, int object_size)
    {
        unsigned char *ptr = mem_buffer;
        mem_buffer += (size * object_size);
        return ptr;
    }

    void init(const char *file_path)
    {
        int fd = open(file_path, O_RDWR, S_IREAD | S_IWRITE);
        if (fd < 0) {
            perror("Could not open file for memory mapping");
            exit(1);
        }
        start_ptr = mmap(NULL, MEMORY_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (start_ptr == MAP_FAILED) {
            perror("Could not memory map file");
            exit(1);
        }
        mem_buffer = (unsigned char *) start_ptr;
        printf("Successfully mapped file.\n");
    }

    void unmap(void)
    {
        if (munmap(start_ptr, MEMORY_SIZE) < 0) {
            perror("Could not unmap file");
            exit(1);
        }
        printf("Successfully unmapped file.\n");
    }

The main method:

    int main(int argc, char **argv)
    {
        init(argv[1]);

        unsigned char *arr = (unsigned char *) my_malloc(6);
        arr[0] = 'H'; arr[1] = 'E'; arr[2] = 'L'; arr[3] = 'L'; arr[4] = 'O'; arr[5] = '\0';

        unsigned char *arr2 = (unsigned char *) my_malloc(5);
        arr2[0] = 'M'; arr2[1] = 'I'; arr2[2] = 'A'; arr2[3] = 'U'; arr2[4] = '\0';

        printf("Memory mapped string1: %s\n", arr);
        printf("Memory mapped string2: %s\n", arr2);

        struct my_btree_node *root = NULL;
        insert(&root, arr, 10);
        insert(&root, arr2, 20);
        print_tree(root, 0, false);
        // cin.ignore();

        unmap();
        return EXIT_SUCCESS;
    }

The problem is that I get Cannot allocate memory (errno is 12) if the requested size is larger than the actual memory, or a Segmentation fault if the requested space lies outside the mapped area. I was told that it is possible to map files larger than the actual memory.

Will the system manage the file on its own, or am I responsible for mapping only as much as fits in free memory, and then unmapping and remapping at another offset whenever I access space beyond that?

Thanks.

EDIT

OS: Ubuntu 14.04 LTS x86_64

bin/washMachine: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=9dc831c97ce41b0c6a77b639121584bf76deb47d, not stripped

2 answers

First, make sure that you are running on a 64-bit CPU in 64-bit mode. On a 32-bit CPU, your process's address space is only 2^32 bytes (four gigabytes), and there is no way to fit 100 GB into it all at once; there simply are not enough addresses. (Besides, much of that address space will already be taken up by other mappings or reserved by the kernel.)

Secondly, problems can arise even if the mapping fits within the address space. Memory that is mapped into your process (this also includes, for example, your program's code and data segments, as well as shared libraries) is divided into units called pages (usually 4 KB each on x86), where each page requires some metadata in the kernel and in the MMU. This is another resource that can run out when creating huge memory mappings.

As pointed out in Mmap() entire large file, you can try using MAP_SHARED. This may allow the kernel to allocate memory for the mapping lazily, as pages are touched, because it knows it can always page a modified page back out to the file on disk if it runs out of memory. With MAP_PRIVATE, the kernel must allocate a new page every time a page is modified (since the change must not be carried through to the file), which would not be safe to do lazily in case the system runs out of memory and swap.

You may also need to pass MAP_NORESERVE to mmap() when mapping more memory than there is physical memory, or set /proc/sys/vm/overcommit_memory (see proc(5)) to 1 (which is a bit ugly though, since it affects the whole system).
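A minimal sketch of the MAP_SHARED approach described above (the path, sizes, and function name are illustrative, not from the question). With MAP_SHARED the kernel can write dirty pages back to the file itself, so the mapping does not have to be backed entirely by RAM plus swap:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static void *map_file_shared(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;

    /* Grow the file so that every byte of the mapping is backed by it
       (touching pages past the end of the file raises SIGBUS). */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

For a MAP_PRIVATE mapping, adding MAP_NORESERVE to the flags would additionally ask the kernel not to reserve swap space up front, as discussed above.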

On my system, which is similar to yours with 8 GB of RAM and 8 GB of swap, MAP_SHARED alone is sufficient to mmap() a 40 GB file. MAP_PRIVATE works together with MAP_NORESERVE as well.

If that does not work, you are probably running into an MMU limitation. Many modern CPU architectures support huge pages, which are larger than the default page size. The point of huge pages is that you need fewer pages to map the same amount of memory (assuming a large mapping), which reduces the amount of metadata and can make address translation and context switches more efficient. The disadvantage of huge pages is decreased mapping granularity and increased waste (internal fragmentation) when only a small part of a page is used.

MAP_SHARED of some arbitrary file is unlikely to work with huge pages, by the way (in case MAP_SHARED alone is not enough to fix the problem). The file would need to live in a hugetlbfs.

Passing MAP_HUGETLB to mmap() requests allocation using huge pages (though it might be for anonymous mappings only, where huge pages also seem to be automatic on many systems nowadays). You may also need to fiddle with /proc/sys/vm/nr_hugepages and /proc/sys/vm/nr_overcommit_hugepages; see this thread and the Documentation/vm/hugetlbpage.txt file in the kernel sources.

Beware of alignment issues when writing your own memory allocator, by the way. I hope this is not too much of a plug, but see this answer.
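To illustrate the alignment point: the bump allocator from the question returns whatever address comes next, which can hand out misaligned pointers. A sketch of a fixed version (names and pool size are illustrative) rounds every request up to alignof(max_align_t), so each returned pointer is suitably aligned for any object type:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE (1 << 16)
#define ALIGNMENT alignof(max_align_t)

/* The pool itself must start at an aligned address, too. */
static alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_off;

static void *my_malloc_aligned(size_t size)
{
    /* Round the request up to a multiple of ALIGNMENT (a power of two). */
    size_t rounded = (size + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    if (rounded < size || pool_off + rounded > POOL_SIZE)
        return NULL; /* overflow or out of pool space */
    void *p = pool + pool_off;
    pool_off += rounded;
    return p;
}
```

The same rounding applies unchanged when the pool is a mapped file instead of a static array.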

As a side note, any memory you access through a file-backed mapping must actually exist in the file. If the file is smaller than the mapping and you want to be able to access the "extra" memory, you can grow the file first using ftruncate(2). (This might not increase its size on disk by much if the file system supports sparse files with holes.)
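A small sketch of that step, assuming a POSIX system (the path and helper name are illustrative). On file systems with sparse-file support, the hole created by ftruncate occupies no disk blocks until pages are actually written:

```c
#include <fcntl.h>
#include <unistd.h>

/* Grow `path` to `size` bytes before mapping it; returns 0 on success. */
static int grow_file(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, size); /* logical size becomes `size` */
    close(fd);
    return rc;
}
```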


Not knowing which operating system you are running on, my best guess is that your operating system either does not allow unlimited memory overcommit, or counts a MAP_PRIVATE mapping against the RLIMIT_DATA ulimit. Either one means that your code will not work.

Basically, with MAP_PRIVATE you told mmap: "map this file, but treat any changes I make in the mapped area as local memory allocations in this program." The whole trick with mapping files in such cases is that you allow the operating system to write pages back out to disk when it runs low on memory. Since you told the operating system that it is not allowed to write your changes back, it cannot do that.

The solution is to use MAP_SHARED, but make sure you understand the manual page for mmap and what MAP_SHARED actually does. Also, make sure that you either map no more than the size of the file, or ftruncate the file first so that it is as large as you need.

Also read the manual page for mmap regarding the length argument. Some operating systems allow a length that is not a multiple of the page size, but relying on this is very unportable, as is relying on how the size gets rounded up to a page boundary.
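The portable approach is to round the requested length up yourself before calling mmap. A minimal helper (the function name is illustrative); in practice you would pass the page size obtained from sysconf(_SC_PAGESIZE):

```c
#include <stddef.h>

/* Round len up to the next multiple of page. */
static size_t round_up(size_t len, size_t page)
{
    return (len + page - 1) / page * page;
}
```

Typical use: `size_t mapped_len = round_up(wanted_len, (size_t)sysconf(_SC_PAGESIZE));`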

0
source

Source: https://habr.com/ru/post/984137/

