PAGES:
- Physical pages are the basic unit of memory management for the kernel.
- The MMU (memory management unit) manages memory in units of pages.
- The page size depends on the architecture. Many architectures, including 32-bit x86 and x86-64, use 4KB pages; some 64-bit architectures, such as Alpha, use 8KB pages.
- The kernel describes every physical page with a struct page.
- In the older 2.6 kernels that the listing below follows, this structure is defined in <linux/mm.h>; newer kernels define it in <linux/mm_types.h>.
    struct page {
        page_flags_t flags;
        atomic_t _count;
        atomic_t _mapcount;
        unsigned long private;
        struct address_space *mapping;
        pgoff_t index;
        struct list_head lru;
        void *virtual;
    };
- Some important fields are:
- flags stores the status of the page, such as whether the page is dirty (it has been modified) or locked in memory. Each flag is a single bit, so at least 32 flags can be represented. The flag values are defined in <linux/page-flags.h>.
- _count stores the usage count of the page, that is, how many references to it exist. When no references remain, no one is using the page and it becomes available for a new allocation. Kernel code should read the count through page_count() rather than accessing the field directly.
- virtual is the page's address in the kernel's virtual memory. High memory (physical memory above 896MB on 32-bit x86) is not permanently mapped into the kernel's address space, so for those pages this field is NULL.
- The goal of this data structure is to describe the physical pages themselves, not the data they contain.
ZONES:
- On 32-bit x86 the kernel divides physical memory into three zones: ZONE_DMA (<16MB), ZONE_NORMAL (16MB-896MB) and ZONE_HIGHMEM (>896MB).
- The kernel groups pages with similar properties into the same zone.
- Zones have no physical relevance; they are purely a logical grouping that the kernel uses to keep track of pages.
- Each zone is represented by a struct zone, which is defined in <linux/mmzone.h>.
- For more details on zones, read my other post on linux addressing.
GETTING PAGES:
- The kernel provides several interfaces to allocate and free memory within kernel space.
- The low-level interfaces described in this section allocate memory with page-sized granularity and are declared in <linux/gfp.h>.
- We can allocate either physically contiguous memory or memory that is only virtually contiguous.
- One should never attempt to allocate memory for userspace from the kernel; this is a huge violation of the kernel's abstraction layering.
- Instead, have userspace mmap pages owned by your driver directly into its address space, or have userspace ask the kernel how much space is needed, allocate that much itself, and then fetch the data from the kernel into it.
- There is no way to allocate physically contiguous memory from userspace in Linux.
- This is because a userspace program has no way of controlling, or even knowing, whether the underlying memory is contiguous.
- The core function is struct page * alloc_pages(unsigned int gfp_mask, unsigned int order).
- This allocates 2^order (that is, 1 << order) physically contiguous pages and returns a pointer to the first page's page structure; on error it returns NULL.
- To convert a given page to its logical address, we can use the function void * page_address(struct page *page).
- This function returns a pointer to the logical address where the given physical page currently resides.
- If we just need the address of the pages and not the page structure, we can use the function unsigned long __get_free_pages(unsigned int gfp_mask, unsigned int order).
- This function works like alloc_pages(), which it calls internally, but it directly returns the logical address of the first page.
- Because the pages are physically contiguous, the pages that follow the first one are contiguous in the logical address space as well.
- If we need just a single page (order 0), there are two functions: one returns the page structure and the other returns the logical address:
struct page * alloc_page(unsigned int gfp_mask)
unsigned long __get_free_page(unsigned int gfp_mask)
- If we need a page filled with zeros, we can use the function unsigned long get_zeroed_page(unsigned int gfp_mask). This matters for security when the page is later handed to userspace: a freshly allocated page still holds whatever was written to it previously, and zeroing it ensures that userspace cannot read those stale contents.
- This function works the same as __get_free_page(), except that the allocated page is then zero-filled.
- To free the pages, we have these functions:
void __free_pages(struct page *page, unsigned int order)
void free_pages(unsigned long addr, unsigned int order)
void free_page(unsigned long addr)
- Allocating pages can fail, so we must check the return value of every allocation and handle the error.
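The relationship between order, page count and byte size is easy to get wrong, so here is a minimal userspace sketch of the arithmetic. It assumes 4KB pages, and the demo_ names are hypothetical helpers written for this post, not kernel functions; in the kernel the page size comes from PAGE_SIZE, and get_order() does the size-to-order calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KB pages, as on x86. The kernel's real value is PAGE_SIZE. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((size_t)1 << DEMO_PAGE_SHIFT)

/* Number of pages in an allocation of the given order: 1 << order. */
static size_t demo_pages_for_order(unsigned int order)
{
    assert(order < 20);  /* keep the demo far away from overflow */
    return (size_t)1 << order;
}

/* Number of bytes covered by an allocation of the given order. */
static size_t demo_bytes_for_order(unsigned int order)
{
    return demo_pages_for_order(order) * DEMO_PAGE_SIZE;
}

/* Smallest order whose allocation holds at least `size` bytes. */
static unsigned int demo_order_for_size(size_t size)
{
    unsigned int order = 0;

    while (demo_bytes_for_order(order) < size)
        order++;
    return order;
}
```

With these assumptions a 20,000-byte buffer needs order 3, which is eight pages or 32KB. Because the order rounds the request up to a power of two, large page allocations can waste a good part of what they reserve.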
kmalloc()
- The kmalloc() function works much like user-space's familiar malloc() routine, except that it takes an additional flags parameter.
- It is used when we want to allocate a small chunk of memory sized in bytes.
- For larger allocations, the page allocation functions described above are a good option.
- Most memory allocations in the kernel use kmalloc().
- The function is declared in <linux/slab.h>.
- void * kmalloc(size_t size, int flags)
- The function returns a pointer to a region of memory that is at least size bytes in length.
- The region of memory allocated is physically contiguous.
- On error, it returns NULL.
- Kernel allocations almost always succeed, unless there is an insufficient amount of memory available.
- Still, we must check for NULL after every call to kmalloc() and handle the error appropriately.
- For example:
    struct abc *ptr;
    ptr = kmalloc(sizeof(struct abc), GFP_KERNEL);
    if (!ptr) {
        /* handle error ... */
    }
- The GFP_KERNEL flag specifies the behavior of the memory allocator while trying to obtain the memory to return to the caller of kmalloc().
gfp_mask Flags
In this section we will discuss the flags that we pass to kmalloc() and to the low-level page functions.
The flags are broken up into three categories:
- action modifiers
- zone modifiers
- types.
- Action modifiers specify how the kernel is supposed to allocate the requested memory.
- In certain situations, only certain methods can be employed to allocate memory.
- For example, interrupt handlers must instruct the kernel not to sleep (because interrupt handlers cannot reschedule) in the course of allocating memory.
- Zone modifiers specify from where to allocate memory.
- As we saw in the article on linux addressing (http://learnlinuxconcepts.blogspot.in/2014/02/linux-addressing.html) the kernel divides physical memory into multiple zones, each of which serves a different purpose.
- Zone modifiers specify from which of these zones to allocate.
- Type flags specify a combination of action and zone modifiers as needed by a certain type of memory allocation.
- Type flags save us from specifying numerous modifiers: we generally specify just one type flag instead.
- All the flags are declared in <linux/gfp.h>.
- The file <linux/slab.h> includes this header, however, so we often need not include it directly.
Action modifiers-
| Flag | Description |
|---|---|
| __GFP_WAIT | The allocator can sleep. |
| __GFP_HIGH | The allocator can access emergency pools. |
| __GFP_IO | The allocator can start disk I/O. |
| __GFP_FS | The allocator can start filesystem I/O. |
| __GFP_COLD | The allocator should use cache cold pages. |
| __GFP_NOWARN | The allocator will not print failure warnings. |
| __GFP_REPEAT | The allocator will repeat the allocation if it fails, but the allocation can potentially fail. |
| __GFP_NOFAIL | The allocator will indefinitely repeat the allocation. The allocation cannot fail. |
| __GFP_NORETRY | The allocator will never retry if the allocation fails. |
| __GFP_NO_GROW | Used internally by the slab layer. |
| __GFP_COMP | Add compound page metadata. Used internally by the hugetlb code. |
- These modifiers can be specified together. For example: ptr = kmalloc(size, __GFP_WAIT | __GFP_IO | __GFP_FS);
- Let's see how this allocation works.
- It tells the page allocator (the call eventually reaches alloc_pages(), as we saw before) that the allocation can:
- block
- perform I/O
- perform filesystem operations, if needed.
- This allows the kernel great freedom in how it can find the free memory to satisfy the allocation.
Zone modifiers-
- Zone modifiers specify from which memory zone the allocation should originate.
- Normally, allocations can be fulfilled from any zone.
- The kernel prefers ZONE_NORMAL, however, to ensure that the other zones have free pages when they are needed.
- There are only two zone modifiers because there are only two zones other than ZONE_NORMAL (which is where, by default, allocations originate).
| Flag | Description |
|---|---|
| __GFP_DMA | Allocate only from ZONE_DMA |
| __GFP_HIGHMEM | Allocate from ZONE_HIGHMEM or ZONE_NORMAL |
- If neither flag is specified, the kernel fulfills the allocation from either ZONE_DMA or ZONE_NORMAL, with a strong preference for ZONE_NORMAL.
- We cannot specify __GFP_HIGHMEM to either __get_free_pages() or kmalloc(), because both return a logical address and not a page structure.
- With __GFP_HIGHMEM these functions could end up allocating memory that is not currently mapped in the kernel's virtual address space and thus has no logical address to return.
- Only alloc_pages() can allocate high memory.
Type Flags-
- The type flags specify the action and zone modifiers required for a particular type of allocation.
- The good news is that kernel code can therefore use the appropriate type flag instead of spelling out every individual modifier it needs.
- The type flags are GFP_ATOMIC, GFP_NOIO, GFP_NOFS, GFP_KERNEL, GFP_USER, GFP_HIGHUSER and GFP_DMA.
Which modifiers does each type flag combine internally?
| Type flag | Modifier flags |
|---|---|
| GFP_ATOMIC | __GFP_HIGH |
| GFP_NOIO | __GFP_WAIT |
| GFP_NOFS | (__GFP_WAIT \| __GFP_IO) |
| GFP_KERNEL | (__GFP_WAIT \| __GFP_IO \| __GFP_FS) |
| GFP_USER | (__GFP_WAIT \| __GFP_IO \| __GFP_FS) |
| GFP_HIGHUSER | (__GFP_WAIT \| __GFP_IO \| __GFP_FS \| __GFP_HIGHMEM) |
| GFP_DMA | __GFP_DMA |
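To see how a type flag is nothing more than a bitwise OR of modifiers, here is a small userspace sketch. The DEMO_ names and their bit values are made up for illustration; the real definitions live in <linux/gfp.h> and differ between kernel versions. Only the combinations follow the table above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only, not the kernel's real ones. */
#define DEMO_GFP_WAIT   (1u << 0)  /* may sleep */
#define DEMO_GFP_HIGH   (1u << 1)  /* may use emergency pools */
#define DEMO_GFP_IO     (1u << 2)  /* may start disk I/O */
#define DEMO_GFP_FS     (1u << 3)  /* may start filesystem I/O */

/* Type flags built from the modifiers, following the table above. */
#define DEMO_GFP_ATOMIC (DEMO_GFP_HIGH)
#define DEMO_GFP_NOIO   (DEMO_GFP_WAIT)
#define DEMO_GFP_NOFS   (DEMO_GFP_WAIT | DEMO_GFP_IO)
#define DEMO_GFP_KERNEL (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

static bool demo_may_sleep(unsigned int mask)
{
    return (mask & DEMO_GFP_WAIT) != 0;
}

static bool demo_may_do_io(unsigned int mask)
{
    return (mask & DEMO_GFP_IO) != 0;
}

static bool demo_may_use_fs(unsigned int mask)
{
    return (mask & DEMO_GFP_FS) != 0;
}

/* Models the rule that a mask which may sleep must not be used in atomic context. */
static void demo_check_context(unsigned int mask, bool atomic_context)
{
    assert(!(atomic_context && demo_may_sleep(mask)));
}
```

Calling demo_check_context(DEMO_GFP_KERNEL, true) would trip the assertion, which mirrors why GFP_KERNEL must not be used from an interrupt handler.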
Let's try to understand the important type flags.
GFP_KERNEL flag-
- The vast majority of allocations in the kernel use the GFP_KERNEL flag.
- The resulting allocation is a normal-priority allocation that might sleep.
- Because the call can block, this flag can be used only from process context that can safely reschedule (that is, no locks are held and so on).
- Because this flag does not make any stipulations as to how the kernel may obtain the requested memory, the memory allocation has a high probability of succeeding.
GFP_ATOMIC flag-
- The GFP_ATOMIC flag is at the opposite extreme from GFP_KERNEL.
- It specifies a memory allocation that cannot sleep, so the allocation is very restricted in how it can obtain memory for the caller.
- If no sufficiently sized contiguous chunk of memory is available, the kernel is not very likely to free memory because it cannot put the caller to sleep.
- Conversely, the GFP_KERNEL allocation can put the caller to sleep to swap inactive pages to disk, flush dirty pages to disk, and so on.
- Because GFP_ATOMIC is unable to perform any of these actions, it has less of a chance of succeeding (at least when memory is low) compared to GFP_KERNEL allocations.
- Still, the GFP_ATOMIC flag is the only option when the current code is unable to sleep, such as with interrupt handlers, softirqs, and tasklets.
GFP_NOIO and GFP_NOFS flags-
- In between GFP_KERNEL and GFP_ATOMIC are GFP_NOIO and GFP_NOFS.
- Allocations initiated with these flags might block, but they refrain from performing certain other operations.
- A GFP_NOIO allocation does not initiate any disk I/O whatsoever to fulfill the request.
- On the other hand, GFP_NOFS might initiate disk I/O, but does not initiate filesystem I/O.
- Why might we need these flags?
- They are needed for certain low-level block I/O or filesystem code, respectively.
- Imagine if a common path in the filesystem code allocated memory without the GFP_NOFS flag. The allocation could result in more filesystem operations, which would then beget other allocations and, thus, more filesystem operations! This could continue indefinitely.
- Code such as this, which the allocator itself may invoke, must ensure that its own allocations do not cause the allocator to call back into it, or else the allocation can create a deadlock.
- Not surprisingly, the kernel uses these two flags in only a few places.
GFP_DMA flag-
- The GFP_DMA flag specifies that the allocator must satisfy the request from ZONE_DMA.
- This flag is used by device drivers that need DMA-able memory for their devices. Normally, we combine it with GFP_ATOMIC or GFP_KERNEL.
Which flag to use when?
| Situation | Solution |
|---|---|
| Process context, can sleep | Use GFP_KERNEL |
| Process context, cannot sleep | Use GFP_ATOMIC, or perform your allocations with GFP_KERNEL at an earlier or later point when you can sleep |
| Interrupt handler | Use GFP_ATOMIC |
| Softirq | Use GFP_ATOMIC |
| Tasklet | Use GFP_ATOMIC |
| Need DMA-able memory, can sleep | Use (GFP_DMA \| GFP_KERNEL) |
| Need DMA-able memory, cannot sleep | Use (GFP_DMA \| GFP_ATOMIC) |
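The table can also be read as a small decision function. The sketch below is a userspace illustration only: the enum names and helpers are hypothetical, and the result is an enum label standing for the flag expression rather than a real gfp mask. The last case (DMA-able memory when we cannot sleep) is assumed to be GFP_DMA | GFP_ATOMIC, by analogy with the other rows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the situations listed in the table. */
enum demo_context {
    DEMO_PROCESS_CAN_SLEEP,
    DEMO_PROCESS_CANNOT_SLEEP,  /* for example, a spinlock is held */
    DEMO_INTERRUPT_HANDLER,
    DEMO_SOFTIRQ,
    DEMO_TASKLET
};

/* Labels standing for the flag expression we would pass to the allocator. */
enum demo_choice {
    DEMO_USE_GFP_KERNEL,        /* GFP_KERNEL */
    DEMO_USE_GFP_ATOMIC,        /* GFP_ATOMIC */
    DEMO_USE_GFP_DMA_KERNEL,    /* GFP_DMA | GFP_KERNEL */
    DEMO_USE_GFP_DMA_ATOMIC     /* GFP_DMA | GFP_ATOMIC (assumed) */
};

/* Only ordinary process context with no locks held may sleep. */
static bool demo_can_sleep(enum demo_context ctx)
{
    assert((int)ctx >= 0 && (int)ctx <= (int)DEMO_TASKLET);
    return ctx == DEMO_PROCESS_CAN_SLEEP;
}

/* Pick the flag combination for a context and a DMA requirement. */
static enum demo_choice demo_pick_gfp(enum demo_context ctx, bool need_dma)
{
    bool sleep_ok = demo_can_sleep(ctx);

    if (need_dma)
        return sleep_ok ? DEMO_USE_GFP_DMA_KERNEL : DEMO_USE_GFP_DMA_ATOMIC;
    return sleep_ok ? DEMO_USE_GFP_KERNEL : DEMO_USE_GFP_ATOMIC;
}
```

The point of the sketch is that only two questions matter: can the caller sleep, and does the memory have to come from ZONE_DMA.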
kfree()
- kfree undoes the work done by kmalloc().
- This function is declared in <linux/slab.h>.
- void kfree(const void *ptr).
- Use it only on blocks of memory that were previously allocated with kmalloc().
- For example:
    char *buf;
    buf = kmalloc(BUF_SIZE, GFP_ATOMIC);
    if (!buf)
        return -ENOMEM;  /* error allocating memory */
    /* ... use buf ... */
    kfree(buf);
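When a function needs more than one buffer, every failure path has to release what was already allocated. Below is a minimal sketch of the usual goto-based unwinding. It is written for userspace so that it can be compiled outside the kernel: demo_alloc() and demo_free() are hypothetical stand-ins built on malloc() and free(), and in kernel code they would be kmalloc(size, GFP_KERNEL) and kfree(ptr). The demo_fail_after variable is only a test hook to simulate a failing allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Test hook: number of allocations that succeed before failing; -1 = never fail. */
static int demo_fail_after = -1;

/* Stand-in for kmalloc(size, GFP_KERNEL). */
static void *demo_alloc(size_t size)
{
    if (demo_fail_after == 0)
        return NULL;
    if (demo_fail_after > 0)
        demo_fail_after--;
    return malloc(size);
}

/* Stand-in for kfree(ptr). */
static void demo_free(void *ptr)
{
    free(ptr);
}

/* Allocate two buffers; on any failure release whatever was already obtained. */
static int demo_setup(size_t a_len, size_t b_len)
{
    char *a, *b;
    int ret = -1;  /* kernel code would return -ENOMEM */

    /* malloc(0) may legitimately return NULL, so keep the demo unambiguous. */
    assert(a_len > 0 && b_len > 0);

    a = demo_alloc(a_len);
    if (!a)
        goto out;
    b = demo_alloc(b_len);
    if (!b)
        goto free_a;

    memset(a, 0, a_len);  /* ... use the buffers ... */
    memset(b, 0, b_len);
    ret = 0;

    demo_free(b);
free_a:
    demo_free(a);
out:
    return ret;
}
```

The labels are ordered so that each failure jumps to the point that frees exactly what has been allocated so far, and the success path simply falls through all of them.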
vmalloc()
- This kernel function is similar to the user-space function malloc().
- Both vmalloc() and malloc() return memory that is virtually contiguous but not necessarily physically contiguous.
- In the kernel we normally use kmalloc() and seldom use vmalloc().
- vmalloc() is used when the requested size is quite large, because kmalloc() may fail to find a large enough block of physically contiguous memory.
- The vmalloc() function is declared in <linux/vmalloc.h> and defined in mm/vmalloc.c.
- Usage is identical to user-space's malloc(): void * vmalloc(unsigned long size).
- vmalloc() also costs performance: it must set up page table entries to make the pages virtually contiguous, and because the pages are mapped individually it causes more TLB misses than directly mapped memory.
- To free an allocation obtained via vmalloc(), we use void vfree(void *addr).
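To picture what "virtually contiguous but not physically contiguous" means, here is a toy userspace model of a four-page vmalloc()-style region. Everything in it is an assumption made for illustration: the 4KB page size, the demo_ names, and the frame numbers, which are invented and deliberately scattered. It is not how the kernel stores its page tables.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096ul  /* assumption: 4 KB pages */
#define DEMO_NPAGES    4ul

/*
 * Toy page table for one region: virtual page i of the region is backed
 * by physical frame demo_frames[i]. The frame numbers are made up.
 */
static const unsigned long demo_frames[DEMO_NPAGES] = { 731, 12, 905, 44 };

/* Translate an offset inside the region to a pretend physical address. */
static unsigned long demo_virt_to_phys(unsigned long offset)
{
    unsigned long page = offset / DEMO_PAGE_SIZE;

    assert(page < DEMO_NPAGES);
    return demo_frames[page] * DEMO_PAGE_SIZE + offset % DEMO_PAGE_SIZE;
}
```

Offsets 4095 and 4096 are neighbours in the virtual region, yet they land in frames 731 and 12. Code that walks the buffer never notices, but hardware that needs physically contiguous memory would.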