In the Linux kernel, the following vulnerability has been resolved:
s390/fpu: Re-add exception handling in load_fpu_state()
With the recent rewrite of the fpu code exception handling for the
lfpc instruction within load_fpu_state() was erroneously removed.
Add it again to prevent that loading invalid floating point register
values cause an unhandled specification exception.
In the Linux kernel, the following vulnerability has been resolved:
drm/vmwgfx: Fix a deadlock in dma buf fence polling
Introduce a version of the fence ops that on release doesn't remove
the fence from the pending list, and thus doesn't require a lock to
fix poll->fence wait->fence unref deadlocks.
vmwgfx overwrites the wait callback to iterate over the list of all
fences and update their status, to do that it holds a lock to prevent
the list modifcations from other threads. The fence destroy callback
both deletes the fence and removes it from the list of pending
fences, for which it holds a lock.
dma buf polling cb unrefs a fence after it's been signaled: so the poll
calls the wait, which signals the fences, which are being destroyed.
The destruction tries to acquire the lock on the pending fences list
which it can never get because it's held by the wait from which it
was called.
Old bug, but not a lot of userspace apps were using dma-buf polling
interfaces. Fix those, in particular this fixes KDE stalls/deadlock.
In the Linux kernel, the following vulnerability has been resolved:
net: usb: qmi_wwan: fix memory leak for not ip packets
Free the unused skb when not ip packets arrive.
Access control for plugin data sources protected by the ReqActions json field of the plugin.json is bypassed if the user or service account is granted associated access to any other data source, as the ReqActions check was not scoped to each specific datasource. The account must have prior query access to the impacted datasource.
Module savepoints could be abused to inject references to malicious code delivered through the same domain. Attackers could perform malicious API requests or extract information from the users account. Exploiting this vulnerability requires temporary access to an account or successful social engineering to make a user follow a prepared link to a malicious account. Please deploy the provided updates and patch releases. The savepoint module path has been restricted to modules that provide the feature, excluding any arbitrary or non-existing modules. No publicly available exploits are known.
In the Linux kernel, the following vulnerability has been resolved:
remoteproc: imx_rproc: Skip over memory region when node value is NULL
In imx_rproc_addr_init() "nph = of_count_phandle_with_args()" just counts
number of phandles. But phandles may be empty. So of_parse_phandle() in
the parsing loop (0 < a < nph) may return NULL which is later dereferenced.
Adjust this issue by adding NULL-return check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
[Fixed title to fit within the prescribed 70-75 charcters]
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to truncate preallocated blocks in f2fs_file_open()
chenyuwen reports a f2fs bug as below:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000011
fscrypt_set_bio_crypt_ctx+0x78/0x1e8
f2fs_grab_read_bio+0x78/0x208
f2fs_submit_page_read+0x44/0x154
f2fs_get_read_data_page+0x288/0x5f4
f2fs_get_lock_data_page+0x60/0x190
truncate_partial_data_page+0x108/0x4fc
f2fs_do_truncate_blocks+0x344/0x5f0
f2fs_truncate_blocks+0x6c/0x134
f2fs_truncate+0xd8/0x200
f2fs_iget+0x20c/0x5ac
do_garbage_collect+0x5d0/0xf6c
f2fs_gc+0x22c/0x6a4
f2fs_disable_checkpoint+0xc8/0x310
f2fs_fill_super+0x14bc/0x1764
mount_bdev+0x1b4/0x21c
f2fs_mount+0x20/0x30
legacy_get_tree+0x50/0xbc
vfs_get_tree+0x5c/0x1b0
do_new_mount+0x298/0x4cc
path_mount+0x33c/0x5fc
__arm64_sys_mount+0xcc/0x15c
invoke_syscall+0x60/0x150
el0_svc_common+0xb8/0xf8
do_el0_svc+0x28/0xa0
el0_svc+0x24/0x84
el0t_64_sync_handler+0x88/0xec
It is because inode.i_crypt_info is not initialized during below path:
- mount
- f2fs_fill_super
- f2fs_disable_checkpoint
- f2fs_gc
- f2fs_iget
- f2fs_truncate
So, let's relocate truncation of preallocated blocks to f2fs_file_open(),
after fscrypt_file_open().
In the Linux kernel, the following vulnerability has been resolved:
dma: fix call order in dmam_free_coherent
dmam_free_coherent() frees a DMA allocation, which makes the
freed vaddr available for reuse, then calls devres_destroy()
to remove and free the data structure used to track the DMA
allocation. Between the two calls, it is possible for a
concurrent task to make an allocation with the same vaddr
and add it to the devres list.
If this happens, there will be two entries in the devres list
with the same vaddr and devres_destroy() can free the wrong
entry, triggering the WARN_ON() in dmam_match.
Fix by destroying the devres entry before freeing the DMA
allocation.
kokonut //net/encryption
http://sponge2/b9145fe6-0f72-4325-ac2f-a84d81075b03
In the Linux kernel, the following vulnerability has been resolved:
md: fix deadlock between mddev_suspend and flush bio
Deadlock occurs when mddev is being suspended while some flush bio is in
progress. It is a complex issue.
T1. the first flush is at the ending stage, it clears 'mddev->flush_bio'
and tries to submit data, but is blocked because mddev is suspended
by T4.
T2. the second flush sets 'mddev->flush_bio', and attempts to queue
md_submit_flush_data(), which is already running (T1) and won't
execute again if on the same CPU as T1.
T3. the third flush inc active_io and tries to flush, but is blocked because
'mddev->flush_bio' is not NULL (set by T2).
T4. mddev_suspend() is called and waits for active_io dec to 0 which is inc
by T3.
T1 T2 T3 T4
(flush 1) (flush 2) (third 3) (suspend)
md_submit_flush_data
mddev->flush_bio = NULL;
.
. md_flush_request
. mddev->flush_bio = bio
. queue submit_flushes
. .
. . md_handle_request
. . active_io + 1
. . md_flush_request
. . wait !mddev->flush_bio
. .
. . mddev_suspend
. . wait !active_io
. .
. submit_flushes
. queue_work md_submit_flush_data
. //md_submit_flush_data is already running (T1)
.
md_handle_request
wait resume
The root issue is non-atomic inc/dec of active_io during flush process.
active_io is dec before md_submit_flush_data is queued, and inc soon
after md_submit_flush_data() run.
md_flush_request
active_io + 1
submit_flushes
active_io - 1
md_submit_flush_data
md_handle_request
active_io + 1
make_request
active_io - 1
If active_io is dec after md_handle_request() instead of within
submit_flushes(), make_request() can be called directly intead of
md_handle_request() in md_submit_flush_data(), and active_io will
only inc and dec once in the whole flush process. Deadlock will be
fixed.
Additionally, the only difference between fixing the issue and before is
that there is no return error handling of make_request(). But after
previous patch cleaned md_write_start(), make_requst() only return error
in raid5_make_request() by dm-raid, see commit 41425f96d7aa ("dm-raid456,
md/raid456: fix a deadlock for dm-raid456 while io concurrent with
reshape)". Since dm always splits data and flush operation into two
separate io, io size of flush submitted by dm always is 0, make_request()
will not be called in md_submit_flush_data(). To prevent future
modifications from introducing issues, add WARN_ON to ensure
make_request() no error is returned in this context.
In the Linux kernel, the following vulnerability has been resolved:
block: initialize integrity buffer to zero before writing it to media
Metadata added by bio_integrity_prep is using plain kmalloc, which leads
to random kernel memory being written media. For PI metadata this is
limited to the app tag that isn't used by kernel generated metadata,
but for non-PI metadata the entire buffer leaks kernel memory.
Fix this by adding the __GFP_ZERO flag to allocations for writes.
In the Linux kernel, the following vulnerability has been resolved:
cgroup/cpuset: Prevent UAF in proc_cpuset_show()
An UAF can happen when /proc/cpuset is read as reported in [1].
This can be reproduced by the following methods:
1.add an mdelay(1000) before acquiring the cgroup_lock In the
cgroup_path_ns function.
2.$cat /proc/<pid>/cpuset repeatly.
3.$mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/
$umount /sys/fs/cgroup/cpuset/ repeatly.
The race that cause this bug can be shown as below:
(umount) | (cat /proc/<pid>/cpuset)
css_release | proc_cpuset_show
css_release_work_fn | css = task_get_css(tsk, cpuset_cgrp_id);
css_free_rwork_fn | cgroup_path_ns(css->cgroup, ...);
cgroup_destroy_root | mutex_lock(&cgroup_mutex);
rebind_subsystems |
cgroup_free_root |
| // cgrp was freed, UAF
| cgroup_path_ns_locked(cgrp,..);
When the cpuset is initialized, the root node top_cpuset.css.cgrp
will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will
allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated
&cgroup_root.cgrp. When the umount operation is executed,
top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp.
The problem is that when rebinding to cgrp_dfl_root, there are cases
where the cgroup_root allocated by setting up the root for cgroup v1
is cached. This could lead to a Use-After-Free (UAF) if it is
subsequently freed. The descendant cgroups of cgroup v1 can only be
freed after the css is released. However, the css of the root will never
be released, yet the cgroup_root should be freed when it is unmounted.
This means that obtaining a reference to the css of the root does
not guarantee that css.cgrp->root will not be freed.
Fix this problem by using rcu_read_lock in proc_cpuset_show().
As cgroup_root is kfree_rcu after commit d23b5c577715
("cgroup: Make operations on the cgroup root_list RCU safe"),
css->cgroup won't be freed during the critical section.
To call cgroup_path_ns_locked, css_set_lock is needed, so it is safe to
replace task_get_css with task_css.
[1] https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd
In the Linux kernel, the following vulnerability has been resolved:
soc: xilinx: rename cpu_number1 to dummy_cpu_number
The per cpu variable cpu_number1 is passed to xlnx_event_handler as
argument "dev_id", but it is not used in this function. So drop the
initialization of this variable and rename it to dummy_cpu_number.
This patch is to fix the following call trace when the kernel option
CONFIG_DEBUG_ATOMIC_SLEEP is enabled:
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0 #53
Hardware name: Xilinx Versal vmk180 Eval board rev1.1 (QSPI) (DT)
Call trace:
dump_backtrace+0xd0/0xe0
show_stack+0x18/0x40
dump_stack_lvl+0x7c/0xa0
dump_stack+0x18/0x34
__might_resched+0x10c/0x140
__might_sleep+0x4c/0xa0
__kmem_cache_alloc_node+0xf4/0x168
kmalloc_trace+0x28/0x38
__request_percpu_irq+0x74/0x138
xlnx_event_manager_probe+0xf8/0x298
platform_probe+0x68/0xd8
In the Linux kernel, the following vulnerability has been resolved:
soc: qcom: pdr: protect locator_addr with the main mutex
If the service locator server is restarted fast enough, the PDR can
rewrite locator_addr fields concurrently. Protect them by placing
modification of those fields under the main pdr->lock.
In the Linux kernel, the following vulnerability has been resolved:
lib: objagg: Fix general protection fault
The library supports aggregation of objects into other objects only if
the parent object does not have a parent itself. That is, nesting is not
supported.
Aggregation happens in two cases: Without and with hints, where hints
are a pre-computed recommendation on how to aggregate the provided
objects.
Nesting is not possible in the first case due to a check that prevents
it, but in the second case there is no check because the assumption is
that nesting cannot happen when creating objects based on hints. The
violation of this assumption leads to various warnings and eventually to
a general protection fault [1].
Before fixing the root cause, error out when nesting happens and warn.
[1]
general protection fault, probably for non-canonical address 0xdead000000000d90: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1083 Comm: kworker/1:9 Tainted: G W 6.9.0-rc6-custom-gd9b4f1cca7fb #7
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_sp_acl_erp_bf_insert+0x25/0x80
[...]
Call Trace:
<TASK>
mlxsw_sp_acl_atcam_entry_add+0x256/0x3c0
mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
process_one_work+0x151/0x370
worker_thread+0x2cb/0x3e0
kthread+0xd0/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
In the Linux kernel, the following vulnerability has been resolved:
bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG
When BPF_TRAMP_F_CALL_ORIG is set, the trampoline calls
__bpf_tramp_enter() and __bpf_tramp_exit() functions, passing them
the struct bpf_tramp_image *im pointer as an argument in R0.
The trampoline generation code uses emit_addr_mov_i64() to emit
instructions for moving the bpf_tramp_image address into R0, but
emit_addr_mov_i64() assumes the address to be in the vmalloc() space
and uses only 48 bits. Because bpf_tramp_image is allocated using
kzalloc(), its address can use more than 48-bits, in this case the
trampoline will pass an invalid address to __bpf_tramp_enter/exit()
causing a kernel crash.
Fix this by using emit_a64_mov_i64() in place of emit_addr_mov_i64()
as it can work with addresses that are greater than 48-bits.
In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
When loading a EXT program without specifying `attr->attach_prog_fd`,
the `prog->aux->dst_prog` will be null. At this time, calling
resolve_prog_type() anywhere will result in a null pointer dereference.
Example stack trace:
[ 8.107863] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
[ 8.108262] Mem abort info:
[ 8.108384] ESR = 0x0000000096000004
[ 8.108547] EC = 0x25: DABT (current EL), IL = 32 bits
[ 8.108722] SET = 0, FnV = 0
[ 8.108827] EA = 0, S1PTW = 0
[ 8.108939] FSC = 0x04: level 0 translation fault
[ 8.109102] Data abort info:
[ 8.109203] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 8.109399] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 8.109614] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 8.109836] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101354000
[ 8.110011] [0000000000000004] pgd=0000000000000000, p4d=0000000000000000
[ 8.112624] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 8.112783] Modules linked in:
[ 8.113120] CPU: 0 PID: 99 Comm: may_access_dire Not tainted 6.10.0-rc3-next-20240613-dirty #1
[ 8.113230] Hardware name: linux,dummy-virt (DT)
[ 8.113390] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 8.113429] pc : may_access_direct_pkt_data+0x24/0xa0
[ 8.113746] lr : add_subprog_and_kfunc+0x634/0x8e8
[ 8.113798] sp : ffff80008283b9f0
[ 8.113813] x29: ffff80008283b9f0 x28: ffff800082795048 x27: 0000000000000001
[ 8.113881] x26: ffff0000c0bb2600 x25: 0000000000000000 x24: 0000000000000000
[ 8.113897] x23: ffff0000c1134000 x22: 000000000001864f x21: ffff0000c1138000
[ 8.113912] x20: 0000000000000001 x19: ffff0000c12b8000 x18: ffffffffffffffff
[ 8.113929] x17: 0000000000000000 x16: 0000000000000000 x15: 0720072007200720
[ 8.113944] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
[ 8.113958] x11: 0720072007200720 x10: 0000000000f9fca4 x9 : ffff80008021f4e4
[ 8.113991] x8 : 0101010101010101 x7 : 746f72705f6d656d x6 : 000000001e0e0f5f
[ 8.114006] x5 : 000000000001864f x4 : ffff0000c12b8000 x3 : 000000000000001c
[ 8.114020] x2 : 0000000000000002 x1 : 0000000000000000 x0 : 0000000000000000
[ 8.114126] Call trace:
[ 8.114159] may_access_direct_pkt_data+0x24/0xa0
[ 8.114202] bpf_check+0x3bc/0x28c0
[ 8.114214] bpf_prog_load+0x658/0xa58
[ 8.114227] __sys_bpf+0xc50/0x2250
[ 8.114240] __arm64_sys_bpf+0x28/0x40
[ 8.114254] invoke_syscall.constprop.0+0x54/0xf0
[ 8.114273] do_el0_svc+0x4c/0xd8
[ 8.114289] el0_svc+0x3c/0x140
[ 8.114305] el0t_64_sync_handler+0x134/0x150
[ 8.114331] el0t_64_sync+0x168/0x170
[ 8.114477] Code: 7100707f 54000081 f9401c00 f9403800 (b9400403)
[ 8.118672] ---[ end trace 0000000000000000 ]---
One way to fix it is by forcing `attach_prog_fd` non-empty when
bpf_prog_load(). But this will lead to `libbpf_probe_bpf_prog_type`
API broken which use verifier log to probe prog type and will log
nothing if we reject invalid EXT prog before bpf_check().
Another way is by adding null check in resolve_prog_type().
The issue was introduced by commit 4a9c7bbe2ed4 ("bpf: Resolve to
prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT") which wanted
to correct type resolution for BPF_PROG_TYPE_TRACING programs. Before
that, the type resolution of BPF_PROG_TYPE_EXT prog actually follows
the logic below:
prog->aux->dst_prog ? prog->aux->dst_prog->type : prog->type;
It implies that when EXT program is not yet attached to `dst_prog`,
the prog type should be EXT itself. This code worked fine in the past.
So just keep using it.
Fix this by returning `prog->type` for BPF_PROG_TYPE_EXT if `dst_prog`
is not present in resolve_prog_type().
In the Linux kernel, the following vulnerability has been resolved:
virtio_net: Fix napi_skb_cache_put warning
After the commit bdacf3e34945 ("net: Use nested-BH locking for
napi_alloc_cache.") was merged, the following warning began to appear:
WARNING: CPU: 5 PID: 1 at net/core/skbuff.c:1451 napi_skb_cache_put+0x82/0x4b0
__warn+0x12f/0x340
napi_skb_cache_put+0x82/0x4b0
napi_skb_cache_put+0x82/0x4b0
report_bug+0x165/0x370
handle_bug+0x3d/0x80
exc_invalid_op+0x1a/0x50
asm_exc_invalid_op+0x1a/0x20
__free_old_xmit+0x1c8/0x510
napi_skb_cache_put+0x82/0x4b0
__free_old_xmit+0x1c8/0x510
__free_old_xmit+0x1c8/0x510
__pfx___free_old_xmit+0x10/0x10
The issue arises because virtio is assuming it's running in NAPI context
even when it's not, such as in the netpoll case.
To resolve this, modify virtnet_poll_tx() to only set NAPI when budget
is available. Same for virtnet_poll_cleantx(), which always assumed that
it was in a NAPI context.
In the Linux kernel, the following vulnerability has been resolved:
xdp: fix invalid wait context of page_pool_destroy()
If the driver uses a page pool, it creates a page pool with
page_pool_create().
The reference count of page pool is 1 as default.
A page pool will be destroyed only when a reference count reaches 0.
page_pool_destroy() is used to destroy page pool, it decreases a
reference count.
When a page pool is destroyed, ->disconnect() is called, which is
mem_allocator_disconnect().
This function internally acquires mutex_lock().
If the driver uses XDP, it registers a memory model with
xdp_rxq_info_reg_mem_model().
The xdp_rxq_info_reg_mem_model() internally increases a page pool
reference count if a memory model is a page pool.
Now the reference count is 2.
To destroy a page pool, the driver should call both page_pool_destroy()
and xdp_unreg_mem_model().
The xdp_unreg_mem_model() internally calls page_pool_destroy().
Only page_pool_destroy() decreases a reference count.
If a driver calls page_pool_destroy() then xdp_unreg_mem_model(), we
will face an invalid wait context warning.
Because xdp_unreg_mem_model() calls page_pool_destroy() with
rcu_read_lock().
The page_pool_destroy() internally acquires mutex_lock().
Splat looks like:
=============================
[ BUG: Invalid wait context ]
6.10.0-rc6+ #4 Tainted: G W
-----------------------------
ethtool/1806 is trying to lock:
ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150
other info that might help us debug this:
context-{5:5}
3 locks held by ethtool/1806:
stack backtrace:
CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
__lock_acquire+0x1681/0x4de0
? _printk+0x64/0xe0
? __pfx_mark_lock.part.0+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
lock_acquire+0x1b3/0x580
? mem_allocator_disconnect+0x73/0x150
? __wake_up_klogd.part.0+0x16/0xc0
? __pfx_lock_acquire+0x10/0x10
? dump_stack_lvl+0x91/0xc0
__mutex_lock+0x15c/0x1690
? mem_allocator_disconnect+0x73/0x150
? __pfx_prb_read_valid+0x10/0x10
? mem_allocator_disconnect+0x73/0x150
? __pfx_llist_add_batch+0x10/0x10
? console_unlock+0x193/0x1b0
? lockdep_hardirqs_on+0xbe/0x140
? __pfx___mutex_lock+0x10/0x10
? tick_nohz_tick_stopped+0x16/0x90
? __irq_work_queue_local+0x1e5/0x330
? irq_work_queue+0x39/0x50
? __wake_up_klogd.part.0+0x79/0xc0
? mem_allocator_disconnect+0x73/0x150
mem_allocator_disconnect+0x73/0x150
? __pfx_mem_allocator_disconnect+0x10/0x10
? mark_held_locks+0xa5/0xf0
? rcu_is_watching+0x11/0xb0
page_pool_release+0x36e/0x6d0
page_pool_destroy+0xd7/0x440
xdp_unreg_mem_model+0x1a7/0x2a0
? __pfx_xdp_unreg_mem_model+0x10/0x10
? kfree+0x125/0x370
? bnxt_free_ring.isra.0+0x2eb/0x500
? bnxt_free_mem+0x5ac/0x2500
xdp_rxq_info_unreg+0x4a/0xd0
bnxt_free_mem+0x1356/0x2500
bnxt_close_nic+0xf0/0x3b0
? __pfx_bnxt_close_nic+0x10/0x10
? ethnl_parse_bit+0x2c6/0x6d0
? __pfx___nla_validate_parse+0x10/0x10
? __pfx_ethnl_parse_bit+0x10/0x10
bnxt_set_features+0x2a8/0x3e0
__netdev_update_features+0x4dc/0x1370
? ethnl_parse_bitset+0x4ff/0x750
? __pfx_ethnl_parse_bitset+0x10/0x10
? __pfx___netdev_update_features+0x10/0x10
? mark_held_locks+0xa5/0xf0
? _raw_spin_unlock_irqrestore+0x42/0x70
? __pm_runtime_resume+0x7d/0x110
ethnl_set_features+0x32d/0xa20
To fix this problem, it uses rhashtable_lookup_fast() instead of
rhashtable_lookup() with rcu_read_lock().
Using xa without rcu_read_lock() here is safe.
xa is freed by __xdp_mem_allocator_rcu_free() and this is called by
call_rcu() of mem_xa_remove().
The mem_xa_remove() is called by page_pool_destroy() if a reference
count reaches 0.
The xa is already protected by the reference count mechanism well in the
control plane.
So removing rcu_read_lock() for page_pool_destroy() is safe.
In the Linux kernel, the following vulnerability has been resolved:
media: v4l: async: Fix NULL pointer dereference in adding ancillary links
In v4l2_async_create_ancillary_links(), ancillary links are created for
lens and flash sub-devices. These are sub-device to sub-device links and
if the async notifier is related to a V4L2 device, the source sub-device
of the ancillary link is NULL, leading to a NULL pointer dereference.
Check the notifier's sd field is non-NULL in
v4l2_async_create_ancillary_links().
[Sakari Ailus: Reword the subject and commit messages slightly.]
In the Linux kernel, the following vulnerability has been resolved:
s390/uv: Don't call folio_wait_writeback() without a folio reference
folio_wait_writeback() requires that no spinlocks are held and that
a folio reference is held, as documented. After we dropped the PTL, the
folio could get freed concurrently. So grab a temporary reference.
In the Linux kernel, the following vulnerability has been resolved:
media: mediatek: vcodec: Handle invalid decoder vsi
Handle an invalid decoder vsi in vpu_dec_init to ensure the decoder vsi
is valid for future use.
In the Linux kernel, the following vulnerability has been resolved:
drm/qxl: Add check for drm_cvt_mode
Add check for the return value of drm_cvt_mode() and return the error if
it fails in order to avoid NULL pointer dereference.
In the Linux kernel, the following vulnerability has been resolved:
ext4: fix infinite loop when replaying fast_commit
When doing fast_commit replay an infinite loop may occur due to an
uninitialized extent_status struct. ext4_ext_determine_insert_hole() does
not detect the replay and calls ext4_es_find_extent_range(), which will
return immediately without initializing the 'es' variable.
Because 'es' contains garbage, an integer overflow may happen causing an
infinite loop in this function, easily reproducible using fstest generic/039.
This commit fixes this issue by unconditionally initializing the structure
in function ext4_es_find_extent_range().
Thanks to Zhang Yi, for figuring out the real problem!
In the Linux kernel, the following vulnerability has been resolved:
PCI: keystone: Fix NULL pointer dereference in case of DT error in ks_pcie_setup_rc_app_regs()
If IORESOURCE_MEM is not provided in Device Tree due to
any error, resource_list_first_type() will return NULL and
pci_parse_request_of_pci_ranges() will just emit a warning.
This will cause a NULL pointer dereference. Fix this bug by adding NULL
return check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
In the Linux kernel, the following vulnerability has been resolved:
ASoC: amd: Adjust error handling in case of absent codec device
acpi_get_first_physical_node() can return NULL in several cases (no such
device, ACPI table error, reference count drop to 0, etc).
Existing check just emit error message, but doesn't perform return.
Then this NULL pointer is passed to devm_acpi_dev_add_driver_gpios()
where it is dereferenced.
Adjust this error handling by adding error code return.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
In the Linux kernel, the following vulnerability has been resolved:
ipvs: properly dereference pe in ip_vs_add_service
Use pe directly to resolve sparse warning:
net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression
In the Linux kernel, the following vulnerability has been resolved:
s390/dasd: fix error checks in dasd_copy_pair_store()
dasd_add_busid() can return an error via ERR_PTR() if an allocation
fails. However, two callsites in dasd_copy_pair_store() do not check
the result, potentially resulting in a NULL pointer dereference. Fix
this by checking the result with IS_ERR() and returning the error up
the stack.
In the Linux kernel, the following vulnerability has been resolved:
mailbox: mtk-cmdq: Move devm_mbox_controller_register() after devm_pm_runtime_enable()
When mtk-cmdq unbinds, a WARN_ON message with condition
pm_runtime_get_sync() < 0 occurs.
According to the call tracei below:
cmdq_mbox_shutdown
mbox_free_channel
mbox_controller_unregister
__devm_mbox_controller_unregister
...
The root cause can be deduced to be calling pm_runtime_get_sync() after
calling pm_runtime_disable() as observed below:
1. CMDQ driver uses devm_mbox_controller_register() in cmdq_probe()
to bind the cmdq device to the mbox_controller, so
devm_mbox_controller_unregister() will automatically unregister
the device bound to the mailbox controller when the device-managed
resource is removed. That means devm_mbox_controller_unregister()
and cmdq_mbox_shoutdown() will be called after cmdq_remove().
2. CMDQ driver also uses devm_pm_runtime_enable() in cmdq_probe() after
devm_mbox_controller_register(), so that devm_pm_runtime_disable()
will be called after cmdq_remove(), but before
devm_mbox_controller_unregister().
To fix this problem, cmdq_probe() needs to move
devm_mbox_controller_register() after devm_pm_runtime_enable() to make
devm_pm_runtime_disable() be called after
devm_mbox_controller_unregister().
In the Linux kernel, the following vulnerability has been resolved:
landlock: Don't lose track of restrictions on cred_transfer
When a process' cred struct is replaced, this _almost_ always invokes
the cred_prepare LSM hook; but in one special case (when
KEYCTL_SESSION_TO_PARENT updates the parent's credentials), the
cred_transfer LSM hook is used instead. Landlock only implements the
cred_prepare hook, not cred_transfer, so KEYCTL_SESSION_TO_PARENT causes
all information on Landlock restrictions to be lost.
This basically means that a process with the ability to use the fork()
and keyctl() syscalls can get rid of all Landlock restrictions on
itself.
Fix it by adding a cred_transfer hook that does the same thing as the
existing cred_prepare hook. (Implemented by having hook_cred_prepare()
call hook_cred_transfer() so that the two functions are less likely to
accidentally diverge in the future.)
In the Linux kernel, the following vulnerability has been resolved:
mm/mglru: fix div-by-zero in vmpressure_calc_level()
evict_folios() uses a second pass to reclaim folios that have gone through
page writeback and become clean before it finishes the first pass, since
folio_rotate_reclaimable() cannot handle those folios due to the
isolation.
The second pass tries to avoid potential double counting by deducting
scan_control->nr_scanned. However, this can result in underflow of
nr_scanned, under a condition where shrink_folio_list() does not increment
nr_scanned, i.e., when folio_trylock() fails.
The underflow can cause the divisor, i.e., scale=scanned+reclaimed in
vmpressure_calc_level(), to become zero, resulting in the following crash:
[exception RIP: vmpressure_work_fn+101]
process_one_work at ffffffffa3313f2b
Since scan_control->nr_scanned has no established semantics, the potential
double counting has minimal risks. Therefore, fix the problem by not
deducting scan_control->nr_scanned in evict_folios().
In the Linux kernel, the following vulnerability has been resolved:
exfat: fix potential deadlock on __exfat_get_dentry_set
When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array
is allocated in __exfat_get_entry_set. The problem is that the bh-array is
allocated with GFP_KERNEL. It does not make sense. In the following cases,
a deadlock for sbi->s_lock between the two processes may occur.
CPU0 CPU1
---- ----
kswapd
balance_pgdat
lock(fs_reclaim)
exfat_iterate
lock(&sbi->s_lock)
exfat_readdir
exfat_get_uniname_from_ext_entry
exfat_get_dentry_set
__exfat_get_dentry_set
kmalloc_array
...
lock(fs_reclaim)
...
evict
exfat_evict_inode
lock(&sbi->s_lock)
To fix this, let's allocate bh-array with GFP_NOFS.
In the Linux kernel, the following vulnerability has been resolved:
sysctl: always initialize i_uid/i_gid
Always initialize i_uid/i_gid inside the sysfs core so set_ownership()
can safely skip setting them.
Commit 5ec27ec735ba ("fs/proc/proc_sysctl.c: fix the default values of
i_uid/i_gid on /proc/sys inodes.") added defaults for i_uid/i_gid when
set_ownership() was not implemented. It also missed adjusting
net_ctl_set_ownership() to use the same default values in case the
computation of a better value failed.
In the Linux kernel, the following vulnerability has been resolved:
drm/gma500: fix null pointer dereference in cdv_intel_lvds_get_modes
In cdv_intel_lvds_get_modes(), the return value of drm_mode_duplicate()
is assigned to mode, which will lead to a NULL pointer dereference on
failure of drm_mode_duplicate(). Add a check to avoid npd.
In the Linux kernel, the following vulnerability has been resolved:
drm/gma500: fix null pointer dereference in psb_intel_lvds_get_modes
In psb_intel_lvds_get_modes(), the return value of drm_mode_duplicate() is
assigned to mode, which will lead to a possible NULL pointer dereference
on failure of drm_mode_duplicate(). Add a check to avoid npd.
In the Linux kernel, the following vulnerability has been resolved:
cifs: fix potential null pointer use in destroy_workqueue in init_cifs error path
Dan Carpenter reported a Smack static checker warning:
fs/smb/client/cifsfs.c:1981 init_cifs()
error: we previously assumed 'serverclose_wq' could be null (see line 1895)
The patch which introduced the serverclose workqueue used the wrong
oredering in error paths in init_cifs() for freeing it on errors.
In the Linux kernel, the following vulnerability has been resolved:
udf: Avoid using corrupted block bitmap buffer
When the filesystem block bitmap is corrupted, we detect the corruption
while loading the bitmap and fail the allocation with error. However the
next allocation from the same bitmap will notice the bitmap buffer is
already loaded and tries to allocate from the bitmap with mixed results
(depending on the exact nature of the bitmap corruption). Fix the
problem by using BH_verified bit to indicate whether the bitmap is valid
or not.
In the Linux kernel, the following vulnerability has been resolved:
ext4: check dot and dotdot of dx_root before making dir indexed
Syzbot reports a issue as follows:
============================================
BUG: unable to handle page fault for address: ffffed11022e24fe
PGD 23ffee067 P4D 23ffee067 PUD 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 PID: 5079 Comm: syz-executor306 Not tainted 6.10.0-rc5-g55027e689933 #0
Call Trace:
<TASK>
make_indexed_dir+0xdaf/0x13c0 fs/ext4/namei.c:2341
ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2451
ext4_rename fs/ext4/namei.c:3936 [inline]
ext4_rename2+0x26e5/0x4370 fs/ext4/namei.c:4214
[...]
============================================
The immediate cause of this problem is that there is only one valid dentry
for the block to be split during do_split, so split==0 results in out of
bounds accesses to the map triggering the issue.
do_split
unsigned split
dx_make_map
count = 1
split = count/2 = 0;
continued = hash2 == map[split - 1].hash;
---> map[4294967295]
The maximum length of a filename is 255 and the minimum block size is 1024,
so it is always guaranteed that the number of entries is greater than or
equal to 2 when do_split() is called.
But syzbot's crafted image has no dot and dotdot in dir, and the dentry
distribution in dirblock is as follows:
bus dentry1 hole dentry2 free
|xx--|xx-------------|...............|xx-------------|...............|
0 12 (8+248)=256 268 256 524 (8+256)=264 788 236 1024
So when renaming dentry1 increases its name_len length by 1, neither hole
nor free is sufficient to hold the new dentry, and make_indexed_dir() is
called.
In make_indexed_dir() it is assumed that the first two entries of the
dirblock must be dot and dotdot, so bus and dentry1 are left in dx_root
because they are treated as dot and dotdot, and only dentry2 is moved
to the new leaf block. That's why count is equal to 1.
Therefore add the ext4_check_dx_root() helper function to add more sanity
checks to dot and dotdot before starting the conversion to avoid the above
issue.
In the Linux kernel, the following vulnerability has been resolved:
ext4: make sure the first directory block is not a hole
The syzbot constructs a directory that has no dirblock but is non-inline,
i.e. the first directory block is a hole. And no errors are reported when
creating files in this directory in the following flow.
ext4_mknod
...
ext4_add_entry
// Read block 0
ext4_read_dirblock(dir, block, DIRENT)
bh = ext4_bread(NULL, inode, block, 0)
if (!bh && (type == INDEX || type == DIRENT_HTREE))
// The first directory block is a hole
// But type == DIRENT, so no error is reported.
After that, we get a directory block without '.' and '..' but with a valid
dentry. This may cause some code that relies on dot or dotdot (such as
make_indexed_dir()) to crash.
Therefore when ext4_read_dirblock() finds that the first directory block
is a hole report that the filesystem is corrupted and return an error to
avoid loading corrupted data from disk causing something bad.
In the Linux kernel, the following vulnerability has been resolved:
fs/ntfs3: Update log->page_{mask,bits} if log->page_size changed
If an NTFS file system is mounted to another system with different
PAGE_SIZE from the original system, log->page_size will change in
log_replay(), but log->page_{mask,bits} don't change correspondingly.
This will cause a panic because "u32 bytes = log->page_size - page_off"
will get a negative value in the later read_log_page().
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix return value of f2fs_convert_inline_inode()
If device is readonly, make f2fs_convert_inline_inode()
return EROFS instead of zero, otherwise it may trigger
panic during writeback of inline inode's dirty page as
below:
f2fs_write_single_data_page+0xbb6/0x1e90 fs/f2fs/data.c:2888
f2fs_write_cache_pages fs/f2fs/data.c:3187 [inline]
__f2fs_write_data_pages fs/f2fs/data.c:3342 [inline]
f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3369
do_writepages+0x359/0x870 mm/page-writeback.c:2634
filemap_fdatawrite_wbc+0x125/0x180 mm/filemap.c:397
__filemap_fdatawrite_range mm/filemap.c:430 [inline]
file_write_and_wait_range+0x1aa/0x290 mm/filemap.c:788
f2fs_do_sync_file+0x68a/0x1ae0 fs/f2fs/file.c:276
generic_write_sync include/linux/fs.h:2806 [inline]
f2fs_file_write_iter+0x7bd/0x24e0 fs/f2fs/file.c:4977
call_write_iter include/linux/fs.h:2114 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xa72/0xc90 fs/read_write.c:590
ksys_write+0x1a0/0x2c0 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: handle inconsistent state in nilfs_btnode_create_block()
Syzbot reported that a buffer state inconsistency was detected in
nilfs_btnode_create_block(), triggering a kernel bug.
It is not appropriate to treat this inconsistency as a bug; it can occur
if the argument block address (the buffer index of the newly created
block) is a virtual block number and has been reallocated due to
corruption of the bitmap used to manage its allocation state.
So, modify nilfs_btnode_create_block() and its callers to treat it as a
possible filesystem error, rather than triggering a kernel bug.
In the Linux kernel, the following vulnerability has been resolved:
ice: Add a per-VF limit on number of FDIR filters
While the iavf driver adds a s/w limit (128) on the number of FDIR
filters that the VF can request, a malicious VF driver can request more
than that and exhaust the resources for other VFs.
Add a similar limit in ice.
In the Linux kernel, the following vulnerability has been resolved:
irqchip/imx-irqsteer: Handle runtime power management correctly
The power domain is automatically activated from clk_prepare(). However, on
certain platforms like i.MX8QM and i.MX8QXP, the power-on handling invokes
sleeping functions, which triggers the 'scheduling while atomic' bug in the
context switch path during device probing:
BUG: scheduling while atomic: kworker/u13:1/48/0x00000002
Call trace:
__schedule_bug+0x54/0x6c
__schedule+0x7f0/0xa94
schedule+0x5c/0xc4
schedule_preempt_disabled+0x24/0x40
__mutex_lock.constprop.0+0x2c0/0x540
__mutex_lock_slowpath+0x14/0x20
mutex_lock+0x48/0x54
clk_prepare_lock+0x44/0xa0
clk_prepare+0x20/0x44
imx_irqsteer_resume+0x28/0xe0
pm_generic_runtime_resume+0x2c/0x44
__genpd_runtime_resume+0x30/0x80
genpd_runtime_resume+0xc8/0x2c0
__rpm_callback+0x48/0x1d8
rpm_callback+0x6c/0x78
rpm_resume+0x490/0x6b4
__pm_runtime_resume+0x50/0x94
irq_chip_pm_get+0x2c/0xa0
__irq_do_set_handler+0x178/0x24c
irq_set_chained_handler_and_data+0x60/0xa4
mxc_gpio_probe+0x160/0x4b0
Cure this by implementing the irq_bus_lock/sync_unlock() interrupt chip
callbacks and handle power management in them as they are invoked from
non-atomic context.
[ tglx: Rewrote change log, added Fixes tag ]
In the Linux kernel, the following vulnerability has been resolved:
scsi: qla2xxx: During vport delete send async logout explicitly
During vport delete, it is observed that during unload we hit a crash
because of stale entries in outstanding command array. For all these stale
I/O entries, eh_abort was issued and aborted (fast_fail_io = 2009h) but
I/Os could not complete while vport delete is in process of deleting.
BUG: kernel NULL pointer dereference, address: 000000000000001c
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
Workqueue: qla2xxx_wq qla_do_work [qla2xxx]
RIP: 0010:dma_direct_unmap_sg+0x51/0x1e0
RSP: 0018:ffffa1e1e150fc68 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000021 RCX: 0000000000000001
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff8ce208a7a0d0
RBP: ffff8ce208a7a0d0 R08: 0000000000000000 R09: ffff8ce378aac9c8
R10: ffff8ce378aac8a0 R11: ffffa1e1e150f9d8 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8ce378aac9c8 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8d217f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000001c CR3: 0000002089acc000 CR4: 0000000000350ee0
Call Trace:
<TASK>
qla2xxx_qpair_sp_free_dma+0x417/0x4e0
? qla2xxx_qpair_sp_compl+0x10d/0x1a0
? qla2x00_status_entry+0x768/0x2830
? newidle_balance+0x2f0/0x430
? dequeue_entity+0x100/0x3c0
? qla24xx_process_response_queue+0x6a1/0x19e0
? __schedule+0x2d5/0x1140
? qla_do_work+0x47/0x60
? process_one_work+0x267/0x440
? process_one_work+0x440/0x440
? worker_thread+0x2d/0x3d0
? process_one_work+0x440/0x440
? kthread+0x156/0x180
? set_kthread_struct+0x50/0x50
? ret_from_fork+0x22/0x30
</TASK>
Send out async logout explicitly for all the ports during vport delete.
In the Linux kernel, the following vulnerability has been resolved:
scsi: qla2xxx: Fix for possible memory corruption
Init Control Block is dereferenced incorrectly. Correctly dereference ICB