In the Linux kernel, the following vulnerability has been resolved:
tty: goldfish: Use tty_port_destroy() to destroy port
In goldfish_tty_probe(), the port initialized through tty_port_init()
should be destroyed in error paths.In goldfish_tty_remove(), qtty->port
also should be destroyed or else might leak resources.
Fix the above by calling tty_port_destroy().
In the Linux kernel, the following vulnerability has been resolved:
usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
The list_for_each_entry_safe() macro saves the current item (n) and
the item after (n+1), so that n can be safely removed without
corrupting the list. However, when traversing the list and removing
items using gadget giveback, the DWC3 lock is briefly released,
allowing other routines to execute. There is a situation where, while
items are being removed from the cancelled_list using
dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
routine is running in parallel (due to UDC unbind). As the cleanup
routine removes n, and the pullup disable removes n+1, once the
cleanup retakes the DWC3 lock, it references a request who was already
removed/handled. With list debug enabled, this leads to a panic.
Ensure all instances of the macro are replaced where gadget giveback
is used.
Example call stack:
Thread#1:
__dwc3_gadget_ep_set_halt() - CLEAR HALT
-> dwc3_gadget_ep_cleanup_cancelled_requests()
->list_for_each_entry_safe()
->dwc3_gadget_giveback(n)
->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
->spin_unlock
->Thread#2 executes
...
->dwc3_gadget_giveback(n+1)
->Already removed!
Thread#2:
dwc3_gadget_pullup()
->waiting for dwc3 spin_lock
...
->Thread#1 released lock
->dwc3_stop_active_transfers()
->dwc3_remove_requests()
->fetches n+1 item from cancelled_list (n removed by Thread#1)
->dwc3_gadget_giveback()
->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
->spin_unlock
In the Linux kernel, the following vulnerability has been resolved:
um: Fix out-of-bounds read in LDT setup
syscall_stub_data() expects the data_count parameter to be the number of
longs, not bytes.
==================================================================
BUG: KASAN: stack-out-of-bounds in syscall_stub_data+0x70/0xe0
Read of size 128 at addr 000000006411f6f0 by task swapper/1
CPU: 0 PID: 1 Comm: swapper Not tainted 5.18.0+ #18
Call Trace:
show_stack.cold+0x166/0x2a7
__dump_stack+0x3a/0x43
dump_stack_lvl+0x1f/0x27
print_report.cold+0xdb/0xf81
kasan_report+0x119/0x1f0
kasan_check_range+0x3a3/0x440
memcpy+0x52/0x140
syscall_stub_data+0x70/0xe0
write_ldt_entry+0xac/0x190
init_new_ldt+0x515/0x960
init_new_context+0x2c4/0x4d0
mm_init.constprop.0+0x5ed/0x760
mm_alloc+0x118/0x170
0x60033f48
do_one_initcall+0x1d7/0x860
0x60003e7b
kernel_init+0x6e/0x3d4
new_thread_handler+0x1e7/0x2c0
The buggy address belongs to stack of task swapper/1
and is located at offset 64 in frame:
init_new_ldt+0x0/0x960
This frame has 2 objects:
[32, 40) 'addr'
[64, 80) 'desc'
==================================================================
In the Linux kernel, the following vulnerability has been resolved:
blk-iolatency: Fix inflight count imbalances and IO hangs on offline
iolatency needs to track the number of inflight IOs per cgroup. As this
tracking can be expensive, it is disabled when no cgroup has iolatency
configured for the device. To ensure that the inflight counters stay
balanced, iolatency_set_limit() freezes the request_queue while manipulating
the enabled counter, which ensures that no IO is in flight and thus all
counters are zero.
Unfortunately, iolatency_set_limit() isn't the only place where the enabled
counter is manipulated. iolatency_pd_offline() can also dec the counter and
trigger disabling. As this disabling happens without freezing the q, this
can easily happen while some IOs are in flight and thus leak the counts.
This can be easily demonstrated by turning on iolatency on an one empty
cgroup while IOs are in flight in other cgroups and then removing the
cgroup. Note that iolatency shouldn't have been enabled elsewhere in the
system to ensure that removing the cgroup disables iolatency for the whole
device.
The following keeps flipping on and off iolatency on sda:
echo +io > /sys/fs/cgroup/cgroup.subtree_control
while true; do
mkdir -p /sys/fs/cgroup/test
echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
sleep 1
rmdir /sys/fs/cgroup/test
sleep 1
done
and there's concurrent fio generating direct rand reads:
fio --name test --filename=/dev/sda --direct=1 --rw=randread \
--runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k
while monitoring with the following drgn script:
while True:
for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
if pd.value_() == 0:
continue
iolat = container_of(pd, 'struct iolatency_grp', 'pd')
inflight = iolat.rq_wait.inflight.counter.value_()
if inflight:
print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
f'{cgroup_path(css.cgroup).decode("utf-8")}')
time.sleep(1)
The monitoring output looks like the following:
inflight=1 sda /user.slice
inflight=1 sda /user.slice
...
inflight=14 sda /user.slice
inflight=13 sda /user.slice
inflight=17 sda /user.slice
inflight=15 sda /user.slice
inflight=18 sda /user.slice
inflight=17 sda /user.slice
inflight=20 sda /user.slice
inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
inflight=19 sda /user.slice
inflight=19 sda /user.slice
If a cgroup with stuck inflight ends up getting throttled, the throttled IOs
will never get issued as there's no completion event to wake it up leading
to an indefinite hang.
This patch fixes the bug by unifying enable handling into a work item which
is automatically kicked off from iolatency_set_min_lat_nsec() which is
called from both iolatency_set_limit() and iolatency_pd_offline() paths.
Punting to a work item is necessary as iolatency_pd_offline() is called
under spinlocks while freezing a request_queue requires a sleepable context.
This also simplifies the code reducing LOC sans the comments and avoids the
unnecessary freezes which were happening whenever a cgroup's latency target
is newly set or cleared.
In the Linux kernel, the following vulnerability has been resolved:
serial: 8250_aspeed_vuart: Fix potential NULL dereference in aspeed_vuart_probe
platform_get_resource() may fail and return NULL, so we should
better check it's return value to avoid a NULL pointer dereference.
In the Linux kernel, the following vulnerability has been resolved:
remoteproc: mtk_scp: Fix a potential double free
'scp->rproc' is allocated using devm_rproc_alloc(), so there is no need
to free it explicitly in the remove function.
In the Linux kernel, the following vulnerability has been resolved:
usb: usbip: fix a refcount leak in stub_probe()
usb_get_dev() is called in stub_device_alloc(). When stub_probe() fails
after that, usb_put_dev() needs to be called to release the reference.
Fix this by moving usb_put_dev() to sdev_free error path handling.
Find this by code review.
In the Linux kernel, the following vulnerability has been resolved:
watchdog: rzg2l_wdt: Fix 32bit overflow issue
The value of timer_cycle_us can be 0 due to 32bit overflow.
For eg:- If we assign the counter value "0xfff" for computing
maxval.
This patch fixes this issue by appending ULL to 1024, so that
it is promoted to 64bit.
This patch also fixes the warning message, 'watchdog: Invalid min and
max timeout values, resetting to 0!'.
In the Linux kernel, the following vulnerability has been resolved:
net: ethernet: ti: am65-cpsw-nuss: Fix some refcount leaks
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
am65_cpsw_init_cpts() and am65_cpsw_nuss_probe() don't release
the refcount in error case.
Add missing of_node_put() to avoid refcount leak.
In the Linux kernel, the following vulnerability has been resolved:
md: fix double free of io_acct_set bioset
Now io_acct_set is alloc and free in personality. Remove the codes that
free io_acct_set in md_free and md_stop.
In the Linux kernel, the following vulnerability has been resolved:
soc: rockchip: Fix refcount leak in rockchip_grf_init
of_find_matching_node_and_match returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.
Add missing of_node_put() to avoid refcount leak.
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to avoid f2fs_bug_on() in dec_valid_node_count()
As Yanming reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=215897
I have encountered a bug in F2FS file system in kernel v5.17.
The kernel should enable CONFIG_KASAN=y and CONFIG_KASAN_INLINE=y. You can
reproduce the bug by running the following commands:
The kernel message is shown below:
kernel BUG at fs/f2fs/f2fs.h:2511!
Call Trace:
f2fs_remove_inode_page+0x2a2/0x830
f2fs_evict_inode+0x9b7/0x1510
evict+0x282/0x4e0
do_unlinkat+0x33a/0x540
__x64_sys_unlinkat+0x8e/0xd0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root cause is: .total_valid_block_count or .total_valid_node_count
could fuzzed to zero, then once dec_valid_node_count() was called, it
will cause BUG_ON(), this patch fixes to print warning info and set
SBI_NEED_FSCK into CP instead of panic.
In the Linux kernel, the following vulnerability has been resolved:
sfc: fix considering that all channels have TX queues
Normally, all channels have RX and TX queues, but this is not true if
modparam efx_separate_tx_channels=1 is used. In that cases, some
channels only have RX queues and others only TX queues (or more
preciselly, they have them allocated, but not initialized).
Fix efx_channel_has_tx_queues to return the correct value for this case
too.
Messages shown at probe time before the fix:
sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0
------------[ cut here ]------------
netdevice: ens6f0np0: failed to initialise TXQ -1
WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc]
[...] stripped
RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc]
[...] stripped
Call Trace:
efx_init_tx_queue+0xaa/0xf0 [sfc]
efx_start_channels+0x49/0x120 [sfc]
efx_start_all+0x1f8/0x430 [sfc]
efx_net_open+0x5a/0xe0 [sfc]
__dev_open+0xd0/0x190
__dev_change_flags+0x1b3/0x220
dev_change_flags+0x21/0x60
[...] stripped
Messages shown at remove time before the fix:
sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues
sfc 0000:03:00.0 ens6f0np0: failed to flush queues
In the Linux kernel, the following vulnerability has been resolved:
scsi: sd: Fix potential NULL pointer dereference
If sd_probe() sees an early error before sdkp->device is initialized,
sd_zbc_release_disk() is called. This causes a NULL pointer dereference
when sd_is_zoned() is called inside that function. Avoid this by removing
the call to sd_zbc_release_disk() in sd_probe() error path.
This change is safe and does not result in zone information memory leakage
because the zone information for a zoned disk is allocated only when
sd_revalidate_disk() is called, at which point sdkp->disk_dev is fully set,
resulting in sd_disk_release() being called when needed to cleanup a disk
zone information using sd_zbc_release_disk().
In the Linux kernel, the following vulnerability has been resolved:
rtc: mt6397: check return value after calling platform_get_resource()
It will cause null-ptr-deref if platform_get_resource() returns NULL,
we need check the return value.
In the Linux kernel, the following vulnerability has been resolved:
tipc: check attribute length for bearer name
syzbot reported uninit-value:
=====================================================
BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:644 [inline]
BUG: KMSAN: uninit-value in string+0x4f9/0x6f0 lib/vsprintf.c:725
string_nocheck lib/vsprintf.c:644 [inline]
string+0x4f9/0x6f0 lib/vsprintf.c:725
vsnprintf+0x2222/0x3650 lib/vsprintf.c:2806
vprintk_store+0x537/0x2150 kernel/printk/printk.c:2158
vprintk_emit+0x28b/0xab0 kernel/printk/printk.c:2256
vprintk_default+0x86/0xa0 kernel/printk/printk.c:2283
vprintk+0x15f/0x180 kernel/printk/printk_safe.c:50
_printk+0x18d/0x1cf kernel/printk/printk.c:2293
tipc_enable_bearer net/tipc/bearer.c:371 [inline]
__tipc_nl_bearer_enable+0x2022/0x22a0 net/tipc/bearer.c:1033
tipc_nl_bearer_enable+0x6c/0xb0 net/tipc/bearer.c:1042
genl_family_rcv_msg_doit net/netlink/genetlink.c:731 [inline]
- Do sanity check the attribute length for TIPC_NLA_BEARER_NAME.
- Do not use 'illegal name' in printing message.
In the Linux kernel, the following vulnerability has been resolved:
watchdog: ts4800_wdt: Fix refcount leak in ts4800_wdt_probe
of_parse_phandle() returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.
Add missing of_node_put() in some error paths.
In the Linux kernel, the following vulnerability has been resolved:
tcp: tcp_rtx_synack() can be called from process context
Laurent reported the enclosed report [1]
This bug triggers with following coditions:
0) Kernel built with CONFIG_DEBUG_PREEMPT=y
1) A new passive FastOpen TCP socket is created.
This FO socket waits for an ACK coming from client to be a complete
ESTABLISHED one.
2) A socket operation on this socket goes through lock_sock()
release_sock() dance.
3) While the socket is owned by the user in step 2),
a retransmit of the SYN is received and stored in socket backlog.
4) At release_sock() time, the socket backlog is processed while
in process context.
5) A SYNACK packet is cooked in response of the SYN retransmit.
6) -> tcp_rtx_synack() is called in process context.
Before blamed commit, tcp_rtx_synack() was always called from BH handler,
from a timer handler.
Fix this by using TCP_INC_STATS() & NET_INC_STATS()
which do not assume caller is in non preemptible context.
[1]
BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180
caller is tcp_rtx_synack.part.0+0x36/0xc0
CPU: 10 PID: 2180 Comm: epollpep Tainted: G OE 5.16.0-0.bpo.4-amd64 #1 Debian 5.16.12-1~bpo11+1
Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021
Call Trace:
<TASK>
dump_stack_lvl+0x48/0x5e
check_preemption_disabled+0xde/0xe0
tcp_rtx_synack.part.0+0x36/0xc0
tcp_rtx_synack+0x8d/0xa0
? kmem_cache_alloc+0x2e0/0x3e0
? apparmor_file_alloc_security+0x3b/0x1f0
inet_rtx_syn_ack+0x16/0x30
tcp_check_req+0x367/0x610
tcp_rcv_state_process+0x91/0xf60
? get_nohz_timer_target+0x18/0x1a0
? lock_timer_base+0x61/0x80
? preempt_count_add+0x68/0xa0
tcp_v4_do_rcv+0xbd/0x270
__release_sock+0x6d/0xb0
release_sock+0x2b/0x90
sock_setsockopt+0x138/0x1140
? __sys_getsockname+0x7e/0xc0
? aa_sk_perm+0x3e/0x1a0
__sys_setsockopt+0x198/0x1e0
__x64_sys_setsockopt+0x21/0x30
do_syscall_64+0x38/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
In the Linux kernel, the following vulnerability has been resolved:
driver core: fix deadlock in __device_attach
In __device_attach function, The lock holding logic is as follows:
...
__device_attach
device_lock(dev) // get lock dev
async_schedule_dev(__device_attach_async_helper, dev); // func
async_schedule_node
async_schedule_node_domain(func)
entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
/* when fail or work limit, sync to execute func, but
__device_attach_async_helper will get lock dev as
well, which will lead to A-A deadlock. */
if (!entry || atomic_read(&entry_count) > MAX_WORK) {
func;
else
queue_work_node(node, system_unbound_wq, &entry->work)
device_unlock(dev)
As shown above, when it is allowed to do async probes, because of
out of memory or work limit, async work is not allowed, to do
sync execute instead. it will lead to A-A deadlock because of
__device_attach_async_helper getting lock dev.
To fix the deadlock, move the async_schedule_dev outside device_lock,
as we can see, in async_schedule_node_domain, the parameter of
queue_work_node is system_unbound_wq, so it can accept concurrent
operations. which will also not change the code logic, and will
not lead to deadlock.
In the Linux kernel, the following vulnerability has been resolved:
firmware: dmi-sysfs: Fix memory leak in dmi_sysfs_register_handle
kobject_init_and_add() takes reference even when it fails.
According to the doc of kobject_init_and_add()
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object.
Fix this issue by calling kobject_put().
In the Linux kernel, the following vulnerability has been resolved:
amt: fix possible memory leak in amt_rcv()
If an amt receives packets and it finds socket.
If it can't find a socket, it should free a received skb.
But it doesn't.
So, a memory leak would possibly occur.
In the Linux kernel, the following vulnerability has been resolved:
net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry()
The "fsp->location" variable comes from user via ethtool_get_rxnfc().
Check that it is valid to prevent an out of bounds read.
In the Linux kernel, the following vulnerability has been resolved:
net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_register
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.
mv88e6xxx_mdio_register() pass the device node to of_mdiobus_register().
We don't need the device node after it.
Add missing of_node_put() to avoid refcount leak.
In the Linux kernel, the following vulnerability has been resolved:
ksmbd: fix reference count leak in smb_check_perm_dacl()
The issue happens in a specific path in smb_check_perm_dacl(). When
"id" and "uid" have the same value, the function simply jumps out of
the loop without decrementing the reference count of the object
"posix_acls", which is increased by get_acl() earlier. This may
result in memory leaks.
Fix it by decreasing the reference count of "posix_acls" before
jumping to label "check_access_bits".
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Off by one in dm_dmub_outbox1_low_irq()
The > ARRAY_SIZE() should be >= ARRAY_SIZE() to prevent an out of bounds
access.
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to clear dirty inode in f2fs_evict_inode()
As Yanming reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=215904
The kernel message is shown below:
kernel BUG at fs/f2fs/inode.c:825!
Call Trace:
evict+0x282/0x4e0
__dentry_kill+0x2b2/0x4d0
shrink_dentry_list+0x17c/0x4f0
shrink_dcache_parent+0x143/0x1e0
do_one_tree+0x9/0x30
shrink_dcache_for_umount+0x51/0x120
generic_shutdown_super+0x5c/0x3a0
kill_block_super+0x90/0xd0
kill_f2fs_super+0x225/0x310
deactivate_locked_super+0x78/0xc0
cleanup_mnt+0x2b7/0x480
task_work_run+0xc8/0x150
exit_to_user_mode_prepare+0x14a/0x150
syscall_exit_to_user_mode+0x1d/0x40
do_syscall_64+0x48/0x90
The root cause is: inode node and dnode node share the same nid,
so during f2fs_evict_inode(), dnode node truncation will invalidate
its NAT entry, so when truncating inode node, it fails due to
invalid NAT entry, result in inode is still marked as dirty, fix
this issue by clearing dirty for inode and setting SBI_NEED_FSCK
flag in filesystem.
output from dump.f2fs:
[print_node_info: 354] Node ID [0xf:15] is inode
i_nid[0] [0x f : 15]
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to do sanity check on block address in f2fs_do_zero_range()
As Yanming reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=215894
I have encountered a bug in F2FS file system in kernel v5.17.
I have uploaded the system call sequence as case.c, and a fuzzed image can
be found in google net disk
The kernel should enable CONFIG_KASAN=y and CONFIG_KASAN_INLINE=y. You can
reproduce the bug by running the following commands:
kernel BUG at fs/f2fs/segment.c:2291!
Call Trace:
f2fs_invalidate_blocks+0x193/0x2d0
f2fs_fallocate+0x2593/0x4a70
vfs_fallocate+0x2a5/0xac0
ksys_fallocate+0x35/0x70
__x64_sys_fallocate+0x8e/0xf0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root cause is, after image was fuzzed, block mapping info in inode
will be inconsistent with SIT table, so in f2fs_fallocate(), it will cause
panic when updating SIT with invalid blkaddr.
Let's fix the issue by adding sanity check on block address before updating
SIT table with it.
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to do sanity check for inline inode
Yanming reported a kernel bug in Bugzilla kernel [1], which can be
reproduced. The bug message is:
The kernel message is shown below:
kernel BUG at fs/inode.c:611!
Call Trace:
evict+0x282/0x4e0
__dentry_kill+0x2b2/0x4d0
dput+0x2dd/0x720
do_renameat2+0x596/0x970
__x64_sys_rename+0x78/0x90
do_syscall_64+0x3b/0x90
[1] https://bugzilla.kernel.org/show_bug.cgi?id=215895
The bug is due to fuzzed inode has both inline_data and encrypted flags.
During f2fs_evict_inode(), as the inode was deleted by rename(), it
will cause inline data conversion due to conflicting flags. The page
cache will be polluted and the panic will be triggered in clear_inode().
Try fixing the bug by doing more sanity checks for inline data inode in
sanity_check_inode().
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to do sanity check on total_data_blocks
As Yanming reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=215916
The kernel message is shown below:
kernel BUG at fs/f2fs/segment.c:2560!
Call Trace:
allocate_segment_by_default+0x228/0x440
f2fs_allocate_data_block+0x13d1/0x31f0
do_write_page+0x18d/0x710
f2fs_outplace_write_data+0x151/0x250
f2fs_do_write_data_page+0xef9/0x1980
move_data_page+0x6af/0xbc0
do_garbage_collect+0x312f/0x46f0
f2fs_gc+0x6b0/0x3bc0
f2fs_balance_fs+0x921/0x2260
f2fs_write_single_data_page+0x16be/0x2370
f2fs_write_cache_pages+0x428/0xd00
f2fs_write_data_pages+0x96e/0xd50
do_writepages+0x168/0x550
__writeback_single_inode+0x9f/0x870
writeback_sb_inodes+0x47d/0xb20
__writeback_inodes_wb+0xb2/0x200
wb_writeback+0x4bd/0x660
wb_workfn+0x5f3/0xab0
process_one_work+0x79f/0x13e0
worker_thread+0x89/0xf60
kthread+0x26a/0x300
ret_from_fork+0x22/0x30
RIP: 0010:new_curseg+0xe8d/0x15f0
The root cause is: ckpt.valid_block_count is inconsistent with SIT table,
stat info indicates filesystem has free blocks, but SIT table indicates
filesystem has no free segment.
So that during garbage colloection, it triggers panic when LFS allocator
fails to find free segment.
This patch tries to fix this issue by checking consistency in between
ckpt.valid_block_count and block accounted from SIT.
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: memleak flow rule from commit path
Abort path release flow rule object, however, commit path does not.
Update code to destroy these objects before releasing the transaction.
In the Linux kernel, the following vulnerability has been resolved:
efi: Do not import certificates from UEFI Secure Boot for T2 Macs
On Apple T2 Macs, when Linux attempts to read the db and dbx efi variables
at early boot to load UEFI Secure Boot certificates, a page fault occurs
in Apple firmware code and EFI runtime services are disabled with the
following logs:
[Firmware Bug]: Page fault caused by firmware at PA: 0xffffb1edc0068000
WARNING: CPU: 3 PID: 104 at arch/x86/platform/efi/quirks.c:735 efi_crash_gracefully_on_page_fault+0x50/0xf0
(Removed some logs from here)
Call Trace:
<TASK>
page_fault_oops+0x4f/0x2c0
? search_bpf_extables+0x6b/0x80
? search_module_extables+0x50/0x80
? search_exception_tables+0x5b/0x60
kernelmode_fixup_or_oops+0x9e/0x110
__bad_area_nosemaphore+0x155/0x190
bad_area_nosemaphore+0x16/0x20
do_kern_addr_fault+0x8c/0xa0
exc_page_fault+0xd8/0x180
asm_exc_page_fault+0x1e/0x30
(Removed some logs from here)
? __efi_call+0x28/0x30
? switch_mm+0x20/0x30
? efi_call_rts+0x19a/0x8e0
? process_one_work+0x222/0x3f0
? worker_thread+0x4a/0x3d0
? kthread+0x17a/0x1a0
? process_one_work+0x3f0/0x3f0
? set_kthread_struct+0x40/0x40
? ret_from_fork+0x22/0x30
</TASK>
---[ end trace 1f82023595a5927f ]---
efi: Froze efi_rts_wq and disabled EFI Runtime Services
integrity: Couldn't get size: 0x8000000000000015
integrity: MODSIGN: Couldn't get UEFI db list
efi: EFI Runtime Services are disabled!
integrity: Couldn't get size: 0x8000000000000015
integrity: Couldn't get UEFI dbx list
integrity: Couldn't get size: 0x8000000000000015
integrity: Couldn't get mokx list
integrity: Couldn't get size: 0x80000000
So we avoid reading these UEFI variables and thus prevent the crash.
In the Linux kernel, the following vulnerability has been resolved:
SUNRPC: Trap RDMA segment overflows
Prevent svc_rdma_build_writes() from walking off the end of a Write
chunk's segment array. Caught with KASAN.
The test that this fix replaces is invalid, and might have been left
over from an earlier prototype of the PCL work.
In the Linux kernel, the following vulnerability has been resolved:
ata: pata_octeon_cf: Fix refcount leak in octeon_cf_probe
of_find_device_by_node() takes reference, we should use put_device()
to release it when not need anymore.
Add missing put_device() to avoid refcount leak.
In the Linux kernel, the following vulnerability has been resolved:
powerpc/papr_scm: don't requests stats with '0' sized stats buffer
Sachin reported [1] that on a POWER-10 lpar he is seeing a kernel panic being
reported with vPMEM when papr_scm probe is being called. The panic is of the
form below and is observed only with following option disabled(profile) for the
said LPAR 'Enable Performance Information Collection' in the HMC:
Kernel attempted to write user page (1c) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on write at 0x0000001c
Faulting instruction address: 0xc008000001b90844
Oops: Kernel access of bad area, sig: 11 [#1]
<snip>
NIP [c008000001b90844] drc_pmem_query_stats+0x5c/0x270 [papr_scm]
LR [c008000001b92794] papr_scm_probe+0x2ac/0x6ec [papr_scm]
Call Trace:
0xc00000000941bca0 (unreliable)
papr_scm_probe+0x2ac/0x6ec [papr_scm]
platform_probe+0x98/0x150
really_probe+0xfc/0x510
__driver_probe_device+0x17c/0x230
<snip>
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
On investigation looks like this panic was caused due to a 'stat_buffer' of
size==0 being provided to drc_pmem_query_stats() to fetch all performance
stats-ids of an NVDIMM. However drc_pmem_query_stats() shouldn't have been called
since the vPMEM NVDIMM doesn't support and performance stat-id's. This was caused
due to missing check for 'p->stat_buffer_len' at the beginning of
papr_scm_pmu_check_events() which indicates that the NVDIMM doesn't support
performance-stats.
Fix this by introducing the check for 'p->stat_buffer_len' at the beginning of
papr_scm_pmu_check_events().
[1] https://lore.kernel.org/all/6B3A522A-6A5F-4CC9-B268-0C63AA6E07D3@linux.ibm.com
In the Linux kernel, the following vulnerability has been resolved:
net: altera: Fix refcount leak in altera_tse_mdio_create
Every iteration of for_each_child_of_node() decrements
the reference count of the previous node.
When break from a for_each_child_of_node() loop,
we need to explicitly call of_node_put() on the child node when
not need anymore.
Add missing of_node_put() to avoid refcount leak.
In the Linux kernel, the following vulnerability has been resolved:
ext4: filter out EXT4_FC_REPLAY from on-disk superblock field s_state
The EXT4_FC_REPLAY bit in sbi->s_mount_state is used to indicate that
we are in the middle of replay the fast commit journal. This was
actually a mistake, since the sbi->s_mount_info is initialized from
es->s_state. Arguably s_mount_state is misleadingly named, but the
name is historical --- s_mount_state and s_state dates back to ext2.
What should have been used is the ext4_{set,clear,test}_mount_flag()
inline functions, which sets EXT4_MF_* bits in sbi->s_mount_flags.
The problem with using EXT4_FC_REPLAY is that a maliciously corrupted
superblock could result in EXT4_FC_REPLAY getting set in
s_mount_state. This bypasses some sanity checks, and this can trigger
a BUG() in ext4_es_cache_extent(). As a easy-to-backport-fix, filter
out the EXT4_FC_REPLAY bit for now. We should eventually transition
away from EXT4_FC_REPLAY to something like EXT4_MF_REPLAY.
In the Linux kernel, the following vulnerability has been resolved:
net: dsa: lantiq_gswip: Fix refcount leak in gswip_gphy_fw_list
Every iteration of for_each_available_child_of_node() decrements
the reference count of the previous node.
when breaking early from a for_each_available_child_of_node() loop,
we need to explicitly call of_node_put() on the gphy_fw_np.
Add missing of_node_put() to avoid refcount leak.
In the Linux kernel, the following vulnerability has been resolved:
af_unix: Fix a data-race in unix_dgram_peer_wake_me().
unix_dgram_poll() calls unix_dgram_peer_wake_me() without `other`'s
lock held and check if its receive queue is full. Here we need to
use unix_recvq_full_lockless() instead of unix_recvq_full(), otherwise
KCSAN will report a data-race.
In the Linux kernel, the following vulnerability has been resolved:
ext4: avoid cycles in directory h-tree
A maliciously corrupted filesystem can contain cycles in the h-tree
stored inside a directory. That can easily lead to the kernel corrupting
tree nodes that were already verified under its hands while doing a node
split and consequently accessing unallocated memory. Fix the problem by
verifying traversed block numbers are unique.
In the Linux kernel, the following vulnerability has been resolved:
net: ethernet: bgmac: Fix refcount leak in bcma_mdio_mii_register
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
Add missing of_node_put() to avoid refcount leak.
In the Linux kernel, the following vulnerability has been resolved:
bpf, arm64: Clear prog->jited_len along prog->jited
syzbot reported an illegal copy_to_user() attempt
from bpf_prog_get_info_by_fd() [1]
There was no repro yet on this bug, but I think
that commit 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns")
is exposing a prior bug in bpf arm64.
bpf_prog_get_info_by_fd() looks at prog->jited_len
to determine if the JIT image can be copied out to user space.
My theory is that syzbot managed to get a prog where prog->jited_len
has been set to 43, while prog->bpf_func has ben cleared.
It is not clear why copy_to_user(uinsns, NULL, ulen) is triggering
this particular warning.
I thought find_vma_area(NULL) would not find a vm_struct.
As we do not hold vmap_area_lock spinlock, it might be possible
that the found vm_struct was garbage.
[1]
usercopy: Kernel memory exposure attempt detected from vmalloc (offset 792633534417210172, size 43)!
kernel BUG at mm/usercopy.c:101!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 25002 Comm: syz-executor.1 Not tainted 5.18.0-syzkaller-10139-g8291eaafed36 #0
Hardware name: linux,dummy-virt (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usercopy_abort+0x90/0x94 mm/usercopy.c:101
lr : usercopy_abort+0x90/0x94 mm/usercopy.c:89
sp : ffff80000b773a20
x29: ffff80000b773a30 x28: faff80000b745000 x27: ffff80000b773b48
x26: 0000000000000000 x25: 000000000000002b x24: 0000000000000000
x23: 00000000000000e0 x22: ffff80000b75db67 x21: 0000000000000001
x20: 000000000000002b x19: ffff80000b75db3c x18: 00000000fffffffd
x17: 2820636f6c6c616d x16: 76206d6f72662064 x15: 6574636574656420
x14: 74706d6574746120 x13: 2129333420657a69 x12: 73202c3237313031
x11: 3237313434333533 x10: 3336323937207465 x9 : 657275736f707865
x8 : ffff80000a30c550 x7 : ffff80000b773830 x6 : ffff80000b773830
x5 : 0000000000000000 x4 : ffff00007fbbaa10 x3 : 0000000000000000
x2 : 0000000000000000 x1 : f7ff000028fc0000 x0 : 0000000000000064
Call trace:
usercopy_abort+0x90/0x94 mm/usercopy.c:89
check_heap_object mm/usercopy.c:186 [inline]
__check_object_size mm/usercopy.c:252 [inline]
__check_object_size+0x198/0x36c mm/usercopy.c:214
check_object_size include/linux/thread_info.h:199 [inline]
check_copy_size include/linux/thread_info.h:235 [inline]
copy_to_user include/linux/uaccess.h:159 [inline]
bpf_prog_get_info_by_fd.isra.0+0xf14/0xfdc kernel/bpf/syscall.c:3993
bpf_obj_get_info_by_fd+0x12c/0x510 kernel/bpf/syscall.c:4253
__sys_bpf+0x900/0x2150 kernel/bpf/syscall.c:4956
__do_sys_bpf kernel/bpf/syscall.c:5021 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5019 [inline]
__arm64_sys_bpf+0x28/0x40 kernel/bpf/syscall.c:5019
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:52
el0_svc_common.constprop.0+0x44/0xec arch/arm64/kernel/syscall.c:142
do_el0_svc+0xa0/0xc0 arch/arm64/kernel/syscall.c:206
el0_svc+0x44/0xb0 arch/arm64/kernel/entry-common.c:624
el0t_64_sync_handler+0x1ac/0x1b0 arch/arm64/kernel/entry-common.c:642
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:581
Code: aa0003e3 d00038c0 91248000 97fff65f (d4210000)
In the Linux kernel, the following vulnerability has been resolved:
ip_gre: test csum_start instead of transport header
GRE with TUNNEL_CSUM will apply local checksum offload on
CHECKSUM_PARTIAL packets.
ipgre_xmit must validate csum_start after an optional skb_pull,
else lco_csum may trigger an overflow. The original check was
if (csum && skb_checksum_start(skb) < skb->data)
return -EINVAL;
This had false positives when skb_checksum_start is undefined:
when ip_summed is not CHECKSUM_PARTIAL. A discussed refinement
was straightforward
if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
skb_checksum_start(skb) < skb->data)
return -EINVAL;
But was eventually revised more thoroughly:
- restrict the check to the only branch where needed, in an
uncommon GRE path that uses header_ops and calls skb_pull.
- test skb_transport_header, which is set along with csum_start
in skb_partial_csum_set in the normal header_ops datapath.
Turns out skbs can arrive in this branch without the transport
header set, e.g., through BPF redirection.
Revise the check back to check csum_start directly, and only if
CHECKSUM_PARTIAL. Do leave the check in the updated location.
Check field regardless of whether TUNNEL_CSUM is configured.
In the Linux kernel, the following vulnerability has been resolved:
drm/etnaviv: check for reaped mapping in etnaviv_iommu_unmap_gem
When the mapping is already reaped the unmap must be a no-op, as we
would otherwise try to remove the mapping twice, corrupting the involved
data structures.
In the Linux kernel, the following vulnerability has been resolved:
mm/huge_memory: Fix xarray node memory leak
If xas_split_alloc() fails to allocate the necessary nodes to complete the
xarray entry split, it sets the xa_state to -ENOMEM, which xas_nomem()
then interprets as "Please allocate more memory", not as "Please free
any unnecessary memory" (which was the intended outcome). It's confusing
to use xas_nomem() to free memory in this context, so call xas_destroy()
instead.