Author: Chanho Park <chanho61.park@samsung.com>
Date: Thu Apr 7 18:11:28 2022 +0900
arm64: Add part number for Arm Cortex-A78AE
commit 83bea32ac7ed37bbda58733de61fc9369513f9f9 upstream.
Add the MIDR part number info for the Arm Cortex-A78AE[1] and add it to
spectre-BHB affected list[2].
[1]: https://developer.arm.com/Processors/Cortex-A78AE
[2]: https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Link: https://lore.kernel.org/r/20220407091128.8700-1-chanho61.park@samsung.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Fangrui Song <maskray@google.com>
Date: Fri Feb 18 00:12:09 2022 -0800
arm64: module: remove (NOLOAD) from linker script
commit 4013e26670c590944abdab56c4fa797527b74325 upstream.
On ELF, (NOLOAD) sets the section type to SHT_NOBITS[1]. It is conceptually
inappropriate for .plt and .text.* sections which are always
SHT_PROGBITS.
In GNU ld, if PLT entries are needed, .plt will be SHT_PROGBITS anyway
and (NOLOAD) will be essentially ignored. In ld.lld, since
https://reviews.llvm.org/D118840 ("[ELF] Support (TYPE=<value>) to
customize the output section type"), ld.lld will report a `section type
mismatch` error. Just remove (NOLOAD) to fix the error.
[1] https://lld.llvm.org/ELF/linker_script.html As of today, "The
section should be marked as not loadable" on
https://sourceware.org/binutils/docs/ld/Output-Section-Type.html is
outdated for ELF.
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Fangrui Song <maskray@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220218081209.354383-1-maskray@google.com
Signed-off-by: Will Deacon <will@kernel.org>
[nathan: Fix conflicts due to lack of 1cbdf60bd1b7]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Guo Ren <guoren@kernel.org>
Date: Thu Apr 7 15:33:20 2022 +0800
arm64: patch_text: Fixup last cpu should be master
commit 31a099dbd91e69fcab55eef4be15ed7a8c984918 upstream.
These patch_text implementations are using stop_machine_cpuslocked
infrastructure with atomic cpu_count. The original idea: When the
master CPU patch_text, the others should wait for it. But current
implementation is using the first CPU as master, which couldn't
guarantee the remaining CPUs are waiting. This patch changes the
last CPU as the master to solve the potential risk.
Fixes: ae16480785de ("arm64: introduce interfaces to hotpatch kernel and module code")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220407073323.743224-2-guoren@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Christian Lamparter <chunkeey@gmail.com>
Date: Sat Mar 19 21:11:02 2022 +0100
ata: sata_dwc_460ex: Fix crash due to OOB write
commit 7aa8104a554713b685db729e66511b93d989dd6a upstream.
the driver uses libata's "tag" values from in various arrays.
Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
the value of the SATA_DWC_QCMD_MAX needs to account for that.
Otherwise ATA_TAG_INTERNAL usage cause similar crashes like
this as reported by Tice Rex on the OpenWrt Forum and
reproduced (with symbols) here:
| BUG: Kernel NULL pointer dereference at 0x00000000
| Faulting instruction address: 0xc03ed4b8
| Oops: Kernel access of bad area, sig: 11 [#1]
| BE PAGE_SIZE=4K PowerPC 44x Platform
| CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
| NIP: c03ed4b8 LR: c03d27e8 CTR: c03ed36c
| REGS: cfa59950 TRAP: 0300 Not tainted (5.4.163)
| MSR: 00021000 <CE,ME> CR: 42000222 XER: 00000000
| DEAR: 00000000 ESR: 00000000
| GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
| [..]
| NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
| LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
| Call Trace:
| [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
| [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
| [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
| [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
| [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
| [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
| [...]
This is because sata_dwc_dma_xfer_complete() NULLs the
dma_pending's next neighbour "chan" (a *dma_chan struct) in
this '32' case right here (line ~735):
> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
causes the crash.
With this patch, SATA_DWC_QCMD_MAX is now set to ATA_MAX_QUEUE + 1.
This avoids the OOB. But please note, there was a worthwhile discussion
on what ATA_TAG_INTERNAL and ATA_MAX_QUEUE is. And why there should not
be a "fake" 33 command-long queue size.
Ideally, the dw driver should account for the ATA_TAG_INTERNAL.
In Damien Le Moal's words: "... having looked at the driver, it
is a bigger change than just faking a 33rd "tag" that is in fact
not a command tag at all."
Fixes: 28361c403683c ("libata: add extra internal command")
Cc: stable@kernel.org # 4.18+
BugLink: https://github.com/openwrt/openwrt/issues/9505
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Venkateswara Naralasetty <quic_vnaralas@quicinc.com>
Date: Wed Jan 19 14:49:33 2022 +0530
ath11k: fix kernel panic during unload/load ath11k modules
[ Upstream commit 22b59cb965f79ee1accf83172441c9ca0ecb632a ]
Call netif_napi_del() from ath11k_ahb_free_ext_irq() to fix
the following kernel panic when unload/load ath11k modules
for few iterations.
[ 971.201365] Unable to handle kernel paging request at virtual address 6d97a208
[ 971.204227] pgd = 594c2919
[ 971.211478] [6d97a208] *pgd=00000000
[ 971.214120] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 971.412024] CPU: 2 PID: 4435 Comm: insmod Not tainted 5.4.89 #0
[ 971.434256] Hardware name: Generic DT based system
[ 971.440165] PC is at napi_by_id+0x10/0x40
[ 971.445019] LR is at netif_napi_add+0x160/0x1dc
[ 971.743127] (napi_by_id) from [<807d89a0>] (netif_napi_add+0x160/0x1dc)
[ 971.751295] (netif_napi_add) from [<7f1209ac>] (ath11k_ahb_config_irq+0xf8/0x414 [ath11k_ahb])
[ 971.759164] (ath11k_ahb_config_irq [ath11k_ahb]) from [<7f12135c>] (ath11k_ahb_probe+0x40c/0x51c [ath11k_ahb])
[ 971.768567] (ath11k_ahb_probe [ath11k_ahb]) from [<80666864>] (platform_drv_probe+0x48/0x94)
[ 971.779670] (platform_drv_probe) from [<80664718>] (really_probe+0x1c8/0x450)
[ 971.789389] (really_probe) from [<80664cc4>] (driver_probe_device+0x15c/0x1b8)
[ 971.797547] (driver_probe_device) from [<80664f60>] (device_driver_attach+0x44/0x60)
[ 971.805795] (device_driver_attach) from [<806650a0>] (__driver_attach+0x124/0x140)
[ 971.814822] (__driver_attach) from [<80662adc>] (bus_for_each_dev+0x58/0xa4)
[ 971.823328] (bus_for_each_dev) from [<80663a2c>] (bus_add_driver+0xf0/0x1e8)
[ 971.831662] (bus_add_driver) from [<806658a4>] (driver_register+0xa8/0xf0)
[ 971.839822] (driver_register) from [<8030269c>] (do_one_initcall+0x78/0x1ac)
[ 971.847638] (do_one_initcall) from [<80392524>] (do_init_module+0x54/0x200)
[ 971.855968] (do_init_module) from [<803945b0>] (load_module+0x1e30/0x1ffc)
[ 971.864126] (load_module) from [<803948b0>] (sys_init_module+0x134/0x17c)
[ 971.871852] (sys_init_module) from [<80301000>] (ret_fast_syscall+0x0/0x50)
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.6.0.1-00760-QCAHKSWPL_SILICONZ-1
Signed-off-by: Venkateswara Naralasetty <quic_vnaralas@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/1642583973-21599-1-git-send-email-quic_vnaralas@quicinc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kalle Valo <quic_kvalo@quicinc.com>
Date: Thu Jan 27 11:01:17 2022 +0200
ath11k: mhi: use mhi_sync_power_up()
[ Upstream commit 3df6d74aedfdca919cca475d15dfdbc8b05c9e5d ]
If amss.bin was missing ath11k would crash during 'rmmod ath11k_pci'. The
reason for that was that we were using mhi_async_power_up() which does not
check any errors. But mhi_sync_power_up() on the other hand does check for
errors so let's use that to fix the crash.
I was not able to find a reason why an async version was used.
ath11k_mhi_start() (which enables state ATH11K_MHI_POWER_ON) is called from
ath11k_hif_power_up(), which can sleep. So sync version should be safe to use
here.
[ 145.569731] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[ 145.569789] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 145.569843] CPU: 2 PID: 1628 Comm: rmmod Kdump: loaded Tainted: G W 5.16.0-wt-ath+ #567
[ 145.569898] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
[ 145.569956] RIP: 0010:ath11k_hal_srng_access_begin+0xb5/0x2b0 [ath11k]
[ 145.570028] Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 ec 01 00 00 48 8b ab a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 <0f> b6 14 02 48 89 e8 83 e0 07 83 c0 03 45 85 ed 75 48 38 d0 7c 08
[ 145.570089] RSP: 0018:ffffc900025d7ac0 EFLAGS: 00010246
[ 145.570144] RAX: dffffc0000000000 RBX: ffff88814fca2dd8 RCX: 1ffffffff50cb455
[ 145.570196] RDX: 0000000000000000 RSI: ffff88814fca2dd8 RDI: ffff88814fca2e80
[ 145.570252] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffa8659497
[ 145.570329] R10: fffffbfff50cb292 R11: 0000000000000001 R12: ffff88814fca0000
[ 145.570410] R13: 0000000000000000 R14: ffff88814fca2798 R15: ffff88814fca2dd8
[ 145.570465] FS: 00007fa399988540(0000) GS:ffff888233e00000(0000) knlGS:0000000000000000
[ 145.570519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 145.570571] CR2: 00007fa399b51421 CR3: 0000000137898002 CR4: 00000000003706e0
[ 145.570623] Call Trace:
[ 145.570675] <TASK>
[ 145.570727] ? ath11k_ce_tx_process_cb+0x34b/0x860 [ath11k]
[ 145.570797] ath11k_ce_tx_process_cb+0x356/0x860 [ath11k]
[ 145.570864] ? tasklet_init+0x150/0x150
[ 145.570919] ? ath11k_ce_alloc_pipes+0x280/0x280 [ath11k]
[ 145.570986] ? tasklet_clear_sched+0x42/0xe0
[ 145.571042] ? tasklet_kill+0xe9/0x1b0
[ 145.571095] ? tasklet_clear_sched+0xe0/0xe0
[ 145.571148] ? irq_has_action+0x120/0x120
[ 145.571202] ath11k_ce_cleanup_pipes+0x45a/0x580 [ath11k]
[ 145.571270] ? ath11k_pci_stop+0x10e/0x170 [ath11k_pci]
[ 145.571345] ath11k_core_stop+0x8a/0xc0 [ath11k]
[ 145.571434] ath11k_core_deinit+0x9e/0x150 [ath11k]
[ 145.571499] ath11k_pci_remove+0xd2/0x260 [ath11k_pci]
[ 145.571553] pci_device_remove+0x9a/0x1c0
[ 145.571605] __device_release_driver+0x332/0x660
[ 145.571659] driver_detach+0x1e7/0x2c0
[ 145.571712] bus_remove_driver+0xe2/0x2d0
[ 145.571772] pci_unregister_driver+0x21/0x250
[ 145.571826] __do_sys_delete_module+0x30a/0x4b0
[ 145.571879] ? free_module+0xac0/0xac0
[ 145.571933] ? lockdep_hardirqs_on_prepare.part.0+0x18c/0x370
[ 145.571986] ? syscall_enter_from_user_mode+0x1d/0x50
[ 145.572039] ? lockdep_hardirqs_on+0x79/0x100
[ 145.572097] do_syscall_64+0x3b/0x90
[ 145.572153] entry_SYSCALL_64_after_hwframe+0x44/0xae
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220127090117.2024-2-kvalo@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Zekun Shen <bruceshenzk@gmail.com>
Date: Sun Dec 26 22:12:13 2021 -0500
ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111
[ Upstream commit 564d4eceb97eaf381dd6ef6470b06377bb50c95a ]
The bug was found during fuzzing. Stacktrace locates it in
ath5k_eeprom_convert_pcal_info_5111.
When none of the curve is selected in the loop, idx can go
up to AR5K_EEPROM_N_PD_CURVES. The line makes pd out of bound.
pd = &chinfo[pier].pd_curves[idx];
There are many OOB writes using pd later in the code. So I
added a sanity check for idx. Checks for other loops involving
AR5K_EEPROM_N_PD_CURVES are not needed as the loop index is not
used outside the loops.
The patch is NOT tested with real device.
The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k]
Write of size 1 at addr ffff8880174a4d60 by task modprobe/214
CPU: 0 PID: 214 Comm: modprobe Not tainted 5.6.0 #1
Call Trace:
dump_stack+0x76/0xa0
print_address_description.constprop.0+0x16/0x200
? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k]
? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k]
__kasan_report.cold+0x37/0x7c
? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k]
kasan_report+0xe/0x20
ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k]
? apic_timer_interrupt+0xa/0x20
? ath5k_eeprom_init_11a_pcal_freq+0xbc0/0xbc0 [ath5k]
? ath5k_pci_eeprom_read+0x228/0x3c0 [ath5k]
ath5k_eeprom_init+0x2513/0x6290 [ath5k]
? ath5k_eeprom_init_11a_pcal_freq+0xbc0/0xbc0 [ath5k]
? usleep_range+0xb8/0x100
? apic_timer_interrupt+0xa/0x20
? ath5k_eeprom_read_pcal_info_2413+0x2f20/0x2f20 [ath5k]
ath5k_hw_init+0xb60/0x1970 [ath5k]
ath5k_init_ah+0x6fe/0x2530 [ath5k]
? kasprintf+0xa6/0xe0
? ath5k_stop+0x140/0x140 [ath5k]
? _dev_notice+0xf6/0xf6
? apic_timer_interrupt+0xa/0x20
ath5k_pci_probe.cold+0x29a/0x3d6 [ath5k]
? ath5k_pci_eeprom_read+0x3c0/0x3c0 [ath5k]
? mutex_lock+0x89/0xd0
? ath5k_pci_eeprom_read+0x3c0/0x3c0 [ath5k]
local_pci_probe+0xd3/0x160
pci_device_probe+0x23f/0x3e0
? pci_device_remove+0x280/0x280
? pci_device_remove+0x280/0x280
really_probe+0x209/0x5d0
Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu>
Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/YckvDdj3mtCkDRIt@a-10-27-26-18.dynapool.vpn.nyu.edu
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date: Thu Mar 3 13:11:57 2022 -0800
Bluetooth: Fix not checking for valid hdev on bt_dev_{info,warn,err,dbg}
[ Upstream commit 9b392e0e0b6d026da5a62bb79a08f32e27af858e ]
This fixes attemting to print hdev->name directly which causes them to
print an error:
kernel: read_version:367: (efault): sock 000000006a3008f2
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date: Fri Mar 11 13:19:33 2022 -0800
Bluetooth: Fix use after free in hci_send_acl
[ Upstream commit f63d24baff787e13b723d86fe036f84bdbc35045 ]
This fixes the following trace caused by receiving
HCI_EV_DISCONN_PHY_LINK_COMPLETE which does call hci_conn_del without
first checking if conn->type is in fact AMP_LINK and in case it is
do properly cleanup upper layers with hci_disconn_cfm:
==================================================================
BUG: KASAN: use-after-free in hci_send_acl+0xaba/0xc50
Read of size 8 at addr ffff88800e404818 by task bluetoothd/142
CPU: 0 PID: 142 Comm: bluetoothd Not tainted
5.17.0-rc5-00006-gda4022eeac1a #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x59
print_address_description.constprop.0+0x1f/0x150
kasan_report.cold+0x7f/0x11b
hci_send_acl+0xaba/0xc50
l2cap_do_send+0x23f/0x3d0
l2cap_chan_send+0xc06/0x2cc0
l2cap_sock_sendmsg+0x201/0x2b0
sock_sendmsg+0xdc/0x110
sock_write_iter+0x20f/0x370
do_iter_readv_writev+0x343/0x690
do_iter_write+0x132/0x640
vfs_writev+0x198/0x570
do_writev+0x202/0x280
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RSP: 002b:00007ffce8a099b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3
0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 14 00 00 00 0f 05
<48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
RDX: 0000000000000001 RSI: 00007ffce8a099e0 RDI: 0000000000000015
RAX: ffffffffffffffda RBX: 00007ffce8a099e0 RCX: 00007f788fc3cf77
R10: 00007ffce8af7080 R11: 0000000000000246 R12: 000055e4ccf75580
RBP: 0000000000000015 R08: 0000000000000002 R09: 0000000000000001
</TASK>
R13: 000055e4ccf754a0 R14: 000055e4ccf75cd0 R15: 000055e4ccf4a6b0
Allocated by task 45:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
hci_chan_create+0x9a/0x2f0
l2cap_conn_add.part.0+0x1a/0xdc0
l2cap_connect_cfm+0x236/0x1000
le_conn_complete_evt+0x15a7/0x1db0
hci_le_conn_complete_evt+0x226/0x2c0
hci_le_meta_evt+0x247/0x450
hci_event_packet+0x61b/0xe90
hci_rx_work+0x4d5/0xc50
process_one_work+0x8fb/0x15a0
worker_thread+0x576/0x1240
kthread+0x29d/0x340
ret_from_fork+0x1f/0x30
Freed by task 45:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0xfb/0x130
kfree+0xac/0x350
hci_conn_cleanup+0x101/0x6a0
hci_conn_del+0x27e/0x6c0
hci_disconn_phylink_complete_evt+0xe0/0x120
hci_event_packet+0x812/0xe90
hci_rx_work+0x4d5/0xc50
process_one_work+0x8fb/0x15a0
worker_thread+0x576/0x1240
kthread+0x29d/0x340
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff88800c0f0500
The buggy address is located 24 bytes inside of
which belongs to the cache kmalloc-128 of size 128
The buggy address belongs to the page:
128-byte region [ffff88800c0f0500, ffff88800c0f0580)
flags: 0x100000000000200(slab|node=0|zone=1)
page:00000000fe45cd86 refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0xc0f0
raw: 0000000000000000 0000000080100010 00000001ffffffff
0000000000000000
raw: 0100000000000200 ffffea00003a2c80 dead000000000004
ffff8880078418c0
page dumped because: kasan: bad access detected
ffff88800c0f0400: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
Memory state around the buggy address:
>ffff88800c0f0500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88800c0f0480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88800c0f0580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
==================================================================
ffff88800c0f0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Reported-by: Sönke Huster <soenke.huster@eknoes.de>
Tested-by: Sönke Huster <soenke.huster@eknoes.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Date: Fri Feb 25 07:41:52 2022 +0000
Bluetooth: use memset avoid memory leaks
[ Upstream commit d3715b2333e9a21692ba16ef8645eda584a9515d ]
Use memset to initialize structs to prevent memory leaks
in l2cap_ecred_connect
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Michael Chan <michael.chan@broadcom.com>
Date: Sat Mar 5 03:54:39 2022 -0500
bnxt_en: Eliminate unintended link toggle during FW reset
[ Upstream commit 7c492a2530c1f05441da541307c2534230dfd59b ]
If the flow control settings have been changed, a subsequent FW reset
may cause the ethernet link to toggle unnecessarily. This link toggle
will increase the down time by a few seconds.
The problem is caused by bnxt_update_phy_setting() detecting a false
mismatch in the flow control settings between the stored software
settings and the current FW settings after the FW reset. This mismatch
is caused by the AUTONEG bit added to link_info->req_flow_ctrl in an
inconsistent way in bnxt_set_pauseparam() in autoneg mode. The AUTONEG
bit should not be added to link_info->req_flow_ctrl.
Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Andy Gospodarek <gospo@broadcom.com>
Date: Fri Apr 1 20:21:11 2022 -0400
bnxt_en: reserve space inside receive page for skb_shared_info
[ Upstream commit facc173cf700e55b2ad249ecbd3a7537f7315691 ]
Insufficient space was being reserved in the page used for packet
reception, so the interface MTU could be set too large to still have
room for the contents of the packet when doing XDP redirect. This
resulted in the following message when redirecting a packet between
3520 and 3822 bytes with an MTU of 3822:
[311815.561880] XDP_WARN: xdp_update_frame_from_buff(line:200): Driver BUG: missing reserved tailroom
Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jakub Sitnicki <jakub@cloudflare.com>
Date: Sun Jan 30 12:55:17 2022 +0100
bpf: Make dst_port field in struct bpf_sock 16-bit wide
[ Upstream commit 4421a582718ab81608d8486734c18083b822390d ]
Menglong Dong reports that the documentation for the dst_port field in
struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
field is a zero-padded 16-bit integer in network byte order. The value
appears to the BPF user as if laid out in memory as so:
offsetof(struct bpf_sock, dst_port) + 0 <port MSB>
+ 8 <port LSB>
+16 0x00
+24 0x00
32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
the offset into the field is 0.
32-bit wide loads from dst_port are especially confusing. The loaded value,
after converting to host byte order with bpf_ntohl(dst_port), contains the
port number in the upper 16-bits.
Remove the confusion by splitting the field into two 16-bit fields. For
backward compatibility, allow 32-bit wide loads from offsetof(struct
bpf_sock, dst_port).
While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.
Reported-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220130115518.213259-2-jakub@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Maxim Mikityanskiy <maximmi@nvidia.com>
Date: Wed Apr 6 15:41:12 2022 +0300
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
[ Upstream commit 2e8702cc0cfa1080f29fd64003c00a3e24ac38de ]
bpf_tcp_gen_syncookie looks at the IP version in the IP header and
validates the address family of the socket. It supports IPv4 packets in
AF_INET6 dual-stack sockets.
On the other hand, bpf_tcp_check_syncookie looks only at the address
family of the socket, ignoring the real IP version in headers, and
validates only the packet size. This implementation has some drawbacks:
1. Packets are not validated properly, allowing a BPF program to trick
bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
socket.
2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
up receiving a SYNACK with the cookie, but the following ACK gets
dropped.
This patch fixes these issues by changing the checks in
bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP
version from the header is taken into account, and it is validated
properly with address family.
Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ethan Lien <ethanlien@synology.com>
Date: Mon Mar 7 18:00:04 2022 +0800
btrfs: fix qgroup reserve overflow the qgroup limit
commit b642b52d0b50f4d398cb4293f64992d0eed2e2ce upstream.
We use extent_changeset->bytes_changed in qgroup_reserve_data() to record
how many bytes we set for EXTENT_QGROUP_RESERVED state. Currently the
bytes_changed is set as "unsigned int", and it will overflow if we try to
fallocate a range larger than 4GiB. The result is we reserve less bytes
and eventually break the qgroup limit.
Unlike regular buffered/direct write, which we use one changeset for
each ordered extent, which can never be larger than 256M. For
fallocate, we use one changeset for the whole range, thus it no longer
respects the 256M per extent limit, and caused the problem.
The following example test script reproduces the problem:
$ cat qgroup-overflow.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Set qgroup limit to 2GiB.
btrfs quota enable $MNT
btrfs qgroup limit 2G $MNT
# Try to fallocate a 3GiB file. This should fail.
echo
echo "Try to fallocate a 3GiB file..."
fallocate -l 3G $MNT/3G.file
# Try to fallocate a 5GiB file.
echo
echo "Try to fallocate a 5GiB file..."
fallocate -l 5G $MNT/5G.file
# See we break the qgroup limit.
echo
sync
btrfs qgroup show -r $MNT
umount $MNT
When running the test:
$ ./qgroup-overflow.sh
(...)
Try to fallocate a 3GiB file...
fallocate: fallocate failed: Disk quota exceeded
Try to fallocate a 5GiB file...
qgroupid        rfer        excl    max_rfer
--------Â Â Â Â Â Â Â Â ----Â Â Â Â Â Â Â Â ----Â Â Â Â --------
0/5Â Â Â Â Â Â Â Â Â Â 5.00GiBÂ Â Â Â Â 5.00GiBÂ Â Â Â Â 2.00GiB
Since we have no control of how bytes_changed is used, it's better to
set it to u64.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Ethan Lien <ethanlien@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Kaiwen Hu <kevinhu@synology.com>
Date: Wed Mar 23 15:10:32 2022 +0800
btrfs: prevent subvol with swapfile from being deleted
commit 60021bd754c6ca0addc6817994f20290a321d8d6 upstream.
A subvolume with an active swapfile must not be deleted otherwise it
would not be possible to deactivate it.
After the subvolume is deleted, we cannot swapoff the swapfile in this
deleted subvolume because the path is unreachable. The swapfile is
still active and holding references, the filesystem cannot be unmounted.
The test looks like this:
mkfs.btrfs -f $dev > /dev/null
mount $dev $mnt
btrfs sub create $mnt/subvol
touch $mnt/subvol/swapfile
chmod 600 $mnt/subvol/swapfile
chattr +C $mnt/subvol/swapfile
dd if=/dev/zero of=$mnt/subvol/swapfile bs=1K count=4096
mkswap $mnt/subvol/swapfile
swapon $mnt/subvol/swapfile
btrfs sub delete $mnt/subvol
swapoff $mnt/subvol/swapfile # failed: No such file or directory
swapoff --all
unmount $mnt # target is busy.
To prevent above issue, we simply check that whether the subvolume
contains any active swapfile, and stop the deleting process. This
behavior is like snapshot ioctl dealing with a swapfile.
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Kaiwen Hu <kevinhu@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Wed Mar 9 13:04:13 2022 +0100
can: isotp: set default value for N_As to 50 micro seconds
[ Upstream commit 530e0d46c61314c59ecfdb8d3bcb87edbc0f85d3 ]
The N_As value describes the time a CAN frame needs on the wire when
transmitted by the CAN controller. Even very short CAN FD frames need
arround 100 usecs (bitrate 1Mbit/s, data bitrate 8Mbit/s).
Having N_As to be zero (the former default) leads to 'no CAN frame
separation' when STmin is set to zero by the receiving node. This 'burst
mode' should not be enabled by default as it could potentially dump a high
number of CAN frames into the netdev queue from the soft hrtimer context.
This does not affect the system stability but is just not nice and
cooperative.
With this N_As/frame_txtime value the 'burst mode' is disabled by default.
As user space applications usually do not set the frame_txtime element
of struct can_isotp_options the new in-kernel default is very likely
overwritten with zero when the sockopt() CAN_ISOTP_OPTS is invoked.
To make sure that a N_As value of zero is only set intentional the
value '0' is now interpreted as 'do not change the current value'.
When a frame_txtime of zero is required for testing purposes this
CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime.
Link: https://lore.kernel.org/all/20220309120416.83514-2-socketcan@hartkopp.net
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Xiubo Li <xiubli@redhat.com>
Date: Sat Mar 5 19:52:59 2022 +0800
ceph: fix memory leak in ceph_readdir when note_last_dentry returns error
[ Upstream commit f639d9867eea647005dc824e0e24f39ffc50d4e4 ]
Reset the last_readdir at the same time, and add a comment explaining
why we don't free last_readdir when dir_emit returns false.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Avraham Stern <avraham.stern@intel.com>
Date: Wed Feb 2 10:49:37 2022 +0200
cfg80211: don't add non transmitted BSS to 6GHz scanned channels
[ Upstream commit 5666ee154f4696c011dfa8544aaf5591b6b87515 ]
When adding 6GHz channels to scan request based on reported
co-located APs, don't add channels that have only APs with
"non-transmitted" BSSes if they only match the wildcard SSID since
they will be found by probing the "transmitted" BSS.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20220202104617.f6ddf099f934.I231e55885d3644f292d00dfe0f42653269f2559e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tejun Heo <tj@kernel.org>
Date: Thu Jan 6 11:02:28 2022 -1000
cgroup: Use open-time credentials for process migraton perm checks
commit 1756d7994ad85c2479af6ae5a9750b92324685af upstream.
cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's credentials which is a
potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.
This patch makes both cgroup2 and cgroup1 process migration interfaces to
use the credentials saved at the time of open (file->f_cred) instead of
current's.
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Fixes: 187fe84067bd ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy")
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[OP: apply original __cgroup_procs_write() changes to cgroup_threads_write()
and cgroup_procs_write(), as the refactoring commit da70862efe006 ("cgroup:
cgroup.{procs,threads} factor out common parts") is not present in 5.10-stable]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Maxime Ripard <maxime@cerno.tech>
Date: Fri Feb 25 15:35:25 2022 +0100
clk: Enforce that disjoints limits are invalid
[ Upstream commit 10c46f2ea914202482d19cf80dcc9c321c9ff59b ]
If we were to have two users of the same clock, doing something like:
clk_set_rate_range(user1, 1000, 2000);
clk_set_rate_range(user2, 3000, 4000);
The second call would fail with -EINVAL, preventing from getting in a
situation where we end up with impossible limits.
However, this is never explicitly checked against and enforced, and
works by relying on an undocumented behaviour of clk_set_rate().
Indeed, on the first clk_set_rate_range will make sure the current clock
rate is within the new range, so it will be between 1000 and 2000Hz. On
the second clk_set_rate_range(), it will consider (rightfully), that our
current clock is outside of the 3000-4000Hz range, and will call
clk_core_set_rate_nolock() to set it to 3000Hz.
clk_core_set_rate_nolock() will then call clk_calc_new_rates() that will
eventually check that our rate 3000Hz rate is outside the min 3000Hz max
2000Hz range, will bail out, the error will propagate and we'll
eventually return -EINVAL.
This solely relies on the fact that clk_calc_new_rates(), and in
particular clk_core_determine_round_nolock(), won't modify the new rate
allowing the error to be reported. That assumption won't be true for all
drivers, and most importantly we'll break that assumption in a later
patch.
It can also be argued that we shouldn't even reach the point where we're
calling clk_core_set_rate_nolock().
Let's make an explicit check for disjoints range before we're doing
anything.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220225143534.405820-4-maxime@cerno.tech
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Adam Wujek <dev_public@wujek.eu>
Date: Fri Dec 3 14:12:07 2021 +0000
clk: si5341: fix reported clk_rate when output divider is 2
[ Upstream commit 2a8b539433e111c4de364237627ef219d2f6350a ]
SI5341_OUT_CFG_RDIV_FORCE2 shall be checked first to distinguish whether
a divider for a given output is set to 2 (SI5341_OUT_CFG_RDIV_FORCE2
is set) or the output is disabled (SI5341_OUT_CFG_RDIV_FORCE2 not set,
SI5341_OUT_R_REG is set 0).
Before the change, divider set to 2 (SI5341_OUT_R_REG set to 0) was
interpreted as output is disabled.
Signed-off-by: Adam Wujek <dev_public@wujek.eu>
Link: https://lore.kernel.org/r/20211203141125.2447520-1-dev_public@wujek.eu
Reviewed-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tony Lindgren <tony@atomide.com>
Date: Fri Feb 4 09:14:43 2022 +0200
clk: ti: Preserve node in ti_dt_clocks_register()
[ Upstream commit 80864594ff2ad002e2755daf97d46ff0c86faf1f ]
In preparation for making use of the clock-output-names, we want to
keep node around in ti_dt_clocks_register().
This change should not needed as a fix currently.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20220204071449.16762-3-tony@atomide.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jordy Zomer <jordy@jordyzomer.github.io>
Date: Sat Jan 29 15:58:39 2022 +0100
dm ioctl: prevent potential spectre v1 gadget
[ Upstream commit cd9c88da171a62c4b0f1c70e50c75845969fbc18 ]
It appears like cmd could be a Spectre v1 gadget as it's supplied by a
user and used as an array index. Prevent the contents of kernel memory
from being leaked to userspace via speculative execution by using
array_index_nospec.
Signed-off-by: Jordy Zomer <jordy@pwning.systems>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mike Snitzer <snitzer@redhat.com>
Date: Tue Feb 22 13:28:12 2022 -0500
dm: requeue IO if mapping table not yet available
[ Upstream commit fa247089de9936a46e290d4724cb5f0b845600f5 ]
Update both bio-based and request-based DM to requeue IO if the
mapping table not available.
This race of IO being submitted before the DM device ready is so
narrow, yet possible for initial table load given that the DM device's
request_queue is created prior, that it best to requeue IO to handle
this unlikely case.
Reported-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Vinod Koul <vkoul@kernel.org>
Date: Thu Mar 10 10:13:20 2022 +0530
dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error"
commit d143f939a95696d38ff800ada14402fa50ebbd6c upstream.
This reverts commit 455896c53d5b ("dmaengine: shdma: Fix runtime PM
imbalance on error") as the patch wrongly reduced the count on error and
did not bail out. So drop the count by reverting the patch .
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Miaoqian Lin <linmq006@gmail.com>
Date: Mon Apr 4 12:53:36 2022 +0000
dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe
[ Upstream commit 2b04bd4f03bba021959ca339314f6739710f0954 ]
This node pointer is returned by of_find_compatible_node() with
refcount incremented. Calling of_node_put() to aovid the refcount leak.
Fixes: d346c9e86d86 ("dpaa2-ptp: reuse ptp_qoriq driver")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220404125336.13427-1-linmq006@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Date: Wed Apr 6 21:04:43 2022 +0200
drbd: Fix five use after free bugs in get_initial_state
[ Upstream commit aadb22ba2f656581b2f733deb3a467c48cc618f6 ]
In get_initial_state, it calls notify_initial_state_done(skb,..) if
cb->args[5]==1. If genlmsg_put() failed in notify_initial_state_done(),
the skb will be freed by nlmsg_free(skb).
Then get_initial_state will goto out and the freed skb will be used by
return value skb->len, which is a uaf bug.
What's worse, the same problem goes even further: skb can also be
freed in the notify_*_state_change -> notify_*_state calls below.
Thus 4 additional uaf bugs happened.
My patch lets the problem callee functions: notify_initial_state_done
and notify_*_state_change return an error code if errors happen.
So that the error codes could be propagated and the uaf bugs can be avoid.
v2 reports a compilation warning. This v3 fixed this warning and built
successfully in my local environment with no additional warnings.
v2: https://lore.kernel.org/patchwork/patch/1435218/
Fixes: a29728463b254 ("drbd: Backport the "events2" command")
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Guilherme G. Piccoli <gpiccoli@igalia.com>
Date: Tue Mar 15 17:35:35 2022 -0300
Drivers: hv: vmbus: Fix potential crash on module unload
[ Upstream commit 792f232d57ff28bbd5f9c4abe0466b23d5879dc8 ]
The vmbus driver relies on the panic notifier infrastructure to perform
some operations when a panic event is detected. Since vmbus can be built
as module, it is required that the driver handles both registering and
unregistering such panic notifier callback.
After commit 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback")
though, the panic notifier registration is done unconditionally in the module
initialization routine whereas the unregistering procedure is conditionally
guarded and executes only if HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE capability
is set.
This patch fixes that by unconditionally unregistering the panic notifier
in the module's exit routine as well.
Fixes: 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220315203535.682306-1-gpiccoli@igalia.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Date: Mon Mar 28 17:44:57 2022 +0200
Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()
commit eaa03d34535872d29004cb5cf77dc9dec1ba9a25 upstream.
Following the recommendation in Documentation/memory-barriers.txt for
virtual machine guests.
Fixes: 8b6a877c060ed ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20220328154457.100872-1-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Xin Xiong <xiongx18@fudan.edu.cn>
Date: Fri Jan 21 15:46:23 2022 -0500
drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj
[ Upstream commit dfced44f122c500004a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence obj, which is bumped
earlier by amdgpu_cs_get_fence(). This may result in reference count
leaks.
Fix it by decreasing the refcount of specific object before returning
the error code.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Dale Zhao <dale.zhao@amd.com>
Date: Tue Dec 28 16:50:28 2021 +0800
drm/amd/display: Add signal type check when verify stream backends same
[ Upstream commit 047db281c026de5971cedb5bb486aa29bd16a39d ]
[Why]
For allow eDP hot-plug feature, the stream signal may change to VIRTUAL
when plug-out and back to eDP when plug-in. OS will still setPathMode
with same timing for each plugging, but eDP gets no stream update as we
don't check signal type changing back as keeping it VIRTUAL. It's also
unsafe for future cases that stream signal is switched with same timing.
[How]
Check stream signal type change include previous HDMI signal case.
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Dale Zhao <dale.zhao@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Fri Apr 1 11:08:48 2022 -0400
drm/amdgpu/smu10: fix SoC/fclk units in auto mode
commit 2f25d8ce09b7ba5d769c132ba3d4eb84a941d2cb upstream.
SMU takes clock limits in Mhz units. socclk and fclk were
using 10 khz units in some cases. Switch to Mhz units.
Fixes higher than required SoC clocks.
Fixes: 97cf32996c46d9 ("drm/amd/pm: Removed fixed clock in auto mode DPM")
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed Mar 16 11:41:48 2022 +0300
drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()
[ Upstream commit 1647b54ed55d4d48c7199d439f8834626576cbe9 ]
This post-op should be a pre-op so that we do not pass -1 as the bit
number to test_bit(). The current code will loop downwards from 63 to
-1. After changing to a pre-op, it loops from 63 to 0.
Fixes: 71c37505e7ea ("drm/amdgpu/gfx: move more common KIQ code to amdgpu_gfx.c")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Date: Thu Feb 3 21:18:21 2022 -0500
drm/amdgpu: Fix recursive locking warning
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]
Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.
[ +0.000003] WARNING: possible recursive locking detected
[ +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted
[ +0.000004] --------------------------------------------
[ +0.000002] python/4822 is trying to acquire lock:
[ +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000203]
but task is already holding lock:
[ +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[ +0.000017]
other info that might help us debug this:
[ +0.000002] Possible unsafe locking scenario:
[ +0.000003] CPU0
[ +0.000002] ----
[ +0.000002] lock(reservation_ww_class_mutex);
[ +0.000004] lock(reservation_ww_class_mutex);
[ +0.000003]
*** DEADLOCK ***
[ +0.000002] May be due to missing lock nesting notation
[ +0.000003] 7 locks held by python/4822:
[ +0.000003] #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at:
kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
[ +0.000232] #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
[ +0.000241] #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
[ +0.000236] #3: ffffb2b35606fd28
(reservation_ww_class_acquire){+.+.}-{0:0}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
[ +0.000235] #4: ffff932cbb7181f8
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[ +0.000015] #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at:
drm_dev_enter+0x5/0xa0 [drm]
[ +0.000038] #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3},
at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
[ +0.000195]
stack backtrace:
[ +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted
5.13.0-kfd-rajneesh #1030
[ +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02
08/29/2018
[ +0.000003] Call Trace:
[ +0.000003] dump_stack+0x6d/0x89
[ +0.000010] __lock_acquire+0xb93/0x1a90
[ +0.000009] lock_acquire+0x25d/0x2d0
[ +0.000005] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000184] ? lock_is_held_type+0xa2/0x110
[ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060
[ +0.000007] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000183] ? lock_release+0x13f/0x270
[ +0.000005] ? lock_is_held_type+0xa2/0x110
[ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm]
[ +0.000010] amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
[ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
[ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
[ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
[ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
[ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu]
[ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
[ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60
[amdgpu]
[ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
[ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu]
[ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
[ +0.000216] ? lock_release+0x13f/0x270
[ +0.000006] ? __fget_files+0x107/0x1e0
[ +0.000007] __x64_sys_ioctl+0x8b/0xd0
[ +0.000007] do_syscall_64+0x36/0x70
[ +0.000004] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ +0.000007] RIP: 0033:0x7fbff90a7317
[ +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[ +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX:
00007fbff90a7317
[ +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI:
0000000000000004
[ +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09:
00007fbcc402d880
[ +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12:
00000000c0184b18
[ +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15:
00007fbcc402d820
Cc: Christian König <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lee Jones <lee.jones@linaro.org>
Date: Thu Mar 31 13:21:17 2022 +0100
drm/amdkfd: Create file descriptor after client is added to smi_clients list
commit e79a2398e1b2d47060474dca291542368183bc0f upstream.
This ensures userspace cannot prematurely clean-up the client before
it is fully initialised which has been proven to cause issues in the
past.
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Fri Feb 18 15:40:12 2022 -0500
drm/amdkfd: make CRAT table missing message informational only
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: José Expósito <jose.exposito89@gmail.com>
Date: Sat Jan 8 17:52:30 2022 +0100
drm/imx: Fix memory leak in imx_pd_connector_get_modes
[ Upstream commit bce81feb03a20fca7bbdd1c4af16b4e9d5c0e1d3 ]
Avoid leaking the display mode variable if of_get_drm_display_mode
fails.
Fixes: 76ecd9c9fb24 ("drm/imx: parallel-display: check return code from of_get_drm_display_mode()")
Addresses-Coverity-ID: 1443943 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/r/20220108165230.44610-1-jose.exposito89@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Date: Wed Jan 5 15:47:29 2022 +0800
drm/imx: imx-ldb: Check for null pointer after calling kmemdup
[ Upstream commit 8027a9ad9b3568c5eb49c968ad6c97f279d76730 ]
As the possible failure of the allocation, kmemdup() may return NULL
pointer.
Therefore, it should be better to check the return value of kmemdup()
and return error if fails.
Fixes: dc80d7038883 ("drm/imx-ldb: Add support to drm-bridge")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/r/20220105074729.2363657-1-jiasheng@iscas.ac.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Karol Herbst <kherbst@redhat.com>
Date: Tue Mar 22 13:48:00 2022 +0100
drm/nouveau/pmu: Add missing callbacks for Tegra devices
commit 38d4e5cf5b08798f093374e53c2f4609d5382dd5 upstream.
Fixes a crash booting on those platforms with nouveau.
Fixes: 4cdd2450bf73 ("drm/nouveau/pmu/gm200-: use alternate falcon reset sequence")
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.17+
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220322124800.2605463-1-kherbst@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Anisse Astier <anisse@astier.eu>
Date: Wed Dec 29 23:22:00 2021 +0100
drm: Add orientation quirk for GPD Win Max
[ Upstream commit 0b464ca3e0dd3cec65f28bc6d396d82f19080f69 ]
Panel is 800x1280, but mounted on a laptop form factor, sideways.
Signed-off-by: Anisse Astier <anisse@astier.eu>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211229222200.53128-3-anisse@astier.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Andreas Gruenbacher <agruenba@redhat.com>
Date: Wed Oct 21 16:37:54 2020 +0200
gfs2: Check for active reservation in gfs2_release
[ Upstream commit 0ec9b9ea4f83303bfd8f052a3d8b2bd179b002e1 ]
In gfs2_release, check if the inode has an active reservation to avoid
unnecessary lock taking.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Bob Peterson <rpeterso@redhat.com>
Date: Tue Jan 18 09:30:18 2022 -0500
gfs2: Fix gfs2_release for non-writers regression
[ Upstream commit d3add1a9519dcacd6e644ecac741c56cf18b67f5 ]
When a file is opened for writing, the vfs code (do_dentry_open)
calls get_write_access for the inode, thus incrementing the inode's write
count. That writer normally then creates a multi-block reservation for
the inode (i_res) that can be re-used by other writers, which speeds up
writes for applications that stupidly loop on open/write/close.
When the writes are all done, the multi-block reservation should be
deleted when the file is closed by the last "writer."
Commit 0ec9b9ea4f83 broke that concept when it moved the call to
gfs2_rs_delete before the check for FMODE_WRITE. Non-writers have no
business removing the multi-block reservations of writers. In fact, if
someone opens and closes the file for RO while a writer has a
multi-block reservation, the RO closer will delete the reservation
midway through the write, and this results in:
kernel BUG at fs/gfs2/rgrp.c:677! (or thereabouts) which is:
BUG_ON(rs->rs_requested); from function gfs2_rs_deltree.
This patch moves the check back inside the check for FMODE_WRITE.
Fixes: 0ec9b9ea4f83 ("gfs2: Check for active reservation in gfs2_release")
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Andreas Gruenbacher <agruenba@redhat.com>
Date: Fri Dec 10 14:43:36 2021 +0100
gfs2: gfs2_setattr_size error path fix
[ Upstream commit 7336905a89f19173bf9301cd50a24421162f417c ]
When gfs2_setattr_size() fails, it calls gfs2_rs_delete(ip, NULL) to get
rid of any reservations the inode may have. Instead, it should pass in
the inode's write count as the second parameter to allow
gfs2_rs_delete() to figure out if the inode has any writers left.
In a next step, there are two instances of gfs2_rs_delete(ip, NULL) left
where we know that there can be no other users of the inode. Replace
those with gfs2_rs_deltree(&ip->i_res) to avoid the unnecessary write
count check.
With that, gfs2_rs_delete() is only called with the inode's actual write
count, so get rid of the second parameter.
Fixes: a097dc7e24cb ("GFS2: Make rgrp reservations part of the gfs2_inode structure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Shreeya Patel <shreeya.patel@collabora.com>
Date: Mon Mar 21 19:02:41 2022 +0530
gpio: Restrict usage of GPIO chip irq members before initialization
commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320 upstream.
GPIO chip irq members are exposed before they could be completely
initialized and this leads to race conditions.
One such issue was observed for the gc->irq.domain variable which
was accessed through the I2C interface in gpiochip_to_irq() before
it could be initialized by gpiochip_add_irqchip(). This resulted in
Kernel NULL pointer dereference.
Following are the logs for reference :-
kernel: Call Trace:
kernel: gpiod_to_irq+0x53/0x70
kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0
kernel: i2c_acpi_get_irq+0xc0/0xd0
kernel: i2c_device_probe+0x28a/0x2a0
kernel: really_probe+0xf2/0x460
kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0
To avoid such scenarios, restrict usage of GPIO chip irq members before
they are completely initialized.
Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Niels Dossche <dossche.niels@gmail.com>
Date: Mon Feb 28 17:53:30 2022 +0100
IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition
[ Upstream commit 4d809f69695d4e7d1378b3a072fa9aef23123018 ]
The documentation of the function rvt_error_qp says both r_lock and s_lock
need to be held when calling that function. It also asserts using lockdep
that both of those locks are held. However, the commit I referenced in
Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no
longer covered by r_lock. This results in the lockdep assertion failing
and also possibly in a race condition.
Fixes: d757c60eca9b ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error")
Link: https://lore.kernel.org/r/20220228165330.41546-1-dossche.niels@gmail.com
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ivan Vecera <ivecera@redhat.com>
Date: Thu Mar 31 09:20:06 2022 -0700
ice: Clear default forwarding VSI during VSI release
[ Upstream commit bd8c624c0cd59de0032752ba3001c107bba97f7b ]
VSI is set as default forwarding one when promisc mode is set for
PF interface, when PF is switched to switchdev mode or when VF
driver asks to enable allmulticast or promisc mode for the VF
interface (when vf-true-promisc-support priv flag is off).
The third case is buggy because in that case VSI associated with
VF remains as default one after VF removal.
Reproducer:
1. Create VF
echo 1 > sys/class/net/ens7f0/device/sriov_numvfs
2. Enable allmulticast or promisc mode on VF
ip link set ens7f0v0 allmulticast on
ip link set ens7f0v0 promisc on
3. Delete VF
echo 0 > sys/class/net/ens7f0/device/sriov_numvfs
4. Try to enable promisc mode on PF
ip link set ens7f0 promisc on
Although it looks that promisc mode on PF is enabled the opposite
is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC
handling first checks if any other VSI is set as default forwarding
one and if so the function does not do anything. At this point
it is not possible to enable promisc mode on PF without re-probe
device.
To resolve the issue this patch clear default forwarding VSI
during ice_vsi_release() when the VSI to be released is the default
one.
Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Date: Mon Apr 4 11:35:48 2022 -0700
ice: Do not skip not enabled queues in ice_vc_dis_qs_msg
[ Upstream commit 05ef6813b234db3196f083b91db3963f040b65bb ]
Disable check for queue being enabled in ice_vc_dis_qs_msg, because
there could be a case when queues were created, but were not enabled.
We still need to delete those queues.
Normal workflow for VF looks like:
Enable path:
VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10)
VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6)
VIRTCHNL_OP_ENABLE_QUEUES (opcode 8)
Disable path:
VIRTCHNL_OP_DISABLE_QUEUES (opcode 9)
VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11)
The issue appears only in stress conditions when VF is enabled and
disabled very fast.
Eventually there will be a case, when queues are created by
VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by
VIRTCHNL_OP_ENABLE_QUEUES.
In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES,
because there is a check whether queues are enabled in
ice_vc_dis_qs_msg.
When we bring up the VF again, we will see the "Failed to set LAN Tx queue
context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This
happens because old 16 queues were not deleted and VF requests to create
16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to
find a parent node for first newly requested queue (because all nodes
are allocated to 16 old queues).
Testing Hints:
Just enable and disable VF fast enough, so it would be disabled before
reaching VIRTCHNL_OP_ENABLE_QUEUES.
while true; do
ip link set dev ens785f0v0 up
sleep 0.065 # adjust delay value for you machine
ip link set dev ens785f0v0 down
done
Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Date: Mon Apr 4 11:35:47 2022 -0700
ice: Set txq_teid to ICE_INVAL_TEID on ring creation
[ Upstream commit ccfee1822042b87e5135d33cad8ea353e64612d2 ]
When VF is freshly created, but not brought up, ring->txq_teid
value is by default set to 0.
But 0 is a valid TEID. On some platforms the Root Node of
Tx scheduler has a TEID = 0. This can cause issues as shown below.
The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).
Testing Hints:
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
ip link set dev ens785f0v0 up
ip link set dev ens785f0v0 down
If we have freshly created VF and quickly turn it on and off, so there
would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then
VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error:
[ 639.531454] disable queue 89 failed 14
[ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
[ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5
The reason for the fail is that we are trying to send AQ command to
delete queue 89, which has never been created and receive an "invalid
argument" error from firmware.
As this queue has never been created, it's teid and ring->txq_teid
have default value 0.
ice_dis_vsi_txq has a check against non-existent queues:
node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
if (!node)
continue;
But on some platforms the Root Node of Tx scheduler has a teid = 0.
Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is
pi->root), and we go further to submit an erroneous request to firmware.
Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Thu Mar 17 19:36:27 2022 +0100
ice: synchronize_rcu() when terminating rings
[ Upstream commit f9124c68f05ffdb87a47e3ea6d5fae9dad7cb6eb ]
Unfortunately, the ice driver doesn't respect the RCU critical section that
XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to
paths that destroy resources that might be in use.
This was addressed in other AF_XDP ZC enabled drivers, for reference see
for example commit b3873a5be757 ("net/i40e: Fix concurrency issues
between config flow and XSK")
Fixes: efc2214b6047 ("ice: Add support for XDP")
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Randy Dunlap <rdunlap@infradead.org>
Date: Wed Mar 23 16:06:14 2022 -0700
init/main.c: return 1 from handled __setup() functions
[ Upstream commit f9a40b0890658330c83c95511f9d6b396610defc ]
initcall_blacklist() should return 1 to indicate that it handled its
cmdline arguments.
set_debug_rodata() should return 1 to indicate that it handled its
cmdline arguments. Print a warning if the option string is invalid.
This prevents these strings from being added to the 'init' program's
environment as they are not init arguments/parameters.
Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Wed Apr 6 12:43:58 2022 +0100
io_uring: don't touch scm_fp_list after queueing skb
[ Upstream commit a07211e3001435fe8591b992464cd8d5e3c98c5a ]
It's safer to not touch scm_fp_list after we queued an skb to which it
was assigned, there might be races lurking if we screw subtle sync
guarantees on the io_uring side.
Fixes: 6b06314c47e14 ("io_uring: add file set registration")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jens Axboe <axboe@kernel.dk>
Date: Fri Apr 8 11:08:58 2022 -0600
io_uring: fix race between timeout flush and removal
commit e677edbcabee849bfdd43f1602bccbecf736a646 upstream.
io_flush_timeouts() assumes the timeout isn't in progress of triggering
or being removed/canceled, so it unconditionally removes it from the
timeout list and attempts to cancel it.
Leave it on the list and let the normal timeout cancelation take care
of it.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Zhou Guanghui <zhouguanghui1@huawei.com>
Date: Wed Jan 19 07:07:54 2022 +0000
iommu/arm-smmu-v3: fix event handling soft lockup
[ Upstream commit 30de2b541af98179780054836b48825fcfba4408 ]
During event processing, events are read from the event queue one
by one until the queue is empty.If the master device continuously
requests address access at the same time and the SMMU generates
events, the cyclic processing of the event takes a long time and
softlockup warnings may be reported.
arm-smmu-v3 arm-smmu-v3.34.auto: event 0x0a received:
arm-smmu-v3 arm-smmu-v3.34.auto: 0x00007f220000280a
arm-smmu-v3 arm-smmu-v3.34.auto: 0x000010000000007e
arm-smmu-v3 arm-smmu-v3.34.auto: 0x00000000034e8670
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [irq/268-arm-smm:247]
Call trace:
_dev_info+0x7c/0xa0
arm_smmu_evtq_thread+0x1c0/0x230
irq_thread_fn+0x30/0x80
irq_thread+0x128/0x210
kthread+0x134/0x138
ret_from_fork+0x10/0x1c
Kernel panic - not syncing: softlockup: hung tasks
Fix this by calling cond_resched() after the event information is
printed.
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Link: https://lore.kernel.org/r/20220119070754.26528-1-zhouguanghui1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Tony Lindgren <tony@atomide.com>
Date: Thu Mar 31 09:23:01 2022 +0300
iommu/omap: Fix regression in probe for NULL pointer dereference
[ Upstream commit 71ff461c3f41f6465434b9e980c01782763e7ad8 ]
Commit 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") started
triggering a NULL pointer dereference for some omap variants:
__iommu_probe_device from probe_iommu_group+0x2c/0x38
probe_iommu_group from bus_for_each_dev+0x74/0xbc
bus_for_each_dev from bus_iommu_probe+0x34/0x2e8
bus_iommu_probe from bus_set_iommu+0x80/0xc8
bus_set_iommu from omap_iommu_init+0x88/0xcc
omap_iommu_init from do_one_initcall+0x44/0x24
This is caused by omap iommu probe returning 0 instead of ERR_PTR(-ENODEV)
as noted by Jason Gunthorpe <jgg@ziepe.ca>.
Looks like the regression already happened with an earlier commit
6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs")
that changed the function return type and missed converting one place.
Cc: Drew Fustini <dfustini@baylibre.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Suman Anna <s-anna@ti.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Fixes: 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs")
Fixes: 3f6634d997db ("iommu: Use right way to retrieve iommu_ops")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-by: Drew Fustini <dfustini@baylibre.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220331062301.24269-1-tony@atomide.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ido Schimmel <idosch@nvidia.com>
Date: Sat Feb 19 17:45:19 2022 +0200
ipv4: Invalidate neighbour for broadcast address upon address addition
[ Upstream commit 0c51e12e218f20b7d976158fdc18019627326f7a ]
In case user space sends a packet destined to a broadcast address when a
matching broadcast route is not configured, the kernel will create a
unicast neighbour entry that will never be resolved [1].
When the broadcast route is configured, the unicast neighbour entry will
not be invalidated and continue to linger, resulting in packets being
dropped.
Solve this by invalidating unresolved neighbour entries for broadcast
addresses after routes for these addresses are internally configured by
the kernel. This allows the kernel to create a broadcast neighbour entry
following the next route lookup.
Another possible solution that is more generic but also more complex is
to have the ARP code register a listener to the FIB notification chain
and invalidate matching neighbour entries upon the addition of broadcast
routes.
It is also possible to wave off the issue as a user space problem, but
it seems a bit excessive to expect user space to be that intimately
familiar with the inner workings of the FIB/neighbour kernel code.
[1] https://lore.kernel.org/netdev/55a04a8f-56f3-f73c-2aea-2195923f09d1@huawei.com/
Reported-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: David Ahern <dsahern@kernel.org>
Date: Mon Apr 4 09:09:08 2022 -0600
ipv6: Fix stats accounting in ip6_pkt_drop
[ Upstream commit 1158f79f82d437093aeed87d57df0548bdd68146 ]
VRF devices are the loopbacks for VRFs, and a loopback can not be
assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should
be '||' not '&&'.
Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach")
Reported-by: Pudak, Filip <Filip.Pudak@windriver.com>
Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Eric Dumazet <edumazet@google.com>
Date: Fri Feb 4 12:15:45 2022 -0800
ipv6: make mc_forwarding atomic
[ Upstream commit 145c7a793838add5e004e7d49a67654dc7eba147 ]
This fixes minor data-races in ip6_mc_input() and
batadv_mcast_mla_rtr_flags_softif_get_ipv6()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Andre Przywara <andre.przywara@arm.com>
Date: Mon Apr 4 12:08:42 2022 +0100
irqchip/gic, gic-v3: Prevent GSI to SGI translations
commit 544808f7e21cb9ccdb8f3aa7de594c05b1419061 upstream.
At the moment the GIC IRQ domain translation routine happily converts
ACPI table GSI numbers below 16 to GIC SGIs (Software Generated
Interrupts aka IPIs). On the Devicetree side we explicitly forbid this
translation, actually the function will never return HWIRQs below 16 when
using a DT based domain translation.
We expect SGIs to be handled in the first part of the function, and any
further occurrence should be treated as a firmware bug, so add a check
and print to report this explicitly and avoid lengthy debug sessions.
Fixes: 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard interrupts")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220404110842.2882446-1-andre.przywara@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Marc Zyngier <maz@kernel.org>
Date: Tue Mar 15 16:50:32 2022 +0000
irqchip/gic-v3: Fix GICR_CTLR.RWP polling
commit 0df6664531a12cdd8fc873f0cac0dcb40243d3e9 upstream.
It turns out that our polling of RWP is totally wrong when checking
for it in the redistributors, as we test the *distributor* bit index,
whereas it is a different bit number in the RDs... Oopsie boo.
This is embarassing. Not only because it is wrong, but also because
it took *8 years* to notice the blunder...
Just fix the damn thing.
Fixes: 021f653791ad ("irqchip: gic-v3: Initial support for GICv3")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20220315165034.794482-2-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ilan Peer <ilan.peer@intel.com>
Date: Fri Feb 4 12:25:00 2022 +0200
iwlwifi: mvm: Correctly set fragmented EBS
[ Upstream commit d8d4dd26b9e0469baf5017f0544d852fd4e3fb6d ]
Currently, fragmented EBS was set for a channel only if the 'hb_type'
was set to fragmented or balanced scan. However, 'hb_type' is set only
in case of CDB, and thus fragmented EBS is never set for a channel for
non-CDB devices. Fix it.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20220204122220.a6165ac9b9d5.I654eafa62fd647030ae6d4f07f32c96c3171decb@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Haimin Zhang <tcs_kernel@tencent.com>
Date: Tue Mar 22 21:59:17 2022 +0800
jfs: prevent NULL deref in diFree
[ Upstream commit a53046291020ec41e09181396c1e829287b48d47 ]
Add validation check for JFS_IP(ipimap)->i_imap to prevent a NULL deref
in diFree since diFree uses it without do any validations.
When function jfs_mount calls diMount to initialize fileset inode
allocation map, it can fail and JFS_IP(ipimap)->i_imap won't be
initialized. Then it calls diFreeSpecial to close fileset inode allocation
map inode and it will flow into jfs_evict_inode. Function jfs_evict_inode
just validates JFS_SBI(inode->i_sb)->ipimap, then calls diFree. diFree use
JFS_IP(ipimap)->i_imap directly, then it will cause a NULL deref.
Reported-by: TCS Robot <tcs_robot@tencent.com>
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hou Wenlong <houwenlong.hwl@antgroup.com>
Date: Wed Mar 2 21:15:14 2022 +0800
KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
[ Upstream commit a836839cbfe60dc434c5476a7429cf2bae36415d ]
When RDTSCP is supported but RDPID is not supported in host,
RDPID emulation is available. However, __kvm_get_msr() would
only fail when RDTSCP/RDPID both are disabled in guest, so
the emulator wouldn't inject a #UD when RDPID is disabled but
RDTSCP is enabled in guest.
Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jim Mattson <jmattson@google.com>
Date: Sat Feb 26 15:41:31 2022 -0800
KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
[ Upstream commit 9b026073db2f1ad0e4d8b61c83316c8497981037 ]
AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
reserved bits are cleared, and some are not. Specifically, on
Zen3/Milan, bits 19 and 42 are not cleared.
When emulating such a WRMSR, KVM should not synthesize a #GP,
regardless of which bits are set. However, undocumented bits should
not be passed through to the hardware MSR. So, rather than checking
for reserved bits and synthesizing a #GP, just clear the reserved
bits.
This may seem pedantic, but since KVM currently does not support the
"Host/Guest Only" bits (41:40), it is necessary to clear these bits
rather than synthesizing #GP, because some popular guests (e.g Linux)
will set the "Host Only" bit even on CPUs that don't support
EFER.SVME, and they don't expect a #GP.
For example,
root@Ubuntu1804:~# perf stat -e r26 -a sleep 1
Performance counter stats for 'system wide':
0 r26
1.001070977 seconds time elapsed
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace:
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90
Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reported-by: Lotus Fenn <lotusf@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Like Xu <likexu@tencent.com>
Reviewed-by: David Dunn <daviddunn@google.com>
Message-Id: <20220226234131.2167175-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yonghong Song <yhs@fb.com>
Date: Fri Feb 4 13:43:55 2022 -0800
libbpf: Fix build issue with llvm-readelf
[ Upstream commit 0908a66ad1124c1634c33847ac662106f7f2c198 ]
There are cases where clang compiler is packaged in a way
readelf is a symbolic link to llvm-readelf. In such cases,
llvm-readelf will be used instead of default binutils readelf,
and the following error will appear during libbpf build:
# Warning: Num of global symbols in
# /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/sharedobjs/libbpf-in.o (367)
# does NOT match with num of versioned symbols in
# /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf.so libbpf.map (383).
# Please make sure all LIBBPF_API symbols are versioned in libbpf.map.
# --- /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_global_syms.tmp ...
# +++ /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_versioned_syms.tmp ...
# @@ -324,6 +324,22 @@
# btf__str_by_offset
# btf__type_by_id
# btf__type_cnt
# +LIBBPF_0.0.1
# +LIBBPF_0.0.2
# +LIBBPF_0.0.3
# +LIBBPF_0.0.4
# +LIBBPF_0.0.5
# +LIBBPF_0.0.6
# +LIBBPF_0.0.7
# +LIBBPF_0.0.8
# +LIBBPF_0.0.9
# +LIBBPF_0.1.0
# +LIBBPF_0.2.0
# +LIBBPF_0.3.0
# +LIBBPF_0.4.0
# +LIBBPF_0.5.0
# +LIBBPF_0.6.0
# +LIBBPF_0.7.0
# libbpf_attach_type_by_name
# libbpf_find_kernel_btf
# libbpf_find_vmlinux_btf_id
# make[2]: *** [Makefile:184: check_abi] Error 1
# make[1]: *** [Makefile:140: all] Error 2
The above failure is due to different printouts for some ABS
versioned symbols. For example, with the same libbpf.so,
$ /bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS
134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0
202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0
...
$ /opt/llvm/bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS
134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0@@LIBBPF_0.5.0
202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0@@LIBBPF_0.6.0
...
The binutils readelf doesn't print out the symbol LIBBPF_* version and llvm-readelf does.
Such a difference caused libbpf build failure with llvm-readelf.
The proposed fix filters out all ABS symbols as they are not part of the comparison.
This works for both binutils readelf and llvm-readelf.
Reported-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220204214355.502108-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Wed Apr 13 21:01:11 2022 +0200
Linux 5.10.111
Link: https://lore.kernel.org/r/20220412062927.870347203@linuxfoundation.org
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Fox Chen <foxhlchen@gmail.com>
Link: https://lore.kernel.org/r/20220412173819.234884577@linuxfoundation.org
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Hulk Robot <hulkrobot@huawei.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Author: Guo Xuenan <guoxuenan@huawei.com>
Date: Fri Apr 8 13:08:58 2022 -0700
lz4: fix LZ4_decompress_safe_partial read out of bound
commit eafc0a02391b7b36617b36c97c4b5d6832cf5e24 upstream.
When partialDecoding, it is EOF if we've either filled the output buffer
or can't proceed with reading an offset for following match.
In some extreme corner cases when compressed data is suitably corrupted,
UAF will occur. As reported by KASAN [1], LZ4_decompress_safe_partial
may lead to read out of bound problem during decoding. lz4 upstream has
fixed it [2] and this issue has been disscussed here [3] before.
current decompression routine was ported from lz4 v1.8.3, bumping
lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd
better fix it first.
[1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/
[2] https://github.com/lz4/lz4/commit/c5d6f8a8be3927c0bec91bcc58667a6cfad244ad#
[3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/
Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com
Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <cyan@fb.com>
Cc: Chengyang Fan <cy.fan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sven Eckelmann <sven@narfation.org>
Date: Mon Feb 28 01:32:40 2022 +0100
macvtap: advertise link netns via netlink
[ Upstream commit a02192151b7dbf855084c38dca380d77c7658353 ]
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages. This fixes iproute2 which otherwise resolved
the link interface to an interface in the wrong namespace.
Test commands:
ip netns add nst
ip link add dummy0 type dummy
ip link add link macvtap0 link dummy0 type macvtap
ip link set macvtap0 netns nst
ip -netns nst link show macvtap0
Before:
10: macvtap0@gre0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff
After:
10: macvtap0@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Reported-by: Leonardo Mörlein <freifunk@irrelefant.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Link: https://lore.kernel.org/r/20220228003240.1337426-1-sven@narfation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Qinghua Jin <qhjin.dev@gmail.com>
Date: Wed Mar 23 16:06:23 2022 -0700
minix: fix bug when opening a file with O_DIRECT
[ Upstream commit 9ce3c0d26c42d279b6c378a03cd6a61d828f19ca ]
Testcase:
1. create a minix file system and mount it
2. open a file on the file system with O_RDWR|O_CREAT|O_TRUNC|O_DIRECT
3. open fails with -EINVAL but leaves an empty file behind. All other
open() failures don't leave the failed open files behind.
It is hard to check the direct_IO op before creating the inode. Just as
ext4 and btrfs do, this patch will resolve the issue by allowing to
create the file with O_DIRECT but returning error when writing the file.
Link: https://lkml.kernel.org/r/20220107133626.413379-1-qhjin.dev@gmail.com
Signed-off-by: Qinghua Jin <qhjin.dev@gmail.com>
Reported-by: Colin Ian King <colin.king@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Alexander Lobakin <alobakin@pm.me>
Date: Wed Feb 23 01:30:23 2022 +0000
MIPS: fix fortify panic when copying asm exception handlers
[ Upstream commit d17b66417308996e7e64b270a3c7f3c1fbd4cfc8 ]
With KCFLAGS="-O3", I was able to trigger a fortify-source
memcpy() overflow panic on set_vi_srs_handler().
Although O3 level is not supported in the mainline, under some
conditions that may've happened with any optimization settings,
it's just a matter of inlining luck. The panic itself is correct,
more precisely, 50/50 false-positive and not at the same time.
From the one side, no real overflow happens. Exception handler
defined in asm just gets copied to some reserved places in the
memory.
But the reason behind is that C code refers to that exception
handler declares it as `char`, i.e. something of 1 byte length.
It's obvious that the asm function itself is way more than 1 byte,
so fortify logics thought we are going to past the symbol declared.
The standard way to refer to asm symbols from C code which is not
supposed to be called from C is to declare them as
`extern const u8[]`. This is fully correct from any point of view,
as any code itself is just a bunch of bytes (including 0 as it is
for syms like _stext/_etext/etc.), and the exact size is not known
at the moment of compilation.
Adjust the type of the except_vec_vi_*() and related variables.
Make set_handler() take `const` as a second argument to avoid
cast-away warnings and give a little more room for optimization.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Krzysztof Kozlowski <krzk@kernel.org>
Date: Thu Mar 17 12:52:59 2022 +0100
MIPS: ingenic: correct unit node address
[ Upstream commit 8931ddd8d6a55fcefb20f44a38ba42bb746f0b62 ]
Unit node addresses should not have leading 0x:
Warning (unit_address_format): /nemc@13410000/efuse@d0/eth-mac-addr@0x22: unit name should not have leading "0x"
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Reviewed-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hangyu Hua <hbh25y@gmail.com>
Date: Mon Feb 28 15:35:37 2022 +0800
mips: ralink: fix a refcount leak in ill_acc_of_setup()
[ Upstream commit 4a0a1436053b17e50b7c88858fb0824326641793 ]
of_node_put(np) needs to be called when pdev == NULL.
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Miaohe Lin <linmiaohe@huawei.com>
Date: Fri Apr 8 13:09:07 2022 -0700
mm/mempolicy: fix mpol_new leak in shared_policy_replace
commit 4ad099559b00ac01c3726e5c95dc3108ef47d03e upstream.
If mpol_new is allocated but not used in restart loop, mpol_new will be
freed via mpol_put before returning to the caller. But refcnt is not
initialized yet, so mpol_put could not do the right things and might
leak the unused mpol_new. This would happen if mempolicy was updated on
the shared shmem file while the sp->lock has been dropped during the
memory allocation.
This issue could be triggered easily with the below code snippet if
there are many processes doing the below work at the same time:
shmid = shmget((key_t)5566, 1024 * PAGE_SIZE, 0666|IPC_CREAT);
shm = shmat(shmid, 0, 0);
loop many times {
mbind(shm, 1024 * PAGE_SIZE, MPOL_LOCAL, mask, maxnode, 0);
mbind(shm + 128 * PAGE_SIZE, 128 * PAGE_SIZE, MPOL_DEFAULT, mask,
maxnode, 0);
}
Link: https://lkml.kernel.org/r/20220329111416.27954-1-linmiaohe@huawei.com
Fixes: 42288fe366c4 ("mm: mempolicy: Convert shared_policy mutex to spinlock")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [3.8]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Waiman Long <longman@redhat.com>
Date: Fri Apr 8 13:09:01 2022 -0700
mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning
commit a431dbbc540532b7465eae4fc8b56a85a9fc7d17 upstream.
The gcc 12 compiler reports a "'mem_section' will never be NULL" warning
on the following code:
static inline struct mem_section *__nr_to_section(unsigned long nr)
{
#ifdef CONFIG_SPARSEMEM_EXTREME
if (!mem_section)
return NULL;
#endif
if (!mem_section[SECTION_NR_TO_ROOT(nr)])
return NULL;
:
It happens with CONFIG_SPARSEMEM_EXTREME off. The mem_section definition
is
#ifdef CONFIG_SPARSEMEM_EXTREME
extern struct mem_section **mem_section;
#else
extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
#endif
In the !CONFIG_SPARSEMEM_EXTREME case, mem_section is a static
2-dimensional array and so the check "!mem_section[SECTION_NR_TO_ROOT(nr)]"
doesn't make sense.
Fix this warning by moving the "!mem_section[SECTION_NR_TO_ROOT(nr)]"
check up inside the CONFIG_SPARSEMEM_EXTREME block and adding an
explicit NR_SECTION_ROOTS check to make sure that there is no
out-of-bound array access.
Link: https://lkml.kernel.org/r/20220331180246.2746210-1-longman@redhat.com
Fixes: 3e347261a80b ("sparsemem extreme implementation")
Signed-off-by: Waiman Long <longman@redhat.com>
Reported-by: Justin Forbes <jforbes@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Peter Xu <peterx@redhat.com>
Date: Tue Mar 22 14:42:15 2022 -0700
mm: don't skip swap entry even if zap_details specified
commit 5abfd71d936a8aefd9f9ccd299dea7a164a5d455 upstream.
Patch series "mm: Rework zap ptes on swap entries", v5.
Patch 1 should fix a long standing bug for zap_pte_range() on
zap_details usage. The risk is we could have some swap entries skipped
while we should have zapped them.
Migration entries are not the major concern because file backed memory
always zap in the pattern that "first time without page lock, then
re-zap with page lock" hence the 2nd zap will always make sure all
migration entries are already recovered.
However there can be issues with real swap entries got skipped
errornoously. There's a reproducer provided in commit message of patch
1 for that.
Patch 2-4 are cleanups that are based on patch 1. After the whole
patchset applied, we should have a very clean view of zap_pte_range().
Only patch 1 needs to be backported to stable if necessary.
This patch (of 4):
The "details" pointer shouldn't be the token to decide whether we should
skip swap entries.
For example, when the callers specified details->zap_mapping==NULL, it
means the user wants to zap all the pages (including COWed pages), then
we need to look into swap entries because there can be private COWed
pages that was swapped out.
Skipping some swap entries when details is non-NULL may lead to wrongly
leaving some of the swap entries while we should have zapped them.
A reproducer of the problem:
===8<===
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
int page_size;
int shmem_fd;
char *buffer;
void main(void)
{
int ret;
char val;
page_size = getpagesize();
shmem_fd = memfd_create("test", 0);
assert(shmem_fd >= 0);
ret = ftruncate(shmem_fd, page_size * 2);
assert(ret == 0);
buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
MAP_PRIVATE, shmem_fd, 0);
assert(buffer != MAP_FAILED);
/* Write private page, swap it out */
buffer[page_size] = 1;
madvise(buffer, page_size * 2, MADV_PAGEOUT);
/* This should drop private buffer[page_size] already */
ret = ftruncate(shmem_fd, page_size);
assert(ret == 0);
/* Recover the size */
ret = ftruncate(shmem_fd, page_size * 2);
assert(ret == 0);
/* Re-read the data, it should be all zero */
val = buffer[page_size];
if (val == 0)
printf("Good\n");
else
printf("BUG\n");
}
===8<===
We don't need to touch up the pmd path, because pmd never had a issue with
swap entries. For example, shmem pmd migration will always be split into
pte level, and same to swapping on anonymous.
Add another helper should_zap_cows() so that we can also check whether we
should zap private mappings when there's no page pointer specified.
This patch drops that trick, so we handle swap ptes coherently. Meanwhile
we should do the same check upon migration entry, hwpoison entry and
genuine swap entries too.
To be explicit, we should still remember to keep the private entries if
even_cows==false, and always zap them when even_cows==true.
The issue seems to exist starting from the initial commit of git.
[peterx@redhat.com: comment tweaks]
Link: https://lkml.kernel.org/r/20220217060746.71256-2-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220217060746.71256-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220216094810.60572-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220216094810.60572-2-peterx@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mauricio Faria de Oliveira <mfo@canonical.com>
Date: Thu Apr 7 16:14:28 2022 -0300
mm: fix race between MADV_FREE reclaim and blkdev direct IO read
commit 6c8e2a256915a223f6289f651d6b926cd7135c9e upstream.
Problem:
=======
Userspace might read the zero-page instead of actual data from a direct IO
read on a block device if the buffers have been called madvise(MADV_FREE)
on earlier (this is discussed below) due to a race between page reclaim on
MADV_FREE and blkdev direct IO read.
- Race condition:
==============
During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks
if the page is not dirty, then discards its rmap PTE(s) (vs. remap back
if the page is dirty).
However, after try_to_unmap_one() returns to shrink_page_list(), it might
keep the page _anyway_ if page_ref_freeze() fails (it expects exactly
_one_ page reference, from the isolation for page reclaim).
Well, blkdev_direct_IO() gets references for all pages, and on READ
operations it only sets them dirty _later_.
So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct
IO read from block devices, and page reclaim happens during
__blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages()
returns, but BEFORE the pages are set dirty, the situation happens.
The direct IO read eventually completes. Now, when userspace reads the
buffers, the PTE is no longer there and the page fault handler
do_anonymous_page() services that with the zero-page, NOT the data!
A synthetic reproducer is provided.
- Page faults:
===========
If page reclaim happens BEFORE bio_iov_iter_get_pages() the issue doesn't
happen, because that faults-in all pages as writeable, so
do_anonymous_page() sets up a new page/rmap/PTE, and that is used by
direct IO. The userspace reads don't fault as the PTE is there (thus
zero-page is not used/setup).
But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE
is no longer there; the subsequent page faults can't help:
The data-read from the block device probably won't generate faults due to
DMA (no MMU) but even in the case it wouldn't use DMA, that happens on
different virtual addresses (not user-mapped addresses) because `struct
bio_vec` stores `struct page` to figure addresses out (which are different
from user-mapped addresses) for the read.
Thus userspace reads (to user-mapped addresses) still fault, then
do_anonymous_page() gets another `struct page` that would address/ map to
other memory than the `struct page` used by `struct bio_vec` for the read.
(The original `struct page` is not available, since it wasn't freed, as
page_ref_freeze() failed due to more page refs. And even if it were
available, its data cannot be trusted anymore.)
Solution:
========
One solution is to check for the expected page reference count in
try_to_unmap_one().
There should be one reference from the isolation (that is also checked in
shrink_page_list() with page_ref_freeze()) plus one or more references
from page mapping(s) (put in discard: label). Further references mean
that rmap/PTE cannot be unmapped/nuked.
(Note: there might be more than one reference from mapping due to
fork()/clone() without CLONE_VM, which use the same `struct page` for
references, until the copy-on-write page gets copied.)
So, additional page references (e.g., from direct IO read) now prevent the
rmap/PTE from being unmapped/dropped; similarly to the page is not freed
per shrink_page_list()/page_ref_freeze()).
- Races and Barriers:
==================
The new check in try_to_unmap_one() should be safe in races with
bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's
done under the PTE lock.
The fast path doesn't take the lock, but it checks if the PTE has changed
and if so, it drops the reference and leaves the page for the slow path
(which does take that lock).
The fast path requires synchronization w/ full memory barrier: it writes
the page reference count first then it reads the PTE later, while
try_to_unmap() writes PTE first then it reads page refcount.
And a second barrier is needed, as the page dirty flag should not be read
before the page reference count (as in __remove_mapping()). (This can be
a load memory barrier only; no writes are involved.)
Call stack/comments:
- try_to_unmap_one()
- page_vma_mapped_walk()
- map_pte() # see pte_offset_map_lock():
pte_offset_map()
spin_lock()
- ptep_get_and_clear() # write PTE
- smp_mb() # (new barrier) GUP fast path
- page_ref_count() # (new check) read refcount
- page_vma_mapped_walk_done() # see pte_unmap_unlock():
pte_unmap()
spin_unlock()
- bio_iov_iter_get_pages()
- __bio_iov_iter_get_pages()
- iov_iter_get_pages()
- get_user_pages_fast()
- internal_get_user_pages_fast()
# fast path
- lockless_pages_from_mm()
- gup_{pgd,p4d,pud,pmd,pte}_range()
ptep = pte_offset_map() # not _lock()
pte = ptep_get_lockless(ptep)
page = pte_page(pte)
try_grab_compound_head(page) # inc refcount
# (RMW/barrier
# on success)
if (pte_val(pte) != pte_val(*ptep)) # read PTE
put_compound_head(page) # dec refcount
# go slow path
# slow path
- __gup_longterm_unlocked()
- get_user_pages_unlocked()
- __get_user_pages_locked()
- __get_user_pages()
- follow_{page,p4d,pud,pmd}_mask()
- follow_page_pte()
ptep = pte_offset_map_lock()
pte = *ptep
page = vm_normal_page(pte)
try_grab_page(page) # inc refcount
pte_unmap_unlock()
- Huge Pages:
==========
Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE
(aka lazyfree) pages are PageAnon() && !PageSwapBacked()
(madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn())
thus should reach shrink_page_list() -> split_huge_page_to_list() before
try_to_unmap[_one](), so it deals with normal pages only.
(And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens,
which should not or be rare, the page refcount should be greater than
mapcount: the head page is referenced by tail pages. That also prevents
checking the head `page` then incorrectly call page_remove_rmap(subpage)
for a tail page, that isn't even in the shrink_page_list()'s page_list (an
effect of split huge pmd/pmvw), as it might happen today in this unlikely
scenario.)
MADV_FREE'd buffers:
===================
So, back to the "if MADV_FREE pages are used as buffers" note. The case
is arguable, and subject to multiple interpretations.
The madvise(2) manual page on the MADV_FREE advice value says:
1) 'After a successful MADV_FREE ... data will be lost when
the kernel frees the pages.'
2) 'the free operation will be canceled if the caller writes
into the page' / 'subsequent writes ... will succeed and
then [the] kernel cannot free those dirtied pages'
3) 'If there is no subsequent write, the kernel can free the
pages at any time.'
Thoughts, questions, considerations... respectively:
1) Since the kernel didn't actually free the page (page_ref_freeze()
failed), should the data not have been lost? (on userspace read.)
2) Should writes performed by the direct IO read be able to cancel
the free operation?
- Should the direct IO read be considered as 'the caller' too,
as it's been requested by 'the caller'?
- Should the bio technique to dirty pages on return to userspace
(bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
be considered in another/special way here?
3) Should an upcoming write from a previously requested direct IO
read be considered as a subsequent write, so the kernel should
not free the pages? (as it's known at the time of page reclaim.)
And lastly:
Technically, the last point would seem a reasonable consideration and
balance, as the madvise(2) manual page apparently (and fairly) seem to
assume that 'writes' are memory access from the userspace process (not
explicitly considering writes from the kernel or its corner cases; again,
fairly).. plus the kernel fix implementation for the corner case of the
largely 'non-atomic write' encompassed by a direct IO read operation, is
relatively simple; and it helps.
Reproducer:
==========
@ test.c (simplified, but works)
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
int main() {
int fd, i;
char *buf;
fd = open(DEV, O_RDONLY | O_DIRECT);
buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
buf[i] = 1; // init to non-zero
madvise(buf, BUF_SIZE, MADV_FREE);
read(fd, buf, BUF_SIZE);
for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
printf("%p: 0x%x\n", &buf[i], buf[i]);
return 0;
}
@ block/fops.c (formerly fs/block_dev.c)
+#include <linux/swap.h>
...
... __blkdev_direct_IO[_simple](...)
{
...
+ if (!strcmp(current->comm, "good"))
+ shrink_all_memory(ULONG_MAX);
+
ret = bio_iov_iter_get_pages(...);
+
+ if (!strcmp(current->comm, "bad"))
+ shrink_all_memory(ULONG_MAX);
...
}
@ shell
# NUM_PAGES=4
# PAGE_SIZE=$(getconf PAGE_SIZE)
# yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES}
# DEV=$(losetup -f --show test.img)
# gcc -DDEV=\"$DEV\" \
-DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \
-DPAGE_SIZE=${PAGE_SIZE} \
test.c -o test
# od -tx1 $DEV
0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a
*
0040000
# mv test good
# ./good
0x7f7c10418000: 0x79
0x7f7c10419000: 0x79
0x7f7c1041a000: 0x79
0x7f7c1041b000: 0x79
# mv good bad
# ./bad
0x7fa1b8050000: 0x0
0x7fa1b8051000: 0x0
0x7fa1b8052000: 0x0
0x7fa1b8053000: 0x0
Note: the issue is consistent on v5.17-rc3, but it's intermittent with the
support of MADV_FREE on v4.5 (60%-70% error; needs swap). [wrap
do_direct_IO() in do_blockdev_direct_IO() @ fs/direct-io.c].
- v5.17-rc3:
# for i in {1..1000}; do ./good; done \
| cut -d: -f2 | sort | uniq -c
4000 0x79
# mv good bad
# for i in {1..1000}; do ./bad; done \
| cut -d: -f2 | sort | uniq -c
4000 0x0
# free | grep Swap
Swap: 0 0 0
- v4.5:
# for i in {1..1000}; do ./good; done \
| cut -d: -f2 | sort | uniq -c
4000 0x79
# mv good bad
# for i in {1..1000}; do ./bad; done \
| cut -d: -f2 | sort | uniq -c
2702 0x0
1298 0x79
# swapoff -av
swapoff /swap
# for i in {1..1000}; do ./bad; done \
| cut -d: -f2 | sort | uniq -c
4000 0x79
Ceph/TCMalloc:
=============
For documentation purposes, the use case driving the analysis/fix is Ceph
on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to
release unused memory to the system from the mmap'ed page heap (might be
committed back/used again; it's not munmap'ed.) - PageHeap::DecommitSpan()
-> TCMalloc_SystemRelease() -> madvise() - PageHeap::CommitSpan() ->
TCMalloc_SystemCommit() -> do nothing.
Note: TCMalloc switched back to MADV_DONTNEED a few commits after the
release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue
just 'disappeared' on Ceph on later Ubuntu releases but is still present
in the kernel, and can be hit by other use cases.
The observed issue seems to be the old Ceph bug #22464 [1], where checksum
mismatches are observed (and instrumentation with buffer dumps shows
zero-pages read from mmap'ed/MADV_FREE'd page ranges).
The issue in Ceph was reasonably deemed a kernel bug (comment #50) and
mostly worked around with a retry mechanism, but other parts of Ceph could
still hit that (rocksdb). Anyway, it's less likely to be hit again as
TCMalloc switched out of MADV_FREE by default.
(Some kernel versions/reports from the Ceph bug, and relation with
the MADV_FREE introduction/changes; TCMalloc versions not checked.)
- 4.4 good
- 4.5 (madv_free: introduction)
- 4.9 bad
- 4.10 good? maybe a swapless system
- 4.12 (madv_free: no longer free instantly on swapless systems)
- 4.13 bad
[1] https://tracker.ceph.com/issues/22464
Thanks:
======
Several people contributed to analysis/discussions/tests/reproducers in
the first stages when drilling down on ceph/tcmalloc/linux kernel:
- Dan Hill
- Dan Streetman
- Dongdong Tao
- Gavin Guo
- Gerald Yang
- Heitor Alves de Siqueira
- Ioanna Alifieraki
- Jay Vosburgh
- Matthew Ruffell
- Ponnuvel Palaniyappan
Reviews, suggestions, corrections, comments:
- Minchan Kim
- Yu Zhao
- Huang, Ying
- John Hubbard
- Christoph Hellwig
[mfo@canonical.com: v4]
Link: https://lkml.kernel.org/r/20220209202659.183418-1-mfo@canonical.comLink: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Dan Hill <daniel.hill@canonical.com>
Cc: Dan Streetman <dan.streetman@canonical.com>
Cc: Dongdong Tao <dongdong.tao@canonical.com>
Cc: Gavin Guo <gavin.guo@canonical.com>
Cc: Gerald Yang <gerald.yang@canonical.com>
Cc: Heitor Alves de Siqueira <halves@canonical.com>
Cc: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Ponnuvel Palaniyappan <ponnuvel.palaniyappan@canonical.com>
Cc: <stable@vger.kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[mfo: backport: replace folio/test_flag with page/flag equivalents;
real Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)") in v4.]
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yann Gautier <yann.gautier@foss.st.com>
Date: Thu Mar 17 12:19:43 2022 +0100
mmc: mmci: stm32: correctly check all elements of sg list
commit 0d319dd5a27183b75d984e3dc495248e59f99334 upstream.
Use sg and not data->sg when checking sg list elements. Else only the
first element alignment is checked.
The last element should be checked the same way, for_each_sg already set
sg to sg_next(sg).
Fixes: 46b723dd867d ("mmc: mmci: add stm32 sdmmc variant")
Cc: stable@vger.kernel.org
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20220317111944.116148-2-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date: Mon Apr 4 13:49:02 2022 +0200
mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete
commit 03e59b1e2f56245163b14c69e0a830c24b1a3a47 upstream.
When HS400 tuning is complete and HS400 is going to be activated, we
have to keep the current number of TAPs and should not overwrite them
with a hardcoded value. This was probably a copy&paste mistake when
upporting HS400 support from the BSP.
Fixes: 26eb2607fa28 ("mmc: renesas_sdhi: add eMMC HS400 mode support")
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220404114902.12175-1-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri Apr 8 13:09:04 2022 -0700
mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)
commit 01e67e04c28170c47700c2c226d732bbfedb1ad0 upstream.
If an mremap() syscall with old_size=0 ends up in move_page_tables(), it
will call invalidate_range_start()/invalidate_range_end() unnecessarily,
i.e. with an empty range.
This causes a WARN in KVM's mmu_notifier. In the past, empty ranges
have been diagnosed to be off-by-one bugs, hence the WARNing. Given the
low (so far) number of unique reports, the benefits of detecting more
buggy callers seem to outweigh the cost of having to fix cases such as
this one, where userspace is doing something silly. In this particular
case, an early return from move_page_tables() is enough to fix the
issue.
Link: https://lkml.kernel.org/r/20220329173155.172439-1-pbonzini@redhat.com
Reported-by: syzbot+6bde52d89cfdf9f61425@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Tue Feb 1 12:29:55 2022 +0100
mt76: dma: initialize skip_unmap in mt76_dma_rx_fill
[ Upstream commit 577298ec55dfc8b9aece54520f0258c3f93a6573 ]
Even if it is only a false-positive since skip_buf0/skip_buf1 are only
used in mt76_dma_tx_cleanup_idx routine, initialize skip_unmap in
mt76_dma_rx_fill in order to fix the following UBSAN report:
[ 13.924906] UBSAN: invalid-load in linux-5.15.0/drivers/net/wireless/mediatek/mt76/dma.c:162:13
[ 13.924909] load of value 225 is not a valid value for type '_Bool'
[ 13.924912] CPU: 9 PID: 672 Comm: systemd-udevd Not tainted 5.15.0-18-generic #18-Ubuntu
[ 13.924914] Hardware name: LENOVO 21A0000CMX/21A0000CMX, BIOS R1MET43W (1.13 ) 11/05/2021
[ 13.924915] Call Trace:
[ 13.924917] <TASK>
[ 13.924920] show_stack+0x52/0x58
[ 13.924925] dump_stack_lvl+0x4a/0x5f
[ 13.924931] dump_stack+0x10/0x12
[ 13.924932] ubsan_epilogue+0x9/0x45
[ 13.924934] __ubsan_handle_load_invalid_value.cold+0x44/0x49
[ 13.924935] ? __iommu_dma_map+0x84/0xf0
[ 13.924939] mt76_dma_add_buf.constprop.0.cold+0x23/0x85 [mt76]
[ 13.924949] mt76_dma_rx_fill.isra.0+0x102/0x1f0 [mt76]
[ 13.924954] mt76_dma_init+0xc9/0x150 [mt76]
[ 13.924959] ? mt7921_dma_enable+0x110/0x110 [mt7921e]
[ 13.924966] mt7921_dma_init+0x1e3/0x260 [mt7921e]
[ 13.924970] mt7921_register_device+0x29d/0x510 [mt7921e]
[ 13.924975] mt7921_pci_probe.part.0+0x17f/0x1b0 [mt7921e]
[ 13.924980] mt7921_pci_probe+0x43/0x60 [mt7921e]
[ 13.924984] local_pci_probe+0x4b/0x90
[ 13.924987] pci_device_probe+0x115/0x1f0
[ 13.924989] really_probe+0x21e/0x420
[ 13.924992] __driver_probe_device+0x115/0x190
[ 13.924994] driver_probe_device+0x23/0xc0
[ 13.924996] __driver_attach+0xbd/0x1d0
[ 13.924998] ? __device_attach_driver+0x110/0x110
[ 13.924999] bus_for_each_dev+0x7e/0xc0
[ 13.925001] driver_attach+0x1e/0x20
[ 13.925003] bus_add_driver+0x135/0x200
[ 13.925005] driver_register+0x95/0xf0
[ 13.925008] ? 0xffffffffc0766000
[ 13.925010] __pci_register_driver+0x68/0x70
[ 13.925011] mt7921_pci_driver_init+0x23/0x1000 [mt7921e]
[ 13.925015] do_one_initcall+0x48/0x1d0
[ 13.925019] ? kmem_cache_alloc_trace+0x19e/0x2e0
[ 13.925022] do_init_module+0x62/0x280
[ 13.925025] load_module+0xac9/0xbb0
[ 13.925027] __do_sys_finit_module+0xbf/0x120
[ 13.925029] __x64_sys_finit_module+0x18/0x20
[ 13.925030] do_syscall_64+0x5c/0xc0
[ 13.925033] ? do_syscall_64+0x69/0xc0
[ 13.925034] ? sysvec_reschedule_ipi+0x78/0xe0
[ 13.925036] ? asm_sysvec_reschedule_ipi+0xa/0x20
[ 13.925039] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 13.925040] RIP: 0033:0x7fbf2b90f94d
[ 13.925045] RSP: 002b:00007ffe2ec7e5d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 13.925047] RAX: ffffffffffffffda RBX: 000056106b0634e0 RCX: 00007fbf2b90f94d
[ 13.925048] RDX: 0000000000000000 RSI: 00007fbf2baa3441 RDI: 0000000000000013
[ 13.925049] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
[ 13.925050] R10: 0000000000000013 R11: 0000000000000246 R12: 00007fbf2baa3441
[ 13.925051] R13: 000056106b062620 R14: 000056106b0610c0 R15: 000056106b0640d0
[ 13.925053] </TASK>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yang Li <yang.lee@linux.alibaba.com>
Date: Mon Feb 14 09:58:21 2022 +0800
mt76: mt7615: Fix assigning negative values to unsigned variable
[ Upstream commit 9273ffcc9a11942bd586bb42584337ef3962b692 ]
Smatch reports the following:
drivers/net/wireless/mediatek/mt76/mt7615/mac.c:1865
mt7615_mac_adjust_sensitivity() warn: assigning (-110) to unsigned
variable 'def_th'
drivers/net/wireless/mediatek/mt76/mt7615/mac.c:1865
mt7615_mac_adjust_sensitivity() warn: assigning (-98) to unsigned
variable 'def_th'
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Dust Li <dust.li@linux.alibaba.com>
Date: Tue Mar 1 17:44:00 2022 +0800
net/smc: correct settings of RMB window update limit
[ Upstream commit 6bf536eb5c8ca011d1ff57b5c5f7c57ceac06a37 ]
rmbe_update_limit is used to limit announcing receive
window updating too frequently. RFC7609 request a minimal
increase in the window size of 10% of the receive buffer
space. But current implementation used:
min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2)
and SOCK_MIN_SNDBUF / 2 == 2304 Bytes, which is almost
always less then 10% of the receive buffer space.
This causes the receiver always sending CDC message to
update its consumer cursor when it consumes more then 2K
of data. And as a result, we may encounter something like
"TCP silly window syndrome" when sending 2.5~8K message.
This patch fixes this using max(rmbe_size / 10, SOCK_MIN_SNDBUF / 2).
With this patch and SMC autocorking enabled, qperf 2K/4K/8K
tcp_bw test shows 45%/75%/40% increase in throughput respectively.
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ziyang Xuan <william.xuanziyang@huawei.com>
Date: Thu Mar 31 15:04:28 2022 +0800
net/tls: fix slab-out-of-bounds bug in decrypt_internal
[ Upstream commit 9381fe8c849cfbe50245ac01fc077554f6eaa0e2 ]
The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in
tls_set_sw_offload(). The return value of crypto_aead_ivsize()
for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes
memory space will trigger slab-out-of-bounds bug as following:
==================================================================
BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls]
Read of size 16 at addr ffff888114e84e60 by task tls/10911
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x5e/0x5db
? decrypt_internal+0x385/0xc40 [tls]
kasan_report+0xab/0x120
? decrypt_internal+0x385/0xc40 [tls]
kasan_check_range+0xf9/0x1e0
memcpy+0x20/0x60
decrypt_internal+0x385/0xc40 [tls]
? tls_get_rec+0x2e0/0x2e0 [tls]
? process_rx_list+0x1a5/0x420 [tls]
? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls]
decrypt_skb_update+0x9d/0x400 [tls]
tls_sw_recvmsg+0x3c8/0xb50 [tls]
Allocated by task 10911:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
tls_set_sw_offload+0x2eb/0xa20 [tls]
tls_setsockopt+0x68c/0x700 [tls]
__sys_setsockopt+0xfe/0x1b0
Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size
when memcpy() iv value in TLS_1_3_VERSION scenario.
Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jakub Kicinski <kuba@kernel.org>
Date: Wed Mar 9 10:29:13 2022 -0800
net: account alternate interface name memory
[ Upstream commit 5d26cff5bdbebdf98ba48217c078ff102536f134 ]
George reports that altnames can eat up kernel memory.
We should charge that memory appropriately.
Reported-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Nikolay Aleksandrov <razor@blackwall.org>
Date: Fri Apr 1 10:33:42 2022 +0300
net: ipv4: fix route with nexthop object delete warning
[ Upstream commit 6bf92d70e690b7ff12b24f4bfff5e5434d019b82 ]
FRR folks have hit a kernel warning[1] while deleting routes[2] which is
caused by trying to delete a route pointing to a nexthop id without
specifying nhid but matching on an interface. That is, a route is found
but we hit a warning while matching it. The warning is from
fib_info_nh() in include/net/nexthop.h because we run it on a fib_info
with nexthop object. The call chain is:
inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
the fib_info and triggering the warning). The fix is to not do any
matching in that branch if the fi has a nexthop object because those are
managed separately. I.e. we should match when deleting without nh spec and
should fail when deleting a nexthop route with old-style nh spec because
nexthop objects are managed separately, e.g.:
$ ip r show 1.2.3.4/32
1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
$ ip r del 1.2.3.4/32
$ ip r del 1.2.3.4/32 nhid 12
<both should work>
$ ip r del 1.2.3.4/32 dev dummy0
<should fail with ESRCH>
[1]
[ 523.462226] ------------[ cut here ]------------
[ 523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460
[ 523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd
[ 523.462274] videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse
[ 523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P OE 5.16.18-200.fc35.x86_64 #1
[ 523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020
[ 523.462303] RIP: 0010:fib_nh_match+0x210/0x460
[ 523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00
[ 523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286
[ 523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0
[ 523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380
[ 523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000
[ 523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031
[ 523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0
[ 523.462311] FS: 00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000
[ 523.462313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0
[ 523.462315] Call Trace:
[ 523.462316] <TASK>
[ 523.462320] fib_table_delete+0x1a9/0x310
[ 523.462323] inet_rtm_delroute+0x93/0x110
[ 523.462325] rtnetlink_rcv_msg+0x133/0x370
[ 523.462327] ? _copy_to_iter+0xb5/0x6f0
[ 523.462330] ? rtnl_calcit.isra.0+0x110/0x110
[ 523.462331] netlink_rcv_skb+0x50/0xf0
[ 523.462334] netlink_unicast+0x211/0x330
[ 523.462336] netlink_sendmsg+0x23f/0x480
[ 523.462338] sock_sendmsg+0x5e/0x60
[ 523.462340] ____sys_sendmsg+0x22c/0x270
[ 523.462341] ? import_iovec+0x17/0x20
[ 523.462343] ? sendmsg_copy_msghdr+0x59/0x90
[ 523.462344] ? __mod_lruvec_page_state+0x85/0x110
[ 523.462348] ___sys_sendmsg+0x81/0xc0
[ 523.462350] ? netlink_seq_start+0x70/0x70
[ 523.462352] ? __dentry_kill+0x13a/0x180
[ 523.462354] ? __fput+0xff/0x250
[ 523.462356] __sys_sendmsg+0x49/0x80
[ 523.462358] do_syscall_64+0x3b/0x90
[ 523.462361] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 523.462364] RIP: 0033:0x7f24552aa337
[ 523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337
[ 523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003
[ 523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
[ 523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040
[ 523.462373] </TASK>
[ 523.462374] ---[ end trace ba537bc16f6bf4ed ]---
[2] https://github.com/FRRouting/frr/issues/6412
Fixes: 4c7e8084fd46 ("ipv4: Plumb support for nexthop object in a fib_info")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jakub Kicinski <kuba@kernel.org>
Date: Wed Mar 9 10:29:14 2022 -0800
net: limit altnames to 64k total
[ Upstream commit 155fb43b70b5fce341347a77d1af2765d1e8fbb8 ]
Property list (altname is a link "property") is wrapped
in a nlattr. nlattrs length is 16bit so practically
speaking the list of properties can't be longer than
that, otherwise user space would have to interpret
broken netlink messages.
Prevent the problem from occurring by checking the length
of the property list before adding new entries.
Reported-by: George Shuklin <george.shuklin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Mon Apr 4 12:41:50 2022 +0200
net: openvswitch: don't send internal clone attribute to the userspace.
[ Upstream commit 3f2a3050b4a3e7f32fc0ea3c9b0183090ae00522 ]
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for
performance optimization inside the kernel. It's added by the kernel
while parsing user-provided actions and should not be sent during the
flow dump as it's not part of the uAPI.
The issue doesn't cause any significant problems to the ovs-vswitchd
process, because reported actions are not really used in the
application lifecycle and only supposed to be shown to a human via
ovs-dpctl flow dump. However, the action list is still incorrect
and causes the following error if the user wants to look at the
datapath flows:
# ovs-dpctl add-dp system@ovs-system
# ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)"
# ovs-dpctl dump-flows
<flow match>, packets:0, bytes:0, used:never,
actions:clone(bad length 4, expected -1 for: action0(01 00 00 00),
ct(commit),0)
With the fix:
# ovs-dpctl dump-flows
<flow match>, packets:0, bytes:0, used:never,
actions:clone(ct(commit),0)
Additionally fixed an incorrect attribute name in the comment.
Fixes: b233504033db ("openvswitch: kernel datapath clone action")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Ilya Maximets <i.maximets@ovn.org>
Date: Mon Apr 4 17:43:45 2022 +0200
net: openvswitch: fix leak of nested actions
[ Upstream commit 1f30fb9166d4f15a1aa19449b9da871fe0ed4796 ]
While parsing user-provided actions, openvswitch module may dynamically
allocate memory and store pointers in the internal copy of the actions.
So this memory has to be freed while destroying the actions.
Currently there are only two such actions: ct() and set(). However,
there are many actions that can hold nested lists of actions and
ovs_nla_free_flow_actions() just jumps over them leaking the memory.
For example, removal of the flow with the following actions will lead
to a leak of the memory allocated by nf_ct_tmpl_alloc():
actions:clone(ct(commit),0)
Non-freed set() action may also leak the 'dst' structure for the
tunnel info including device references.
Under certain conditions with a high rate of flow rotation that may
cause significant memory leak problem (2MB per second in reporter's
case). The problem is also hard to mitigate, because the user doesn't
have direct control over the datapath flows generated by OVS.
Fix that by iterating over all the nested actions and freeing
everything that needs to be freed recursively.
New build time assertion should protect us from this problem if new
actions will be added in the future.
Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all
attributes has to be explicitly checked. sample() and clone() actions
are mixing extra attributes into the user-provided action list. That
prevents some code generalization too.
Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst")
Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html
Reported-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Michael Walle <michael@walle.cc>
Date: Tue Apr 5 14:02:33 2022 +0200
net: phy: mscc-miim: reject clause 45 register accesses
[ Upstream commit 8d90991e5bf7fdb9f264f5f579d18969913054b7 ]
The driver doesn't support clause 45 register access yet, but doesn't
check if the access is a c45 one either. This leads to spurious register
reads and writes. Add the check.
Fixes: 542671fe4d86 ("net: phy: mscc-miim: Add MDIO driver")
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Michael Walle <michael@walle.cc>
Date: Sat Mar 12 21:50:14 2022 +0100
net: sfp: add 2500base-X quirk for Lantech SFP module
[ Upstream commit 00eec9fe4f3b9588b4bfa8ef9dd0aae96407d5d7 ]
The Lantech 8330-262D-E module is 2500base-X capable, but it reports the
nominal bitrate as 2500MBd instead of 3125MBd. Add a quirk for the
module.
The following in an EEPROM dump of such a SFP with the serial number
redacted:
00: 03 04 07 00 00 00 01 20 40 0c 05 01 19 00 00 00 ???...? @????...
10: 1e 0f 00 00 4c 61 6e 74 65 63 68 20 20 20 20 20 ??..Lantech
20: 20 20 20 20 00 00 00 00 38 33 33 30 2d 32 36 32 ....8330-262
30: 44 2d 45 20 20 20 20 20 56 31 2e 30 03 52 00 cb D-E V1.0?R.?
40: 00 1a 00 00 46 43 XX XX XX XX XX XX XX XX XX XX .?..FCXXXXXXXXXX
50: 20 20 20 20 32 32 30 32 31 34 20 20 68 b0 01 98 220214 h???
60: 45 58 54 52 45 4d 45 4c 59 20 43 4f 4d 50 41 54 EXTREMELY COMPAT
70: 49 42 4c 45 20 20 20 20 20 20 20 20 20 20 20 20 IBLE
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20220312205014.4154907-1-michael@walle.cc
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Chen-Yu Tsai <wens@csie.org>
Date: Fri Apr 1 02:48:32 2022 +0800
net: stmmac: Fix unset max_speed difference between DT and non-DT platforms
[ Upstream commit c21cabb0fd0b54b8b54235fc1ecfe1195a23bcb2 ]
In commit 9cbadf094d9d ("net: stmmac: support max-speed device tree
property"), when DT platforms don't set "max-speed", max_speed is set to
-1; for non-DT platforms, it stays the default 0.
Prior to commit eeef2f6b9f6e ("net: stmmac: Start adding phylink support"),
the check for a valid max_speed setting was to check if it was greater
than zero. This commit got it right, but subsequent patches just checked
for non-zero, which is incorrect for DT platforms.
In commit 92c3807b9ac3 ("net: stmmac: convert to phylink_get_linkmodes()")
the conversion switched completely to checking for non-zero value as a
valid value, which caused 1000base-T to stop getting advertised by
default.
Instead of trying to fix all the checks, simply leave max_speed alone if
DT property parsing fails.
Fixes: 9cbadf094d9d ("net: stmmac: support max-speed device tree property")
Fixes: 92c3807b9ac3 ("net: stmmac: convert to phylink_get_linkmodes()")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220331184832.16316-1-wens@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Wang Yufen <wangyufen@huawei.com>
Date: Fri Mar 18 14:35:08 2022 +0800
netlabel: fix out-of-bounds memory accesses
[ Upstream commit f22881de730ebd472e15bcc2c0d1d46e36a87b9c ]
In calipso_map_cat_ntoh(), in the for loop, if the return value of
netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when
netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses
of bitmap[byte_offset] occurs.
The bug was found during fuzzing. The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0
Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252
CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x21c/0x230
show_stack+0x1c/0x60
dump_stack_lvl+0x64/0x7c
print_address_description.constprop.0+0x70/0x2d0
__kasan_report+0x158/0x16c
kasan_report+0x74/0x120
__asan_load1+0x80/0xa0
netlbl_bitmap_walk+0x3c/0xd0
calipso_opt_getattr+0x1a8/0x230
calipso_sock_getattr+0x218/0x340
calipso_sock_getattr+0x44/0x60
netlbl_sock_getattr+0x44/0x80
selinux_netlbl_socket_setsockopt+0x138/0x170
selinux_socket_setsockopt+0x4c/0x60
security_socket_setsockopt+0x4c/0x90
__sys_setsockopt+0xbc/0x2b0
__arm64_sys_setsockopt+0x6c/0x84
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x88/0x200
do_el0_svc+0x88/0xa0
el0_svc+0x128/0x1b0
el0t_64_sync_handler+0x9c/0x120
el0t_64_sync+0x16c/0x170
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Mon Mar 21 13:48:36 2022 -0400
NFS: Avoid writeback threads getting stuck in mempool_alloc()
[ Upstream commit 0bae835b63c53f86cdc524f5962e39409585b22c ]
In a low memory situation, allow the NFS writeback code to fail without
getting stuck in infinite loops in mempool_alloc().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Mon Mar 21 12:34:19 2022 -0400
NFS: nfsiod should not block forever in mempool_alloc()
[ Upstream commit 515dcdcd48736576c6f5c197814da6f81c60a21e ]
The concern is that since nfsiod is sometimes required to kick off a
commit, it can get locked up waiting forever in mempool_alloc() instead
of failing gracefully and leaving the commit until later.
Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
then fall back to a non-blocking attempt to allocate from the memory
pool.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: NeilBrown <neilb@suse.de>
Date: Mon Mar 7 10:41:44 2022 +1100
NFS: swap IO handling is slightly different for O_DIRECT IO
[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ]
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
possible deadlocks with "fs_reclaim". These deadlocks could, I believe,
eventuate if a buffered read on the swapfile was attempted.
We don't need coherence with the page cache for a swap file, and
buffered writes are forbidden anyway. There is no other need for
i_rwsem during direct IO. So never take it for swap_rw()
2/ generic_write_checks() explicitly forbids writes to swap, and
performs checks that are not needed for swap. So bypass it
for swap_rw().
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: NeilBrown <neilb@suse.de>
Date: Mon Mar 7 10:41:44 2022 +1100
NFS: swap-out must always use STABLE writes.
[ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ]
The commit handling code is not safe against memory-pressure deadlocks
when writing to swap. In particular, nfs_commitdata_alloc() blocks
indefinitely waiting for memory, and this can consume all available
workqueue threads.
swap-out most likely uses STABLE writes anyway as COND_STABLE indicates
that a stable write should be used if the write fits in a single
request, and it normally does. However if we ever swap with a small
wsize, or gather unusually large numbers of pages for a single write,
this might change.
For safety, make it explicit in the code that direct writes used for swap
must always use FLUSH_STABLE.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Xin Xiong <xiongx18@fudan.edu.cn>
Date: Tue Jan 25 21:10:45 2022 +0800
NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify()
[ Upstream commit b7f114edd54326f730a754547e7cfb197b5bc132 ]
[You don't often get email from xiongx18@fudan.edu.cn. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.]
The reference counting issue happens in two error paths in the
function _nfs42_proc_copy_notify(). In both error paths, the function
simply returns the error code and forgets to balance the refcount of
object `ctx`, bumped by get_nfs_open_context() earlier, which may
cause refcount leaks.
Fix it by balancing refcount of the `ctx` object before the function
returns in both error paths.
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: ChenXiaoSong <chenxiaosong2@huawei.com>
Date: Tue Mar 29 19:32:08 2022 +0800
NFSv4: fix open failure with O_ACCMODE flag
[ Upstream commit b243874f6f9568b2daf1a00e9222cacdc15e159c ]
open() with O_ACCMODE|O_DIRECT flags secondly will fail.
Reproducer:
1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
3. close(fd)
4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)
Server nfsd4_decode_share_access() will fail with error nfserr_bad_xdr when
client use incorrect share access mode of 0.
Fix this by using NFS4_SHARE_ACCESS_BOTH share access mode in client,
just like firstly opening.
Fixes: ce4ef7c0a8a05 ("NFS: Split out NFS v4 file operations")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Sat Jan 29 13:32:45 2022 -0500
NFSv4: Protect the state recovery thread against direct reclaim
[ Upstream commit 3e17898aca293a24dae757a440a50aa63ca29671 ]
If memory allocation triggers a direct reclaim from the state recovery
thread, then we can deadlock. Use memalloc_nofs_save/restore to ensure
that doesn't happen.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Helge Deller <deller@gmx.de>
Date: Sun Mar 27 15:46:26 2022 +0200
parisc: Fix CPU affinity for Lasi, WAX and Dino chips
[ Upstream commit 939fc856676c266c3bc347c1c1661872a3725c0f ]
Add the missing logic to allow Lasi, WAX and Dino to set the
CPU affinity. This fixes IRQ migration to other CPUs when a
CPU is shutdown which currently holds the IRQs for one of those
chips.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: John David Anglin <dave.anglin@bell.net>
Date: Tue Mar 29 18:54:36 2022 +0000
parisc: Fix patch code locking and flushing
[ Upstream commit a9fe7fa7d874a536e0540469f314772c054a0323 ]
This change fixes the following:
1) The flags variable is not initialized. Always use raw_spin_lock_irqsave
and raw_spin_unlock_irqrestore to serialize patching.
2) flush_kernel_vmap_range is primarily intended for DMA flushes. Since
__patch_text_multiple is often called with interrupts disabled, it is
better to directly call flush_kernel_dcache_range_asm and
flush_kernel_icache_range_asm. This avoids an extra call.
3) The final call to flush_icache_range is unnecessary.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Pali Rohár <pali@kernel.org>
Date: Mon Jan 10 02:49:58 2022 +0100
PCI: aardvark: Fix support for MSI interrupts
[ Upstream commit b0b0b8b897f8e12b2368e868bd7cdc5742d5c5a9 ]
Aardvark hardware supports Multi-MSI and MSI_FLAG_MULTI_PCI_MSI is already
set for the MSI chip. But when allocating MSI interrupt numbers for
Multi-MSI, the numbers need to be properly aligned, otherwise endpoint
devices send MSI interrupt with incorrect numbers.
Fix this issue by using function bitmap_find_free_region() instead of
bitmap_find_next_zero_area().
To ensure that aligned MSI interrupt numbers are used by endpoint devices,
we cannot use Linux virtual irq numbers (as they are random and not
properly aligned). Instead we need to use the aligned hwirq numbers.
This change fixes receiving MSI interrupts on Armada 3720 boards and
allows using NVMe disks which use Multi-MSI feature with 3 interrupts.
Without this NVMe disks freeze booting as linux nvme-core.c is waiting
60s for an interrupt.
Link: https://lore.kernel.org/r/20220110015018.26359-4-kabel@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Date: Fri Dec 17 17:47:08 2021 +0800
PCI: endpoint: Fix alignment fault error in copy tests
[ Upstream commit 829cc0e2ea2d61fb6c54bc3f8a17f86c56e11864 ]
The copy test uses the memcpy() to copy data between IO memory spaces.
This can trigger an alignment fault error (pasted the error logs below)
because memcpy() may use unaligned accesses on a mapped memory that is
just IO, which does not support unaligned memory accesses.
Fix it by using the correct memcpy API to copy from/to IO memory.
Alignment fault error logs:
Unable to handle kernel paging request at virtual address ffff8000101cd3c1
Mem abort info:
ESR = 0x96000021
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x21: alignment fault
Data abort info:
ISV = 0, ISS = 0x00000021
CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081773000
[ffff8000101cd3c1] pgd=1000000082410003, p4d=1000000082410003, pud=1000000082411003, pmd=1000000082412003, pte=0068004000001f13
Internal error: Oops: 96000021 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 6 Comm: kworker/0:0H Not tainted 5.15.0-rc1-next-20210914-dirty #2
Hardware name: LS1012A RDB Board (DT)
Workqueue: kpcitest pci_epf_test_cmd_handler
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __memcpy+0x168/0x230
lr : pci_epf_test_cmd_handler+0x6f0/0xa68
sp : ffff80001003bce0
x29: ffff80001003bce0 x28: ffff800010135000 x27: ffff8000101e5000
x26: ffff8000101cd000 x25: ffff6cda941cf6c8 x24: 0000000000000000
x23: ffff6cda863f2000 x22: ffff6cda9096c800 x21: ffff800010135000
x20: ffff6cda941cf680 x19: ffffaf39fd999000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffffaf39fd2b6000
x14: 0000000000000000 x13: 15f5c8fa2f984d57 x12: 604d132b60275454
x11: 065cee5e5fb428b6 x10: aae662eb17d0cf3e x9 : 1d97c9a1b4ddef37
x8 : 7541b65edebf928c x7 : e71937c4fc595de0 x6 : b8a0e09562430d1c
x5 : ffff8000101e5401 x4 : ffff8000101cd401 x3 : ffff8000101e5380
x2 : fffffffffffffff1 x1 : ffff8000101cd3c0 x0 : ffff8000101e5000
Call trace:
__memcpy+0x168/0x230
process_one_work+0x1ec/0x370
worker_thread+0x44/0x478
kthread+0x154/0x160
ret_from_fork+0x10/0x20
Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e)
---[ end trace 568c28c7b6336335 ]---
Link: https://lore.kernel.org/r/20211217094708.28678-1-Zhiqiang.Hou@nxp.com
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Li Chen <lchen@ambarella.com>
Date: Fri Jan 21 15:48:23 2022 +0800
PCI: endpoint: Fix misused goto label
[ Upstream commit bf8d87c076f55b8b4dfdb6bc6c6b6dc0c2ccb487 ]
Fix a misused goto label jump since that can result in a memory leak.
Link: https://lore.kernel.org/r/17e7b9b9ee6.c6d9c6a02564.4545388417402742326@zohomail.com
Signed-off-by: Li Chen <lchen@ambarella.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Manivannan Sadhasivam <mani@kernel.org>
Date: Thu Feb 10 20:20:03 2022 +0530
PCI: pciehp: Add Qualcomm quirk for Command Completed erratum
[ Upstream commit 9f72d4757cbe4d1ed669192f6d23817c9e437c4b ]
The Qualcomm PCI bridge device (Device ID 0x0110) found in chipsets such as
SM8450 does not set the Command Completed bit unless writes to the Slot
Command register change "Control" bits.
This results in timeouts like below:
pcieport 0001:00:00.0: pciehp: Timeout on hotplug command 0x03c0 (issued 2020 msec ago)
Add the device to the Command Completed quirk to mark commands "completed"
immediately unless they change the "Control" bits.
Link: https://lore.kernel.org/r/20220210145003.135907-1-manivannan.sadhasivam@linaro.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Thu Apr 7 11:04:20 2022 -0300
perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13
commit 3a8a0475861a443f02e3a9b57d044fe2a0a99291 upstream.
Using -ffat-lto-objects in the python feature test when building with
clang-13 results in:
clang-13: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument]
error: command '/usr/sbin/clang' failed with exit code 1
cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory
make[2]: *** [Makefile.perf:639: /tmp/build/perf/python/perf.so] Error 1
Noticed when building on a docker.io/library/archlinux:base container.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Fri Apr 8 10:08:07 2022 -0300
perf python: Fix probing for some clang command line options
commit dd6e1fe91cdd52774ca642d1da75b58a86356b56 upstream.
The clang compiler complains about some options even without a source
file being available, while others require one, so use the simple
tools/build/feature/test-hello.c file.
Then check for the "is not supported" string in its output, in addition
to the "unknown argument" already being looked for.
This was noticed when building with clang-13 where -ffat-lto-objects
isn't supported and since we were looking just for "unknown argument"
and not providing a source code to clang, was mistakenly assumed as
being available and not being filtered to set of command line options
provided to clang, leading to a build failure.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Denis Nikitin <denik@chromium.org>
Date: Tue Mar 29 20:11:30 2022 -0700
perf session: Remap buf if there is no space for event
[ Upstream commit bc21e74d4775f883ae1f542c1f1dc7205b15d925 ]
If a perf event doesn't fit into remaining buffer space return NULL to
remap buf and fetch the event again.
Keep the logic to error out on inadequate input from fuzzing.
This fixes perf failing on ChromeOS (with 32b userspace):
$ perf report -v -i perf.data
...
prefetch_event: head=0x1fffff8 event->header_size=0x30, mmap_size=0x2000000: fuzzed or compressed perf.data?
Error:
failed to process sample
Fixes: 57fc032ad643ffd0 ("perf session: Avoid infinite loop when seeing invalid header.size")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Denis Nikitin <denik@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220330031130.2152327-1-denik@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Adrian Hunter <adrian.hunter@intel.com>
Date: Fri Apr 8 16:26:25 2022 +0300
perf tools: Fix perf's libperf_print callback
[ Upstream commit aeee9dc53ce405d2161f9915f553114e94e5b677 ]
eprintf() does not expect va_list as the type of the 4th parameter.
Use veprintf() because it does.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 428dab813a56ce94 ("libperf: Merge libperf_set_print() into libperf_init()")
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220408132625.2451452-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: James Clark <james.clark@arm.com>
Date: Fri Apr 8 15:40:56 2022 +0100
perf: arm-spe: Fix perf report --mem-mode
[ Upstream commit ffab487052054162b3b6c9c6005777ec6cfcea05 ]
Since commit bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't allow opening the file unless one of the events has
PERF_SAMPLE_DATA_SRC set.
SPE doesn't have this set even though synthetic memory data is generated
after it is decoded. Fix this issue by setting DATA_SRC on SPE events.
This has no effect on the data collected because the SPE driver doesn't
do anything with that flag and doesn't generate samples.
Fixes: bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available")
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220408144056.1955535-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Date: Sun Mar 27 13:57:33 2022 +0800
perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator
commit 2012a9e279013933885983cbe0a5fe828052563b upstream.
The bug is here:
return cluster;
The list iterator value 'cluster' will *always* be set and non-NULL
by list_for_each_entry(), so it is incorrect to assume that the
iterator value will be NULL if the list is empty or no element
is found.
To fix the bug, return 'cluster' when found, otherwise return NULL.
Cc: stable@vger.kernel.org
Fixes: 21bdbb7102ed ("perf: add qcom l2 cache perf events driver")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Link: https://lore.kernel.org/r/20220327055733.4070-1-xiam0nd.tong@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Amjad Ouled-Ameur <aouledameur@baylibre.com>
Date: Tue Jan 11 10:52:54 2022 +0100
phy: amlogic: meson8b-usb2: Use dev_err_probe()
[ Upstream commit 6466ba1898d415b527e1013bd8551a6fdfece94c ]
Use the existing dev_err_probe() helper instead of open-coding the same
operation.
Signed-off-by: Amjad Ouled-Ameur <aouledameur@baylibre.com>
Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://lore.kernel.org/r/20220111095255.176141-3-aouledameur@baylibre.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Evgeny Boger <boger@wirenboard.com>
Date: Wed Jan 12 11:47:27 2022 +0300
power: supply: axp20x_battery: properly report current when discharging
[ Upstream commit d4f408cdcd26921c1268cb8dcbe8ffb6faf837f3 ]
As stated in [1], negative current values are used for discharging
batteries.
AXP PMICs internally have two different ADC channels for shunt current
measurement: one used during charging and one during discharging.
The values reported by these ADCs are unsigned.
While the driver properly selects ADC channel to get the data from,
it doesn't apply negative sign when reporting discharging current.
[1] Documentation/ABI/testing/sysfs-class-power
Signed-off-by: Evgeny Boger <boger@wirenboard.com>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hans de Goede <hdegoede@redhat.com>
Date: Tue Feb 8 13:51:47 2022 +0100
power: supply: axp288-charger: Set Vhold to 4.4V
[ Upstream commit 5ac121b81b4051e7fc83d5b3456a5e499d5bd147 ]
The AXP288's recommended and factory default Vhold value (minimum
input voltage below which the input current draw will be reduced)
is 4.4V. This lines up with other charger IC's such as the TI
bq2419x/bq2429x series which use 4.36V or 4.44V.
For some reason some BIOS-es initialize Vhold to 4.6V or even 4.7V
which combined with the typical voltage drop over typically low
wire gauge micro-USB cables leads to the input-current getting
capped below 1A (with a 2A capable dedicated charger) based on Vhold.
This leads to slow charging, or even to the device slowly discharging
if the device is in heavy use.
As the Linux AXP288 drivers use the builtin BC1.2 charger detection
and send the input-current-limit according to the detected charger
there really is no reason not to use the recommended 4.4V Vhold.
Set Vhold to 4.4V to fix the slow charging issue on various devices.
There is one exception, the special-case of the HP X2 2-in-1s which
combine this BC1.2 capable PMIC with a Type-C port and a 5V/3A factory
provided charger with a Type-C plug which does not do BC1.2. These
have their input-current-limit hardcoded to 3A (like under Windows)
and use a higher Vhold on purpose to limit the current when used
with other chargers. To avoid touching Vhold on these HP X2 laptops
the code setting Vhold is added to an else branch of the if checking
for these models.
Note this also fixes the sofar unused VBUS_ISPOUT_VHOLD_SET_MASK
define, which was wrong.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Hangyu Hua <hbh25y@gmail.com>
Date: Wed Mar 2 10:19:59 2022 +0800
powerpc/secvar: fix refcount leak in format_show()
[ Upstream commit d601fd24e6964967f115f036a840f4f28488f63f ]
Refcount leak will happen when format_show returns failure in multiple
cases. Unified management of of_node_put can fix this problem.
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220302021959.10959-1-hbh25y@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Maxim Kiselev <bigunclemax@gmail.com>
Date: Thu Dec 30 18:11:21 2021 +0300
powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
[ Upstream commit 17846485dff91acce1ad47b508b633dffc32e838 ]
T1040RDB has two RTL8211E-VB phys which requires setting
of internal delays for correct work.
Changing the phy-connection-type property to `rgmii-id`
will fix this issue.
Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211230151123.1258321-1-bigunclemax@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date: Thu Apr 7 00:57:57 2022 +1000
powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
commit ffa0b64e3be58519ae472ea29a1a1ad681e32f48 upstream.
mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000.
Because of the way __pa() works we have:
__pa(0x8000000000000000) == 0, and therefore
virt_to_pfn(0x8000000000000000) == 0, and therefore
virt_addr_valid(0x8000000000000000) == true
Which is wrong, virt_addr_valid() should be false for vmalloc space.
In fact all vmalloc addresses that alias with a valid PFN will return
true from virt_addr_valid(). That can cause bugs with hardened usercopy
as described below by Kefeng Wang:
When running ethtool eth0 on 64-bit Book3E, a BUG occurred:
usercopy: Kernel memory exposure attempt detected from SLUB object not in SLUB page?! (offset 0, size 1048)!
kernel BUG at mm/usercopy.c:99
...
usercopy_abort+0x64/0xa0 (unreliable)
__check_heap_object+0x168/0x190
__check_object_size+0x1a0/0x200
dev_ethtool+0x2494/0x2b20
dev_ioctl+0x5d0/0x770
sock_do_ioctl+0xf0/0x1d0
sock_ioctl+0x3ec/0x5a0
__se_sys_ioctl+0xf0/0x160
system_call_exception+0xfc/0x1f0
system_call_common+0xf8/0x200
The code shows below,
data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN));
copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN))
The data is alloced by vmalloc(), virt_addr_valid(ptr) will return true
on 64-bit Book3E, which leads to the panic.
As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va
and __pa addresses") does, make sure the virt addr above PAGE_OFFSET in
the virt_addr_valid() for 64-bit, also add upper limit check to make
sure the virt is below high_memory.
Meanwhile, for 32-bit PAGE_OFFSET is the virtual address of the start
of lowmem, high_memory is the upper low virtual address, the check is
suitable for 32-bit, this will fix the issue mentioned in commit
602946ec2f90 ("powerpc: Set max_mapnr correctly") too.
On 32-bit there is a similar problem with high memory, that was fixed in
commit 602946ec2f90 ("powerpc: Set max_mapnr correctly"), but that
commit breaks highmem and needs to be reverted.
We can't easily fix __pa(), we have code that relies on its current
behaviour. So for now add extra checks to virt_addr_valid().
For 64-bit Book3S the extra checks are not necessary, the combination of
virt_to_pfn() and pfn_valid() should yield the correct result, but they
are harmless.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Add additional change log detail]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220406145802.538416-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sourabh Jain <sourabhjain@linux.ibm.com>
Date: Fri Feb 4 14:26:01 2022 +0530
powerpc: Set crashkernel offset to mid of RMA region
[ Upstream commit 7c5ed82b800d8615cdda00729e7b62e5899f0b13 ]
On large config LPARs (having 192 and more cores), Linux fails to boot
due to insufficient memory in the first memblock. It is due to the
memory reservation for the crash kernel which starts at 128MB offset of
the first memblock. This memory reservation for the crash kernel doesn't
leave enough space in the first memblock to accommodate other essential
system resources.
The crash kernel start address was set to 128MB offset by default to
ensure that the crash kernel get some memory below the RMA region which
is used to be of size 256MB. But given that the RMA region size can be
512MB or more, setting the crash kernel offset to mid of RMA size will
leave enough space for the kernel to allocate memory for other system
resources.
Since the above crash kernel offset change is only applicable to the LPAR
platform, the LPAR feature detection is pushed before the crash kernel
reservation. The rest of LPAR specific initialization will still
be done during pseries_probe_fw_features as usual.
This patch is dependent on changes to paca allocation for boot CPU. It
expect boot CPU to discover 1T segment support which is introduced by
the patch posted here:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
Reported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yang Guang <yang.guang5@zte.com.cn>
Date: Thu Jan 27 08:02:36 2022 +0800
ptp: replace snprintf with sysfs_emit
[ Upstream commit e2cf07654efb0fd7bbcb475c6f74be7b5755a8fd ]
coccinelle report:
./drivers/ptp/ptp_sysfs.c:17:8-16:
WARNING: use scnprintf or sprintf
./drivers/ptp/ptp_sysfs.c:390:8-16:
WARNING: use scnprintf or sprintf
Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Date: Wed Apr 6 21:19:19 2022 +1000
qede: confirm skb is allocated before using
[ Upstream commit 4e910dbe36508654a896d5735b318c0b88172570 ]
qede_build_skb() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.
This results in a kernel panic because the skb to reserve is NULL.
Add a check in case build_skb() failed to allocate and return NULL.
The NULL return is handled correctly in callers to qede_build_skb().
Fixes: 8a8633978b842 ("qede: Add build_skb() support.")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Douglas Miller <doug.miller@cornelisnetworks.com>
Date: Fri Apr 8 09:35:23 2022 -0400
RDMA/hfi1: Fix use-after-free bug for mm struct
commit 2bbac98d0930e8161b1957dc0ec99de39ade1b3c upstream.
Under certain conditions, such as MPI_Abort, the hfi1 cleanup code may
represent the last reference held on the task mm.
hfi1_mmu_rb_unregister() then drops the last reference and the mm is freed
before the final use in hfi1_release_user_pages(). A new task may
allocate the mm structure while it is still being used, resulting in
problems. One manifestation is corruption of the mmap_sem counter leading
to a hang in down_write(). Another is corruption of an mm struct that is
in use by another task.
Fixes: 3d2a9d642512 ("IB/hfi1: Ensure correct mm is used at all times")
Link: https://lore.kernel.org/r/20220408133523.122165.72975.stgit@awfm-01.cornelisnetworks.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Aharon Landau <aharonl@nvidia.com>
Date: Mon Apr 4 11:58:03 2022 +0300
RDMA/mlx5: Don't remove cache MRs when a delay is needed
[ Upstream commit 84c2362fb65d69c721fec0974556378cbb36a62b ]
Don't remove MRs from the cache if need to delay the removal.
Fixes: b9358bdbc713 ("RDMA/mlx5: Fix locking in MR cache work queue")
Link: https://lore.kernel.org/r/c3087a90ff362c8796c7eaa2715128743ce36722.1649062436.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Sasha Levin <sashal@kernel.org>
Date: Sat Apr 9 12:16:15 2022 -0400
Revert "hv: utils: add PTP_1588_CLOCK to Kconfig to fix build"
This reverts commit c4dc584a2d4c8d74b054f09d67e0a076767bdee5.
On Sat, Apr 09, 2022 at 09:07:51AM -0700, Randy Dunlap wrote:
>According to https://bugzilla.kernel.org/show_bug.cgi?id=215823,
>c4dc584a2d4c8d74b054f09d67e0a076767bdee5 ("hv: utils: add PTP_1588_CLOCK to Kconfig to fix build")
>is a problem for 5.10 since CONFIG_PTP_1588_CLOCK_OPTIONAL does not exist in 5.10.
>This prevents the hyper-V NIC timestamping from working, so please revert that commit.
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Pali Rohár <pali@kernel.org>
Date: Fri Mar 18 15:14:41 2022 +0100
Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"
commit 7e2646ed47542123168d43916b84b954532e5386 upstream.
This reverts commit bb32e1987bc55ce1db400faf47d85891da3c9b9f.
Commit 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization")
contains proper fix for the issue described in commit bb32e1987bc5 ("mmc:
sdhci-xenon: fix annoying 1.8V regulator warning").
Fixes: 8d876bf472db ("mmc: sdhci-xenon: wait 5ms after set 1.8V signal enable")
Cc: stable@vger.kernel.org # 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization")
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/r/20220318141441.32329-1-pali@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: ChenXiaoSong <chenxiaosong2@huawei.com>
Date: Tue Mar 29 19:32:07 2022 +0800
Revert "NFSv4: Handle the special Linux file open access mode"
[ Upstream commit ab0fc21bc7105b54bafd85bd8b82742f9e68898a ]
This reverts commit 44942b4e457beda00981f616402a1a791e8c616e.
After secondly opening a file with O_ACCMODE|O_DIRECT flags,
nfs4_valid_open_stateid() will dereference NULL nfs4_state when lseek().
Reproducer:
1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
3. close(fd)
4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)
5. lseek(fd)
Reported-by: Lyu Tao <tao.lyu@epfl.ch>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Date: Thu Mar 3 16:50:30 2022 +0800
rtc: wm8350: Handle error for wm8350_register_irq
[ Upstream commit 43f0269b6b89c1eec4ef83c48035608f4dcdd886 ]
As the potential failure of the wm8350_register_irq(),
it should be better to check it and return error if fails.
Also, it need not free 'wm_rtc->rtc' since it will be freed
automatically.
Fixes: 077eaf5b40ec ("rtc: rtc-wm8350: add support for WM8350 RTC")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220303085030.291793-1-jiasheng@iscas.ac.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Eric Dumazet <edumazet@google.com>
Date: Mon Apr 4 11:34:39 2022 -0700
rxrpc: fix a race in rxrpc_exit_net()
[ Upstream commit 1946014ca3b19be9e485e780e862c375c6f98bad ]
Current code can lead to the following race:
CPU0 CPU1
rxrpc_exit_net()
rxrpc_peer_keepalive_worker()
if (rxnet->live)
rxnet->live = false;
del_timer_sync(&rxnet->peer_keepalive_timer);
timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_exit_net() exits while peer_keepalive_timer is still armed,
leading to use-after-free.
syzbot report was:
ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0
WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Modules linked in:
CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0
R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__debug_check_no_obj_freed lib/debugobjects.c:992 [inline]
debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023
kfree+0xd6/0x310 mm/slab.c:3809
ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176
ops_free_list net/core/net_namespace.c:174 [inline]
cleanup_net+0x591/0xb00 net/core/net_namespace.c:598
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
</TASK>
Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: linux-afs@lists.infradead.org
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Randy Dunlap <rdunlap@infradead.org>
Date: Tue Feb 22 16:06:23 2022 -0800
scsi: aha152x: Fix aha152x_setup() __setup handler return value
[ Upstream commit cc8294ec4738d25e2bb2d71f7d82a9bf7f4a157b ]
__setup() handlers should return 1 if the command line option is handled
and 0 if not (or maybe never return 0; doing so just pollutes init's
environment with strings that are not init arguments/parameters).
Return 1 from aha152x_setup() to indicate that the boot option has been
handled.
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Link: https://lore.kernel.org/r/20220223000623.5920-1-rdunlap@infradead.org
Cc: "Juergen E. Fischer" <fischer@norbit.de>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yang Guang <yang.guang5@zte.com.cn>
Date: Thu Jan 27 08:03:46 2022 +0800
scsi: bfa: Replace snprintf() with sysfs_emit()
[ Upstream commit 2245ea91fd3a04cafbe2f54911432a8657528c3b ]
coccinelle report:
./drivers/scsi/bfa/bfad_attr.c:908:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:860:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:888:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:853:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:808:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:728:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:822:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:927:9-17:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:900:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:874:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:714:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:839:8-16:
WARNING: use scnprintf or sprintf
Use sysfs_emit() instead of scnprintf() or sprintf().
Link: https://lore.kernel.org/r/def83ff75faec64ba592b867a8499b1367bae303.1643181468.git.yang.guang5@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Qi Liu <liuqi115@huawei.com>
Date: Thu Feb 24 19:51:26 2022 +0800
scsi: hisi_sas: Free irq vectors in order for v3 HW
[ Upstream commit 554fb72ee34f4732c7f694f56c3c6e67790352a0 ]
If the driver probe fails to request the channel IRQ or fatal IRQ, the
driver will free the IRQ vectors before freeing the IRQs in free_irq(),
and this will cause a kernel BUG like this:
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:369!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Call trace:
free_msi_irqs+0x118/0x13c
pci_disable_msi+0xfc/0x120
pci_free_irq_vectors+0x24/0x3c
hisi_sas_v3_probe+0x360/0x9d0 [hisi_sas_v3_hw]
local_pci_probe+0x44/0xb0
work_for_cpu_fn+0x20/0x34
process_one_work+0x1d0/0x340
worker_thread+0x2e0/0x460
kthread+0x180/0x190
ret_from_fork+0x10/0x20
---[ end trace b88990335b610c11 ]---
So we use devm_add_action() to control the order in which we free the
vectors.
Link: https://lore.kernel.org/r/1645703489-87194-4-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Jianglei Nie <niejianglei2021@163.com>
Date: Thu Mar 3 09:51:15 2022 +0800
scsi: libfc: Fix use after free in fc_exch_abts_resp()
[ Upstream commit 271add11994ba1a334859069367e04d2be2ebdd4 ]
fc_exch_release(ep) will decrease the ep's reference count. When the
reference count reaches zero, it is freed. But ep is still used in the
following code, which will lead to a use after free.
Return after the fc_exch_release() call to avoid use after free.
Link: https://lore.kernel.org/r/20220303015115.459778-1-niejianglei2021@163.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Yang Guang <yang.guang5@zte.com.cn>
Date: Thu Jan 27 08:00:59 2022 +0800
scsi: mvsas: Replace snprintf() with sysfs_emit()
[ Upstream commit 0ad3867b0f13e45cfee5a1298bfd40eef096116c ]
coccinelle report:
./drivers/scsi/mvsas/mv_init.c:699:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/mvsas/mv_init.c:747:8-16:
WARNING: use scnprintf or sprintf
Use sysfs_emit() instead of scnprintf() or sprintf().
Link: https://lore.kernel.org/r/c1711f7cf251730a8ceb5bdfc313bf85662b3395.1643182948.git.yang.guang5@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Date: Sun Feb 20 12:18:01 2022 +0900
scsi: pm8001: Fix memory leak in pm8001_chip_fw_flash_update_req()
[ Upstream commit f792a3629f4c4aa4c3703d66b43ce1edcc3ec09a ]
In pm8001_chip_fw_flash_update_build(), if
pm8001_chip_fw_flash_update_build() fails, the struct fw_control_ex
allocated must be freed.
Link: https://lore.kernel.org/r/20220220031810.738362-23-damien.lemoal@opensource.wdc.com
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Date: Sun Feb 20 12:17:57 2022 +0900
scsi: pm8001: Fix pm8001_mpi_task_abort_resp()
[ Upstream commit 7e6b7e740addcea450041b5be8e42f0a4ceece0f ]
The call to pm8001_ccb_task_free() at the end of
pm8001_mpi_task_abort_resp() already frees the ccb tag. So when the device
NCQ_ABORT_ALL_FLAG is set, the tag should not be freed again. Also change
the hardcoded 0xBFFFFFFF value to ~NCQ_ABORT_ALL_FLAG as it ought to be.
Link: https://lore.kernel.org/r/20220220031810.738362-19-damien.lemoal@opensource.wdc.com
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Date: Sun Feb 20 12:17:44 2022 +0900
scsi: pm8001: Fix pm80xx_pci_mem_copy() interface
[ Upstream commit 3762d8f6edcdb03994c919f9487fd6d336c06561 ]
The declaration of the local variable destination1 in pm80xx_pci_mem_copy()
as a pointer to a u32 results in the sparse warning:
warning: incorrect type in assignment (different base types)
expected unsigned int [usertype]
got restricted __le32 [usertype]
Furthermore, the destination" argument of pm80xx_pci_mem_copy() is wrongly
declared with the const attribute.
Fix both problems by changing the type of the "destination" argument to
"__le32 *" and use this argument directly inside the pm80xx_pci_mem_copy()
function, thus removing the need for the destination1 local variable.
Link: https://lore.kernel.org/r/20220220031810.738362-6-damien.lemoal@opensource.wdc.com
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Date: Sun Feb 20 12:18:00 2022 +0900
scsi: pm8001: Fix tag leaks on error
[ Upstream commit 4c8f04b1905cd4b776d0b720463c091545478ef7 ]
In pm8001_chip_set_dev_state_req(), pm8001_chip_fw_flash_update_req(),
pm80xx_chip_phy_ctl_req() and pm8001_chip_reg_dev_req() add missing calls
to pm8001_tag_free() to free the allocated tag when pm8001_mpi_build_cmd()
fails.
Similarly, in pm8001_exec_internal_task_abort(), if the chip ->task_abort
method fails, the tag allocated for the abort request task must be
freed. Add the missing call to pm8001_tag_free().
Link: https://lore.kernel.org/r/20220220031810.738362-22-damien.lemoal@opensource.wdc.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Date: Sun Feb 20 12:17:59 2022 +0900
scsi: pm8001: Fix task leak in pm8001_send_abort_all()
[ Upstream commit f90a74892f3acf0cdec5844e90fc8686ca13e7d7 ]
In pm8001_send_abort_all(), make sure to free the allocated sas task
if pm8001_tag_alloc() or pm8001_mpi_build_cmd() fail.
Link: https://lore.kernel.org/r/20220220031810.738362-21-damien.lemoal@opensource.wdc.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date: Sat Mar 19 08:01:24 2022 +0100
scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one()
[ Upstream commit 16ed828b872d12ccba8f07bcc446ae89ba662f9c ]
The error handling path of the probe releases a resource that is not freed
in the remove function. In some cases, a ioremap() must be undone.
Add the missing iounmap() call in the remove function.
Link: https://lore.kernel.org/r/247066a3104d25f9a05de8b3270fc3c848763bcc.1647673264.git.christophe.jaillet@wanadoo.fr
Fixes: 45804fbb00ee ("[SCSI] 53c700: Amiga Zorro NCR53c710 SCSI")
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Sachin Sant <sachinp@linux.vnet.ibm.com>
Date: Fri Nov 6 13:10:06 2020 +0530
selftests/cgroup: Fix build on older distros
commit c2e46f6b3e3551558d44c4dc518b9667cb0d5f8b upstream.
On older distros struct clone_args does not have a cgroup member,
leading to build errors:
cgroup_util.c: In function 'clone_into_cgroup':
cgroup_util.c:343:4: error: 'struct clone_args' has no member named 'cgroup'
cgroup_util.c:346:33: error: invalid application of 'sizeof' to incomplete
type 'struct clone_args'
But the selftests already have a locally defined version of the
structure which is up to date, called __clone_args.
So use __clone_args which fixes the error.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sachin Sant <sachinp@linux.vnet.ibm.com>>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tejun Heo <tj@kernel.org>
Date: Thu Jan 6 11:02:29 2022 -1000
selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644
commit b09c2baa56347ae65795350dfcc633dedb1c2970 upstream.
0644 is an odd perm to create a cgroup which is a directory. Use the regular
0755 instead. This is necessary for euid switching test case.
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tejun Heo <tj@kernel.org>
Date: Thu Jan 6 11:02:29 2022 -1000
selftests: cgroup: Test open-time cgroup namespace usage for migration checks
commit bf35a7879f1dfb0d050fe779168bcf25c7de66f5 upstream.
When a task is writing to an fd opened by a different task, the perm check
should use the cgroup namespace of the latter task. Add a test for it.
Tested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tejun Heo <tj@kernel.org>
Date: Thu Jan 6 11:02:29 2022 -1000
selftests: cgroup: Test open-time credential usage for migration checks
commit 613e040e4dc285367bff0f8f75ea59839bc10947 upstream.
When a task is writing to an fd opened by a different task, the perm check
should use the credentials of the latter task. Add a test for it.
Tested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Jiri Slaby <jirislaby@kernel.org>
Date: Tue Mar 8 12:51:53 2022 +0100
serial: samsung_tty: do not unlock port->lock for uart_write_wakeup()
[ Upstream commit 988c7c00691008ea1daaa1235680a0da49dab4e8 ]
The commit c15c3747ee32 (serial: samsung: fix potential soft lockup
during uart write) added an unlock of port->lock before
uart_write_wakeup() and a lock after it. It was always problematic to
write data from tty_ldisc_ops::write_wakeup and it was even documented
that way. We fixed the line disciplines to conform to this recently.
So if there is still a missed one, we should fix them instead of this
workaround.
On the top of that, s3c24xx_serial_tx_dma_complete() in this driver
still holds the port->lock while calling uart_write_wakeup().
So revert the wrap added by the commit above.
Cc: Thomas Abraham <thomas.abraham@linaro.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Hyeonkook Kim <hk619.kim@samsung.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20220308115153.4225-1-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Martin Habets <habetsm.xilinx@gmail.com>
Date: Mon Apr 4 11:48:51 2022 +0100
sfc: Do not free an empty page_ring
[ Upstream commit 458f5d92df4807e2a7c803ed928369129996bf96 ]
When the page_ring is not used page_ptr_mask is 0.
Do not dereference page_ring[0] in this case.
Fixes: 2768935a4660 ("sfc: reuse pages to avoid DMA mapping/unmapping costs")
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kamal Dasu <kdasu.kdev@gmail.com>
Date: Mon Mar 28 10:24:42 2022 -0400
spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op()
[ Upstream commit 2c7d1b281286c46049cd22b43435cecba560edde ]
This fixes case where MSPI controller is used to access spi-nor
flash and BSPI block is not present.
Fixes: 5f195ee7d830 ("spi: bcm-qspi: Implement the spi_mem interface")
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220328142442.7553-1-kdasu.kdev@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Stefan Wahren <stefan.wahren@i2se.com>
Date: Sun Jan 23 21:02:22 2022 +0100
staging: vchiq_core: handle NULL result of find_service_by_handle
[ Upstream commit ca225857faf237234d2fffe5d1919467dfadd822 ]
In case of an invalid handle the function find_servive_by_handle
returns NULL. So take care of this and avoid a NULL pointer dereference.
Reviewed-by: Nicolas Saenz Julienne <nsaenz@kernel.org>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: https://lore.kernel.org/r/1642968143-19281-18-git-send-email-stefan.wahren@i2se.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Xiaoke Wang <xkernel.wang@foxmail.com>
Date: Fri Feb 18 21:59:45 2022 +0800
staging: wfx: fix an error handling in wfx_init_common()
[ Upstream commit 60f1d3c92dc1ef1026e5b917a329a7fa947da036 ]
One error handler of wfx_init_common() return without calling
ieee80211_free_hw(hw), which may result in memory leak. And I add
one err label to unify the error handler, which is useful for the
subsequent changes.
Suggested-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Link: https://lore.kernel.org/r/tencent_24A24A3EFF61206ECCC4B94B1C5C1454E108@qq.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: NeilBrown <neilb@suse.de>
Date: Mon Mar 7 10:41:44 2022 +1100
SUNRPC/call_alloc: async tasks mustn't block waiting for memory
[ Upstream commit c487216bec83b0c5a8803e5c61433d33ad7b104d ]
When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything.
So it must not block waiting for memory.
mempools are particularly a problem as memory can only be released back
to the mempool by an async rpc task running. If all available
workqueue threads are waiting on the mempool, no thread is available to
return anything.
rpc_malloc() can block, and this might cause deadlocks.
So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if
blocking is acceptable.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: NeilBrown <neilb@suse.de>
Date: Mon Mar 7 10:41:44 2022 +1100
SUNRPC/xprt: async tasks mustn't block waiting for memory
[ Upstream commit a721035477fb5fb8abc738fbe410b07c12af3dc5 ]
When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything. So it
must not block waiting for memory.
xprt_dynamic_alloc_slot can block indefinitely. This can tie up all
workqueue threads and NFS can deadlock. So when called from a
workqueue, set __GFP_NORETRY.
The rdma alloc_slot already does not block. However it sets the error
to -EAGAIN suggesting this will trigger a sleep. It does not. As we
can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN
causes immediate retry.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Mon Mar 14 21:02:10 2022 -0400
SUNRPC: Fix socket waits for write buffer space
[ Upstream commit 7496b59f588dd52886fdbac7633608097543a0a5 ]
The socket layer requires that we use the socket lock to protect changes
to the sock->sk_write_pending field and others.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Wed Apr 6 23:18:57 2022 -0400
SUNRPC: Handle ENOMEM in call_transmit_status()
[ Upstream commit d3c15033b240767d0287f1c4a529cbbe2d5ded8a ]
Both call_transmit() and call_bc_transmit() can now return ENOMEM, so
let's make sure that we handle the errors gracefully.
Fixes: 0472e4766049 ("SUNRPC: Convert socket page send code to use iov_iter()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Thu Apr 7 09:50:19 2022 -0400
SUNRPC: Handle low memory situations in call_status()
[ Upstream commit 9d82819d5b065348ce623f196bf601028e22ed00 ]
We need to handle ENFILE, ENOBUFS, and ENOMEM, because
xprt_wake_pending_tasks() can be called with any one of these due to
socket creation failures.
Fixes: b61d59fffd3e ("SUNRPC:Â xs_tcp_connect_worker{4,6}: merge common code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: NeilBrown <neilb@suse.de>
Date: Mon Mar 7 10:41:44 2022 +1100
SUNRPC: remove scheduling boost for "SWAPPER" tasks.
[ Upstream commit a80a8461868905823609be97f91776a26befe839 ]
Currently, tasks marked as "swapper" tasks get put to the front of
non-priority rpc_queues, and are sorted earlier than non-swapper tasks on
the transport's ->xmit_queue.
This is pointless as currently *all* tasks for a mount that has swap
enabled on *any* file are marked as "swapper" tasks. So the net result
is that the non-priority rpc_queues are reverse-ordered (LIFO).
This scheduling boost is not necessary to avoid deadlocks, and hurts
fairness, so remove it. If there were a need to expedite some requests,
the tk_priority mechanism is a more appropriate tool.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Thu Apr 7 14:10:23 2022 -0400
SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
[ Upstream commit b056fa070814897be32d83b079dbc311375588e7 ]
The allocation is done with GFP_KERNEL, but it could still fail in a low
memory situation.
Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Wed Feb 9 19:56:57 2022 +0100
tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
[ Upstream commit 4f9bf2a2f5aacf988e6d5e56b961ba45c5a25248 ]
Commit
9652dc2eb9e40 ("tcp: relax listening_hash operations")
removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.
On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves is also acquired with enabled bottom halves
followed by disabling bottom halves. This is the reverse locking order.
It has been observed with inet_listen_hashbucket::lock:
local_bh_disable() + spin_lock(&ilb->lock):
inet_listen()
inet_csk_listen_start()
sk->sk_prot->hash() := inet_hash()
local_bh_disable()
__inet_hash()
spin_lock(&ilb->lock);
acquire(&ilb->lock);
Reverse order: spin_lock(&ilb2->lock) + local_bh_disable():
tcp_seq_next()
listening_get_next()
spin_lock(&ilb2->lock);
acquire(&ilb2->lock);
tcp4_seq_show()
get_tcp4_sock()
sock_i_ino()
read_lock_bh(&sk->sk_callback_lock);
acquire(softirq_ctrl) // <---- whoops
acquire(&sk->sk_callback_lock)
Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock. Split inet_unhash() and acquire the
listen_hashbucket lock without disabling bottom halves; the inet_ehash
lock with disabled bottom halves.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de
Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net
Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Tue Apr 5 10:33:21 2022 -0300
tools build: Filter out options and warnings not supported by clang
commit 41caff459a5b956b3e23ba9ca759dd0629ad3dda upstream.
These make the feature check fail when using clang, so remove them just
like is done in tools/perf/Makefile.config to build perf itself.
Adding -Wno-compound-token-split-by-macro to tools/perf/Makefile.config
when building with clang is also necessary to avoid these warnings
turned into errors (-Werror):
CC /tmp/build/perf/util/scripting-engines/trace-event-perl.o
In file included from util/scripting-engines/trace-event-perl.c:35:
In file included from /usr/lib64/perl5/CORE/perl.h:4085:
In file included from /usr/lib64/perl5/CORE/hv.h:659:
In file included from /usr/lib64/perl5/CORE/hv_func.h:34:
In file included from /usr/lib64/perl5/CORE/sbox32_hash.h:4:
/usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '(' and '{' tokens introducing statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro]
ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/perl5/CORE/zaphod32_hash.h:80:38: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
#define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \
^~~~~~~~~~
/usr/lib64/perl5/CORE/perl.h:737:29: note: expanded from macro 'STMT_START'
# define STMT_START (void)( /* gcc supports "({ STATEMENTS; })" */
^
/usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: '{' token is here
ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/perl5/CORE/zaphod32_hash.h:80:49: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
#define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \
^
/usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '}' and ')' tokens terminating statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro]
ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/perl5/CORE/zaphod32_hash.h:87:41: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
v ^= (v>>23); \
^
/usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: ')' token is here
ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/perl5/CORE/zaphod32_hash.h:88:3: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
} STMT_END
^~~~~~~~
/usr/lib64/perl5/CORE/perl.h:738:21: note: expanded from macro 'STMT_END'
# define STMT_END )
^
Please refer to the discussion on the Link: tag below, where Nathan
clarifies the situation:
<quote>
acme> And then get to the problems at the end of this message, which seem
acme> similar to the problem described here:
acme>
acme> From Nathan Chancellor <>
acme> Subject [PATCH] mwifiex: Remove unnecessary braces from HostCmd_SET_SEQ_NO_BSS_INFO
acme>
acme> https://lkml.org/lkml/2020/9/1/135
acme>
acme> So perhaps in this case its better to disable that
acme> -Werror,-Wcompound-token-split-by-macro when building with clang?
Yes, I think that is probably the best solution. As far as I can tell,
at least in this file and context, the warning appears harmless, as the
"create a GNU C statement expression from two different macros" is very
much intentional, based on the presence of PERL_USE_GCC_BRACE_GROUPS.
The warning is fixed in upstream Perl by just avoiding creating GNU C
statement expressions using STMT_START and STMT_END:
https://github.com/Perl/perl5/issues/18780
https://github.com/Perl/perl5/pull/18984
If I am reading the source code correctly, an alternative to disabling
the warning would be specifying -DPERL_GCC_BRACE_GROUPS_FORBIDDEN but it
seems like that might end up impacting more than just this site,
according to the issue discussion above.
</quote>
Based-on-a-patch-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Debian/Selfmade LLVM-14 (x86-64)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: http://lore.kernel.org/lkml/YkxWcYzph5pC1EK8@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Mon Apr 4 17:28:48 2022 -0300
tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts
commit 541f695cbcb6932c22638b06e0cbe1d56177e2e9 upstream.
Just like its done for ldopts and for both in tools/perf/Makefile.config.
Using `` to initialize PERL_EMBED_CCOPTS somehow precludes using:
$(filter-out SOMETHING_TO_FILTER,$(PERL_EMBED_CCOPTS))
And we need to do it to allow for building with versions of clang where
some gcc options selected by distros are not available.
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Debian/Selfmade LLVM-14 (x86-64)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: http://lore.kernel.org/lkml/YktYX2OnLtyobRYD@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Harold Huang <baymaxhuang@gmail.com>
Date: Thu Mar 3 10:24:40 2022 +0800
tuntap: add sanity checks about msg_controllen in sendmsg
[ Upstream commit 74a335a07a17d131b9263bfdbdcb5e40673ca9ca ]
In patch [1], tun_msg_ctl was added to allow pass batched xdp buffers to
tun_sendmsg. Although we donot use msg_controllen in this path, we should
check msg_controllen to make sure the caller pass a valid msg_ctl.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe8dd45bb7556246c6b76277b1ba4296c91c2505
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220303022441.383865-1-baymaxhuang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Zhihao Cheng <chengzhihao1@huawei.com>
Date: Mon Dec 27 11:22:38 2021 +0800
ubifs: Rectify space amount budget for mkdir/tmpfile operations
[ Upstream commit a6dab6607d4681d227905d5198710b575dbdb519 ]
UBIFS should make sure the flash has enough space to store dirty (Data
that is newer than disk) data (in memory), space budget is exactly
designed to do that. If space budget calculates less data than we need,
'make_reservation()' will do more work(return -ENOSPC if no free space
lelf, sometimes we can see "cannot reserve xxx bytes in jhead xxx, error
-28" in ubifs error messages) with ubifs inodes locked, which may effect
other syscalls.
A simple way to decide how much space do we need when make a budget:
See how much space is needed by 'make_reservation()' in ubifs_jnl_xxx()
function according to corresponding operation.
It's better to report ENOSPC in ubifs_budget_space(), as early as we can.
Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE")
Fixes: 1e51764a3c2ac05 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Kees Cook <keescook@chromium.org>
Date: Wed Jan 19 18:10:35 2022 -0800
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
commit 69d0db01e210e07fe915e5da91b54a867cda040f upstream.
The object-size sanitizer is redundant to -Warray-bounds, and
inappropriately performs its checks at run-time when all information
needed for the evaluation is available at compile-time, making it quite
difficult to use:
https://bugzilla.kernel.org/show_bug.cgi?id=214861
With -Warray-bounds almost enabled globally, it doesn't make sense to
keep this around.
Link: https://lkml.kernel.org/r/20211203235346.110809-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: H. Nikolaus Schaller <hns@goldelico.com>
Date: Tue Mar 8 14:03:37 2022 +0100
usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm
[ Upstream commit ac01df343e5a6c6bcead2ed421af1fde30f73e7e ]
Usually, the vbus_regulator (smps10 on omap5evm) boots up disabled.
Hence calling regulator_disable() indirectly through dwc3_omap_set_mailbox()
during probe leads to:
[ 10.332764] WARNING: CPU: 0 PID: 1628 at drivers/regulator/core.c:2853 _regulator_disable+0x40/0x164
[ 10.351919] unbalanced disables for smps10_out1
[ 10.361298] Modules linked in: dwc3_omap(+) clk_twl6040 at24 gpio_twl6040 palmas_gpadc palmas_pwrbutton
industrialio snd_soc_omap_mcbsp(+) snd_soc_ti_sdma display_connector ti_tpd12s015 drm leds_gpio
drm_panel_orientation_quirks ip_tables x_tables ipv6 autofs4
[ 10.387818] CPU: 0 PID: 1628 Comm: systemd-udevd Not tainted 5.17.0-rc1-letux-lpae+ #8139
[ 10.405129] Hardware name: Generic OMAP5 (Flattened Device Tree)
[ 10.411455] unwind_backtrace from show_stack+0x10/0x14
[ 10.416970] show_stack from dump_stack_lvl+0x40/0x4c
[ 10.422313] dump_stack_lvl from __warn+0xb8/0x170
[ 10.427377] __warn from warn_slowpath_fmt+0x70/0x9c
[ 10.432595] warn_slowpath_fmt from _regulator_disable+0x40/0x164
[ 10.439037] _regulator_disable from regulator_disable+0x30/0x64
[ 10.445382] regulator_disable from dwc3_omap_set_mailbox+0x8c/0xf0 [dwc3_omap]
[ 10.453116] dwc3_omap_set_mailbox [dwc3_omap] from dwc3_omap_probe+0x2b8/0x394 [dwc3_omap]
[ 10.467021] dwc3_omap_probe [dwc3_omap] from platform_probe+0x58/0xa8
[ 10.481762] platform_probe from really_probe+0x168/0x2fc
[ 10.481782] really_probe from __driver_probe_device+0xc4/0xd8
[ 10.481782] __driver_probe_device from driver_probe_device+0x24/0xa4
[ 10.503762] driver_probe_device from __driver_attach+0xc4/0xd8
[ 10.510018] __driver_attach from bus_for_each_dev+0x64/0xa0
[ 10.516001] bus_for_each_dev from bus_add_driver+0x148/0x1a4
[ 10.524880] bus_add_driver from driver_register+0xb4/0xf8
[ 10.530678] driver_register from do_one_initcall+0x90/0x1c4
[ 10.536661] do_one_initcall from do_init_module+0x4c/0x200
[ 10.536683] do_init_module from load_module+0x13dc/0x1910
[ 10.551159] load_module from sys_finit_module+0xc8/0xd8
[ 10.561319] sys_finit_module from __sys_trace_return+0x0/0x18
[ 10.561336] Exception stack(0xc344bfa8 to 0xc344bff0)
[ 10.561341] bfa0: b6fb5778 b6fab8d8 00000007 b6ecfbb8 00000000 b6ed0398
[ 10.561341] bfc0: b6fb5778 b6fab8d8 855c0500 0000017b 00020000 b6f9a3cc 00000000 b6fb5778
[ 10.595500] bfe0: bede18f8 bede18e8 b6ec9aeb b6dda1c2
[ 10.601345] ---[ end trace 0000000000000000 ]---
Fix this unnecessary warning by checking if the regulator is enabled.
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/af3b750dc2265d875deaabcf5f80098c9645da45.1646744616.git.hns@goldelico.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Neal Liu <neal_liu@aspeedtech.com>
Date: Tue Feb 8 18:16:57 2022 +0800
usb: ehci: add pci device support for Aspeed platforms
[ Upstream commit c3c9cee592828528fd228b01d312c7526c584a42 ]
Enable Aspeed quirks in commit 7f2d73788d90 ("usb: ehci:
handshake CMD_RUN instead of STS_HALT") to support Aspeed
ehci-pci device.
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Neal Liu <neal_liu@aspeedtech.com>
Link: https://lore.kernel.org/r/20220208101657.76459-1-neal_liu@aspeedtech.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Wayne Chang <waynec@nvidia.com>
Date: Fri Jan 7 17:04:43 2022 +0800
usb: gadget: tegra-xudc: Do not program SPARAM
[ Upstream commit 62fb61580eb48fc890b7bc9fb5fd263367baeca8 ]
According to the Tegra Technical Reference Manual, SPARAM
is a read-only register and should not be programmed in
the driver.
The change removes the wrong SPARAM usage.
Signed-off-by: Wayne Chang <waynec@nvidia.com>
Link: https://lore.kernel.org/r/20220107090443.149021-1-waynec@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Wayne Chang <waynec@nvidia.com>
Date: Fri Jan 7 17:13:49 2022 +0800
usb: gadget: tegra-xudc: Fix control endpoint's definitions
[ Upstream commit 7bd42fb95eb4f98495ccadf467ad15124208ec49 ]
According to the Tegra Technical Reference Manual, the seq_num
field of control endpoint is not [31:24] but [31:27]. Bit 24
is reserved and bit 26 is splitxstate.
The change fixes the wrong control endpoint's definitions.
Signed-off-by: Wayne Chang <waynec@nvidia.com>
Link: https://lore.kernel.org/r/20220107091349.149798-1-waynec@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Randy Dunlap <rdunlap@infradead.org>
Date: Wed Mar 16 12:20:03 2022 -0700
virtio_console: eliminate anonymous module_init & module_exit
[ Upstream commit fefb8a2a941338d871e2d83fbd65fbfa068857bd ]
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.
Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.
Example 1: (System.map)
ffffffff832fc78c t init
ffffffff832fc79e t init
ffffffff832fc8f8 t init
Example 2: (initcall_debug log)
calling init+0x0/0x12 @ 1
initcall init+0x0/0x12 returned 0 after 15 usecs
calling init+0x0/0x60 @ 1
initcall init+0x0/0x60 returned 0 after 2 usecs
calling init+0x0/0x9a @ 1
initcall init+0x0/0x9a returned 0 after 74 usecs
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Amit Shah <amit@kernel.org>
Cc: virtualization@lists.linux-foundation.org
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20220316192010.19001-3-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lucas Denefle <lucas.denefle@converge.io>
Date: Wed Feb 23 11:35:55 2022 +0000
w1: w1_therm: fixes w1_seq for ds28ea00 sensors
[ Upstream commit 41a92a89eee819298f805c40187ad8b02bb53426 ]
w1_seq was failing due to several devices responding to the
CHAIN_DONE at the same time. Now properly selects the current
device in the chain with MATCH_ROM. Also acknowledgment was
read twice.
Signed-off-by: Lucas Denefle <lucas.denefle@converge.io>
Link: https://lore.kernel.org/r/20220223113558.232750-1-lucas.denefle@converge.io
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Nathan Chancellor <nathan@kernel.org>
Date: Mon Mar 14 12:48:42 2022 -0700
x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
[ Upstream commit aaeed6ecc1253ce1463fa1aca0b70a4ccbc9fa75 ]
There are two outstanding issues with CONFIG_X86_X32_ABI and
llvm-objcopy, with similar root causes:
1. llvm-objcopy does not properly convert .note.gnu.property when going
from x86_64 to x86_x32, resulting in a corrupted section when
linking:
https://github.com/ClangBuiltLinux/linux/issues/1141
2. llvm-objcopy produces corrupted compressed debug sections when going
from x86_64 to x86_x32, also resulting in an error when linking:
https://github.com/ClangBuiltLinux/linux/issues/514
After commit 41c5ef31ad71 ("x86/ibt: Base IBT bits"), the
.note.gnu.property section is always generated when
CONFIG_X86_KERNEL_IBT is enabled, which causes the first issue to become
visible with an allmodconfig build:
ld.lld: error: arch/x86/entry/vdso/vclock_gettime-x32.o:(.note.gnu.property+0x1c): program property is too short
To avoid this error, do not allow CONFIG_X86_X32_ABI to be selected when
using llvm-objcopy. If the two issues ever get fixed in llvm-objcopy,
this can be turned into a feature check.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220314194842.3452-3-nathan@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date: Mon Apr 4 17:34:19 2022 -0700
x86/pm: Save the MSR validity status at context setup
commit 73924ec4d560257004d5b5116b22a3647661e364 upstream.
The mechanism to save/restore MSRs during S3 suspend/resume checks for
the MSR validity during suspend, and only restores the MSR if its a
valid MSR. This is not optimal, as an invalid MSR will unnecessarily
throw an exception for every suspend cycle. The more invalid MSRs,
higher the impact will be.
Check and save the MSR validity at setup. This ensures that only valid
MSRs that are guaranteed to not throw an exception will be attempted
during suspend.
Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date: Mon Apr 4 17:35:45 2022 -0700
x86/speculation: Restore speculation related MSRs during S3 resume
commit e2a1256b17b16f9b9adf1b6fea56819e7b68e463 upstream.
After resuming from suspend-to-RAM, the MSRs that control CPU's
speculative execution behavior are not being restored on the boot CPU.
These MSRs are used to mitigate speculative execution vulnerabilities.
Not restoring them correctly may leave the CPU vulnerable. Secondary
CPU's MSRs are correctly being restored at S3 resume by
identify_secondary_cpu().
During S3 resume, restore these MSRs for boot CPU when restoring its
processor state.
Fixes: 772439717dbf ("x86/bugs/intel: Set proper CPU features and setup RDS")
Reported-by: Neelima Krishnan <neelima.krishnan@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Dongli Zhang <dongli.zhang@oracle.com>
Date: Wed Mar 2 08:40:32 2022 -0800
xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
[ Upstream commit eed05744322da07dd7e419432dcedf3c2e017179 ]
The sched_clock() can be used very early since commit 857baa87b642
("sched/clock: Enable sched clock early"). In addition, with commit
38669ba205d1 ("x86/xen/time: Output xen sched_clock time from 0"), kdump
kernel in Xen HVM guest may panic at very early stage when accessing
&__this_cpu_read(xen_vcpu)->time as in below:
setup_arch()
-> init_hypervisor_platform()
-> x86_init.hyper.init_platform = xen_hvm_guest_init()
-> xen_hvm_init_time_ops()
-> xen_clocksource_read()
-> src = &__this_cpu_read(xen_vcpu)->time;
This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
However, when Xen HVM guest panic on vcpu >= 32, since
xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when
vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic.
This patch calls xen_hvm_init_time_ops() again later in
xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is
registered when the boot vcpu is >= 32.
This issue can be reproduced on purpose via below command at the guest
side when kdump/kexec is enabled:
"taskset -c 33 echo c > /proc/sysrq-trigger"
The bugfix for PVM is not implemented due to the lack of testing
environment.
[boris: xen_hvm_init_time_ops() returns on errors instead of jumping to end]
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20220302164032.14569-3-dongli.zhang@oracle.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Max Filippov <jcmvbkbc@gmail.com>
Date: Thu Mar 17 02:49:41 2022 -0700
xtensa: fix DTC warning unit_address_format
[ Upstream commit e85d29ba4b24f68e7a78cb85c55e754362eeb2de ]
DTC issues the following warnings when building xtfpga device trees:
/soc/flash@00000000/partition@0x0: unit name should not have leading "0x"
/soc/flash@00000000/partition@0x6000000: unit name should not have leading "0x"
/soc/flash@00000000/partition@0x6800000: unit name should not have leading "0x"
/soc/flash@00000000/partition@0x7fe0000: unit name should not have leading "0x"
Drop leading 0x from flash partition unit names.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>