List of changes in Linux 5.15.103

 
af_unix: fix struct pid leaks in OOB support
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Mar 7 16:45:30 2023 +0000

    af_unix: fix struct pid leaks in OOB support
    
    [ Upstream commit 2aab4b96900272885bc157f8b236abf1cdc02e08 ]
    
    syzbot reported struct pid leak [1].
    
    Issue is that queue_oob() calls maybe_add_creds() which potentially
    holds a reference on a pid.
    
    But skb->destructor is not set (either directly or by calling
    unix_scm_to_skb())
    
    This means that subsequent kfree_skb() or consume_skb() would leak
    this reference.
    
    In this fix, I chose to fully support scm even for the OOB message.
    
    [1]
    BUG: memory leak
    unreferenced object 0xffff8881053e7f80 (size 128):
    comm "syz-executor242", pid 5066, jiffies 4294946079 (age 13.220s)
    hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [<ffffffff812ae26a>] alloc_pid+0x6a/0x560 kernel/pid.c:180
    [<ffffffff812718df>] copy_process+0x169f/0x26c0 kernel/fork.c:2285
    [<ffffffff81272b37>] kernel_clone+0xf7/0x610 kernel/fork.c:2684
    [<ffffffff812730cc>] __do_sys_clone+0x7c/0xb0 kernel/fork.c:2825
    [<ffffffff849ad699>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff849ad699>] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    Fixes: 314001f0bf92 ("af_unix: Add OOB support")
    Reported-by: syzbot+7699d9e5635c10253a27@syzkaller.appspotmail.com
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Rao Shoaib <rao.shoaib@oracle.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20230307164530.771896-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
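
The leak described in the commit above can be illustrated with a minimal userspace sketch. This is not kernel code: every model_* name is hypothetical, and the model only assumes what the commit message states, namely that a reference is taken when credentials are attached and that it is dropped only if a destructor has been set on the buffer.

```c
#include <stddef.h>

/* Userspace model only; every model_* name is hypothetical. */
struct model_pid {
	int refcount;
};

struct model_skb {
	struct model_pid *pid;
	void (*destructor)(struct model_skb *skb);
};

/* Plays the role of the destructor that drops the pid reference. */
static void model_destructor(struct model_skb *skb)
{
	if (skb->pid) {
		skb->pid->refcount--;
		skb->pid = NULL;
	}
}

/* Plays the role of maybe_add_creds(): takes a reference on the pid. */
static void model_add_creds(struct model_skb *skb, struct model_pid *pid)
{
	pid->refcount++;
	skb->pid = pid;
}

/* Plays the role of kfree_skb()/consume_skb(): cleanup runs only via the destructor. */
static void model_free_skb(struct model_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
}

/* Returns the pid refcount left after queueing and freeing one OOB buffer. */
static int model_refs_after_free(int set_destructor)
{
	struct model_pid pid = { .refcount = 1 };	/* the owner's own reference */
	struct model_skb skb = { .pid = NULL, .destructor = NULL };

	model_add_creds(&skb, &pid);
	if (set_destructor)
		skb.destructor = model_destructor;
	model_free_skb(&skb);
	return pid.refcount;
}
```

In this model, model_refs_after_free(0) leaves one extra reference behind (the leak), while model_refs_after_free(1) returns the count to its starting value.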

af_unix: Remove unnecessary brackets around CONFIG_AF_UNIX_OOB.
Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Date:   Thu Mar 17 12:23:08 2022 +0900

    af_unix: Remove unnecessary brackets around CONFIG_AF_UNIX_OOB.
    
    [ Upstream commit 4edf21aa94ee33c75f819f2b6eb6dd52ef8a1628 ]
    
    Let's remove unnecessary brackets around CONFIG_AF_UNIX_OOB.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
    Link: https://lore.kernel.org/r/20220317032308.65372-1-kuniyu@amazon.co.jp
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 2aab4b969002 ("af_unix: fix struct pid leaks in OOB support")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
alpha: fix R_ALPHA_LITERAL reloc for large modules
Author: Edward Humes <aurxenon@lunos.org>
Date:   Sat Aug 27 02:49:39 2022 -0400

    alpha: fix R_ALPHA_LITERAL reloc for large modules
    
    [ Upstream commit b6b17a8b3ecd878d98d5472a9023ede9e669ca72 ]
    
    Previously, R_ALPHA_LITERAL relocations would overflow for large kernel
    modules.
    
    This was because the Alpha's apply_relocate_add was relying on the kernel's
    module loader to have sorted the GOT towards the very end of the module as it
    was mapped into memory in order to correctly assign the global pointer. While
    this behavior would mostly work fine for small kernel modules, this approach
    would overflow on kernel modules with large GOTs since the global pointer
    would be very far away from the GOT, and thus, certain entries would be out of
    range.
    
    This patch fixes this by instead using the Tru64 behavior of assigning the
    global pointer to be 32KB away from the start of the GOT. The change made
    in this patch won't work for multi-GOT kernel modules as it makes the
    assumption the module only has one GOT located at the beginning of .got,
    although for the vast majority of kernel modules, this should be fine. Of the
    kernel modules that would previously result in a relocation error, none of
    them, even modules like nouveau, have even come close to filling up a single
    GOT, and they've all worked fine under this patch.
    
    Signed-off-by: Edward Humes <aurxenon@lunos.org>
    Signed-off-by: Matt Turner <mattst88@gmail.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
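
The range problem in the commit above can be sketched as a small check. The sketch assumes that "out of range" refers to a signed 16-bit displacement relative to the global pointer; the function name and the offsets used with it are hypothetical and only serve the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model, not the Alpha module loader.
 * gp_off:    offset of the global pointer from the start of the GOT.
 * entry_off: offset of a GOT entry from the start of the GOT.
 * Assumption: the entry is reachable only if (entry - gp) fits in a
 * signed 16-bit displacement.
 */
static bool model_literal_in_range(int64_t gp_off, int64_t entry_off)
{
	int64_t disp = entry_off - gp_off;

	return disp >= INT16_MIN && disp <= INT16_MAX;
}
```

With the global pointer placed 32KB (0x8000) past the start of the GOT, every entry in the first 64KB of the GOT passes the check. With the global pointer far beyond the GOT, as can happen in a large module, entries near the start of the GOT fail it.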

 
arch: fix broken BuildID for arm64 and riscv
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Wed Mar 1 19:03:48 2023 -0700

    arch: fix broken BuildID for arm64 and riscv
    
    commit 99cb0d917ffa1ab628bb67364ca9b162c07699b1 upstream.
    
    Dennis Gilmore reports that the BuildID is missing in the arm64 vmlinux
    since commit 994b7ac1697b ("arm64: remove special treatment for the
    link order of head.o").
    
    The issue is that the type of .notes section, which contains the BuildID,
    changed from NOTES to PROGBITS.
    
    Ard Biesheuvel figured out that whichever object gets linked first gets
    to decide the type of a section. The PROGBITS type is the result of the
    compiler emitting .note.GNU-stack as PROGBITS rather than NOTE.
    
    While Ard provided a fix for arm64, I want to fix this globally because
    the same issue is happening on riscv since commit 2348e6bf4421 ("riscv:
    remove special treatment for the link order of head.o"). This problem
    will happen in general for other architectures if they start to drop
    unneeded entries from scripts/head-object-list.txt.
    
    Discard .note.GNU-stack in include/asm-generic/vmlinux.lds.h.
    
    Link: https://lore.kernel.org/lkml/CAABkxwuQoz1CTbyb57n0ZX65eSYiTonFCU8-LCQc=74D=xE=rA@mail.gmail.com/
    Fixes: 994b7ac1697b ("arm64: remove special treatment for the link order of head.o")
    Fixes: 2348e6bf4421 ("riscv: remove special treatment for the link order of head.o")
    Reported-by: Dennis Gilmore <dennis@ausil.us>
    Suggested-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
    [Tom: stable backport 5.15.y, 5.10.y, 5.4.y]
    
    Though the above "Fixes:" commits are not in this kernel, the conditions
    which lead to a missing Build ID in arm64 vmlinux are similar.
    
    Evidence points to these conditions:
    1. ld version > 2.36 (exact binutils commit documented in a494398bde27)
    2. first object which gets linked (head.o) has a PROGBITS .note.GNU-stack segment
    
    These conditions can be observed when:
    - 5.15.60+ OR 5.10.136+ OR 5.4.210+
    - AND ld version > 2.36
    - AND arch=arm64
    - AND CONFIG_MODVERSIONS=y
    
    There are notable differences in the vmlinux elf files produced
    before(bad) and after(good) applying this series.
    
    Good: p_type:PT_NOTE segment exists.
     Bad: p_type:PT_NOTE segment is missing.
    
    Good: sh_name_str:.notes section has sh_type:SHT_NOTE
     Bad: sh_name_str:.notes section has sh_type:SHT_PROGBITS
    
    `readelf -n` (as of v2.40) searches for Build Id
    by processing only the very first note in sh_type:SHT_NOTE sections.
    
    This was previously bisected to the stable backport of 0d362be5b142.
    Follow-up experiments were discussed here: https://lore.kernel.org/all/20221221235413.xaisboqmr7dkqwn6@oracle.com/
    which strongly hints at condition 2.
    Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
arm64: efi: Make efi_rt_lock a raw_spinlock
Author: Pierre Gondois <pierre.gondois@arm.com>
Date:   Wed Feb 15 17:10:47 2023 +0100

    arm64: efi: Make efi_rt_lock a raw_spinlock
    
    [ Upstream commit 0e68b5517d3767562889f1d83fdb828c26adb24f ]
    
    Running a rt-kernel based on 6.2.0-rc3-rt1 on an Ampere Altra outputs
    the following:
      BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
      in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: kworker/u320:0
      preempt_count: 2, expected: 0
      RCU nest depth: 0, expected: 0
      3 locks held by kworker/u320:0/9:
      #0: ffff3fff8c27d128 ((wq_completion)efi_rts_wq){+.+.}-{0:0}, at: process_one_work (./include/linux/atomic/atomic-long.h:41)
      #1: ffff80000861bdd0 ((work_completion)(&efi_rts_work.work)){+.+.}-{0:0}, at: process_one_work (./include/linux/atomic/atomic-long.h:41)
      #2: ffffdf7e1ed3e460 (efi_rt_lock){+.+.}-{3:3}, at: efi_call_rts (drivers/firmware/efi/runtime-wrappers.c:101)
      Preemption disabled at:
      efi_virtmap_load (./arch/arm64/include/asm/mmu_context.h:248)
      CPU: 0 PID: 9 Comm: kworker/u320:0 Tainted: G        W          6.2.0-rc3-rt1
      Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
      Workqueue: efi_rts_wq efi_call_rts
      Call trace:
      dump_backtrace (arch/arm64/kernel/stacktrace.c:158)
      show_stack (arch/arm64/kernel/stacktrace.c:165)
      dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
      dump_stack (lib/dump_stack.c:114)
      __might_resched (kernel/sched/core.c:10134)
      rt_spin_lock (kernel/locking/rtmutex.c:1769 (discriminator 4))
      efi_call_rts (drivers/firmware/efi/runtime-wrappers.c:101)
      [...]
    
    This seems to come from commit ff7a167961d1 ("arm64: efi: Execute
    runtime services from a dedicated stack") which adds a spinlock. This
    spinlock is taken through:
    efi_call_rts()
    \-efi_call_virt()
      \-efi_call_virt_pointer()
        \-arch_efi_call_virt_setup()
    
    Make 'efi_rt_lock' a raw_spinlock to avoid being preempted.
    
    [ardb: The EFI runtime services are called with a different set of
           translation tables, and are permitted to use the SIMD registers.
           The context switch code preserves/restores neither, and so EFI
           calls must be made with preemption disabled, rather than only
           disabling migration.]
    
    Fixes: ff7a167961d1 ("arm64: efi: Execute runtime services from a dedicated stack")
    Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
    Cc: <stable@vger.kernel.org> # v6.1+
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
attr: add in_group_or_capable()
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Mar 7 10:59:18 2023 -0800

    attr: add in_group_or_capable()
    
    commit 11c2a8700cdcabf9b639b7204a1e38e2a0b6798e upstream.
    
    [backport to 5.15.y, prior to vfsgid_t]
    
    In setattr_{copy,prepare}() we need to perform the same permission
    checks to determine whether we need to drop the setgid bit or not.
    Instead of open-coding it twice add a simple helper that encapsulates the
    logic. We will reuse this helper to make dropping the setgid bit during
    write operations more consistent in a follow up patch.
    
    Reviewed-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

attr: add setattr_should_drop_sgid()
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Mar 7 10:59:20 2023 -0800

    attr: add setattr_should_drop_sgid()
    
    commit 72ae017c5451860443a16fb2a8c243bff3e396b8 upstream.
    
    [backport to 5.15.y, prior to vfsgid_t]
    
    The current setgid stripping logic during write and ownership change
    operations is inconsistent and strewn over multiple places. In order to
    consolidate it and make it more consistent we'll add a new helper
    setattr_should_drop_sgid(). The function retains the old behavior where
    we remove the S_ISGID bit unconditionally when S_IXGRP is set but also
    when it isn't set and the caller is neither in the group of the inode
    nor privileged over the inode.
    
    We will use this helper both in write operation permission removal such
    as file_remove_privs() as well as in ownership change operations.
    
    Reviewed-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

attr: use consistent sgid stripping checks
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Mar 7 10:59:21 2023 -0800

    attr: use consistent sgid stripping checks
    
    commit ed5a7047d2011cb6b2bf84ceb6680124cc6a7d95 upstream.
    
    [backport to 5.15.y, prior to vfsgid_t]
    
    Currently setgid stripping in file_remove_privs()'s should_remove_suid()
    helper is inconsistent with other parts of the vfs. Specifically, it only
    raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the
    inode isn't in the caller's groups and the caller isn't privileged over the
    inode although we require this already in setattr_prepare() and
    setattr_copy() and so all filesystems implement this requirement implicitly
    because they have to use setattr_{prepare,copy}() anyway.
    
    But the inconsistency shows up in setgid stripping bugs for overlayfs in
    xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
    generic/687). For example, we test whether suid and setgid stripping works
    correctly when performing various write-like operations as an unprivileged
    user (fallocate, reflink, write, etc.):
    
    echo "Test 1 - qa_user, non-exec file $verb"
    setup_testfile
    chmod a+rws $junk_file
    commit_and_check "$qa_user" "$verb" 64k 64k
    
    The test basically creates a file with 6666 permissions. While the file has
    the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a
    regular filesystem like xfs what will happen is:
    
    sys_fallocate()
    -> vfs_fallocate()
       -> xfs_file_fallocate()
          -> file_modified()
             -> __file_remove_privs()
                -> dentry_needs_remove_privs()
                   -> should_remove_suid()
                -> __remove_privs()
                   newattrs.ia_valid = ATTR_FORCE | kill;
                   -> notify_change()
                      -> setattr_copy()
    
    In should_remove_suid() we can see that ATTR_KILL_SUID is raised
    unconditionally because the file in the test has S_ISUID set.
    
    But we also see that ATTR_KILL_SGID won't be set because while the file
    is S_ISGID it is not S_IXGRP (see above) which is a condition for
    ATTR_KILL_SGID being raised.
    
    So by the time we call notify_change() we have attr->ia_valid set to
    ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
    ATTR_KILL_SUID is set and does:
    
    ia_valid = attr->ia_valid |= ATTR_MODE
    attr->ia_mode = (inode->i_mode & ~S_ISUID);
    
    which means that when we call setattr_copy() later we will definitely
    update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.
    
    Now we call into the filesystem's ->setattr() inode operation which will
    end up calling setattr_copy(). Since ATTR_MODE is set we will hit:
    
    if (ia_valid & ATTR_MODE) {
            umode_t mode = attr->ia_mode;
            vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
            if (!vfsgid_in_group_p(vfsgid) &&
                !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                    mode &= ~S_ISGID;
            inode->i_mode = mode;
    }
    
    and since the caller in the test is neither capable nor in the group of the
    inode the S_ISGID bit is stripped.
    
    But assume the file isn't suid then ATTR_KILL_SUID won't be raised which
    has the consequence that neither the setgid nor the suid bits are stripped
    even though it should be stripped because the inode isn't in the caller's
    groups and the caller isn't privileged over the inode.
    
    If overlayfs is in the mix things become a bit more complicated and the bug
    shows up more clearly. When e.g., ovl_setattr() is hit from
    ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and
    ATTR_KILL_SGID might be raised but because the check in notify_change() is
    questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
    stripped the S_ISGID bit isn't removed even though it should be stripped:
    
    sys_fallocate()
    -> vfs_fallocate()
       -> ovl_fallocate()
          -> file_remove_privs()
             -> dentry_needs_remove_privs()
                -> should_remove_suid()
             -> __remove_privs()
                newattrs.ia_valid = ATTR_FORCE | kill;
                -> notify_change()
                   -> ovl_setattr()
                      // TAKE ON MOUNTER'S CREDS
                      -> ovl_do_notify_change()
                         -> notify_change()
                      // GIVE UP MOUNTER'S CREDS
         // TAKE ON MOUNTER'S CREDS
         -> vfs_fallocate()
            -> xfs_file_fallocate()
               -> file_modified()
                  -> __file_remove_privs()
                     -> dentry_needs_remove_privs()
                        -> should_remove_suid()
                     -> __remove_privs()
                        newattrs.ia_valid = ATTR_FORCE | kill;
                        -> notify_change()
    
    The fix for all of this is to make file_remove_privs()'s
    should_remove_suid() helper perform the same checks as we already
    require in setattr_prepare() and setattr_copy() and have notify_change()
    not pointlessly require S_IXGRP again. It doesn't make any sense in the
    first place because the caller must calculate the flags via
    should_remove_suid() anyway which would raise ATTR_KILL_SGID.
    
    While we're at it we move should_remove_suid() from inode.c to attr.c
    where it belongs with the rest of the iattr helpers. Especially since it
    returns ATTR_KILL_S{G,U}ID flags. We also rename it to
    setattr_should_drop_suidgid() to better reflect that it indicates both
    setuid and setgid bit removal and also that it returns attr flags.
    
    Running xfstests with this doesn't report any regressions. We should really
    try and use consistent checks.
    
    Reviewed-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
bgmac: fix *initial* chip reset to support BCM5358
Author: Rafał Miłecki <rafal@milecki.pl>
Date:   Mon Feb 27 10:11:56 2023 +0100

    bgmac: fix *initial* chip reset to support BCM5358
    
    [ Upstream commit f99e6d7c4ed3be2531bd576425a5bd07fb133bd7 ]
    
    While bringing hardware up we should perform a full reset including the
    switch bit (BGMAC_BCMA_IOCTL_SW_RESET aka SICF_SWRST). It's what the
    specification says and what the reference driver does.
    
    This seems to be critical for the BCM5358. Without this, the hardware
    doesn't get initialized properly and doesn't seem to transmit or receive
    any packets.
    
    Originally bgmac was calling bgmac_chip_reset() before setting
    "has_robosw" property which resulted in expected behaviour. That has
    changed as a side effect of adding platform device support which
    regressed BCM5358 support.
    
    Fixes: f6a95a24957a ("net: ethernet: bgmac: Add platform device support")
    Cc: Jon Mason <jdmason@kudzu.us>
    Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Link: https://lore.kernel.org/r/20230227091156.19509-1-zajec5@gmail.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
block/brd: add error handling support for add_disk()
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Fri Oct 15 16:52:07 2021 -0700

    block/brd: add error handling support for add_disk()
    
    [ Upstream commit e1528830bd4ebf435d91c154e309e6e028336210 ]
    
    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20211015235219.2191207-2-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stable-dep-of: 67205f80be99 ("brd: mark as nowait compatible")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
bnxt_en: Avoid order-5 memory allocation for TPA data
Author: Michael Chan <michael.chan@broadcom.com>
Date:   Fri Mar 3 18:43:57 2023 -0800

    bnxt_en: Avoid order-5 memory allocation for TPA data
    
    [ Upstream commit accd7e23693aaaa9aa0d3e9eca0ae77d1be80ab3 ]
    
    The driver needs to keep track of all the possible concurrent TPA (GRO/LRO)
    completions on the aggregation ring.  On P5 chips, the maximum number
    of concurrent TPA is 256 and the amount of memory we allocate is order-5
    on systems using 4K pages.  Memory allocation failure has been reported:
    
    NetworkManager: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
    CPU: 15 PID: 2995 Comm: NetworkManager Kdump: loaded Not tainted 5.10.156 #1
    Hardware name: Dell Inc. PowerEdge R660/0M1CC5, BIOS 0.2.25 08/12/2022
    Call Trace:
     dump_stack+0x57/0x6e
     warn_alloc.cold.120+0x7b/0xdd
     ? _cond_resched+0x15/0x30
     ? __alloc_pages_direct_compact+0x15f/0x170
     __alloc_pages_slowpath.constprop.108+0xc58/0xc70
     __alloc_pages_nodemask+0x2d0/0x300
     kmalloc_order+0x24/0xe0
     kmalloc_order_trace+0x19/0x80
     bnxt_alloc_mem+0x1150/0x15c0 [bnxt_en]
     ? bnxt_get_func_stat_ctxs+0x13/0x60 [bnxt_en]
     __bnxt_open_nic+0x12e/0x780 [bnxt_en]
     bnxt_open+0x10b/0x240 [bnxt_en]
     __dev_open+0xe9/0x180
     __dev_change_flags+0x1af/0x220
     dev_change_flags+0x21/0x60
     do_setlink+0x35c/0x1100
    
    Instead of allocating this big chunk of memory and dividing it up for the
    concurrent TPA instances, allocate each small chunk separately for each
    TPA instance.  This will reduce it to order-0 allocations.
    
    Fixes: 79632e9ba386 ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.")
    Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
    Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
    Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
    Signed-off-by: Michael Chan <michael.chan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
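
The allocation orders mentioned in the commit above follow from simple arithmetic, sketched here in userspace C. The helper is a model of how an allocation order relates to a size with 4K pages; its name is hypothetical, and the 512-byte per-instance size used in the note below is an assumption chosen so that 256 instances add up to an order-5 request, not a value taken from the driver.

```c
#include <stddef.h>

/* Assumption: 4K pages, as in the commit text. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Smallest order such that (MODEL_PAGE_SIZE << order) >= size.
 * Order 0 is one page, order 5 is 32 contiguous pages (128KB here).
 */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = MODEL_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Under the stated assumption, one combined buffer of 256 * 512 bytes needs order 5, while each 512-byte chunk allocated on its own fits in order 0, which is much easier to satisfy when memory is fragmented.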

 
bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser()
Author: Liu Jian <liujian56@huawei.com>
Date:   Fri Mar 3 16:09:46 2023 +0800

    bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser()
    
    [ Upstream commit d900f3d20cc3169ce42ec72acc850e662a4d4db2 ]
    
    When the buffer length of the recvmsg system call is 0, we got the
    following soft lockup problem:
    
    watchdog: BUG: soft lockup - CPU#3 stuck for 27s! [a.out:6149]
    CPU: 3 PID: 6149 Comm: a.out Kdump: loaded Not tainted 6.2.0+ #30
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
    RIP: 0010:remove_wait_queue+0xb/0xc0
    Code: 5e 41 5f c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 <41> 56 41 55 41 54 55 48 89 fd 53 48 89 f3 4c 8d 6b 18 4c 8d 73 20
    RSP: 0018:ffff88811b5978b8 EFLAGS: 00000246
    RAX: 0000000000000000 RBX: ffff88811a7d3780 RCX: ffffffffb7a4d768
    RDX: dffffc0000000000 RSI: ffff88811b597908 RDI: ffff888115408040
    RBP: 1ffff110236b2f1b R08: 0000000000000000 R09: ffff88811a7d37e7
    R10: ffffed10234fa6fc R11: 0000000000000001 R12: ffff88811179b800
    R13: 0000000000000001 R14: ffff88811a7d38a8 R15: ffff88811a7d37e0
    FS:  00007f6fb5398740(0000) GS:ffff888237180000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020000000 CR3: 000000010b6ba002 CR4: 0000000000370ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     tcp_msg_wait_data+0x279/0x2f0
     tcp_bpf_recvmsg_parser+0x3c6/0x490
     inet_recvmsg+0x280/0x290
     sock_recvmsg+0xfc/0x120
     ____sys_recvmsg+0x160/0x3d0
     ___sys_recvmsg+0xf0/0x180
     __sys_recvmsg+0xea/0x1a0
     do_syscall_64+0x3f/0x90
     entry_SYSCALL_64_after_hwframe+0x72/0xdc
    
    The logic in tcp_bpf_recvmsg_parser is as follows:
    
    msg_bytes_ready:
            copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
            if (!copied) {
                    wait data;
                    goto msg_bytes_ready;
            }
    
    In this case, "copied" is always 0, so an infinite loop occurs.
    
    According to the Linux system call man page, 0 should be returned in this
    case. Therefore, in tcp_bpf_recvmsg_parser(), if the length is 0, directly
    return. Also modify several other functions with the same problem.
    
    Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap")
    Fixes: 9825d866ce0d ("af_unix: Implement unix_dgram_bpf_recvmsg()")
    Fixes: c5d2177a72a1 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Liu Jian <liujian56@huawei.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Cc: Jakub Sitnicki <jakub@cloudflare.com>
    Link: https://lore.kernel.org/bpf/20230303080946.1146638-1-liujian56@huawei.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>
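
The retry loop from the commit above can be modelled in userspace to show why a zero-length request never makes progress. The sketch is not the kernel code: all model_* names are hypothetical, and max_spins exists only so that the model terminates, whereas the loop described in the commit has no such bound.

```c
#include <stddef.h>

/* Hypothetical stand-in for sk_msg_recvmsg(): copies at most len bytes. */
static size_t model_copy(size_t queued, size_t len)
{
	return queued < len ? queued : len;
}

/*
 * Model of the receive path.
 * queued:       bytes waiting in the queue.
 * len:          buffer length passed by the caller.
 * early_return: nonzero to apply the fix (return 0 when len is 0).
 * Returns the number of bytes copied, or -1 if the loop was still
 * spinning after max_spins iterations.
 */
static long model_recvmsg(size_t queued, size_t len, int early_return,
			  int max_spins)
{
	if (early_return && len == 0)
		return 0;	/* nothing was requested, so return 0 */

	for (int i = 0; i < max_spins; i++) {
		size_t copied = model_copy(queued, len);

		if (copied)
			return (long)copied;
		/* "wait data" would happen here, then the copy is retried */
	}
	return -1;
}
```

With data queued and len equal to 0, the model without the early return exhausts max_spins, which corresponds to the soft lockup. With the early return it yields 0 immediately, and non-zero lengths behave the same in both variants.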

 
brd: mark as nowait compatible
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Feb 15 16:43:47 2023 -0700

    brd: mark as nowait compatible
    
    [ Upstream commit 67205f80be9910207481406c47f7d85e703fb2e9 ]
    
    By default, non-mq drivers do not support nowait. This causes io_uring
    to use a slower path as the driver cannot be trusted not to block. brd
    can safely set the nowait flag, as worst case all it does is a NOIO
    allocation.
    
    For io_uring, this makes a substantial difference. Before:
    
    submitter=0, tid=453, file=/dev/ram0, node=-1
    polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
    Engine=io_uring, sq_ring=128, cq_ring=128
    IOPS=440.03K, BW=1718MiB/s, IOS/call=32/31
    IOPS=428.96K, BW=1675MiB/s, IOS/call=32/32
    IOPS=442.59K, BW=1728MiB/s, IOS/call=32/31
    IOPS=419.65K, BW=1639MiB/s, IOS/call=32/32
    IOPS=426.82K, BW=1667MiB/s, IOS/call=32/31
    
    and after:
    
    submitter=0, tid=354, file=/dev/ram0, node=-1
    polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
    Engine=io_uring, sq_ring=128, cq_ring=128
    IOPS=3.37M, BW=13.15GiB/s, IOS/call=32/31
    IOPS=3.45M, BW=13.46GiB/s, IOS/call=32/31
    IOPS=3.43M, BW=13.42GiB/s, IOS/call=32/32
    IOPS=3.43M, BW=13.39GiB/s, IOS/call=32/31
    IOPS=3.43M, BW=13.38GiB/s, IOS/call=32/31
    
    or about an 8x difference. Now that brd is prepared to deal with
    REQ_NOWAIT reads/writes, mark it as supporting that.
    
    Cc: stable@vger.kernel.org # 5.10+
    Link: https://lore.kernel.org/linux-block/20230203103005.31290-1-p.raghav@samsung.com/
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
Author: Lorenz Bauer <lorenz.bauer@isovalent.com>
Date:   Mon Mar 6 11:21:37 2023 +0000

    btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
    
    [ Upstream commit 9b459804ff9973e173fabafba2a1319f771e85fa ]
    
    btf_datasec_resolve contains a bug that causes the following BTF
    to fail loading:
    
        [1] DATASEC a size=2 vlen=2
            type_id=4 offset=0 size=1
            type_id=7 offset=1 size=1
        [2] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
        [3] PTR (anon) type_id=2
        [4] VAR a type_id=3 linkage=0
        [5] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
        [6] TYPEDEF td type_id=5
        [7] VAR b type_id=6 linkage=0
    
    This error message is printed during btf_check_all_types:
    
        [1] DATASEC a size=2 vlen=2
            type_id=7 offset=1 size=1 Invalid type
    
    By tracing btf_*_resolve we can pinpoint the problem:
    
        btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_TBD) = 0
            btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_TBD) = 0
                btf_ptr_resolve(depth: 3, type_id: 3, mode: RESOLVE_PTR) = 0
            btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_PTR) = 0
        btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_PTR) = -22
    
    The last invocation of btf_datasec_resolve should invoke btf_var_resolve
    by means of env_stack_push, instead it returns EINVAL. The reason is that
    env_stack_push is never executed for the second VAR.
    
        if (!env_type_is_resolve_sink(env, var_type) &&
            !env_type_is_resolved(env, var_type_id)) {
            env_stack_set_next_member(env, i + 1);
            return env_stack_push(env, var_type, var_type_id);
        }
    
    env_type_is_resolve_sink() changes its behaviour based on resolve_mode.
    For RESOLVE_PTR, we can simplify the if condition to the following:
    
        (btf_type_is_modifier() || btf_type_is_ptr) && !env_type_is_resolved()
    
    Since we're dealing with a VAR the clause evaluates to false. This is
    not sufficient to trigger the bug however. The log output and EINVAL
    are only generated if btf_type_id_size() fails.
    
        if (!btf_type_id_size(btf, &type_id, &type_size)) {
            btf_verifier_log_vsi(env, v->t, vsi, "Invalid type");
            return -EINVAL;
        }
    
    Most types are sized, so for example a VAR referring to an INT is not a
    problem. The bug is only triggered if a VAR points at a modifier. Since
    we skipped btf_var_resolve that modifier was also never resolved, which
    means that btf_resolved_type_id returns 0 aka VOID for the modifier.
    This in turn causes btf_type_id_size to return NULL, triggering EINVAL.
    
    To summarise, the following conditions are necessary:
    
    - VAR pointing at PTR, STRUCT, UNION or ARRAY
    - Followed by a VAR pointing at TYPEDEF, VOLATILE, CONST, RESTRICT or
      TYPE_TAG
    
    The fix is to reset resolve_mode to RESOLVE_TBD before attempting to
    resolve a VAR from a DATASEC.
    
    Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec")
    Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
    Link: https://lore.kernel.org/r/20230306112138.155352-2-lmb@isovalent.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
btrfs: fix percent calculation for bg reclaim message
Author: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Date:   Tue Feb 21 10:11:24 2023 -0800

    btrfs: fix percent calculation for bg reclaim message
    
    commit 95cd356ca23c3807b5f3503687161e216b1c520d upstream.
    
    We have a report that the info message for block-group reclaim is
    crossing the 100% used mark.
    
    This is happening as we were truncating the divisor for the division
    (the block_group->length) to a 32bit value.
    
    Fix this by using div64_u64() to not truncate the divisor.
    
    In the worst case, this can lead to a division-by-zero error. It should
    be possible to trigger on a 4-disk RAID0 setup where each device is
    large enough:
    
      $ mkfs.btrfs  -f /dev/test/scratch[1234] -m raid1 -d raid0
      btrfs-progs v6.1
      [...]
      Filesystem size:    40.00GiB
      Block group profiles:
        Data:             RAID0             4.00GiB <<<
        Metadata:         RAID1           256.00MiB
        System:           RAID1             8.00MiB
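
    The truncation can be illustrated with a small user-space sketch. It
    does not reproduce the kernel code: both helper names are hypothetical
    and plain C division stands in for the kernel division helpers. A
    4.00GiB block group has a length of 2^32 bytes, which becomes 0 once
    narrowed to 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: percentage with the divisor narrowed to 32 bits,
 * mimicking the truncation described above. Returns -1 instead of
 * dividing when the narrowed divisor is zero. */
int64_t percent_truncated(uint64_t part, uint64_t length)
{
    uint32_t narrow = (uint32_t)length; /* upper 32 bits are lost */

    if (narrow == 0)
        return -1; /* an unguarded division would fault here */
    return (int64_t)(part * 100 / narrow);
}

/* Hypothetical helper: the same calculation with the full 64-bit divisor. */
uint64_t percent_full(uint64_t part, uint64_t length)
{
    return part * 100 / length;
}
```

    With a 5GiB length the narrowed divisor is 1GiB, so 3GiB of 5GiB is
    reported as 300% instead of 60%; with exactly 4GiB the narrowed divisor
    is zero.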
    
    Reported-by: Forza <forza@tnonline.net>
    Link: https://lore.kernel.org/linux-btrfs/e99483.c11a58d.1863591ca52@tnonline.net/
    Fixes: 5f93e776c673 ("btrfs: zoned: print unusable percentage when reclaiming block groups")
    CC: stable@vger.kernel.org # 5.15+
    Reviewed-by: Anand Jain <anand.jain@oracle.com>
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    [ add Qu's note ]
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
drm/amdgpu: fix error checking in amdgpu_read_mm_registers for soc15
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Mar 6 10:34:20 2023 -0500

    drm/amdgpu: fix error checking in amdgpu_read_mm_registers for soc15
    
    commit 0dcdf8498eae2727bb33cef3576991dc841d4343 upstream.
    
    Properly skip non-existent registers as well.
    
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Reviewed-by: Evan Quan <evan.quan@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
drm/connector: print max_requested_bpc in state debugfs
Author: Harry Wentland <harry.wentland@amd.com>
Date:   Fri Jan 13 11:24:09 2023 -0500

    drm/connector: print max_requested_bpc in state debugfs
    
    commit 7d386975f6a495902e679a3a250a7456d7e54765 upstream.
    
    This is useful to understand the bpc defaults and
    support of a driver.
    
    Signed-off-by: Harry Wentland <harry.wentland@amd.com>
    Cc: Pekka Paalanen <ppaalanen@gmail.com>
    Cc: Sebastian Wick <sebastian.wick@redhat.com>
    Cc: Vitaly.Prosyak@amd.com
    Cc: Uma Shankar <uma.shankar@intel.com>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Joshua Ashton <joshua@froggi.es>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: dri-devel@lists.freedesktop.org
    Cc: amd-gfx@lists.freedesktop.org
    Reviewed-By: Joshua Ashton <joshua@froggi.es>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230113162428.33874-3-harry.wentland@amd.com
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
drm/msm/a5xx: fix context faults during ring switch
Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Tue Feb 14 05:09:56 2023 +0300

    drm/msm/a5xx: fix context faults during ring switch
    
    [ Upstream commit 32e7083429d46f29080626fe387ff90c086b1fbe ]
    
    The rptr_addr is set in the preempt_init_ring(), which is called from
    a5xx_gpu_init(). It uses shadowptr() to set the address, however the
    shadow_iova is not yet initialized at that time. Move the rptr_addr
    setting to the a5xx_preempt_hw_init() which is called after setting the
    shadow_iova, getting the correct value for the address.
    
    Fixes: 8907afb476ac ("drm/msm: Allow a5xx to mark the RPTR shadow as privileged")
    Suggested-by: Rob Clark <robdclark@gmail.com>
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Patchwork: https://patchwork.freedesktop.org/patch/522640/
    Link: https://lore.kernel.org/r/20230214020956.164473-5-dmitry.baryshkov@linaro.org
    Signed-off-by: Rob Clark <robdclark@chromium.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/a5xx: fix highest bank bit for a530
Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Tue Feb 14 05:09:54 2023 +0300

    drm/msm/a5xx: fix highest bank bit for a530
    
    [ Upstream commit 141f66ebbfa17cc7e2075f06c50107da978c965b ]
    
    A530 has highest bank bit equal to 15 (like A540). Fix values written to
    REG_A5XX_RB_MODE_CNTL and REG_A5XX_TPL1_MODE_CNTL registers.
    
    Fixes: 1d832ab30ce6 ("drm/msm/a5xx: Add support for Adreno 508, 509, 512 GPUs")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Patchwork: https://patchwork.freedesktop.org/patch/522639/
    Link: https://lore.kernel.org/r/20230214020956.164473-3-dmitry.baryshkov@linaro.org
    Signed-off-by: Rob Clark <robdclark@chromium.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/a5xx: fix setting of the CP_PREEMPT_ENABLE_LOCAL register
Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Tue Feb 14 05:09:53 2023 +0300

    drm/msm/a5xx: fix setting of the CP_PREEMPT_ENABLE_LOCAL register
    
    [ Upstream commit a7a4c19c36de1e4b99b06e4060ccc8ab837725bc ]
    
    Rather than writing CP_PREEMPT_ENABLE_GLOBAL twice, follow the vendor
    kernel and set CP_PREEMPT_ENABLE_LOCAL register instead. a5xx_submit()
    will override it during submission, but let's get the sequence correct.
    
    Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Patchwork: https://patchwork.freedesktop.org/patch/522638/
    Link: https://lore.kernel.org/r/20230214020956.164473-2-dmitry.baryshkov@linaro.org
    Signed-off-by: Rob Clark <robdclark@chromium.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/a5xx: fix the emptyness check in the preempt code
Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Tue Feb 14 05:09:55 2023 +0300

    drm/msm/a5xx: fix the emptyness check in the preempt code
    
    [ Upstream commit b4fb748f0b734ce1d2e7834998cc599fcbd25d67 ]
    
    Quoting Yassine: ring->memptrs->rptr is never updated and stays 0, so
    the comparison always evaluates to false and get_next_ring always
    returns ring 0 thinking it isn't empty.
    
    Fix this by calling get_rptr() instead of reading rptr directly.
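
    A simplified model of the problem, as a sketch only: the structure and
    function names below are hypothetical and do not match the driver. The
    cached read pointer stands for the copy that is never updated, and
    ring_get_rptr() stands for an accessor that returns the live value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ring state. */
struct ring_model {
    uint32_t wptr;        /* write pointer advanced by the CPU */
    uint32_t hw_rptr;     /* read pointer as currently reported by the GPU */
    uint32_t cached_rptr; /* in-memory copy that is never updated, stays 0 */
};

/* Stand-in for an accessor that returns the live read pointer. */
uint32_t ring_get_rptr(const struct ring_model *r)
{
    return r->hw_rptr;
}

/* Broken check: compares the write pointer against the stale copy. */
bool ring_empty_stale(const struct ring_model *r)
{
    return r->wptr == r->cached_rptr;
}

/* Fixed check: compares the write pointer against the live read pointer. */
bool ring_empty_live(const struct ring_model *r)
{
    return r->wptr == ring_get_rptr(r);
}
```

    For a drained ring whose write and read pointers both sit at 64, the
    stale check reports "not empty" while the live check reports "empty".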
    
    Reported-by: Yassine Oudjana <y.oudjana@protonmail.com>
    Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Patchwork: https://patchwork.freedesktop.org/patch/522642/
    Link: https://lore.kernel.org/r/20230214020956.164473-4-dmitry.baryshkov@linaro.org
    Signed-off-by: Rob Clark <robdclark@chromium.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
drm/msm/dpu: fix len of sc7180 ctl blocks
Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Sun Feb 12 01:12:13 2023 +0200

    drm/msm/dpu: fix len of sc7180 ctl blocks
    
    [ Upstream commit ce6bd00abc220e9edf10986234fadba6462b4abf ]
    
    Change sc7180's ctl block len to 0x1dc.
    
    Fixes: 7bdc0c4b8126 ("msm:disp:dpu1: add support for display for SC7180 target")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/522210/
    Link: https://lore.kernel.org/r/20230211231259.1308718-5-dmitry.baryshkov@linaro.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
drm/msm: Fix potential invalid ptr free
Author: Rob Clark <robdclark@chromium.org>
Date:   Wed Feb 15 15:50:48 2023 -0800

    drm/msm: Fix potential invalid ptr free
    
    [ Upstream commit 8a86f213f4426f19511a16d886871805b35c3acf ]
    
    The error path cleanup expects that chain and syncobj are either NULL or
    valid pointers.  But post_deps was not allocated with __GFP_ZERO.
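
    The pattern can be sketched in user space as follows. This is an
    illustration under assumptions, not the driver code: the structure and
    function names are hypothetical, calloc() stands in for an allocation
    with __GFP_ZERO, and free() stands in for the real release calls.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one post-dependency entry. */
struct post_dep {
    void *syncobj;
    void *chain;
};

/* Zeroing allocation: every pointer member starts out as NULL (all-zero
 * bytes are NULL pointers on mainstream platforms). */
struct post_dep *post_deps_alloc(size_t n)
{
    return calloc(n, sizeof(struct post_dep));
}

/* Error-path style cleanup: it walks every entry, so entries that were
 * never filled in must hold NULL. free(NULL) is a no-op, whereas an
 * uninitialised pointer would be an invalid free. */
void post_deps_cleanup(struct post_dep *deps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(deps[i].chain);
        free(deps[i].syncobj);
    }
    free(deps);
}
```

    Had the array come from a non-zeroing allocator and only some entries
    been filled in before the failure, the cleanup loop would pass garbage
    pointers to the release calls.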
    
    Fixes: ab723b7a992a ("drm/msm: Add syncobj support.")
    Signed-off-by: Rob Clark <robdclark@chromium.org>
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
    Patchwork: https://patchwork.freedesktop.org/patch/523051/
    Link: https://lore.kernel.org/r/20230215235048.1166484-1-robdclark@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
drm/nouveau/kms/nv50-: remove unused functions
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Wed Jun 1 20:46:06 2022 +1000

    drm/nouveau/kms/nv50-: remove unused functions
    
    [ Upstream commit 89ed996b888faaf11c69bb4cbc19f21475c9050e ]
    
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    Reviewed-by: Dave Airlie <airlied@redhat.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Stable-dep-of: 3638a820c5c3 ("drm/nouveau/kms/nv50: fix nv50_wndw_new_ prototype")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
drm/nouveau/kms/nv50: fix nv50_wndw_new_ prototype
Author: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Date:   Mon Oct 31 12:42:29 2022 +0100

    drm/nouveau/kms/nv50: fix nv50_wndw_new_ prototype
    
    [ Upstream commit 3638a820c5c3b52f327cebb174fd4274bee08aa7 ]
    
    gcc-13 warns about mismatching types for enums. That revealed switched
    arguments of nv50_wndw_new_():
      drivers/gpu/drm/nouveau/dispnv50/wndw.c:696:1: error: conflicting types for 'nv50_wndw_new_' due to enum/integer mismatch; have 'int(const struct nv50_wndw_func *, struct drm_device *, enum drm_plane_type,  const char *, int,  const u32 *, u32,  enum nv50_disp_interlock_type,  u32,  struct nv50_wndw **)'
      drivers/gpu/drm/nouveau/dispnv50/wndw.h:36:5: note: previous declaration of 'nv50_wndw_new_' with type 'int(const struct nv50_wndw_func *, struct drm_device *, enum drm_plane_type,  const char *, int,  const u32 *, enum nv50_disp_interlock_type,  u32,  u32,  struct nv50_wndw **)'
    
    The difference is barely visible, but the declaration lists the
    parameters in the middle as:
      enum nv50_disp_interlock_type,
      u32 interlock_data,
      u32 heads,
    
    While the definition states differently:
      u32 heads,
      enum nv50_disp_interlock_type interlock_type,
      u32 interlock_data,
    
    Unify/fix the declaration to match the definition.
    
    Fixes: 53e0a3e70de6 ("drm/nouveau/kms/nv50-: simplify tracking of channel interlocks")
    Cc: Martin Liska <mliska@suse.cz>
    Cc: Ben Skeggs <bskeggs@redhat.com>
    Cc: Karol Herbst <kherbst@redhat.com>
    Cc: Lyude Paul <lyude@redhat.com>
    Cc: David Airlie <airlied@gmail.com>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: dri-devel@lists.freedesktop.org
    Cc: nouveau@lists.freedesktop.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
    Signed-off-by: Karol Herbst <kherbst@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20221031114229.10289-1-jirislaby@kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
ext4: add ext4_sb_block_valid() refactored out of ext4_inode_block_valid()
Author: Ritesh Harjani <riteshh@linux.ibm.com>
Date:   Wed Feb 16 12:32:49 2022 +0530

    ext4: add ext4_sb_block_valid() refactored out of ext4_inode_block_valid()
    
    commit 6bc6c2bdf1baca6522b8d9ba976257d722423085 upstream.
    
    This API will be needed in places where we don't have an inode,
    e.g. while freeing blocks in ext4_group_add_blocks().
    
    Suggested-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Link: https://lore.kernel.org/r/dd34a236543ad5ae7123eeebe0cb69e6bdd44f34.1644992610.git.riteshh@linux.ibm.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: add strict range checks while freeing blocks
Author: Ritesh Harjani <riteshh@linux.ibm.com>
Date:   Wed Feb 16 12:32:50 2022 +0530

    ext4: add strict range checks while freeing blocks
    
    commit a00b482b82fb098956a5bed22bd7873e56f152f1 upstream.
    
    Currently ext4_mb_clear_bb() and ext4_group_add_blocks() only check
    whether the block range to be freed overlaps any FS metadata blocks of
    the blocks' respective block group.
    To detect FS errors early, it is better to add stricter checks to
    those functions, which verify whether the given blocks belong to any
    critical FS metadata within the system zone.
    
    Suggested-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/ddd9143d064774e32d6364a99667817c6e8bfdc0.1644992610.git.riteshh@linux.ibm.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: block range must be validated before use in ext4_mb_clear_bb()
Author: Lukas Czerner <lczerner@redhat.com>
Date:   Thu Jul 14 18:59:03 2022 +0200

    ext4: block range must be validated before use in ext4_mb_clear_bb()
    
    commit 1e1c2b86ef86a8477fd9b9a4f48a6bfe235606f6 upstream.
    
    The block range to free is validated in ext4_free_blocks() using
    ext4_inode_block_valid() and then passed to ext4_mb_clear_bb().
    However, in some situations on a bigalloc file system the range might
    be adjusted after the validation in ext4_free_blocks(), which can lead
    to trouble on corrupted file systems, such as the one found by
    syzkaller that resulted in the following BUG:
    
    kernel BUG at fs/ext4/ext4.h:3319!
    PREEMPT SMP NOPTI
    CPU: 28 PID: 4243 Comm: repro Kdump: loaded Not tainted 5.19.0-rc6+ #1
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
    RIP: 0010:ext4_free_blocks+0x95e/0xa90
    Call Trace:
     <TASK>
     ? lock_timer_base+0x61/0x80
     ? __es_remove_extent+0x5a/0x760
     ? __mod_timer+0x256/0x380
     ? ext4_ind_truncate_ensure_credits+0x90/0x220
     ext4_clear_blocks+0x107/0x1b0
     ext4_free_data+0x15b/0x170
     ext4_ind_truncate+0x214/0x2c0
     ? _raw_spin_unlock+0x15/0x30
     ? ext4_discard_preallocations+0x15a/0x410
     ? ext4_journal_check_start+0xe/0x90
     ? __ext4_journal_start_sb+0x2f/0x110
     ext4_truncate+0x1b5/0x460
     ? __ext4_journal_start_sb+0x2f/0x110
     ext4_evict_inode+0x2b4/0x6f0
     evict+0xd0/0x1d0
     ext4_enable_quotas+0x11f/0x1f0
     ext4_orphan_cleanup+0x3de/0x430
     ? proc_create_seq_private+0x43/0x50
     ext4_fill_super+0x295f/0x3ae0
     ? snprintf+0x39/0x40
     ? sget_fc+0x19c/0x330
     ? ext4_reconfigure+0x850/0x850
     get_tree_bdev+0x16d/0x260
     vfs_get_tree+0x25/0xb0
     path_mount+0x431/0xa70
     __x64_sys_mount+0xe2/0x120
     do_syscall_64+0x5b/0x80
     ? do_user_addr_fault+0x1e2/0x670
     ? exc_page_fault+0x70/0x170
     entry_SYSCALL_64_after_hwframe+0x46/0xb0
    RIP: 0033:0x7fdf4e512ace
    
    Fix it by making sure that the block range is properly validated before
    it is used every time it changes in ext4_free_blocks() or
    ext4_mb_clear_bb().
    
    Link: https://syzkaller.appspot.com/bug?id=5266d464285a03cee9dbfda7d2452a72c3c2ae7c
    Reported-by: syzbot+15cd994e273307bf5cfa@syzkaller.appspotmail.com
    Signed-off-by: Lukas Czerner <lczerner@redhat.com>
    Cc: Tadeusz Struk <tadeusz.struk@linaro.org>
    Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
    Link: https://lore.kernel.org/r/20220714165903.58260-1-lczerner@redhat.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix another off-by-one fsmap error on 1k block filesystems
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Thu Feb 16 10:55:48 2023 -0800

    ext4: fix another off-by-one fsmap error on 1k block filesystems
    
    commit c993799baf9c5861f8df91beb80e1611b12efcbd upstream.
    
    Apparently syzbot figured out that issuing this FSMAP call:
    
    struct fsmap_head cmd = {
            .fmh_count      = ...,
            .fmh_keys       = {
                    { .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
                    { .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
            },
    ...
    };
    ret = ioctl(fd, FS_IOC_GETFSMAP, &cmd);
    
    Produces this crash if the underlying filesystem is a 1k-block ext4
    filesystem:
    
    kernel BUG at fs/ext4/ext4.h:3331!
    invalid opcode: 0000 [#1] PREEMPT SMP
    CPU: 3 PID: 3227965 Comm: xfs_io Tainted: G        W  O       6.2.0-rc8-achx
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
    RIP: 0010:ext4_mb_load_buddy_gfp+0x47c/0x570 [ext4]
    RSP: 0018:ffffc90007c03998 EFLAGS: 00010246
    RAX: ffff888004978000 RBX: ffffc90007c03a20 RCX: ffff888041618000
    RDX: 0000000000000000 RSI: 00000000000005a4 RDI: ffffffffa0c99b11
    RBP: ffff888012330000 R08: ffffffffa0c2b7d0 R09: 0000000000000400
    R10: ffffc90007c03950 R11: 0000000000000000 R12: 0000000000000001
    R13: 00000000ffffffff R14: 0000000000000c40 R15: ffff88802678c398
    FS:  00007fdf2020c880(0000) GS:ffff88807e100000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ffd318a5fe8 CR3: 000000007f80f001 CR4: 00000000001706e0
    Call Trace:
     <TASK>
     ext4_mballoc_query_range+0x4b/0x210 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
     ext4_getfsmap_datadev+0x713/0x890 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
     ext4_getfsmap+0x2b7/0x330 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
     ext4_ioc_getfsmap+0x153/0x2b0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
     __ext4_ioctl+0x2a7/0x17e0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
     __x64_sys_ioctl+0x82/0xa0
     do_syscall_64+0x2b/0x80
     entry_SYSCALL_64_after_hwframe+0x46/0xb0
    RIP: 0033:0x7fdf20558aff
    RSP: 002b:00007ffd318a9e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 00000000000200c0 RCX: 00007fdf20558aff
    RDX: 00007fdf1feb2010 RSI: 00000000c0c0583b RDI: 0000000000000003
    RBP: 00005625c0634be0 R08: 00005625c0634c40 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdf1feb2010
    R13: 00005625be70d994 R14: 0000000000000800 R15: 0000000000000000
    
    For GETFSMAP calls, the caller selects a physical block device by
    writing its device number into fsmap_head.fmh_keys[01].fmr_device.
    To query mappings for a subrange of the device, the starting byte of the
    range is written to fsmap_head.fmh_keys[0].fmr_physical and the last
    byte of the range goes in fsmap_head.fmh_keys[1].fmr_physical.
    
    IOWs, to query what mappings overlap with bytes 3-14 of /dev/sda, you'd
    set the inputs as follows:
    
            fmh_keys[0] = { .fmr_device = makedev(8, 0), .fmr_physical = 3},
            fmh_keys[1] = { .fmr_device = makedev(8, 0), .fmr_physical = 14},
    
    Which would return you whatever is mapped in the 12 bytes starting at
    physical offset 3.
    
    The crash is due to insufficient range validation of keys[1] in
    ext4_getfsmap_datadev.  On 1k-block filesystems, block 0 is not part of
    the filesystem, which means that s_first_data_block is nonzero.
    ext4_get_group_no_and_offset subtracts this quantity from the blocknr
    argument before cracking it into a group number and a block number
    within a group.  IOWs, block group 0 spans blocks 1-8192 (1-based)
    instead of 0-8191 (0-based) like what happens with larger blocksizes.
    
    The net result of this encoding is that blocknr < s_first_data_block is
    not a valid input to this function.  The end_fsb variable is set from
    the keys that are copied from userspace, which means that in the above
    example, its value is zero.  That leads to an underflow here:
    
            blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
    
    The division then operates on -1:
    
            offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
                    EXT4_SB(sb)->s_cluster_bits;
    
    Leaving an impossibly large group number (2^32-1) in blocknr.
    ext4_getfsmap_check_keys checked that keys[0].fmr_physical and
    keys[1].fmr_physical are in increasing order, but
    ext4_getfsmap_datadev adjusts keys[0].fmr_physical to be at least
    s_first_data_block.  This implies that we have to check it again after
    the adjustment, which is the piece that I forgot.
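
    The arithmetic can be reproduced with a small user-space model. It is
    a sketch under assumptions: the structure and function names are
    hypothetical, the group number is assumed to be 32 bits wide, the
    cluster-bits shift is left out, and 8192 blocks per group is taken
    from the 1-8192 example above.

```c
#include <stdint.h>

/* Hypothetical result of splitting a block number. */
struct group_pos {
    uint32_t group;  /* block group number, assumed to be 32 bits wide */
    uint32_t offset; /* block offset within that group */
};

/* Model of the split described above. first_data_block is 1 on a
 * 1k-block filesystem, so a blocknr of 0 wraps around on subtraction. */
struct group_pos split_block(uint64_t blocknr, uint32_t first_data_block,
                             uint32_t blocks_per_group)
{
    struct group_pos pos;

    blocknr -= first_data_block; /* underflows if blocknr is smaller */
    pos.offset = (uint32_t)(blocknr % blocks_per_group);
    pos.group = (uint32_t)(blocknr / blocks_per_group); /* narrowed to 32 bits */
    return pos;
}
```

    Calling split_block(0, 1, 8192) yields group 2^32-1, the impossibly
    large group number mentioned above, while blocks 1 and 8192 both land
    in group 0 and block 8193 is the first block of group 1.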
    
    Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
    Fixes: 4a4956249dac ("ext4: fix off-by-one fsmap error on 1k block filesystems")
    Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
    Cc: stable@vger.kernel.org
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/Y+58NPTH7VNGgzdd@magnolia
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix cgroup writeback accounting with fs-layer encryption
Author: Eric Biggers <ebiggers@google.com>
Date:   Thu Feb 2 16:55:03 2023 -0800

    ext4: fix cgroup writeback accounting with fs-layer encryption
    
    commit ffec85d53d0f39ee4680a2cf0795255e000e1feb upstream.
    
    When writing a page from an encrypted file that is using
    filesystem-layer encryption (not inline encryption), ext4 encrypts the
    pagecache page into a bounce page, then writes the bounce page.
    
    It also passes the bounce page to wbc_account_cgroup_owner().  That's
    incorrect, because the bounce page is a newly allocated temporary page
    that doesn't have the memory cgroup of the original pagecache page.
    This makes wbc_account_cgroup_owner() not account the I/O to the owner
    of the pagecache page as it should.
    
    Fix this by always passing the pagecache page to
    wbc_account_cgroup_owner().
    
    Fixes: 001e4a8775f6 ("ext4: implement cgroup writeback support")
    Cc: stable@vger.kernel.org
    Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20230203005503.141557-1-ebiggers@kernel.org
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: Fix deadlock during directory rename
Author: Jan Kara <jack@suse.cz>
Date:   Wed Mar 1 15:10:04 2023 +0100

    ext4: Fix deadlock during directory rename
    
    [ Upstream commit 3c92792da8506a295afb6d032b4476e46f979725 ]
    
    As lockdep properly warns, we should not be locking i_rwsem while having
    transactions started as the proper lock ordering used by all directory
    handling operations is i_rwsem -> transaction start. Fix the lock
    ordering by moving the locking of the directory earlier in
    ext4_rename().
    
    Reported-by: syzbot+9d16c39efb5fade84574@syzkaller.appspotmail.com
    Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory")
    Link: https://syzkaller.appspot.com/bug?extid=9d16c39efb5fade84574
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230301141004.15087-1-jack@suse.cz
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ext4: Fix possible corruption when moving a directory
Author: Jan Kara <jack@suse.cz>
Date:   Thu Jan 26 12:22:21 2023 +0100

    ext4: Fix possible corruption when moving a directory
    
    [ Upstream commit 0813299c586b175d7edb25f56412c54b812d0379 ]
    
    When we are renaming a directory to a different directory, we need to
    update the '..' entry in the moved directory. However, nothing prevents
    the moved directory from being modified and even converted from the
    inline format to the normal format. When such a race happens, the
    rename code gets confused and we crash. Fix the problem by locking the
    moved directory.
    
    CC: stable@vger.kernel.org
    Fixes: 32f7f22c0b52 ("ext4: let ext4_rename handle inline dir")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230126112221.11866-1-jack@suse.cz
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ext4: fix RENAME_WHITEOUT handling for inline directories
Author: Eric Whitney <enwlinux@gmail.com>
Date:   Fri Feb 10 12:32:44 2023 -0500

    ext4: fix RENAME_WHITEOUT handling for inline directories
    
    commit c9f62c8b2dbf7240536c0cc9a4529397bb8bf38e upstream.
    
    A significant number of xfstests can cause ext4 to log one or more
    warning messages when they are run on a test file system where the
    inline_data feature has been enabled.  An example:
    
    "EXT4-fs warning (device vdc): ext4_dirblock_csum_set:425: inode
     #16385: comm fsstress: No space for directory leaf checksum. Please
    run e2fsck -D."
    
    The xfstests include: ext4/057, 058, and 307; generic/013, 051, 068,
    070, 076, 078, 083, 232, 269, 270, 390, 461, 475, 476, 482, 579, 585,
    589, 626, 631, and 650.
    
    In this situation, the warning message indicates a bug in the code that
    performs the RENAME_WHITEOUT operation on a directory entry that has
    been stored inline.  It doesn't detect that the directory is stored
    inline, and incorrectly attempts to compute a dirent block checksum on
    the whiteout inode when creating it.  This attempt fails as a result
    of the integrity checking in get_dirent_tail (usually due to a failure
    to match the EXT4_FT_DIR_CSUM magic cookie), and the warning message
    is then emitted.
    
    Fix this by simply collecting the inlined data state at the time the
    search for the source directory entry is performed.  Existing code
    handles the rest, and this is sufficient to eliminate all spurious
    warning messages produced by the tests above.  Go one step further
    and do the same in the code that resets the source directory entry in
    the event of failure.  The inlined state should be present in the
    "old" struct, but given the possibility of a race there's no harm
    in taking a conservative approach and getting that information again
    since the directory entry is being reread anyway.
    
    Fixes: b7ff91fd030d ("ext4: find old entry again if failed to rename whiteout")
    Cc: stable@kernel.org
    Signed-off-by: Eric Whitney <enwlinux@gmail.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230210173244.679890-1-enwlinux@gmail.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix WARNING in ext4_update_inline_data
Author: Ye Bin <yebin10@huawei.com>
Date:   Tue Mar 7 09:52:53 2023 +0800

    ext4: fix WARNING in ext4_update_inline_data
    
    commit 2b96b4a5d9443ca4cad58b0040be455803c05a42 upstream.
    
    Syzbot found the following issue:
    EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 without journal. Quota mode: none.
    fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-aesni"
    fscrypt: AES-256-XTS using implementation "xts-aes-aesni"
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 5071 at mm/page_alloc.c:5525 __alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
    Modules linked in:
    CPU: 1 PID: 5071 Comm: syz-executor263 Not tainted 6.2.0-rc1-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
    RIP: 0010:__alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
    RSP: 0018:ffffc90003c2f1c0 EFLAGS: 00010246
    RAX: ffffc90003c2f220 RBX: 0000000000000014 RCX: 0000000000000000
    RDX: 0000000000000028 RSI: 0000000000000000 RDI: ffffc90003c2f248
    RBP: ffffc90003c2f2d8 R08: dffffc0000000000 R09: ffffc90003c2f220
    R10: fffff52000785e49 R11: 1ffff92000785e44 R12: 0000000000040d40
    R13: 1ffff92000785e40 R14: dffffc0000000000 R15: 1ffff92000785e3c
    FS:  0000555556c0d300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f95d5e04138 CR3: 00000000793aa000 CR4: 00000000003506f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     __alloc_pages_node include/linux/gfp.h:237 [inline]
     alloc_pages_node include/linux/gfp.h:260 [inline]
     __kmalloc_large_node+0x95/0x1e0 mm/slab_common.c:1113
     __do_kmalloc_node mm/slab_common.c:956 [inline]
     __kmalloc+0xfe/0x190 mm/slab_common.c:981
     kmalloc include/linux/slab.h:584 [inline]
     kzalloc include/linux/slab.h:720 [inline]
     ext4_update_inline_data+0x236/0x6b0 fs/ext4/inline.c:346
     ext4_update_inline_dir fs/ext4/inline.c:1115 [inline]
     ext4_try_add_inline_entry+0x328/0x990 fs/ext4/inline.c:1307
     ext4_add_entry+0x5a4/0xeb0 fs/ext4/namei.c:2385
     ext4_add_nondir+0x96/0x260 fs/ext4/namei.c:2772
     ext4_create+0x36c/0x560 fs/ext4/namei.c:2817
     lookup_open fs/namei.c:3413 [inline]
     open_last_lookups fs/namei.c:3481 [inline]
     path_openat+0x12ac/0x2dd0 fs/namei.c:3711
     do_filp_open+0x264/0x4f0 fs/namei.c:3741
     do_sys_openat2+0x124/0x4e0 fs/open.c:1310
     do_sys_open fs/open.c:1326 [inline]
     __do_sys_openat fs/open.c:1342 [inline]
     __se_sys_openat fs/open.c:1337 [inline]
     __x64_sys_openat+0x243/0x290 fs/open.c:1337
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    Above issue happens as follows:
    ext4_iget
       ext4_find_inline_data_nolock ->i_inline_off=164 i_inline_size=60
    ext4_try_add_inline_entry
       __ext4_mark_inode_dirty
          ext4_expand_extra_isize_ea ->i_extra_isize=32 s_want_extra_isize=44
             ext4_xattr_shift_entries
             -> after the shift i_inline_off is stale; the correct offset is now 176
    ext4_try_add_inline_entry
      ext4_update_inline_dir
        get_max_inline_xattr_value_size
          if (EXT4_I(inode)->i_inline_off)
            entry = (struct ext4_xattr_entry *)((void *)raw_inode +
                            EXT4_I(inode)->i_inline_off);
            free += EXT4_XATTR_SIZE(le32_to_cpu(entry->e_value_size));
            -> As the entry is incorrect, 'free' may be negative
       ext4_update_inline_data
          value = kzalloc(len, GFP_NOFS);
          -> len is an unsigned int and may be very large, which triggers
             the warning in kzalloc()
    
    To resolve the above issue we need to update 'i_inline_off' after
    'ext4_xattr_shift_entries()'.  We do not need to set
    EXT4_STATE_MAY_INLINE_DATA flag here, since ext4_mark_inode_dirty()
    already sets this flag if needed.  Setting EXT4_STATE_MAY_INLINE_DATA
    when it is not needed may trigger a BUG_ON in ext4_writepages().
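
    The step from a negative 'free' to an oversized allocation length can
    be shown in isolation. The helper below is hypothetical and models only
    the signed-to-unsigned conversion, not the ext4 code paths listed
    above.

```c
/* Hypothetical model: 'free_bytes' is a signed byte count that can go
 * negative when it is derived from a stale xattr entry offset. Storing it
 * in an unsigned int turns a small negative value into a huge length. */
unsigned int inline_len_from_free(int free_bytes)
{
    return (unsigned int)free_bytes; /* negative values wrap around */
}
```

    With a 32-bit int, a value of -12 becomes a length just below 4GiB,
    far beyond what an inline-data buffer could ever need.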
    
    Reported-by: syzbot+d30838395804afc2fa6f@syzkaller.appspotmail.com
    Cc: stable@kernel.org
    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230307015253.2232062-3-yebin@huaweicloud.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: move where set the MAY_INLINE_DATA flag is set
Author: Ye Bin <yebin10@huawei.com>
Date:   Tue Mar 7 09:52:52 2023 +0800

    ext4: move where set the MAY_INLINE_DATA flag is set
    
    commit 1dcdce5919115a471bf4921a57f20050c545a236 upstream.
    
    The only caller of ext4_find_inline_data_nolock() that needs setting of
    EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode().  In
    ext4_write_inline_data_end() we just need to update inode->i_inline_off.
    Since we are going to add one more caller that does not need to set
    EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA
    out to ext4_iget_extra_inode().
    
    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Cc: stable@kernel.org
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230307015253.2232062-2-yebin@huaweicloud.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: refactor ext4_free_blocks() to pull out ext4_mb_clear_bb() [+ + +]
Author: Ritesh Harjani <riteshh@linux.ibm.com>
Date:   Wed Feb 16 12:32:45 2022 +0530

    ext4: refactor ext4_free_blocks() to pull out ext4_mb_clear_bb()
    
    commit 8ac3939db99f99667b8eb670cf4baf292896e72d upstream.
    
    The ext4_free_blocks() function became too long and confusing.  This patch
    just pulls out the ext4_mb_clear_bb() function logic from it,
    which clears the block bitmap and frees it.
    
    No functional change in this patch.
    
    Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/22c30fbb26ba409cf8aa5f0c7912970272c459e8.1644992610.git.riteshh@linux.ibm.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: zero i_disksize when initializing the bootloader inode [+ + +]
Author: Zhihao Cheng <chengzhihao1@huawei.com>
Date:   Wed Mar 8 11:26:43 2023 +0800

    ext4: zero i_disksize when initializing the bootloader inode
    
    commit f5361da1e60d54ec81346aee8e3d8baf1be0b762 upstream.
    
    If the boot loader inode has never been used before, the
    EXT4_IOC_SWAP_BOOT inode will initialize it, including setting the
    i_size to 0.  However, if the "never before used" boot loader has a
    non-zero i_size, then i_disksize will be non-zero, and the
    inconsistency between i_size and i_disksize can trigger a kernel
    warning:
    
     WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
     CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
     RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
     Call Trace:
      vfs_write+0x3b1/0x5c0
      ksys_write+0x77/0x160
      __x64_sys_write+0x22/0x30
      do_syscall_64+0x39/0x80
    
    Reproducer:
     1. create corrupted image and mount it:
           mke2fs -t ext4 /tmp/foo.img 200
           debugfs -wR "sif <5> size 25700" /tmp/foo.img
           mount -t ext4 /tmp/foo.img /mnt
           cd /mnt
           echo 123 > file
     2. Run the reproducer program:
           posix_memalign(&buf, 1024, 1024);
           fd = open("file", O_RDWR | O_DIRECT);
           ioctl(fd, EXT4_IOC_SWAP_BOOT);
           write(fd, buf, 1024);
    
    Fix this by setting i_disksize as well as i_size to zero when
    initializing the boot loader inode.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
    Cc: stable@kernel.org
    Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
f2fs: avoid down_write on nat_tree_lock during checkpoint [+ + +]
Author: Jaegeuk Kim <jaegeuk@kernel.org>
Date:   Mon Dec 13 13:28:40 2021 -0800

    f2fs: avoid down_write on nat_tree_lock during checkpoint
    
    [ Upstream commit 0df035c7208c5e3e2ae7685548353ae536a19015 ]
    
    Let's cache the nat entry only if there's no lock contention.
    
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Stable-dep-of: 3aa51c61cb4a ("f2fs: retry to update the inode page given data corruption")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

f2fs: do not bother checkpoint by f2fs_get_node_info [+ + +]
Author: Jaegeuk Kim <jaegeuk@kernel.org>
Date:   Mon Dec 13 14:16:32 2021 -0800

    f2fs: do not bother checkpoint by f2fs_get_node_info
    
    [ Upstream commit a9419b63bf414775e8aeee95d8c4a5e0df690748 ]
    
    This patch tries to mitigate lock contention between f2fs_write_checkpoint and
    f2fs_get_node_info along with nat_tree_lock.
    
    The idea is that, if a checkpoint is currently running, other threads that
    try to grab nat_tree_lock should rather wait for the checkpoint.
    
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Stable-dep-of: 3aa51c61cb4a ("f2fs: retry to update the inode page given data corruption")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

f2fs: retry to update the inode page given data corruption [+ + +]
Author: Jaegeuk Kim <jaegeuk@kernel.org>
Date:   Mon Jan 30 15:20:09 2023 -0800

    f2fs: retry to update the inode page given data corruption
    
    [ Upstream commit 3aa51c61cb4a4dcb40df51ac61171e9ac5a35321 ]
    
    If the storage returns a corrupted node block after a short power failure
    and reset, f2fs stops all operations by setting the checkpoint failure flag.
    
    Let's give it more chances to survive by re-issuing IOs for a while in such
    a critical path.
    
    Cc: stable@vger.kernel.org
    Suggested-by: Randall Huang <huangrandall@google.com>
    Suggested-by: Chao Yu <chao@kernel.org>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
filelocks: use mount idmapping for setlease permission check [+ + +]
Author: Seth Forshee <sforshee@kernel.org>
Date:   Thu Mar 9 14:39:09 2023 -0600

    filelocks: use mount idmapping for setlease permission check
    
    commit 42d0c4bdf753063b6eec55415003184d3ca24f6e upstream.
    
    A user should be allowed to take out a lease via an idmapped mount if
    the fsuid matches the mapped uid of the inode. generic_setlease() is
    checking the unmapped inode uid, causing these operations to be denied.
    
    Fix this by comparing against the mapped inode uid instead of the
    unmapped uid.
    
    Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
    Cc: stable@vger.kernel.org
    Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
fork: allow CLONE_NEWTIME in clone3 flags [+ + +]
Author: Tobias Klauser <tklauser@distanz.ch>
Date:   Wed Mar 8 11:51:26 2023 +0100

    fork: allow CLONE_NEWTIME in clone3 flags
    
    commit a402f1e35313fc7ce2ca60f543c4402c2c7c3544 upstream.
    
    Currently, calling clone3() with CLONE_NEWTIME in clone_args->flags
    fails with -EINVAL. This is because CLONE_NEWTIME intersects with
    CSIGNAL. However, CSIGNAL was deprecated when clone3 was introduced in
    commit 7f192e3cd316 ("fork: add clone3"), allowing re-use of that part
    of clone flags.
    
    Fix this by explicitly allowing CLONE_NEWTIME in clone3_args_valid. This
    is also in line with the respective check in check_unshare_flags which
    allows CLONE_NEWTIME for unshare().
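    
    The flag overlap described above can be sketched in userspace as follows.
    The two constants carry the values from the Linux UAPI header
    <linux/sched.h> but are redefined with a SKETCH_ prefix so the example
    builds anywhere; the two helper functions are hypothetical and model only
    the CSIGNAL part of the validity check, not the full clone3_args_valid():

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CSIGNAL       0x000000ffULL /* signal mask sent at exit */
#define SKETCH_CLONE_NEWTIME 0x00000080ULL /* new time namespace */

/* Before the fix: any bit inside CSIGNAL is rejected, CLONE_NEWTIME included. */
static bool sketch_flags_valid_before(uint64_t flags)
{
	return !(flags & SKETCH_CSIGNAL);
}

/* After the fix: CLONE_NEWTIME is carved out of the rejected mask. */
static bool sketch_flags_valid_after(uint64_t flags)
{
	return !(flags & (SKETCH_CSIGNAL & ~SKETCH_CLONE_NEWTIME));
}
```

    Passing SKETCH_CLONE_NEWTIME is rejected by the first helper and accepted
    by the second, while any other low bit (for example a signal number) is
    still rejected by both.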
    
    Fixes: 769071ac9f20 ("ns: Introduce Time Namespace")
    Cc: Andrey Vagin <avagin@openvz.org>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
fs: add mode_strip_sgid() helper [+ + +]
Author: Yang Xu <xuyang2018.jy@fujitsu.com>
Date:   Tue Mar 7 10:59:16 2023 -0800

    fs: add mode_strip_sgid() helper
    
    commit 2b3416ceff5e6bd4922f6d1c61fb68113dd82302 upstream.
    
    Add a dedicated helper to handle the setgid bit when creating a new file
    in a setgid directory. This is a preparatory patch for moving setgid
    stripping into the vfs. The patch contains no functional changes.
    
    Currently the setgid stripping logic is open-coded directly in
    inode_init_owner() and the individual filesystems are responsible for
    handling setgid inheritance. Since this has proven to be brittle as
    evidenced by old issues we uncovered over the last months (see [1] to
    [3] below) we will try to move this logic into the vfs.
    
    Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [1]
    Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2]
    Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3]
    Link: https://lore.kernel.org/r/1657779088-2242-1-git-send-email-xuyang2018.jy@fujitsu.com
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fs: dlm: add midcomms init/start functions [+ + +]
Author: Alexander Aring <aahringo@redhat.com>
Date:   Thu Nov 17 17:11:46 2022 -0500

    fs: dlm: add midcomms init/start functions
    
    [ Upstream commit 8b0188b0d60b6f6183b48380bac49fe080c5ded9 ]
    
    This patch introduces the remaining init, start, stop and exit
    functionality. The dlm application layer should always call the midcomms
    layer, which becomes aware of such events and redirects them to the
    lowcomms layer. Some functionality which is currently handled inside the
    start functionality of midcomms and lowcomms should be handled in the init
    functionality, as it only needs to be initialized once when dlm is loaded.
    
    Signed-off-by: Alexander Aring <aahringo@redhat.com>
    Signed-off-by: David Teigland <teigland@redhat.com>
    Stable-dep-of: aad633dc0cf9 ("fs: dlm: start midcomms before scand")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fs: dlm: fix log of lowcomms vs midcomms [+ + +]
Author: Alexander Aring <aahringo@redhat.com>
Date:   Thu Oct 27 16:45:26 2022 -0400

    fs: dlm: fix log of lowcomms vs midcomms
    
    [ Upstream commit 3e54c9e80e68b765d8877023d93f1eea1b9d1c54 ]
    
    This patch fixes a small issue in the message printed when
    dlm_midcomms_start() fails to start: it reported that the dlm
    subcomponent lowcomms had failed, but lowcomms is behind the midcomms
    layer.
    
    Signed-off-by: Alexander Aring <aahringo@redhat.com>
    Signed-off-by: David Teigland <teigland@redhat.com>
    Stable-dep-of: aad633dc0cf9 ("fs: dlm: start midcomms before scand")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fs: dlm: start midcomms before scand [+ + +]
Author: Alexander Aring <aahringo@redhat.com>
Date:   Thu Jan 12 17:10:31 2023 -0500

    fs: dlm: start midcomms before scand
    
    [ Upstream commit aad633dc0cf90093998b1ae0ba9f19b5f1dab644 ]
    
    The scand kthread can send dlm messages out, especially dlm remove
    messages to free memory for unused rsb on other nodes. To send out dlm
    messages, midcomms must be initialized. This patch moves the midcomms
    start before scand is started.
    
    Cc: stable@vger.kernel.org
    Fixes: e7fd41792fc0 ("[DLM] The core of the DLM for GFS2/CLVM")
    Signed-off-by: Alexander Aring <aahringo@redhat.com>
    Signed-off-by: David Teigland <teigland@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fs: hold writers when changing mount's idmapping [+ + +]
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue May 10 11:58:40 2022 +0200

    fs: hold writers when changing mount's idmapping
    
    commit e1bbcd277a53e08d619ffeec56c5c9287f2bf42f upstream.
    
    Hold writers when changing a mount's idmapping to make it more robust.
    
    The vfs layer takes care to retrieve the idmapping of a mount once
    ensuring that the idmapping used for vfs permission checking is
    identical to the idmapping passed down to the filesystem.
    
    For ioctl codepaths the filesystem itself is responsible for taking the
    idmapping into account if they need to. While all filesystems with
    FS_ALLOW_IDMAP raised take the same precautions as the vfs we should
    enforce it explicitly by making sure there are no active writers on the
    relevant mount while changing the idmapping.
    
    This is similar to turning a mount ro with the difference that in
    contrast to turning a mount ro changing the idmapping can only ever be
    done once while a mount can transition between ro and rw as much as it
    wants.
    
    This is a minor user-visible change. But it is extremely unlikely to
    matter. The caller must've created a detached mount via OPEN_TREE_CLONE
    and then handed that O_PATH fd to another process or thread which then
    must've gotten a writable fd for that mount and started creating files
    in there while the caller is still changing mount properties. While not
    impossible it will be an extremely rare corner-case and should in
    general be considered a bug in the application. Consider making a mount
    MOUNT_ATTR_NOEXEC or MOUNT_ATTR_NODEV while allowing someone else to
    perform lookups or exec'ing in parallel by handing them a copy of the
    OPEN_TREE_CLONE fd or another fd beneath that mount.
    
    Link: https://lore.kernel.org/r/20220510095840.152264-1-brauner@kernel.org
    Cc: Seth Forshee <seth.forshee@digitalocean.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs: move S_ISGID stripping into the vfs_*() helpers [+ + +]
Author: Yang Xu <xuyang2018.jy@fujitsu.com>
Date:   Tue Mar 7 10:59:17 2023 -0800

    fs: move S_ISGID stripping into the vfs_*() helpers
    
    commit 1639a49ccdce58ea248841ed9b23babcce6dbb0b upstream.
    
    Move setgid handling out of individual filesystems and into the VFS
    itself to stop the proliferation of setgid inheritance bugs.
    
    Creating files that have both the S_IXGRP and S_ISGID bit raised in
    directories that themselves have the S_ISGID bit set requires additional
    privileges to avoid security issues.
    
    When a filesystem creates a new inode it needs to take care that the
    caller is either in the group of the newly created inode or they have
    CAP_FSETID in their current user namespace and are privileged over the
    parent directory of the new inode. If any of these two conditions is
    true then the S_ISGID bit can be raised for an S_IXGRP file and if not
    it needs to be stripped.
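    
    The rule in the paragraph above can be written down as a small userspace
    sketch.  The function name and the two boolean parameters are
    hypothetical: in_group stands for "caller is in the group of the new
    inode" and has_fsetid for "caller has CAP_FSETID and is privileged over
    the parent directory".  It is not the kernel helper itself:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

static mode_t sketch_strip_sgid(mode_t dir_mode, mode_t mode,
				bool in_group, bool has_fsetid)
{
	/* Only modes with both S_ISGID and S_IXGRP raised are affected. */
	if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
		return mode;
	/* Directories inherit S_ISGID; a non-setgid parent imposes no rule. */
	if (S_ISDIR(mode) || !(dir_mode & S_ISGID))
		return mode;
	/* Either condition is enough to keep the bit. */
	if (in_group || has_fsetid)
		return mode;
	return mode & ~(mode_t)S_ISGID;
}
```

    For a setgid parent and a requested mode of S_ISGID | 0775, an
    unprivileged caller outside the group ends up with 0775, while a caller
    meeting either condition keeps the full mode.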
    
    However, there are several key issues with the current implementation:
    
    * S_ISGID stripping logic is entangled with umask stripping.
    
      If a filesystem doesn't support or enable POSIX ACLs then umask
      stripping is done directly in the vfs before calling into the
      filesystem.
      If the filesystem does support POSIX ACLs then umask stripping may be
      done in the filesystem itself when calling posix_acl_create().
    
      Since umask stripping has an effect on S_ISGID inheritance, e.g., by
      stripping the S_IXGRP bit from the file to be created, and all relevant
      filesystems have to call posix_acl_create() before inode_init_owner(),
      where we currently take care of S_ISGID handling, S_ISGID handling is
      order dependent. IOW, whether or not you get a setgid bit depends on
      POSIX ACLs and umask and in what order they are called.
    
      Note that technically filesystems are free to impose their own
      ordering between posix_acl_create() and inode_init_owner(), meaning
      that there are additional ordering issues that influence S_ISGID
      inheritance.
    
    * Filesystems that don't rely on inode_init_owner() don't get S_ISGID
      stripping logic.
    
      While that may be intentional (e.g. network filesystems might just
      defer setgid stripping to a server) it is often just a security issue.
    
    This is not just ugly, it's unsustainably messy, especially since we
    still have bugs in this area years after the initial round of setgid
    bugfixes.
    
    So the current state is quite messy, and while we won't be able to make
    it completely clean, as posix_acl_create() is still a filesystem specific
    call, we can improve the S_ISGID stripping situation quite a bit by
    hoisting it out of inode_init_owner() and into the vfs creation
    operations. This means we alleviate the burden for filesystems to handle
    S_ISGID stripping correctly and can standardize the ordering between
    S_ISGID and umask stripping in the vfs.
    
    We add a new helper vfs_prepare_mode() so S_ISGID handling is now done
    in the VFS before umask handling. This means S_ISGID handling is
    unaffected by whether umask stripping is done by the VFS
    itself (if no POSIX ACLs are supported or enabled) or in the filesystem
    in posix_acl_create() (if POSIX ACLs are supported).
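    
    Why the ordering matters can be shown with a small userspace sketch.  All
    function names are hypothetical, and the S_ISGID step is simplified to
    the case of a setgid parent directory and a caller without the required
    group membership or capability:

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Simplified S_ISGID handling: setgid parent, unprivileged caller assumed. */
static mode_t sketch_strip_sgid_unpriv(mode_t mode)
{
	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
		mode &= ~(mode_t)S_ISGID;
	return mode;
}

static mode_t sketch_apply_umask(mode_t mode, mode_t mask)
{
	return mode & ~mask;
}

/* Ordering described for vfs_prepare_mode(): S_ISGID first, then umask. */
static mode_t sketch_sgid_then_umask(mode_t mode, mode_t mask)
{
	return sketch_apply_umask(sketch_strip_sgid_unpriv(mode), mask);
}

/* Opposite ordering: the umask may hide S_IXGRP before the S_ISGID check. */
static mode_t sketch_umask_then_sgid(mode_t mode, mode_t mask)
{
	return sketch_strip_sgid_unpriv(sketch_apply_umask(mode, mask));
}
```

    With a requested mode of S_ISGID | 0775 and a umask that clears S_IXGRP,
    the first ordering yields 0765 while the second leaves S_ISGID | 0765,
    so the setgid bit survives only because of the order of the two steps.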
    
    The vfs_prepare_mode() helper is called directly in vfs_*() helpers that
    create new filesystem objects. We need to move them into there to make
    sure that filesystems like overlayfs that have callchains like:
    
    sys_mknod()
    -> do_mknodat(mode)
       -> .mknod = ovl_mknod(mode)
          -> ovl_create(mode)
             -> vfs_mknod(mode)
    
    get S_ISGID stripping done when calling into lower filesystems via
    vfs_*() creation helpers. Moving vfs_prepare_mode() into e.g.
    vfs_mknod() takes care of that. This is in any case semantically cleaner
    because S_ISGID stripping is a VFS security requirement.
    
    Security hooks so far have seen the mode with the umask applied but
    without S_ISGID handling done. The relevant hooks are called outside of
    vfs_*() creation helpers so by calling vfs_prepare_mode() from vfs_*()
    helpers the security hooks would now see the mode without umask
    stripping applied. For now we fix this by passing the mode with umask
    settings applied to not risk any regressions for LSM hooks. IOW, nothing
    changes for LSM hooks. It is worth pointing out that security hooks
    never saw the mode that is seen by the filesystem when actually creating
    the file. They have always been completely misplaced for that to work.
    
    The following filesystems use inode_init_owner() and thus relied on
    S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus,
    hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs,
    reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs.
    
    All of the above filesystems end up calling inode_init_owner() when new
    filesystem objects are created through the ->mkdir(), ->mknod(),
    ->create(), ->tmpfile(), ->rename() inode operations.
    
    Since directories always inherit the S_ISGID bit, with the exception of
    xfs when irix_sgid_inherit mode is turned on, S_ISGID stripping doesn't
    apply. The ->symlink() and ->link() inode operations trivially inherit
    the mode from the target and the ->rename() inode operation inherits the
    mode from the source inode. All other creation inode operations will get
    S_ISGID handling via vfs_prepare_mode() when called from their relevant
    vfs_*() helpers.
    
    In addition to this there are filesystems which allow the creation of
    filesystem objects through ioctl()s or - in the case of spufs -
    circumventing the vfs in other ways. If filesystem objects are created
    through ioctl()s the vfs doesn't know about it and can't apply regular
    permission checking including S_ISGID logic. Therefore, a filesystem
    relying on S_ISGID stripping in inode_init_owner() in their ioctl()
    callpath will be affected by moving this logic into the vfs. We audited
    those filesystems:
    
    * btrfs allows the creation of filesystem objects through various
      ioctls(). Snapshot creation literally takes a snapshot and so the mode
      is fully preserved and S_ISGID stripping doesn't apply.
    
      Creating a new subvolume relies on inode_init_owner() in
      btrfs_new_subvol_inode() but only creates directories and doesn't
      raise S_ISGID.
    
    * ocfs2 has a peculiar implementation of reflinks. In contrast to e.g.
      xfs and btrfs FICLONE/FICLONERANGE ioctl() that is only concerned with
      the actual extents ocfs2 uses a separate ioctl() that also creates the
      target file.
    
      Iow, ocfs2 circumvents the vfs entirely here and did indeed rely on
      inode_init_owner() to strip the S_ISGID bit. This is the only place
      where a filesystem needs to call mode_strip_sgid() directly but this
      is self-inflicted pain.
    
    * spufs doesn't go through the vfs at all and doesn't use ioctl()s
      either. Instead it has a dedicated system call spufs_create() which
      allows the creation of filesystem objects. But spufs only creates
      directories and doesn't allow S_ISGID bits, i.e. it specifically only
      allows 0777 bits.
    
    * bpf uses vfs_mkobj() but also doesn't allow S_ISGID bits to be created.
    
    The patch will have an effect on ext2 when the EXT2_MOUNT_GRPID mount
    option is used, on ext4 when the EXT4_MOUNT_GRPID mount option is used,
    and on xfs when the XFS_FEAT_GRPID mount option is used. When any of
    these filesystems are mounted with their respective GRPID option then
    newly created files inherit the parent directory's group
    unconditionally. In these cases none of the filesystems call
    inode_init_owner() and thus never stripped the S_ISGID bit for newly
    created files. Moving this logic into the VFS means that they now get
    the S_ISGID bit stripped. This is a user visible change. If this leads
    to regressions we will either need to figure out a better way or we need
    to revert. However, given the various setgid bugs that we found just in
    the last two years this is a regression risk we should take.
    
    Associated with this change is a new set of fstests to enforce the
    semantics for all new filesystems.
    
    Link: https://lore.kernel.org/ceph-devel/20220427092201.wvsdjbnc7b4dttaw@wittgenstein [1]
    Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [2]
    Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [3]
    Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [4]
    Link: https://lore.kernel.org/r/1657779088-2242-3-git-send-email-xuyang2018.jy@fujitsu.com
    Suggested-by: Dave Chinner <david@fromorbit.com>
    Suggested-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
    [<brauner@kernel.org>: rewrote commit message]
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fs: move should_remove_suid() [+ + +]
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Mar 7 10:59:19 2023 -0800

    fs: move should_remove_suid()
    
    commit e243e3f94c804ecca9a8241b5babe28f35258ef4 upstream.
    
    Move the helper from inode.c to attr.c. This keeps the core of the
    set{g,u}id stripping logic in one place when we add follow-up changes.
    It is the better place anyway, since should_remove_suid() returns
    ATTR_KILL_S{G,U}ID flags.
    
    Reviewed-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fs: prevent out-of-bounds array speculation when closing a file descriptor [+ + +]
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Mon Mar 6 13:54:50 2023 -0500

    fs: prevent out-of-bounds array speculation when closing a file descriptor
    
    commit 609d54441493c99f21c1823dfd66fa7f4c512ff4 upstream.
    
    Google-Bug-Id: 114199369
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs: use consistent setgid checks in is_sxid() [+ + +]
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Mar 7 10:59:22 2023 -0800

    fs: use consistent setgid checks in is_sxid()
    
    commit 8d84e39d76bd83474b26cb44f4b338635676e7e8 upstream.
    
    Now that we made the VFS setgid checking consistent an inode can't be
    marked security irrelevant even if the setgid bit is still set. Make
    this function consistent with all other helpers.
    
    Note that enforcing consistent setgid stripping checks for file
    modification and mode- and ownership changes will cause the setgid bit
    to be lost in more cases than used to be the case. If an unprivileged
    user wrote to a non-executable setgid file that they don't have
    privilege over, the setgid bit will be dropped. This will lead to
    temporary failures in some xfstests until they have been updated.
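    
    The difference for a non-executable setgid file can be sketched in
    userspace as below.  Both function names are hypothetical; the "old"
    variant is an assumption about the previous behaviour (setgid counted
    only together with S_IXGRP), and the "new" variant follows the
    description above, where a set setgid bit alone is enough:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed earlier check: S_ISGID only counts when S_IXGRP is also set. */
static bool sketch_is_sxid_old(mode_t mode)
{
	return (mode & S_ISUID) || ((mode & S_ISGID) && (mode & S_IXGRP));
}

/* Check as described above: either bit marks the inode security relevant. */
static bool sketch_is_sxid_new(mode_t mode)
{
	return (mode & (S_ISUID | S_ISGID)) != 0;
}
```

    For a mode of S_ISGID | 0644 the old variant returns false and the new
    one returns true; for modes with S_ISUID set, or with neither bit set,
    both variants agree.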
    
    Reported-by: Miklos Szeredi <miklos@szeredi.hu>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
ice: copy last block omitted in ice_get_module_eeprom() [+ + +]
Author: Petr Oros <poros@redhat.com>
Date:   Wed Mar 1 21:47:07 2023 +0100

    ice: copy last block omitted in ice_get_module_eeprom()
    
    [ Upstream commit 84cba1840e68430325ac133a11be06bfb2f7acd8 ]
    
    ice_get_module_eeprom() is broken since commit e9c9692c8a81 ("ice:
    Reimplement module reads used by ethtool"). In this refactor,
    ice_get_module_eeprom() reads the eeprom in blocks of size 8.
    But the condition that should protect the buffer overflow
    ignores the last block. The last block always contains zeros.
    
    The bug was uncovered by ethtool upstream commit 9538f384b535
    ("netlink: eeprom: Defer page requests to individual parsers").
    After this commit, ethtool reads a block with length = 1
    to read the SFF-8024 identifier value.
    
    unpatched driver:
    $ ethtool -m enp65s0f0np0 offset 0x90 length 8
    Offset          Values
    ------          ------
    0x0090:         00 00 00 00 00 00 00 00
    $ ethtool -m enp65s0f0np0 offset 0x90 length 12
    Offset          Values
    ------          ------
    0x0090:         00 00 01 a0 4d 65 6c 6c 00 00 00 00
    $
    
    $ ethtool -m enp65s0f0np0
    Offset          Values
    ------          ------
    0x0000:         11 06 06 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0010:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0020:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0030:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0040:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0050:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0060:         00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 00
    0x0070:         00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    
    patched driver:
    $ ethtool -m enp65s0f0np0 offset 0x90 length 8
    Offset          Values
    ------          ------
    0x0090:         00 00 01 a0 4d 65 6c 6c
    $ ethtool -m enp65s0f0np0 offset 0x90 length 12
    Offset          Values
    ------          ------
    0x0090:         00 00 01 a0 4d 65 6c 6c 61 6e 6f 78
    $ ethtool -m enp65s0f0np0
        Identifier                                : 0x11 (QSFP28)
        Extended identifier                       : 0x00
        Extended identifier description           : 1.5W max. Power consumption
        Extended identifier description           : No CDR in TX, No CDR in RX
        Extended identifier description           : High Power Class (> 3.5 W) not enabled
        Connector                                 : 0x23 (No separable connector)
        Transceiver codes                         : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 40G Ethernet: 40G Base-CR4
        Transceiver type                          : 25G Ethernet: 25G Base-CR CA-N
        Encoding                                  : 0x05 (64B/66B)
        BR, Nominal                               : 25500Mbps
        Rate identifier                           : 0x00
        Length (SMF,km)                           : 0km
        Length (OM3 50um)                         : 0m
        Length (OM2 50um)                         : 0m
        Length (OM1 62.5um)                       : 0m
        Length (Copper or Active cable)           : 1m
        Transmitter technology                    : 0xa0 (Copper cable unequalized)
        Attenuation at 2.5GHz                     : 4db
        Attenuation at 5.0GHz                     : 5db
        Attenuation at 7.0GHz                     : 7db
        Attenuation at 12.9GHz                    : 10db
        ........
        ....
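    
    The "length 8" and "length 12" outputs above are consistent with a
    bounds check that uses a strict comparison and therefore skips the final
    block.  The following userspace sketch illustrates that pattern and a
    clamped copy that avoids it.  The names are hypothetical and this is not
    the driver code; dst is assumed to be zero-initialised and src to hold at
    least len bytes:

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK 8 /* block size used for each read, per the text above */

/* Buggy pattern: the strict '<' skips the copy for the last block. */
static void sketch_read_buggy(unsigned char *dst, const unsigned char *src,
			      size_t len)
{
	for (size_t i = 0; i < len; i += SKETCH_BLOCK) {
		if (i + SKETCH_BLOCK < len)
			memcpy(dst + i, src + i, SKETCH_BLOCK);
	}
}

/* Clamped copy: always copy, but never more than what is left. */
static void sketch_read_fixed(unsigned char *dst, const unsigned char *src,
			      size_t len)
{
	for (size_t i = 0; i < len; i += SKETCH_BLOCK) {
		size_t n = len - i < SKETCH_BLOCK ? len - i : SKETCH_BLOCK;

		memcpy(dst + i, src + i, n);
	}
}
```

    With len = 8 the buggy variant copies nothing, and with len = 12 it
    copies the first 8 bytes and leaves the last 4 as zeros, matching the
    unpatched output; the clamped variant returns all requested bytes.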
    
    Fixes: e9c9692c8a81 ("ice: Reimplement module reads used by ethtool")
    Signed-off-by: Petr Oros <poros@redhat.com>
    Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
ila: do not generate empty messages in ila_xlat_nl_cmd_get_mapping() [+ + +]
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Feb 27 15:30:24 2023 +0000

    ila: do not generate empty messages in ila_xlat_nl_cmd_get_mapping()
    
    [ Upstream commit 693aa2c0d9b6d5b1f2745d31b6e70d09dbbaf06e ]
    
    ila_xlat_nl_cmd_get_mapping() generates an empty skb,
    triggering a recent sanity check [1].
    
    Instead, return an error code, so that user space
    can get it.
    
    [1]
    skb_assert_len
    WARNING: CPU: 0 PID: 5923 at include/linux/skbuff.h:2527 skb_assert_len include/linux/skbuff.h:2527 [inline]
    WARNING: CPU: 0 PID: 5923 at include/linux/skbuff.h:2527 __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
    Modules linked in:
    CPU: 0 PID: 5923 Comm: syz-executor269 Not tainted 6.2.0-syzkaller-18300-g2ebd1fbb946d #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
    pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : skb_assert_len include/linux/skbuff.h:2527 [inline]
    pc : __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
    lr : skb_assert_len include/linux/skbuff.h:2527 [inline]
    lr : __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
    sp : ffff80001e0d6c40
    x29: ffff80001e0d6e60 x28: dfff800000000000 x27: ffff0000c86328c0
    x26: dfff800000000000 x25: ffff0000c8632990 x24: ffff0000c8632a00
    x23: 0000000000000000 x22: 1fffe000190c6542 x21: ffff0000c8632a10
    x20: ffff0000c8632a00 x19: ffff80001856e000 x18: ffff80001e0d5fc0
    x17: 0000000000000000 x16: ffff80001235d16c x15: 0000000000000000
    x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000001
    x11: ff80800008353a30 x10: 0000000000000000 x9 : 21567eaf25bfb600
    x8 : 21567eaf25bfb600 x7 : 0000000000000001 x6 : 0000000000000001
    x5 : ffff80001e0d6558 x4 : ffff800015c74760 x3 : ffff800008596744
    x2 : 0000000000000001 x1 : 0000000100000000 x0 : 000000000000000e
    Call trace:
    skb_assert_len include/linux/skbuff.h:2527 [inline]
    __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156
    dev_queue_xmit include/linux/netdevice.h:3033 [inline]
    __netlink_deliver_tap_skb net/netlink/af_netlink.c:307 [inline]
    __netlink_deliver_tap+0x45c/0x6f8 net/netlink/af_netlink.c:325
    netlink_deliver_tap+0xf4/0x174 net/netlink/af_netlink.c:338
    __netlink_sendskb net/netlink/af_netlink.c:1283 [inline]
    netlink_sendskb+0x6c/0x154 net/netlink/af_netlink.c:1292
    netlink_unicast+0x334/0x8d4 net/netlink/af_netlink.c:1380
    nlmsg_unicast include/net/netlink.h:1099 [inline]
    genlmsg_unicast include/net/genetlink.h:433 [inline]
    genlmsg_reply include/net/genetlink.h:443 [inline]
    ila_xlat_nl_cmd_get_mapping+0x620/0x7d0 net/ipv6/ila/ila_xlat.c:493
    genl_family_rcv_msg_doit net/netlink/genetlink.c:968 [inline]
    genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
    genl_rcv_msg+0x938/0xc1c net/netlink/genetlink.c:1065
    netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2574
    genl_rcv+0x38/0x50 net/netlink/genetlink.c:1076
    netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
    netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
    netlink_sendmsg+0x800/0xae0 net/netlink/af_netlink.c:1942
    sock_sendmsg_nosec net/socket.c:714 [inline]
    sock_sendmsg net/socket.c:734 [inline]
    ____sys_sendmsg+0x558/0x844 net/socket.c:2479
    ___sys_sendmsg net/socket.c:2533 [inline]
    __sys_sendmsg+0x26c/0x33c net/socket.c:2562
    __do_sys_sendmsg net/socket.c:2571 [inline]
    __se_sys_sendmsg net/socket.c:2569 [inline]
    __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2569
    __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
    invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
    el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
    do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
    el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
    el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
    el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
    irq event stamp: 136484
    hardirqs last enabled at (136483): [<ffff800008350244>] __up_console_sem+0x60/0xb4 kernel/printk/printk.c:345
    hardirqs last disabled at (136484): [<ffff800012358d60>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:405
    softirqs last enabled at (136418): [<ffff800008020ea8>] softirq_handle_end kernel/softirq.c:414 [inline]
    softirqs last enabled at (136418): [<ffff800008020ea8>] __do_softirq+0xd4c/0xfa4 kernel/softirq.c:600
    softirqs last disabled at (136371): [<ffff80000802b4a4>] ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80
    ---[ end trace 0000000000000000 ]---
    skb len=0 headroom=0 headlen=0 tailroom=192
    mac=(0,0) net=(0,-1) trans=-1
    shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
    csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
    hash(0x0 sw=0 l4=0) proto=0x0010 pkttype=6 iif=0
    dev name=nlmon0 feat=0x0000000000005861
    
    Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter [+ + +]
Author: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Date:   Thu Feb 2 08:26:56 2023 +0000

    iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter
    
    [ Upstream commit b6b26d86c61c441144c72f842f7469bb686e1211 ]
    
    The 'acpiid' buffer in the parse_ivrs_acpihid function may overflow,
    because the string specifier in the sscanf() format string
    has no width limitation.
    
    Found by InfoTeCS on behalf of Linux Verification Center
    (linuxtesting.org) with SVACE.
    
    Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter")
    Cc: stable@vger.kernel.org
    Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru>
    Reviewed-by: Kim Phillips <kim.phillips@amd.com>
    Link: https://lore.kernel.org/r/20230202082719.1513849-1-Ilia.Gavrilov@infotecs.ru
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
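
A minimal userspace sketch of the kind of fix described above, not the kernel's parse_ivrs_acpihid() itself. The helper name parse_id and the 15-character limit are assumptions made for illustration only. It shows how a field width on the %s specifier bounds what sscanf() may store:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical limit for this sketch; not taken from the kernel source. */
#define ID_MAX 15
#define STR_(x) #x
#define STR(x) STR_(x)

/*
 * Copy the first whitespace-delimited token of arg into out, which must
 * hold at least ID_MAX + 1 bytes. The field width caps the number of
 * characters sscanf() may store, so an over-long token is truncated
 * instead of overflowing the buffer. Returns 1 if a token was stored.
 */
int parse_id(const char *arg, char *out)
{
	assert(arg != NULL && out != NULL);
	return sscanf(arg, "%" STR(ID_MAX) "s", out) == 1;
}
```

Without the width, the same call would keep writing past the end of out for any token longer than the buffer.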

iommu/amd: Add PCI segment support for ivrs_[ioapic/hpet/acpihid] commands [+ + +]
Author: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Date:   Wed Jul 6 17:08:22 2022 +0530

    iommu/amd: Add PCI segment support for ivrs_[ioapic/hpet/acpihid] commands
    
    [ Upstream commit bbe3a106580c21bc883fb0c9fa3da01534392fe8 ]
    
    By default, the PCI segment is zero and can be omitted. To support systems
    with a non-zero PCI segment ID, modify the parsing functions to accept
    a PCI segment ID.
    
    Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
    Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Link: https://lore.kernel.org/r/20220706113825.25582-33-vasant.hegde@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Stable-dep-of: b6b26d86c61c ("iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options [+ + +]
Author: Kim Phillips <kim.phillips@amd.com>
Date:   Mon Sep 19 10:56:38 2022 -0500

    iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options
    
    [ Upstream commit 1198d2316dc4265a97d0e8445a22c7a6d17580a4 ]
    
    Currently, these options cause the following libkmod error:
    
    libkmod: ERROR ../libkmod/libkmod-config.c:489 kcmdline_parse_result: \
            Ignoring bad option on kernel command line while parsing module \
            name: 'ivrs_xxxx[XX:XX'
    
    Fix by introducing a new parameter format for these options and
    throw a warning for the deprecated format.
    
    Users are still allowed to omit the PCI Segment if zero.
    
    Adding a Link: to the reason why we're modding the syntax parsing
    in the driver and not in libkmod.
    
    Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter")
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/linux-modules/20200310082308.14318-2-lucas.demarchi@intel.com/
    Reported-by: Kim Phillips <kim.phillips@amd.com>
    Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Signed-off-by: Kim Phillips <kim.phillips@amd.com>
    Link: https://lore.kernel.org/r/20220919155638.391481-2-kim.phillips@amd.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Stable-dep-of: b6b26d86c61c ("iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
iommu/vt-d: Fix PASID directory pointer coherency [+ + +]
Author: Jacob Pan <jacob.jun.pan@linux.intel.com>
Date:   Thu Feb 16 21:08:15 2023 +0800

    iommu/vt-d: Fix PASID directory pointer coherency
    
    [ Upstream commit 194b3348bdbb7db65375c72f3f774aee4cc6614e ]
    
    On platforms that do not support IOMMU Extended capability bit 0
    Page-walk Coherency, CPU caches are not snooped when IOMMU is accessing
    any translation structures. IOMMU access goes only directly to
    memory. Intel IOMMU code was missing a flush for the PASID table
    directory that resulted in the unrecoverable fault as shown below.
    
    This patch adds clflush calls whenever allocating and updating
    a PASID table directory to ensure cache coherency.
    
    In the reverse direction, there's no need to clflush the PASID directory
    pointer when we deactivate a context entry, because the IOMMU hardware will
    not see the old PASID directory pointer after we clear the context entry.
    PASID directory entries are also never freed once allocated.
    
     DMAR: DRHD: handling fault status reg 3
     DMAR: [DMA Read NO_PASID] Request device [00:0d.2] fault addr 0x1026a4000
           [fault reason 0x51] SM: Present bit in Directory Entry is clear
     DMAR: Dump dmar1 table entries for IOVA 0x1026a4000
     DMAR: scalable mode root entry: hi 0x0000000102448001, low 0x0000000101b3e001
     DMAR: context entry: hi 0x0000000000000000, low 0x0000000101b4d401
     DMAR: pasid dir entry: 0x0000000101b4e001
     DMAR: pasid table entry[0]: 0x0000000000000109
     DMAR: pasid table entry[1]: 0x0000000000000001
     DMAR: pasid table entry[2]: 0x0000000000000000
     DMAR: pasid table entry[3]: 0x0000000000000000
     DMAR: pasid table entry[4]: 0x0000000000000000
     DMAR: pasid table entry[5]: 0x0000000000000000
     DMAR: pasid table entry[6]: 0x0000000000000000
     DMAR: pasid table entry[7]: 0x0000000000000000
     DMAR: PTE not present at level 4
    
    Cc: <stable@vger.kernel.org>
    Fixes: 0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables")
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reported-by: Sukumar Ghorai <sukumar.ghorai@intel.com>
    Signed-off-by: Ashok Raj <ashok.raj@intel.com>
    Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
    Link: https://lore.kernel.org/r/20230209212843.1788125-1-jacob.jun.pan@linux.intel.com
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
ipmi:ssif: Add a timer between request retries [+ + +]
Author: Corey Minyard <cminyard@mvista.com>
Date:   Wed Jan 25 10:34:47 2023 -0600

    ipmi:ssif: Add a timer between request retries
    
    [ Upstream commit 00bb7e763ec9f384cb382455cb6ba5588b5375cf ]
    
    The IPMI spec has a time (T6) specified between request retries.  Add
    the handling for that.
    
    Reported by: Tony Camuso <tcamuso@redhat.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Corey Minyard <cminyard@mvista.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipmi:ssif: Increase the message retry time [+ + +]
Author: Corey Minyard <cminyard@mvista.com>
Date:   Thu Nov 3 15:03:11 2022 -0500

    ipmi:ssif: Increase the message retry time
    
    [ Upstream commit 39721d62bbc16ebc9bb2bdc2c163658f33da3b0b ]
    
    The spec states that the minimum message retry time is 60ms, but it was
    set to 20ms.  Correct it.
    
    Reported by: Tony Camuso <tcamuso@redhat.com>
    Signed-off-by: Corey Minyard <cminyard@mvista.com>
    Stable-dep-of: 00bb7e763ec9 ("ipmi:ssif: Add a timer between request retries")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
irqdomain: Fix mapping-creation race [+ + +]
Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Mon Feb 13 11:42:48 2023 +0100

    irqdomain: Fix mapping-creation race
    
    [ Upstream commit 601363cc08da25747feb87c55573dd54de91d66a ]
    
    Parallel probing of devices that share interrupts (e.g. when a driver
    uses asynchronous probing) can currently result in two mappings being
    created for the same hardware interrupt due to missing serialisation.
    
    Make sure to hold the irq_domain_mutex when creating mappings so that
    looking for an existing mapping before creating a new one is done
    atomically.
    
    Fixes: 765230b5f084 ("driver-core: add asynchronous probing support for drivers")
    Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings")
    Link: https://lore.kernel.org/r/YuJXMHoT4ijUxnRb@hovoldconsulting.com
    Cc: stable@vger.kernel.org      # 4.8
    Cc: Dmitry Torokhov <dtor@chromium.org>
    Cc: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
    Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20230213104302.17307-7-johan+linaro@kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

irqdomain: Refactor __irq_domain_alloc_irqs() [+ + +]
Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Mon Feb 13 11:42:47 2023 +0100

    irqdomain: Refactor __irq_domain_alloc_irqs()
    
    [ Upstream commit d55f7f4c58c07beb5050a834bf57ae2ede599c7e ]
    
    Refactor __irq_domain_alloc_irqs() so that it can be called internally
    while holding the irq_domain_mutex.
    
    This will be used to fix a shared-interrupt mapping race, hence the
    Fixes tag.
    
    Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings")
    Cc: stable@vger.kernel.org      # 4.8
    Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
    Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20230213104302.17307-6-johan+linaro@kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
KVM: fix memoryleak in kvm_init() [+ + +]
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Tue Aug 23 14:34:14 2022 +0800

    KVM: fix memoryleak in kvm_init()
    
    commit 5a2a961be2ad6a16eb388a80442443b353c11d16 upstream.
    
    When alloc_cpumask_var_node() fails for a certain CPU, some cpumasks may
    already have been allocated for the percpu cpu_kick_mask. We should free
    these cpumasks or a memory leak will occur.
    
    Fixes: baff59ccdc65 ("KVM: Pre-allocate cpumasks for kvm_make_all_cpus_request_except()")
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Link: https://lore.kernel.org/r/20220823063414.59778-1-linmiaohe@huawei.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
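
The cleanup pattern described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kvm_init() code: counting_alloc, counting_free, alloc_budget and live_allocs are hypothetical names that exist only so a failure can be forced and the unwinding observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical test hooks: allocations left before failure, and live count. */
int alloc_budget = 0;
int live_allocs = 0;

void *counting_alloc(size_t size)
{
	if (alloc_budget == 0)
		return NULL;            /* simulated allocation failure */
	alloc_budget--;
	live_allocs++;
	return malloc(size);
}

void counting_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/*
 * Allocate n slots. If allocation i fails, free slots 0..i-1 before
 * returning, so a partial failure does not leak the earlier slots.
 */
int alloc_all(void **slots, size_t n, size_t size)
{
	assert(slots != NULL);
	for (size_t i = 0; i < n; i++) {
		slots[i] = counting_alloc(size);
		if (!slots[i]) {
			while (i--) {
				counting_free(slots[i]);
				slots[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}
```

With a budget of two and four slots requested, alloc_all() fails on the third slot and releases the first two before returning.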

KVM: nVMX: Don't use Enlightened MSR Bitmap for L3 [+ + +]
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Mon Nov 29 10:47:01 2021 +0100

    KVM: nVMX: Don't use Enlightened MSR Bitmap for L3
    
    commit 250552b925ce400c17d166422fde9bb215958481 upstream.
    
    When KVM runs as a nested hypervisor on top of Hyper-V it uses Enlightened
    VMCS and enables Enlightened MSR Bitmap feature for its L1s and L2s (which
    are actually L2s and L3s from Hyper-V's perspective). When MSR bitmap is
    updated, KVM has to reset HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP from
    clean fields to make Hyper-V aware of the change. For KVM's L1s, this is
    done in vmx_disable_intercept_for_msr()/vmx_enable_intercept_for_msr().
    MSR bitmap for L2 is built in nested_vmx_prepare_msr_bitmap() by blending
    MSR bitmap for L1 and L1's idea of MSR bitmap for L2. KVM, however, doesn't
    check if the resulting bitmap is different and never cleans
    HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP in eVMCS02. This is incorrect and
    may result in Hyper-V missing the update.
    
    The issue could've been solved by calling evmcs_touch_msr_bitmap() for
    eVMCS02 from nested_vmx_prepare_msr_bitmap() unconditionally but doing so
    would not give any performance benefits (compared to not using Enlightened
    MSR Bitmap at all). 3-level nesting is also not a very common setup
    nowadays.
    
    Don't enable 'Enlightened MSR Bitmap' feature for KVM's L2s (real L3s) for
    now.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Message-Id: <20211129094704.326635-2-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: Optimize kvm_make_vcpus_request_mask() a bit [+ + +]
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Fri Sep 3 09:51:37 2021 +0200

    KVM: Optimize kvm_make_vcpus_request_mask() a bit
    
    [ Upstream commit ae0946cd3601752dc58f86d84258e5361e9c8cd4 ]
    
    Iterating over set bits in 'vcpu_bitmap' should be faster than going
    through all vCPUs, especially when just a few bits are set.
    
    Drop kvm_make_vcpus_request_mask() call from kvm_make_all_cpus_request_except()
    to avoid handling the special case when 'vcpu_bitmap' is NULL, move the
    code to kvm_make_all_cpus_request_except() itself.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Message-Id: <20210903075141.403071-5-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Stable-dep-of: 2b0128127373 ("KVM: Register /dev/kvm as the _very_ last thing during initialization")
    Signed-off-by: Sasha Levin <sashal@kernel.org>
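
A small userspace sketch of why walking only the set bits can be cheaper than visiting every index. It assumes a 64-bit mask and the GCC/Clang builtin __builtin_ctzll; the kernel uses its own bitmap helpers, and collect_set_bits is a hypothetical name used only here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write the indices of the set bits in mask to out, lowest first, and
 * return how many were written. The loop runs once per set bit, not once
 * per possible index, so a sparse mask finishes quickly.
 */
size_t collect_set_bits(unsigned long long mask, unsigned int *out, size_t cap)
{
	size_t n = 0;

	assert(out != NULL);
	while (mask && n < cap) {
		out[n++] = (unsigned int)__builtin_ctzll(mask); /* lowest set bit */
		mask &= mask - 1;                               /* clear that bit */
	}
	return n;
}
```

For a mask with three bits set out of 64, the loop body executes three times.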

KVM: Pre-allocate cpumasks for kvm_make_all_cpus_request_except() [+ + +]
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Fri Sep 3 09:51:40 2021 +0200

    KVM: Pre-allocate cpumasks for kvm_make_all_cpus_request_except()
    
    [ Upstream commit baff59ccdc657d290be51b95b38ebe5de40036b4 ]
    
    Allocating cpumask dynamically in zalloc_cpumask_var() is not ideal.
    Allocation is somewhat slow and can (in theory and when CPUMASK_OFFSTACK)
    fail. kvm_make_all_cpus_request_except() already disables preemption so
    we can use pre-allocated per-cpu cpumasks instead.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Message-Id: <20210903075141.403071-8-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Stable-dep-of: 2b0128127373 ("KVM: Register /dev/kvm as the _very_ last thing during initialization")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: Register /dev/kvm as the _very_ last thing during initialization [+ + +]
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Nov 30 23:08:45 2022 +0000

    KVM: Register /dev/kvm as the _very_ last thing during initialization
    
    [ Upstream commit 2b01281273738bf2d6551da48d65db2df3f28998 ]
    
    Register /dev/kvm, i.e. expose KVM to userspace, only after all other
    setup has completed.  Once /dev/kvm is exposed, userspace can start
    invoking KVM ioctls, creating VMs, etc...  If userspace creates a VM
    before KVM is done with its configuration, bad things may happen, e.g.
    KVM will fail to properly migrate vCPU state if a VM is created before
    KVM has registered preemption notifiers.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20221130230934.1014142-2-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: SVM: Don't rewrite guest ICR on AVIC IPI virtualization failure [+ + +]
Author: Sean Christopherson <seanjc@google.com>
Date:   Fri Feb 4 21:41:59 2022 +0000

    KVM: SVM: Don't rewrite guest ICR on AVIC IPI virtualization failure
    
    [ Upstream commit b51818afdc1d3c7cc269e295953685558d3af71c ]
    
    Don't bother rewriting the ICR value into the vAPIC page on an AVIC IPI
    virtualization failure, the access is a trap, i.e. the value has already
    been written to the vAPIC page.  The one caveat is if hardware left the
    BUSY flag set (which appears to happen somewhat arbitrarily), in which
    case go through the "nodecode" APIC-write path in order to clear the BUSY
    flag.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220204214205.3306634-6-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Stable-dep-of: 5aede752a839 ("KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target [+ + +]
Author: Sean Christopherson <seanjc@google.com>
Date:   Fri Jan 6 01:12:37 2023 +0000

    KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target
    
    [ Upstream commit 5aede752a839904059c2b5d68be0dc4501c6c15f ]
    
    Emulate ICR writes on AVIC IPI failures due to invalid targets using the
    same logic as failures due to invalid types.  AVIC acceleration fails if
    _any_ of the targets are invalid, and crucially VM-Exits before sending
    IPIs to targets that _are_ valid.  In logical mode, the destination is a
    bitmap, i.e. a single IPI can target multiple logical IDs.  Doing nothing
    causes KVM to drop IPIs if at least one target is valid and at least one
    target is invalid.
    
    Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
    Cc: stable@vger.kernel.org
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20230106011306.85230-5-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: VMX: Fix crash due to uninitialized current_vmcs [+ + +]
Author: Alexandru Matei <alexandru.matei@uipath.com>
Date:   Tue Jan 24 00:12:08 2023 +0200

    KVM: VMX: Fix crash due to uninitialized current_vmcs
    
    commit 93827a0a36396f2fd6368a54a020f420c8916e9b upstream.
    
    KVM enables 'Enlightened VMCS' and 'Enlightened MSR Bitmap' when running as
    a nested hypervisor on top of Hyper-V. When MSR bitmap is updated,
    evmcs_touch_msr_bitmap function uses current_vmcs per-cpu variable to mark
    that the msr bitmap was changed.
    
    vmx_vcpu_create() modifies the msr bitmap via vmx_disable_intercept_for_msr
    -> vmx_msr_bitmap_l01_changed which in the end calls this function. The
    function checks whether current_vmcs is NULL, but the check is
    insufficient because current_vmcs is not initialized. Because of this, the
    code might incorrectly write to the structure pointed to by the
    current_vmcs value left by another task. Preemption is not disabled, so the
    current task can be preempted and moved to another CPU while current_vmcs
    is accessed multiple times from evmcs_touch_msr_bitmap(), which leads to a
    crash.
    
    The manipulation of MSR bitmaps by callers happens only for vmcs01 so the
    solution is to use vmx->vmcs01.vmcs instead of current_vmcs.
    
      BUG: kernel NULL pointer dereference, address: 0000000000000338
      PGD 4e1775067 P4D 0
      Oops: 0002 [#1] PREEMPT SMP NOPTI
      ...
      RIP: 0010:vmx_msr_bitmap_l01_changed+0x39/0x50 [kvm_intel]
      ...
      Call Trace:
       vmx_disable_intercept_for_msr+0x36/0x260 [kvm_intel]
       vmx_vcpu_create+0xe6/0x540 [kvm_intel]
       kvm_arch_vcpu_create+0x1d1/0x2e0 [kvm]
       kvm_vm_ioctl_create_vcpu+0x178/0x430 [kvm]
       kvm_vm_ioctl+0x53f/0x790 [kvm]
       __x64_sys_ioctl+0x8a/0xc0
       do_syscall_64+0x5c/0x90
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    Fixes: ceef7d10dfb6 ("KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support")
    Cc: stable@vger.kernel.org
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
    Link: https://lore.kernel.org/r/20230123221208.4964-1-alexandru.matei@uipath.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    [manual backport: evmcs.h got renamed to hyperv.h in a later
    version, modified in evmcs.h instead]
    Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: VMX: Introduce vmx_msr_bitmap_l01_changed() helper [+ + +]
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Mon Nov 29 10:47:02 2021 +0100

    KVM: VMX: Introduce vmx_msr_bitmap_l01_changed() helper
    
    commit b84155c38076b36d625043a06a2f1c90bde62903 upstream.
    
    In preparation for enabling the 'Enlightened MSR Bitmap' feature for Hyper-V
    guests, move MSR bitmap update tracking to a dedicated helper.
    
    Note: vmx_msr_bitmap_l01_changed() is called when MSR bitmap might be
    updated. KVM doesn't check if the bit we're trying to set is already set
    (or the bit it's trying to clear is already cleared). Such situations
    should not be common and a few false positives should not be a problem.
    
    No functional change intended.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20211129094704.326635-3-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
Linux 5.15.103 [+ + +]
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Fri Mar 17 08:49:05 2023 +0100

    Linux 5.15.103
    
    Link: https://lore.kernel.org/r/20230315115738.951067403@linuxfoundation.org
    Tested-by: Chris Paterson (CIP) <chris.paterson2@renesas.com>
    Tested-by: Florian Fainelli <f.fainelli@gmail.com>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Link: https://lore.kernel.org/r/20230316083443.411936182@linuxfoundation.org
    Tested-by: Chris Paterson (CIP) <chris.paterson2@renesas.com>
    Tested-by: Florian Fainelli <f.fainelli@gmail.com>
    Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Tom Saeger <tom.saeger@oracle.com>
    Tested-by: Guenter Roeck <linux@roeck-us.net>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
macintosh: windfarm: Use unsigned type for 1-bit bitfields [+ + +]
Author: Nathan Chancellor <nathan@kernel.org>
Date:   Wed Feb 15 10:12:12 2023 -0700

    macintosh: windfarm: Use unsigned type for 1-bit bitfields
    
    [ Upstream commit 748ea32d2dbd813d3bd958117bde5191182f909a ]
    
    Clang warns:
    
      drivers/macintosh/windfarm_lm75_sensor.c:63:14: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
                      lm->inited = 1;
                                 ^ ~
    
      drivers/macintosh/windfarm_smu_sensors.c:356:19: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
                      pow->fake_volts = 1;
                                      ^ ~
      drivers/macintosh/windfarm_smu_sensors.c:368:18: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
                      pow->quadratic = 1;
                                     ^ ~
    
    There is no bug here since no code checks the actual value of these
    fields, just whether or not they are zero (boolean context), but this
    can be easily fixed by switching to an unsigned type.
    
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20230215-windfarm-wsingle-bit-bitfield-constant-conversion-v1-1-26415072e855@kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>
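
A minimal sketch of the behaviour Clang is warning about, assuming GCC or Clang, where storing 1 into a signed one-bit field yields -1 (the conversion is implementation-defined in C). The struct and function names are hypothetical and unrelated to the windfarm drivers.

```c
#include <assert.h>

struct sensor_signed   { signed int inited : 1; };   /* holds 0 or -1 */
struct sensor_unsigned { unsigned int inited : 1; }; /* holds 0 or 1 */

/* Store v in a signed one-bit field and read it back. */
int store_signed(int v)
{
	struct sensor_signed s = { 0 };

	assert(v == 0 || v == 1);
	s.inited = v;   /* 1 does not fit; GCC and Clang store -1 */
	return s.inited;
}

/* Store v in an unsigned one-bit field and read it back. */
int store_unsigned(int v)
{
	struct sensor_unsigned s = { 0 };

	assert(v == 0 || v == 1);
	s.inited = v;   /* 1 fits and reads back as 1 */
	return s.inited;
}
```

Both variants behave the same in a boolean test such as `if (s.inited)`, which is why the commit reports no functional bug. Only a comparison against the literal 1 would differ.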

 
Makefile: use -gdwarf-{4|5} for assembler for DEBUG_INFO_DWARF{4|5} [+ + +]
Author: Nick Desaulniers <ndesaulniers@google.com>
Date:   Wed Mar 15 14:40:59 2023 -0700

    Makefile: use -gdwarf-{4|5} for assembler for DEBUG_INFO_DWARF{4|5}
    
    This is _not_ an upstream commit and just for 5.15.y only. It is based
    on upstream
    commit 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files").
    
    When the user has chosen not to use their compiler's implicit default
    DWARF version (which changes over time) via selecting
    - CONFIG_DEBUG_INFO_DWARF4 or
    - CONFIG_DEBUG_INFO_DWARF5
    we need to tell the compiler this for Asm sources as well as C sources.
    (We use the compiler to drive assembler jobs in kbuild, since most asm
    needs to be preprocessed first).  Otherwise, we will get object files
    built from Asm sources with the compiler's implicit default DWARF
    version.
    
    For example, selecting CONFIG_DEBUG_INFO_DWARF4 would produce a DWARFv5
    vmlinux, since it was a mix of DWARFv4 object files from C sources and
    DWARFv5 object files from Asm sources when using Clang as the assembler
    (ex. `make LLVM=1`).
    
    Fixes: 0ee2f0567a56 ("Makefile.debug: re-enable debug info for .S files")
    Reported-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
    Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
media: ov5640: Fix analogue gain control [+ + +]
Author: Paul Elder <paul.elder@ideasonboard.com>
Date:   Mon Nov 28 09:02:01 2022 +0100

    media: ov5640: Fix analogue gain control
    
    [ Upstream commit afa4805799c1d332980ad23339fdb07b5e0cf7e0 ]
    
    Gain control is badly documented in publicly available (including
    leaked) documentation.
    
    There is an AGC pre-gain in register 0x3a13, expressed as a 6-bit value
    (plus an enable bit in bit 6). The driver hardcodes it to 0x43, which
    one application note states is equal to x1.047. The documentation also
    states that 0x40 is equal to x1.000. The pre-gain thus seems to be
    expressed in 1/64 increments, and thus ranges from x1.00 to x1.984.
    What the pre-gain does is however unspecified.
    
    There is then an AGC gain limit, in registers 0x3a18 and 0x3a19,
    expressed as a 10-bit "real gain format" value. One application note
    sets it to 0x00f8 and states it is equal to x15.5, so it appears to be
    expressed in 1/16 increments, up to x63.9375.
    
    The manual gain is stored in registers 0x350a and 0x350b, also as a
    10-bit "real gain format" value. It is documented in the application
    note as a Q6.4 value, up to x63.9375.
    
    One version of the datasheet indicates that the sensor supports a
    digital gain:
    
      The OV5640 supports 1/2/4 digital gain. Normally, the gain is
      controlled automatically by the automatic gain control (AGC) block.
    
    It isn't clear how that would be controlled manually.
    
    There appears to be no indication regarding whether the gain controlled
    through registers 0x350a and 0x350b is an analogue gain only or also
    includes digital gain. The words "real gain" don't necessarily mean
    "combined analogue and digital gains". Some OmniVision sensors (such as
    the OV8858) are documented as supporting different formats for the gain
    values, selectable through a register bit, and they are called "real
    gain format" and "sensor gain format". For that sensor, we have (one of)
    the gain registers documented as
    
      0x3503[2]=0, gain[7:0] is real gain format, where low 4 bits are
      fraction bits, for example, 0x10 is 1x gain, 0x28 is 2.5x gain
    
      If 0x3503[2]=1, gain[7:0] is sensor gain format, gain[7:4] is coarse
      gain, 00000: 1x, 00001: 2x, 00011: 4x, 00111: 8x, gain[7] is 1,
      gain[3:0] is fine gain. For example, 0x10 is 1x gain, 0x30 is 2x gain,
      0x70 is 4x gain
    
    (The second part of the text makes little sense)
    
    "Real gain" may thus refer to the combination of the coarse and fine
    analogue gains as a single value.
    
    The OV5640 0x350a and 0x350b registers thus appear to control analogue
    gain. The driver incorrectly uses V4L2_CID_GAIN as V4L2 has a specific
    control for analogue gain, V4L2_CID_ANALOGUE_GAIN. Use it.
    
    If registers 0x350a and 0x350b are later found to control digital gain
    as well, the driver could then restrict the range of the analogue gain
    control value to lower than x64 and add a separate digital gain control.
    
    Signed-off-by: Paul Elder <paul.elder@ideasonboard.com>
    Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
    Reviewed-by: Jai Luthra <j-luthra@ti.com>
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
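
The fixed-point arithmetic quoted in the ov5640 entry above can be checked with a small sketch. It assumes only what the commit message states: a 10-bit value with 4 fractional bits for the "real gain format", and 1/64 steps for the pre-gain. The function names and the 7-bit pre-gain width are illustrative assumptions, not taken from the driver or the datasheet.

```python
def real_gain_to_float(raw: int, frac_bits: int = 4, width: int = 10) -> float:
    """Decode an unsigned fixed-point gain value (default: 10-bit, Q6.4)."""
    if not 0 <= raw < (1 << width):
        raise ValueError(f"raw value {raw:#x} does not fit in {width} bits")
    return raw / (1 << frac_bits)


def float_to_real_gain(gain: float, frac_bits: int = 4, width: int = 10) -> int:
    """Encode a gain multiplier, clamping to the representable range."""
    raw = round(gain * (1 << frac_bits))
    return max(0, min(raw, (1 << width) - 1))


# AGC gain limit from the application note: 0x00f8 is x15.5.
agc_limit = real_gain_to_float(0x00F8)
# Largest 10-bit Q6.4 value: x63.9375.
max_gain = real_gain_to_float(0x3FF)
# Pre-gain in 1/64 steps (7-bit width is an assumption): 0x40 is x1.0.
pre_gain_unity = real_gain_to_float(0x40, frac_bits=6, width=7)
```

The same helper reproduces the OV8858 example from the quoted datasheet text: with 4 fractional bits in an 8-bit field, 0x10 decodes to 1.0 and 0x28 to 2.5.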

media: rc: gpio-ir-recv: add remove function [+ + +]
Author: Li Jun <jun.li@nxp.com>
Date:   Wed Jan 11 10:39:21 2023 +0100

    media: rc: gpio-ir-recv: add remove function
    
    [ Upstream commit 30040818b338b8ebc956ce0ebd198f8d593586a6 ]
    
    If runtime PM is enabled, clean up runtime PM on removal so that the
    CPU latency QoS request is removed; otherwise driver removal may
    produce the kernel dump below:
    
    [   19.463299] Unable to handle kernel NULL pointer dereference at
    virtual address 0000000000000048
    [   19.472161] Mem abort info:
    [   19.474985]   ESR = 0x0000000096000004
    [   19.478754]   EC = 0x25: DABT (current EL), IL = 32 bits
    [   19.484081]   SET = 0, FnV = 0
    [   19.487149]   EA = 0, S1PTW = 0
    [   19.490361]   FSC = 0x04: level 0 translation fault
    [   19.495256] Data abort info:
    [   19.498149]   ISV = 0, ISS = 0x00000004
    [   19.501997]   CM = 0, WnR = 0
    [   19.504977] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000049f81000
    [   19.511432] [0000000000000048] pgd=0000000000000000,
    p4d=0000000000000000
    [   19.518245] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
    [   19.524520] Modules linked in: gpio_ir_recv(+) rc_core [last
    unloaded: rc_core]
    [   19.531845] CPU: 0 PID: 445 Comm: insmod Not tainted
    6.2.0-rc1-00028-g2c397a46d47c #72
    [   19.531854] Hardware name: FSL i.MX8MM EVK board (DT)
    [   19.531859] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
    BTYPE=--)
    [   19.551777] pc : cpu_latency_qos_remove_request+0x20/0x110
    [   19.557277] lr : gpio_ir_recv_runtime_suspend+0x18/0x30
    [gpio_ir_recv]
    [   19.557294] sp : ffff800008ce3740
    [   19.557297] x29: ffff800008ce3740 x28: 0000000000000000 x27:
    ffff800008ce3d50
    [   19.574270] x26: ffffc7e3e9cea100 x25: 00000000000f4240 x24:
    ffffc7e3f9ef0e30
    [   19.574284] x23: 0000000000000000 x22: ffff0061803820f4 x21:
    0000000000000008
    [   19.574296] x20: ffffc7e3fa75df30 x19: 0000000000000020 x18:
    ffffffffffffffff
    [   19.588570] x17: 0000000000000000 x16: ffffc7e3f9efab70 x15:
    ffffffffffffffff
    [   19.595712] x14: ffff800008ce37b8 x13: ffff800008ce37aa x12:
    0000000000000001
    [   19.602853] x11: 0000000000000001 x10: ffffcbe3ec0dff87 x9 :
    0000000000000008
    [   19.609991] x8 : 0101010101010101 x7 : 0000000000000000 x6 :
    000000000f0bfe9f
    [   19.624261] x5 : 00ffffffffffffff x4 : 0025ab8e00000000 x3 :
    ffff006180382010
    [   19.631405] x2 : ffffc7e3e9ce8030 x1 : ffffc7e3fc3eb810 x0 :
    0000000000000020
    [   19.638548] Call trace:
    [   19.640995]  cpu_latency_qos_remove_request+0x20/0x110
    [   19.646142]  gpio_ir_recv_runtime_suspend+0x18/0x30 [gpio_ir_recv]
    [   19.652339]  pm_generic_runtime_suspend+0x2c/0x44
    [   19.657055]  __rpm_callback+0x48/0x1dc
    [   19.660807]  rpm_callback+0x6c/0x80
    [   19.664301]  rpm_suspend+0x10c/0x640
    [   19.667880]  rpm_idle+0x250/0x2d0
    [   19.671198]  update_autosuspend+0x38/0xe0
    [   19.675213]  pm_runtime_set_autosuspend_delay+0x40/0x60
    [   19.680442]  gpio_ir_recv_probe+0x1b4/0x21c [gpio_ir_recv]
    [   19.685941]  platform_probe+0x68/0xc0
    [   19.689610]  really_probe+0xc0/0x3dc
    [   19.693189]  __driver_probe_device+0x7c/0x190
    [   19.697550]  driver_probe_device+0x3c/0x110
    [   19.701739]  __driver_attach+0xf4/0x200
    [   19.705578]  bus_for_each_dev+0x70/0xd0
    [   19.709417]  driver_attach+0x24/0x30
    [   19.712998]  bus_add_driver+0x17c/0x240
    [   19.716834]  driver_register+0x78/0x130
    [   19.720676]  __platform_driver_register+0x28/0x34
    [   19.725386]  gpio_ir_recv_driver_init+0x20/0x1000 [gpio_ir_recv]
    [   19.731404]  do_one_initcall+0x44/0x2ac
    [   19.735243]  do_init_module+0x48/0x1d0
    [   19.739003]  load_module+0x19fc/0x2034
    [   19.742759]  __do_sys_finit_module+0xac/0x12c
    [   19.747124]  __arm64_sys_finit_module+0x20/0x30
    [   19.751664]  invoke_syscall+0x48/0x114
    [   19.755420]  el0_svc_common.constprop.0+0xcc/0xec
    [   19.760132]  do_el0_svc+0x38/0xb0
    [   19.763456]  el0_svc+0x2c/0x84
    [   19.766516]  el0t_64_sync_handler+0xf4/0x120
    [   19.770789]  el0t_64_sync+0x190/0x194
    [   19.774460] Code: 910003fd a90153f3 aa0003f3 91204021 (f9401400)
    [   19.780556] ---[ end trace 0000000000000000 ]---
    
    Signed-off-by: Li Jun <jun.li@nxp.com>
    Signed-off-by: Sean Young <sean@mess.org>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
MIPS: Fix a compilation issue [+ + +]
Author: xurui <xurui@kylinos.cn>
Date:   Wed Jan 18 16:59:12 2023 +0800

    MIPS: Fix a compilation issue
    
    [ Upstream commit 109d587a4b4d7ccca2200ab1f808f43ae23e2585 ]
    
    arch/mips/include/asm/mach-rc32434/pci.h:377:
    cc1: error: result of ‘-117440512 << 16’ requires 44 bits to represent, but ‘int’ only has 32 bits [-Werror=shift-overflow=]
    
    All bits in KORINA_STAT are already at the correct position, so there is
    no additional shift needed.
    
    Signed-off-by: xurui <xurui@kylinos.cn>
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
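
The compiler diagnostic in the MIPS entry above can be reproduced with plain integer arithmetic. This is a sketch using Python's arbitrary-precision integers to count how many two's-complement bits the shifted constant would need; `signed_bits_needed` is a helper invented for this illustration, not part of any toolchain.

```python
def signed_bits_needed(value: int) -> int:
    """Smallest two's-complement width (including sign bit) that holds value."""
    if value >= 0:
        return value.bit_length() + 1
    # For negative values, the magnitude bits are those of the bitwise complement.
    return (~value).bit_length() + 1


constant = -117440512          # the value named in the error message
shifted = constant << 16       # what the extra shift would compute

bits_before = signed_bits_needed(constant)
bits_after = signed_bits_needed(shifted)
as_u32 = constant & 0xFFFFFFFF  # the same bit pattern viewed as unsigned 32-bit
```

Here `bits_before` is 28, which fits in a 32-bit `int`, while `bits_after` is 44, matching the "requires 44 bits" wording of the warning. Viewed as an unsigned 32-bit pattern the constant is 0xF9000000.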

 
nbd: use the correct block_device in nbd_bdev_reset [+ + +]
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Mar 30 07:29:03 2022 +0200

    nbd: use the correct block_device in nbd_bdev_reset
    
    [ Upstream commit 2a852a693f8839bb877fc731ffbc9ece3a9c16d7 ]
    
    The bdev parameter to ->ioctl contains the block device that the ioctl
    is called on, which can be the partition.  But the openers check in
    nbd_bdev_reset really needs to use the whole device, so switch to
    using that.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20220330052917.2566582-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stable-dep-of: e5cfefa97bcc ("block: fix scan partition for exclusively open device again")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
net/smc: fix fallback failed while sendmsg with fastopen [+ + +]
Author: D. Wythe <alibuda@linux.alibaba.com>
Date:   Tue Mar 7 11:23:46 2023 +0800

    net/smc: fix fallback failed while sendmsg with fastopen
    
    [ Upstream commit ce7ca794712f186da99719e8b4e97bd5ddbb04c3 ]
    
    The wrong state check terminates sendmsg prematurely, before it has
    determined whether the msg has unsupported options.
    
    For applications, the typical usage of MSG_FASTOPEN looks like:
    
    fd = socket(...)
    /* rather than connect */
    sendto(fd, data, len, MSG_FASTOPEN)
    
    Hence, we need to check the flag before the state check, because the
    sock state here is always SMC_INIT when an application tries
    MSG_FASTOPEN. Once we find unsupported options, fall back to TCP.
    
    Fixes: ee9dfbef02d1 ("net/smc: handle sockopts forcing fallback")
    Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
    Signed-off-by: Simon Horman <simon.horman@corigine.com>
    
    v2 -> v1: Optimize code style
    Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
net: caif: Fix use-after-free in cfusbl_device_notify() [+ + +]
Author: Shigeru Yoshida <syoshida@redhat.com>
Date:   Thu Mar 2 01:39:13 2023 +0900

    net: caif: Fix use-after-free in cfusbl_device_notify()
    
    [ Upstream commit 9781e98a97110f5e76999058368b4be76a788484 ]
    
    syzbot reported use-after-free in cfusbl_device_notify() [1].  This
    causes a stack trace like below:
    
    BUG: KASAN: use-after-free in cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138
    Read of size 8 at addr ffff88807ac4e6f0 by task kworker/u4:6/1214
    
    CPU: 0 PID: 1214 Comm: kworker/u4:6 Not tainted 5.19.0-rc3-syzkaller-00146-g92f20ff72066 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: netns cleanup_net
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
     print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
     print_report mm/kasan/report.c:429 [inline]
     kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
     cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138
     notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
     call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1945
     call_netdevice_notifiers_extack net/core/dev.c:1983 [inline]
     call_netdevice_notifiers net/core/dev.c:1997 [inline]
     netdev_wait_allrefs_any net/core/dev.c:10227 [inline]
     netdev_run_todo+0xbc0/0x10f0 net/core/dev.c:10341
     default_device_exit_batch+0x44e/0x590 net/core/dev.c:11334
     ops_exit_list+0x125/0x170 net/core/net_namespace.c:167
     cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594
     process_one_work+0x996/0x1610 kernel/workqueue.c:2289
     worker_thread+0x665/0x1080 kernel/workqueue.c:2436
     kthread+0x2e9/0x3a0 kernel/kthread.c:376
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
     </TASK>
    
    When unregistering a net device, unregister_netdevice_many_notify()
    sets the device's reg_state to NETREG_UNREGISTERING, calls notifiers
    with NETDEV_UNREGISTER, and adds the device to the todo list.
    
    Later on, devices in the todo list are processed by netdev_run_todo().
    netdev_run_todo() waits for the devices' reference counts to become 1
    while rebroadcasting the NETDEV_UNREGISTER notification.
    
    When cfusbl_device_notify() is called with NETDEV_UNREGISTER multiple
    times, the parent device might be freed.  This could cause UAF.
    Processing NETDEV_UNREGISTER multiple times also causes an imbalance of
    the reference count for the module.
    
    This patch fixes the issue by accepting only the first NETDEV_UNREGISTER
    notification.
    
    Fixes: 7ad65bf68d70 ("caif: Add support for CAIF over CDC NCM USB interface")
    CC: sjur.brandeland@stericsson.com <sjur.brandeland@stericsson.com>
    Reported-by: syzbot+b563d33852b893653a9e@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?id=c3bfd8e2450adab3bffe4d80821fbbced600407f [1]
    Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
    Link: https://lore.kernel.org/r/20230301163913.391304-1-syoshida@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
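
The reference-count imbalance described in the caif entry above can be modelled in a few lines. This is a toy model, not the driver code: the state constants are illustrative (only their order matters), and the guard shown, ignoring notifications once the device has moved past the unregistering state, is an assumption about how the first notification can be told apart from rebroadcasts.

```python
NETREG_UNREGISTERING = 2  # illustrative values; only their ordering is used
NETREG_UNREGISTERED = 3


def run_unregister(guarded: bool, rebroadcasts: int = 2) -> int:
    """Return the module reference count left after one device unregistration."""
    module_refs = 1  # reference taken when the link layer was set up

    def notify(reg_state: int) -> None:
        nonlocal module_refs
        if guarded and reg_state >= NETREG_UNREGISTERED:
            return  # repeated NETDEV_UNREGISTER: nothing left to release
        module_refs -= 1  # release the reference held for this device

    notify(NETREG_UNREGISTERING)      # first NETDEV_UNREGISTER
    for _ in range(rebroadcasts):
        notify(NETREG_UNREGISTERED)   # rebroadcasts while waiting for references
    return module_refs


balanced = run_unregister(guarded=True)     # reference dropped exactly once
unbalanced = run_unregister(guarded=False)  # dropped once per notification
```

With the guard the count returns to zero; without it every rebroadcast drops the count again, which is the imbalance the commit message mentions (in the kernel, the same repeated teardown is what touches the already freed parent device).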

net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC [+ + +]
Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Tue Mar 7 17:54:11 2023 +0200

    net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
    
    [ Upstream commit c8b8a3c601f2cfad25ab5ce5b04df700048aef6e ]
    
    The MT7530 switch from the MT7621 SoC has 2 ports which can be set up as
    internal: port 5 and 6. Arınç reports that the GMAC1 attached to port 5
    receives corrupted frames, unless port 6 (attached to GMAC0) has been
    brought up by the driver. This is true regardless of whether port 5 is
    used as a user port or as a CPU port (carrying DSA tags).
    
    Offline debugging (blind for me) which began in the linked thread showed
    experimentally that the configuration done by the driver for port 6
    contains a step which is needed by port 5 as well - the write to
    CORE_GSWPLL_GRP2 (note that I've no idea as to what it does, apart from
    the comment "Set core clock into 500Mhz"). Prints put by Arınç show that
    the reset value of CORE_GSWPLL_GRP2 is RG_GSWPLL_POSDIV_500M(1) |
    RG_GSWPLL_FBKDIV_500M(40) (0x128), both on the MCM MT7530 from the
    MT7621 SoC, as well as on the standalone MT7530 from MT7623NI Bananapi
    BPI-R2. Apparently, port 5 on the standalone MT7530 can work under both
    values of the register, while on the MT7621 SoC it cannot.
    
    The call path that triggers the register write is:
    
    mt753x_phylink_mac_config() for port 6
    -> mt753x_pad_setup()
       -> mt7530_pad_clk_setup()
    
    so this fully explains the behavior noticed by Arınç, that bringing port
    6 up is necessary.
    
    The simplest fix for the problem is to extract the register writes which
    are needed for both port 5 and 6 into a common mt7530_pll_setup()
    function, which is called at mt7530_setup() time, immediately after
    switch reset. We can argue that this mirrors the code layout introduced
    in mt7531_setup() by commit 42bc4fafe359 ("net: mt7531: only do PLL once
    after the reset"), in that the PLL setup has the exact same positioning,
    and further work to consolidate the separate setup() functions is not
    hindered.
    
    Testing confirms that:
    
    - the slight reordering of writes to MT7530_P6ECR and to
      CORE_GSWPLL_GRP1 / CORE_GSWPLL_GRP2 introduced by this change does not
      appear to cause problems for the operation of port 6 on MT7621 and on
      MT7623 (where port 5 also always worked)
    
    - packets sent through port 5 are not corrupted anymore, regardless of
      whether port 6 is enabled by phylink or not (or even present in the
      device tree)
    
    My algorithm for determining the Fixes: tag is as follows. Testing shows
    that some logic from mt7530_pad_clk_setup() is needed even for port 5.
    Prior to commit ca366d6c889b ("net: dsa: mt7530: Convert to PHYLINK
    API"), a call did exist for all phy_is_pseudo_fixed_link() ports - so
    port 5 included. That commit replaced it with a temporary "Port 5 is not
    supported!" comment, and the following commit 38f790a80560 ("net: dsa:
    mt7530: Add support for port 5") replaced that comment with a
    configuration procedure in mt7530_setup_port5() which was insufficient
    for port 5 to work. I'm laying the blame on the patch that claimed
    support for port 5, although one would have also needed the change from
    commit c3b8e07909db ("net: dsa: mt7530: setup core clock even in TRGMII
    mode") for the write to be performed completely independently from port
    6's configuration.
    
    Thanks go to Arınç for describing the problem, for debugging and for
    testing.
    
    Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
    Link: https://lore.kernel.org/netdev/f297c2c4-6e7c-57ac-2394-f6025d309b9d@arinc9.com/
    Fixes: 38f790a80560 ("net: dsa: mt7530: Add support for port 5")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230307155411.868573-1-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
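
The reset value quoted in the mt7530 entry above, RG_GSWPLL_POSDIV_500M(1) | RG_GSWPLL_FBKDIV_500M(40) = 0x128, can be unpacked as follows. The field positions are inferred from that single value (40 is 0x28 in the low byte, which leaves 1 at bit 8); the field widths and helper names are assumptions made for this sketch.

```python
POSDIV_SHIFT = 8      # inferred: 0x128 - 0x28 == 1 << 8
POSDIV_MASK = 0x3     # assumed 2-bit field
FBKDIV_SHIFT = 0
FBKDIV_MASK = 0xFF    # assumed 8-bit field


def pack_gswpll_grp2(posdiv: int, fbkdiv: int) -> int:
    """Combine the two divider fields into one register value."""
    return ((posdiv & POSDIV_MASK) << POSDIV_SHIFT) | (
        (fbkdiv & FBKDIV_MASK) << FBKDIV_SHIFT
    )


def unpack_gswpll_grp2(value: int) -> tuple[int, int]:
    """Split a register value back into (posdiv, fbkdiv)."""
    posdiv = (value >> POSDIV_SHIFT) & POSDIV_MASK
    fbkdiv = (value >> FBKDIV_SHIFT) & FBKDIV_MASK
    return posdiv, fbkdiv


reset_value = pack_gswpll_grp2(1, 40)
```

Packing (1, 40) gives 0x128, and unpacking 0x128 gives (1, 40) back, consistent with the value printed during debugging.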

net: ethernet: mtk_eth_soc: fix RX data corruption issue [+ + +]
Author: Daniel Golle <daniel@makrotopia.org>
Date:   Sat Mar 4 13:43:20 2023 +0000

    net: ethernet: mtk_eth_soc: fix RX data corruption issue
    
    [ Upstream commit 193250ace270fecd586dd2d0dfbd9cbd2ade977f ]
    
    Fix data corruption issue with SerDes connected PHYs operating at 1.25
    Gbps speed where we could previously observe about 30% packet loss while
    the bad packet counter was increasing.
    
    As almost all boards with MediaTek MT7622 or MT7986 use either the MT7531
    switch IC operating at 3.125Gbps SerDes rate or single-port PHYs using
    rate-adaptation to 2500Base-X mode, this issue only got exposed now when
    we started trying to use SFP modules operating with 1.25 Gbps with the
    BananaPi R3 board.
    
    The fix is to set bit 12, which disables the RX FIFO clear function,
    when setting up MAC MCR. The MediaTek SDK made the same change, stating:
    "If without this patch, kernel might receive invalid packets that are
    corrupted by GMAC."[1]
    
    [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63
    
    Fixes: 42c03844e93d ("net-next: mediatek: add support for MediaTek MT7622 SoC")
    Tested-by: Bjørn Mork <bjorn@mork.no>
    Signed-off-by: Daniel Golle <daniel@makrotopia.org>
    Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Link: https://lore.kernel.org/r/138da2735f92c8b6f8578ec2e5a794ee515b665f.1677937317.git.daniel@makrotopia.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: lan78xx: fix accessing the LAN7800's internal phy specific registers from the MAC driver [+ + +]
Author: Yuiko Oshino <yuiko.oshino@microchip.com>
Date:   Wed Mar 1 08:43:07 2023 -0700

    net: lan78xx: fix accessing the LAN7800's internal phy specific registers from the MAC driver
    
    [ Upstream commit e57cf3639c323eeed05d3725fd82f91b349adca8 ]
    
    Move the LAN7800 internal phy (phy ID  0x0007c132) specific register
    accesses to the phy driver (microchip.c).
    
    Fix the error reported by Enguerrand de Ribaucourt in December 2022,
    "Some operations during the cable switch workaround modify the register
    LAN88XX_INT_MASK of the PHY. However, this register is specific to the
    LAN8835 PHY. For instance, if a DP8322I PHY is connected to the LAN7801,
    that register (0x19), corresponds to the LED and MAC address
    configuration, resulting in inappropriate behavior."
    
    I did not test with the DP8322I PHY, but I tested with an EVB-LAN7800
    with the internal PHY.
    
    Fixes: 14437e3fa284 ("lan78xx: workaround of forced 100 Full/Half duplex mode error")
    Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Link: https://lore.kernel.org/r/20230301154307.30438-1-yuiko.oshino@microchip.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: phy: smsc: Cache interrupt mask [+ + +]
Author: Lukas Wunner <lukas@wunner.de>
Date:   Thu May 12 10:42:06 2022 +0200

    net: phy: smsc: Cache interrupt mask
    
    [ Upstream commit 7e8b617eb93f9fcaedac02cd19edcad31c767386 ]
    
    Cache the interrupt mask to avoid re-reading it from the PHY upon every
    interrupt.
    
    This will simplify a subsequent commit which detects hot-removal in the
    interrupt handler and bails out.
    
    Analyzing and debugging PHY transactions also becomes simpler if such
    redundant reads are avoided.
    
    Last but not least, interrupt overhead and latency are slightly improved.
    
    Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
    Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
    Signed-off-by: Lukas Wunner <lukas@wunner.de>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 58aac3a2ef41 ("net: phy: smsc: fix link up detection in forced irq mode")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: phy: smsc: fix link up detection in forced irq mode [+ + +]
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Sat Mar 4 11:52:44 2023 +0100

    net: phy: smsc: fix link up detection in forced irq mode
    
    [ Upstream commit 58aac3a2ef414fea6d7fdf823ea177744a087d13 ]
    
    Currently link up can't be detected in forced mode if polling
    isn't used. The only link-up interrupt source we have is aneg
    complete, which isn't applicable in forced mode. Therefore we
    have to use energy-on as the link-up indicator.
    
    Fixes: 7365494550f6 ("net: phy: smsc: skip ENERGYON interrupt if disabled")
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: phylib: get rid of unnecessary locking [+ + +]
Author: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Date:   Fri Mar 3 16:37:54 2023 +0000

    net: phylib: get rid of unnecessary locking
    
    [ Upstream commit f4b47a2e9463950df3e7c8b70e017877c1d4eb11 ]
    
    The locking in phy_probe() and phy_remove() does very little to prevent
    any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA
    warnings. Remove it.
    
    ======================================================
    WARNING: possible circular locking dependency detected
    6.2.0-dirty #1108 Tainted: G        W   E
    ------------------------------------------------------
    ip/415 is trying to acquire lock:
    ffff5c268f81ef50 (&dev->lock){+.+.}-{3:3}, at: phy_attach_direct+0x17c/0x3a0 [libphy]
    
    but task is already holding lock:
    ffffaef6496cb518 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x154/0x560
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #1 (rtnl_mutex){+.+.}-{3:3}:
           __lock_acquire+0x35c/0x6c0
           lock_acquire.part.0+0xcc/0x220
           lock_acquire+0x68/0x84
           __mutex_lock+0x8c/0x414
           mutex_lock_nested+0x34/0x40
           rtnl_lock+0x24/0x30
           sfp_bus_add_upstream+0x34/0x150
           phy_sfp_probe+0x4c/0x94 [libphy]
           mv3310_probe+0x148/0x184 [marvell10g]
           phy_probe+0x8c/0x200 [libphy]
           call_driver_probe+0xbc/0x15c
           really_probe+0xc0/0x320
           __driver_probe_device+0x84/0x120
           driver_probe_device+0x44/0x120
           __device_attach_driver+0xc4/0x160
           bus_for_each_drv+0x80/0xe0
           __device_attach+0xb0/0x1f0
           device_initial_probe+0x1c/0x2c
           bus_probe_device+0xa4/0xb0
           device_add+0x360/0x53c
           phy_device_register+0x60/0xa4 [libphy]
           fwnode_mdiobus_phy_device_register+0xc0/0x190 [fwnode_mdio]
           fwnode_mdiobus_register_phy+0x160/0xd80 [fwnode_mdio]
           of_mdiobus_register+0x140/0x340 [of_mdio]
           orion_mdio_probe+0x298/0x3c0 [mvmdio]
           platform_probe+0x70/0xe0
           call_driver_probe+0x34/0x15c
           really_probe+0xc0/0x320
           __driver_probe_device+0x84/0x120
           driver_probe_device+0x44/0x120
           __driver_attach+0x104/0x210
           bus_for_each_dev+0x78/0xdc
           driver_attach+0x2c/0x3c
           bus_add_driver+0x184/0x240
           driver_register+0x80/0x13c
           __platform_driver_register+0x30/0x3c
           xt_compat_calc_jump+0x28/0xa4 [x_tables]
           do_one_initcall+0x50/0x1b0
           do_init_module+0x50/0x1fc
           load_module+0x684/0x744
           __do_sys_finit_module+0xc4/0x140
           __arm64_sys_finit_module+0x28/0x34
           invoke_syscall+0x50/0x120
           el0_svc_common.constprop.0+0x6c/0x1b0
           do_el0_svc+0x34/0x44
           el0_svc+0x48/0xf0
           el0t_64_sync_handler+0xb8/0xc0
           el0t_64_sync+0x1a0/0x1a4
    
    -> #0 (&dev->lock){+.+.}-{3:3}:
           check_prev_add+0xb4/0xc80
           validate_chain+0x414/0x47c
           __lock_acquire+0x35c/0x6c0
           lock_acquire.part.0+0xcc/0x220
           lock_acquire+0x68/0x84
           __mutex_lock+0x8c/0x414
           mutex_lock_nested+0x34/0x40
           phy_attach_direct+0x17c/0x3a0 [libphy]
           phylink_fwnode_phy_connect.part.0+0x70/0xe4 [phylink]
           phylink_fwnode_phy_connect+0x48/0x60 [phylink]
           mvpp2_open+0xec/0x2e0 [mvpp2]
           __dev_open+0x104/0x214
           __dev_change_flags+0x1d4/0x254
           dev_change_flags+0x2c/0x7c
           do_setlink+0x254/0xa50
           __rtnl_newlink+0x430/0x514
           rtnl_newlink+0x58/0x8c
           rtnetlink_rcv_msg+0x17c/0x560
           netlink_rcv_skb+0x64/0x150
           rtnetlink_rcv+0x20/0x30
           netlink_unicast+0x1d4/0x2b4
           netlink_sendmsg+0x1a4/0x400
           ____sys_sendmsg+0x228/0x290
           ___sys_sendmsg+0x88/0xec
           __sys_sendmsg+0x70/0xd0
           __arm64_sys_sendmsg+0x2c/0x40
           invoke_syscall+0x50/0x120
           el0_svc_common.constprop.0+0x6c/0x1b0
           do_el0_svc+0x34/0x44
           el0_svc+0x48/0xf0
           el0t_64_sync_handler+0xb8/0xc0
           el0t_64_sync+0x1a0/0x1a4
    
    other info that might help us debug this:
    
     Possible unsafe locking scenario:
    
           CPU0                    CPU1
           ----                    ----
      lock(rtnl_mutex);
                                   lock(&dev->lock);
                                   lock(rtnl_mutex);
      lock(&dev->lock);
    
     *** DEADLOCK ***
    
    Fixes: 298e54fa810e ("net: phy: add core phylib sfp support")
    Reported-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
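
The ABBA report in the phylib entry above boils down to a cycle in the "held while acquiring" graph. The sketch below is a simplified illustration of that idea, not lockdep's algorithm: each edge records that one lock was held while another was taken, and an inversion exists if any acquired lock can lead back to the lock held at that point. The function name is hypothetical.

```python
def has_lock_inversion(edges: list[tuple[str, str]]) -> bool:
    """edges are (held, acquired) pairs; True if they form a dependency cycle."""
    graph: dict[str, set[str]] = {}
    for held, acquired in edges:
        graph.setdefault(held, set()).add(acquired)

    def reachable(start: str, target: str) -> bool:
        # Iterative depth-first search over the lock dependency graph.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
        return False

    return any(reachable(acquired, held) for held, acquired in edges)


# Chain #1: phy_probe() held dev->lock while sfp_bus_add_upstream() took rtnl_mutex.
# Chain #0: rtnetlink held rtnl_mutex while phy_attach_direct() took dev->lock.
before_fix = [("dev->lock", "rtnl_mutex"), ("rtnl_mutex", "dev->lock")]
# With the locking removed from phy_probe(), only the second ordering remains.
after_fix = [("rtnl_mutex", "dev->lock")]
```

`has_lock_inversion(before_fix)` reports a cycle, while `has_lock_inversion(after_fix)` does not, which mirrors why dropping the lock in phy_probe() and phy_remove() silences the warning.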

net: stmmac: add to set device wake up flag when stmmac init phy [+ + +]
Author: Rongguang Wei <weirongguang@kylinos.cn>
Date:   Thu Mar 2 14:21:43 2023 +0800

    net: stmmac: add to set device wake up flag when stmmac init phy
    
    [ Upstream commit a9334b702a03b693f54ebd3b98f67bf722b74870 ]
    
    When the MAC does not support PMT, the driver checks the PHY's WoL
    capability and sets the device wakeup capability in stmmac_init_phy().
    If WoL is then enabled through ethtool, the driver sets the device
    wakeup flag and device_may_wakeup() returns true.
    
    But if the PHY's WoL capability is enabled directly by some other
    means, such as the BIOS, the driver does not know about it and does not
    set the device wakeup flag. phy_suspend may then fail like this:
    
    [   32.409063] PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x50 returns -16
    [   32.409065] PM: Device stmmac-1:00 failed to suspend: error -16
    [   32.409067] PM: Some devices failed to suspend, or early wake event detected
    
    Set the device wakeup enable flag according to the result of the PHY's
    get_wol function to fix the error in this scenario.
    
    v2: add a Fixes tag.
    
    Fixes: 1d8e5b0f3f2c ("net: stmmac: Support WOL with phy")
    Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
netfilter: conntrack: adopt safer max chain length [+ + +]
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Mar 7 05:22:54 2023 +0000

    netfilter: conntrack: adopt safer max chain length
    
    [ Upstream commit c77737b736ceb50fdf150434347dbd81ec76dbb1 ]
    
    Customers using GKE 1.25 and 1.26 are facing conntrack issues
    root caused to commit c9c3b6811f74 ("netfilter: conntrack: make
    max chain length random").
    
    Even if we assume Uniform Hashing, a bucket often reaches 8 chained
    items while the load factor of the hash table is smaller than 0.5.
    
    With a limit of 16, we reach load factors of 3.
    With a limit of 32, we reach load factors of 11.
    With a limit of 40, we reach load factors of 15.
    With a limit of 50, we reach load factors of 24.
    
    This patch changes MIN_CHAINLEN to 50, to minimize risks.
    
    Ideally, we could in the future add a cushion based on expected
    load factor (2 * nf_conntrack_max / nf_conntrack_buckets),
    because some setups might expect unusual values.
    
    Fixes: c9c3b6811f74 ("netfilter: conntrack: make max chain length random")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
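
The relationship between chain-length limit and load factor in the conntrack entry above can be approximated with a Poisson model of uniform hashing. This is a rough sketch under that modelling assumption, not the analysis used by the commit author: `poisson_tail` estimates the probability that a given bucket holds at least `limit` entries at a given load factor, and `expected_load_factor` is the formula quoted in the commit message. Both function names are invented here.

```python
import math


def expected_load_factor(conntrack_max: int, buckets: int) -> float:
    """Load factor from the commit message: 2 * nf_conntrack_max / nf_conntrack_buckets."""
    return 2 * conntrack_max / buckets


def poisson_tail(load_factor: float, limit: int) -> float:
    """P(X >= limit) for X ~ Poisson(load_factor), summed term by term."""
    # First term P(X == limit), computed in log space to avoid overflow.
    term = math.exp(
        -load_factor + limit * math.log(load_factor) - math.lgamma(limit + 1)
    )
    total = 0.0
    for i in range(limit, limit + 500):
        total += term
        term *= load_factor / (i + 1)  # ratio P(X == i + 1) / P(X == i)
    return total


# (limit, load factor) pairs taken from the commit message.
pairs = [(8, 0.5), (16, 3.0), (32, 11.0), (40, 15.0), (50, 24.0)]
tails = {limit: poisson_tail(load, limit) for limit, load in pairs}
```

Under this model the first two pairs, a limit of 8 at load factor 0.5 and a limit of 16 at load factor 3, both give per-bucket probabilities on the order of 1e-7, so raising the limit is what allows a higher load factor at comparable risk.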

netfilter: ctnetlink: revert to dumping mark regardless of event type [+ + +]
Author: Ivan Delalande <colona@arista.com>
Date:   Thu Mar 2 17:48:31 2023 -0800

    netfilter: ctnetlink: revert to dumping mark regardless of event type
    
    [ Upstream commit 9f7dd42f0db1dc6915a52d4a8a96ca18dd8cc34e ]
    
    It seems that change was unintentional, we have userspace code that
    needs the mark while listening for events like REPLY, DESTROY, etc.
    Also include 0-marks in requested dumps, as they were before that fix.
    
    Fixes: 1feeae071507 ("netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark")
    Signed-off-by: Ivan Delalande <colona@arista.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: tproxy: fix deadlock due to missing BH disable [+ + +]
Author: Florian Westphal <fw@strlen.de>
Date:   Fri Mar 3 10:58:56 2023 +0100

    netfilter: tproxy: fix deadlock due to missing BH disable
    
    [ Upstream commit 4a02426787bf024dafdb79b362285ee325de3f5e ]
    
    The xtables packet traverser performs an unconditional local_bh_disable(),
    but the nf_tables evaluation loop does not.
    
    Functions that are called from either xtables or nftables must assume
    that they can be called in process context.
    
    inet_twsk_deschedule_put() assumes that no softirq interrupt can occur.
    If tproxy is used from nf_tables, it's possible that we'll deadlock
    trying to acquire a lock already held in process context.
    
    Add a small helper that takes care of this and use it.
    
    Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/
    Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support")
    Reported-and-tested-by: Major Dávid <major.david@balasys.hu>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
nfc: change order inside nfc_se_io error path [+ + +]
Author: Fedor Pchelkin <pchelkin@ispras.ru>
Date:   Tue Mar 7 00:26:50 2023 +0300

    nfc: change order inside nfc_se_io error path
    
    commit 7d834b4d1ab66c48e8c0810fdeadaabb80fa2c81 upstream.
    
    cb_context should be freed on the error path in nfc_se_io as stated by
    commit 25ff6f8a5a3b ("nfc: fix memory leak of se_io context in
    nfc_genl_se_io").
    
    Make the error path in nfc_se_io unwind everything in reverse order, i.e.
    free the cb_context after unlocking the device.
    
    Suggested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
    Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Link: https://lore.kernel.org/r/20230306212650.230322-1-pchelkin@ispras.ru
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: fdp: add null check of devm_kmalloc_array in fdp_nci_i2c_read_device_properties [+ + +]
Author: Kang Chen <void0red@gmail.com>
Date:   Mon Feb 27 17:30:37 2023 +0800

    nfc: fdp: add null check of devm_kmalloc_array in fdp_nci_i2c_read_device_properties
    
    [ Upstream commit 11f180a5d62a51b484e9648f9b310e1bd50b1a57 ]
    
    devm_kmalloc_array may fail, in which case *fw_vsc_cfg is NULL and
    causes an out-of-bounds write in device_property_read_u8_array later.
    
    Fixes: a06347c04c13 ("NFC: Add Intel Fields Peak NFC solution driver")
    Signed-off-by: Kang Chen <void0red@gmail.com>
    Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230227093037.907654-1-void0red@gmail.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
octeontx2-af: Unlock contexts in the queue context cache in case of fault detection [+ + +]
Author: Suman Ghosh <sumang@marvell.com>
Date:   Tue Mar 7 16:19:08 2023 +0530

    octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
    
    [ Upstream commit ea9dd2e5c6d12c8b65ce7514c8359a70eeaa0e70 ]
    
    NDC caches the contexts of frequently used queues (Rx and Tx queues).
    Due to a HW erratum, when NDC detects a fault/poison while
    accessing contexts it could go into an illegal state where a cache
    line could get locked forever. To make sure all cache lines in NDC
    are available for optimum performance, upon fault/lockerror/poison
    errors scan through all cache lines in NDC and clear the lock bit.
    
    Fixes: 4a3581cd5995 ("octeontx2-af: NPA AQ instruction enqueue support")
    Signed-off-by: Suman Ghosh <sumang@marvell.com>
    Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
    Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
PCI: Add SolidRun vendor ID [+ + +]
Author: Alvaro Karsz <alvaro.karsz@solid-run.com>
Date:   Tue Jan 10 18:56:36 2023 +0200

    PCI: Add SolidRun vendor ID
    
    [ Upstream commit db6c4dee4c104f50ed163af71c53bfdb878a8318 ]
    
    Add SolidRun vendor ID to pci_ids.h
    
    The vendor ID is used in 2 different source files, the SNET vDPA driver
    and PCI quirks.
    
    Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Message-Id: <20230110165638.123745-2-alvaro.karsz@solid-run.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
perf inject: Fix --buildid-all not to eat up MMAP2 [+ + +]
Author: Namhyung Kim <namhyung@kernel.org>
Date:   Wed Feb 22 23:01:55 2023 -0800

    perf inject: Fix --buildid-all not to eat up MMAP2
    
    commit ce9f1c05d2edfa6cdf2c1a510495d333e11810a8 upstream.
    
    When MMAP2 has the PERF_RECORD_MISC_MMAP_BUILD_ID flag, it means the
    record already has the build-id info.  It therefore marks the DSO as
    hit, so that the same DSO is skipped if a later record for it happens
    to lack the build-id.
    
    But it failed to copy the MMAP2 record itself, so samples for those
    regions could not be symbolized.
    
    For example, the following generates 249 MMAP2 events.
    
      $ perf record --buildid-mmap -o- true | perf report --stat -i- | grep MMAP2
               MMAP2 events:        249  (86.8%)
    
    Adding perf inject should not change the number of events, like this:
    
      $ perf record --buildid-mmap -o- true | perf inject -b | \
      > perf report --stat -i- | grep MMAP2
               MMAP2 events:        249  (86.5%)
    
    But when --buildid-all is used, it eats most of the MMAP2 events.
    
      $ perf record --buildid-mmap -o- true | perf inject -b --buildid-all | \
      > perf report --stat -i- | grep MMAP2
               MMAP2 events:          1  ( 2.5%)
    
    With this patch, it shows the original number now.
    
      $ perf record --buildid-mmap -o- true | perf inject -b --buildid-all | \
      > perf report --stat -i- | grep MMAP2
               MMAP2 events:        249  (86.5%)
    
    Committer testing:
    
    Before:
    
      $ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf inject -b | perf report --stat -i- | grep MMAP2
               MMAP2 events:         58  (36.2%)
      $ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf report --stat -i- | grep MMAP2
               MMAP2 events:         58  (36.2%)
      $ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf inject -b --buildid-all | perf report --stat -i- | grep MMAP2
               MMAP2 events:          2  ( 1.9%)
      $
    
    After:
    
      $ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf inject -b | perf report --stat -i- | grep MMAP2
               MMAP2 events:         58  (29.3%)
      $ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf report --stat -i- | grep MMAP2
               MMAP2 events:         58  (34.3%)
      $ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf inject -b --buildid-all | perf report --stat -i- | grep MMAP2
               MMAP2 events:         58  (38.4%)
      $
    
    Fixes: f7fc0d1c915a74ff ("perf inject: Do not inject BUILD_ID record if MMAP2 has it")
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20230223070155.54251-1-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
perf stat: Fix counting when initial delay configured [+ + +]
Author: Changbin Du <changbin.du@huawei.com>
Date:   Thu Mar 2 11:11:44 2023 +0800

    perf stat: Fix counting when initial delay configured
    
    [ Upstream commit 25f69c69bc3ca8c781a94473f28d443d745768e3 ]
    
    When creating counters with initial delay configured, the enable_on_exec
    field is not set. So we need to enable the counters later. The problem
    is, when a workload is specified the target__none() is true. So we also
    need to check stat_config.initial_delay.
    
    In this change, we add a new field 'initial_delay' for struct target
    which could be shared by other subcommands. And define
    target__enable_on_exec() which returns whether enable_on_exec should be
    set in the normal case.
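    
    The decision described above can be sketched as a small truth table.
    This is a toy Python model, not the perf source: the Target class, its
    fields and the exact condition are assumptions drawn from the
    description in this message.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Toy stand-in for perf's struct target (fields are illustrative)."""
    has_pid_or_cpu: bool = False  # True when attaching to an existing task/CPU
    initial_delay: int = 0        # value given via -D, 0 if unset


def target_none(target: Target) -> bool:
    # No existing task or CPU selected, i.e. perf forks the workload itself.
    return not target.has_pid_or_cpu


def target_enable_on_exec(target: Target) -> bool:
    # Assumed rule: let the kernel enable the counters at exec() only when
    # perf forks the workload and no initial delay was requested.
    return target_none(target) and target.initial_delay == 0


def must_enable_manually(target: Target) -> bool:
    # In every other case the tool has to enable the counters itself later.
    return not target_enable_on_exec(target)


if __name__ == "__main__":
    for t in (Target(), Target(initial_delay=100), Target(has_pid_or_cpu=True)):
        print(t, "manual enable needed:", must_enable_manually(t))
```

    The middle case (workload given, delay configured) is the one this
    commit addresses: the counters are not armed at exec(), so they have to
    be enabled by hand once the delay expires.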
    
    Before this fix the event is not counted:
    
      $ ./perf stat -e instructions -D 100 sleep 2
      Events disabled
      Events enabled
    
       Performance counter stats for 'sleep 2':
    
           <not counted>      instructions
    
             1.901661124 seconds time elapsed
    
             0.001602000 seconds user
             0.000000000 seconds sys
    
    After fix it works:
    
      $ ./perf stat -e instructions -D 100 sleep 2
      Events disabled
      Events enabled
    
       Performance counter stats for 'sleep 2':
    
                 404,214      instructions
    
             1.901743475 seconds time elapsed
    
             0.001617000 seconds user
             0.000000000 seconds sys
    
    Fixes: c587e77e100fa40e ("perf stat: Do not delay the workload with --delay")
    Signed-off-by: Changbin Du <changbin.du@huawei.com>
    Acked-by: Namhyung Kim <namhyung@kernel.org>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Hui Wang <hw.huiwang@huawei.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20230302031146.2801588-2-changbin.du@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
platform: x86: MLX_PLATFORM: select REGMAP instead of depending on it [+ + +]
Author: Randy Dunlap <rdunlap@infradead.org>
Date:   Sat Feb 25 21:39:51 2023 -0800

    platform: x86: MLX_PLATFORM: select REGMAP instead of depending on it
    
    [ Upstream commit 7e7e1541c91615e9950d0b96bcd1806d297e970e ]
    
    REGMAP is a hidden (not user visible) symbol. Users cannot set it
    directly through "make *config", so drivers should select it instead of
    depending on it if they need it.
    
    Consistently using "select" or "depends on" can also help reduce
    Kconfig circular dependency issues.
    
    Therefore, change the use of "depends on REGMAP" to "select REGMAP".
    
    Fixes: ef0f62264b2a ("platform/x86: mlx-platform: Add physical bus number auto detection")
    Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
    Cc: Vadim Pasternak <vadimp@mellanox.com>
    Cc: Darren Hart <dvhart@infradead.org>
    Cc: Hans de Goede <hdegoede@redhat.com>
    Cc: Mark Gross <markgross@kernel.org>
    Cc: platform-driver-x86@vger.kernel.org
    Link: https://lore.kernel.org/r/20230226053953.4681-7-rdunlap@infradead.org
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
powerpc/iommu: fix memory leak with using debugfs_lookup() [+ + +]
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Thu Feb 2 15:19:19 2023 +0100

    powerpc/iommu: fix memory leak with using debugfs_lookup()
    
    [ Upstream commit b505063910c134778202dfad9332dfcecb76bab3 ]
    
    When calling debugfs_lookup() the result must have dput() called on it,
    otherwise the memory will leak over time.  To make things simpler, just
    call debugfs_lookup_and_remove() instead which handles all of the logic
    at once.
    
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20230202141919.2298821-1-gregkh@linuxfoundation.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
powerpc/kcsan: Exclude udelay to prevent recursive instrumentation [+ + +]
Author: Rohan McLure <rmclure@linux.ibm.com>
Date:   Mon Feb 6 13:17:58 2023 +1100

    powerpc/kcsan: Exclude udelay to prevent recursive instrumentation
    
    [ Upstream commit 2a7ce82dc46c591c9244057d89a6591c9639b9b9 ]
    
    In order for KCSAN to increase its likelihood of observing a data race,
    it sets a watchpoint on memory accesses and stalls, allowing for
    detection of conflicting accesses by other kernel threads or interrupts.
    
    Stalls are implemented by injecting a call to udelay in instrumented code.
    To prevent recursive instrumentation, exclude udelay from being instrumented.
    
    Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20230206021801.105268-3-rmclure@linux.ibm.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT [+ + +]
Author: Michael Ellerman <mpe@ellerman.id.au>
Date:   Wed Mar 1 19:03:49 2023 -0700

    powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT
    
    commit 4b9880dbf3bdba3a7c56445137c3d0e30aaa0a40 upstream.
    
    The powerpc linker script explicitly includes .exit.text, because
    otherwise the link fails due to references from __bug_table and
    __ex_table. The code is freed (discarded) at runtime along with
    .init.text and data.
    
    That has worked in the past despite powerpc not defining
    RUNTIME_DISCARD_EXIT because DISCARDS appears late in the powerpc linker
    script (line 410), and the explicit inclusion of .exit.text
    earlier (line 280) supersedes the discard.
    
    However commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and
    riscv") introduced an earlier use of DISCARD as part of the RO_DATA
    macro (line 136). With binutils < 2.36 that causes the DISCARD
    directives later in the script to be applied earlier [1], causing
    .exit.text to actually be discarded at link time, leading to build
    errors:
    
      '.exit.text' referenced in section '__bug_table' of crypto/algboss.o: defined in
      discarded section '.exit.text' of crypto/algboss.o
      '.exit.text' referenced in section '__ex_table' of drivers/nvdimm/core.o: defined in
      discarded section '.exit.text' of drivers/nvdimm/core.o
    
    Fix it by defining RUNTIME_DISCARD_EXIT, which causes the generic
    DISCARDS macro to not include .exit.text at all.
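    
    The ordering effect can be illustrated with a toy model in Python. It
    is only a sketch under stated assumptions: a "script" is an ordered
    list of output section statements, the first matching statement wins,
    and older linkers are assumed to fold every later /DISCARD/ statement
    into the first one. The section name ".placeholder.note" is
    hypothetical and stands in for whatever the early discard lists.

```python
from fnmatch import fnmatch

DISCARD = "/DISCARD/"


def assign(script, section, merge_discards):
    """Return the output section a toy linker picks for an input section.

    script is an ordered list of (output_name, [patterns]); the first
    statement with a matching pattern wins. With merge_discards=True every
    later /DISCARD/ statement is folded into the first one (the assumed
    behaviour of binutils < 2.36).
    """
    statements = [(name, list(patterns)) for name, patterns in script]
    if merge_discards:
        first = next((s for s in statements if s[0] == DISCARD), None)
        merged = []
        for stmt in statements:
            if stmt[0] == DISCARD and stmt is not first:
                first[1].extend(stmt[1])  # later discards move up
            else:
                merged.append(stmt)
        statements = merged
    for name, patterns in statements:
        if any(fnmatch(section, p) for p in patterns):
            return name
    return None


# An early discard, the explicit .exit.text output section, then the
# generic discards, which still list .exit.text before the fix.
before_fix = [
    (DISCARD, [".placeholder.note"]),
    (".exit.text", [".exit.text"]),
    (DISCARD, [".exit.text", ".discard"]),
]
# With RUNTIME_DISCARD_EXIT defined, the generic discards omit .exit.text.
after_fix = [
    (DISCARD, [".placeholder.note"]),
    (".exit.text", [".exit.text"]),
    (DISCARD, [".discard"]),
]

if __name__ == "__main__":
    print(assign(before_fix, ".exit.text", merge_discards=False))
    print(assign(before_fix, ".exit.text", merge_discards=True))
    print(assign(after_fix, ".exit.text", merge_discards=True))
```

    In this model the section is kept when the discards stay where they
    were written, is dropped once they are merged upward, and is kept
    again when .exit.text no longer appears in any discard list.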
    
    1: https://lore.kernel.org/lkml/87fscp2v7k.fsf@igel.home/
    
    Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20230105132349.384666-1-mpe@ellerman.id.au
    Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/vmlinux.lds: Don't discard .rela* for relocatable builds [+ + +]
Author: Michael Ellerman <mpe@ellerman.id.au>
Date:   Wed Mar 1 19:03:50 2023 -0700

    powerpc/vmlinux.lds: Don't discard .rela* for relocatable builds
    
    commit 07b050f9290ee012a407a0f64151db902a1520f5 upstream.
    
    Relocatable kernels must not discard relocations, they need to be
    processed at runtime. As such they are included for CONFIG_RELOCATABLE
    builds in the powerpc linker script (line 340).
    
    However they are also unconditionally discarded later in the
    script (line 414). Previously that worked because the earlier inclusion
    superseded the discard.
    
    However commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and
    riscv") introduced an earlier use of DISCARD as part of the RO_DATA
    macro (line 137). With binutils < 2.36 that causes the DISCARD
    directives later in the script to be applied earlier, causing .rela* to
    actually be discarded at link time, leading to build warnings and a
    kernel that doesn't boot:
    
      ld: warning: discarding dynamic section .rela.init.rodata
    
    Fix it by conditionally discarding .rela* only when CONFIG_RELOCATABLE
    is disabled.
    
    Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    
    Link: https://lore.kernel.org/r/20230105132349.384666-2-mpe@ellerman.id.au
    Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
powerpc: dts: t1040rdb: fix compatible string for Rev A boards [+ + +]
Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Fri Feb 24 17:59:39 2023 +0200

    powerpc: dts: t1040rdb: fix compatible string for Rev A boards
    
    [ Upstream commit ae44f1c9d1fc54aeceb335fedb1e73b2c3ee4561 ]
    
    It looks like U-Boot fails to start the kernel properly when the
    compatible string of the board isn't fsl,T1040RDB, so stop overriding it
    from the rev-a.dts.
    
    Fixes: 5ebb74749202 ("powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
regulator: core: Fix off-on-delay-us for always-on/boot-on regulators [+ + +]
Author: Christian Kohlschütter <christian@kohlschutter.com>
Date:   Tue Jul 19 16:02:00 2022 +0200

    regulator: core: Fix off-on-delay-us for always-on/boot-on regulators
    
    [ Upstream commit 218320fec29430438016f88dd4fbebfa1b95ad8d ]
    
    Regulators marked with "regulator-always-on" or "regulator-boot-on"
    as well as an "off-on-delay-us", may run into cycling issues that are
    hard to detect.
    
    This is caused by the "last_off" state not being initialized in this
    case.
    
    Fix the "last_off" initialization by setting it to the current kernel
    time upon initialization, regardless of always_on/boot_on state.
    
    Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
    Link: https://lore.kernel.org/r/FAFD5B39-E9C4-47C7-ACF1-2A04CD59758D@kohlschutter.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Stable-dep-of: 80d2c29e09e6 ("regulator: core: Use ktime_get_boottime() to determine how long a regulator was off")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

regulator: core: Use ktime_get_boottime() to determine how long a regulator was off [+ + +]
Author: Matthias Kaehlcke <mka@chromium.org>
Date:   Thu Feb 23 00:33:30 2023 +0000

    regulator: core: Use ktime_get_boottime() to determine how long a regulator was off
    
    [ Upstream commit 80d2c29e09e663761c2778167a625b25ffe01b6f ]
    
    For regulators with 'off-on-delay-us' the regulator framework currently
    uses ktime_get() to determine how long the regulator has been off
    before re-enabling it (after a delay if needed). A problem with using
    ktime_get() is that it doesn't account for the time the system is
    suspended. As a result a regulator with a longer 'off-on-delay' (e.g.
    500ms) that was switched off during suspend might still incur a
    delay on resume before it is re-enabled, even though the regulator
    might have been off for hours. ktime_get_boottime() accounts for
    suspend time, so use it instead of ktime_get().
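    
    The arithmetic behind the problem can be sketched in a few lines of
    Python. This is an illustration with simulated timestamps, not the
    regulator core: remaining_delay_us() and the numbers below are
    hypothetical and chosen only to mirror the 500ms example above.

```python
def remaining_delay_us(last_off_us, now_us, off_on_delay_us):
    """Microseconds still to wait before the regulator may be re-enabled."""
    elapsed = now_us - last_off_us
    return max(0, off_on_delay_us - elapsed)


# Simulated scenario: the regulator is switched off at t=0, the system
# suspends for one hour and is awake for only 10 ms in total.
OFF_ON_DELAY_US = 500_000
awake_us = 10_000
suspended_us = 3_600 * 1_000_000

# A clock that stops during suspend only sees the 10 ms spent awake.
wait_without_suspend_time = remaining_delay_us(0, awake_us, OFF_ON_DELAY_US)
# A clock that keeps counting during suspend sees the full hour as well.
wait_with_suspend_time = remaining_delay_us(
    0, awake_us + suspended_us, OFF_ON_DELAY_US
)

if __name__ == "__main__":
    print(wait_without_suspend_time, wait_with_suspend_time)
```

    With the first clock almost the whole delay is still applied on
    resume, while the second clock shows that no wait is needed. In user
    space the same distinction exists between CLOCK_MONOTONIC and
    CLOCK_BOOTTIME.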
    
    Fixes: a8ce7bd89689 ("regulator: core: Fix off_on_delay handling")
    Cc: stable@vger.kernel.org    # 5.13+
    Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
    Reviewed-by: Stephen Boyd <swboyd@chromium.org>
    Link: https://lore.kernel.org/r/20230223003301.v2.1.I9719661b8eb0a73b8c416f9c26cf5bd8c0563f99@changeid
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

regulator: Flag uncontrollable regulators as always_on [+ + +]
Author: Mark Brown <broonie@kernel.org>
Date:   Fri Mar 25 14:46:37 2022 +0000

    regulator: Flag uncontrollable regulators as always_on
    
    [ Upstream commit 261f06315cf7c3744731e36bfd8d4434949e3389 ]
    
    While we currently assume that regulators with no control available are
    just unconditionally enabled, this isn't always as clearly displayed to
    users as is desirable; for example, the code for disabling unused
    regulators will log that it is about to disable them. Clean this up a
    bit by setting always_on during constraint evaluation if we have no
    available mechanism for controlling the regulator, so things that check
    the constraint will do the right thing.
    
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Link: https://lore.kernel.org/r/20220325144637.1543496-1-broonie@kernel.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Stable-dep-of: 80d2c29e09e6 ("regulator: core: Use ktime_get_boottime() to determine how long a regulator was off")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
RISC-V: Avoid dereferening NULL regs in die() [+ + +]
Author: Palmer Dabbelt <palmer@rivosinc.com>
Date:   Tue Sep 20 13:00:37 2022 -0700

    RISC-V: Avoid dereferening NULL regs in die()
    
    [ Upstream commit f2913d006fcdb61719635e093d1b5dd0dafecac7 ]
    
    I don't think we can actually die() without a regs pointer, but the
    compiler was warning about a NULL check after a dereference.  It seems
    prudent to just avoid the possibly-NULL dereference, given that when
    die()ing the system is already toast so who knows how we got there.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
    Link: https://lore.kernel.org/r/20220920200037.6727-1-palmer@rivosinc.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Stable-dep-of: 130aee3fd998 ("riscv: Avoid enabling interrupts in die()")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

RISC-V: Don't check text_mutex during stop_machine [+ + +]
Author: Conor Dooley <conor.dooley@microchip.com>
Date:   Fri Mar 3 14:37:55 2023 +0000

    RISC-V: Don't check text_mutex during stop_machine
    
    [ Upstream commit 2a8db5ec4a28a0fce822d10224db9471a44b6925 ]
    
    We're currently using stop_machine() to update ftrace & kprobes, which
    means that the thread that takes text_mutex may not be the same
    as the thread that eventually patches the code.  This isn't actually a
    race because the lock is still held (preventing any other concurrent
    accesses) and there is only one thread running during stop_machine(),
    but it does trigger a lockdep failure.
    
    This patch just elides the lockdep check during stop_machine.
    
    Fixes: c15ac4fd60d5 ("riscv/ftrace: Add dynamic function tracer support")
    Suggested-by: Steven Rostedt <rostedt@goodmis.org>
    Reported-by: Changbin Du <changbin.du@gmail.com>
    Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
    Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
    Link: https://lore.kernel.org/r/20230303143754.4005217-1-conor.dooley@microchip.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
riscv: Add header include guards to insn.h [+ + +]
Author: Liao Chang <liaochang1@huawei.com>
Date:   Sun Jan 29 17:42:42 2023 +0800

    riscv: Add header include guards to insn.h
    
    [ Upstream commit 8ac6e619d9d51b3eb5bae817db8aa94e780a0db4 ]
    
    Add header include guards to insn.h to prevent repeated declarations
    of the identifiers it defines.
    
    Fixes: edde5584c7ab ("riscv: Add SW single-step support for KDB")
    Signed-off-by: Liao Chang <liaochang1@huawei.com>
    Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
    Fixes: c9c1af3f186a ("RISC-V: rename parse_asm.h to insn.h")
    Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
    Link: https://lore.kernel.org/r/20230129094242.282620-1-liaochang1@huawei.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

riscv: Avoid enabling interrupts in die() [+ + +]
Author: Mattias Nissler <mnissler@rivosinc.com>
Date:   Wed Feb 15 14:48:28 2023 +0000

    riscv: Avoid enabling interrupts in die()
    
    [ Upstream commit 130aee3fd9981297ff9354e5d5609cd59aafbbea ]
    
    While working on something else, I noticed that the kernel would start
    accepting interrupts again after crashing in an interrupt handler. Since
    the kernel is already in an inconsistent state, enabling interrupts is
    dangerous and opens up the risk of kernel state deteriorating further.
    Interrupts do get enabled via what looks like an unintended side effect of
    spin_unlock_irq, so switch to the more cautious
    spin_lock_irqsave/spin_unlock_irqrestore instead.
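    
    The side effect is easy to see in a toy model. The Python sketch below
    tracks nothing but a single "interrupts enabled" flag; ToyCpu, die_old()
    and die_new() are hypothetical names used for illustration, and the
    real die() does far more than take and release a lock.

```python
class ToyCpu:
    """Tracks only whether local interrupts are enabled (illustrative)."""

    def __init__(self, irqs_enabled):
        self.irqs_enabled = irqs_enabled

    def spin_lock_irq(self):
        self.irqs_enabled = False

    def spin_unlock_irq(self):
        # Unconditionally re-enables interrupts, whatever the prior state.
        self.irqs_enabled = True

    def spin_lock_irqsave(self):
        flags = self.irqs_enabled  # remember the state on entry
        self.irqs_enabled = False
        return flags

    def spin_unlock_irqrestore(self, flags):
        self.irqs_enabled = flags  # put back exactly what was saved


def die_old(cpu):
    cpu.spin_lock_irq()
    cpu.spin_unlock_irq()
    return cpu.irqs_enabled


def die_new(cpu):
    flags = cpu.spin_lock_irqsave()
    cpu.spin_unlock_irqrestore(flags)
    return cpu.irqs_enabled


if __name__ == "__main__":
    # Crash inside an interrupt handler: interrupts are off on entry.
    print("old:", die_old(ToyCpu(irqs_enabled=False)))
    print("new:", die_new(ToyCpu(irqs_enabled=False)))
```

    Entering with interrupts disabled, the old pairing leaves them enabled
    on exit, while the save/restore pairing leaves them as they were.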
    
    Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
    Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
    Reviewed-by: Björn Töpel <bjorn@kernel.org>
    Link: https://lore.kernel.org/r/20230215144828.3370316-1-mnissler@rivosinc.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode [+ + +]
Author: Alexandre Ghiti <alexghiti@rivosinc.com>
Date:   Wed Mar 8 10:16:39 2023 +0100

    riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode
    
    [ Upstream commit 76950340cf03b149412fe0d5f0810e52ac1df8cb ]
    
    When CONFIG_FRAME_POINTER is unset, the stack unwinding function
    walk_stackframe randomly reads the stack and then, when KASAN is enabled,
    it can lead to the following backtrace:
    
    [    0.000000] ==================================================================
    [    0.000000] BUG: KASAN: stack-out-of-bounds in walk_stackframe+0xa6/0x11a
    [    0.000000] Read of size 8 at addr ffffffff81807c40 by task swapper/0
    [    0.000000]
    [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.2.0-12919-g24203e6db61f #43
    [    0.000000] Hardware name: riscv-virtio,qemu (DT)
    [    0.000000] Call Trace:
    [    0.000000] [<ffffffff80007ba8>] walk_stackframe+0x0/0x11a
    [    0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a
    [    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
    [    0.000000] [<ffffffff80c49c80>] dump_stack_lvl+0x22/0x36
    [    0.000000] [<ffffffff80c3783e>] print_report+0x198/0x4a8
    [    0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a
    [    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
    [    0.000000] [<ffffffff8015f68a>] kasan_report+0x9a/0xc8
    [    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
    [    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
    [    0.000000] [<ffffffff8006e99c>] desc_make_final+0x80/0x84
    [    0.000000] [<ffffffff8009a04e>] stack_trace_save+0x88/0xa6
    [    0.000000] [<ffffffff80099fc2>] filter_irq_stacks+0x72/0x76
    [    0.000000] [<ffffffff8006b95e>] devkmsg_read+0x32a/0x32e
    [    0.000000] [<ffffffff8015ec16>] kasan_save_stack+0x28/0x52
    [    0.000000] [<ffffffff8006e998>] desc_make_final+0x7c/0x84
    [    0.000000] [<ffffffff8009a04a>] stack_trace_save+0x84/0xa6
    [    0.000000] [<ffffffff8015ec52>] kasan_set_track+0x12/0x20
    [    0.000000] [<ffffffff8015f22e>] __kasan_slab_alloc+0x58/0x5e
    [    0.000000] [<ffffffff8015e7ea>] __kmem_cache_create+0x21e/0x39a
    [    0.000000] [<ffffffff80e133ac>] create_boot_cache+0x70/0x9c
    [    0.000000] [<ffffffff80e17ab2>] kmem_cache_init+0x6c/0x11e
    [    0.000000] [<ffffffff80e00fd6>] mm_init+0xd8/0xfe
    [    0.000000] [<ffffffff80e011d8>] start_kernel+0x190/0x3ca
    [    0.000000]
    [    0.000000] The buggy address belongs to stack of task swapper/0
    [    0.000000]  and is located at offset 0 in frame:
    [    0.000000]  stack_trace_save+0x0/0xa6
    [    0.000000]
    [    0.000000] This frame has 1 object:
    [    0.000000]  [32, 56) 'c'
    [    0.000000]
    [    0.000000] The buggy address belongs to the physical page:
    [    0.000000] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x81a07
    [    0.000000] flags: 0x1000(reserved|zone=0)
    [    0.000000] raw: 0000000000001000 ff600003f1e3d150 ff600003f1e3d150 0000000000000000
    [    0.000000] raw: 0000000000000000 0000000000000000 00000001ffffffff
    [    0.000000] page dumped because: kasan: bad access detected
    [    0.000000]
    [    0.000000] Memory state around the buggy address:
    [    0.000000]  ffffffff81807b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [    0.000000]  ffffffff81807b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [    0.000000] >ffffffff81807c00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 f3
    [    0.000000]                                            ^
    [    0.000000]  ffffffff81807c80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
    [    0.000000]  ffffffff81807d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [    0.000000] ==================================================================
    
    Fix that by using READ_ONCE_NOCHECK when reading the stack in imprecise
    mode.
    
    Fixes: 5d8544e2d007 ("RISC-V: Generic library routines and assembly")
    Reported-by: Chathura Rajapaksha <chathura.abeyrathne.lk@gmail.com>
    Link: https://lore.kernel.org/all/CAD7mqryDQCYyJ1gAmtMm8SASMWAQ4i103ptTb0f6Oda=tPY2=A@mail.gmail.com/
    Suggested-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Link: https://lore.kernel.org/r/20230308091639.602024-1-alexghiti@rivosinc.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
s390/ftrace: remove dead code [+ + +]
Author: Heiko Carstens <hca@linux.ibm.com>
Date:   Mon Sep 13 16:08:33 2021 +0200

    s390/ftrace: remove dead code
    
    [ Upstream commit b860b9346e2d5667fbae2cefc571bdb6ce665b53 ]
    
    ftrace_shared_hotpatch_trampoline() never returns NULL,
    therefore quite a bit of code can be removed.
    
    Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    Stable-dep-of: 2a8db5ec4a28 ("RISC-V: Don't check text_mutex during stop_machine")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
s390: define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36 [+ + +]
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Wed Mar 1 19:03:51 2023 -0700

    s390: define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36
    
    commit a494398bde273143c2352dd373cad8211f7d94b2 upstream.
    
    Nathan Chancellor reports that the s390 vmlinux fails to link with
    GNU ld < 2.36 since commit 99cb0d917ffa ("arch: fix broken BuildID
    for arm64 and riscv").
    
    It happens for defconfig, or more specifically for CONFIG_EXPOLINE=y.
    
      $ s390x-linux-gnu-ld --version | head -n1
      GNU ld (GNU Binutils for Debian) 2.35.2
      $ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- allnoconfig
      $ ./scripts/config -e CONFIG_EXPOLINE
      $ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- olddefconfig
      $ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu-
      `.exit.text' referenced in section `.s390_return_reg' of drivers/base/dd.o: defined in discarded section `.exit.text' of drivers/base/dd.o
      make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
      make: *** [Makefile:1252: vmlinux] Error 2
    
    arch/s390/kernel/vmlinux.lds.S wants to keep EXIT_TEXT:
    
            .exit.text : {
                    EXIT_TEXT
            }
    
    But, at the same time, EXIT_TEXT is thrown away by DISCARD because
    s390 does not define RUNTIME_DISCARD_EXIT.
    
    I still do not understand why the latter wins after 99cb0d917ffa,
    but defining RUNTIME_DISCARD_EXIT seems correct because the comment
    line in arch/s390/kernel/vmlinux.lds.S says:
    
            /*
             * .exit.text is discarded at runtime, not link time,
             * to deal with references from __bug_table
             */
    
    Nathan also found that binutils commit 21401fc7bf67 ("Duplicate output
    sections in scripts") cured this issue, so we cannot reproduce it with
    binutils 2.36+, but it is better to not rely on it.
    
    Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
    Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/
    Reported-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Link: https://lore.kernel.org/r/20230105031306.1455409-1-masahiroy@kernel.org
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
scripts: handle BrokenPipeError for python scripts [+ + +]
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Thu Jan 12 11:30:06 2023 +0900

    scripts: handle BrokenPipeError for python scripts
    
    [ Upstream commit 87c7ee67deb7fce9951a5f9d80641138694aad17 ]
    
    In the follow-up of commit fb3041d61f68 ("kbuild: fix SIGPIPE error
    message for AR=gcc-ar and AR=llvm-ar"), Kees Cook pointed out that
    tools should _not_ catch their own SIGPIPEs [1] [2].
    
    Based on his feedback, LLVM was fixed [3].
    
    However, Python's default behavior is to show a noisy backtrace when
    SIGPIPE is sent. So, scripts written in Python are basically in the
    same situation as the buggy llvm tools.
    
    Example:
    
      $ make -s allnoconfig
      $ make -s allmodconfig
      $ scripts/diffconfig .config.old .config | head -n1
      -ALIX n
      Traceback (most recent call last):
        File "/home/masahiro/linux/scripts/diffconfig", line 132, in <module>
          main()
        File "/home/masahiro/linux/scripts/diffconfig", line 130, in main
          print_config("+", config, None, b[config])
        File "/home/masahiro/linux/scripts/diffconfig", line 64, in print_config
          print("+%s %s" % (config, new_value))
      BrokenPipeError: [Errno 32] Broken pipe
    
    Python documentation [4] notes how to make scripts die immediately and
    silently:
    
      """
      Piping output of your program to tools like head(1) will cause a
      SIGPIPE signal to be sent to your process when the receiver of its
      standard output closes early. This results in an exception like
      BrokenPipeError: [Errno 32] Broken pipe. To handle this case,
      wrap your entry point to catch this exception as follows:
    
        import os
        import sys
    
        def main():
            try:
                # simulate large output (your code replaces this loop)
                for x in range(10000):
                    print("y")
                # flush output here to force SIGPIPE to be triggered
                # while inside this try block.
                sys.stdout.flush()
            except BrokenPipeError:
                # Python flushes standard streams on exit; redirect remaining output
                # to devnull to avoid another BrokenPipeError at shutdown
                devnull = os.open(os.devnull, os.O_WRONLY)
                os.dup2(devnull, sys.stdout.fileno())
                sys.exit(1)  # Python exits with error code 1 on EPIPE
    
        if __name__ == '__main__':
            main()
    
      Do not set SIGPIPE’s disposition to SIG_DFL in order to avoid
      BrokenPipeError. Doing that would cause your program to exit
      unexpectedly whenever any socket connection is interrupted while
      your program is still writing to it.
      """
    
    Currently, tools/perf/scripts/python/intel-pt-events.py seems to be the
    only script that fixes the issue that way.
    
    tools/perf/scripts/python/compaction-times.py uses another approach
    signal.signal(signal.SIGPIPE, signal.SIG_DFL) but the Python
    documentation clearly says "Don't do it".
    
    I cannot fix all Python scripts since there are so many.
    I fixed some in the scripts/ directory.
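    
    A minimal sketch of how the documented pattern could be factored into a
    reusable wrapper. The helper name run_handling_epipe() and its stream
    parameter are hypothetical and are not taken from the patch; only the
    except branch follows the Python documentation quoted above.

```python
import os
import sys


def run_handling_epipe(entry, stream=None):
    """Run entry(); if the reader of `stream` went away, exit quietly with 1.

    Hypothetical helper: `entry` is the script's main function and `stream`
    defaults to sys.stdout.
    """
    stream = sys.stdout if stream is None else stream
    try:
        entry()
        # Flush inside the try block so a pending EPIPE is raised here.
        stream.flush()
    except BrokenPipeError:
        # Point the stream's file descriptor at /dev/null so the implicit
        # flush at interpreter shutdown cannot raise a second time.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, stream.fileno())
        sys.exit(1)
```

    A script would then call run_handling_epipe(main) under its
    "if __name__ == '__main__':" guard instead of calling main() directly.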
    
    [1]: https://lore.kernel.org/all/202211161056.1B9611A@keescook/
    [2]: https://github.com/llvm/llvm-project/issues/59037
    [3]: https://github.com/llvm/llvm-project/commit/4787efa38066adb51e2c049499d25b3610c0877b
    [4]: https://docs.python.org/3/library/signal.html#note-on-sigpipe
    
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
    Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
scsi: core: Remove the /proc/scsi/${proc_name} directory earlier [+ + +]
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Fri Feb 10 12:52:00 2023 -0800

    scsi: core: Remove the /proc/scsi/${proc_name} directory earlier
    
    [ Upstream commit fc663711b94468f4e1427ebe289c9f05669699c9 ]
    
    Remove the /proc/scsi/${proc_name} directory earlier to fix a race
    condition between unloading and reloading kernel modules. This fixes a bug
    introduced in 2009 by commit 77c019768f06 ("[SCSI] fix /proc memory leak in
    the SCSI core").
    
    Fix the following kernel warning:
    
    proc_dir_entry 'scsi/scsi_debug' already registered
    WARNING: CPU: 19 PID: 27986 at fs/proc/generic.c:376 proc_register+0x27d/0x2e0
    Call Trace:
     proc_mkdir+0xb5/0xe0
     scsi_proc_hostdir_add+0xb5/0x170
     scsi_host_alloc+0x683/0x6c0
     sdebug_driver_probe+0x6b/0x2d0 [scsi_debug]
     really_probe+0x159/0x540
     __driver_probe_device+0xdc/0x230
     driver_probe_device+0x4f/0x120
     __device_attach_driver+0xef/0x180
     bus_for_each_drv+0xe5/0x130
     __device_attach+0x127/0x290
     device_initial_probe+0x17/0x20
     bus_probe_device+0x110/0x130
     device_add+0x673/0xc80
     device_register+0x1e/0x30
     sdebug_add_host_helper+0x1a7/0x3b0 [scsi_debug]
     scsi_debug_init+0x64f/0x1000 [scsi_debug]
     do_one_initcall+0xd7/0x470
     do_init_module+0xe7/0x330
     load_module+0x122a/0x12c0
     __do_sys_finit_module+0x124/0x1a0
     __x64_sys_finit_module+0x46/0x50
     do_syscall_64+0x38/0x80
     entry_SYSCALL_64_after_hwframe+0x46/0xb0
    
    Link: https://lore.kernel.org/r/20230210205200.36973-3-bvanassche@acm.org
    Cc: Alan Stern <stern@rowland.harvard.edu>
    Cc: Yi Zhang <yi.zhang@redhat.com>
    Cc: stable@vger.kernel.org
    Fixes: 77c019768f06 ("[SCSI] fix /proc memory leak in the SCSI core")
    Reported-by: Yi Zhang <yi.zhang@redhat.com>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: megaraid_sas: Update max supported LD IDs to 240 [+ + +]
Author: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Date:   Thu Mar 2 16:23:40 2023 +0530

    scsi: megaraid_sas: Update max supported LD IDs to 240
    
    [ Upstream commit bfa659177dcba48cf13f2bd88c1972f12a60bf1c ]
    
    The firmware only supports Logical Disk IDs up to 240 and LD ID 255 (0xFF)
    is reserved for deleted LDs. However, in some cases, firmware was assigning
    LD ID 254 (0xFE) to deleted LDs and this was causing the driver to mark the
    wrong disk as deleted. This in turn caused the wrong disk device to be
    taken offline by the SCSI midlayer.
    
    To address this issue, reduce the LD ID limit from 255 to 240. This ensures
    the deleted LD ID is properly identified and removed by the driver without
    accidentally deleting any valid LDs.
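    
    A minimal model of the range check, written in Python for illustration
    rather than as driver code. The constant and function names are
    hypothetical, and treating the limit as an exclusive upper bound is an
    assumption; the commit message only gives the values 255 and 240.

```python
# Hypothetical constants modelling the limits named in the commit message.
OLD_LD_ID_LIMIT = 255  # previous limit: ID 254 (0xFE) still looked valid
NEW_LD_ID_LIMIT = 240  # limit after the fix


def is_valid_ld_id(ld_id, limit=NEW_LD_ID_LIMIT):
    """Return True if ld_id falls inside the supported LD ID range.

    Assumption: the limit is exclusive, so valid IDs are 0 .. limit - 1.
    """
    return 0 <= ld_id < limit
```

    Under the old limit, is_valid_ld_id(0xFE, OLD_LD_ID_LIMIT) is true, so an
    ID the firmware used for a deleted LD is mistaken for a live one; under the
    new limit both 0xFE and 0xFF fall outside the valid range.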
    
    Fixes: ae6874ba4b43 ("scsi: megaraid_sas: Early detection of VD deletion through RaidMap update")
    Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
    Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
    Link: https://lore.kernel.org/r/20230302105342.34933-2-chandrakanth.patil@broadcom.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
selftests: nft_nat: ensuring the listening side is up before starting the client [+ + +]
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Mon Feb 27 17:36:46 2023 +0800

    selftests: nft_nat: ensuring the listening side is up before starting the client
    
    [ Upstream commit 2067e7a00aa604b94de31d64f29b8893b1696f26 ]
    
    The test_local_dnat_portonly() function initiates the client-side as
    soon as it sets the listening side to the background. This could lead to
    a race condition where the server may not be ready to listen. To ensure
    that the server-side is up and running before initiating the
    client-side, a delay is introduced to the test_local_dnat_portonly()
    function.
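    
    The patch itself adds a delay. As an illustration of the same idea (do not
    start the client until the listener accepts connections), the sketch below
    swaps in a polling loop instead of a fixed delay. It is written in Python
    rather than the test's shell, and wait_for_listener() and its parameters
    are hypothetical.

```python
import socket
import time


def wait_for_listener(host, port, timeout=5.0, interval=0.1):
    """Poll until a TCP connect to (host, port) succeeds or timeout expires.

    Returns True once a connection is accepted, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect proves the server side is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

    A fixed delay is simpler and adequate for a selftest; polling trades a few
    more lines for not having to guess how long the server takes to start.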
    
    Before the fix:
      # ./nft_nat.sh
      PASS: netns routing/connectivity: ns0-rthlYrBU can reach ns1-rthlYrBU and ns2-rthlYrBU
      PASS: ping to ns1-rthlYrBU was ip NATted to ns2-rthlYrBU
      PASS: ping to ns1-rthlYrBU OK after ip nat output chain flush
      PASS: ipv6 ping to ns1-rthlYrBU was ip6 NATted to ns2-rthlYrBU
      2023/02/27 04:11:03 socat[6055] E connect(5, AF=2 10.0.1.99:2000, 16): Connection refused
      ERROR: inet port rewrite
    
    After the fix:
      # ./nft_nat.sh
      PASS: netns routing/connectivity: ns0-9sPJV6JJ can reach ns1-9sPJV6JJ and ns2-9sPJV6JJ
      PASS: ping to ns1-9sPJV6JJ was ip NATted to ns2-9sPJV6JJ
      PASS: ping to ns1-9sPJV6JJ OK after ip nat output chain flush
      PASS: ipv6 ping to ns1-9sPJV6JJ was ip6 NATted to ns2-9sPJV6JJ
      PASS: inet port rewrite without l3 address
    
    Fixes: 282e5f8fe907 ("netfilter: nat: really support inet nat without l3 address")
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
sh: define RUNTIME_DISCARD_EXIT [+ + +]
Author: Tom Saeger <tom.saeger@oracle.com>
Date:   Wed Mar 1 19:03:52 2023 -0700

    sh: define RUNTIME_DISCARD_EXIT
    
    commit c1c551bebf928889e7a8fef7415b44f9a64975f4 upstream.
    
    sh vmlinux fails to link with GNU ld < 2.40 (likely < 2.36) since
    commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv").
    
    This is similar to fixes for powerpc and s390:
    commit 4b9880dbf3bd ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT").
    commit a494398bde27 ("s390: define RUNTIME_DISCARD_EXIT to fix link error
    with GNU ld < 2.36").
    
      $ sh4-linux-gnu-ld --version | head -n1
      GNU ld (GNU Binutils for Debian) 2.35.2
    
      $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- microdev_defconfig
      $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu-
    
      `.exit.text' referenced in section `__bug_table' of crypto/algboss.o:
      defined in discarded section `.exit.text' of crypto/algboss.o
      `.exit.text' referenced in section `__bug_table' of
      drivers/char/hw_random/core.o: defined in discarded section
      `.exit.text' of drivers/char/hw_random/core.o
      make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
      make[1]: *** [Makefile:1252: vmlinux] Error 2
    
    arch/sh/kernel/vmlinux.lds.S keeps EXIT_TEXT:
    
            /*
             * .exit.text is discarded at runtime, not link time, to deal with
             * references from __bug_table
             */
            .exit.text : AT(ADDR(.exit.text)) { EXIT_TEXT }
    
    However, EXIT_TEXT is thrown away by
    DISCARDS (include/asm-generic/vmlinux.lds.h) because
    sh does not define RUNTIME_DISCARD_EXIT.
    
    GNU ld 2.40 does not have this issue and builds fine.
    This corresponds with Masahiro's comments in a494398bde27:
    "Nathan [Chancellor] also found that binutils
    commit 21401fc7bf67 ("Duplicate output sections in scripts") cured this
    issue, so we cannot reproduce it with binutils 2.36+, but it is better
    to not rely on it."
    
    Link: https://lkml.kernel.org/r/9166a8abdc0f979e50377e61780a4bba1dfa2f52.1674518464.git.tom.saeger@oracle.com
    Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
    Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/
    Link: https://lore.kernel.org/all/20230123194218.47ssfzhrpnv3xfez@oracle.com/
    Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
    Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
    Cc: Ard Biesheuvel <ardb@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Dennis Gilmore <dennis@ausil.us>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Masahiro Yamada <masahiroy@kernel.org>
    Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Palmer Dabbelt <palmer@rivosinc.com>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
staging: rtl8723bs: clean up comparsions to NULL [+ + +]
Author: Michael Straube <straube.linux@gmail.com>
Date:   Sun Aug 29 17:45:33 2021 +0200

    staging: rtl8723bs: clean up comparsions to NULL
    
    [ Upstream commit cd1f1450092216b3e39516f8db58869b6fc20575 ]
    
    Clean up comparisons to NULL reported by checkpatch.
    
    x == NULL -> !x
    x != NULL -> x
    
    Signed-off-by: Michael Straube <straube.linux@gmail.com>
    Link: https://lore.kernel.org/r/20210829154533.11054-1-straube.linux@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Stable-dep-of: 05cbcc415c9b ("staging: rtl8723bs: Fix key-store index handling")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

staging: rtl8723bs: Fix key-store index handling [+ + +]
Author: Hans de Goede <hdegoede@redhat.com>
Date:   Mon Mar 6 16:35:11 2023 +0100

    staging: rtl8723bs: Fix key-store index handling
    
    [ Upstream commit 05cbcc415c9b8c8bc4f9a09f8e03610a89042f03 ]
    
    There are 2 issues with the key-store index handling:
    
    1. The non WEP key stores can store keys with indexes 0 - BIP_MAX_KEYID,
       this means that they should be an array with BIP_MAX_KEYID + 1
       entries. But some of the arrays were just BIP_MAX_KEYID entries
       big. While one other array was hardcoded to a size of 6 entries,
       instead of using the BIP_MAX_KEYID define.
    
    2. The index checks in the rtw_cfg80211_set_encryption() and
       wpa_set_encryption() functions were checking that the passed in
       key-index would fit inside both the WEP key store (which only has
       4 entries) as well as in the non WEP key stores. This breaks any
       attempts to set non WEP keys with index 4 or 5.
    
    Issue 2. specifically breaks wifi connection with some access points
    which advertise PMF support. Without this fix connecting to these
    access points fails with the following wpa_supplicant messages:
    
     nl80211: kernel reports: key addition failed
     wlan0: WPA: Failed to configure IGTK to the driver
     wlan0: RSN: Failed to configure IGTK
     wlan0: CTRL-EVENT-DISCONNECTED bssid=... reason=1 locally_generated=1
    
    Fix 1. by using the right size for the key-stores. After this 2. can
    safely be fixed by checking the right max-index value depending on the
    used algorithm, fixing wifi not working with some PMF capable APs.
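    
    A minimal model of the per-algorithm index check, in Python for
    illustration only. The function names and the algorithm strings are
    hypothetical, and BIP_MAX_KEYID = 5 is an assumption inferred from the
    "6 entries" and "index 4 or 5" remarks above.

```python
WEP_KEYS = 4        # the WEP key store has 4 entries (indexes 0..3)
BIP_MAX_KEYID = 5   # assumed value; non-WEP stores hold indexes 0..BIP_MAX_KEYID


def max_key_index(alg):
    """Return the highest key index allowed for the given algorithm name."""
    return WEP_KEYS - 1 if alg.startswith("WEP") else BIP_MAX_KEYID


def key_index_ok(alg, idx):
    """Check idx against the limit of the store that will actually be used."""
    return 0 <= idx <= max_key_index(alg)


def store_size(alg):
    """Number of array entries needed: highest valid index plus one."""
    return max_key_index(alg) + 1
```

    With this shape, key_index_ok("BIP", 4) is accepted while
    key_index_ok("WEP40", 4) is still rejected, which is the behaviour the
    second fix describes.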
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Link: https://lore.kernel.org/r/20230306153512.162104-1-hdegoede@redhat.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

staging: rtl8723bs: fix placement of braces [+ + +]
Author: Hannes Braun <hannesbraun@mail.de>
Date:   Sat May 28 14:31:15 2022 +0200

    staging: rtl8723bs: fix placement of braces
    
    [ Upstream commit a8b088d6d98dafddda9874f98ac2a7cefc51639b ]
    
    This patch should eliminate the following errors/warnings emitted by
    checkpatch.pl:
    - that open brace { should be on the previous line
    - else should follow close brace '}'
    - braces {} are not necessary for single statement blocks
    
    Signed-off-by: Hannes Braun <hannesbraun@mail.de>
    Link: https://lore.kernel.org/r/20220528123115.13024-1-hannesbraun@mail.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Stable-dep-of: 05cbcc415c9b ("staging: rtl8723bs: Fix key-store index handling")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

staging: rtl8723bs: Pass correct parameters to cfg80211_get_bss() [+ + +]
Author: Hans de Goede <hdegoede@redhat.com>
Date:   Mon Mar 6 16:35:12 2023 +0100

    staging: rtl8723bs: Pass correct parameters to cfg80211_get_bss()
    
    commit d17789edd6a8270c38459e592ee536a84c6202db upstream.
    
    The last 2 parameters to cfg80211_get_bss() should be of
    the enum ieee80211_bss_type and enum ieee80211_privacy types,
    respectively, which WLAN_CAPABILITY_ESS very much is not.
    
    Fix both cfg80211_get_bss() calls in ioctl_cfg80211.c to pass
    the right parameters.
    
    Note that the second call was already somewhat fixed by commenting
    out WLAN_CAPABILITY_ESS and passing in 0 instead. This was still
    not entirely correct though since that would limit returned
    BSS-es to ESS type BSS-es with privacy on.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Link: https://lore.kernel.org/r/20230306153512.162104-2-hdegoede@redhat.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
Staging: rtl8723bs: Placing opening { braces in previous line [+ + +]
Author: Jagath Jog J <jagathjog1996@gmail.com>
Date:   Mon Jan 24 09:14:54 2022 +0530

    Staging: rtl8723bs: Placing opening { braces in previous line
    
    [ Upstream commit 1d7280898f683ca824fc5eab5c486a583a81473b ]
    
    Fix following checkpatch.pl error by placing opening {
    braces in previous line
    ERROR: that open brace { should be on the previous line
    
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Jagath Jog J <jagathjog1996@gmail.com>
    Link: https://lore.kernel.org/r/20220124034456.8665-2-jagathjog1996@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Stable-dep-of: 05cbcc415c9b ("staging: rtl8723bs: Fix key-store index handling")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
SUNRPC: Fix a server shutdown leak [+ + +]
Author: Benjamin Coddington <bcodding@redhat.com>
Date:   Fri Mar 3 16:08:32 2023 -0500

    SUNRPC: Fix a server shutdown leak
    
    [ Upstream commit 9ca6705d9d609441d34f8b853e1e4a6369b3b171 ]
    
    Fix a race where kthread_stop() may prevent the threadfn from ever getting
    called.  If that happens the svc_rqst will not be cleaned up.
    
    Fixes: ed6473ddc704 ("NFSv4: Fix callback server shutdown")
    Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
tools bpf_jit_disasm: Fix compilation error with new binutils [+ + +]
Author: Andres Freund <andres@anarazel.de>
Date:   Sun Jul 31 18:38:31 2022 -0700

    tools bpf_jit_disasm: Fix compilation error with new binutils
    
    commit 96ed066054abf11c7d3e106e3011a51f3f1227a3 upstream.
    
    binutils changed the signature of init_disassemble_info(), which now causes
    compilation to fail for tools/bpf/bpf_jit_disasm.c, e.g. on debian
    unstable.
    
    Relevant binutils commit:
    
      https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=60a3da00bd5407f07
    
    Wire up the feature test and switch to init_disassemble_info_compat(),
    which were introduced in prior commits, fixing the compilation failure.
    
    I verified that bpf_jit_disasm can still disassemble bpf programs, both
    with the old and new dis-asm.h API. With old binutils there's no change in
    output before/after this patch. When comparing the output from old
    binutils (2.35) to new binutils with the patch (upstream snapshot) there
    are a few output differences, but they are unrelated to this patch. An
    example hunk is:
    
         f4:        mov    %r14,%rsi
         f7:        mov    %r15,%rdx
         fa:        mov    $0x2a,%ecx
      -  ff:        callq  0xffffffffea8c4988
      +  ff:        call   0xffffffffea8c4988
        104:        test   %rax,%rax
        107:        jge    0x0000000000000110
        109:        xor    %eax,%eax
      - 10b:        jmpq   0x0000000000000073
      + 10b:        jmp    0x0000000000000073
        110:        cmp    $0x16,%rax
    
    However, I had to use an older kernel to generate the bpf_jit_enabled =
    2 output, as that has been broken since 5.18 / 1022a5498f6f745c ("bpf,
    x86_64: Use bpf_jit_binary_pack_alloc").
    
      https://lore.kernel.org/20220703030210.pmjft7qc2eajzi6c@alap3.anarazel.de
    
    Signed-off-by: Andres Freund <andres@anarazel.de>
    Acked-by: Quentin Monnet <quentin@isovalent.com>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Ben Hutchings <benh@debian.org>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Quentin Monnet <quentin@isovalent.com>
    Cc: Sedat Dilek <sedat.dilek@gmail.com>
    Cc: bpf@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20220622181918.ykrs5rsnmx3og4sv@alap3.anarazel.de
    Link: https://lore.kernel.org/r/20220801013834.156015-6-andres@anarazel.de
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Hauke Mehrtens <hauke@hauke-m.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
tools bpftool: Fix compilation error with new binutils [+ + +]
Author: Andres Freund <andres@anarazel.de>
Date:   Sun Jul 31 18:38:33 2022 -0700

    tools bpftool: Fix compilation error with new binutils
    
    commit 600b7b26c07a070d0153daa76b3806c1e52c9e00 upstream.
    
    binutils changed the signature of init_disassemble_info(), which now causes
    compilation to fail for tools/bpf/bpftool/jit_disasm.c, e.g. on debian
    unstable.
    
    Relevant binutils commit:
    
      https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=60a3da00bd5407f07
    
    Wire up the feature test and switch to init_disassemble_info_compat(),
    which were introduced in prior commits, fixing the compilation failure.
    
    I verified that bpftool can still disassemble bpf programs, both with an
    old and new dis-asm.h API. There are no output changes for plain and json
    formats. When comparing the output from old binutils (2.35)
    to new binutils with the patch (upstream snapshot) there are a few output
    differences, but they are unrelated to this patch. An example hunk is:
    
         2f:        pop    %r14
         31:        pop    %r13
         33:        pop    %rbx
      -  34:        leaveq
      -  35:        retq
      +  34:        leave
      +  35:        ret
    
    Signed-off-by: Andres Freund <andres@anarazel.de>
    Acked-by: Quentin Monnet <quentin@isovalent.com>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Ben Hutchings <benh@debian.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Quentin Monnet <quentin@isovalent.com>
    Cc: Sedat Dilek <sedat.dilek@gmail.com>
    Cc: bpf@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20220622181918.ykrs5rsnmx3og4sv@alap3.anarazel.de
    Link: https://lore.kernel.org/r/20220801013834.156015-8-andres@anarazel.de
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Hauke Mehrtens <hauke@hauke-m.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
tools build: Add feature test for init_disassemble_info API changes [+ + +]
Author: Andres Freund <andres@anarazel.de>
Date:   Sun Jul 31 18:38:27 2022 -0700

    tools build: Add feature test for init_disassemble_info API changes
    
    commit cfd59ca91467056bb2c36907b2fa67b8e1af9952 upstream.
    
    binutils changed the signature of init_disassemble_info(), which now causes
    compilation failures for tools/{perf,bpf}, e.g. on debian unstable.
    
    Relevant binutils commit:
    
      https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=60a3da00bd5407f07
    
    This commit adds a feature test to detect the new signature.  Subsequent
    commits will use it to fix the build failures.
    
    Signed-off-by: Andres Freund <andres@anarazel.de>
    Acked-by: Quentin Monnet <quentin@isovalent.com>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Ben Hutchings <benh@debian.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Quentin Monnet <quentin@isovalent.com>
    Cc: Sedat Dilek <sedat.dilek@gmail.com>
    Cc: bpf@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20220622181918.ykrs5rsnmx3og4sv@alap3.anarazel.de
    Link: https://lore.kernel.org/r/20220801013834.156015-2-andres@anarazel.de
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Hauke Mehrtens <hauke@hauke-m.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
tools include: add dis-asm-compat.h to handle version differences [+ + +]
Author: Andres Freund <andres@anarazel.de>
Date:   Sun Jul 31 18:38:29 2022 -0700

    tools include: add dis-asm-compat.h to handle version differences
    
    commit a45b3d6926231c3d024ea0de4f7bd967f83709ee upstream.
    
    binutils changed the signature of init_disassemble_info(), which now causes
    compilation failures for tools/{perf,bpf}, e.g. on debian unstable.
    
    Relevant binutils commit:
    
      https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=60a3da00bd5407f07
    
    This commit introduces a wrapper for init_disassemble_info(), to avoid
    spreading #ifdef DISASM_INIT_STYLED to a bunch of places. Subsequent
    commits will use it to fix the build failures.
    
    It is likely worth adding a wrapper for disassembler(), to avoid the
    already existing DISASM_FOUR_ARGS_SIGNATURE ifdefery.
    
    Signed-off-by: Andres Freund <andres@anarazel.de>
    Signed-off-by: Ben Hutchings <benh@debian.org>
    Acked-by: Quentin Monnet <quentin@isovalent.com>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Ben Hutchings <benh@debian.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Quentin Monnet <quentin@isovalent.com>
    Cc: Sedat Dilek <sedat.dilek@gmail.com>
    Cc: bpf@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20220622181918.ykrs5rsnmx3og4sv@alap3.anarazel.de
    Link: https://lore.kernel.org/r/20220801013834.156015-4-andres@anarazel.de
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Hauke Mehrtens <hauke@hauke-m.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
tools perf: Fix compilation error with new binutils [+ + +]
Author: Andres Freund <andres@anarazel.de>
Date:   Sun Jul 31 18:38:30 2022 -0700

    tools perf: Fix compilation error with new binutils
    
    commit 83aa0120487e8bc3f231e72c460add783f71f17c upstream.
    
    binutils changed the signature of init_disassemble_info(), which now causes
    compilation failures for tools/perf/util/annotate.c, e.g. on debian
    unstable.
    
    Relevant binutils commit:
    
      https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=60a3da00bd5407f07
    
    Wire up the feature test and switch to init_disassemble_info_compat(),
    which were introduced in prior commits, fixing the compilation failure.
    
    I verified that perf can still disassemble bpf programs by using bpftrace
    under load, recording a perf trace, and then annotating the bpf "function"
    with and without the changes. With old binutils there's no change in output
    before/after this patch. When comparing the output from old binutils (2.35)
    to new binutils with the patch (upstream snapshot) there are a few output
    differences, but they are unrelated to this patch. An example hunk is:
    
           1.15 :   55:mov    %rbp,%rdx
           0.00 :   58:add    $0xfffffffffffffff8,%rdx
           0.00 :   5c:xor    %ecx,%ecx
      -    1.03 :   5e:callq  0xffffffffe12aca3c
      +    1.03 :   5e:call   0xffffffffe12aca3c
           0.00 :   63:xor    %eax,%eax
      -    2.18 :   65:leaveq
      -    2.82 :   66:retq
      +    2.18 :   65:leave
      +    2.82 :   66:ret
    
    Signed-off-by: Andres Freund <andres@anarazel.de>
    Acked-by: Quentin Monnet <quentin@isovalent.com>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Ben Hutchings <benh@debian.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Sedat Dilek <sedat.dilek@gmail.com>
    Cc: bpf@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20220622181918.ykrs5rsnmx3og4sv@alap3.anarazel.de
    Link: https://lore.kernel.org/r/20220801013834.156015-5-andres@anarazel.de
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Hauke Mehrtens <hauke@hauke-m.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
tpm/eventlog: Don't abort tpm_read_log on faulty ACPI address [+ + +]
Author: Morten Linderud <morten@linderud.pw>
Date:   Wed Feb 15 10:25:52 2023 +0100

    tpm/eventlog: Don't abort tpm_read_log on faulty ACPI address
    
    [ Upstream commit 80a6c216b16d7f5c584d2148c2e4345ea4eb06ce ]
    
    tpm_read_log_acpi() should return -ENODEV when no eventlog from the ACPI
    table is found. If the firmware vendor includes an invalid log address
    we are unable to map from the ACPI memory and tpm_read_log() returns -EIO
    which would abort discovery of the eventlog.
    
    Change the return value from -EIO to -ENODEV when acpi_os_map_iomem()
    fails to map the event log.
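    
    A minimal model of why the return value matters, in Python for
    illustration only. It assumes the caller tries several event-log sources
    in order and moves on only when a source reports -ENODEV; read_log() and
    the idea of passing sources as callables are hypothetical.

```python
import errno


def read_log(sources):
    """Try each event-log source in order.

    -ENODEV means "no log here, try the next source"; any other return value
    (success or a different error) ends discovery immediately.
    """
    rc = -errno.ENODEV
    for source in sources:
        rc = source()
        if rc != -errno.ENODEV:
            return rc
    return rc
```

    In this model a source that returns -EIO stops the loop before any later
    source is consulted, whereas -ENODEV lets discovery continue.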
    
    The following hardware was used to test this issue:
        Framework Laptop (Pre-production)
        BIOS: INSYDE Corp, Revision: 3.2
        TPM Device: NTC, Firmware Revision: 7.2
    
    Dump of the faulty ACPI TPM2 table:
        [000h 0000   4]                    Signature : "TPM2"    [Trusted Platform Module hardware interface Table]
        [004h 0004   4]                 Table Length : 0000004C
        [008h 0008   1]                     Revision : 04
        [009h 0009   1]                     Checksum : 2B
        [00Ah 0010   6]                       Oem ID : "INSYDE"
        [010h 0016   8]                 Oem Table ID : "TGL-ULT"
        [018h 0024   4]                 Oem Revision : 00000002
        [01Ch 0028   4]              Asl Compiler ID : "ACPI"
        [020h 0032   4]        Asl Compiler Revision : 00040000
    
        [024h 0036   2]               Platform Class : 0000
        [026h 0038   2]                     Reserved : 0000
        [028h 0040   8]              Control Address : 0000000000000000
        [030h 0048   4]                 Start Method : 06 [Memory Mapped I/O]
    
        [034h 0052  12]            Method Parameters : 00 00 00 00 00 00 00 00 00 00 00 00
        [040h 0064   4]           Minimum Log Length : 00010000
        [044h 0068   8]                  Log Address : 000000004053D000
    
    Fixes: 0cf577a03f21 ("tpm: Fix handling of missing event log")
    Tested-by: Erkki Eilonen <erkki@bearmetal.eu>
    Signed-off-by: Morten Linderud <morten@linderud.pw>
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
udf: Fix off-by-one error when discarding preallocation [+ + +]
Author: Jan Kara <jack@suse.cz>
Date:   Mon Jan 23 14:29:15 2023 +0100

    udf: Fix off-by-one error when discarding preallocation
    
    [ Upstream commit f54aa97fb7e5329a373f9df4e5e213ced4fc8759 ]
    
    The condition determining whether the preallocation can be used had
    an off-by-one error so we didn't discard preallocation when new
    allocation was just following it. This can then confuse code in
    inode_getblk().
    
    CC: stable@vger.kernel.org
    Fixes: 16d055656814 ("udf: Discard preallocation before extending file with a hole")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
UML: define RUNTIME_DISCARD_EXIT [+ + +]
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Wed Feb 8 01:41:56 2023 +0900

    UML: define RUNTIME_DISCARD_EXIT
    
    commit b99ddbe8336ee680257c8ab479f75051eaa49dcf upstream.
    
    With CONFIG_VIRTIO_UML=y, GNU ld < 2.36 fails to link UML vmlinux
    (with or without CONFIG_LD_SCRIPT_STATIC).
    
      `.exit.text' referenced in section `.uml.exitcall.exit' of arch/um/drivers/virtio_uml.o: defined in discarded section `.exit.text' of arch/um/drivers/virtio_uml.o
      collect2: error: ld returned 1 exit status
    
    This fix is similar to the following commits:
    
    - 4b9880dbf3bd ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT")
    - a494398bde27 ("s390: define RUNTIME_DISCARD_EXIT to fix link error
      with GNU ld < 2.36")
    - c1c551bebf92 ("sh: define RUNTIME_DISCARD_EXIT")
    
    Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
    Reported-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Tested-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Richard Weinberger <richard@nod.at>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths [+ + +]
Author: David Disseldorp <ddiss@suse.de>
Date:   Tue Mar 7 16:21:06 2023 +0100

    watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths
    
    [ Upstream commit 03e1d60e177eedbd302b77af4ea5e21b5a7ade31 ]
    
    The watch_queue_set_size() allocation error paths return the ret value
    set via the prior pipe_resize_ring() call, which will always be zero.
    
    As a result, IOC_WATCH_QUEUE_SET_SIZE callers such as "keyctl watch"
    fail to detect kernel wqueue->notes allocation failures and proceed to
    KEYCTL_WATCH_KEY, with any notifications subsequently lost.
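    
    The error-path pattern described above can be sketched in user space.
    This is a simplified model, not the kernel code: resize_ring_stub(),
    set_size_buggy() and set_size_fixed() are hypothetical names, and the
    alloc_fails parameter only simulates an allocation failure.  The
    upstream change may differ in detail.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the earlier resize call; 0 means success. */
static int resize_ring_stub(void)
{
    return 0;
}

/* Buggy shape: the shared error label returns ret, which still holds
 * the 0 left over from the successful resize call. */
static int set_size_buggy(int alloc_fails)
{
    void *notes;
    int ret = resize_ring_stub();

    if (ret < 0)
        return ret;

    notes = alloc_fails ? NULL : malloc(64);
    if (!notes)
        goto error;         /* ret == 0: failure is reported as success */

    free(notes);
    return 0;
error:
    return ret;
}

/* Fixed shape: ret is set to an error code before the allocations,
 * so the shared error label reports the failure. */
static int set_size_fixed(int alloc_fails)
{
    void *notes;
    int ret = resize_ring_stub();

    if (ret < 0)
        return ret;

    ret = -ENOMEM;          /* assumed error code for this sketch */
    notes = alloc_fails ? NULL : malloc(64);
    if (!notes)
        goto error;

    free(notes);
    return 0;
error:
    return ret;
}
```

    With a simulated allocation failure, set_size_buggy(1) returns 0 while
    set_size_fixed(1) returns -ENOMEM, which is what lets a caller stop
    before it starts watching.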
    
    Fixes: c73be61cede58 ("pipe: Add general notification queue support")
    Signed-off-by: David Disseldorp <ddiss@suse.de>
    Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
x86/CPU/AMD: Disable XSAVES on AMD family 0x17 [+ + +]
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Tue Mar 7 17:46:43 2023 +0000

    x86/CPU/AMD: Disable XSAVES on AMD family 0x17
    
    commit b0563468eeac88ebc70559d52a0b66efc37e4e9d upstream.
    
    AMD Erratum 1386 is summarised as:
    
      XSAVES Instruction May Fail to Save XMM Registers to the Provided
      State Save Area
    
    This piece of accidental chronomancy causes the %xmm registers to
    occasionally reset back to an older value.
    
    Ignore the XSAVES feature on all AMD Zen1/2 hardware.  The XSAVEC
    instruction (which works fine) is equivalent on affected parts.
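    
    As a rough illustration of what "family 0x17" means, the sketch below
    decodes the display family from a CPUID leaf 1 EAX value using the
    usual x86 rule (the extended family is added when the base family is
    0xF).  cpuid_family() and ignore_xsaves() are hypothetical helper
    names, the is_amd flag is assumed to come from a separate vendor
    check, and the kernel's own mechanism for hiding the feature is not
    shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the display family from CPUID.01H:EAX. */
static unsigned int cpuid_family(uint32_t eax)
{
    unsigned int base = (eax >> 8) & 0xf;    /* bits 11:8  */
    unsigned int ext  = (eax >> 20) & 0xff;  /* bits 27:20 */

    /* The extended family only applies when the base family is 0xF. */
    return base == 0xf ? base + ext : base;
}

/* Hypothetical policy helper: treat XSAVES as unavailable on AMD
 * family 0x17 parts, as the commit message describes. */
static bool ignore_xsaves(bool is_amd, uint32_t eax)
{
    return is_amd && cpuid_family(eax) == 0x17;
}
```

    For example, an EAX value with base family 0xF and extended family
    0x8 (such as the illustrative value 0x00800F11) decodes to 0x17,
    while extended family 0xA decodes to 0x19 and is not matched.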
    
      [ bp: Typos, move it into the F17h-specific function. ]
    
    Reported-by: Tavis Ormandy <taviso@gmail.com>
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: <stable@kernel.org>
    Link: https://lore.kernel.org/r/20230307174643.1240184-1-andrew.cooper3@citrix.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
xfs: fallocate() should call file_modified() [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Mar 7 10:59:14 2023 -0800

    xfs: fallocate() should call file_modified()
    
    commit fbe7e520036583a783b13ff9744e35c2a329d9a4 upstream.
    
    In XFS, we always update the inode change and modification time when
    any fallocate() operation succeeds.  Furthermore, as various
    fallocate modes can change the file contents (extending EOF,
    punching holes, zeroing things, shifting extents), we should drop
    file privileges like suid just like we do for a regular write().
    There's already a VFS helper that figures all this out for us, so
    use that.
    
    The net effect of this is that we no longer drop suid/sgid if the
    caller is root, but we also now drop file capabilities.
    
    We also move the xfs_update_prealloc_flags() function so that it is
    now called only by the scope that needs to set the prealloc flag.
    
    Based on a patch from Darrick Wong.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

xfs: remove XFS_PREALLOC_SYNC [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Mar 7 10:59:13 2023 -0800

    xfs: remove XFS_PREALLOC_SYNC
    
    commit 472c6e46f589c26057596dcba160712a5b3e02c5 upstream.
    
    [partial backport for dependency -
     xfs_ioc_space() still uses XFS_PREALLOC_SYNC]
    
    Callers can achieve the same thing by calling xfs_log_force_inode()
    after making their modifications. There is no need for
    xfs_update_prealloc_flags() to do this.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

xfs: remove xfs_setattr_time() declaration [+ + +]
Author: Gaosheng Cui <cuigaosheng1@huawei.com>
Date:   Mon Sep 19 06:53:14 2022 +1000

    xfs: remove xfs_setattr_time() declaration
    
    commit b0463b9dd7030a766133ad2f1571f97f204d7bdf upstream.
    
    xfs_setattr_time() has been removed since
    commit e014f37db1a2 ("xfs: use setattr_copy to set vfs inode
    attributes"), so remove it.
    
    Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: set prealloc flag in xfs_alloc_file_space() [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Mar 7 10:59:15 2023 -0800

    xfs: set prealloc flag in xfs_alloc_file_space()
    
    commit 0b02c8c0d75a738c98c35f02efb36217c170d78c upstream.
    
    Now that we only call xfs_update_prealloc_flags() from
    xfs_file_fallocate() in the case where we need to set the
    preallocation flag, do this in xfs_alloc_file_space() where we
    already have the inode joined into a transaction and get
    rid of the call to xfs_update_prealloc_flags() from the fallocate
    code.
    
    This also means that we now correctly avoid setting the
    XFS_DIFLAG_PREALLOC flag when xfs_is_always_cow_inode() is true, as
    these inodes will never have preallocated extents.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

xfs: use setattr_copy to set vfs inode attributes [+ + +]
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Mar 7 10:59:12 2023 -0800

    xfs: use setattr_copy to set vfs inode attributes
    
    commit e014f37db1a2d109afa750042ac4d69cf3e3d88e upstream.
    
    Filipe Manana pointed out that XFS' behavior w.r.t. setuid/setgid
    revocation isn't consistent with btrfs[1] or ext4.  Those two
    filesystems use the VFS function setattr_copy to convey certain
    attributes from struct iattr into the VFS inode structure.
    
    Andrey Zhadchenko reported[2] that XFS uses the wrong user namespace to
    decide if it should clear setgid and setuid on a file attribute update.
    This is a second symptom of the problem that Filipe noticed.
    
    XFS, on the other hand, open-codes setattr_copy in xfs_setattr_mode,
    xfs_setattr_nonsize, and xfs_setattr_time.  Regrettably, setattr_copy is
    /not/ a simple copy function; it contains additional logic to clear the
    setgid bit when setting the mode, and XFS' version no longer matches.
    
    The VFS implements its own setuid/setgid stripping logic, which
    establishes consistent behavior.  It's a tad unfortunate that it's
    scattered across notify_change, should_remove_suid, and setattr_copy but
    XFS should really follow the Linux VFS.  Adapt XFS to use the VFS
    functions and get rid of the old functions.
    
    [1] https://lore.kernel.org/fstests/CAL3q7H47iNQ=Wmk83WcGB-KBJVOEtR9+qGczzCeXJ9Y2KCV25Q@mail.gmail.com/
    [2] https://lore.kernel.org/linux-xfs/20220221182218.748084-1-andrey.zhadchenko@virtuozzo.com/
    
    Fixes: 7fa294c8991c ("userns: Allow chown and setgid preservation")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Tested-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>