Changelog for kernel 6.8.7

accel/ivpu: Check return code of ipc->lock init [+ + +]

Author: Wachowski, Karol <karol.wachowski@intel.com>
Date:   Tue Apr 2 12:49:22 2024 +0200

    accel/ivpu: Check return code of ipc->lock init
    
    commit f0cf7ffcd02953c72fed5995378805883d16203e upstream.
    
    Return value of drmm_mutex_init(ipc->lock) was unchecked.
    
    Fixes: 5d7422cfb498 ("accel/ivpu: Add IPC driver and JSM messages")
    Cc: <stable@vger.kernel.org> # v6.3+
    Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
    Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
    Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-2-jacek.lawrynowicz@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

accel/ivpu: Fix deadlock in context_xa [+ + +]

Author: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Date:   Tue Apr 2 12:49:29 2024 +0200

    accel/ivpu: Fix deadlock in context_xa
    
    commit fd7726e75968b27fe98534ccbf47ccd6fef686f3 upstream.
    
    ivpu_device->context_xa is locked both in kernel thread and IRQ context.
    It requires XA_FLAGS_LOCK_IRQ flag to be passed during initialization
    otherwise the lock could be acquired from a thread and interrupted by
    an IRQ that locks it for the second time causing the deadlock.
    
    This deadlock was reported by lockdep and observed in internal tests.
    
    Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU")
    Cc: <stable@vger.kernel.org> # v6.3+
    Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
    Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-9-jacek.lawrynowicz@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

accel/ivpu: Fix PCI D0 state entry in resume [+ + +]

Author: Wachowski, Karol <karol.wachowski@intel.com>
Date:   Tue Apr 2 12:49:24 2024 +0200

    accel/ivpu: Fix PCI D0 state entry in resume
    
    commit 3534eacbf101f6e66105f03d869a03893407c384 upstream.
    
    If power up fails, the device is left in PCI D3hot state, making it
    impossible to access NPU registers on retry.
    Enter D0 state on retry before proceeding with power up sequence.
    
    Fixes: 28083ff18d3f ("accel/ivpu: Fix DevTLB errors on suspend/resume and recovery")
    Cc: <stable@vger.kernel.org> # v6.8+
    Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
    Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
    Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-4-jacek.lawrynowicz@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

accel/ivpu: Put NPU back to D3hot after failed resume [+ + +]

Author: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Date:   Tue Apr 2 12:49:25 2024 +0200

    accel/ivpu: Put NPU back to D3hot after failed resume
    
    commit 875bc9cd1b33eb027a5663f5e6878a43d98e9a16 upstream.
    
    Put NPU in D3hot after ivpu_resume() fails to power up the device.
    This will ensure that D3->D0 power cycle will be performed before
    the next resume and also will minimize power usage in this corner case.
    
    Fixes: 28083ff18d3f ("accel/ivpu: Fix DevTLB errors on suspend/resume and recovery")
    Cc: <stable@vger.kernel.org> # v6.8+
    Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
    Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-5-jacek.lawrynowicz@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

accel/ivpu: Return max freq for DRM_IVPU_PARAM_CORE_CLOCK_RATE [+ + +]

Author: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Date:   Tue Apr 2 12:49:27 2024 +0200

    accel/ivpu: Return max freq for DRM_IVPU_PARAM_CORE_CLOCK_RATE
    
    commit c52c35e5b404b95a5bcff39af9be1b9293be3434 upstream.
    
    DRM_IVPU_PARAM_CORE_CLOCK_RATE returns current NPU frequency which
    could be 0 if device was sleeping. This value isn't really useful to
    the user space, so return max freq instead which can be used to estimate
    NPU performance.
    
    Fixes: c39dc15191c4 ("accel/ivpu: Read clock rate only if device is up")
    Cc: <stable@vger.kernel.org> # v6.7
    Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
    Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-7-jacek.lawrynowicz@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ACPI: bus: allow _UID matching for integer zero [+ + +]

Author: Raag Jadav <raag.jadav@intel.com>
Date:   Thu Mar 28 09:25:40 2024 +0530

    ACPI: bus: allow _UID matching for integer zero
    
    [ Upstream commit aca1a5287ea328fd1f7e2bfa6806646486d86a70 ]
    
    Commit b2b32a173881 ("ACPI: bus: update acpi_dev_hid_uid_match() to
    support multiple types") added _UID matching support for both integer
    and string types, which satisfies NULL @uid2 argument for string types
    using inversion, but this logic prevents _UID comparison in case the
    argument is integer 0, which may result in false positives.
    
    Fix this using _Generic(), which will allow NULL @uid2 argument for
    string types as well as _UID matching for all possible integer values.
    
    Fixes: b2b32a173881 ("ACPI: bus: update acpi_dev_hid_uid_match() to support multiple types")
    Signed-off-by: Raag Jadav <raag.jadav@intel.com>
    [ rjw: Comment adjustment ]
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
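
The false positive described in the ACPI _UID commit above can be modelled outside the kernel. The following is a minimal Python sketch, not the kernel code: `uid_match_old`, `uid_match_new` and `_uid_equal` are hypothetical names, the device _UID is assumed to be stored as a decimal string, and the `isinstance` dispatch only loosely stands in for what `_Generic()` selects at compile time.

```python
from typing import Optional, Union


def _uid_equal(dev_uid: Optional[str], uid2: Union[str, int]) -> bool:
    """Compare a device _UID (modelled as a string) with a string or an integer."""
    if dev_uid is None:
        return False
    if isinstance(uid2, int):
        try:
            return int(dev_uid) == uid2
        except ValueError:
            return False
    return dev_uid == uid2


def uid_match_old(dev_uid: Optional[str], uid2: Union[str, int, None]) -> bool:
    # Inversion-based check: any falsy uid2 skips the comparison,
    # so integer 0 is treated like "no _UID requested".
    return (not uid2) or _uid_equal(dev_uid, uid2)


def uid_match_new(dev_uid: Optional[str], uid2: Union[str, int, None]) -> bool:
    # Type-based check: only a missing (None) uid2 skips the comparison;
    # every integer value, including 0, is compared.
    if uid2 is None:
        return True
    return _uid_equal(dev_uid, uid2)
```

In this model a device whose _UID is "1" matches a request for integer 0 under the old check but not under the new one, while a `None` request still matches any device.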

ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Fri Mar 8 14:59:23 2024 -0700

    ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes
    
    [ Upstream commit bd98cbbbf82a3086423865816e1b5ab4bb4b6c60 ]
    
    Update acpi_get_genport_coordinates() to allow retrieval of both access
    classes of the 'struct access_coordinate' for a generic target. The update
    will allow CXL code to compute access coordinates for both access classes.
    
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/20240308220055.2172956-5-dave.jiang@intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPI: HMAT: Introduce 2 levels of generic port access class [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Fri Mar 8 14:59:22 2024 -0700

    ACPI: HMAT: Introduce 2 levels of generic port access class
    
    [ Upstream commit 1745a7b364dfd339ab2696b7d51d7ed950ed2598 ]
    
    In order to compute access0 and access1 classes for CXL memory, 2 levels
    of generic port information must be stored. Access0 will indicate the
    generic port access coordinates to the closest initiator and access1
    will indicate the generic port access coordinates to the closest CPU.
    
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/20240308220055.2172956-4-dave.jiang@intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPI: scan: Do not increase dep_unmet for already met dependencies [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Sat Apr 6 13:40:52 2024 +0200

    ACPI: scan: Do not increase dep_unmet for already met dependencies
    
    commit d730192ff0246356a2d7e63ff5bd501060670eec upstream.
    
    On the Toshiba Encore WT10-A tablet the BATC battery ACPI device depends
    on 3 other devices:
    
                Name (_DEP, Package (0x03)  // _DEP: Dependencies
                {
                    I2C1,
                    GPO2,
                    GPO0
                })
    
    acpi_scan_check_dep() adds all 3 of these to the acpi_dep_list and then
    before an acpi_device is created for the BATC handle (and thus before
    acpi_scan_dep_init() runs) acpi_scan_clear_dep() gets called for both
    GPIO dependencies, with free_when_met not set for the dependencies.
    
    Since there is no adev for BATC yet, there also is no dep_unmet to
    decrement. The only result of acpi_scan_clear_dep() in this case is
    dep->met getting set.
    
    Soon after acpi_scan_clear_dep() has been called for the GPIO dependencies
    the acpi_device gets created for the BATC handle and acpi_scan_dep_init()
    runs. This sees 3 dependencies on the acpi_dep_list and initializes
    dep_unmet to 3. Later, when the dependency for I2C1 is met, dep_unmet
    becomes 2, but since the 2 GPIO deps were already met it never becomes 0,
    causing battery monitoring to not work.
    
    Fix this by modifying acpi_scan_dep_init() to not increase dep_unmet for
    dependencies which have already been marked as being met.
    
    Fixes: 3ba12d8de3fa ("ACPI: scan: Reduce overhead related to devices with dependencies")
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Cc: 6.5+ <stable@vger.kernel.org> # 6.5+
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
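
The counting problem in the dep_unmet commit above can be replayed with a small model. This is a Python sketch under stated assumptions, not the kernel implementation: `Dep`, `clear_dep`, `dep_init_old`, `dep_init_new` and `simulate` are hypothetical names, and a missing acpi_device is represented by `dep_unmet = None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dep:
    supplier: str
    met: bool = False


def dep_init_old(deps: List[Dep]) -> int:
    # Counts every listed dependency, including ones already marked met.
    return len(deps)


def dep_init_new(deps: List[Dep]) -> int:
    # Counts only dependencies that have not been marked met yet.
    return sum(1 for d in deps if not d.met)


def clear_dep(deps: List[Dep], supplier: str, dep_unmet: Optional[int]) -> Optional[int]:
    """Mark a supplier as met; decrement the counter only if one exists."""
    for d in deps:
        if d.supplier == supplier and not d.met:
            d.met = True
            if dep_unmet is not None:
                dep_unmet -= 1
    return dep_unmet


def simulate(dep_init) -> int:
    deps = [Dep("I2C1"), Dep("GPO2"), Dep("GPO0")]
    dep_unmet = None                                # no device for BATC yet
    dep_unmet = clear_dep(deps, "GPO2", dep_unmet)  # only sets met
    dep_unmet = clear_dep(deps, "GPO0", dep_unmet)  # only sets met
    dep_unmet = dep_init(deps)                      # device created, counter set
    dep_unmet = clear_dep(deps, "I2C1", dep_unmet)  # last supplier shows up
    return dep_unmet
```

With `dep_init_old` the counter ends at 2 and never reaches 0; with `dep_init_new` it starts at 1 and ends at 0.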

af_unix: Clear stale u->oob_skb. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Fri Apr 5 15:10:57 2024 -0700

    af_unix: Clear stale u->oob_skb.
    
    [ Upstream commit b46f4eaa4f0ec38909fb0072eea3aeddb32f954e ]
    
    syzkaller started to report deadlock of unix_gc_lock after commit
    4090fa373f0e ("af_unix: Replace garbage collection algorithm."), but
    it just uncovers the bug that has been there since commit 314001f0bf92
    ("af_unix: Add OOB support").
    
    The repro basically does the following.
    
      from socket import *
      from array import array
    
      c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
      c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
      c2.recv(1)  # blocked as no normal data in recv queue
    
      c2.close()  # done async and unblock recv()
      c1.close()  # done async and trigger GC
    
    A socket sends its file descriptor to itself as OOB data and tries to
    receive normal data, but finally recv() fails due to async close().
    
    The problem here is wrong handling of OOB skb in manage_oob().  When
    recvmsg() is called without MSG_OOB, manage_oob() is called to check
    if the peeked skb is OOB skb.  In such a case, manage_oob() pops it
    out of the receive queue but does not clear unix_sk(sk)->oob_skb.
    This is wrong in terms of uAPI.
    
    Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
    The 'o' is handled as OOB data.  When recv() is called twice without
    MSG_OOB, the OOB data should be lost.
    
      >>> from socket import *
      >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
      >>> c1.send(b'hello', MSG_OOB)  # 'o' is OOB data
      5
      >>> c1.send(b'world')
      5
      >>> c2.recv(5)  # OOB data is not received
      b'hell'
      >>> c2.recv(5)  # OOB data is skipped
      b'world'
      >>> c2.recv(5, MSG_OOB)  # This should return an error
      b'o'
    
    In the same situation, TCP actually returns -EINVAL for the last
    recv().
    
    Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always sets
    EPOLLPRI even though the data has passed through by previous recv().
    
    To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
    it from recv queue.
    
    The reason why the old GC did not trigger the deadlock is because the
    old GC relied on the receive queue to detect the loop.
    
    When it is triggered, the socket with OOB data is marked as GC candidate
    because file refcount == inflight count (1).  However, after traversing
    all inflight sockets, the socket still has a positive inflight count (1),
    thus the socket is excluded from candidates.  Then, the old GC loses the
    chance to garbage-collect the socket.
    
    With the old GC, the repro continues to create true garbage that will
    never be freed nor detected by kmemleak as it's linked to the global
    inflight list.  That's why we couldn't even notice the issue.
    
    Fixes: 314001f0bf92 ("af_unix: Add OOB support")
    Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240405221057.2406-1-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

af_unix: Do not use atomic ops for unix_sk(sk)->inflight. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Tue Jan 23 09:08:53 2024 -0800

    af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
    
    [ Upstream commit 97af84a6bba2ab2b9c704c08e67de3b5ea551bb2 ]
    
    When touching unix_sk(sk)->inflight, we are always under
    spin_lock(&unix_gc_lock).
    
    Let's convert unix_sk(sk)->inflight to the normal unsigned long.
    
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

af_unix: Fix garbage collector racing against connect() [+ + +]

Author: Michal Luczaj <mhal@rbox.co>
Date:   Tue Apr 9 22:09:39 2024 +0200

    af_unix: Fix garbage collector racing against connect()
    
    [ Upstream commit 47d8ac011fe1c9251070e1bd64cb10b48193ec51 ]
    
    Garbage collector does not take into account the risk of embryo getting
    enqueued during the garbage collection. If such embryo has a peer that
    carries SCM_RIGHTS, two consecutive passes of scan_children() may see a
    different set of children, leading to an incorrectly elevated inflight
    count and then a dangling pointer within the gc_inflight_list.
    
    sockets are AF_UNIX/SOCK_STREAM
    S is an unconnected socket
    L is a listening in-flight socket bound to addr, not in fdtable
    V's fd will be passed via sendmsg(), gets inflight count bumped
    
    connect(S, addr)        sendmsg(S, [V]); close(V)       __unix_gc()
    ----------------        -------------------------       -----------
    
    NS = unix_create1()
    skb1 = sock_wmalloc(NS)
    L = unix_find_other(addr)
    unix_state_lock(L)
    unix_peer(S) = NS
                            // V count=1 inflight=0
    
                            NS = unix_peer(S)
                            skb2 = sock_alloc()
                            skb_queue_tail(NS, skb2[V])
    
                            // V became in-flight
                            // V count=2 inflight=1
    
                            close(V)
    
                            // V count=1 inflight=1
                            // GC candidate condition met
    
                                                    for u in gc_inflight_list:
                                                      if (total_refs == inflight_refs)
                                                        add u to gc_candidates
    
                                                    // gc_candidates={L, V}
    
                                                    for u in gc_candidates:
                                                      scan_children(u, dec_inflight)
    
                                                    // embryo (skb1) was not
                                                    // reachable from L yet, so V's
                                                    // inflight remains unchanged
    __skb_queue_tail(L, skb1)
    unix_state_unlock(L)
                                                    for u in gc_candidates:
                                                      if (u.inflight)
                                                        scan_children(u, inc_inflight_move_tail)
    
                                                    // V count=1 inflight=2 (!)
    
    If there is a GC-candidate listening socket, lock/unlock its state. This
    makes GC wait until the end of any ongoing connect() to that socket. After
    flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
    there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
    this point, unix_inflight() can not happen because unix_gc_lock is already
    taken. Inflight graph remains unaffected.
    
    Fixes: 1fd05ba5a2f2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
    Signed-off-by: Michal Luczaj <mhal@rbox.co>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
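
The interleaving in the garbage collector commit above can be replayed with a toy model of the two scan passes. This is a heavily simplified Python sketch, not the kernel algorithm: `Sock`, `scan_children` and `run` are hypothetical names, reference counts other than inflight are ignored, and L is assumed to stay in flight because something outside the candidate set holds it.

```python
class Sock:
    def __init__(self, name):
        self.name = name
        self.inflight = 0
        # Queue entries are either a list of sockets passed via SCM_RIGHTS
        # or, for a listener, an embryo Sock waiting to be accepted.
        self.queue = []


def dec(s):
    s.inflight -= 1


def inc(s):
    s.inflight += 1


def scan_children(sock, func):
    """Apply func to every socket reachable through sock's receive queue."""
    for item in sock.queue:
        if isinstance(item, Sock):      # embryo: look into its own queue
            for fds in item.queue:
                for s in fds:
                    func(s)
        else:
            for s in item:
                func(s)


def run(wait_for_connect):
    L, V, NS = Sock("L"), Sock("V"), Sock("NS")
    L.inflight = 1                      # assumption: L is held elsewhere
    NS.queue.append([V])                # sendmsg(S, [V]) lands on embryo NS
    V.inflight = 1
    candidates = [L, V]

    if wait_for_connect:
        L.queue.append(NS)              # connect() completes before the scan
    for u in candidates:                # first pass
        scan_children(u, dec)
    if not wait_for_connect:
        L.queue.append(NS)              # connect() completes between passes
    for u in candidates:                # second pass
        if u.inflight:
            scan_children(u, inc)
    return V.inflight
```

When the embryo appears between the two passes (`run(False)`), V is incremented without having been decremented and ends with an inflight count of 2. When the scan only starts after the embryo is enqueued (`run(True)`), both passes see the same children and the count returns to 1.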

arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator [+ + +]

Author: Tim Harvey <tharvey@gateworks.com>
Date:   Wed Feb 28 12:02:15 2024 -0800

    arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
    
    [ Upstream commit 8cb10cba124c4798b6cb333245ecdc8dde78aeae ]
    
    When using usb-conn-gpio to control USB role and VBUS, the vbus-supply
    property must be present in the usb-conn-gpio node. Additionally it
    should not be present in the phy node as that isn't what controls vbus
    and will upset the use count.
    
    This resolves an issue where VBUS is enabled with OTG in peripheral
    mode.
    
    Fixes: ad9a12f7a522 ("arm64: dts: imx8mp-venice: Fix USB connector description")
    Signed-off-by: Tim Harvey <tharvey@gateworks.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator [+ + +]

Author: Tim Harvey <tharvey@gateworks.com>
Date:   Wed Feb 28 12:02:16 2024 -0800

    arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator
    
    [ Upstream commit 6f8e0aca838e163e81fde176e945161d50679339 ]
    
    When using usb-conn-gpio to control USB role and VBUS, the vbus-supply
    property must be present in the usb-conn-gpio node. Additionally it
    should not be present in the phy node as that isn't what controls vbus
    and will upset the use count.
    
    This resolves an issue where VBUS is enabled with OTG in peripheral
    mode.
    
    Fixes: ad9a12f7a522 ("arm64: dts: imx8mp-venice: Fix USB connector description")
    Signed-off-by: Tim Harvey <tharvey@gateworks.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: imx8-ss-conn: fix usb lpcg indices [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Mon Apr 1 18:25:04 2024 -0400

    arm64: dts: imx8-ss-conn: fix usb lpcg indices
    
    commit 808e7716edcdb39d3498b9f567ef6017858b49aa upstream.
    
    usb2_lpcg: clock-controller@5b270000 {
            ...                                                    Col1  Col2
            clocks = <&conn_ahb_clk>, <&conn_ipg_clk>;           // 0     6
            clock-indices = <IMX_LPCG_CLK_6>, <IMX_LPCG_CLK_7>;  // 1     7
            ...
    };
    
    Col1: index, which existing dts try to get.
    Col2: actual index in lpcg driver.
    
    usbotg1: usb@5b0d0000 {
            ...
            clocks = <&usb2_lpcg 0>;
                                 ^^
    Should be:
            clocks = <&usb2_lpcg IMX_LPCG_CLK_6>;
    };
    
    usbphy1: usbphy@5b100000 {
            clocks = <&usb2_lpcg 1>;
                                 ^^
    Should be:
            clocks = <&usb2_lpcg IMX_LPCG_CLK_7>;
    };
    
    Arg0 is divided by 4 in lpcg driver. So lpcg will do dummy enable. Fix it
    by using correct clock indices.
    
    Cc: stable@vger.kernel.org
    Fixes: 8065fc937f0f ("arm64: dts: imx8dxl: add usb1 and usb2 support")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
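
The effect of "Arg0 is divided by 4" in the lpcg index fixes can be illustrated with a short model. This is a Python sketch under stated assumptions, not the driver code: it assumes the IMX_LPCG_CLK_n binding macros expand to 4 * n, that the provider keeps one slot per clock and returns an empty slot as a dummy clock, and `build_slots` and `lookup` are hypothetical names.

```python
# Assumption: IMX_LPCG_CLK_n expands to 4 * n.
IMX_LPCG_CLK = {n: 4 * n for n in range(8)}


def build_slots(clock_names, clock_indices, max_clks=8):
    """Place each clock into the slot selected by its clock-indices entry."""
    slots = [None] * max_clks
    for name, bit_offset in zip(clock_names, clock_indices):
        slots[bit_offset // 4] = name
    return slots


def lookup(slots, arg0):
    """Resolve a consumer specifier <&lpcg arg0>: arg0 is divided by 4."""
    idx = arg0 // 4
    if idx >= len(slots):
        return None
    return slots[idx]


# usb2_lpcg from the commit message: clocks at IMX_LPCG_CLK_6 and IMX_LPCG_CLK_7.
usb2 = build_slots(["conn_ahb_clk", "conn_ipg_clk"],
                   [IMX_LPCG_CLK[6], IMX_LPCG_CLK[7]])

# A two-clock provider in the style of adc0_lpcg: IMX_LPCG_CLK_0 and IMX_LPCG_CLK_4.
adc0 = build_slots(["per_clk", "dma_ipg_clk"],
                   [IMX_LPCG_CLK[0], IMX_LPCG_CLK[4]])
```

In this model `lookup(usb2, 0)` and `lookup(usb2, 1)` both hit an empty slot, which corresponds to the dummy enable mentioned above, while `lookup(adc0, 0)` and `lookup(adc0, 1)` both resolve to the per clock. Passing the macro values selects the intended entries in both cases.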

arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Fri Mar 22 12:47:05 2024 -0400

    arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
    
    [ Upstream commit c6ddd6e7b166532a0816825442ff60f70aed9647 ]
    
    The actual clock shows the wrong frequency:
    
       echo on >/sys/devices/platform/bus\@5b000000/5b010000.mmc/power/control
       cat /sys/kernel/debug/mmc0/ios
    
       clock:          200000000 Hz
       actual clock:   166000000 Hz
                       ^^^^^^^^^
       .....
    
    According to
    
    sdhc0_lpcg: clock-controller@5b200000 {
                    compatible = "fsl,imx8qxp-lpcg";
                    reg = <0x5b200000 0x10000>;
                    #clock-cells = <1>;
                    clocks = <&clk IMX_SC_R_SDHC_0 IMX_SC_PM_CLK_PER>,
                             <&conn_ipg_clk>, <&conn_axi_clk>;
                    clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>,
                                    <IMX_LPCG_CLK_5>;
                    clock-output-names = "sdhc0_lpcg_per_clk",
                                         "sdhc0_lpcg_ipg_clk",
                                         "sdhc0_lpcg_ahb_clk";
                    power-domains = <&pd IMX_SC_R_SDHC_0>;
            }
    
    "per_clk" should be IMX_LPCG_CLK_0 instead of IMX_LPCG_CLK_5.
    
    After correcting the clock order:
    
       echo on >/sys/devices/platform/bus\@5b000000/5b010000.mmc/power/control
       cat /sys/kernel/debug/mmc0/ios
    
       clock:          200000000 Hz
       actual clock:   198000000 Hz
                       ^^^^^^^^
       ...
    
    Fixes: 16c4ea7501b1 ("arm64: dts: imx8: switch to new lpcg clock binding")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: imx8-ss-dma: fix adc lpcg indices [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Mon Apr 1 18:25:07 2024 -0400

    arm64: dts: imx8-ss-dma: fix adc lpcg indices
    
    commit 81975080f14167610976e968e8016e92d836266f upstream.
    
    adc0_lpcg: clock-controller@5ac80000 {
            ...                                                 Col1   Col2
            clocks = <&clk IMX_SC_R_ADC_0 IMX_SC_PM_CLK_PER>, // 0      0
                     <&dma_ipg_clk>;                          // 1      4
            clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>;
    };
    
    Col1: index, which existing dts try to get.
    Col2: actual index in lpcg driver.
    
    adc0: adc@5a880000 {
            clocks = <&adc0_lpcg 0>, <&adc0_lpcg 1>;
                                 ^^              ^^
    Should be:
            clocks = <&adc0_lpcg IMX_LPCG_CLK_0>, <&adc0_lpcg IMX_LPCG_CLK_4>;
    };
    
    Arg0 is divided by 4 in lpcg driver. So adc gets IMX_SC_PM_CLK_PER by
    <&adc0_lpcg 0>, <&adc0_lpcg 1>. Although it happens to work, the code logic
    is wrong. Fix it by using correct indices.
    
    Cc: stable@vger.kernel.org
    Fixes: 1db044b25d2e ("arm64: dts: imx8dxl: add adc0 support")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: imx8-ss-dma: fix can lpcg indices [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Mon Apr 1 18:25:08 2024 -0400

    arm64: dts: imx8-ss-dma: fix can lpcg indices
    
    commit 0893392334b5dffdf616a53679c6a2942c46391b upstream.
    
    can0_lpcg: clock-controller@5acd0000 {
            ...                                                Col1  Col2
            clocks = <&clk IMX_SC_R_CAN_0 IMX_SC_PM_CLK_PER>, // 0    0
                     <&dma_ipg_clk>,                          // 1    4
                     <&dma_ipg_clk>;                          // 2    5
            clock-indices = <IMX_LPCG_CLK_0>,
                            <IMX_LPCG_CLK_4>,
                            <IMX_LPCG_CLK_5>;
    }
    
    Col1: index, which existing dts try to get.
    Col2: actual index in lpcg driver.
    
    flexcan1: can@5a8d0000 {
            clocks = <&can0_lpcg 1>, <&can0_lpcg 0>;
                                 ^^              ^^
    Should be:
            clocks = <&can0_lpcg IMX_LPCG_CLK_4>, <&can0_lpcg IMX_LPCG_CLK_0>;
    };
    
    Arg0 is divided by 4 in lpcg driver. flexcan driver gets IMX_SC_PM_CLK_PER
    by <&can0_lpcg 1> and <&can0_lpcg 0>. Although it happens to work, the code
    logic is wrong. Fix it by using correct clock indices.
    
    Cc: stable@vger.kernel.org
    Fixes: 5e7d5b023e03 ("arm64: dts: imx8qxp: add flexcan in adma")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: imx8-ss-dma: fix pwm lpcg indices [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Mon Apr 1 18:25:06 2024 -0400

    arm64: dts: imx8-ss-dma: fix pwm lpcg indices
    
    commit 9055d87bce7276234173fa90e9702af31b3f5353 upstream.
    
    adma_pwm_lpcg: clock-controller@5a590000 {
            ...                                                      Col1 Col2
            clocks = <&clk IMX_SC_R_LCD_0_PWM_0 IMX_SC_PM_CLK_PER>,// 0   0
                     <&dma_ipg_clk>;                               // 1   4
            clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>;
            ...
    };
    
    Col1: index, which existing dts try to get.
    Col2: actual index in lpcg driver.
    
    adma_pwm: pwm@5a190000 {
            ...
            clocks = <&adma_pwm_lpcg 1>, <&adma_pwm_lpcg 0>;
                                     ^^                  ^^
    Should be
            clocks = <&adma_pwm_lpcg IMX_LPCG_CLK_4>,
                     <&adma_pwm_lpcg IMX_LPCG_CLK_0>;
    };
    
    Arg0 will be divided by 4 in lpcg driver, so pwm will get IMX_SC_PM_CLK_PER
    by <&adma_pwm_lpcg 1>, <&adma_pwm_lpcg 0>. Although it happens to work, the
    code logic is wrong. Fix it by using correct indices.
    
    Cc: stable@vger.kernel.org
    Fixes: f1d6a6b991ef ("arm64: dts: imx8qxp: add adma_pwm in adma")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: imx8-ss-dma: fix spi lpcg indices [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Mon Apr 1 18:25:05 2024 -0400

    arm64: dts: imx8-ss-dma: fix spi lpcg indices
    
    commit f72b544a514c07d34a0d9d5380f5905b3731e647 upstream.
    
    spi0_lpcg: clock-controller@5a400000 {
            ...                                                  Col1   Col2
            clocks = <&clk IMX_SC_R_SPI_0 IMX_SC_PM_CLK_PER>,//   0      0
                     <&dma_ipg_clk>;                         //   1      4
            clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>;
    };
    
    Col1: index, which existing dts try to get.
    Col2: actual index in lpcg driver.
    
    lpspi0: spi@5a000000 {
            ...
            clocks = <&spi0_lpcg 0>, <&spi0_lpcg 1>;
                                 ^               ^
    Should be:
            clocks = <&spi0_lpcg IMX_LPCG_CLK_0>, <&spi0_lpcg IMX_LPCG_CLK_4>;
    };
    
    Arg0 is divided by 4 in lpcg driver. <&spi0_lpcg 0> and <&spi0_lpcg 1> are
    IMX_SC_PM_CLK_PER. Although code can work, code logic is wrong. It should
    use IMX_LPCG_CLK_0 and IMX_LPCG_CLK_4 for lpcg arg0.
    
    Cc: stable@vger.kernel.org
    Fixes: c4098885e790 ("arm64: dts: imx8dxl: add lpspi support")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: imx8-ss-lsio: fix pwm lpcg indices [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Mon Apr 1 18:25:03 2024 -0400

    arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
    
    commit 1d86c2b3946e69d6b0b93568d312aae6247847c0 upstream.
    
    lpcg's arg0 should use clock indices instead of index.
    
    pwm0_lpcg: clock-controller@5d400000 {
            ...                                                // Col1  Col2
            clocks = <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>,  // 0     0
                     <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>,  // 1     1
                     <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>,  // 2     4
                     <&lsio_bus_clk>,                          // 3     5
                     <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>;  // 4     6
            clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_1>,
                            <IMX_LPCG_CLK_4>, <IMX_LPCG_CLK_5>,
                            <IMX_LPCG_CLK_6>;
    };
    
    Col1: index, which existing dts try to get.
    Col2: actual index in lpcg driver.
    
    pwm1 {
            ....
            clocks = <&pwm1_lpcg 4>, <&pwm1_lpcg 1>;
                                 ^^              ^^
    should be:
    
            clocks = <&pwm1_lpcg IMX_LPCG_CLK_6>, <&pwm1_lpcg IMX_LPCG_CLK_1>;
    };
    
    Arg0 is divided by 4 in lpcg driver, so the pwm driver gets indices 0 and 1,
    which are the same clock as IMX_LPCG_CLK_6 and IMX_LPCG_CLK_1. It happens to
    work, but the code logic is wrong. Fix it by using correct indices.
    
    Cc: stable@vger.kernel.org
    Fixes: 23fa99b205ea ("arm64: dts: freescale: imx8-ss-lsio: add support for lsio_pwm0-3")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: imx8qm-ss-dma: fix can lpcg indices [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Mon Apr 1 18:25:09 2024 -0400

    arm64: dts: imx8qm-ss-dma: fix can lpcg indices
    
    commit 00b436182138310bb8d362b912b12a9df8f72ca3 upstream.
    
    can1_lpcg: clock-controller@5ace0000 {
            ...                                                 Col1   Col2
            clocks = <&clk IMX_SC_R_CAN_1 IMX_SC_PM_CLK_PER>,//  0       0
                     <&dma_ipg_clk>,                         //  1       4
                     <&dma_ipg_clk>;                         //  2       5
            clock-indices = <IMX_LPCG_CLK_0>,
                            <IMX_LPCG_CLK_4>,
                            <IMX_LPCG_CLK_5>;
    };
    
    Col1: index, which existing dts try to get.
    Col2: actual index in lpcg driver
    
    &flexcan2 {
            clocks = <&can1_lpcg 1>, <&can1_lpcg 0>;
                                 ^^              ^^
    Should be:
            clocks = <&can1_lpcg IMX_LPCG_CLK_4>, <&can1_lpcg IMX_LPCG_CLK_0>;
    };
    
    Arg0 is divided by 4 in lpcg driver. So flexcan gets IMX_SC_PM_CLK_PER by
    <&can1_lpcg 1> and <&can1_lpcg 0>. Although it happens to work, the code
    logic is wrong. Fix it by using correct clock indices.
    
    Cc: stable@vger.kernel.org
    Fixes: be85831de020 ("arm64: dts: imx8qm: add can node in devicetree")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
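
The index arithmetic described in this commit can be sketched in C. This is
a minimal illustration, not the driver code: lpcg_slot() is a hypothetical
stand-in for the lookup, and the IMX_LPCG_CLK_n values are assumed to be
4 * n, which matches the Col2 column in the commit message.

```c
#include <assert.h>

/* Assumption: IMX_LPCG_CLK_n expands to 4 * n. */
#define IMX_LPCG_CLK_0 0
#define IMX_LPCG_CLK_4 16
#define IMX_LPCG_CLK_5 20

/* Hypothetical stand-in for the lookup: the specifier cell is divided by 4. */
static unsigned int lpcg_slot(unsigned int arg0)
{
	return arg0 / 4;
}

/* Executable restatement of the flexcan2 example from the commit message. */
static void check_flexcan2(void)
{
	/* Old specifiers: both <&can1_lpcg 1> and <&can1_lpcg 0> hit slot 0. */
	assert(lpcg_slot(1) == 0);
	assert(lpcg_slot(0) == 0);

	/* Corrected specifiers select slot 4 and slot 0 respectively. */
	assert(lpcg_slot(IMX_LPCG_CLK_4) == 4);
	assert(lpcg_slot(IMX_LPCG_CLK_0) == 0);
}
```

Under these assumptions the old specifiers only worked because integer
division collapsed both of them onto slot 0.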

arm64: tlb: Fix TLBI RANGE operand [+ + +]

Author: Gavin Shan <gshan@redhat.com>
Date:   Fri Apr 5 13:58:50 2024 +1000

    arm64: tlb: Fix TLBI RANGE operand
    
    commit e3ba51ab24fddef79fc212f9840de54db8fd1685 upstream.
    
    KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
    pages are collected by VMM and the page table entries become write
    protected during live migration. Unfortunately, the operand passed
    to the TLBI RANGE instruction isn't correctly sorted out due to the
    commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()").
    It leads to crash on the destination VM after live migration because
    TLBs aren't flushed completely and some of the dirty pages are missed.
    
    For example, I have a VM where 8GB memory is assigned, starting from
    0x40000000 (1GB). Note that the host has 4KB as the base page size.
    In the middle of migration, kvm_tlb_flush_vmid_range() is executed
    to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to
    __kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3
    and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, aren't supported
    by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned
    from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop
    in the __flush_tlb_range_op() until the variable @scale underflows
    and becomes -9, 0xffff708000040000 is set as the operand. The operand
    is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to
    invalid @scale and @num.
    
    Fix it by extending __TLBI_RANGE_NUM() to support the combination of
    SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can
    be returned from the macro, meaning the TLBs for 0x200000 pages in the
    above example can be flushed in one shot with SCALE#3 and NUM#31. The
    macro TLBI_RANGE_MASK is dropped since no one uses it any more. The
    comments are also adjusted accordingly.
    
    Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()")
    Cc: stable@kernel.org # v6.6+
    Reported-by: Yihuang Yu <yihyu@redhat.com>
    Suggested-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Gavin Shan <gshan@redhat.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
    Link: https://lore.kernel.org/r/20240405035852.1532010-2-gshan@redhat.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
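
The NUM computation described above can be reproduced with a small C sketch.
The function names are hypothetical and the bodies are reconstructed from the
commit text (a 5-bit mask before the fix, a clamp to the NUM#31 range after
it); they are not copied from the kernel macros.

```c
#include <assert.h>

/* Pages covered by one range operation: (NUM + 1) << (5 * SCALE + 1). */
static unsigned long tlbi_range_pages(unsigned long num, int scale)
{
	return (num + 1) << (5 * scale + 1);
}

/* Pre-fix behaviour as described: mask to 5 bits, then subtract 1. */
static int range_num_old(unsigned long pages, int scale)
{
	return (int)((pages >> (5 * scale + 1)) & 0x1f) - 1;
}

/* Post-fix sketch: clamp to what NUM#31 covers at this scale. */
static int range_num_new(unsigned long pages, int scale)
{
	unsigned long max = tlbi_range_pages(31, scale);
	unsigned long clamped = pages < max ? pages : max;

	return (int)(clamped >> (5 * scale + 1)) - 1;
}

/* Executable restatement of the 0x200000-page example. */
static void check_max_range(void)
{
	unsigned long pages = tlbi_range_pages(31, 3);

	assert(pages == 0x200000UL);
	/* Before the fix every scale yields -1, so the loop never matches. */
	for (int scale = 3; scale >= 0; scale--)
		assert(range_num_old(pages, scale) == -1);
	/* After the fix SCALE#3 with NUM#31 covers the whole range. */
	assert(range_num_new(pages, 3) == 31);
}
```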

ARM: dts: imx7s-warp: Pass OV2680 link-frequencies [+ + +]

Author: Fabio Estevam <festevam@denx.de>
Date:   Thu Mar 28 12:19:54 2024 -0300

    ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
    
    commit 135f218255b28c5bbf71e9e32a49e5c734cabbe5 upstream.
    
    Since commit 63b0cd30b78e ("media: ov2680: Add bus-cfg / endpoint
    property verification") the ov2680 no longer probes on a imx7s-warp7:
    
    ov2680 1-0036: error -EINVAL: supported link freq 330000000 not found
    ov2680 1-0036: probe with driver ov2680 failed with error -22
    
    Fix it by passing the required 'link-frequencies' property as
    recommended by:
    
    https://www.kernel.org/doc/html/v6.9-rc1/driver-api/media/camera-sensor.html#handling-clocks
    
    Cc: stable@vger.kernel.org
    Fixes: 63b0cd30b78e ("media: ov2680: Add bus-cfg / endpoint property verification")
    Signed-off-by: Fabio Estevam <festevam@denx.de>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0 [+ + +]

Author: Aaro Koskinen <aaro.koskinen@iki.fi>
Date:   Fri Feb 23 20:14:35 2024 +0200

    ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0
    
    [ Upstream commit 95f37eb52e18879a1b16e51b972d992b39e50a81 ]
    
    The GPIO bank width is 32 on OMAP2, so all labels are incorrect.
    
    Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors")
    Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
    Message-ID: <20240223181439.1099750-2-aaro.koskinen@iki.fi>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Tony Lindgren <tony@atomide.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: OMAP2+: fix N810 MMC gpiod table [+ + +]

Author: Aaro Koskinen <aaro.koskinen@iki.fi>
Date:   Fri Feb 23 20:14:36 2024 +0200

    ARM: OMAP2+: fix N810 MMC gpiod table
    
    [ Upstream commit 480d44d0820dd5ae043dc97c0b46dabbe53cb1cf ]
    
    Trying to append a second table for the same dev_id doesn't seem to work.
    The second table is just silently ignored. As a result eMMC GPIOs are not
    present.
    
    Fix by using separate tables for N800 and N810.
    
    Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors")
    Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
    Message-ID: <20240223181439.1099750-3-aaro.koskinen@iki.fi>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Tony Lindgren <tony@atomide.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: OMAP2+: fix USB regression on Nokia N8x0 [+ + +]

Author: Aaro Koskinen <aaro.koskinen@iki.fi>
Date:   Fri Feb 23 20:16:56 2024 +0200

    ARM: OMAP2+: fix USB regression on Nokia N8x0
    
    [ Upstream commit 4421405e3634a3189b541cf1e34598e44260720d ]
    
    GPIO chip labels are wrong for OMAP2, so the USB does not work. Fix.
    
    Fixes: 8e0285ab95a9 ("ARM/musb: omap2: Remove global GPIO numbers from TUSB6010")
    Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Message-ID: <20240223181656.1099845-1-aaro.koskinen@iki.fi>
    Signed-off-by: Tony Lindgren <tony@atomide.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ata: libata-core: Allow command duration limits detection for ACS-4 drives [+ + +]

Author: Igor Pylypiv <ipylypiv@google.com>
Date:   Thu Apr 11 20:12:24 2024 +0000

    ata: libata-core: Allow command duration limits detection for ACS-4 drives
    
    commit c0297e7dd50795d559f3534887a6de1756b35d0f upstream.
    
    Even though the command duration limits (CDL) feature was first added
    in ACS-5 (major version 12), there are some ACS-4 (major version 11)
    drives that implement CDL as well.
    
    IDENTIFY_DEVICE, SUPPORTED_CAPABILITIES, and CURRENT_SETTINGS log pages
    are mandatory in the ACS-4 standard so it should be safe to read these
    log pages on older drives implementing the ACS-4 standard.
    
    Fixes: 62e4a60e0cdb ("scsi: ata: libata: Detect support for command duration limits")
    Cc: stable@vger.kernel.org
    Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ata: libata-scsi: Fix ata_scsi_dev_rescan() error path [+ + +]

Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Fri Apr 12 08:41:15 2024 +0900

    ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
    
    commit 79336504781e7fee5ddaf046dcc186c8dfdf60b1 upstream.
    
    Commit 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
    incorrectly handles failures of scsi_resume_device() in
    ata_scsi_dev_rescan(), leading to a double call to
    spin_unlock_irqrestore() to unlock a device port. Fix this by redefining
    the goto labels used in case of errors and only unlock the port
    scsi_scan_mutex when scsi_resume_device() fails.
    
    Bug found with the Smatch static checker warning:
    
            drivers/ata/libata-scsi.c:4774 ata_scsi_dev_rescan()
            error: double unlocked 'ap->lock' (orig line 4757)
    
    Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
    Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Niklas Cassel <cassel@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

base/node / ACPI: Enumerate node access class for 'struct access_coordinate' [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Fri Mar 8 14:59:21 2024 -0700

    base/node / ACPI: Enumerate node access class for 'struct access_coordinate'
    
    [ Upstream commit 11270e526276ffad4c4237acb393da82a3287487 ]
    
    Both generic node and HMAT handling code have been using magic numbers to
    indicate access classes for 'struct access_coordinate'. Introduce enums to
    enumerate the access0 and access1 classes shared by the two subsystems.
    Update the function parameters and callers as appropriate to utilize the
    new enum.
    
    Access0 is named ACCESS_COORDINATE_LOCAL in order to indicate that the
    access class is for 'struct access_coordinate' between a target node and
    the nearest initiator node.
    
    Access1 is named ACCESS_COORDINATE_CPU in order to indicate that the
    access class is for 'struct access_coordinate' between a target node and
    the nearest CPU node.
    
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/20240308220055.2172956-3-dave.jiang@intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: Avoid infinite loop trying to resize local TT [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Mon Feb 12 13:58:33 2024 +0100

    batman-adv: Avoid infinite loop trying to resize local TT
    
    commit b1f532a3b1e6d2e5559c7ace49322922637a28aa upstream.
    
    If the MTU of one of the attached interfaces becomes too small to transmit
    the local translation table then it must be resized to fit inside all
    fragments (when enabled) or a single packet.
    
    But if the MTU becomes too low to transmit even the header + the VLAN
    specific part then the resizing of the local TT will never succeed. This
    can for example happen when the usable space is 110 bytes and 11 VLANs are
    on top of batman-adv. In this case, at least 116 bytes would be needed.
    There will just be an endless spam of
    
       batman_adv: batadv0: Forced to purge local tt entries to fit new maximum fragment MTU (110)
    
    in the log but the function will never finish. Problem here is that the
    timeout will be halved all the time and will then stagnate at 0 and
    therefore never be able to reduce the table even more.
    
    There are other scenarios possible with a similar result. The number of
    BATADV_TT_CLIENT_NOPURGE entries in the local TT can for example be too
    high to fit inside a packet. Such a scenario can therefore happen also with
    only a single VLAN + 7 non-purgeable addresses - requiring at least 120
    bytes.
    
    While this should be handled proactively when:
    
    * interface with too low MTU is added
    * VLAN is added
    * non-purgeable local mac is added
    * MTU of an attached interface is reduced
    * fragmentation setting gets disabled (which most likely requires dropping
      attached interfaces)
    
    not all of these scenarios can be prevented because batman-adv is only
    consuming events without the possibility to prevent these actions
    (non-purgeable MAC address added, MTU of an attached interface is reduced).
    It is therefore necessary to also make sure that the code is able to handle
    situations where an incompatible system configuration is already
    present.
    
    Cc: stable@vger.kernel.org
    Fixes: a19d3d85e1b8 ("batman-adv: limit local translation table max size")
    Reported-by: syzbot+a6a4b5bb3da165594cff@syzkaller.appspotmail.com
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: fix q->blkg_list corruption during disk rebind [+ + +]

Author: Ming Lei <ming.lei@redhat.com>
Date:   Sun Apr 7 20:59:10 2024 +0800

    block: fix q->blkg_list corruption during disk rebind
    
    [ Upstream commit 8b8ace080319a866f5dfe9da8e665ae51d971c54 ]
    
    Multiple gendisk instances can be allocated/added for a single request queue
    in case of disk rebind. blkg may still stay in q->blkg_list when calling
    blkcg_init_disk() for rebind, then q->blkg_list becomes corrupted.
    
    Fix the list corruption issue by:
    
    - add blkg_init_queue() to initialize q->blkg_list & q->blkcg_mutex only
    - move calling blkg_init_queue() into blk_alloc_queue()
    
    The list corruption has likely existed since commit f1c006f1c685 ("blk-cgroup:
    synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
    which delays removing blkg from q->blkg_list into blkg_free_workfn().
    
    Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
    Fixes: 1059699f87eb ("block: move blkcg initialization/destroy into disk allocation/release handler")
    Cc: Yu Kuai <yukuai3@huawei.com>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Yu Kuai <yukuai3@huawei.com>
    Link: https://lore.kernel.org/r/20240407125910.4053377-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: Fix memory leak in hci_req_sync_complete() [+ + +]

Author: Dmitry Antipov <dmantipov@yandex.ru>
Date:   Tue Apr 2 14:32:05 2024 +0300

    Bluetooth: Fix memory leak in hci_req_sync_complete()
    
    commit 45d355a926ab40f3ae7bc0b0a00cb0e3e8a5a810 upstream.
    
    In 'hci_req_sync_complete()', always free the previous sync
    request state before assigning reference to a new one.
    
    Reported-by: syzbot+39ec16ff6cc18b1d066d@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=39ec16ff6cc18b1d066d
    Cc: stable@vger.kernel.org
    Fixes: f60cb30579d3 ("Bluetooth: Convert hci_req_sync family of function to new request API")
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: hci_sock: Fix not validating setsockopt user input [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Fri Apr 5 16:46:50 2024 -0400

    Bluetooth: hci_sock: Fix not validating setsockopt user input
    
    [ Upstream commit b2186061d6043d6345a97100460363e990af0d46 ]
    
    Check user input length before copying data.
    
    Fixes: 09572fca7223 ("Bluetooth: hci_sock: Add support for BT_{SND,RCV}BUF")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Thu Mar 28 15:58:10 2024 -0400

    Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY
    
    [ Upstream commit 53cb4197e63ab2363aa28c3029061e4d516e7626 ]
    
    Coded PHY recommended intervals are 3 times bigger than the 1M PHY so
    this aligns with that by multiplying by 3 the values given to 1M PHY
    since the code already used recommended values for that.
    
    Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: hci_sync: Use QoS to determine which PHY to scan [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Wed Feb 21 09:38:10 2024 -0500

    Bluetooth: hci_sync: Use QoS to determine which PHY to scan
    
    [ Upstream commit 22cbf4f84c00da64196eb15034feee868e63eef0 ]
    
    This uses the hci_conn QoS to determine which PHY to scan when creating
    a PA Sync.
    
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Stable-dep-of: 53cb4197e63a ("Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: ISO: Align broadcast sync_timeout with connection timeout [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Thu Mar 7 11:58:17 2024 -0500

    Bluetooth: ISO: Align broadcast sync_timeout with connection timeout
    
    [ Upstream commit 42ed95de82c01184a88945d3ca274be6a7ea607d ]
    
    This aligns broadcast sync_timeout with existing connection timeouts
    which are 20 seconds long.
    
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Stable-dep-of: b37cab587aa3 ("Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Wed Mar 13 15:43:18 2024 -0400

    Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset
    
    [ Upstream commit b37cab587aa3c9ab29c6b10aa55627dad713011f ]
    
    Consider certain values (0x00) as unset and load proper default if
    an application has not set them properly.
    
    Fixes: 0fe8c8d07134 ("Bluetooth: Split bt_iso_qos into dedicated structures")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: ISO: Fix not validating setsockopt user input [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Fri Apr 5 15:56:50 2024 -0400

    Bluetooth: ISO: Fix not validating setsockopt user input
    
    [ Upstream commit 9e8742cdfc4b0e65266bb4a901a19462bda9285e ]
    
    Check user input length before copying data.
    
    Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
    Fixes: 0731c5ab4d51 ("Bluetooth: ISO: Add support for BT_PKT_STATUS")
    Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit [+ + +]

Author: Archie Pusaka <apusaka@chromium.org>
Date:   Thu Apr 4 18:50:23 2024 +0800

    Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit
    
    [ Upstream commit 600b0bbe73d3a9a264694da0e4c2c0800309141e ]
    
    The bit is set and tested inside mgmt_device_connected(), therefore we
    must not set it just outside the function.
    
    Fixes: eeda1bf97bb5 ("Bluetooth: hci_event: Fix not indicating new connection for BIG Sync")
    Signed-off-by: Archie Pusaka <apusaka@chromium.org>
    Reviewed-by: Manish Mandlik <mmandlik@chromium.org>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: L2CAP: Fix not validating setsockopt user input [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Fri Apr 5 15:50:47 2024 -0400

    Bluetooth: L2CAP: Fix not validating setsockopt user input
    
    [ Upstream commit 4f3951242ace5efc7131932e2e01e6ac6baed846 ]
    
    Check user input length before copying data.
    
    Fixes: 33575df7be67 ("Bluetooth: move l2cap_sock_setsockopt() to l2cap_sock.c")
    Fixes: 3ee7b7cd8390 ("Bluetooth: Add BT_MODE socket option")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: RFCOMM: Fix not validating setsockopt user input [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Fri Apr 5 15:43:45 2024 -0400

    Bluetooth: RFCOMM: Fix not validating setsockopt user input
    
    [ Upstream commit a97de7bff13b1cc825c1b1344eaed8d6c2d3e695 ]
    
    syzbot reported rfcomm_sock_setsockopt_old() is copying data without
    checking user input length.
    
    BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset
    include/linux/sockptr.h:49 [inline]
    BUG: KASAN: slab-out-of-bounds in copy_from_sockptr
    include/linux/sockptr.h:55 [inline]
    BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt_old
    net/bluetooth/rfcomm/sock.c:632 [inline]
    BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt+0x893/0xa70
    net/bluetooth/rfcomm/sock.c:673
    Read of size 4 at addr ffff8880209a8bc3 by task syz-executor632/5064
    
    Fixes: 9f2c8a03fbb3 ("Bluetooth: Replace RFCOMM link mode with security level")
    Fixes: bb23c0ab8246 ("Bluetooth: Add support for deferring RFCOMM connection setup")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
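
The class of bug fixed by this and the other setsockopt commits in this
release can be sketched in userspace C. copy_opt_checked() is a hypothetical
helper, not a kernel API; it only illustrates the rule "check the user
supplied length before copying".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical length-checked option copy: refuse the copy when the caller
 * supplied fewer bytes than the destination needs, instead of reading past
 * the end of the source buffer.
 */
static int copy_opt_checked(void *dst, size_t dst_size,
			    const void *src, size_t src_size)
{
	assert(dst != NULL && src != NULL);

	if (src_size < dst_size)
		return -EINVAL;
	memcpy(dst, src, dst_size);
	return 0;
}
```

A caller that passes a 1 byte buffer for a 4 byte option gets -EINVAL rather
than the 4 byte out-of-bounds read shown in the KASAN report.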

Bluetooth: SCO: Fix not validating setsockopt user input [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Fri Apr 5 15:41:52 2024 -0400

    Bluetooth: SCO: Fix not validating setsockopt user input
    
    [ Upstream commit 51eda36d33e43201e7a4fd35232e069b2c850b01 ]
    
    syzbot reported sco_sock_setsockopt() is copying data without
    checking user input length.
    
    BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset
    include/linux/sockptr.h:49 [inline]
    BUG: KASAN: slab-out-of-bounds in copy_from_sockptr
    include/linux/sockptr.h:55 [inline]
    BUG: KASAN: slab-out-of-bounds in sco_sock_setsockopt+0xc0b/0xf90
    net/bluetooth/sco.c:893
    Read of size 4 at addr ffff88805f7b15a3 by task syz-executor.5/12578
    
    Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option")
    Fixes: b96e9c671b05 ("Bluetooth: Add BT_DEFER_SETUP option to sco socket")
    Fixes: 00398e1d5183 ("Bluetooth: Add support for BT_PKT_STATUS CMSG data for SCO connections")
    Fixes: f6873401a608 ("Bluetooth: Allow setting of codec for HFP offload use case")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bnxt_en: Fix error recovery for RoCE ulp client [+ + +]

Author: Vikas Gupta <vikas.gupta@broadcom.com>
Date:   Fri Apr 5 16:55:12 2024 -0700

    bnxt_en: Fix error recovery for RoCE ulp client
    
    [ Upstream commit b5ea7d33ba2a42b95b4298d08d2af9cdeeaf0090 ]
    
    Since runtime MSIXs vector allocation/free has been removed,
    the L2 driver needs to repopulate the MSIX entries for the
    ulp client as the irq table may change during the recovery
    process.
    
    Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation")
    Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
    Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
    Signed-off-by: Michael Chan <michael.chan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init() [+ + +]

Author: Vikas Gupta <vikas.gupta@broadcom.com>
Date:   Fri Apr 5 16:55:11 2024 -0700

    bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()
    
    [ Upstream commit 7ac10c7d728d75bc9daaa8fade3c7a3273b9a9ff ]
    
    If ulp = kzalloc() fails, the allocated edev will leak because it is
    not properly assigned and the cleanup path will not be able to free it.
    Fix it by assigning it properly immediately after allocation.
    
    Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation")
    Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
    Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
    Signed-off-by: Michael Chan <michael.chan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
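
The leak pattern can be modelled in plain C. All types and function names
below are hypothetical; the sketch assumes a cleanup routine that frees only
what is reachable through the private structure, which is why the pointer is
published immediately after allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct edev_model {
	void *ulp;
};

struct aux_priv_model {
	struct edev_model *edev;
};

/* Cleanup path: frees only what is reachable through priv. */
static void aux_release(struct aux_priv_model *priv)
{
	assert(priv != NULL);

	if (priv->edev) {
		free(priv->edev->ulp);
		free(priv->edev);
		priv->edev = NULL;
	}
}

/* fail_ulp simulates the second allocation failing. */
static int aux_init(struct aux_priv_model *priv, int fail_ulp)
{
	struct edev_model *edev = calloc(1, sizeof(*edev));

	if (!edev)
		return -ENOMEM;
	/* Publish the pointer right away so aux_release() can free it. */
	priv->edev = edev;

	edev->ulp = fail_ulp ? NULL : calloc(1, 16);
	if (!edev->ulp)
		return -ENOMEM;	/* the caller is expected to run aux_release() */
	return 0;
}
```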

bnxt_en: Reset PTP tx_avail after possible firmware reset [+ + +]

Author: Pavan Chebbi <pavan.chebbi@broadcom.com>
Date:   Fri Apr 5 16:55:13 2024 -0700

    bnxt_en: Reset PTP tx_avail after possible firmware reset
    
    [ Upstream commit faa12ca245585379d612736a4b5e98e88481ea59 ]
    
    It is possible that during error recovery and firmware reset,
    there is a pending TX PTP packet waiting for the timestamp.
    We need to reset this condition so that after recovery, the
    tx_avail count for PTP is reset back to the initial value.
    Otherwise, we may not accept any PTP TX timestamps after
    recovery.
    
    Fixes: 118612d519d8 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods")
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
    Signed-off-by: Michael Chan <michael.chan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans [+ + +]

Author: Boris Burkov <boris@bur.io>
Date:   Thu Mar 21 10:18:39 2024 -0700

    btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans
    
    commit 211de93367304ab395357f8cb12568a4d1e20701 upstream.
    
    The transaction is only able to free PERTRANS reservations for a root
    once that root has been recorded with the TRANS tag on the roots radix
    tree. Therefore, until we are sure that this root will get tagged, it
    isn't safe to convert. Generally, this is not an issue as *some*
    transaction will likely tag the root before long and this reservation
    will get freed in that transaction, but technically it could stick
    around until unmount and result in a warning about leaked metadata
    reservation space.
    
    This path is most exercised by running the generic/269 fstest with
    CONFIG_BTRFS_DEBUG.
    
    Fixes: a6496849671a ("btrfs: fix start transaction qgroup rsv double free")
    CC: stable@vger.kernel.org # 6.6+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: qgroup: correctly model root qgroup rsv in convert [+ + +]

Author: Boris Burkov <boris@bur.io>
Date:   Tue Mar 19 10:54:22 2024 -0700

    btrfs: qgroup: correctly model root qgroup rsv in convert
    
    commit 141fb8cd206ace23c02cd2791c6da52c1d77d42a upstream.
    
    We use add_root_meta_rsv and sub_root_meta_rsv to track prealloc and
    pertrans reservations for subvolumes when quotas are enabled. The
    convert function does not properly increment pertrans after decrementing
    prealloc, so the count is not accurate.
    
    Note: we check that the fs is not read-only to mirror the logic in
    qgroup_convert_meta, which checks that before adding to the pertrans rsv.
    
    Fixes: 8287475a2055 ("btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space")
    CC: stable@vger.kernel.org # 6.1+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations [+ + +]

Author: Boris Burkov <boris@bur.io>
Date:   Thu Mar 21 10:02:04 2024 -0700

    btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations
    
    commit 74e97958121aa1f5854da6effba70143f051b0cd upstream.
    
    Create subvolume, create snapshot and delete subvolume all use
    btrfs_subvolume_reserve_metadata() to reserve metadata for the changes
    done to the parent subvolume's fs tree, which cannot be mediated in the
    normal way via start_transaction. When quota groups (squota or qgroups)
    are enabled, this reserves qgroup metadata of type PREALLOC. Once the
    operation is associated to a transaction, we convert PREALLOC to
    PERTRANS, which gets cleared in bulk at the end of the transaction.
    
    However, the error paths of these three operations were not implementing
    this lifecycle correctly. They unconditionally converted the PREALLOC to
    PERTRANS in a generic cleanup step regardless of errors or whether the
    operation was fully associated to a transaction or not. This resulted in
    error paths occasionally converting this rsv to PERTRANS without calling
    record_root_in_trans successfully, which meant that unless that root got
    recorded in the transaction by some other thread, the end of the
    transaction would not free that root's PERTRANS, leaking it. Ultimately,
    this resulted in hitting a WARN in CONFIG_BTRFS_DEBUG builds at unmount
    for the leaked reservation.
    
    The fix is to ensure that every qgroup PREALLOC reservation observes the
    following properties:
    
    1. any failure before record_root_in_trans is called successfully
       results in freeing the PREALLOC reservation.
    2. after record_root_in_trans, we convert to PERTRANS, and now the
       transaction owns freeing the reservation.
    
    This patch enforces those properties on the three operations. Without
    it, generic/269 with squotas enabled at mkfs time would fail in ~5-10
    runs on my system. With this patch, it ran successfully 1000 times in a
    row.
    
    Fixes: e85fde5162bf ("btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations")
    CC: stable@vger.kernel.org # 6.1+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
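
The two properties listed in this commit can be expressed as a small
accounting model in C. Everything here is hypothetical (struct, field, and
function names); it is not btrfs code, only a sketch of why converting to
PERTRANS without a recorded root leaks the reservation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-root accounting; field names are illustrative only. */
struct root_rsv {
	long prealloc;	/* bytes reserved before joining a transaction */
	long pertrans;	/* bytes the transaction frees at commit */
	bool recorded;	/* root was recorded in the running transaction */
};

static void reserve_prealloc(struct root_rsv *r, long bytes)
{
	assert(bytes >= 0);
	r->prealloc += bytes;
}

/* Settle a reservation when the operation ends, on success or on error. */
static void settle(struct root_rsv *r, long bytes)
{
	assert(bytes >= 0 && bytes <= r->prealloc);
	r->prealloc -= bytes;
	if (r->recorded)
		r->pertrans += bytes;	/* property 2: transaction owns it now */
	/* property 1: otherwise the bytes are simply released */
}

/* Commit frees PERTRANS only for recorded roots; returns leaked bytes. */
static long commit(struct root_rsv *r)
{
	if (r->recorded)
		r->pertrans = 0;
	r->recorded = false;
	return r->pertrans;
}
```

In this model commit() returns 0 on both paths. Adding to pertrans while
recorded is false would leave a non-zero remainder, which corresponds to the
leak warning at unmount.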

btrfs: record delayed inode root in transaction [+ + +]

Author: Boris Burkov <boris@bur.io>
Date:   Thu Mar 21 10:14:24 2024 -0700

    btrfs: record delayed inode root in transaction
    
    commit 71537e35c324ea6fbd68377a4f26bb93a831ae35 upstream.
    
    When running delayed inode updates, we do not record the inode's root in
    the transaction, but we do allocate PREALLOC and thus converted PERTRANS
    space for it. To be sure we free that PERTRANS meta rsv, we must ensure
    that we record the root in the transaction.
    
    Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item")
    CC: stable@vger.kernel.org # 6.1+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: tests: allocate dummy fs_info and root in test_find_delalloc() [+ + +]

Author: David Sterba <dsterba@suse.com>
Date:   Mon Jan 29 19:04:33 2024 +0100

    btrfs: tests: allocate dummy fs_info and root in test_find_delalloc()
    
    commit b2136cc288fce2f24a92f3d656531b2d50ebec5a upstream.
    
    Allocate fs_info and root to have a valid fs_info pointer in case it's
    dereferenced by a helper outside of tests, like find_lock_delalloc_range().
    
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE [+ + +]

Author: NeilBrown <neilb@suse.de>
Date:   Mon Mar 25 09:21:20 2024 +1100

    ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
    
    commit b372e96bd0a32729d55d27f613c8bc80708a82e1 upstream.
    
    The page has been marked clean before writepage is called.  If we don't
    redirty it before postponing the write, it might never get written.
    
    Cc: stable@vger.kernel.org
    Fixes: 503d4fa6ee28 ("ceph: remove reliance on bdi congestion")
    Signed-off-by: NeilBrown <neilb@suse.de>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Reviewed-by: Xiubo Li <xiubli@redhat.org>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ceph: switch to use cap_delay_lock for the unlink delay list [+ + +]

Author: Xiubo Li <xiubli@redhat.com>
Date:   Tue Apr 9 08:56:03 2024 +0800

    ceph: switch to use cap_delay_lock for the unlink delay list
    
    commit 17f8dc2db52185460f212052f3a692c1fdc167ba upstream.
    
    The same list item will be used in both cap_delay_list and
    cap_unlink_delay_list, so it's buggy to use two different locks
    to protect them.
    
    Cc: stable@vger.kernel.org
    Fixes: dbc347ef7f0c ("ceph: add ceph_cap_unlink_work to fire check_caps() immediately")
    Link: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AODC76VXRAMXKLFDCTK4TKFDDPWUSCN5
    Reported-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de>
    Signed-off-by: Xiubo Li <xiubli@redhat.com>
    Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
    Tested-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Tue Mar 19 11:15:08 2024 -0700

    cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
    
    [ Upstream commit 5c88a9ccd4c431d58b532e4158b6999a8350062c ]
    
    In the error path, map->reg_type is used for a kernel warning
    before its value is set up. Found by code inspection. The user-visible
    effect is a wrong reg_type being emitted in the kernel log. Use a
    local variable for reg_type and retrieve the value for that use.
    
    Fixes: 6c7f4f1e51c2 ("cxl/core/regs: Make cxl_map_{component, device}_regs() device generic")
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cxl/core: Fix initialization of mbox_cmd.size_out in get event [+ + +]

Author: Kwangjin Ko <kwangjin.ko@sk.com>
Date:   Tue Apr 2 17:14:03 2024 +0900

    cxl/core: Fix initialization of mbox_cmd.size_out in get event
    
    [ Upstream commit f7c52345ccc96343c0a05bdea3121c8ac7b67d5f ]
    
    Since mbox_cmd.size_out is overwritten with the actual output size in
    the function below, it needs to be initialized every time.
    
    cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
    
    Problem scenario:
    
    1) The size_out variable is initially set to the size of the mailbox.
    2) Read an event.
       - size_out is set to 160 bytes (header 32B + one event 128B).
       - Two events are created while reading.
    3) Read the new *two* events.
       - size_out is still set to 160 bytes.
       - Although the value of out_len is 288 bytes, only 160 bytes are
         copied from the mailbox register to the local variable.
       - record_count is set to 2.
       - Accessing records[1] will result in reading incorrect data.
    
    Fixes: 6ebe28f9ec72 ("cxl/mem: Read, trace, and clear events on driver load")
    Tested-by: Ira Weiny <ira.weiny@intel.com>
    Reviewed-by: Ira Weiny <ira.weiny@intel.com>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
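
Illustration (not part of the upstream commit): a minimal userspace sketch of the scenario above, assuming the send path copies at most size_out bytes and then overwrites size_out with the number of bytes copied. The names mbox_cmd, fake_send_cmd(), second_read_size() and MBOX_SIZE are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HDR_SIZE   32u    /* header size quoted in the commit message */
#define EVENT_SIZE 128u   /* size of one event record, per the commit message */
#define MBOX_SIZE  1024u  /* hypothetical mailbox payload size */

/* Hypothetical, simplified stand-in for a mailbox command. */
struct mbox_cmd {
    unsigned char *payload_out;
    size_t size_out;  /* in: buffer capacity, out: bytes actually copied */
};

/*
 * Simulated device with `pending` events ready. It copies at most
 * size_out bytes and then overwrites size_out with the copied length.
 */
static void fake_send_cmd(struct mbox_cmd *cmd, unsigned int pending)
{
    static unsigned char mbox[MBOX_SIZE];
    size_t out_len = HDR_SIZE + (size_t)pending * EVENT_SIZE;
    size_t n = out_len < cmd->size_out ? out_len : cmd->size_out;

    assert(out_len <= MBOX_SIZE);
    memset(mbox, 0xAB, out_len);
    memcpy(cmd->payload_out, mbox, n);
    cmd->size_out = n;
}

/* Returns the number of bytes copied by the second read. */
size_t second_read_size(int reinit_each_time)
{
    static unsigned char buf[MBOX_SIZE];
    struct mbox_cmd cmd = { .payload_out = buf, .size_out = MBOX_SIZE };

    fake_send_cmd(&cmd, 1);        /* first read: one event, 160 bytes */
    if (reinit_each_time)
        cmd.size_out = MBOX_SIZE;  /* reset the capacity before every send */
    fake_send_cmd(&cmd, 2);        /* second read: two events, 288 bytes */
    return cmd.size_out;
}
```

In this sketch, second_read_size(0) reproduces the truncated 160-byte copy, while second_read_size(1) copies the full 288 bytes.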

cxl/mem: Fix for the index of Clear Event Record Handle [+ + +]

Author: Yuquan Wang <wangyuquan1236@phytium.com.cn>
Date:   Mon Mar 18 10:29:28 2024 +0800

    cxl/mem: Fix for the index of Clear Event Record Handle
    
    [ Upstream commit b7c59b038c656214f56432867056997c2e0fc268 ]
    
    The dev_dbg info for Clear Event Records mailbox command would report
    the handle of the next record to clear, not the current one.
    
    This was because the index 'i' had incremented before printing the
    current handle value.
    
    Fixes: 6ebe28f9ec72 ("cxl/mem: Read, trace, and clear events on driver load")
    Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Fan Ni <fan.ni@samsung.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
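
Illustration (not part of the upstream commit): a small userspace sketch of the ordering problem described above. The function stage_handles() and its parameters are hypothetical; the `reported` array stands in for what a debug message would print at each step.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy handles into an outgoing payload and record which handle a debug
 * message would report at each step. `log_after_increment` selects the
 * ordering where the index is bumped before the value is read back.
 * `payload` must have room for n + 1 entries, because that ordering
 * reads one slot past the last handle written.
 */
size_t stage_handles(const unsigned short *src, size_t n,
                     unsigned short *payload, unsigned short *reported,
                     int log_after_increment)
{
    size_t i = 0;

    assert(src && payload && reported);
    for (size_t k = 0; k < n; k++) {
        payload[i] = src[k];
        if (log_after_increment) {
            i++;
            reported[k] = payload[i];  /* next slot, not yet written */
        } else {
            reported[k] = payload[i];  /* the slot just written */
            i++;
        }
    }
    return i;
}
```

With a zero-initialized payload, the first ordering reports stale slot contents, while the second reports exactly the handles that were staged.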

cxl: Fix retrieving of access_coordinates in PCIe path [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Wed Apr 3 08:47:13 2024 -0700

    cxl: Fix retrieving of access_coordinates in PCIe path
    
    [ Upstream commit 592780b8391fe31f129ef4823c1513528f4dcb76 ]
    
    Current loop in cxl_endpoint_get_perf_coordinates() incorrectly assumes
    the Root Port (RP) dport is the one with generic port access_coordinate.
    However those coordinates are one level up in the Host Bridge (HB).
    The current code makes the computation pick up 0s as the coordinates,
    which causes the minimum bandwidth to come out as 0.
    
    Add check to skip RP when combining coordinates.
    
    Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports")
    Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Link: https://lore.kernel.org/r/20240403154844.3403859-3-dave.jiang@intel.com
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates() [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Wed Apr 3 08:47:12 2024 -0700

    cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
    
    [ Upstream commit 648dae58a830ecceea3b1bebf68432435980f137 ]
    
    The while() loop in cxl_endpoint_get_perf_coordinates() checks to see if
    'iter' is valid as part of the condition breaking out of the loop.
    is_cxl_root() will stop the loop before the next iteration could go NULL.
    Remove the iter check.
    
    The presence of the iter or removing the iter does not impact the behavior
    of the code. This is a code clean up and not a bug fix.
    
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Link: https://lore.kernel.org/r/20240403154844.3403859-2-dave.jiang@intel.com
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cxl: Split out combine_coordinates() for common shared usage [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Fri Mar 8 14:59:24 2024 -0700

    cxl: Split out combine_coordinates() for common shared usage
    
    [ Upstream commit 032f7b37adff6985e22516053698b77131c2ce96 ]
    
    Refactor the common code for combining coordinates in order to reduce
    duplication. Create a new function, cxl_coordinates_combine(), to
    combine two 'struct access_coordinate'.
    
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/20240308220055.2172956-6-dave.jiang@intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
    Signed-off-by: Sasha Levin <sashal@kernel.org>
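
Illustration (not part of the upstream commit): a userspace sketch of what combining two sets of coordinates could look like. It assumes bandwidth is limited by the slower segment (minimum) and latencies accumulate (sum); the struct layout and the name coordinates_combine() are hypothetical mirrors, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical userspace mirror of the fields discussed here. */
struct access_coordinate {
    unsigned int read_bandwidth;
    unsigned int write_bandwidth;
    unsigned int read_latency;
    unsigned int write_latency;
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/*
 * Combine two path segments. Assumption: the slower segment bounds the
 * bandwidth, and the latencies of both segments add up.
 */
void coordinates_combine(struct access_coordinate *out,
                         const struct access_coordinate *c1,
                         const struct access_coordinate *c2)
{
    assert(out && c1 && c2);
    out->read_bandwidth = min_u(c1->read_bandwidth, c2->read_bandwidth);
    out->write_bandwidth = min_u(c1->write_bandwidth, c2->write_bandwidth);
    out->read_latency = c1->read_latency + c2->read_latency;
    out->write_latency = c1->write_latency + c2->write_latency;
}
```

Under these assumptions, combining with an all-zero segment drives the resulting bandwidth to 0, which matches the symptom described in the dependent fix 592780b8391f.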

cxl: Split out host bridge access coordinates [+ + +]

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Fri Mar 8 14:59:25 2024 -0700

    cxl: Split out host bridge access coordinates
    
    [ Upstream commit 863027d40993f13155451bd898bfe4c4e9b7002f ]
    
    The difference between access class 0 and access class 1 for 'struct
    access_coordinate', if any, is that class 0 is for the distance from
    the target to the closest initiator and that class 1 is for the distance
    from the target to the closest CPU. For CXL memory, the nearest initiator
    may not necessarily be a CPU node. The performance path from the CXL
    endpoint to the host bridge should remain the same. However, the numbers
    extracted from HMAT and stored differ between the two access
    classes. Split out the performance numbers for the host bridge (generic
    target) from the calculation of the entire path in order to allow
    calculation of both access classes for a CXL region.
    
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/20240308220055.2172956-7-dave.jiang@intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Stable-dep-of: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/display: always reset ODM mode in context when adding first plane [+ + +]

Author: Wenjing Liu <wenjing.liu@amd.com>
Date:   Fri Mar 22 15:02:45 2024 -0400

    drm/amd/display: always reset ODM mode in context when adding first plane
    
    commit 81901d8d0472e9a19d294ae1dea76b950548195d upstream.
    
    [why]
    In the current implementation, ODM mode is only reset when the last plane is
    removed from dc state. For any dc validate we will always remove all
    current planes and add new planes. However when switching from no planes
    to 1 plane, ODM mode is not reset because no planes get removed. This
    has caused an issue where we kept ODM combine when it should have been
    removed when a plane is added. The change is to reset ODM mode when
    adding the first plane.
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
    Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Do not recursively call manual trigger programming [+ + +]

Author: Dillon Varone <dillon.varone@amd.com>
Date:   Thu Mar 21 13:49:43 2024 -0400

    drm/amd/display: Do not recursively call manual trigger programming
    
    commit 953927587f37b731abdeabe46ad44a3b3ec67a52 upstream.
    
    [WHY&HOW]
    We should not be recursively calling the manual trigger programming function when
    FAMS is not in use.
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
    Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Dillon Varone <dillon.varone@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: fix disable otg wa logic in DCN316 [+ + +]

Author: Fudongwang <fudong.wang@amd.com>
Date:   Tue Mar 26 16:03:16 2024 +0800

    drm/amd/display: fix disable otg wa logic in DCN316
    
    commit cf79814cb0bf5749b9f0db53ca231aa540c02768 upstream.
    
    [Why]
    Wrong logic causes screen corruption.
    
    [How]
    Port logic from DCN35/314.
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
    Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Fudongwang <fudong.wang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4 [+ + +]

Author: Harry Wentland <harry.wentland@amd.com>
Date:   Tue Mar 12 11:55:52 2024 -0400

    drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4
    
    commit 9e61ef8d219877202d4ee51d0d2ad9072c99a262 upstream.
    
    In order for display colorimetry to work correctly on DP displays
    we need to send the VSC SDP packet. We should only do so for
    panels with DPCD revision greater or equal to 1.4 as older
    receivers might have problems with it.
    
    Cc: stable@vger.kernel.org
    Cc: Joshua Ashton <joshua@froggi.es>
    Cc: Xaver Hugl <xaver.hugl@gmail.com>
    Cc: Melissa Wen <mwen@igalia.com>
    Cc: Agustin Gutierrez <Agustin.Gutierrez@amd.com>
    Reviewed-by: Agustin Gutierrez <agustin.gutierrez@amd.com>
    Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Harry Wentland <harry.wentland@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Return max resolution supported by DWB [+ + +]

Author: Alex Hung <alex.hung@amd.com>
Date:   Sat Mar 23 12:02:54 2024 -0600

    drm/amd/display: Return max resolution supported by DWB
    
    commit 2cc69a10d83180f3de9f5afe3a98e972b1453d4c upstream.
    
    mode_config's max width x height is 4096x2160 and is higher than DWB's
    max resolution 3840x2160 which is returned instead.
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Harry Wentland <harry.wentland@amd.com>
    Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Alex Hung <alex.hung@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST [+ + +]

Author: Harry Wentland <harry.wentland@amd.com>
Date:   Thu Mar 21 11:13:38 2024 -0400

    drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST
    
    commit c3e2a5f2da904a18661335e8be2b961738574998 upstream.
    
    The previous check for the is_vsc_sdp_colorimetry_supported flag
    for MST sink signals did nothing. Simplify the code and use the
    same check for MST and SST.
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Agustin Gutierrez <agustin.gutierrez@amd.com>
    Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Harry Wentland <harry.wentland@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11 [+ + +]

Author: Tim Huang <Tim.Huang@amd.com>
Date:   Wed Mar 27 13:10:37 2024 +0800

    drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11
    
    commit 31729e8c21ecfd671458e02b6511eb68c2225113 upstream.
    
    While doing multiple S4 stress tests, GC/RLC/PMFW get into
    an invalid state, resulting in hard hangs.
    
    Adding a GFX reset as workaround just before sending the
    MP1_UNLOAD message avoids this failure.
    
    Signed-off-by: Tim Huang <Tim.Huang@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: Mario Limonciello <superm1@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu/umsch: reinitialize write pointer in hw init [+ + +]

Author: Lang Yu <Lang.Yu@amd.com>
Date:   Mon Mar 25 13:24:31 2024 +0800

    drm/amdgpu/umsch: reinitialize write pointer in hw init
    
    commit 0f1bbcc2bab25d5fb2dfb1ee3e08131437690d3d upstream.
    
    Otherwise the old one will be used during GPU reset.
    That's not expected.
    
    Signed-off-by: Lang Yu <Lang.Yu@amd.com>
    Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu/vpe: power on vpe when hw_init [+ + +]

Author: Peyton Lee <peytolee@amd.com>
Date:   Wed Mar 13 16:53:49 2024 +0800

    drm/amdgpu/vpe: power on vpe when hw_init
    
    commit eed14eb48ee176fe0144c6a999d00c855d0b199b upstream.
    
    Power on VPE during hw_init to fix a mode2 reset failure.
    
    Signed-off-by: Peyton Lee <peytolee@amd.com>
    Reviewed-by: Lang Yu <lang.yu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: "Gong, Richard" <richard.gong@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: always force full reset for SOC21 [+ + +]

Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Sat Mar 23 20:46:53 2024 -0400

    drm/amdgpu: always force full reset for SOC21
    
    commit 65ff8092e4802f96d87d3d7cde146961f5228265 upstream.
    
    There are cases where soft reset seems to succeed, but
    does not, so always use mode1/2 for now.
    
    Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: differentiate external rev id for gfx 11.5.0 [+ + +]

Author: Yifan Zhang <yifan1.zhang@amd.com>
Date:   Sun Apr 7 22:01:35 2024 +0800

    drm/amdgpu: differentiate external rev id for gfx 11.5.0
    
    commit 6dba20d23e85034901ccb765a7ca71199bcca4df upstream.
    
    This patch differentiates the external rev id for gfx 11.5.0.
    
    Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
    Reviewed-by: Tim Huang <Tim.Huang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: fix incorrect number of active RBs for gfx11 [+ + +]

Author: Tim Huang <Tim.Huang@amd.com>
Date:   Wed Apr 3 17:28:44 2024 +0800

    drm/amdgpu: fix incorrect number of active RBs for gfx11
    
    commit bbca7f414ae9a12ea231cdbafd79c607e3337ea8 upstream.
    
    The RB bitmap should be global active RB bitmap &
    active RB bitmap based on active SA.
    
    Signed-off-by: Tim Huang <Tim.Huang@amd.com>
    Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
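
Illustration (not part of the upstream commit): a userspace sketch of the bitmap rule stated above, where the effective RB bitmap is the bitwise AND of the global active RB bitmap and the RB bitmap derived from the active SAs. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective RBs: those active globally AND reachable through an active SA. */
uint32_t effective_rb_bitmap(uint32_t global_active_rb,
                             uint32_t rb_from_active_sa)
{
    return global_active_rb & rb_from_active_sa;
}

/* Count the set bits, i.e. the number of active RBs in a bitmap. */
unsigned int count_active_rbs(uint32_t bitmap)
{
    unsigned int n = 0;

    while (bitmap) {
        bitmap &= bitmap - 1;  /* clear the lowest set bit */
        n++;
    }
    assert(n <= 32);
    return n;
}
```

Counting bits in only one of the two inputs would over-report the number of active RBs whenever the other mask disables some of them.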

drm/amdgpu: Reset dGPU if suspend got aborted [+ + +]

Author: Lijo Lazar <lijo.lazar@amd.com>
Date:   Wed Feb 14 17:55:54 2024 +0530

    drm/amdgpu: Reset dGPU if suspend got aborted
    
    commit 8b2be55f4d6c1099d7f629b0ed7535a5be788c83 upstream.
    
    For SOC21 ASICs, there is an issue in re-enabling PM features if a
    suspend got aborted. In such cases, reset the device during resume
    phase. This is a workaround till a proper solution is finalized.
    
    Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdkfd: Reset GPU on queue preemption failure [+ + +]

Author: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Date:   Tue Mar 26 15:32:46 2024 -0400

    drm/amdkfd: Reset GPU on queue preemption failure
    
    commit 8bdfb4ea95ca738d33ef71376c21eba20130f2eb upstream.
    
    Currently, with F32 HWS, a GPU reset is triggered only when unmapping
    queues fails.
    
    However, if a compute queue doesn't respond to a preemption request in
    time, unmap will return without any error. In this case, only the
    preemption error is logged and a reset is not triggered. Call GPU reset
    in this case as well.
    
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
    Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/ast: Fix soft lockup [+ + +]

Author: Jammy Huang <jammy_huang@aspeedtech.com>
Date:   Wed Apr 3 17:02:46 2024 +0800

    drm/ast: Fix soft lockup
    
    commit bc004f5038220b1891ef4107134ccae44be55109 upstream.
    
    There is a while-loop in ast_dp_set_on_off() that could lead to an
    infinite loop. This is because the register, VGACRI-Dx, checked in
    this API is a scratch register actually controlled by a MCU, named
    DPMCU, in BMC.
    
    These scratch registers are protected by scu-lock. If scu-lock is not
    off, DPMCU cannot update these registers and the host then hits a soft
    lockup because the status is never updated.
    
    DPMCU is used to control DP and the related registers to handshake with
    host's VGA driver. Even the most time-consuming task, DP's link
    training, is less than 100ms. 200ms should be enough.
    
    Signed-off-by: Jammy Huang <jammy_huang@aspeedtech.com>
    Fixes: 594e9c04b586 ("drm/ast: Create the driver for ASPEED proprietory Display-Port")
    Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
    Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
    Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: KuoHsiang Chou <kuohsiang_chou@aspeedtech.com>
    Cc: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: Dave Airlie <airlied@redhat.com>
    Cc: Jocelyn Falempe <jfalempe@redhat.com>
    Cc: dri-devel@lists.freedesktop.org
    Cc: <stable@vger.kernel.org> # v5.19+
    Link: https://patchwork.freedesktop.org/patch/msgid/20240403090246.1495487-1-jammy_huang@aspeedtech.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
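
Illustration (not part of the upstream commit): a userspace sketch of replacing an unbounded status poll with a bounded one. The callback type and the helpers never_ready() and ready_after() are hypothetical; a real driver would also sleep between polls, so that a budget of 200 polls at 1ms each would correspond to the 200ms mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*status_fn)(void *ctx);

/* Poll up to max_polls times; true if the status was seen, false on timeout. */
bool poll_with_budget(status_fn ready, void *ctx, unsigned int max_polls)
{
    assert(ready);
    for (unsigned int i = 0; i < max_polls; i++) {
        if (ready(ctx))
            return true;
        /* a driver would sleep here between polls */
    }
    return false;
}

/* Hypothetical status source that never updates (e.g. a locked register). */
bool never_ready(void *ctx)
{
    (void)ctx;
    return false;
}

/* Hypothetical status source that becomes ready after *ctx more polls. */
bool ready_after(void *ctx)
{
    unsigned int *left = ctx;

    if (*left == 0)
        return true;
    (*left)--;
    return false;
}
```

With a budget, a status that never changes yields a timeout the caller can handle, instead of spinning forever.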

drm/client: Fully protect modes[] with dev->mode_config.mutex [+ + +]

Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Thu Apr 4 23:33:25 2024 +0300

    drm/client: Fully protect modes[] with dev->mode_config.mutex
    
    commit 3eadd887dbac1df8f25f701e5d404d1b90fd0fea upstream.
    
    The modes[] array contains pointers to modes on the connectors'
    mode lists, which are protected by dev->mode_config.mutex.
    Thus we need to extend the same protection to modes[], or by the
    time we use it the elements may already be pointing to
    freed/reused memory.
    
    Cc: stable@vger.kernel.org
    Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10583
    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240404203336.10454-2-ville.syrjala@linux.intel.com
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Jani Nikula <jani.nikula@intel.com>
    Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915/cdclk: Fix CDCLK programming order when pipes are active [+ + +]

Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Tue Apr 2 18:50:03 2024 +0300

    drm/i915/cdclk: Fix CDCLK programming order when pipes are active
    
    commit 7b1f6b5aaec0f849e19c3e99d4eea75876853cdd upstream.
    
    Currently we always reprogram CDCLK from the
    intel_set_cdclk_pre_plane_update() when using squash/crawl.
    The code only works correctly for the cd2x update or full
    modeset cases, and it was simply never updated to deal with
    squash/crawl.
    
    If the CDCLK frequency is increasing we must reprogram it
    before we do anything else that might depend on the new
    higher frequency, and conversely we must not decrease
    the frequency until everything that might still depend
    on the old higher frequency has been dealt with.
    
    Since cdclk_state->pipe is only relevant when doing a cd2x
    update we can't use it to determine the correct sequence
    during squash/crawl. To that end introduce cdclk_state->disable_pipes
    which simply indicates that we must perform the update
    while the pipes are disabled (i.e. during
    intel_set_cdclk_pre_plane_update()). Otherwise we use the
    same old vs. new CDCLK frequency comparison as for cd2x
    updates.
    
    The only remaining problem case is when the voltage_level
    needs to increase due to a DDI port, but the CDCLK frequency
    is decreasing (and not all pipes are being disabled). The
    current approach will not bump the voltage level up until
    after the port has already been enabled, which is too late.
    But we'll take care of that case separately.
    
    v2: Don't break the "must disable pipes case"
    v3: Keep the on stack 'pipe' for future use
    
    Cc: stable@vger.kernel.org
    Fixes: d62686ba3b54 ("drm/i915/adl_p: CDCLK crawl support for ADL")
    Reviewed-by: Uma Shankar <uma.shankar@intel.com>
    Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240402155016.13733-2-ville.syrjala@linux.intel.com
    (cherry picked from commit 3aecee90ac12a351905f12dda7643d5b0676d6ca)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
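
Illustration (not part of the upstream commit): a userspace sketch of the ordering rule described above. The enum and cdclk_update_phase() are hypothetical; treating an unchanged frequency like an increase is an arbitrary choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum cdclk_phase {
    CDCLK_PRE_PLANE_UPDATE,   /* reprogram before anything depends on it */
    CDCLK_POST_PLANE_UPDATE   /* reprogram after old dependants are gone */
};

/*
 * Decide when to reprogram CDCLK:
 *  - if the update requires the pipes to be disabled, do it in the
 *    pre-plane-update step;
 *  - otherwise raise the frequency first, and lower it last.
 */
enum cdclk_phase cdclk_update_phase(unsigned int old_khz,
                                    unsigned int new_khz,
                                    bool disable_pipes)
{
    assert(old_khz > 0 && new_khz > 0);  /* sketch assumes valid clocks */
    if (disable_pipes)
        return CDCLK_PRE_PLANE_UPDATE;
    return new_khz >= old_khz ? CDCLK_PRE_PLANE_UPDATE
                              : CDCLK_POST_PLANE_UPDATE;
}
```

The sketch deliberately leaves out the remaining problem case from the commit message, where the voltage level must rise while the frequency drops.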

drm/i915/psr: Disable PSR when bigjoiner is used [+ + +]

Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Apr 5 00:34:26 2024 +0300

    drm/i915/psr: Disable PSR when bigjoiner is used
    
    commit e3d4ead4d48c05355bd3b99c8162428f68c3c1a5 upstream.
    
    Bigjoiner seems to be causing all kinds of grief to the PSR
    code currently. I don't believe there is any hardware issue
    but the code is simply not handling this correctly. For now
    just disable PSR when bigjoiner is needed.
    
    Cc: stable@vger.kernel.org
    Link: https://patchwork.freedesktop.org/patch/msgid/20240404213441.17637-3-ville.syrjala@linux.intel.com
    Reviewed-by: Arun R Murthy <arun.r.mruthy@intel.com>
    Acked-by: Jouni Högander <jouni.hogander@intel.com>
    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    (cherry picked from commit 372fa0c79d3f289f813d8001e0a8a96d1011826c)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915/vrr: Disable VRR when using bigjoiner [+ + +]

Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Apr 5 00:34:29 2024 +0300

    drm/i915/vrr: Disable VRR when using bigjoiner
    
    commit dcd8992e47f13afb5c11a61e8d9c141c35e23751 upstream.
    
    All joined pipes share the same transcoder/timing generator.
    Currently we just do the commits per-pipe, which doesn't really
    work if we need to switch between non-VRR and VRR timing
    generators on the fly, or even when sending the push to the
    transcoder. For now just disable VRR when bigjoiner is needed.
    
    Cc: stable@vger.kernel.org
    Tested-by: Vidya Srinivas <vidya.srinivas@intel.com>
    Reviewed-by: Vandita Kulkarni <vandita.kulkarni@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240404213441.17637-6-ville.syrjala@linux.intel.com
    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    (cherry picked from commit f9d5e51db65652dbd8a2102fd7619440e3599fd2)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915: Disable live M/N updates when using bigjoiner [+ + +]

Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Apr 5 00:34:28 2024 +0300

    drm/i915: Disable live M/N updates when using bigjoiner
    
    commit 4a36e46df7aa781c756f09727d37dc2783f1ee75 upstream.
    
    All joined pipes share the same transcoder/timing generator.
    Currently we just do the commits per-pipe, which doesn't really
    work if we need to change the timings at the same time. For
    now just disable live M/N updates when bigjoiner is needed.
    
    Cc: stable@vger.kernel.org
    Tested-by: Vidya Srinivas <vidya.srinivas@intel.com>
    Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240404213441.17637-5-ville.syrjala@linux.intel.com
    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    (cherry picked from commit ef79820db723a2a7c229a7251c12859e7e25a247)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915: Disable port sync when bigjoiner is used [+ + +]

Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Apr 5 00:34:27 2024 +0300

    drm/i915: Disable port sync when bigjoiner is used
    
    commit 0653d501409eeb9f1deb7e4c12e4d0d2c9f1cba1 upstream.
    
    The current modeset sequence can't handle port sync and bigjoiner
    at the same time. Refuse port sync when bigjoiner is needed,
    at least until we fix the modeset sequence.
    
    v2: Add a FIXME (Vandite)
    
    Cc: stable@vger.kernel.org
    Tested-by: Vidya Srinivas <vidya.srinivas@intel.com>
    Reviewed-by: Vandita Kulkarni <vandita.kulkarni@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240404213441.17637-4-ville.syrjala@linux.intel.com
    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    (cherry picked from commit b37e1347b991459c38c56ec2476087854a4f720b)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/msm/adreno: Set highest_bank_bit for A619 [+ + +]

Author: Luca Weiss <luca.weiss@fairphone.com>
Date:   Thu Mar 28 09:02:45 2024 +0100

    drm/msm/adreno: Set highest_bank_bit for A619
    
    [ Upstream commit 9dc23cba0927d09cb481da064c8413eb9df42e2b ]
    
    The default highest_bank_bit of 15 didn't seem to cause issues so far
    but downstream defines it to be 14. But similar to [0] leaving it on 14
    (or 15 for that matter) causes some corruption issues with some
    resolutions with DisplayPort, like 1920x1200.
    
    So set it to 13 for now so that there's no screen corruption.
    
    [0] commit 6a0dbcd20ef2 ("drm/msm/a6xx: set highest_bank_bit to 13 for a610")
    
    Fixes: b7616b5c69e6 ("drm/msm/adreno: Add A619 support")
    Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
    Patchwork: https://patchwork.freedesktop.org/patch/585215/
    Signed-off-by: Rob Clark <robdclark@chromium.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dp: fix runtime PM leak on connect failure [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Wed Mar 13 17:43:06 2024 +0100

    drm/msm/dp: fix runtime PM leak on connect failure
    
    commit e86750b01a1560f198e4b3e21bb3f78bfd5bb2c3 upstream.
    
    Make sure to balance the runtime PM usage counter (and suspend) before
    returning on connect failures (e.g. DPCD read failures after a spurious
    connect event or if link training fails).
    
    Fixes: 5814b8bf086a ("drm/msm/dp: incorporate pm_runtime framework into DP driver")
    Cc: stable@vger.kernel.org      # 6.8
    Cc: Kuogee Hsieh <quic_khsieh@quicinc.com>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/582746/
    Link: https://lore.kernel.org/r/20240313164306.23133-3-johan+linaro@kernel.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/msm/dp: fix runtime PM leak on disconnect [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Wed Mar 13 17:43:05 2024 +0100

    drm/msm/dp: fix runtime PM leak on disconnect
    
    commit 0640f47b742667fca6aac174f7cd62b6c2c7532c upstream.
    
    Make sure to put the runtime PM usage count (and suspend) also when
    receiving a disconnect event while in the ST_MAINLINK_READY state.
    
    This specifically avoids leaking a runtime PM usage count on every
    disconnect with display servers that do not automatically enable
    external displays when receiving a hotplug notification.
    
    Fixes: 5814b8bf086a ("drm/msm/dp: incorporate pm_runtime framework into DP driver")
    Cc: stable@vger.kernel.org      # 6.8
    Cc: Kuogee Hsieh <quic_khsieh@quicinc.com>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/582744/
    Link: https://lore.kernel.org/r/20240313164306.23133-2-johan+linaro@kernel.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/msm/dpu: don't allow overriding data from catalog [+ + +]

Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Thu Mar 14 03:10:41 2024 +0200

    drm/msm/dpu: don't allow overriding data from catalog
    
    [ Upstream commit 4f3b77ae5ff5b5ba9d99c5d5450db388dbee5107 ]
    
    The data from catalog is marked as const, so it is a part of the RO
    segment. Allowing userspace to write to it through debugfs can cause
    protection faults. Set debugfs file mode to read-only for debug entries
    corresponding to perf_cfg coming from catalog.
    
    Fixes: abda0d925f9c ("drm/msm/dpu: Mark various data tables as const")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/582844/
    Link: https://lore.kernel.org/r/20240314-dpu-perf-rework-v3-1-79fa4e065574@linaro.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dpu: make error messages at dpu_core_irq_register_callback() more sensible [+ + +]

Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Sat Mar 30 05:53:22 2024 +0200

    drm/msm/dpu: make error messages at dpu_core_irq_register_callback() more sensible
    
    [ Upstream commit 8844f467d6a58dc915f241e81c46e0c126f8c070 ]
    
    There is little point in using %ps to print a value known to be NULL. On
    the other hand it makes sense to print the callback symbol in the
    'invalid IRQ' message. Correct those two error messages to make more
    sense.
    
    Fixes: 6893199183f8 ("drm/msm/dpu: stop using raw IRQ indices in the kernel output")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/585565/
    Link: https://lore.kernel.org/r/20240330-dpu-irq-messages-v1-1-9ce782ae35f9@linaro.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm: Add newlines to some debug prints [+ + +]

Author: Stephen Boyd <swboyd@chromium.org>
Date:   Mon Mar 25 14:08:09 2024 -0700

    drm/msm: Add newlines to some debug prints
    
    [ Upstream commit c588f7d67044d6d59ef92d75a970b64929984d89 ]
    
    These debug prints are missing newlines, leading to multiple messages
    being printed on one line and hard to read logs. Add newlines to have
    the debug prints on separate lines. The DBG macro used to add a newline,
    but I missed that while migrating to drm_dbg wrappers.
    
    Fixes: 7cb017db1896 ("drm/msm: Move FB debug prints to drm_dbg_state()")
    Fixes: 721c6e0c6aed ("drm/msm: Move vblank debug prints to drm_dbg_vbl()")
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/584769/
    Link: https://lore.kernel.org/r/20240325210810.1340820-1-swboyd@chromium.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr() [+ + +]

Author: Boris Brezillon <boris.brezillon@collabora.com>
Date:   Fri Jan 5 21:46:11 2024 +0300

    drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()
    
    commit 1fc9af813b25e146d3607669247d0f970f5a87c3 upstream.
    
    If some of the pages or sgt allocation failed, we shouldn't release the
    pages ref we got earlier, otherwise we will end up with unbalanced
    get/put_pages() calls. We should instead leave everything in place
    and let the BO release function deal with extra cleanup when the object
    is destroyed, or let the fault handler try again next time it's called.
    
    Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
    Cc: <stable@vger.kernel.org>
    Reviewed-by: Steven Price <steven.price@arm.com>
    Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
    Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
    Co-developed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
    Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240105184624.508603-18-dmitry.osipenko@collabora.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/vmwgfx: Enable DMA mappings with SEV [+ + +]

Author: Zack Rusin <zack.rusin@broadcom.com>
Date:   Sun Apr 7 22:28:02 2024 -0400

    drm/vmwgfx: Enable DMA mappings with SEV
    
    commit 4c08f01934ab67d1d283d5cbaa52b923abcfe4cd upstream.
    
    Enable DMA mappings in vmwgfx after TTM has been fixed in commit
    3bf3710e3718 ("drm/ttm: Add a generic TTM memcpy move for page-based iomem")
    
    This enables full guest-backed memory support and in particular allows
    usage of screen targets as the presentation mechanism.
    
    Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
    Reported-by: Ye Li <ye.li@broadcom.com>
    Tested-by: Ye Li <ye.li@broadcom.com>
    Fixes: 3b0d6458c705 ("drm/vmwgfx: Refuse DMA operation when SEV encryption is active")
    Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
    Cc: dri-devel@lists.freedesktop.org
    Cc: <stable@vger.kernel.org> # v6.6+
    Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240408022802.358641-1-zack.rusin@broadcom.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/xe/display: Fix double mutex initialization [+ + +]

Author: Lucas De Marchi <lucas.demarchi@intel.com>
Date:   Fri Apr 5 13:07:11 2024 -0700

    drm/xe/display: Fix double mutex initialization
    
    [ Upstream commit 50a9b7fc151e67b9e642232d32e8c5a5ac13e64a ]
    
    All of these mutexes are already initialized by the display side since
    commit 3fef3e6ff86a ("drm/i915: move display mutex inits to display
    code"), so the xe shouldn't initialize them.
    
    Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: Arun R Murthy <arun.r.murthy@intel.com>
    Reviewed-by: Jani Nikula <jani.nikula@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240405200711.2041428-1-lucas.demarchi@intel.com
    Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
    (cherry picked from commit 117de185edf2c5767f03575219bf7a43b161ff0d)
    Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe/hwmon: Cast result to output precision on left shift of operand [+ + +]

Author: Karthik Poosa <karthik.poosa@intel.com>
Date:   Fri Apr 5 18:31:27 2024 +0530

    drm/xe/hwmon: Cast result to output precision on left shift of operand
    
    [ Upstream commit a8ad8715472bb8f6a2ea8b4072a28151eb9f4f24 ]
    
    Address potential overflow in result of left shift of a
    lower precision (u32) operand before assignment to higher
    precision (u64) variable.
    
    v2:
     - Update commit message. (Himal)
    
    Fixes: 4446fcf220ce ("drm/xe/hwmon: Expose power1_max_interval")
    Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
    Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
    Cc: Badal Nilawar <badal.nilawar@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240405130127.1392426-5-karthik.poosa@intel.com
    Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
    (cherry picked from commit 883232b47b81108b0252197c747f396ecd51455a)
    Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
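
The overflow described in the drm/xe/hwmon entry above can be reproduced in plain C. This is a minimal user-space sketch, not the driver code: both function names are hypothetical, and it assumes a platform where `unsigned int` is 32 bits wide.

```c
#include <stdint.h>

/*
 * The shift is evaluated in 32 bits, so bits shifted past bit 31 are
 * lost before the result is widened to 64 bits.
 */
uint64_t shift_then_widen(uint32_t x, unsigned int n)
{
	return x << n;
}

/*
 * The operand is cast to the output precision first, so the shift is
 * evaluated in 64 bits and no bits are lost.
 */
uint64_t widen_then_shift(uint32_t x, unsigned int n)
{
	return (uint64_t)x << n;
}
```

With `x = 0x80000000` and `n = 1`, the first variant yields 0 while the second yields 0x100000000, which is the kind of difference the cast is meant to avoid.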

dt-bindings: display/msm: sm8150-mdss: add DP node [+ + +]

Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Tue Apr 2 05:57:15 2024 +0300

    dt-bindings: display/msm: sm8150-mdss: add DP node
    
    [ Upstream commit be1b7acb929137e3943fe380671242beb485190c ]
    
    As Qualcomm SM8150 got support for the DisplayPort, add displayport@
    node as a valid child to the MDSS node.
    
    Fixes: 88806318e2c2 ("dt-bindings: display: msm: dp: declare compatible string for sm8150")
    Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Patchwork: https://patchwork.freedesktop.org/patch/586156/
    Link: https://lore.kernel.org/r/20240402-fd-fix-schema-v3-1-817ea6ddf775@linaro.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get() [+ + +]

Author: Jens Wiklander <jens.wiklander@linaro.org>
Date:   Mon Mar 11 12:07:00 2024 +0100

    firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get()
    
    [ Upstream commit 1a4bd2b128fb5ca62e4d1c5ca298d3d06b9c1e8e ]
    
    FFA_NOTIFICATION_INFO_GET retrieves information about pending
    notifications. Notifications can be either global or per VCPU. Global
    notifications are reported with the partition ID only in the list of
    endpoints with pending notifications. ffa_notification_info_get()
    incorrectly expects no ID at all for global notifications. Fix this by
    checking for ID = 1 instead of ID = 0.
    
    Fixes: 3522be48d82b ("firmware: arm_ffa: Implement the NOTIFICATION_INFO_GET interface")
    Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
    Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
    Link: https://lore.kernel.org/r/20240311110700.2367142-1-jens.wiklander@linaro.org
    Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

firmware: arm_scmi: Make raw debugfs entries non-seekable [+ + +]

Author: Cristian Marussi <cristian.marussi@arm.com>
Date:   Fri Mar 15 14:03:24 2024 +0000

    firmware: arm_scmi: Make raw debugfs entries non-seekable
    
    [ Upstream commit b70c7996d4ffb2e02895132e8a79a37cee66504f ]
    
    SCMI raw debugfs entries are used to inject and snoop messages out of the
    SCMI core and, as such, the underlying virtual files have no reason to
    support seeking.
    
    Modify the related file_operations descriptors to be non-seekable.
    
    Fixes: 3c3d818a9317 ("firmware: arm_scmi: Add core raw transmission support")
    Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
    Link: https://lore.kernel.org/r/20240315140324.231830-1-cristian.marussi@arm.com
    Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fs/proc: remove redundant comments from /proc/bootconfig [+ + +]

Author: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Date:   Mon Apr 8 21:43:57 2024 -0700

    fs/proc: remove redundant comments from /proc/bootconfig
    
    commit fbbdc255fbee59b4207a5398fdb4f04590681a79 upstream.
    
    commit 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to
    /proc/bootconfig") adds bootloader argument comments into /proc/bootconfig.
    
    /proc/bootconfig shows boot_command_line[] multiple times following
    every xbc key value pair, that's duplicated and not necessary.
    Remove redundant ones.
    
    Output before and after the fix is like:
    key1 = value1
    *bootloader argument comments*
    key2 = value2
    *bootloader argument comments*
    key3 = value3
    *bootloader argument comments*
    ...
    
    key1 = value1
    key2 = value2
    key3 = value3
    *bootloader argument comments*
    ...
    
    Link: https://lore.kernel.org/all/20240409044358.1156477-1-paulmck@kernel.org/
    
    Fixes: 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to /proc/bootconfig")
    Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: <linux-trace-kernel@vger.kernel.org>
    Cc: <linux-fsdevel@vger.kernel.org>
    Cc: stable@vger.kernel.org
    Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs/proc: Skip bootloader comment if no embedded kernel parameters [+ + +]

Author: Masami Hiramatsu <mhiramat@kernel.org>
Date:   Mon Apr 8 21:43:58 2024 -0700

    fs/proc: Skip bootloader comment if no embedded kernel parameters
    
    commit c722cea208789d9e2660992bcd05fb9fac3adb56 upstream.
    
    If the "bootconfig" kernel command-line argument was specified or if
    the kernel was built with CONFIG_BOOT_CONFIG_FORCE, but there are
    no embedded kernel parameters, omit the "# Parameters from bootloader:"
    comment from the /proc/bootconfig file.  This will cause automation
    to fall back to the /proc/cmdline file, which will be identical to the
    comment in this no-embedded-kernel-parameters case.
    
    Link: https://lore.kernel.org/all/20240409044358.1156477-2-paulmck@kernel.org/
    
    Fixes: 8b8ce6c75430 ("fs/proc: remove redundant comments from /proc/bootconfig")
    Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: stable@vger.kernel.org
    Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

geneve: fix header validation in geneve[6]_xmit_skb [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Apr 5 10:30:34 2024 +0000

    geneve: fix header validation in geneve[6]_xmit_skb
    
    [ Upstream commit d8a6213d70accb403b82924a1c229e733433a5ef ]
    
    syzbot is able to trigger an uninit-value in geneve_xmit() [1]
    
    Problem : While most ip tunnel helpers (like ip_tunnel_get_dsfield())
    use skb_protocol(skb, true), pskb_inet_may_pull() is only using
    skb->protocol.
    
    If anything other than ETH_P_IPV6 or ETH_P_IP is found in skb->protocol,
    pskb_inet_may_pull() does nothing at all.
    
    If a vlan tag was provided by the caller (af_packet in the syzbot case),
    the network header might not point to the correct location, and skb
    linear part could be smaller than expected.
    
    Add skb_vlan_inet_prepare() to perform a complete mac validation.
    
    Use this in geneve for the moment, I suspect we need to adopt this
    more broadly.
    
    v4 - Jakub reported v3 broke l2_tos_ttl_inherit.sh selftest
       - Only call __vlan_get_protocol() for vlan types.
    Link: https://lore.kernel.org/netdev/20240404100035.3270a7d5@kernel.org/
    
    v2,v3 - Addressed Sabrina comments on v1 and v2
    Link: https://lore.kernel.org/netdev/Zg1l9L2BNoZWZDZG@hog/
    
    [1]
    
    BUG: KMSAN: uninit-value in geneve_xmit_skb drivers/net/geneve.c:910 [inline]
     BUG: KMSAN: uninit-value in geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030
      geneve_xmit_skb drivers/net/geneve.c:910 [inline]
      geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030
      __netdev_start_xmit include/linux/netdevice.h:4903 [inline]
      netdev_start_xmit include/linux/netdevice.h:4917 [inline]
      xmit_one net/core/dev.c:3531 [inline]
      dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547
      __dev_queue_xmit+0x348d/0x52c0 net/core/dev.c:4335
      dev_queue_xmit include/linux/netdevice.h:3091 [inline]
      packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
      packet_snd net/packet/af_packet.c:3081 [inline]
      packet_sendmsg+0x8bb0/0x9ef0 net/packet/af_packet.c:3113
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      __sys_sendto+0x685/0x830 net/socket.c:2191
      __do_sys_sendto net/socket.c:2203 [inline]
      __se_sys_sendto net/socket.c:2199 [inline]
      __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Uninit was created at:
      slab_post_alloc_hook mm/slub.c:3804 [inline]
      slab_alloc_node mm/slub.c:3845 [inline]
      kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
      __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
      alloc_skb include/linux/skbuff.h:1318 [inline]
      alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
      sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
      packet_alloc_skb net/packet/af_packet.c:2930 [inline]
      packet_snd net/packet/af_packet.c:3024 [inline]
      packet_sendmsg+0x722d/0x9ef0 net/packet/af_packet.c:3113
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      __sys_sendto+0x685/0x830 net/socket.c:2191
      __do_sys_sendto net/socket.c:2203 [inline]
      __se_sys_sendto net/socket.c:2199 [inline]
      __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    CPU: 0 PID: 5033 Comm: syz-executor346 Not tainted 6.9.0-rc1-syzkaller-00005-g928a87efa423 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
    
    Fixes: d13f048dd40e ("net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb")
    Reported-by: syzbot+9ee20ec1de7b3168db09@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/netdev/000000000000d19c3a06152f9ee4@google.com/
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Phillip Potter <phil@philpotter.co.uk>
    Cc: Sabrina Dubroca <sd@queasysnail.net>
    Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
    Reviewed-by: Phillip Potter <phil@philpotter.co.uk>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
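
Why a VLAN tag moves the network header, and why the length must be checked at each step, can be sketched on a raw Ethernet frame. This is a user-space illustration only: `demo_inner_proto()` and the `DEMO_*` constants are hypothetical names and are not the kernel's skb_vlan_inet_prepare().

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN     14      /* dst MAC + src MAC + EtherType */
#define DEMO_VLAN_HLEN    4       /* TCI + encapsulated EtherType */
#define DEMO_ETH_P_8021Q  0x8100
#define DEMO_ETH_P_8021AD 0x88A8

/*
 * Return the EtherType found after any VLAN tags and store the offset
 * of the network header in *l3_off. Returns 0 if the frame is too
 * short to hold the headers it claims to carry.
 */
uint16_t demo_inner_proto(const uint8_t *frame, size_t len, size_t *l3_off)
{
	size_t off = DEMO_ETH_HLEN;
	uint16_t proto;

	if (len < off)
		return 0;
	proto = (uint16_t)((frame[12] << 8) | frame[13]);

	while (proto == DEMO_ETH_P_8021Q || proto == DEMO_ETH_P_8021AD) {
		/* Never read a tag that is not fully present in the buffer. */
		if (len < off + DEMO_VLAN_HLEN)
			return 0;
		/* The encapsulated EtherType is the last 2 bytes of the tag. */
		proto = (uint16_t)((frame[off + 2] << 8) | frame[off + 3]);
		off += DEMO_VLAN_HLEN;
	}

	*l3_off = off;
	return proto;
}
```

For a frame with one 802.1Q tag, the network header starts at offset 18 rather than 14, so code that looks only at the outer EtherType would both miss the IP protocol and misplace the header.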

io_uring/net: restore msg_control on sendzc retry [+ + +]

Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Mon Apr 8 18:11:09 2024 +0100

    io_uring/net: restore msg_control on sendzc retry
    
    commit 4fe82aedeb8a8cb09bfa60f55ab57b5c10a74ac4 upstream.
    
    cac9e4418f4cb ("io_uring/net: save msghdr->msg_control for retries")
    reinstates msg_control before every __sys_sendmsg_sock(), since the
    function can overwrite the value in msghdr. We need to do the same for
    zerocopy sendmsg.
    
    Cc: stable@vger.kernel.org
    Fixes: 493108d95f146 ("io_uring/net: zerocopy sendmsg")
    Link: https://github.com/axboe/liburing/issues/1067
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/cc1d5d9df0576fa66ddad4420d240a98a020b267.1712596179.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
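
The save-and-restore pattern from the io_uring/net entry above can be modelled outside the kernel. This is a rough sketch under stated assumptions: `struct demo_msghdr`, `demo_lower_send()` and `demo_send()` are hypothetical stand-ins, and the lower layer is simply assumed to overwrite the control pointer on every call.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct msghdr; only the field of interest. */
struct demo_msghdr {
	void *control;
};

/*
 * Hypothetical lower layer: counts the calls that still see a control
 * buffer, then overwrites the field, and asks for a retry (-1) while
 * *retries_left is positive.
 */
static int demo_lower_send(struct demo_msghdr *msg, int *retries_left,
			   int *saw_control)
{
	if (msg->control != NULL)
		(*saw_control)++;
	msg->control = NULL;
	if (*retries_left > 0) {
		(*retries_left)--;
		return -1;
	}
	return 0;
}

/*
 * Send with retries. Returns how many attempts saw a valid control
 * pointer. With restore != 0 the saved pointer is put back before
 * every attempt.
 */
int demo_send(struct demo_msghdr *msg, int retries, int restore)
{
	void *saved = msg->control;
	int saw_control = 0;

	do {
		if (restore)
			msg->control = saved;
	} while (demo_lower_send(msg, &retries, &saw_control) != 0);

	return saw_control;
}
```

With two forced retries, every one of the three attempts sees the control buffer when it is restored, but only the first attempt does when it is not.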

io_uring: disable io-wq execution of multishot NOWAIT requests [+ + +]

Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Apr 1 11:30:06 2024 -0600

    io_uring: disable io-wq execution of multishot NOWAIT requests
    
    Commit bee1d5becdf5bf23d4ca0cd9c6b60bdf3c61d72b upstream.
    
    Do the same check for direct io-wq execution for multishot requests that
    commit 2a975d426c82 did for the inline execution, and disable multishot
    mode (and revert to single shot) if the file type doesn't support NOWAIT,
    and isn't opened in O_NONBLOCK mode. For multishot to work properly, it's
    a requirement that nonblocking read attempts can be done.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring: refactor DEFER_TASKRUN multishot checks [+ + +]

Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Fri Mar 8 13:55:57 2024 +0000

    io_uring: refactor DEFER_TASKRUN multishot checks
    
    Commit e0e4ab52d17096d96c21a6805ccd424b283c3c6d upstream.
    
    We disallow DEFER_TASKRUN multishots from running by io-wq, which is
    checked by individual opcodes in the issue path. We can consolidate it
    all in io_wq_submit_work(), at the same time moving the checks out of
    the hot path.
    
    Suggested-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/e492f0f11588bb5aa11d7d24e6f53b7c7628afdb.1709905727.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iommu/vt-d: Allocate local memory for page request queue [+ + +]

Author: Jacob Pan <jacob.jun.pan@linux.intel.com>
Date:   Thu Apr 11 11:07:43 2024 +0800

    iommu/vt-d: Allocate local memory for page request queue
    
    [ Upstream commit a34f3e20ddff02c4f12df2c0635367394e64c63d ]
    
    The page request queue is per IOMMU; its allocation should be made
    NUMA-aware for performance reasons.
    
    Fixes: a222a7f0bb6c ("iommu/vt-d: Implement page request handling")
    Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Link: https://lore.kernel.org/r/20240403214007.985600-1-jacob.jun.pan@linux.intel.com
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/vt-d: Fix WARN_ON in iommu probe path [+ + +]

Author: Lu Baolu <baolu.lu@linux.intel.com>
Date:   Thu Apr 11 11:07:44 2024 +0800

    iommu/vt-d: Fix WARN_ON in iommu probe path
    
    [ Upstream commit 89436f4f54125b1297aec1f466efd8acb4ec613d ]
    
    Commit 1a75cc710b95 ("iommu/vt-d: Use rbtree to track iommu probed
    devices") adds all devices probed by the iommu driver in a rbtree
    indexed by the source ID of each device. It assumes that each device
    has a unique source ID. This assumption is incorrect and the VT-d
    spec doesn't state this requirement either.
    
    The reason for using a rbtree to track devices is to look up the device
    with PCI bus and devfunc in the paths of handling ATS invalidation time
    out error and the PRI I/O page faults. Both are PCI ATS feature related.
    
    Only track the devices that have PCI ATS capabilities in the rbtree to
    avoid unnecessary WARN_ON in the iommu probe path. Otherwise, on some
    platforms the kernel splat below will be displayed and the iommu probe
    will fail.
    
     WARNING: CPU: 3 PID: 166 at drivers/iommu/intel/iommu.c:158 intel_iommu_probe_device+0x319/0xd90
     Call Trace:
      <TASK>
      ? __warn+0x7e/0x180
      ? intel_iommu_probe_device+0x319/0xd90
      ? report_bug+0x1f8/0x200
      ? handle_bug+0x3c/0x70
      ? exc_invalid_op+0x18/0x70
      ? asm_exc_invalid_op+0x1a/0x20
      ? intel_iommu_probe_device+0x319/0xd90
      ? debug_mutex_init+0x37/0x50
      __iommu_probe_device+0xf2/0x4f0
      iommu_probe_device+0x22/0x70
      iommu_bus_notifier+0x1e/0x40
      notifier_call_chain+0x46/0x150
      blocking_notifier_call_chain+0x42/0x60
      bus_notify+0x2f/0x50
      device_add+0x5ed/0x7e0
      platform_device_add+0xf5/0x240
      mfd_add_devices+0x3f9/0x500
      ? preempt_count_add+0x4c/0xa0
      ? up_write+0xa2/0x1b0
      ? __debugfs_create_file+0xe3/0x150
      intel_lpss_probe+0x49f/0x5b0
      ? pci_conf1_write+0xa3/0xf0
      intel_lpss_pci_probe+0xcf/0x110 [intel_lpss_pci]
      pci_device_probe+0x95/0x120
      really_probe+0xd9/0x370
      ? __pfx___driver_attach+0x10/0x10
      __driver_probe_device+0x73/0x150
      driver_probe_device+0x19/0xa0
      __driver_attach+0xb6/0x180
      ? __pfx___driver_attach+0x10/0x10
      bus_for_each_dev+0x77/0xd0
      bus_add_driver+0x114/0x210
      driver_register+0x5b/0x110
      ? __pfx_intel_lpss_pci_driver_init+0x10/0x10 [intel_lpss_pci]
      do_one_initcall+0x57/0x2b0
      ? kmalloc_trace+0x21e/0x280
      ? do_init_module+0x1e/0x210
      do_init_module+0x5f/0x210
      load_module+0x1d37/0x1fc0
      ? init_module_from_file+0x86/0xd0
      init_module_from_file+0x86/0xd0
      idempotent_init_module+0x17c/0x230
      __x64_sys_finit_module+0x56/0xb0
      do_syscall_64+0x6e/0x140
      entry_SYSCALL_64_after_hwframe+0x71/0x79
    
    Fixes: 1a75cc710b95 ("iommu/vt-d: Use rbtree to track iommu probed devices")
    Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10689
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Link: https://lore.kernel.org/r/20240407011429.136282-1-baolu.lu@linux.intel.com
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/vt-d: Fix wrong use of pasid config [+ + +]

Author: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Date:   Thu Apr 11 11:07:42 2024 +0800

    iommu/vt-d: Fix wrong use of pasid config
    
    [ Upstream commit 5b3625a4f6422e8982f90f0c11b5546149c962b8 ]
    
    The commit "iommu/vt-d: Add IOMMU perfmon support" introduced the IOMMU
    PMU feature, but uses the wrong config when setting the pasid filter.
    
    Fixes: 7232ab8b89e9 ("iommu/vt-d: Add IOMMU perfmon support")
    Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
    Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
    Link: https://lore.kernel.org/r/20240401060753.3321318-1-xuchun.shang@linux.alibaba.com
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv4/route: avoid unused-but-set-variable warning [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Mon Apr 8 09:42:03 2024 +0200

    ipv4/route: avoid unused-but-set-variable warning
    
    [ Upstream commit cf1b7201df59fb936f40f4a807433fe3f2ce310a ]
    
    The log_martians variable is only used in an #ifdef, causing a 'make W=1'
    warning with gcc:
    
    net/ipv4/route.c: In function 'ip_rt_send_redirect':
    net/ipv4/route.c:880:13: error: variable 'log_martians' set but not used [-Werror=unused-but-set-variable]
    
    Change the #ifdef to an equivalent IS_ENABLED() to let the compiler
    see where the variable is used.
    
    Fixes: 30038fc61adf ("net: ip_rt_send_redirect() optimization")
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240408074219.3030256-2-arnd@kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
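
The difference between hiding a use behind #ifdef and testing the option in an ordinary `if` can be shown in user space. The macros below are a simplified imitation of the preprocessor trick behind IS_ENABLED(), written under the assumption that options are either defined to 1 or left undefined. All `DEMO_*` and `CONFIG_DEMO_*` names are hypothetical, and module (=m) handling is left out.

```c
#include <stdbool.h>

/* Expands to 1 if the option is defined to 1, and to 0 otherwise. */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED3(arg1_or_junk) DEMO_SECOND(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED2(val) DEMO_IS_DEFINED3(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED2(option)

#define CONFIG_DEMO_VERBOSE 1	/* hypothetical option, enabled */
/* CONFIG_DEMO_QUIET is intentionally left undefined. */

/*
 * 'log' is referenced in a normal expression, so the compiler always
 * sees a use, even when the option is off and the branch is dead code.
 */
int count_logged(int events, bool want_log)
{
	bool log = want_log;
	int logged = 0;

	for (int i = 0; i < events; i++) {
		if (DEMO_IS_ENABLED(CONFIG_DEMO_VERBOSE) && log)
			logged++;
	}
	return logged;
}

/* An undefined option evaluates to 0 instead of breaking the build. */
int quiet_enabled(void)
{
	return DEMO_IS_ENABLED(CONFIG_DEMO_QUIET);
}
```

Because the condition is a compile-time constant, the disabled branch can still be optimized away, while the variable no longer triggers a set-but-unused warning.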

ipv6: fib: hide unused 'pn' variable [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Mon Apr 8 09:42:02 2024 +0200

    ipv6: fib: hide unused 'pn' variable
    
    [ Upstream commit 74043489fcb5e5ca4074133582b5b8011b67f9e7 ]
    
    When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing
    a 'make W=1' warning:
    
    net/ipv6/ip6_fib.c: In function 'fib6_add':
    net/ipv6/ip6_fib.c:1388:32: error: variable 'pn' set but not used [-Werror=unused-but-set-variable]
    
    Add another #ifdef around the variable declaration, matching the other
    uses in this file.
    
    Fixes: 66729e18df08 ("[IPV6] ROUTE: Make sure we have fn->leaf when adding a node on subtree.")
    Link: https://lore.kernel.org/netdev/20240322131746.904943-1-arnd@kernel.org/
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240408074219.3030256-1-arnd@kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr [+ + +]

Author: Jiri Benc <jbenc@redhat.com>
Date:   Mon Apr 8 16:18:21 2024 +0200

    ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
    
    [ Upstream commit 7633c4da919ad51164acbf1aa322cc1a3ead6129 ]
    
    Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
    still means hlist_for_each_entry_rcu can return an item that got removed
    from the list. The memory itself of such item is not freed thanks to RCU
    but nothing guarantees the actual content of the memory is sane.
    
    In particular, the reference count can be zero. This can happen if
    ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
    from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
    references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
    timing, this can happen:
    
    1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
    
    2. Then, the whole ipv6_del_addr is executed for the given entry. The
       reference count drops to zero and kfree_rcu is scheduled.
    
    3. ipv6_get_ifaddr continues and tries to increment the reference count
       (in6_ifa_hold).
    
    4. The rcu is unlocked and the entry is freed.
    
    5. The freed entry is returned.
    
    Prevent increasing of the reference count in such case. The name
    in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
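
The idea behind a "hold safe" helper can be sketched in user space with C11 atomics: take a reference only if the count has not already reached zero. This is an illustration under assumptions, not the kernel implementation. `struct demo_obj` and `demo_hold_safe()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_obj {
	atomic_int refcnt;
};

/*
 * Take a reference only if the object is still alive (count > 0).
 * Returns false when the count has already dropped to zero, in which
 * case the caller must treat the object as gone.
 */
bool demo_hold_safe(struct demo_obj *obj)
{
	int old = atomic_load(&obj->refcnt);

	while (old > 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&obj->refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

A lookup that walks an RCU-protected list would skip any entry for which such a helper returns false instead of handing it back to the caller.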
    
    [   41.506330] refcount_t: addition on 0; use-after-free.
    [   41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130
    [   41.507413] Modules linked in: veth bridge stp llc
    [   41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14
    [   41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
    [   41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130
    [   41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff
    [   41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282
    [   41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000
    [   41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900
    [   41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff
    [   41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000
    [   41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48
    [   41.514086] FS:  00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000
    [   41.514726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0
    [   41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [   41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [   41.516799] Call Trace:
    [   41.517037]  <TASK>
    [   41.517249]  ? __warn+0x7b/0x120
    [   41.517535]  ? refcount_warn_saturate+0xa5/0x130
    [   41.517923]  ? report_bug+0x164/0x190
    [   41.518240]  ? handle_bug+0x3d/0x70
    [   41.518541]  ? exc_invalid_op+0x17/0x70
    [   41.520972]  ? asm_exc_invalid_op+0x1a/0x20
    [   41.521325]  ? refcount_warn_saturate+0xa5/0x130
    [   41.521708]  ipv6_get_ifaddr+0xda/0xe0
    [   41.522035]  inet6_rtm_getaddr+0x342/0x3f0
    [   41.522376]  ? __pfx_inet6_rtm_getaddr+0x10/0x10
    [   41.522758]  rtnetlink_rcv_msg+0x334/0x3d0
    [   41.523102]  ? netlink_unicast+0x30f/0x390
    [   41.523445]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
    [   41.523832]  netlink_rcv_skb+0x53/0x100
    [   41.524157]  netlink_unicast+0x23b/0x390
    [   41.524484]  netlink_sendmsg+0x1f2/0x440
    [   41.524826]  __sys_sendto+0x1d8/0x1f0
    [   41.525145]  __x64_sys_sendto+0x1f/0x30
    [   41.525467]  do_syscall_64+0xa5/0x1b0
    [   41.525794]  entry_SYSCALL_64_after_hwframe+0x72/0x7a
    [   41.526213] RIP: 0033:0x7fbc4cfcea9a
    [   41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
    [   41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    [   41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a
    [   41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005
    [   41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c
    [   41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    [   41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b
    [   41.531573]  </TASK>
    
    Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU")
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Jiri Benc <jbenc@redhat.com>
    Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
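    
    The "hold only if still referenced" idea behind in6_ifa_hold_safe can be
    sketched in user space as follows. This is a minimal illustration that
    assumes C11 atomics in place of the kernel's refcount_t; struct obj and
    obj_hold_safe() are hypothetical names, not kernel APIs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object protected by a reference count. */
struct obj {
    atomic_uint refcnt;
};

/*
 * Take a reference only while the count is still non-zero.
 * Returns false when the count has already dropped to zero, i.e. the
 * object is on its way to being freed and must not be used.
 */
static bool obj_hold_safe(struct obj *o)
{
    unsigned int old = atomic_load(&o->refcnt);

    while (old != 0) {
        /* On failure 'old' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

    A lookup built on such a helper would skip (or report as not found) any
    entry for which it returns false, instead of returning a pointer whose
    count was raised from zero.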

irqflags: Explicitly ignore lockdep_hrtimer_exit() argument [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Mon Apr 8 09:46:01 2024 +0200

    irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
    
    commit c1d11fc2c8320871b40730991071dd0a0b405bc8 upstream.
    
    When building with 'make W=1' but CONFIG_TRACE_IRQFLAGS=n, the
    unused argument to lockdep_hrtimer_exit() causes a warning:
    
    kernel/time/hrtimer.c:1655:14: error: variable 'expires_in_hardirq' set but not used [-Werror=unused-but-set-variable]
    
    This is intentional behavior, so add a cast to void to shut up the warning.
    
    Fixes: 73d20564e0dc ("hrtimer: Don't dereference the hrtimer pointer after the callback")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240408074609.3170807-1-arnd@kernel.org
    Closes: https://lore.kernel.org/oe-kbuild-all/202311191229.55QXHVc6-lkp@intel.com/
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
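    
    The cast-to-void technique can be sketched outside the kernel like this.
    DEMO_TRACE, demo_exit(), answer() and run_callback() are hypothetical
    names used only for illustration; DEMO_TRACE stands in for a config
    switch such as CONFIG_TRACE_IRQFLAGS.

```c
/* Hypothetical feature switch; 0 models the "tracing disabled" build. */
#ifndef DEMO_TRACE
#define DEMO_TRACE 0
#endif

#if DEMO_TRACE
#include <stdio.h>
#define demo_exit(flag) printf("exit, flag=%d\n", (flag))
#else
/* Evaluate the argument as a void expression so it still counts as used. */
#define demo_exit(flag) do { (void)(flag); } while (0)
#endif

/* Trivial callback used to exercise run_callback(). */
static int answer(void)
{
    return 42;
}

static int run_callback(int (*fn)(void))
{
    int in_hardirq;
    int ret;

    in_hardirq = 1;         /* set here ... */
    ret = fn();
    demo_exit(in_hardirq);  /* ... and consumed only by the macro */
    return ret;
}
```

    With an empty macro body in the disabled case, in_hardirq would be
    assigned but never read, which is the situation the warning above
    reports.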

kernfs: annotate different lockdep class for of->mutex of writable files [+ + +]

Author: Amir Goldstein <amir73il@gmail.com>
Date:   Fri Apr 5 17:56:35 2024 +0300

    kernfs: annotate different lockdep class for of->mutex of writable files
    
    commit 16b52bbee4823b01ab7fe3919373c981a38f3797 upstream.
    
    The writable file /sys/power/resume may call vfs lookup helpers for
    arbitrary paths and readonly files can be read by overlayfs from vfs
    helpers when sysfs is a lower layer of overlayfs.
    
    To avoid a lockdep warning of circular dependency between overlayfs
    inode lock and kernfs of->mutex, use a different lockdep class for
    writable and readonly kernfs files.
    
    Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com
    Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries")
    Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kprobes: Fix possible use-after-free issue on kprobe registration [+ + +]

Author: Zheng Yejian <zhengyejian1@huawei.com>
Date:   Wed Apr 10 09:58:02 2024 +0800

    kprobes: Fix possible use-after-free issue on kprobe registration
    
    commit 325f3fb551f8cd672dbbfc4cf58b14f9ee3fc9e8 upstream.
    
    When unloading a module, its state changes MODULE_STATE_LIVE ->
    MODULE_STATE_GOING -> MODULE_STATE_UNFORMED, and each transition takes
    some time. `is_module_text_address()` and `__module_text_address()`
    work with MODULE_STATE_LIVE and MODULE_STATE_GOING.
    If we use `is_module_text_address()` and `__module_text_address()`
    separately, there is a chance that the first one succeeds but the
    next one fails because module->state becomes MODULE_STATE_UNFORMED
    between those operations.
    
    In `check_kprobe_address_safe()`, if the second `__module_text_address()`
    fails, the failure is ignored because a kernel_text address is expected.
    But it may have failed simply because module->state has changed
    to MODULE_STATE_UNFORMED. In this case, arm_kprobe() will try to modify
    a module text address that no longer exists (use-after-free).
    
    To fix this problem, do not call `is_module_text_address()` and
    `__module_text_address()` separately; call `__module_text_address()`
    only once and then do `try_module_get(module)`, which succeeds only in
    MODULE_STATE_LIVE.
    
    Link: https://lore.kernel.org/all/20240410015802.265220-1-zhengyejian1@huawei.com/
    
    Fixes: 28f6c37a2910 ("kprobes: Forbid probing on trampoline and BPF code areas")
    Cc: stable@vger.kernel.org
    Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

lib: checksum: hide unused expected_csum_ipv6_magic[] [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Apr 4 18:36:45 2024 +0200

    lib: checksum: hide unused expected_csum_ipv6_magic[]
    
    [ Upstream commit e9d47b7b31563a6524b9f64ea70ed0289cc4d9c4 ]
    
    When CONFIG_NET is disabled, an extra warning shows up for this
    unused variable:
    
    lib/checksum_kunit.c:218:18: error: 'expected_csum_ipv6_magic' defined but not used [-Werror=unused-const-variable=]
    
    Replace the #ifdef with an IS_ENABLED() check that makes the compiler's
    dead-code-elimination take care of the link failure.
    
    Fixes: f24a70106dc1 ("lib: checksum: Fix build with CONFIG_NET=n")
    Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Tested-by: Simon Horman <horms@kernel.org> # build-tested
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
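    
    The difference between an #ifdef and a compile-time constant condition
    can be sketched as below. DEMO_HAVE_NET, expected_sums[] and check_sums()
    are hypothetical; DEMO_HAVE_NET plays the role that
    IS_ENABLED(CONFIG_NET) plays in the kernel.

```c
/* Hypothetical feature switch; 0 models a build with the feature disabled. */
#ifndef DEMO_HAVE_NET
#define DEMO_HAVE_NET 0
#endif

#define N_SUMS 2

/* Referenced only from the guarded part of check_sums(). */
static const unsigned int expected_sums[N_SUMS] = { 0x1234u, 0xabcdu };

/* Returns 0 when the sums match or the check is disabled, -1 otherwise. */
static int check_sums(const unsigned int got[N_SUMS])
{
    /*
     * A plain C condition instead of #ifdef: the table is still referenced
     * in the source, so it is not reported as unused, while the compiler is
     * free to drop the unreachable code when the constant is 0.
     */
    if (!DEMO_HAVE_NET)
        return 0;

    for (int i = 0; i < N_SUMS; i++) {
        if (got[i] != expected_sums[i])
            return -1;
    }
    return 0;
}
```

    Had the loop been wrapped in #ifdef instead, the table would be defined
    but never referenced in the disabled build, which is the kind of warning
    quoted above.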

Linux: Linux 6.8.7 [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Wed Apr 17 11:23:43 2024 +0200

    Linux 6.8.7
    
    Link: https://lore.kernel.org/r/20240415141959.976094777@linuxfoundation.org
    Tested-by: Ronald Warsow <rwarsow@gmx.de>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: cec: core: remove length check of Timer Status [+ + +]

Author: Nini Song <nini.song@mediatek.com>
Date:   Thu Jan 25 21:28:45 2024 +0800

    media: cec: core: remove length check of Timer Status
    
    commit ce5d241c3ad4568c12842168288993234345c0eb upstream.
    
    valid_la is used to check the length requirements, including
    the special cases of Timer Status. If the length is shorter
    than 5, meaning that no Duration Available is returned, the
    message is forced to be invalid.
    
    However, the spec describes Duration Available as a parameter
    that may be returned, or can optionally be returned, in these
    cases. The wording of the spec leaves this as a flexible
    choice.
    
    Remove the special length check of Timer Status to match the
    spec, which does not make the parameter compulsory.
    
    Signed-off-by: Nini Song <nini.song@mediatek.com>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: omap: fix broken slot switch lookup [+ + +]

Author: Aaro Koskinen <aaro.koskinen@iki.fi>
Date:   Fri Feb 23 20:14:37 2024 +0200

    mmc: omap: fix broken slot switch lookup
    
    [ Upstream commit d4debbcbffa45c3de5df0040af2eea74a9e794a3 ]
    
    The lookup is done before host->dev is initialized. It will always just
    fail silently, and the MMC behaviour is totally unpredictable as the switch
    is left in an undefined state. Fix that.
    
    Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors")
    Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
    Message-ID: <20240223181439.1099750-4-aaro.koskinen@iki.fi>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Tony Lindgren <tony@atomide.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mmc: omap: fix deferred probe [+ + +]

Author: Aaro Koskinen <aaro.koskinen@iki.fi>
Date:   Fri Feb 23 20:14:38 2024 +0200

    mmc: omap: fix deferred probe
    
    [ Upstream commit f6862c7f156d04f81c38467e1c304b7e9517e810 ]
    
    After a deferred probe, GPIO descriptor lookup will fail with EBUSY. Fix by
    using managed descriptors.
    
    Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors")
    Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
    Message-ID: <20240223181439.1099750-5-aaro.koskinen@iki.fi>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Tony Lindgren <tony@atomide.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mmc: omap: restore original power up/down steps [+ + +]

Author: Aaro Koskinen <aaro.koskinen@iki.fi>
Date:   Fri Feb 23 20:14:39 2024 +0200

    mmc: omap: restore original power up/down steps
    
    [ Upstream commit 894ad61b85d6ba8efd4274aa8719d9ff1c89ea54 ]
    
    Commit e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors")
    moved Nokia N810 MMC power up/down from the board file into the MMC driver.
    
    The change dropped some delays and the original ordering without a valid reason.
    Restore power up/down to match the original code. This matters only on N810
    where the 2nd GPIO is in use. Other boards will see an additional delay but
    that should be a lesser concern than omitting delays altogether.
    
    Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors")
    Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
    Message-ID: <20240223181439.1099750-6-aaro.koskinen@iki.fi>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Tony Lindgren <tony@atomide.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: Correctly compare pkt reformat ids [+ + +]

Author: Cosmin Ratiu <cratiu@nvidia.com>
Date:   Tue Apr 9 22:08:13 2024 +0300

    net/mlx5: Correctly compare pkt reformat ids
    
    [ Upstream commit 9eca93f4d5ab03905516a68683674d9c50ff95bd ]
    
    struct mlx5_pkt_reformat contains a naked union of a u32 id and a
    dr_action pointer which is used when the action is SW-managed (when
    pkt_reformat.owner is set to MLX5_FLOW_RESOURCE_OWNER_SW). Using id
    directly in that case is incorrect, as it maps to the least significant
    32 bits of the 64-bit pointer in mlx5_fs_dr_action and not to the pkt
    reformat id allocated in firmware.
    
    For the purpose of comparing whether two rules are identical,
    interpreting the least significant 32 bits of the mlx5_fs_dr_action
    pointer as an id mostly works... until it breaks horribly and produces
    the outcome described in [1].
    
    This patch fixes mlx5_flow_dests_cmp to correctly compare ids using
    mlx5_fs_dr_action_get_pkt_reformat_id for the SW-managed rules.
    
    Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/T/#u [1]
    
    Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation")
    Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/20240409190820.227554-6-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
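    
    The pitfall of reading the wrong member of such a union, and the shape of
    the fix, can be sketched as follows. All type and function names here
    (struct reformat, struct sw_action, reformat_get_id, reformat_same) are
    hypothetical simplifications, not the mlx5 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

enum owner { OWNER_FW, OWNER_SW };

/* Hypothetical SW-managed action that carries the real id. */
struct sw_action {
    uint32_t reformat_id;
};

struct reformat {
    enum owner owner;
    union {
        uint32_t id;               /* valid when owner == OWNER_FW */
        struct sw_action *action;  /* valid when owner == OWNER_SW */
    };
};

/* Pick the id from the member that is actually valid for this owner. */
static uint32_t reformat_get_id(const struct reformat *r)
{
    return r->owner == OWNER_SW ? r->action->reformat_id : r->id;
}

/* Compare two entries by id rather than by raw union contents. */
static bool reformat_same(const struct reformat *a, const struct reformat *b)
{
    return reformat_get_id(a) == reformat_get_id(b);
}
```

    Comparing a->id with b->id directly would, for SW-owned entries, compare
    part of the bytes of two pointers, which can differ or coincide
    regardless of the ids they lead to.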

net/mlx5: offset comp irq index in name by one [+ + +]

Author: Michael Liang <mliang@purestorage.com>
Date:   Tue Apr 9 22:08:11 2024 +0300

    net/mlx5: offset comp irq index in name by one
    
    [ Upstream commit 9f7e8fbb91f8fa29548e2f6ab50c03b628c67ede ]
    
    The mlx5 comp irq naming scheme changed slightly between
    commit 3663ad34bc70 ("net/mlx5: Shift control IRQ to the last index")
    and commit 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation").
    The index in the comp irq name used to start from 0 but now it starts
    from 1. There is nothing critical here, but it's harmless to change
    back to the old behavior, i.e. starting from 0.
    
    Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
    Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
    Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
    Signed-off-by: Michael Liang <mliang@purestorage.com>
    Reviewed-by: Shay Drory <shayd@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/20240409190820.227554-4-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: Properly link new fs rules into the tree [+ + +]

Author: Cosmin Ratiu <cratiu@nvidia.com>
Date:   Tue Apr 9 22:08:12 2024 +0300

    net/mlx5: Properly link new fs rules into the tree
    
    [ Upstream commit 7c6782ad4911cbee874e85630226ed389ff2e453 ]
    
    Previously, add_rule_fg would only add newly created rules from the
    handle into the tree when they had a refcount of 1. On the other hand,
    create_flow_handle tries hard to find and reference already existing
    identical rules instead of creating new ones.
    
    These two behaviors can result in a situation where create_flow_handle
    1) creates a new rule and references it, then
    2) in a subsequent step during the same handle creation references it
       again,
    resulting in a rule with a refcount of 2 that is not linked into the
    tree, will have a NULL parent and root and will result in a crash when
    the flow group is deleted because del_sw_hw_rule, invoked on rule
    deletion, assumes node->parent is != NULL.
    
    This happened in the wild, due to another bug related to incorrect
    handling of duplicate pkt_reformat ids, which led to the code in
    create_flow_handle incorrectly referencing a just-added rule in the same
    flow handle, resulting in the problem described above. Full details are
    at [1].
    
    This patch changes add_rule_fg to add new rules without parents into
    the tree, properly initializing them and avoiding the crash. This makes
    it more consistent with how rules are added to an FTE in
    create_flow_handle.
    
    Fixes: 74491de93712 ("net/mlx5: Add multi dest support")
    Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/T/#u [1]
    Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/20240409190820.227554-5-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: Register devlink first under devlink lock [+ + +]

Author: Shay Drory <shayd@nvidia.com>
Date:   Tue Apr 9 22:08:10 2024 +0300

    net/mlx5: Register devlink first under devlink lock
    
    [ Upstream commit c6e77aa9dd82bc18a89bf49418f8f7e961cfccc8 ]
    
    If the device hits a non-fatal FW error during probe, the driver
    reports the error to the user via devlink. This triggers a WARN_ON,
    since mlx5 calls devlink_register() last.
    To avoid the WARN_ON[1], change mlx5 to invoke devl_register()
    first, under the devlink lock.
    
    [1]
    WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
    CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
    RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
    Call Trace:
     <TASK>
     ? __warn+0x79/0x120
     ? devlink_recover_notify.constprop.0+0xb8/0xc0
     ? report_bug+0x17c/0x190
     ? handle_bug+0x3c/0x60
     ? exc_invalid_op+0x14/0x70
     ? asm_exc_invalid_op+0x16/0x20
     ? devlink_recover_notify.constprop.0+0xb8/0xc0
     devlink_health_report+0x4a/0x1c0
     mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
     process_one_work+0x1bb/0x3c0
     ? process_one_work+0x3c0/0x3c0
     worker_thread+0x4d/0x3c0
     ? process_one_work+0x3c0/0x3c0
     kthread+0xc6/0xf0
     ? kthread_complete_and_exit+0x20/0x20
     ret_from_fork+0x1f/0x30
     </TASK>
    
    Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: SF, Stop waiting for FW as teardown was called [+ + +]

Author: Moshe Shemesh <moshe@nvidia.com>
Date:   Thu Jan 25 14:24:09 2024 +0200

    net/mlx5: SF, Stop waiting for FW as teardown was called
    
    [ Upstream commit 137cef6d55564fb687d12fbc5f85be43ff7b53a7 ]
    
    When PF/VF teardown is called the driver sets the flag
    MLX5_BREAK_FW_WAIT to stop waiting for FW loading and initializing. Same
    should be applied to SF driver teardown to cut waiting time. On
    mlx5_sf_dev_remove() set the flag before draining health WQ as recovery
    flow may also wait for FW reloading while it is not relevant anymore.
    
    Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
    Reviewed-by: Aya Levin <ayal@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Stable-dep-of: c6e77aa9dd82 ("net/mlx5: Register devlink first under devlink lock")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit [+ + +]

Author: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Date:   Tue Apr 9 22:08:17 2024 +0300

    net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
    
    [ Upstream commit 86b0ca5b118d3a0bae5e5645a13e66f8a4f6c525 ]
    
    Free Tx port timestamping metadata entries in the NAPI poll context and
    consume metadata entries in the WQE xmit path. Do not free a Tx port
    timestamping metadata entry in the WQE xmit path even in the error path to
    avoid a race between two metadata entry producers.
    
    Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
    Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/20240409190820.227554-10-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Fix mlx5e_priv_init() cleanup flow [+ + +]

Author: Carolina Jubran <cjubran@nvidia.com>
Date:   Tue Apr 9 22:08:15 2024 +0300

    net/mlx5e: Fix mlx5e_priv_init() cleanup flow
    
    [ Upstream commit ecb829459a841198e142f72fadab56424ae96519 ]
    
    When mlx5e_priv_init() fails, the cleanup flow calls mlx5e_selq_cleanup which
    calls mlx5e_selq_apply() that assures that the `priv->state_lock` is held using
    lockdep_is_held().
    
    Acquire the state_lock in mlx5e_selq_cleanup().
    
    Kernel log:
    =============================
    WARNING: suspicious RCU usage
    6.8.0-rc3_net_next_841a9b5 #1 Not tainted
    -----------------------------
    drivers/net/ethernet/mellanox/mlx5/core/en/selq.c:124 suspicious rcu_dereference_protected() usage!
    
    other info that might help us debug this:
    
    rcu_scheduler_active = 2, debug_locks = 1
    2 locks held by systemd-modules/293:
     #0: ffffffffa05067b0 (devices_rwsem){++++}-{3:3}, at: ib_register_client+0x109/0x1b0 [ib_core]
     #1: ffff8881096c65c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x104/0x1c0 [ib_core]
    
    stack backtrace:
    CPU: 4 PID: 293 Comm: systemd-modules Not tainted 6.8.0-rc3_net_next_841a9b5 #1
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl+0x8a/0xa0
     lockdep_rcu_suspicious+0x154/0x1a0
     mlx5e_selq_apply+0x94/0xa0 [mlx5_core]
     mlx5e_selq_cleanup+0x3a/0x60 [mlx5_core]
     mlx5e_priv_init+0x2be/0x2f0 [mlx5_core]
     mlx5_rdma_setup_rn+0x7c/0x1a0 [mlx5_core]
     rdma_init_netdev+0x4e/0x80 [ib_core]
     ? mlx5_rdma_netdev_free+0x70/0x70 [mlx5_core]
     ipoib_intf_init+0x64/0x550 [ib_ipoib]
     ipoib_intf_alloc+0x4e/0xc0 [ib_ipoib]
     ipoib_add_one+0xb0/0x360 [ib_ipoib]
     add_client_context+0x112/0x1c0 [ib_core]
     ib_register_client+0x166/0x1b0 [ib_core]
     ? 0xffffffffa0573000
     ipoib_init_module+0xeb/0x1a0 [ib_ipoib]
     do_one_initcall+0x61/0x250
     do_init_module+0x8a/0x270
     init_module_from_file+0x8b/0xd0
     idempotent_init_module+0x17d/0x230
     __x64_sys_finit_module+0x61/0xb0
     do_syscall_64+0x71/0x140
     entry_SYSCALL_64_after_hwframe+0x46/0x4e
     </TASK>
    
    Fixes: 8bf30be75069 ("net/mlx5e: Introduce select queue parameters")
    Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/20240409190820.227554-8-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: HTB, Fix inconsistencies with QoS SQs number [+ + +]

Author: Carolina Jubran <cjubran@nvidia.com>
Date:   Tue Apr 9 22:08:16 2024 +0300

    net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
    
    [ Upstream commit 2f436f1869771d46e1a9f85738d5a1a7c5653a4e ]
    
    When creating a new HTB class while the interface is down,
    the variable that tracks the number of QoS SQs (htb_max_qos_sqs)
    may not be consistent with the number of HTB classes.
    
    Previously, we compared these two values to ensure that
    the node_qid is lower than the number of QoS SQs, and we
    allocated stats for that SQ when they are equal.
    
    Change the check to compare the node_qid with the current
    number of leaf nodes and fix the checking conditions to
    ensure allocation of stats_list and stats for each node.
    
    Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
    Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/20240409190820.227554-9-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: RSS, Block changing channels number when RXFH is configured [+ + +]

Author: Carolina Jubran <cjubran@nvidia.com>
Date:   Tue Apr 9 22:08:14 2024 +0300

    net/mlx5e: RSS, Block changing channels number when RXFH is configured
    
    [ Upstream commit ee3572409f74a838154af74ce1e56e62c17786a8 ]
    
    Changing the channels number after configuring the receive flow hash
    indirection table may affect the RSS table size. The previous
    configuration may no longer be compatible with the new receive flow
    hash indirection table.
    
    Block changing the channels number when RXFH is configured and changing
    the channels number requires resizing the RSS table size.
    
    Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels")
    Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://lore.kernel.org/r/20240409190820.227554-7-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: mt7530: trap link-local frames regardless of ST Port State [+ + +]

Author: Arınç ÜNAL <arinc.unal@arinc9.com>
Date:   Tue Apr 9 18:01:14 2024 +0300

    net: dsa: mt7530: trap link-local frames regardless of ST Port State
    
    [ Upstream commit 17c560113231ddc20088553c7b499b289b664311 ]
    
    In Clause 5 of IEEE Std 802-2014, two sublayers of the data link layer
    (DLL) of the Open Systems Interconnection basic reference model (OSI/RM)
    are described; the medium access control (MAC) and logical link control
    (LLC) sublayers. The MAC sublayer is the one facing the physical layer.
    
    In 8.2 of IEEE Std 802.1Q-2022, the Bridge architecture is described. A
    Bridge component comprises a MAC Relay Entity for interconnecting the Ports
    of the Bridge, at least two Ports, and higher layer entities with at least
    a Spanning Tree Protocol Entity included.
    
    Each Bridge Port also functions as an end station and shall provide the MAC
    Service to an LLC Entity. Each instance of the MAC Service is provided to a
    distinct LLC Entity that supports protocol identification, multiplexing,
    and demultiplexing, for protocol data unit (PDU) transmission and reception
    by one or more higher layer entities.
    
    It is described in 8.13.9 of IEEE Std 802.1Q-2022 that in a Bridge, the LLC
    Entity associated with each Bridge Port is modeled as being directly
    connected to the attached Local Area Network (LAN).
    
    On a switch with the CPU port architecture, the CPU port functions as the
    Management Port, and the Management Port functionality is provided by
    software which functions as an end station. Software is connected to an
    IEEE 802 LAN that is wholly contained within the system that incorporates
    the Bridge. Software provides access to the LLC Entity associated with each
    Bridge Port by the value of the source port field on the special tag on the
    frame received by software.
    
    We call frames that carry control information to determine the active
    topology and current extent of each Virtual Local Area Network (VLAN),
    i.e., spanning tree or Shortest Path Bridging (SPB) and Multiple VLAN
    Registration Protocol Data Units (MVRPDUs), and frames from other link
    constrained protocols, such as Extensible Authentication Protocol over LAN
    (EAPOL) and Link Layer Discovery Protocol (LLDP), link-local frames. They
    are not forwarded by a Bridge. Permanently configured entries in the
    filtering database (FDB) ensure that such frames are discarded by the
    Forwarding Process. In 8.6.3 of IEEE Std 802.1Q-2022, this is described in
    detail:
    
    Each of the reserved MAC addresses specified in Table 8-1
    (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]) shall be
    permanently configured in the FDB in C-VLAN components and ERs.
    
    Each of the reserved MAC addresses specified in Table 8-2
    (01-80-C2-00-00-[01,02,03,04,05,06,07,08,09,0A,0E]) shall be permanently
    configured in the FDB in S-VLAN components.
    
    Each of the reserved MAC addresses specified in Table 8-3
    (01-80-C2-00-00-[01,02,04,0E]) shall be permanently configured in the FDB
    in TPMR components.
    
    The FDB entries for reserved MAC addresses shall specify filtering for all
    Bridge Ports and all VIDs. Management shall not provide the capability to
    modify or remove entries for reserved MAC addresses.
    
    The addresses in Table 8-1, Table 8-2, and Table 8-3 determine the scope of
    propagation of PDUs within a Bridged Network, as follows:
    
      The Nearest Bridge group address (01-80-C2-00-00-0E) is an address that
      no conformant Two-Port MAC Relay (TPMR) component, Service VLAN (S-VLAN)
      component, Customer VLAN (C-VLAN) component, or MAC Bridge can forward.
      PDUs transmitted using this destination address, or any other addresses
      that appear in Table 8-1, Table 8-2, and Table 8-3
      (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]), can
      therefore travel no further than those stations that can be reached via a
      single individual LAN from the originating station.
    
      The Nearest non-TPMR Bridge group address (01-80-C2-00-00-03), is an
      address that no conformant S-VLAN component, C-VLAN component, or MAC
      Bridge can forward; however, this address is relayed by a TPMR component.
      PDUs using this destination address, or any of the other addresses that
      appear in both Table 8-1 and Table 8-2 but not in Table 8-3
      (01-80-C2-00-00-[00,03,05,06,07,08,09,0A,0B,0C,0D,0F]), will be relayed
      by any TPMRs but will propagate no further than the nearest S-VLAN
      component, C-VLAN component, or MAC Bridge.
    
      The Nearest Customer Bridge group address (01-80-C2-00-00-00) is an
      address that no conformant C-VLAN component, MAC Bridge can forward;
      however, it is relayed by TPMR components and S-VLAN components. PDUs
      using this destination address, or any of the other addresses that appear
      in Table 8-1 but not in either Table 8-2 or Table 8-3
      (01-80-C2-00-00-[00,0B,0C,0D,0F]), will be relayed by TPMR components and
      S-VLAN components but will propagate no further than the nearest C-VLAN
      component or MAC Bridge.
    
    Because the LLC Entity associated with each Bridge Port is provided via CPU
    port, we must not filter these frames but forward them to CPU port.
    
    In a Bridge, the transmission Port is mainly decided by ingress and egress
    rules, FDB, and spanning tree Port State functions of the Forwarding
    Process. For link-local frames, only CPU port should be designated as
    destination port in the FDB, and the other functions of the Forwarding
    Process must not interfere with the decision of the transmission Port. We
    call this process trapping frames to CPU port.
    
    Therefore, on the switch with CPU port architecture, link-local frames must
    be trapped to CPU port, and certain link-local frames received by a Port of
    a Bridge comprising a TPMR component or an S-VLAN component must be
    excluded from it.
    
    A Bridge of the switch with CPU port architecture cannot comprise a
    Two-Port MAC Relay (TPMR) component as a TPMR component supports only a
    subset of the functionality of a MAC Bridge. A Bridge comprising two Ports
    (Management Port doesn't count) of this architecture will either function
    as a standard MAC Bridge or a standard VLAN Bridge.
    
    Therefore, a Bridge of this architecture can only comprise S-VLAN
    components, C-VLAN components, or MAC Bridge components. Since there's no
    TPMR component, we don't need to relay PDUs using the destination addresses
    specified in the Nearest non-TPMR section, or in the portion of the
    Nearest Customer Bridge section where they must be relayed by TPMR
    components.
    
    One option to trap link-local frames to CPU port is to add static FDB
    entries with CPU port designated as destination port. However, because
    Independent VLAN Learning (IVL) is being used on every VID, each entry only
    applies to a single VLAN Identifier (VID). For a Bridge comprising a MAC
    Bridge component or a C-VLAN component, there would have to be 16 times
    4096 entries. This switch intellectual property can only hold a maximum of
    2048 entries. Using this option, there also isn't a mechanism to prevent
    link-local frames from being discarded when the spanning tree Port State of
    the reception Port is discarding.
    
    The remaining option is to utilise the BPC, RGAC1, RGAC2, RGAC3, and RGAC4
    registers. Whilst this applies to every VID, it doesn't contain all of the
    reserved MAC addresses without affecting the remaining Standard Group MAC
    Addresses. The REV_UN frame tag utilised using the RGAC4 register covers
    the remaining 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] destination
    addresses. It also includes the 01-80-C2-00-00-22 to 01-80-C2-00-00-FF
    destination addresses which may be relayed by MAC Bridges or VLAN Bridges.
    The latter option provides better but not complete conformance.
    
    This switch intellectual property also does not provide a mechanism to trap
    link-local frames with specific destination addresses to CPU port by
    Bridge, to conform to the filtering rules for the distinct Bridge
    components.
    
    Therefore, regardless of the type of the Bridge component, link-local
    frames with these destination addresses will be trapped to CPU port:
    
    01-80-C2-00-00-[00,01,02,03,0E]
    
    In a Bridge comprising a MAC Bridge component or a C-VLAN component:
    
      Link-local frames with these destination addresses won't be trapped to
      CPU port which won't conform to IEEE Std 802.1Q-2022:
    
      01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F]
    
    In a Bridge comprising an S-VLAN component:
    
      Link-local frames with these destination addresses will be trapped to CPU
      port which won't conform to IEEE Std 802.1Q-2022:
    
      01-80-C2-00-00-00
    
      Link-local frames with these destination addresses won't be trapped to
      CPU port which won't conform to IEEE Std 802.1Q-2022:
    
      01-80-C2-00-00-[04,05,06,07,08,09,0A]
    
    Currently on this switch intellectual property, if the spanning tree Port
    State of the reception Port is discarding, link-local frames will be
    discarded.
    
    To trap link-local frames regardless of the spanning tree Port State, make
    the switch regard them as Bridge Protocol Data Units (BPDUs). This switch
    intellectual property only lets the frames regarded as BPDUs bypass the
    spanning tree Port State function of the Forwarding Process.
    
    With this change, the only remaining interference is the ingress rules.
    When the reception Port has no PVID assigned on software, VLAN-untagged
    frames won't be allowed in. There doesn't seem to be a mechanism on the
    switch intellectual property to have link-local frames bypass this function
    of the Forwarding Process.
    
    Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
    Reviewed-by: Daniel Golle <daniel@makrotopia.org>
    Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
    Link: https://lore.kernel.org/r/20240409-b4-for-net-mt7530-fix-link-local-when-stp-discarding-v2-1-07b1150164ac@arinc9.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
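
The address sets in the mt7530 entry above can be illustrated with a small classification sketch. This is not the driver's code: the helper names `is_link_local()` and `is_always_trapped()` are hypothetical, and the sketch only encodes the two address lists quoted in the commit message (the reserved 01-80-C2-00-00-[00..0F] block, and the subset [00,01,02,03,0E] that is trapped to the CPU port regardless of Bridge component type).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if addr lies in 01-80-C2-00-00-00..0F,
 * the reserved (link-local) block discussed in the commit message. */
static bool is_link_local(const uint8_t addr[6])
{
	static const uint8_t prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

	return memcmp(addr, prefix, sizeof(prefix)) == 0 && addr[5] <= 0x0f;
}

/* Hypothetical helper: true if the commit message lists the address as
 * trapped to the CPU port for every Bridge component type, that is
 * 01-80-C2-00-00-[00,01,02,03,0E]. */
static bool is_always_trapped(const uint8_t addr[6])
{
	if (!is_link_local(addr))
		return false;

	switch (addr[5]) {
	case 0x00:
	case 0x01:
	case 0x02:
	case 0x03:
	case 0x0e:
		return true;
	default:
		return false;
	}
}
```

Under these assumptions, 01-80-C2-00-00-04 is link-local but not in the always-trapped set, which matches the non-conformance the commit message documents for MAC Bridge and C-VLAN components.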

net: ena: Fix incorrect descriptor free behavior [+ + +]

Author: David Arinzon <darinzon@amazon.com>
Date:   Wed Apr 10 09:13:57 2024 +0000

    net: ena: Fix incorrect descriptor free behavior
    
    [ Upstream commit bf02d9fe00632d22fa91d34749c7aacf397b6cde ]
    
    ENA has two types of TX queues:
    - queues which only process TX packets arriving from the network stack
    - queues which only process TX packets forwarded to it by XDP_REDIRECT
      or XDP_TX instructions
    
    The ena_free_tx_bufs() cycles through all descriptors in a TX queue
    and unmaps + frees every descriptor that hasn't been acknowledged yet
    by the device (uncompleted TX transactions).
    The function assumes that the processed TX queue is necessarily from
    the first category listed above and ends up using napi_consume_skb()
    for descriptors belonging to an XDP specific queue.
    
    This patch solves a bug in which, in case of a VF reset, the
    descriptors aren't freed correctly, leading to crashes.
    
    Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
    Signed-off-by: Shay Agroskin <shayagr@amazon.com>
    Signed-off-by: David Arinzon <darinzon@amazon.com>
    Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ena: Fix potential sign extension issue [+ + +]

Author: David Arinzon <darinzon@amazon.com>
Date:   Wed Apr 10 09:13:55 2024 +0000

    net: ena: Fix potential sign extension issue
    
    [ Upstream commit 713a85195aad25d8a26786a37b674e3e5ec09e3c ]
    
    Small unsigned types are promoted to larger signed types in
    the case of multiplication, the result of which may overflow.
    In case the result of such a multiplication has its MSB
    turned on, it will be sign extended with '1's.
    This changes the multiplication result.
    
    Code example of the phenomenon:
    -------------------------------
    u16 x, y;
    size_t z1, z2;
    
    x = y = 0xffff;
    printk("x=%x y=%x\n",x,y);
    
    z1 = x*y;
    z2 = (size_t)x*y;
    
    printk("z1=%lx z2=%lx\n", z1, z2);
    
    Output:
    -------
    x=ffff y=ffff
    z1=fffffffffffe0001 z2=fffe0001
    
    The expected result of ffff*ffff is fffe0001, and without the
    explicit casting to avoid the unwanted sign extension we got
    fffffffffffe0001.
    
    This commit adds an explicit casting to avoid the sign extension
    issue.
    
    Fixes: 689b2bdaaa14 ("net: ena: add functions for handling Low Latency Queues in ena_com")
    Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
    Signed-off-by: David Arinzon <darinzon@amazon.com>
    Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
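
The sign extension described in the entry above can be reproduced in user space without relying on signed overflow, which is undefined in C. The sketch below is an illustration only: `mul_widened()` and `widen_as_signed()` are hypothetical names, fixed-width types stand in for the kernel's `u16` and `size_t`, and the second helper models just the widening step of the buggy path (a 32-bit pattern with its MSB set, held in a signed type, then converted to a 64-bit unsigned type).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The fix pattern: widen one operand before multiplying, so the product
 * is computed in size_t and never passes through a signed int. */
static size_t mul_widened(uint16_t x, uint16_t y)
{
	return (size_t)x * y;
}

/* Models only the widening step of the buggy path. The conversion from
 * uint32_t to int32_t is implementation-defined for out-of-range values
 * (GCC and Clang wrap modulo 2^32); converting the negative result to
 * uint64_t then fills the upper 32 bits with ones. */
static uint64_t widen_as_signed(uint32_t bits)
{
	int32_t as_signed = (int32_t)bits;

	return (uint64_t)as_signed;
}
```

With these helpers, `mul_widened(0xffff, 0xffff)` yields 0xfffe0001, while `widen_as_signed(0xfffe0001)` yields 0xfffffffffffe0001 on compilers with the wrapping behaviour noted in the comment, matching the two values printed in the commit message.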

net: ena: Set tx_info->xdpf value to NULL [+ + +]

Author: David Arinzon <darinzon@amazon.com>
Date:   Wed Apr 10 09:13:58 2024 +0000

    net: ena: Set tx_info->xdpf value to NULL
    
    [ Upstream commit 36a1ca01f0452f2549420e7279c2588729bd94df ]
    
    The patch mentioned in the `Fixes` tag removed the explicit assignment
    of tx_info->xdpf to NULL with the justification that there's no need
    to set tx_info->xdpf to NULL and tx_info->num_of_bufs to 0 in case
    of a mapping error. Both values won't be used once the mapping function
    returns an error, and their values would be overridden by the next
    transmitted packet.
    
    While both values do indeed get overridden in the next transmission
    call, the value of tx_info->xdpf is also used to check whether a TX
    descriptor's transmission has been completed (i.e. a completion for it
    was polled).
    
    An example scenario:
    1. Mapping failed, tx_info->xdpf wasn't set to NULL
    2. A VF reset occurred leading to IO resource destruction and
       a call to ena_free_tx_bufs() function
    3. Although the descriptor whose mapping failed was freed by the
       transmission function, it still passes the check
         if (!tx_info->skb)
    
       (skb and xdp_frame are in a union)
    4. The xdp_frame associated with the descriptor is freed twice
    
    This patch restores the assignment of NULL to tx_info->xdpf so that the
    cleaning function knows that the descriptor is already freed.
    
    Fixes: 504fd6a5390c ("net: ena: fix DMA mapping function issues in XDP")
    Signed-off-by: Shay Agroskin <shayagr@amazon.com>
    Signed-off-by: David Arinzon <darinzon@amazon.com>
    Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
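
The double-free scenario in the entry above can be modelled in a few lines. This is a simplified sketch, not the ENA driver: `struct tx_info` here has a single hypothetical `frame` field standing in for `tx_info->xdpf`, `frame_free()` only counts calls, and the `clear_ptr` flag selects between the buggy and the fixed behaviour of the transmit path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-descriptor bookkeeping. */
struct tx_info {
	void *frame; /* plays the role of tx_info->xdpf */
};

static int free_calls;

/* Counts frees instead of releasing memory, so a double free is visible. */
static void frame_free(void *frame)
{
	(void)frame;
	free_calls++;
}

/* Transmit path after a mapping failure: the frame is freed here.
 * clear_ptr != 0 models the fixed code, which also resets the pointer. */
static void xmit_map_failed(struct tx_info *ti, int clear_ptr)
{
	frame_free(ti->frame);
	if (clear_ptr)
		ti->frame = NULL;
}

/* Cleanup path: frees every descriptor that still appears to hold a frame. */
static void cleanup_tx_buf(struct tx_info *ti)
{
	if (!ti->frame)
		return;

	frame_free(ti->frame);
	ti->frame = NULL;
}
```

Calling `xmit_map_failed(&ti, 0)` followed by `cleanup_tx_buf(&ti)` frees the same frame twice in this model; with `clear_ptr` set, the cleanup path skips the descriptor.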

net: ena: Wrong missing IO completions check order [+ + +]

Author: David Arinzon <darinzon@amazon.com>
Date:   Wed Apr 10 09:13:56 2024 +0000

    net: ena: Wrong missing IO completions check order
    
    [ Upstream commit f7e417180665234fdb7af2ebe33d89aaa434d16f ]
    
    Missing IO completions check is called every second (HZ jiffies).
    This commit fixes several issues with this check:
    
    1. Duplicate queues check:
       Max of 4 queues are scanned on each check due to monitor budget.
       Once reaching the budget, this check exits under the assumption that
       the next check will continue to scan the remainder of the queues,
       but in practice, next check will first scan the last already scanned
       queue which is not necessary and may cause the full queue scan to
       last a couple of seconds longer.
       The fix is to start every check with the next queue to scan.
       For example, on 8 IO queues:
       Bug: [0,1,2,3], [3,4,5,6], [6,7]
       Fix: [0,1,2,3], [4,5,6,7]
    
    2. Unbalanced queues check:
       In case the number of active IO queues is not a multiple of budget,
       there will be checks which don't utilize the full budget
       because the full scan exits when reaching the last queue id.
       The fix is to run every TX completion check with exact queue budget
       regardless of the queue id.
       For example, on 7 IO queues:
       Bug: [0,1,2,3], [4,5,6], [0,1,2,3]
       Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4]
       The budget may be lowered in case the number of IO queues is less
       than the budget (4) to make sure there are no duplicate queues on
       the same check.
       For example, on 3 IO queues:
       Bug: [0,1,2,0], [1,2,0,1]
       Fix: [0,1,2], [0,1,2]
    
    Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
    Signed-off-by: Amit Bernstein <amitbern@amazon.com>
    Signed-off-by: David Arinzon <darinzon@amazon.com>
    Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
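
The scan order described in the entry above can be sketched as a round-robin walk with a capped budget. This is an illustration under stated assumptions, not the driver's implementation: `scan_one_check()` and `MONITOR_BUDGET` are hypothetical names, and the budget of 4 is the value given in the commit message.

```c
#include <assert.h>

#define MONITOR_BUDGET 4 /* per-check budget, as stated in the commit message */

/* Hypothetical helper: writes the queue ids one check would scan into
 * out[], starting at *next, advances *next past the last scanned queue
 * (wrapping at num_queues), and returns how many ids were written.
 * The budget is lowered to num_queues when there are fewer queues than
 * the budget, so no queue appears twice in the same check. */
static int scan_one_check(int num_queues, int *next, int out[MONITOR_BUDGET])
{
	int budget = num_queues < MONITOR_BUDGET ? num_queues : MONITOR_BUDGET;
	int i;

	for (i = 0; i < budget; i++) {
		out[i] = *next;
		*next = (*next + 1) % num_queues;
	}

	return budget;
}
```

Starting from `next = 0` with 7 queues, three consecutive calls produce [0,1,2,3], [4,5,6,0], and [1,2,3,4], the "Fix" sequence listed in the commit message; with 3 queues each call produces [0,1,2].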

net: ks8851: Handle softirqs at the end of IRQ thread to fix hang [+ + +]

Author: Marek Vasut <marex@denx.de>
Date:   Fri Apr 5 22:30:40 2024 +0200

    net: ks8851: Handle softirqs at the end of IRQ thread to fix hang
    
    [ Upstream commit be0384bf599cf1eb8d337517feeb732d71f75a6f ]
    
    The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
    any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
    implementation is guarded by local_bh_disable() and local_bh_enable().
    The local_bh_enable() may call do_softirq() to run softirqs in case
    any are pending. One of the softirqs is net_rx_action, which ultimately
    reaches the driver .start_xmit callback. If that happens, the system
    hangs. The entire call chain is below:
    
    ks8851_start_xmit_par from netdev_start_xmit
    netdev_start_xmit from dev_hard_start_xmit
    dev_hard_start_xmit from sch_direct_xmit
    sch_direct_xmit from __dev_queue_xmit
    __dev_queue_xmit from __neigh_update
    __neigh_update from neigh_update
    neigh_update from arp_process.constprop.0
    arp_process.constprop.0 from __netif_receive_skb_one_core
    __netif_receive_skb_one_core from process_backlog
    process_backlog from __napi_poll.constprop.0
    __napi_poll.constprop.0 from net_rx_action
    net_rx_action from __do_softirq
    __do_softirq from call_with_stack
    call_with_stack from do_softirq
    do_softirq from __local_bh_enable_ip
    __local_bh_enable_ip from netif_rx
    netif_rx from ks8851_irq
    ks8851_irq from irq_thread_fn
    irq_thread_fn from irq_thread
    irq_thread from kthread
    kthread from ret_from_fork
    
    The hang happens because ks8851_irq() first locks a spinlock in
    ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...)
    and with that spinlock locked, calls netif_rx(). Once the execution
    reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again
    which attempts to claim the already locked spinlock again, and the
    hang happens.
    
    Move the do_softirq() call outside of the spinlock protected section
    of ks8851_irq() by disabling BHs around the entire spinlock protected
    section of ks8851_irq() handler. Place local_bh_enable() outside of
    the spinlock protected section, so that it can trigger do_softirq()
    without the ks8851_par.c ks8851_lock_par() spinlock being held, and
    safely call ks8851_start_xmit_par() without attempting to lock the
    already locked spinlock.
    
    Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable()
    now, replace netif_rx() with __netif_rx() which is not duplicating the
    local_bh_disable()/local_bh_enable() calls.
    
    Fixes: 797047f875b5 ("net: ks8851: Implement Parallel bus operations")
    Signed-off-by: Marek Vasut <marex@denx.de>
    Link: https://lore.kernel.org/r/20240405203204.82062-2-marex@denx.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ks8851: Inline ks8851_rx_skb() [+ + +]

Author: Marek Vasut <marex@denx.de>
Date:   Fri Apr 5 22:30:39 2024 +0200

    net: ks8851: Inline ks8851_rx_skb()
    
    [ Upstream commit f96f700449b6d190e06272f1cf732ae8e45b73df ]
    
    Both ks8851_rx_skb_par() and ks8851_rx_skb_spi() call netif_rx(skb),
    inline the netif_rx(skb) call directly into ks8851_common.c and drop
    the .rx_skb callback and ks8851_rx_skb() wrapper. This removes one
    indirect call from the driver, no functional change otherwise.
    
    Signed-off-by: Marek Vasut <marex@denx.de>
    Link: https://lore.kernel.org/r/20240405203204.82062-1-marex@denx.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: be0384bf599c ("net: ks8851: Handle softirqs at the end of IRQ thread to fix hang")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: openvswitch: fix unwanted error log on timeout policy probing [+ + +]

Author: Ilya Maximets <i.maximets@ovn.org>
Date:   Wed Apr 3 22:38:01 2024 +0200

    net: openvswitch: fix unwanted error log on timeout policy probing
    
    [ Upstream commit 4539f91f2a801c0c028c252bffae56030cfb2cae ]
    
    On startup, ovs-vswitchd probes different datapath features including
    support for timeout policies.  While probing, it tries to execute
    certain operations with OVS_PACKET_ATTR_PROBE or OVS_FLOW_ATTR_PROBE
    attributes set.  These attributes tell the openvswitch module to not
    log any errors when they occur as it is expected that some of the
    probes will fail.
    
    For some reason, setting the timeout policy ignores the PROBE attribute
    and logs a failure anyway.  This is causing the following kernel log
    on each re-start of ovs-vswitchd:
    
      kernel: Failed to associated timeout policy `ovs_test_tp'
    
    Fix that by using the same logging macro that all other messages are
    using.  The message will still be printed at info level when needed
    and will be rate limited, but with a net rate limiter instead of
    generic printk one.
    
    The nf_ct_set_timeout() itself will still print some info messages,
    but at least this change makes logging in openvswitch module more
    consistent.
    
    Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
    Acked-by: Eelco Chaudron <echaudro@redhat.com>
    Link: https://lore.kernel.org/r/20240403203803.2137962-1-i.maximets@ovn.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: sparx5: fix wrong config being used when reconfiguring PCS [+ + +]

Author: Daniel Machon <daniel.machon@microchip.com>
Date:   Tue Apr 9 12:41:59 2024 +0200

    net: sparx5: fix wrong config being used when reconfiguring PCS
    
    [ Upstream commit 33623113a48ea906f1955cbf71094f6aa4462e8f ]
    
    The wrong port config is being used if the PCS is reconfigured. Fix this
    by correctly using the new config instead of the old one.
    
    Fixes: 946e7fd5053a ("net: sparx5: add port module support")
    Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20240409-link-mode-reconfiguration-fix-v2-1-db6a507f3627@microchip.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: complete validation of user input [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Apr 9 12:07:41 2024 +0000

    netfilter: complete validation of user input
    
    [ Upstream commit 65acf6e0501ac8880a4f73980d01b5d27648b956 ]
    
    In my recent commit, I missed that do_replace() handlers
    use copy_from_sockptr() (which I fixed), followed
    by unsafe copy_from_sockptr_offset() calls.
    
    In all functions, we can perform the @optlen validation
    before even calling xt_alloc_table_info() with the following
    check:
    
    if ((u64)optlen < (u64)tmp.size + sizeof(tmp))
            return -EINVAL;
    
    Fixes: 0c83842df40f ("netfilter: validate user input for expected length")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Link: https://lore.kernel.org/r/20240409120741.3538135-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
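
The reason the check in the entry above casts both sides to a 64-bit type can be shown with a small user-space sketch. The function names are hypothetical, `hdr_len` stands in for `sizeof(tmp)`, and `uint32_t` is assumed for the user-supplied size so that the wraparound is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the quoted check: returns true when optlen
 * is large enough to hold the header plus the user-supplied size. The
 * sum is computed in 64 bits, so a huge size cannot wrap around. */
static bool optlen_ok(uint32_t optlen, uint32_t size, uint32_t hdr_len)
{
	return (uint64_t)optlen >= (uint64_t)size + hdr_len;
}

/* The same comparison in 32-bit arithmetic, shown only for contrast:
 * size + hdr_len can wrap to a small value and pass the check. */
static bool optlen_ok_32bit(uint32_t optlen, uint32_t size, uint32_t hdr_len)
{
	return optlen >= size + hdr_len;
}
```

For example, with `size = 0xffffffff` and `hdr_len = 40`, the 32-bit sum wraps to 39, so `optlen = 64` passes the 32-bit comparison but is rejected by the 64-bit one.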

nouveau: fix function cast warning [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Apr 4 18:02:25 2024 +0200

    nouveau: fix function cast warning
    
    [ Upstream commit 185fdb4697cc9684a02f2fab0530ecdd0c2f15d4 ]
    
    Calling a function through an incompatible pointer type breaks
    kcfi, so clang warns about the assignment:
    
    drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c:73:10: error: cast from 'void (*)(const void *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
       73 |         .fini = (void(*)(void *))kfree,
    
    Avoid this with a trivial wrapper.
    
    Fixes: c39f472e9f14 ("drm/nouveau: remove symlinks, move core/ to nvkm/ (no code changes)")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Danilo Krummrich <dakr@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240404160234.2923554-1-arnd@kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>
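
The "trivial wrapper" approach from the entry above can be sketched in user space as follows. All names are hypothetical: `release_const()` stands in for a function that takes a `const void *` (as `kfree()` does), and `struct source_ops` stands in for a callback table whose `.fini` slot expects `void (*)(void *)`.

```c
#include <assert.h>

static int release_count;

/* Hypothetical release function taking a const pointer; it only counts
 * calls so the example has an observable effect. */
static void release_const(const void *p)
{
	(void)p;
	release_count++;
}

/* Trivial wrapper with exactly the signature the callback slot expects,
 * so no function-pointer cast is needed. */
static void release_wrapper(void *p)
{
	release_const(p);
}

/* Hypothetical callback table with a void (*)(void *) slot. */
struct source_ops {
	void (*fini)(void *);
};

static const struct source_ops ops = {
	.fini = release_wrapper,
};
```

The indirect call `ops.fini(ptr)` now goes through a function whose type matches the pointer type, which is what type-based control-flow integrity checks compare.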

octeontx2-af: Fix NIX SQ mode and BP config [+ + +]

Author: Geetha sowjanya <gakula@marvell.com>
Date:   Mon Apr 8 12:06:43 2024 +0530

    octeontx2-af: Fix NIX SQ mode and BP config
    
    [ Upstream commit faf23006185e777db18912685922c5ddb2df383f ]
    
    NIX SQ mode and link backpressure configuration is required on
    all platforms, but the current driver wrongly places this code
    under a platform-specific check. This patch fixes the issue by
    moving the code out of the platform check.
    
    Fixes: 5d9b976d4480 ("octeontx2-af: Support fixed transmit scheduler topology")
    Signed-off-by: Geetha sowjanya <gakula@marvell.com>
    Link: https://lore.kernel.org/r/20240408063643.26288-1-gakula@marvell.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-pf: Fix transmit scheduler resource leak [+ + +]

Author: Hariprasad Kelam <hkelam@marvell.com>
Date:   Thu Apr 4 16:54:27 2024 +0530

    octeontx2-pf: Fix transmit scheduler resource leak
    
    [ Upstream commit bccb798e07f8bb8b91212fe8ed1e421685449076 ]
    
    In order to support shaping and scheduling, the netdev driver
    allocates transmit schedulers upon class creation.
    
    The previous patch, which added support for round-robin scheduling, has
    a bug due to which the driver does not free transmit schedulers after
    class deletion.
    
    This patch fixes that.
    
    Fixes: 47a9656f168a ("octeontx2-pf: htb offload support for Round Robin scheduling")
    Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pds_core: Fix pdsc_check_pci_health function to use work thread [+ + +]

Author: Brett Creeley <brett.creeley@amd.com>
Date:   Mon Apr 8 09:35:40 2024 -0700

    pds_core: Fix pdsc_check_pci_health function to use work thread
    
    [ Upstream commit 81665adf25d28a00a986533f1d3a5df76b79cad9 ]
    
    When the driver notices fw_status == 0xff it tries to perform a PCI
    reset on itself via pci_reset_function() in the context of the driver's
    health thread. However, pdsc_reset_prepare calls
    pdsc_stop_health_thread(), which attempts to stop/flush the health
    thread. This results in a deadlock because the stop/flush will never
    complete since the driver called pci_reset_function() from the health
    thread context. Fix by changing the pdsc_check_pci_health() function
    to queue a newly introduced pdsc_pci_reset_thread() on the pdsc's
    work queue.
    
    Unloading the driver in the fw_down/dead state uncovered another issue,
    which can be seen in the following trace:
    
    WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440
    [...]
    RIP: 0010:__queue_work+0x358/0x440
    [...]
    Call Trace:
     <TASK>
     ? __warn+0x85/0x140
     ? __queue_work+0x358/0x440
     ? report_bug+0xfc/0x1e0
     ? handle_bug+0x3f/0x70
     ? exc_invalid_op+0x17/0x70
     ? asm_exc_invalid_op+0x1a/0x20
     ? __queue_work+0x358/0x440
     queue_work_on+0x28/0x30
     pdsc_devcmd_locked+0x96/0xe0 [pds_core]
     pdsc_devcmd_reset+0x71/0xb0 [pds_core]
     pdsc_teardown+0x51/0xe0 [pds_core]
     pdsc_remove+0x106/0x200 [pds_core]
     pci_device_remove+0x37/0xc0
     device_release_driver_internal+0xae/0x140
     driver_detach+0x48/0x90
     bus_remove_driver+0x6d/0xf0
     pci_unregister_driver+0x2e/0xa0
     pdsc_cleanup_module+0x10/0x780 [pds_core]
     __x64_sys_delete_module+0x142/0x2b0
     ? syscall_trace_enter.isra.18+0x126/0x1a0
     do_syscall_64+0x3b/0x90
     entry_SYSCALL_64_after_hwframe+0x72/0xdc
    RIP: 0033:0x7fbd9d03a14b
    [...]
    
    Fix this by preventing the devcmd reset if the FW is not running.
    
    Fixes: d9407ff11809 ("pds_core: Prevent health thread from running during reset/remove")
    Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
    Signed-off-by: Brett Creeley <brett.creeley@amd.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

pds_core: use pci_reset_function for health reset [+ + +]

Author: Shannon Nelson <shannon.nelson@amd.com>
Date:   Fri Feb 16 14:29:52 2024 -0800

    pds_core: use pci_reset_function for health reset
    
    [ Upstream commit 2cbab3c296f1addd73b40549a2271b30f960df8b ]
    
    We get the benefit of all the PCI reset locking and recovery if
    we use the existing pci_reset_function() that will call our
    local reset handlers.
    
    Reviewed-by: Brett Creeley <brett.creeley@amd.com>
    Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 81665adf25d2 ("pds_core: Fix pdsc_check_pci_health function to use work thread")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf/x86: Fix out of range data [+ + +]

Author: Namhyung Kim <namhyung@kernel.org>
Date:   Tue Mar 5 22:10:03 2024 -0800

    perf/x86: Fix out of range data
    
    commit dec8ced871e17eea46f097542dd074d022be4bd1 upstream.
    
    On x86 each struct cpu_hw_events maintains a table for counter assignment,
    but x86_pmu_del() failed to update the entry for the deleted event.  This
    can make perf_clear_dirty_counters() reset a counter that is in use if it
    is called before event scheduling or enabling.  The event would then
    return out-of-range data, which doesn't make sense.
    
    The following code can reproduce the problem.
    
      $ cat repro.c
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <linux/perf_event.h>
      #include <sys/ioctl.h>
      #include <sys/mman.h>
      #include <sys/syscall.h>
    
      struct perf_event_attr attr = {
            .type = PERF_TYPE_HARDWARE,
            .config = PERF_COUNT_HW_CPU_CYCLES,
            .disabled = 1,
      };
    
      void *worker(void *arg)
      {
            int cpu = (long)arg;
            int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
            int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
            void *p;
    
            do {
                    ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0);
                    p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
                    ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);
    
                    ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0);
                    munmap(p, 4096);
                    ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0);
            } while (1);
    
            return NULL;
      }
    
      int main(void)
      {
            int i;
            int n = sysconf(_SC_NPROCESSORS_ONLN);
            pthread_t *th = calloc(n, sizeof(*th));
    
            for (i = 0; i < n; i++)
                    pthread_create(&th[i], NULL, worker, (void *)(long)i);
            for (i = 0; i < n; i++)
                    pthread_join(th[i], NULL);
    
            free(th);
            return 0;
      }
    
    And you can see the out of range data using perf stat like this.
    Probably it'd be easier to see on a large machine.
    
      $ gcc -o repro repro.c -pthread
      $ ./repro &
      $ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }'
           1.001028462 CPU6   196,719,295,683,763      cycles                           # 194290.996 GHz                       (71.54%)
           1.001028462 CPU3   396,077,485,787,730      branch-misses                    # 15804359784.80% of all branches      (71.07%)
           1.001028462 CPU17  197,608,350,727,877      branch-misses                    # 14594186554.56% of all branches      (71.22%)
           2.020064073 CPU4   198,372,472,612,140      cycles                           # 194681.113 GHz                       (70.95%)
           2.020064073 CPU6   199,419,277,896,696      cycles                           # 195720.007 GHz                       (70.57%)
           2.020064073 CPU20  198,147,174,025,639      cycles                           # 194474.654 GHz                       (71.03%)
           2.020064073 CPU20  198,421,240,580,145      stalled-cycles-frontend          #  100.14% frontend cycles idle        (70.93%)
           3.037443155 CPU4   197,382,689,923,416      cycles                           # 194043.065 GHz                       (71.30%)
           3.037443155 CPU20  196,324,797,879,414      cycles                           # 193003.773 GHz                       (71.69%)
           3.037443155 CPU5   197,679,956,608,205      stalled-cycles-backend           # 1315606428.66% backend cycles idle   (71.19%)
           3.037443155 CPU5   198,571,860,474,851      instructions                     # 13215422.58  insn per cycle
    
    It should move the contents in the cpuc->assign as well.
    
    Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task")
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240306061003.1894224-1-namhyung@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
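
The closing remark of the entry above ("move the contents in the cpuc->assign as well") amounts to keeping two parallel arrays in step when an element is removed. The sketch below illustrates that idea only: `struct hw_events`, `del_event()`, and `MAX_EVENTS` are hypothetical simplifications and do not reproduce the layout of `struct cpu_hw_events`.

```c
#include <assert.h>

#define MAX_EVENTS 8

/* Hypothetical, simplified per-CPU bookkeeping: event_list[i] is the
 * event in slot i and assign[i] is the counter index assigned to it. */
struct hw_events {
	int n_events;
	int event_list[MAX_EVENTS];
	int assign[MAX_EVENTS];
};

/* Removes slot idx and shifts the later slots down by one. Both arrays
 * are shifted together; shifting only event_list would leave assign[]
 * pointing surviving events at the wrong counters. */
static void del_event(struct hw_events *c, int idx)
{
	int i;

	if (idx < 0 || idx >= c->n_events)
		return;

	for (i = idx + 1; i < c->n_events; i++) {
		c->event_list[i - 1] = c->event_list[i];
		c->assign[i - 1] = c->assign[i];
	}

	c->n_events--;
}
```

After deleting slot 0 from events {10, 11, 12} with counters {2, 0, 1}, the remaining events {11, 12} still map to counters {0, 1}.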

platform/chrome: cros_ec_uart: properly fix race condition [+ + +]

Author: Noah Loomans <noah@noahloomans.com>
Date:   Wed Apr 10 20:26:19 2024 +0200

    platform/chrome: cros_ec_uart: properly fix race condition
    
    commit 5e700b384ec13f5bcac9855cb28fcc674f1d3593 upstream.
    
    The cros_ec_uart_probe() function calls devm_serdev_device_open() before
    it calls serdev_device_set_client_ops(). This can trigger a NULL pointer
    dereference:
    
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        ...
        Call Trace:
         <TASK>
         ...
         ? ttyport_receive_buf
    
    A simplified version of crashing code is as follows:
    
        static inline size_t serdev_controller_receive_buf(struct serdev_controller *ctrl,
                                                          const u8 *data,
                                                          size_t count)
        {
                struct serdev_device *serdev = ctrl->serdev;
    
                if (!serdev || !serdev->ops->receive_buf) // CRASH!
                        return 0;
    
                return serdev->ops->receive_buf(serdev, data, count);
        }
    
    It assumes that if SERPORT_ACTIVE is set and serdev exists, serdev->ops
    will also exist. This conflicts with the existing cros_ec_uart_probe()
    logic, as it first calls devm_serdev_device_open() (which sets
    SERPORT_ACTIVE), and only later sets serdev->ops via
    serdev_device_set_client_ops().
    
    Commit 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race
    condition") attempted to fix a similar race condition, but in doing so
    made the window for this race condition much wider.
    
    Attempt to fix the race condition again, making sure we fully set up
    before calling devm_serdev_device_open().
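
    The ordering requirement can be modelled in userspace. The sketch below
    is not the serdev API: fake_serdev, fake_set_client_ops(), fake_open(),
    fake_receive() and fake_probe() are hypothetical stand-ins, and the
    active flag only imitates SERPORT_ACTIVE. It shows the safe order, in
    which the callbacks are registered before the port is opened.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the serdev structures; not the kernel API. */
struct fake_serdev;

struct fake_ops {
	size_t (*receive_buf)(struct fake_serdev *dev,
			      const unsigned char *data, size_t count);
};

struct fake_serdev {
	const struct fake_ops *ops; /* NULL until the client registers callbacks */
	int active;                 /* imitates SERPORT_ACTIVE */
};

void fake_set_client_ops(struct fake_serdev *dev, const struct fake_ops *ops)
{
	dev->ops = ops;
}

void fake_open(struct fake_serdev *dev)
{
	dev->active = 1; /* from here on, data may arrive at any time */
}

/* Receive path: once active, it dereferences dev->ops without a NULL check. */
size_t fake_receive(struct fake_serdev *dev, const unsigned char *data,
		    size_t count)
{
	if (!dev->active)
		return 0;
	return dev->ops->receive_buf(dev, data, count);
}

size_t consume_all(struct fake_serdev *dev, const unsigned char *data,
		   size_t count)
{
	(void)dev;
	(void)data;
	return count;
}

const struct fake_ops client_ops = { .receive_buf = consume_all };

/* Safe probe order: finish the setup first, open the port last. */
void fake_probe(struct fake_serdev *dev)
{
	fake_set_client_ops(dev, &client_ops);
	fake_open(dev);
}
```

    Swapping the two calls in fake_probe() would leave a window in which
    fake_receive() runs with dev->ops still NULL.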
    
    Fixes: 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race condition")
    Cc: stable@vger.kernel.org
    Signed-off-by: Noah Loomans <noah@noahloomans.com>
    Reviewed-by: Guenter Roeck <groeck@chromium.org>
    Link: https://lore.kernel.org/r/20240410182618.169042-2-noah@noahloomans.com
    Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PM: s2idle: Make sure CPUs will wakeup directly on resume [+ + +]

Author: Anna-Maria Behnsen <anna-maria@linutronix.de>
Date:   Mon Apr 8 09:02:23 2024 +0200

    PM: s2idle: Make sure CPUs will wakeup directly on resume
    
    commit 3c89a068bfd0698a5478f4cf39493595ef757d5e upstream.
    
    s2idle works like a regular suspend with freezing processes and freezing
    devices. All CPUs except the control CPU go into idle. Once this is
    completed the control CPU kicks all other CPUs out of idle, so that they
    reenter the idle loop and then enter s2idle state. The control CPU then
    issues an swait() on the suspend state and therefore enters the idle loop
    as well.
    
    Due to being kicked out of idle, the other CPUs leave their NOHZ states,
    which means the tick is active and the corresponding hrtimer is programmed
    to the next jiffy.
    
    On entering s2idle the CPUs shut down their local clockevent device to
    prevent wakeups. The last CPU which enters s2idle shuts down its local
    clockevent and freezes timekeeping.
    
    On resume, one of the CPUs receives the wakeup interrupt, unfreezes
    timekeeping and its local clockevent and starts the resume process. At that
    point all other CPUs are still in s2idle with their clockevents switched
    off. They only resume when they are kicked by another CPU or after resuming
    devices and then receiving a device interrupt.
    
    That means there is no guarantee that all CPUs will wake up directly on
    resume. As a consequence there is no guarantee that timers which are queued
    on those CPUs and should expire directly after resume, are handled. Also
    timer list timers which are remotely queued to one of those CPUs after
    resume will not result in a reprogramming IPI as the tick is
    active. Queueing a hrtimer will also not result in a reprogramming IPI
    because the first hrtimer event is already in the past.
    
    The recent introduction of the timer pull model (7ee988770326 ("timers:
    Implement the hierarchical pull model")) amplifies this problem, if the
    current migrator is one of the non woken up CPUs. When a non pinned timer
    list timer is queued and the queuing CPU goes idle, it relies on the still
    suspended migrator CPU to expire the timer which will happen by chance.
    
    The problem exists since commit 8d89835b0467 ("PM: suspend: Do not pause
    cpuidle in the suspend-to-idle path"). There the cpuidle_pause() call which
    in turn invoked a wakeup for all idle CPUs was moved to a later point in
    the resume process. This might not be reached or reached very late because
    it waits on a timer of a still suspended CPU.
    
    Address this by kicking all CPUs out of idle after the control CPU returns
    from swait() so that they resume their timers and restore consistent system
    state.
    
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218641
    Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
    Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Mario Limonciello <mario.limonciello@amd.com>
    Cc: 5.16+ <stable@kernel.org> # 5.16+
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

raid1: fix use-after-free for original bio in raid1_write_request() [+ + +]

Author: Yu Kuai <yukuai3@huawei.com>
Date:   Fri Mar 8 17:37:26 2024 +0800

    raid1: fix use-after-free for original bio in raid1_write_request()
    
    commit fcf3f7e2fc8a53a6140beee46ec782a4c88e4744 upstream.
    
    r1_bio->bios[] is used to record new bios that will be issued to
    underlying disks. However, in raid1_write_request(), r1_bio->bios[]
    is temporarily set to the original bio. Meanwhile, if a blocked rdev
    is found, free_r1bio() will be called, causing all r1_bio->bios[]
    to be freed:
    
    raid1_write_request()
     r1_bio = alloc_r1bio(mddev, bio); -> r1_bio->bios[] is NULL
     for (i = 0;  i < disks; i++) -> for each rdev in conf
      // first rdev is normal
      r1_bio->bios[0] = bio; -> set to original bio
      // second rdev is blocked
      if (test_bit(Blocked, &rdev->flags))
       break
    
     if (blocked_rdev)
      free_r1bio()
       put_all_bios()
        bio_put(r1_bio->bios[0]) -> original bio is freed
    
    Test scripts:
    
    mdadm -CR /dev/md0 -l1 -n4 /dev/sd[abcd] --assume-clean
    fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 \
        -iodepth=128 -name=test -direct=1
    echo blocked > /sys/block/md0/md/rd2/state
    
    Test result:
    
    BUG bio-264 (Not tainted): Object already free
    -----------------------------------------------------------------------------
    
    Allocated in mempool_alloc_slab+0x24/0x50 age=1 cpu=1 pid=869
     kmem_cache_alloc+0x324/0x480
     mempool_alloc_slab+0x24/0x50
     mempool_alloc+0x6e/0x220
     bio_alloc_bioset+0x1af/0x4d0
     blkdev_direct_IO+0x164/0x8a0
     blkdev_write_iter+0x309/0x440
     aio_write+0x139/0x2f0
     io_submit_one+0x5ca/0xb70
     __do_sys_io_submit+0x86/0x270
     __x64_sys_io_submit+0x22/0x30
     do_syscall_64+0xb1/0x210
     entry_SYSCALL_64_after_hwframe+0x6c/0x74
    Freed in mempool_free_slab+0x1f/0x30 age=1 cpu=1 pid=869
     kmem_cache_free+0x28c/0x550
     mempool_free_slab+0x1f/0x30
     mempool_free+0x40/0x100
     bio_free+0x59/0x80
     bio_put+0xf0/0x220
     free_r1bio+0x74/0xb0
     raid1_make_request+0xadf/0x1150
     md_handle_request+0xc7/0x3b0
     md_submit_bio+0x76/0x130
     __submit_bio+0xd8/0x1d0
     submit_bio_noacct_nocheck+0x1eb/0x5c0
     submit_bio_noacct+0x169/0xd40
     submit_bio+0xee/0x1d0
     blkdev_direct_IO+0x322/0x8a0
     blkdev_write_iter+0x309/0x440
     aio_write+0x139/0x2f0
    
    Since bios for underlying disks are not allocated yet, fix this
    problem by using mempool_free() directly to free the r1_bio.
    
    Fixes: 992db13a4aee ("md/raid1: free the r1bio before waiting for blocked rdev")
    Cc: stable@vger.kernel.org # v6.6+
    Reported-by: Coly Li <colyli@suse.de>
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Tested-by: Coly Li <colyli@suse.de>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20240308093726.1047420-1-yukuai1@huaweicloud.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "drm/qxl: simplify qxl_fence_wait" [+ + +]

Author: Alex Constantino <dreaming.about.electric.sheep@gmail.com>
Date:   Thu Apr 4 19:14:48 2024 +0100

    Revert "drm/qxl: simplify qxl_fence_wait"
    
    [ Upstream commit 07ed11afb68d94eadd4ffc082b97c2331307c5ea ]
    
    This reverts commit 5a838e5d5825c85556011478abde708251cc0776.
    
    Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
    result in a '[TTM] Buffer eviction failed' exception whenever it reached a
    timeout.
    Due to a dependency on DMA_FENCE_WARN this also restores some code deleted
    by commit d72277b6c37d ("dma-buf: nuke DMA_FENCE_TRACE macros v2").
    
    Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
    Link: https://lore.kernel.org/regressions/ZTgydqRlK6WX_b29@eldamar.lan/
    Reported-by: Timo Lindfors <timo.lindfors@iki.fi>
    Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
    Signed-off-by: Alex Constantino <dreaming.about.electric.sheep@gmail.com>
    Signed-off-by: Maxime Ripard <mripard@kernel.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240404181448.1643-2-dreaming.about.electric.sheep@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "s390/ism: fix receive message buffer allocation" [+ + +]

Author: Gerd Bayer <gbayer@linux.ibm.com>
Date:   Tue Apr 9 13:37:53 2024 +0200

    Revert "s390/ism: fix receive message buffer allocation"
    
    [ Upstream commit d51dc8dd6ab6f93a894ff8b38d3b8d02c98eb9fb ]
    
    This reverts commit 58effa3476536215530c9ec4910ffc981613b413.
    Review was not finished on this patch. So it's not ready for
    upstreaming.
    
    Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
    Link: https://lore.kernel.org/r/20240409113753.2181368-1-gbayer@linux.ibm.com
    Fixes: 58effa347653 ("s390/ism: fix receive message buffer allocation")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ring-buffer: Only update pages_touched when a new page is touched [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Tue Apr 9 15:13:09 2024 -0400

    ring-buffer: Only update pages_touched when a new page is touched
    
    commit ffe3986fece696cf65e0ef99e74c75f848be8e30 upstream.
    
    The "buffer_percent" logic that is used by the ring buffer splice code to
    only wake up the tasks when there's no data after the buffer is filled to
    the percentage of the "buffer_percent" file is dependent on three
    variables that determine the amount of data that is in the ring buffer:
    
     1) pages_read - incremented whenever a new sub-buffer is consumed
     2) pages_lost - incremented every time a writer overwrites a sub-buffer
     3) pages_touched - incremented when a write goes to a new sub-buffer
    
    The percentage is the calculation of:
    
      (pages_touched - (pages_lost + pages_read)) / nr_pages
    
    Basically, the amount of data is the total number of sub-bufs that have been
    touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
    divided by the total count to give the buffer percentage. When the
    percentage is greater than the value in the "buffer_percent" file, it
    wakes up splice readers waiting for that amount.
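
    As a worked example of the calculation above, here is a minimal
    userspace sketch. The helper name buffer_fill_percent() and the integer
    rounding are assumptions made for illustration; the kernel code is not
    reproduced here.

```c
#include <stddef.h>

/* Hypothetical helper: fill level in percent, derived from the counters. */
size_t buffer_fill_percent(size_t pages_touched, size_t pages_lost,
			   size_t pages_read, size_t nr_pages)
{
	/* Sub-buffers that hold data which has not been lost or consumed. */
	size_t dirty = pages_touched - (pages_lost + pages_read);

	return nr_pages ? (dirty * 100) / nr_pages : 0;
}
```

    With 70 sub-buffers touched, 5 lost and 5 read out of 100, the result
    is 60. If pages_touched is over-counted, the computed level is higher
    than the real one, so a reader waiting for 60% is woken early.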
    
    It was observed that over time, the amount read from the splice was
    constantly decreasing the longer the trace was running. That is, if one
    asked for 60%, it would read over 60% when it first starts tracing, but
    then it would be woken up at under 60% and would slowly decrease the
    amount of data read after being woken up, where the amount becomes much
    less than the buffer percent.
    
    This was due to incorrect accounting in the pages_touched increment. This
    value is incremented whenever a writer transfers to a new sub-buffer. But
    the place where it was incremented was incorrect. If a writer overflowed
    the current sub-buffer it would go to the next one. If it gets preempted
    by an interrupt at that time, and the interrupt performs a trace, it too
    will end up going to the next sub-buffer. But only one should increment
    the counter. Unfortunately, that was not the case.
    
    Change the cmpxchg() that does the real switch of the tail-page into a
    try_cmpxchg(), and on success, perform the increment of pages_touched. This
    will increment the counter only once when the writer moves to a new
    sub-buffer, and not twice when a writer and its preempting writer both
    move to the same new sub-buffer.
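
    The pattern can be sketched in userspace with C11 atomics, where
    atomic_compare_exchange_strong() plays the role of try_cmpxchg(). The
    struct toy_buffer and toy_move_tail() are hypothetical names, and the
    tail page is reduced to an integer index.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_buffer {
	atomic_int tail_page;     /* index of the current tail sub-buffer */
	atomic_int pages_touched; /* number of sub-buffers written to */
};

/*
 * Try to advance the tail from old_tail to old_tail + 1. Only the caller
 * whose compare-and-exchange succeeds counts the new sub-buffer.
 */
bool toy_move_tail(struct toy_buffer *b, int old_tail)
{
	int expected = old_tail;

	if (atomic_compare_exchange_strong(&b->tail_page, &expected,
					   old_tail + 1)) {
		atomic_fetch_add(&b->pages_touched, 1);
		return true;
	}
	return false; /* another writer already moved the tail */
}
```

    Two writers that both saw the same old tail call toy_move_tail() with
    the same argument; the second call fails and leaves the counter alone.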
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.local.home
    
    Cc: stable@vger.kernel.org
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
    Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/ism: fix receive message buffer allocation [+ + +]

Author: Gerd Bayer <gbayer@linux.ibm.com>
Date:   Fri Apr 5 13:16:06 2024 +0200

    s390/ism: fix receive message buffer allocation
    
    [ Upstream commit 58effa3476536215530c9ec4910ffc981613b413 ]
    
    Since [1], dma_alloc_coherent() does not accept requests for GFP_COMP
    anymore, even on archs that may be able to fulfill this. Functionality that
    relied on the receive buffer being a compound page broke at that point:
    The SMC-D protocol, that utilizes the ism device driver, passes receive
    buffers to the splice processor in a struct splice_pipe_desc with a
    single entry list of struct pages. As the buffer is no longer a compound
    page, the splice processor now rejects requests to handle more than a
    page worth of data.
    
    Replace dma_alloc_coherent() and allocate a buffer with folio_alloc and
    create a DMA map for it with dma_map_page(). Since only receive buffers
    on ISM devices use DMA, qualify the mapping as FROM_DEVICE.
    Since ISM devices are available on arch s390 only, and on that arch all
    DMA is coherent, there is no need to introduce and export some kind of
    dma_sync_to_cpu() method to be called by the SMC-D protocol layer.
    
    Analogously, replace dma_free_coherent by a two step dma_unmap_page,
    then folio_put to free the receive buffer.
    
    [1] https://lore.kernel.org/all/20221113163535.884299-1-hch@lst.de/
    
    Fixes: c08004eede4b ("s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent")
    Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: hisi_sas: Modify the deadline for ata_wait_after_reset() [+ + +]

Author: Xiang Chen <chenxiang66@hisilicon.com>
Date:   Tue Apr 2 11:55:13 2024 +0800

    scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
    
    [ Upstream commit 0098c55e0881f0b32591f2110410d5c8b7f9bd5a ]
    
    We found that the second parameter of function ata_wait_after_reset() is
    incorrectly used. We call smp_ata_check_ready_type() to poll the device
    type until the 30s timeout, so the correct deadline should be (jiffies +
    30000).
    
    Fixes: 3c2673a09cf1 ("scsi: hisi_sas: Fix SATA devices missing issue during I_T nexus reset")
    Co-developed-by: xiabing <xiabing12@h-partners.com>
    Signed-off-by: xiabing <xiabing12@h-partners.com>
    Co-developed-by: Yihang Li <liyihang9@huawei.com>
    Signed-off-by: Yihang Li <liyihang9@huawei.com>
    Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
    Link: https://lore.kernel.org/r/20240402035513.2024241-3-chenxiang66@hisilicon.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: qla2xxx: Fix off by one in qla_edif_app_getstats() [+ + +]

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Tue Apr 2 12:56:54 2024 +0300

    scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
    
    [ Upstream commit 4406e4176f47177f5e51b4cc7e6a7a2ff3dbfbbd ]
    
    The app_reply->elem[] array is allocated earlier in this function and it
    has app_req.num_ports elements.  Thus this > comparison needs to be >= to
    prevent memory corruption.
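
    A minimal sketch of the bounds check, assuming a reply buffer with
    num_ports slots. The struct toy_reply and toy_reply_add() are
    hypothetical and only illustrate why the test has to be >= rather
    than >.

```c
#include <stddef.h>

/* Hypothetical reply buffer with a fixed number of slots. */
struct toy_reply {
	size_t num_ports; /* number of elements allocated in elem[] */
	int *elem;
};

/* Returns 1 if the value was stored, 0 if the array is already full. */
int toy_reply_add(struct toy_reply *r, size_t *pcnt, int value)
{
	/* With ">" the call at *pcnt == num_ports would write past the end. */
	if (*pcnt >= r->num_ports)
		return 0;
	r->elem[(*pcnt)++] = value;
	return 1;
}
```

    Valid indices run from 0 to num_ports - 1, so an index equal to
    num_ports must already be rejected.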
    
    Fixes: 7878f22a2e03 ("scsi: qla2xxx: edif: Add getfcinfo and statistic bsgs")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Link: https://lore.kernel.org/r/5c125b2f-92dd-412b-9b6f-fc3a3207bd60@moroto.mountain
    Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: sg: Avoid race in error handling & drop bogus warn [+ + +]

Author: Alexander Wetzel <Alexander@wetzel-home.de>
Date:   Mon Apr 1 21:10:38 2024 +0200

    scsi: sg: Avoid race in error handling & drop bogus warn
    
    commit d4e655c49f474deffaf5ed7e65034b8167ee39c8 upstream.
    
    Commit 27f58c04a8f4 ("scsi: sg: Avoid sg device teardown race") introduced
    an incorrect WARN_ON_ONCE() and missed a sequence where sg_device_destroy()
    was used after scsi_device_put().
    
    sg_device_destroy() is accessing the parent scsi_device request_queue which
    will already be set to NULL when the preceding call to scsi_device_put()
    removed the last reference to the parent scsi_device.
    
    Drop the incorrect WARN_ON_ONCE() - allowing more than one concurrent
    access to the sg device - and make sure sg_device_destroy() is not used
    after scsi_device_put() in the error handling.
    
    Link: https://lore.kernel.org/all/5375B275-D137-4D5F-BE25-6AF8ACAE41EF@linux.ibm.com
    Fixes: 27f58c04a8f4 ("scsi: sg: Avoid sg device teardown race")
    Cc: stable@vger.kernel.org
    Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
    Link: https://lore.kernel.org/r/20240401191038.18359-1-Alexander@wetzel-home.de
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: sg: Avoid sg device teardown race [+ + +]

Author: Alexander Wetzel <Alexander@wetzel-home.de>
Date:   Wed Mar 20 22:30:32 2024 +0100

    scsi: sg: Avoid sg device teardown race
    
    commit 27f58c04a8f438078583041468ec60597841284d upstream.
    
    sg_remove_sfp_usercontext() must not use sg_device_destroy() after calling
    scsi_device_put().
    
    sg_device_destroy() is accessing the parent scsi_device request_queue which
    will already be set to NULL when the preceding call to scsi_device_put()
    removed the last reference to the parent scsi_device.
    
    The resulting NULL pointer exception will then crash the kernel.
    
    Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
    Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
    Cc: stable@vger.kernel.org
    Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
    Link: https://lore.kernel.org/r/20240320213032.18221-1-Alexander@wetzel-home.de
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests/timers/posix_timers: Reimplement check_timer_distribution() [+ + +]

Author: Oleg Nesterov <oleg@redhat.com>
Date:   Tue Apr 9 15:38:03 2024 +0200

    selftests/timers/posix_timers: Reimplement check_timer_distribution()
    
    commit 6d029c25b71f2de2838a6f093ce0fa0e69336154 upstream.
    
    check_timer_distribution() runs ten threads in a busy loop and tries to
    test that the kernel distributes a process posix CPU timer signal to every
    thread over time.
    
    There is no guarantee that this is true even after commit bcb7ee79029d
    ("posix-timers: Prefer delivery of signals to the current thread") because
    that commit only avoids waking up the sleeping process leader thread, but
    that has nothing to do with the actual signal delivery.
    
    As the signal is process wide the first thread which observes sigpending
    and wins the race to lock sighand will deliver the signal. Testing shows
    that this hangs on a regular basis because some threads never win the race.
    
    The comment "This primarily tests that the kernel does not favour any one."
    is wrong. The kernel does favour a thread which hits the timer interrupt
    when CLOCK_PROCESS_CPUTIME_ID expires.
    
    Rewrite the test so it only checks that the group leader sleeping in join()
    never receives SIGALRM and the thread which burns CPU cycles receives all
    signals.
    
    In older kernels which do not have commit bcb7ee79029d ("posix-timers:
    Prefer delivery of signals to the current thread") the test-case fails
    immediately, the very 1st tick wakes the leader up. Otherwise it quickly
    succeeds after 100 ticks.
    
    CI testing wants to use newer selftest versions on stable kernels. In this
    case the test is guaranteed to fail.
    
    So check in the failure case whether the kernel version is less than v6.3
    and skip the test result in that case.
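
    A version check of this kind can be sketched as follows. The function
    release_at_least() is a hypothetical name; it takes the release string
    as an argument instead of calling uname(), so that it can be exercised
    with fixed inputs.

```c
#include <stdio.h>

/*
 * Returns 1 if release (for example "6.8.7") is at least
 * min_major.min_minor, 0 if it is older, and -1 if it cannot be parsed.
 */
int release_at_least(const char *release, unsigned int min_major,
		     unsigned int min_minor)
{
	unsigned int major, minor;

	if (sscanf(release, "%u.%u.", &major, &minor) != 2)
		return -1;
	return major > min_major ||
	       (major == min_major && minor >= min_minor);
}
```

    A test would pass 6 and 3 as the minimum and report a skip instead of
    a failure when the function returns 0.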
    
    [ tglx: Massaged change log, renamed the version check helper ]
    
    Fixes: e797203fb3ba ("selftests/timers/posix_timers: Test delivery of signals across threads")
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240409133802.GD29396@redhat.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: kselftest: Fix build failure with NOLIBC [+ + +]

Author: Oleg Nesterov <oleg@redhat.com>
Date:   Fri Apr 12 14:35:36 2024 +0200

    selftests: kselftest: Fix build failure with NOLIBC
    
    commit 16767502aa990cca2cb7d1372b31d328c4c85b40 upstream.
    
    As Mark explains, ksft_min_kernel_version() can't be compiled with nolibc
    because nolibc doesn't implement uname().
    
    Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
    Reported-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20240412123536.GA32444@redhat.com
    Closes: https://lore.kernel.org/all/f0523b3a-ea08-4615-b0fb-5b504a2d39df@sirena.org.uk/
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Thu Apr 11 11:45:40 2024 -0700

    selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn
    
    commit f7d5bcd35d427daac7e206b1073ca14f5db85c27 upstream.
    
    After commit 6d029c25b71f ("selftests/timers/posix_timers: Reimplement
    check_timer_distribution()"), clang warns:
    
      tools/testing/selftests/timers/../kselftest.h:398:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
        398 |         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
            |             ^~~~~~~~~~~~
      tools/testing/selftests/timers/../kselftest.h:401:9: note: uninitialized use occurs here
        401 |         return major > min_major || (major == min_major && minor >= min_minor);
            |                ^~~~~
      tools/testing/selftests/timers/../kselftest.h:398:6: note: remove the '||' if its condition is always false
        398 |         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
            |             ^~~~~~~~~~~~~~~
      tools/testing/selftests/timers/../kselftest.h:395:20: note: initialize the variable 'major' to silence this warning
        395 |         unsigned int major, minor;
            |                           ^
            |                            = 0
    
    This is a false positive because if uname() fails, ksft_exit_fail_msg()
    will be called, which unconditionally calls exit(), a noreturn function.
    However, clang does not know that ksft_exit_fail_msg() will call exit() at
    the point in the pipeline that the warning is emitted because inlining has
    not occurred, so it assumes control flow will resume normally after
    ksft_exit_fail_msg() is called.
    
    Make it clear to clang that all of the functions that call exit()
    unconditionally in kselftest.h are noreturn transitively by marking them
    explicitly with '__attribute__((__noreturn__))', which clears up the
    warning above and any future warnings that may appear for the same reason.
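
    The effect of the attribute can be shown with a small sketch that has
    the same shape as the code in the warning. The names toy_exit_fail_msg()
    and toy_parse_major() are hypothetical, and the attribute syntax is the
    GCC/clang extension quoted above.

```c
#include <stdio.h>
#include <stdlib.h>

/* The attribute tells the compiler that control never returns from here. */
__attribute__((__noreturn__)) void toy_exit_fail_msg(const char *msg)
{
	fprintf(stderr, "Bail out! %s\n", msg);
	exit(1);
}

unsigned int toy_parse_major(const char *release)
{
	unsigned int major;

	/* If release is NULL, sscanf() is skipped and major stays unset. */
	if (!release || sscanf(release, "%u.", &major) != 1)
		toy_exit_fail_msg("cannot parse release string");

	/* Reached only when sscanf() stored a value in major. */
	return major;
}
```

    Because the failure path is declared as not returning, the compiler
    can see that the return statement is reached only with major set.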
    
    Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
    Reported-by: John Stultz <jstultz@google.com>
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Shuah Khan <skhan@linuxfoundation.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240411-mark-kselftest-exit-funcs-noreturn-v1-1-b027c948f586@kernel.org
    Closes: https://lore.kernel.org/all/20240410232637.4135564-2-jstultz@google.com/
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: timers: Fix abs() warning in posix_timers test [+ + +]

Author: John Stultz <jstultz@google.com>
Date:   Wed Apr 10 16:26:30 2024 -0700

    selftests: timers: Fix abs() warning in posix_timers test
    
    commit ed366de8ec89d4f960d66c85fc37d9de22f7bf6d upstream.
    
    Building with clang results in the following warning:
    
      posix_timers.c:69:6: warning: absolute value function 'abs' given an
          argument of type 'long long' but has parameter of type 'int' which may
          cause truncation of value [-Wabsolute-value]
            if (abs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) {
                ^
    So switch to using llabs() instead.
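
    A minimal sketch of the check with llabs(). The value of DELAY and the
    helper name toy_out_of_tolerance() are assumptions made for
    illustration; only the comparison is taken from the warning above.

```c
#include <stdlib.h>

#define DELAY		2		/* seconds; assumed value */
#define USECS_PER_SEC	1000000LL

/* Returns 1 if the measured interval is off by more than half a second. */
int toy_out_of_tolerance(long long diff_usec)
{
	/* llabs() takes a long long, so the difference is not truncated. */
	return llabs(diff_usec - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2;
}
```

    With abs() the argument would first be converted to int. On common
    platforms with a 32-bit int, a difference of exactly 2^32 microseconds
    would then typically become 0 and pass the check.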
    
    Fixes: 0bc4b0cf1570 ("selftests: add basic posix timers selftests")
    Signed-off-by: John Stultz <jstultz@google.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240410232637.4135564-3-jstultz@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: timers: Fix posix_timers ksft_print_msg() warning [+ + +]

Author: John Stultz <jstultz@google.com>
Date:   Wed Apr 10 16:26:28 2024 -0700

    selftests: timers: Fix posix_timers ksft_print_msg() warning
    
    commit e4a6bceac98eba3c00e874892736b34ea5fdaca3 upstream.
    
    After commit 6d029c25b71f ("selftests/timers/posix_timers: Reimplement
    check_timer_distribution()") the following warning occurs when building
    with an older gcc:
    
    posix_timers.c:250:2: warning: format not a string literal and no format arguments [-Wformat-security]
      250 |  ksft_print_msg(errmsg);
          |  ^~~~~~~~~~~~~~
    
    Fix this up by changing it to ksft_print_msg("%s", errmsg)
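
    Why the "%s" form is the safe one can be seen with any printf-style
    function. In the sketch below toy_format_msg() is a hypothetical
    stand-in for ksft_print_msg() that writes into a buffer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical printf-style helper standing in for ksft_print_msg(). */
int toy_format_msg(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf, size, fmt, args);
	va_end(args);
	return len;
}
```

    Calling toy_format_msg(buf, sizeof(buf), "%s", errmsg) copies errmsg
    verbatim, even if it contains a '%' character. Passing errmsg as the
    format itself would make the function interpret such characters as
    conversions with no matching arguments.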
    
    Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
    Signed-off-by: John Stultz <jstultz@google.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Justin Stitt <justinstitt@google.com>
    Acked-by: Shuah Khan <skhan@linuxfoundation.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240410232637.4135564-1-jstultz@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb3: fix Open files on server counter going negative [+ + +]

Author: Steve French <stfrench@microsoft.com>
Date:   Sat Apr 6 23:16:08 2024 -0500

    smb3: fix Open files on server counter going negative
    
    commit 28e0947651ce6a2200b9a7eceb93282e97d7e51a upstream.
    
    We were decrementing the count of open files on server twice
    for the case where we were closing cached directories.
    
    Fixes: 8e843bf38f7b ("cifs: return a single-use cfid if we did not get a lease")
    Cc: stable@vger.kernel.org
    Acked-by: Bharath SM <bharathsm@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: hide unused ftrace_event_id_fops [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Apr 3 10:06:24 2024 +0200

    tracing: hide unused ftrace_event_id_fops
    
    [ Upstream commit 5281ec83454d70d98b71f1836fb16512566c01cd ]
    
    When CONFIG_PERF_EVENTS is disabled, a 'make W=1' build produces a
    warning about the unused ftrace_event_id_fops variable:
    
    kernel/trace/trace_events.c:2155:37: error: 'ftrace_event_id_fops' defined but not used [-Werror=unused-const-variable=]
     2155 | static const struct file_operations ftrace_event_id_fops = {
    
    Hide this in the same #ifdef as the reference to it.
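
    The pattern looks roughly like this. TOY_PERF_EVENTS is a hypothetical
    switch standing in for CONFIG_PERF_EVENTS, and the structure and
    function names are invented for the sketch.

```c
#include <stddef.h>

struct toy_fops {
	const char *name;
};

/* The definition sits under the same switch as its only user. */
#ifdef TOY_PERF_EVENTS
static const struct toy_fops toy_event_id_fops = { .name = "id" };
#endif

/* Returns the "id" handler, or NULL when the feature is compiled out. */
const struct toy_fops *toy_event_id_handler(void)
{
#ifdef TOY_PERF_EVENTS
	return &toy_event_id_fops;
#else
	return NULL;
#endif
}
```

    With the switch undefined, neither the variable nor the reference to
    it is compiled, so there is nothing left to warn about.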
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240403080702.3509288-7-arnd@kernel.org
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Zheng Yejian <zhengyejian1@huawei.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Ajay Kaher <akaher@vmware.com>
    Cc: Jinjie Ruan <ruanjinjie@huawei.com>
    Cc: Clément Léger <cleger@rivosinc.com>
    Cc: Dan Carpenter <dan.carpenter@linaro.org>
    Cc: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
    Fixes: 620a30e97feb ("tracing: Don't pass file_operations array to event_create_dir()")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file [+ + +]

Author: Petr Tesarik <petr@tesarici.cz>
Date:   Thu Apr 4 09:57:40 2024 +0200

    u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file
    
    [ Upstream commit 38a15d0a50e0a43778561a5861403851f0b0194c ]
    
    Fix bogus lockdep warnings if multiple u64_stats_sync variables are
    initialized in the same file.
    
    With CONFIG_LOCKDEP, seqcount_init() is a macro which declares:
    
            static struct lock_class_key __key;
    
    Since u64_stats_init() is a function (albeit an inline one), all calls
    within the same file end up using the same instance, effectively treating
    them all as a single lock-class.
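
    The difference between a macro and a function can be reproduced
    without lockdep. In this sketch toy_key, toy_sync, TOY_SYNC_INIT() and
    toy_sync_init_fn() are hypothetical; the static variable inside the
    macro stands in for the lock_class_key.

```c
struct toy_key {
	char dummy;
};

struct toy_sync {
	const struct toy_key *key; /* identifies the lock class */
};

/* Each expansion of this macro declares its own static key. */
#define TOY_SYNC_INIT(s)			\
	do {					\
		static struct toy_key key_;	\
		(s)->key = &key_;		\
	} while (0)

/* Wrapping the macro in a function leaves one expansion, so one key. */
void toy_sync_init_fn(struct toy_sync *s)
{
	TOY_SYNC_INIT(s);
}
```

    Objects initialized through toy_sync_init_fn() all share one key,
    while each direct use of TOY_SYNC_INIT() gets a key of its own.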
    
    Fixes: 9464ca650008 ("net: make u64_stats_init() a function")
    Closes: https://lore.kernel.org/netdev/ea1567d9-ce66-45e6-8168-ac40a47d1821@roeck-us.net/
    Signed-off-by: Petr Tesarik <petr@tesarici.cz>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240404075740.30682-1-petr@tesarici.cz
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

vhost: Add smp_rmb() in vhost_enable_notify() [+ + +]

Author: Gavin Shan <gshan@redhat.com>
Date:   Thu Mar 28 10:21:48 2024 +1000

    vhost: Add smp_rmb() in vhost_enable_notify()
    
    commit df9ace7647d4123209395bb9967e998d5758c645 upstream.
    
    A smp_rmb() has been missed in vhost_enable_notify(), inspired by
    Will. Otherwise, it's not ensured the available ring entries pushed
    by guest can be observed by vhost in time, leading to stale available
    ring entries fetched by vhost in vhost_get_vq_desc(), as reported by
    Yihuang Yu on NVidia's grace-hopper (ARM64) platform.
    
      /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
      -accel kvm -machine virt,gic-version=host -cpu host          \
      -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
      -m 4096M,slots=16,maxmem=64G                                 \
      -object memory-backend-ram,id=mem0,size=4096M                \
       :                                                           \
      -netdev tap,id=vnet0,vhost=true                              \
      -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
       :
      guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
      virtio_net virtio0: output.0:id 100 is not a head!
    
    Add the missed smp_rmb() in vhost_enable_notify(). When it returns true,
    it means there are still pending tx buffers. Since it might read indices,
    it can still bypass the smp_rmb() in vhost_get_vq_desc(). Note that
    it should be safe until vq->avail_idx is changed by commit d3bb267bbdcb
    ("vhost: cache avail index in vhost_enable_notify()").
    
    Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
    Cc: <stable@kernel.org> # v5.18+
    Reported-by: Yihuang Yu <yihyu@redhat.com>
    Suggested-by: Will Deacon <will@kernel.org>
    Signed-off-by: Gavin Shan <gshan@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Message-Id: <20240328002149.1141302-3-gshan@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vhost: Add smp_rmb() in vhost_vq_avail_empty() [+ + +]

Author: Gavin Shan <gshan@redhat.com>
Date:   Thu Mar 28 10:21:47 2024 +1000

    vhost: Add smp_rmb() in vhost_vq_avail_empty()
    
    commit 22e1992cf7b034db5325660e98c41ca5afa5f519 upstream.
    
    A smp_rmb() is missing in vhost_vq_avail_empty(), as spotted by
    Will. Without it, there is no guarantee that the available ring entries
    pushed by the guest are observed by vhost in time, so vhost can fetch
    stale available ring entries in vhost_get_vq_desc(), as reported by
    Yihuang Yu on NVidia's grace-hopper (ARM64) platform.
    
      /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
      -accel kvm -machine virt,gic-version=host -cpu host          \
      -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
      -m 4096M,slots=16,maxmem=64G                                 \
      -object memory-backend-ram,id=mem0,size=4096M                \
       :                                                           \
      -netdev tap,id=vnet0,vhost=true                              \
      -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
       :
      guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
      virtio_net virtio0: output.0:id 100 is not a head!
    
    Add the missing smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch()
    returns true, there are still pending tx buffers. Because it may read
    the indices itself, the caller can bypass the smp_rmb() in
    vhost_get_vq_desc(). Note that the code was safe until vq->avail_idx was
    changed by commit 275bf960ac697 ("vhost: better detection of available
    buffers").
    
    Fixes: 275bf960ac69 ("vhost: better detection of available buffers")
    Cc: <stable@kernel.org> # v4.11+
    Reported-by: Yihuang Yu <yihyu@redhat.com>
    Suggested-by: Will Deacon <will@kernel.org>
    Signed-off-by: Gavin Shan <gshan@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Message-Id: <20240328002149.1141302-2-gshan@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
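
The ordering problem behind the two vhost fixes above can be illustrated in userspace with C11 atomics. This is only a sketch under stated assumptions: `demo_ring`, `demo_push()` and `demo_pop()` are hypothetical names, the layout is not the real vring layout, and an acquire fence is used as a rough (stronger) analogue of the kernel's smp_rmb().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u

/* Hypothetical, simplified stand-in for an available ring (not the vhost layout). */
struct demo_ring {
    _Atomic uint16_t avail_idx;            /* published by the producer ("guest") */
    uint16_t         ring[DEMO_RING_SIZE]; /* entries written before avail_idx */
};

/* Producer: write the entry first, then publish the new index. */
static void demo_push(struct demo_ring *r, uint16_t head)
{
    uint16_t idx = atomic_load_explicit(&r->avail_idx, memory_order_relaxed);

    r->ring[idx % DEMO_RING_SIZE] = head;
    /* Release ordering plays the role of the producer's write barrier. */
    atomic_store_explicit(&r->avail_idx, (uint16_t)(idx + 1),
                          memory_order_release);
}

/* Consumer: returns true and stores the next head if an entry is pending. */
static bool demo_pop(struct demo_ring *r, uint16_t *last_avail, uint16_t *head)
{
    uint16_t idx = atomic_load_explicit(&r->avail_idx, memory_order_relaxed);

    if (idx == *last_avail)
        return false; /* nothing new was published */

    /*
     * Rough analogue of smp_rmb(): without a barrier here, the load of
     * ring[] below could be satisfied before the load of avail_idx on a
     * weakly ordered CPU, returning a stale entry.
     */
    atomic_thread_fence(memory_order_acquire);

    *head = r->ring[*last_avail % DEMO_RING_SIZE];
    (*last_avail)++;
    return true;
}
```

The point of the sketch is that every path that reads the index and then trusts the ring contents needs the barrier, not just the main fetch path.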

virtio_net: Do not send RSS key if it is not supported [+ + +]

Author: Breno Leitao <leitao@debian.org>
Date:   Wed Apr 3 08:43:12 2024 -0700

    virtio_net: Do not send RSS key if it is not supported
    
    commit 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 upstream.
    
    There is a bug when setting the RSS options in virtio_net that can break
    the whole machine, getting the kernel into an infinite loop.
    
    Running the following command in any QEMU virtual machine with virtio-net
    will reproduce this problem:
    
        # ethtool -X eth0  hfunc toeplitz
    
    This is how the problem happens:
    
    1) ethtool_set_rxfh() calls virtnet_set_rxfh()
    
    2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
    
    3) virtnet_commit_rss_command() populates 4 entries for the rss
    scatter-gather
    
    4) Since the command above does not have a key, the last
    scatter-gather entry will have zero length, because rss_key_size == 0:

        sg_buf_size = vi->rss_key_size;
    
    5) This buffer is passed to QEMU, but QEMU is not happy with a buffer
    with zero length, and does the following in virtqueue_map_desc() (QEMU
    function):
    
      if (!sz) {
          virtio_error(vdev, "virtio: zero sized buffers are not allowed");
    
    6) virtio_error() (also a QEMU function) marks the device as broken:
    
        vdev->broken = true;
    
    7) QEMU bails out and does not respond to this crazy kernel.
    
    8) The kernel is waiting for the response to come back (function
    virtnet_send_command())
    
    9) The kernel waits by doing the following:
    
          while (!virtqueue_get_buf(vi->cvq, &tmp) &&
                 !virtqueue_is_broken(vi->cvq))
                  cpu_relax();
    
    10) Neither of the functions above ever returns true, so the kernel
    loops here forever. Keep in mind that virtqueue_is_broken() does
    not look at QEMU's `vdev->broken`, so it never realizes that the
    virtio device is broken on the QEMU side.
    
    Fix it by not sending RSS commands if the feature is not available in
    the device.
    
    Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
    Cc: stable@vger.kernel.org
    Cc: qemu-devel@nongnu.org
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
    Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
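
A minimal sketch of the kind of guard this fix describes, assuming a hypothetical `demo_dev` structure and `demo_commit_rss()` helper (neither is the real virtio_net code): the RSS command is refused up front when the device did not offer the feature, so no zero-length buffer is ever queued.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; field names are assumptions for illustration. */
struct demo_dev {
    bool   has_rss;        /* device negotiated the RSS feature */
    size_t rss_key_size;   /* 0 when RSS was not negotiated */
    int    commands_sent;  /* counts commands handed to the "device" */
};

/*
 * Refuse to send an RSS command when the feature is unavailable.
 * Without this check, a command whose key buffer has length
 * rss_key_size == 0 would be queued, and the caller would then wait
 * for a reply that never arrives.
 */
static int demo_commit_rss(struct demo_dev *dev)
{
    if (!dev->has_rss)
        return -EOPNOTSUPP;

    dev->commands_sent++; /* stand-in for queueing the control command */
    return 0;
}
```

Returning an error to the caller is preferable here because the waiting loop shown in step 9 has no timeout of its own.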

x86/apic: Force native_apic_mem_read() to use the MOV instruction [+ + +]

Author: Adam Dunlap <acdunlap@google.com>
Date:   Mon Mar 18 16:09:27 2024 -0700

    x86/apic: Force native_apic_mem_read() to use the MOV instruction
    
    commit 5ce344beaca688f4cdea07045e0b8f03dc537e74 upstream.
    
    When done from a virtual machine, instructions that touch APIC memory
    must be emulated. By convention, MMIO accesses are typically performed
    via io.h helpers such as readl() or writeq() to simplify instruction
    emulation/decoding (ex: in KVM hosts and SEV guests) [0].
    
    Currently, native_apic_mem_read() does not follow this convention,
    allowing the compiler to emit instructions other than the MOV
    instruction generated by readl(). In particular, when the kernel is
    compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler
    would emit a TESTL instruction which is not supported by the SEV-ES
    emulator, causing a boot failure in that environment. It is likely the
    same problem would happen in a TDX guest as that uses the same
    instruction emulator as SEV-ES.
    
    To make sure all emulators can emulate APIC memory reads via MOV, use
    the readl() function in native_apic_mem_read(). It is expected that any
    emulator would support MOV in any addressing mode as it is the most
    generic and is what is usually emitted currently.
    
    The TESTL instruction is emitted when native_apic_mem_read() is inlined
    into apic_mem_wait_icr_idle(). The emulator comes from
    insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to
    extend insn_decode_mmio() to support more instructions since, in theory,
    the compiler could choose to output nearly any instruction for such
    reads which would bloat the emulator beyond reason.
    
      [0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.intel.com/
    
      [ bp: Massage commit message, fix typos. ]
    
    Signed-off-by: Adam Dunlap <acdunlap@google.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
    Tested-by: Kevin Loughlin <kevinloughlin@google.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240318230927.2191933-1-acdunlap@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
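
The idea of pinning a read to a plain MOV can be sketched in userspace as follows. This is an illustration only: `demo_read32()`, `demo_busy()` and `DEMO_BUSY_BIT` are hypothetical names, the memory being read is ordinary RAM rather than APIC MMIO, and the non-x86 branch is a fallback added so the sketch builds elsewhere.

```c
#include <stdint.h>

#define DEMO_BUSY_BIT (1u << 12) /* hypothetical "busy" flag in the register */

/*
 * Load a 32-bit value with an explicit MOV into a register. Because the
 * load is written as inline assembly, the compiler cannot fold it into
 * another instruction with a memory operand (such as TEST) when this
 * function is inlined into a caller that only tests a bit.
 */
static inline uint32_t demo_read32(const volatile void *addr)
{
    uint32_t val;

#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("movl %1, %0"
                     : "=r"(val)
                     : "m"(*(const volatile uint32_t *)addr)
                     : "memory");
#else
    /* Fallback for other architectures: a plain volatile load. */
    val = *(const volatile uint32_t *)addr;
#endif
    return val;
}

/* Caller that only inspects one bit, similar in shape to an idle-wait check. */
static int demo_busy(const volatile void *reg)
{
    return (demo_read32(reg) & DEMO_BUSY_BIT) != 0;
}
```

With a plain C dereference in place of `demo_read32()`, a compiler is free to emit a single test-against-memory instruction for `demo_busy()`, which is the situation the commit describes.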

x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES [+ + +]

Author: Josh Poimboeuf <jpoimboe@kernel.org>
Date:   Wed Apr 10 22:40:46 2024 -0700

    x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
    
    commit cb2db5bb04d7f778fbc1a1ea2507aab436f1bff3 upstream.
    
    There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and
    over.  It's even read in the BHI sysfs function which is a big no-no.
    Just read it once and cache it.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bugs: Clarify that syscall hardening isn't a BHI mitigation [+ + +]

Author: Josh Poimboeuf <jpoimboe@kernel.org>
Date:   Wed Apr 10 22:40:48 2024 -0700

    x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
    
    commit 5f882f3b0a8bf0788d5a0ee44b1191de5319bb8a upstream.
    
    While syscall hardening helps prevent some BHI attacks, there's still
    other low-hanging fruit remaining.  Don't classify it as a mitigation
    and make it clear that the system may still be vulnerable if it doesn't
    have a HW or SW mitigation enabled.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bugs: Fix BHI documentation [+ + +]

Author: Josh Poimboeuf <jpoimboe@kernel.org>
Date:   Wed Apr 10 22:40:45 2024 -0700

    x86/bugs: Fix BHI documentation
    
    commit dfe648903f42296866d79f10d03f8c85c9dfba30 upstream.
    
    Fix up some inaccuracies in the BHI documentation.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bugs: Fix BHI handling of RRSBA [+ + +]

Author: Josh Poimboeuf <jpoimboe@kernel.org>
Date:   Wed Apr 10 22:40:47 2024 -0700

    x86/bugs: Fix BHI handling of RRSBA
    
    commit 1cea8a280dfd1016148a3820676f2f03e3f5b898 upstream.
    
    The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been
    disabled by the Spectre v2 mitigation (or can otherwise be disabled by
    the BHI mitigation itself if needed).  In that case retpolines are fine.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bugs: Fix return type of spectre_bhi_state() [+ + +]

Author: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Date:   Tue Apr 9 16:08:05 2024 -0700

    x86/bugs: Fix return type of spectre_bhi_state()
    
    commit 04f4230e2f86a4e961ea5466eda3db8c1762004d upstream.
    
    The definition of spectre_bhi_state() incorrectly returns a const char
    * const. This causes a compiler warning when building with W=1:
    
     warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
     2812 | static const char * const spectre_bhi_state(void)
    
    Remove the const qualifier from the pointer.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Reported-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/20240409230806.1545822-1-daniel.sneddon@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
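
As a small illustration of the warning, the sketch below uses a hypothetical `demo_bhi_state()` function with made-up strings. A top-level const on a returned pointer has no effect, since the caller receives a copy of the pointer value, so only the pointed-to characters are declared const.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the real mitigation state. */
static bool demo_bhi_mitigated;

/*
 * Declaring this as "static const char * const demo_bhi_state(void)"
 * would trigger -Wignored-qualifiers: the second const qualifies the
 * returned pointer value itself, which is meaningless for a return type.
 * The first const (pointer to const char) is the one that matters.
 */
static const char *demo_bhi_state(void)
{
    return demo_bhi_mitigated ? "Mitigated" : "Vulnerable";
}
```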

x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto [+ + +]

Author: Josh Poimboeuf <jpoimboe@kernel.org>
Date:   Wed Apr 10 22:40:50 2024 -0700

    x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
    
    commit 36d4fe147c870f6d3f6602befd7ef44393a1c87a upstream.
    
    Unlike most other mitigations' "auto" options, spectre_bhi=auto only
    mitigates newer systems, which is confusing and not particularly useful.
    
    Remove it.
    
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr' [+ + +]

Author: Ingo Molnar <mingo@kernel.org>
Date:   Thu Apr 11 09:25:36 2024 +0200

    x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'
    
    commit d0485730d2189ffe5d986d4e9e191f1e4d5ffd24 upstream.
    
    So we are using the 'ia32_cap' value in a number of places,
    which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register.
    
    But there's very little 'IA32' about it - this isn't 32-bit only
    code, nor does it originate from there, it's just a historic
    quirk that many Intel MSR names are prefixed with IA32_.
    
    This is already clear from the helper method around the MSR:
    x86_read_arch_cap_msr(), which doesn't have the IA32 prefix.
    
    So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with
    its role and with the naming of the helper function.
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Nikolay Borisov <nik.borisov@suse.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI [+ + +]

Author: Josh Poimboeuf <jpoimboe@kernel.org>
Date:   Wed Apr 10 22:40:51 2024 -0700

    x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI
    
    commit 4f511739c54b549061993b53fc0380f48dfca23b upstream.
    
    For consistency with the other CONFIG_MITIGATION_* options, replace the
    CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
    CONFIG_MITIGATION_SPECTRE_BHI option.
    
    [ mingo: Fix ]
    
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Nikolay Borisov <nik.borisov@suse.com>
    Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n [+ + +]

Author: Sean Christopherson <seanjc@google.com>
Date:   Tue Apr 9 10:51:05 2024 -0700

    x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
    
    commit f337a6a21e2fd67eadea471e93d05dd37baaa9be upstream.
    
    Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built
    with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly
    states that disabling SPECULATION_MITIGATIONS is supposed to turn off all
    mitigations by default.
    
      │ If you say N, all mitigations will be disabled. You really
      │ should know what you are doing to say so.
    
    As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in
    some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n.
    
    Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs")
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Cc: stable@vger.kernel.org
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/20240409175108.1512861-2-seanjc@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
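
The change in default can be sketched with a build-time switch. The names below (`DEMO_SPECULATION_MITIGATIONS`, `demo_cpu_mitigations`, and the enum) are hypothetical stand-ins for the kernel's configuration option and variable, not the actual code.

```c
/* Hypothetical mirror of the two defaults discussed in the commit. */
enum demo_mitigations {
    DEMO_MITIGATIONS_OFF,
    DEMO_MITIGATIONS_AUTO,
};

/* Stand-in for the Kconfig option; 0 models SPECULATION_MITIGATIONS=n. */
#ifndef DEMO_SPECULATION_MITIGATIONS
#define DEMO_SPECULATION_MITIGATIONS 0
#endif

/*
 * Pick the default from the build configuration instead of always
 * starting at AUTO, so that a build with the option disabled really
 * starts with mitigations off.
 */
static enum demo_mitigations demo_cpu_mitigations =
    DEMO_SPECULATION_MITIGATIONS ? DEMO_MITIGATIONS_AUTO
                                 : DEMO_MITIGATIONS_OFF;

static int demo_mitigations_off(void)
{
    return demo_cpu_mitigations == DEMO_MITIGATIONS_OFF;
}
```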

xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Apr 4 20:27:38 2024 +0000

    xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
    
    [ Upstream commit 237f3cf13b20db183d3706d997eedc3c49eacd44 ]
    
    syzbot reported an illegal copy in xsk_setsockopt() [1]
    
    Make sure to validate setsockopt() @optlen parameter.
    
    [1]
    
     BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
     BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
     BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
    Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
    
    CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
      print_address_description mm/kasan/report.c:377 [inline]
      print_report+0x169/0x550 mm/kasan/report.c:488
      kasan_report+0x143/0x180 mm/kasan/report.c:601
      copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
      copy_from_sockptr include/linux/sockptr.h:55 [inline]
      xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
      do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
      __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
      __do_sys_setsockopt net/socket.c:2343 [inline]
      __se_sys_setsockopt net/socket.c:2340 [inline]
      __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    RIP: 0033:0x7fb40587de69
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69
    RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006
    RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000
    R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000
    R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08
     </TASK>
    
    Allocated by task 7549:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
      __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
      kasan_kmalloc include/linux/kasan.h:211 [inline]
      __do_kmalloc_node mm/slub.c:3966 [inline]
      __kmalloc+0x233/0x4a0 mm/slub.c:3979
      kmalloc include/linux/slab.h:632 [inline]
      __cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869
      do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
      __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
      __do_sys_setsockopt net/socket.c:2343 [inline]
      __se_sys_setsockopt net/socket.c:2340 [inline]
      __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    The buggy address belongs to the object at ffff888028c6cde0
     which belongs to the cache kmalloc-8 of size 8
    The buggy address is located 1 bytes to the right of
     allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
    
    The buggy address belongs to the physical page:
    page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c
    anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff)
    page_type: 0xffffffff()
    raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001
    raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    page_owner tracks the page as allocated
    page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223
      set_page_owner include/linux/page_owner.h:31 [inline]
      post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
      prep_new_page mm/page_alloc.c:1540 [inline]
      get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
      __alloc_pages+0x256/0x680 mm/page_alloc.c:4569
      __alloc_pages_node include/linux/gfp.h:238 [inline]
      alloc_pages_node include/linux/gfp.h:261 [inline]
      alloc_slab_page+0x5f/0x160 mm/slub.c:2175
      allocate_slab mm/slub.c:2338 [inline]
      new_slab+0x84/0x2f0 mm/slub.c:2391
      ___slab_alloc+0xc73/0x1260 mm/slub.c:3525
      __slab_alloc mm/slub.c:3610 [inline]
      __slab_alloc_node mm/slub.c:3663 [inline]
      slab_alloc_node mm/slub.c:3835 [inline]
      __do_kmalloc_node mm/slub.c:3965 [inline]
      __kmalloc_node+0x2db/0x4e0 mm/slub.c:3973
      kmalloc_node include/linux/slab.h:648 [inline]
      __vmalloc_area_node mm/vmalloc.c:3197 [inline]
      __vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392
      __vmalloc_node mm/vmalloc.c:3457 [inline]
      vzalloc+0x79/0x90 mm/vmalloc.c:3530
      bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162
      bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895
      __sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631
      __do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
      __se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
      __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    page last free pid 6650 tgid 6647 stack trace:
      reset_page_owner include/linux/page_owner.h:24 [inline]
      free_pages_prepare mm/page_alloc.c:1140 [inline]
      free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346
      free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532
      release_pages+0x2117/0x2400 mm/swap.c:1042
      tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
      tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
      tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300
      tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392
      exit_mmap+0x4b6/0xd40 mm/mmap.c:3300
      __mmput+0x115/0x3c0 kernel/fork.c:1345
      exit_mm+0x220/0x310 kernel/exit.c:569
      do_exit+0x99e/0x27e0 kernel/exit.c:865
      do_group_exit+0x207/0x2c0 kernel/exit.c:1027
      get_signal+0x176e/0x1850 kernel/signal.c:2907
      arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310
      exit_to_user_mode_loop kernel/entry/common.c:105 [inline]
      exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
      __syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline]
      syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212
      do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Memory state around the buggy address:
     ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
     ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
    >ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
                                                           ^
     ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
     ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
    
    Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: "Björn Töpel" <bjorn@kernel.org>
    Cc: Magnus Karlsson <magnus.karlsson@intel.com>
    Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/r/20240404202738.3634547-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
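
The KASAN report above shows a 4-byte read from a 2-byte buffer. A minimal userspace sketch of the length check such a fix needs, assuming a hypothetical `demo_set_ring_entries()` helper with `memcpy()` standing in for the kernel's copy helper:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an int-sized option value from a caller-supplied buffer.
 * optlen is the number of valid bytes in optval; reading sizeof(int)
 * bytes without checking it would run past a shorter buffer.
 */
static int demo_set_ring_entries(int *entries, const void *optval,
                                 size_t optlen)
{
    if (optlen < sizeof(*entries))
        return -EINVAL; /* buffer too short: reject before copying */

    memcpy(entries, optval, sizeof(*entries));
    return 0;
}
```

Rejecting the call before any copy keeps the destination untouched on error, which makes the failure easy for the caller to reason about.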

Changelog for Linux 6.8.7