Changelog for kernel 6.7.10

arch/arm/mm: fix major fault accounting when retrying under per-VMA lock [+ + +]

Author: Suren Baghdasaryan <surenb@google.com>
Date:   Mon Jan 22 22:43:05 2024 -0800

    arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
    
    [ Upstream commit e870920bbe68e52335a4c31a059e6af6a9a59dbb ]
    
    The change [1] missed ARM architecture when fixing major fault accounting
    for page fault retry under per-VMA lock.
    
    The user-visible effect is that it restores correct major fault
    accounting that was broken after [2] was merged in 6.7 kernel. The
    more detailed description is in [3] and this patch simply adds the
    same fix to ARM architecture which I missed in [3].
    
    Add missing code to fix ARM architecture fault accounting.
    
    [1] 46e714c729c8 ("arch/mm/fault: fix major fault accounting when retrying under per-VMA lock")
    [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/
    [3] https://lore.kernel.org/all/20231226214610.109282-1-surenb@google.com/
    
    Link: https://lkml.kernel.org/r/20240123064305.2829244-1-surenb@google.com
    Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock")
    Reported-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Cc: Alexander Gordeev <agordeev@linux.ibm.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: 9328/1: mm: try VMA lock-based page fault handling first [+ + +]

Author: Wang Kefeng <wangkefeng.wang@huawei.com>
Date:   Thu Oct 19 12:21:35 2023 +0100

    ARM: 9328/1: mm: try VMA lock-based page fault handling first
    
    [ Upstream commit c16af1212479570454752671a170a1756e11fdfb ]
    
    Attempt VMA lock-based page fault handling first, and fall back to the
    existing mmap_lock-based handling if that fails. The ebizzy benchmark
    shows a 25% improvement on qemu with 2 CPUs.
    
    Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Stable-dep-of: e870920bbe68 ("arch/arm/mm: fix major fault accounting when retrying under per-VMA lock")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bpf: check bpf_func_state->callback_depth when pruning states [+ + +]

Author: Eduard Zingerman <eddyz87@gmail.com>
Date:   Thu Feb 22 17:41:20 2024 +0200

    bpf: check bpf_func_state->callback_depth when pruning states
    
    [ Upstream commit e9a8e5a587ca55fec6c58e4881742705d45bee54 ]
    
    When comparing current and cached states, the verifier should consider
    bpf_func_state->callback_depth. The current state cannot be pruned against
    a cached state when the current state has more iterations left than the
    cached state. The current state has more iterations left when its
    callback_depth is smaller.
    
    Below is an example illustrating this bug, minimized from mailing list
    discussion [0] (assume that BPF_F_TEST_STATE_FREQ is set).
    The example is not a safe program: if loop_cb point (1) is followed by
    loop_cb point (2), then division by zero is possible at point (4).
    
        struct ctx {
            __u64 a;
            __u64 b;
            __u64 c;
        };
    
        static int loop_cb(int i, struct ctx *ctx)
        {
            /* assume that generated code is "fallthrough-first":
             * if ... == 1 goto
             * if ... == 2 goto
             * <default>
             */
            switch (bpf_get_prandom_u32()) {
            case 1:  /* 1 */ ctx->a = 42; return 0; break;
            case 2:  /* 2 */ ctx->b = 42; return 0; break;
            default: /* 3 */ ctx->c = 42; return 0; break;
            }
        }
    
        SEC("tc")
        __failure
        __flag(BPF_F_TEST_STATE_FREQ)
        int test(struct __sk_buff *skb)
        {
            struct ctx ctx = { 7, 7, 7 };
    
            bpf_loop(2, loop_cb, &ctx, 0);              /* 0 */
            /* assume generated checks are in-order: .a first */
            if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7)
                    asm volatile("r0 /= 0;":::"r0");    /* 4 */
            return 0;
        }
    
    Prior to this commit verifier built the following checkpoint tree for
    this example:
    
     .------------------------------------- Checkpoint / State name
     |    .-------------------------------- Code point number
     |    |   .---------------------------- Stack state {ctx.a,ctx.b,ctx.c}
     |    |   |        .------------------- Callback depth in frame #0
     v    v   v        v
       - (0) {7P,7P,7},depth=0
         - (3) {7P,7P,7},depth=1
           - (0) {7P,7P,42},depth=1
             - (3) {7P,7,42},depth=2
               - (0) {7P,7,42},depth=2      loop terminates because of depth limit
                 - (4) {7P,7,42},depth=0    predicted false, ctx.a marked precise
                 - (6) exit
    (a)      - (2) {7P,7,42},depth=2
               - (0) {7P,42,42},depth=2     loop terminates because of depth limit
                 - (4) {7P,42,42},depth=0   predicted false, ctx.a marked precise
                 - (6) exit
    (b)      - (1) {7P,7P,42},depth=2
               - (0) {42P,7P,42},depth=2    loop terminates because of depth limit
                 - (4) {42P,7P,42},depth=0  predicted false, ctx.{a,b} marked precise
                 - (6) exit
         - (2) {7P,7,7},depth=1             considered safe, pruned using checkpoint (a)
    (c)  - (1) {7P,7P,7},depth=1            considered safe, pruned using checkpoint (b)
    
    Here checkpoint (b) has callback_depth of 2, meaning that it would
    never reach state {42,42,7}.
    While checkpoint (c) has callback_depth of 1, and thus
    could yet explore the state {42,42,7} if not pruned prematurely.
    This commit forbids such premature pruning,
    allowing verifier to explore states sub-tree starting at (c):
    
    (c)  - (1) {7,7,7P},depth=1
           - (0) {42P,7,7P},depth=1
             ...
             - (2) {42,7,7},depth=2
               - (0) {42,42,7},depth=2      loop terminates because of depth limit
                 - (4) {42,42,7},depth=0    predicted true, ctx.{a,b,c} marked precise
                   - (5) division by zero
    
    [0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/
    
    Fixes: bb124da69c47 ("bpf: keep track of max number of bpf_loop callback iterations")
    Suggested-by: Yonghong Song <yonghong.song@linux.dev>
    Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20240222154121.6991-2-eddyz87@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
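
The pruning rule from the bpf commit above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the verifier's code: `struct toy_func_state` and `toy_states_equal()` are hypothetical names, and the stack comparison is an exact `memcmp()`, whereas the real verifier only compares values that were marked precise.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a per-frame verifier state. */
struct toy_func_state {
    unsigned long long stack[3];  /* models {ctx.a, ctx.b, ctx.c} */
    unsigned int callback_depth;  /* bpf_loop iterations already simulated */
};

/* Returns true when `cur` may be pruned against the cached state `old`. */
static bool toy_states_equal(const struct toy_func_state *old,
                             const struct toy_func_state *cur)
{
    assert(old != NULL && cur != NULL);

    /*
     * A smaller callback_depth means more iterations are still left.
     * If cur has more iterations left than old, the verdict reached
     * for old does not cover everything cur can still do.
     */
    if (old->callback_depth > cur->callback_depth)
        return false;

    /* Simplification: exact comparison instead of precision tracking. */
    return memcmp(old->stack, cur->stack, sizeof(old->stack)) == 0;
}
```

With this check, a cached state at depth 2 (like checkpoint (b)) can no longer be used to prune a state at depth 1 (like (c)), even when the compared stack contents match.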

cpumap: Zero-initialise xdp_rxq_info struct before running XDP program [+ + +]

Author: Toke Høiland-Jørgensen <toke@redhat.com>
Date:   Tue Mar 5 22:31:32 2024 +0100

    cpumap: Zero-initialise xdp_rxq_info struct before running XDP program
    
    [ Upstream commit 2487007aa3b9fafbd2cb14068f49791ce1d7ede5 ]
    
    When running an XDP program that is attached to a cpumap entry, we don't
    initialise the xdp_rxq_info data structure being used in the xdp_buff
    that backs the XDP program invocation. Tobias noticed that this leads to
    random values being returned as the xdp_md->rx_queue_index value for XDP
    programs running in a cpumap.
    
    This means we're basically returning the contents of the uninitialised
    memory, which is bad. Fix this by zero-initialising the rxq data
    structure before running the XDP program.
    
    Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
    Reported-by: Tobias Böhm <tobias@aibor.de>
    Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/r/20240305213132.11955-1-toke@redhat.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
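
The idea behind the cpumap fix above, clearing a structure before code that reads it runs, can be shown with a small userspace sketch. The type `struct toy_rxq_info` and both helpers are hypothetical stand-ins; they are not the kernel's `xdp_rxq_info` API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of the receive-queue info an XDP program can read. */
struct toy_rxq_info {
    unsigned int queue_index;
    void *dev;
};

/* Prepare the structure as the commit describes: zero everything first. */
static void toy_rxq_prepare(struct toy_rxq_info *rxq, void *dev)
{
    assert(rxq != NULL);
    memset(rxq, 0, sizeof(*rxq));  /* no stale memory contents survive */
    rxq->dev = dev;                /* then fill in the fields that are known */
}

/* What a program would observe as its receive queue index. */
static unsigned int toy_rx_queue_index(const struct toy_rxq_info *rxq)
{
    return rxq->queue_index;
}
```

Fields that are never assigned, such as `queue_index` here, then read back as zero instead of whatever happened to be in memory.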

dmaengine: fsl-edma: correct max_segment_size setting [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Wed Feb 7 14:47:32 2024 -0500

    dmaengine: fsl-edma: correct max_segment_size setting
    
    [ Upstream commit a79f949a5ce1d45329d63742c2a995f2b47f9852 ]
    
    Correct the previous setting of 0x3fff to the actual value of 0x7fff.
    
    Introduce a new macro 'EDMA_TCD_ITER_MASK' for improved code clarity and
    use FIELD_GET to obtain the accurate maximum value.
    
    Cc: stable@vger.kernel.org
    Fixes: e06748539432 ("dmaengine: fsl-edma: support edma memcpy")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Link: https://lore.kernel.org/r/20240207194733.2112870-1-Frank.Li@nxp.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
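
The arithmetic behind the max_segment_size correction above can be checked in userspace. The macros below are simplified re-creations written for this sketch, not the kernel's `GENMASK()` and `FIELD_GET()`, and the field layout (iteration count in bits 14..0) is an assumption chosen because it yields the 0x7fff value named in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 32-bit re-creations of the kernel helpers, for illustration. */
#define TOY_GENMASK(h, l)  ((~0u << (l)) & (~0u >> (31 - (h))))
/* Divide by the lowest set bit of the mask to shift the field down. */
#define TOY_FIELD_GET(mask, reg)  (((reg) & (mask)) / ((mask) & -(mask)))

/* Assumed field layout: the iteration count occupies bits 14..0. */
#define TOY_EDMA_TCD_ITER_MASK  TOY_GENMASK(14, 0)

/* Largest value the iteration field can hold. */
static uint32_t toy_max_segment_size(void)
{
    /* Extracting the field from an all-ones field yields its maximum. */
    return TOY_FIELD_GET(TOY_EDMA_TCD_ITER_MASK, TOY_EDMA_TCD_ITER_MASK);
}
```

Deriving the limit from the mask keeps the two in sync: a 15-bit field gives 0x7fff, while the old 0x3fff corresponds to only 14 bits.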

dmaengine: fsl-edma: utilize common dt-binding header file [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Tue Nov 14 10:48:23 2023 -0500

    dmaengine: fsl-edma: utilize common dt-binding header file
    
    [ Upstream commit d0e217b72f9f5c5ef35e3423d393ea8093ce98ec ]
    
    Refactor the code to use the common dt-binding header file, fsl-edma.h.
    Rename ARGS* to FSL_EDMA*. No functional changes.
    
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Link: https://lore.kernel.org/r/20231114154824.3617255-4-Frank.Li@nxp.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Stable-dep-of: a79f949a5ce1 ("dmaengine: fsl-edma: correct max_segment_size setting")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Documentation/hw-vuln: Add documentation for RFDS [+ + +]

Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date:   Mon Mar 11 12:29:43 2024 -0700

    Documentation/hw-vuln: Add documentation for RFDS
    
    commit 4e42765d1be01111df0c0275bbaf1db1acef346e upstream.
    
    Add the documentation for transient execution vulnerability Register
    File Data Sampling (RFDS) that affects Intel Atom CPUs.
    
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dt-bindings: dma: fsl-edma: Add fsl-edma.h to prevent hardcoding in dts [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Tue Nov 14 10:48:22 2023 -0500

    dt-bindings: dma: fsl-edma: Add fsl-edma.h to prevent hardcoding in dts
    
    [ Upstream commit 1e9b05258271b76ccc04a4b535009d2cb596506a ]
    
    Introduce a common dt-bindings header file, fsl-edma.h, shared between
    the driver and dts files. This addition aims to eliminate hardcoded values
    in dts files, promoting maintainability and consistency.
    
    DTS header files do not support the BIT() macro yet, so use 2^n numbers
    directly.
    
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Reviewed-by: Rob Herring <robh@kernel.org>
    Link: https://lore.kernel.org/r/20231114154824.3617255-3-Frank.Li@nxp.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Stable-dep-of: a79f949a5ce1 ("dmaengine: fsl-edma: correct max_segment_size setting")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

erofs: apply proper VMA alignment for memory mapped files on THP [+ + +]

Author: Gao Xiang <xiang@kernel.org>
Date:   Wed Mar 6 13:31:38 2024 +0800

    erofs: apply proper VMA alignment for memory mapped files on THP
    
    [ Upstream commit 4127caee89612a84adedd78c9453089138cd5afe ]
    
    There are two main reasons why thp_get_unmapped_area() should be
    used for EROFS, as it is for other filesystems:
    
     - It's needed to enable PMD mappings as a FSDAX filesystem, see
       commit 74d2fad1334d ("thp, dax: add thp_get_unmapped_area for pmd
       mappings");
    
     - It's useful together with large folios and
       CONFIG_READ_ONLY_THP_FOR_FS which enable THPs for mmapped files
       (e.g. shared libraries) even without FSDAX.  See commit 1854bc6e2420
       ("mm/readahead: Align file mappings for non-DAX").
    
    Fixes: 06252e9ce05b ("erofs: dax support for non-tailpacking regular file")
    Fixes: ce529cc25b18 ("erofs: enable large folios for iomap mode")
    Fixes: e6687b89225e ("erofs: enable large folios for fscache mode")
    Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
    Link: https://lore.kernel.org/r/20240306053138.2240206-1-hsiangkao@linux.alibaba.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

geneve: make sure to pull inner header in geneve_rx() [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 29 13:11:52 2024 +0000

    geneve: make sure to pull inner header in geneve_rx()
    
    [ Upstream commit 1ca1ba465e55b9460e4e75dec9fff31e708fec74 ]
    
    syzbot triggered a bug in geneve_rx() [1]
    
    Issue is similar to the one I fixed in commit 8d975c15c0cd
    ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()")
    
    We have to save skb->network_header in a temporary variable
    in order to be able to recompute the network_header pointer
    after a pskb_inet_may_pull() call.
    
    pskb_inet_may_pull() makes sure the needed headers are in skb->head.
    
    [1]
    BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
     BUG: KMSAN: uninit-value in geneve_rx drivers/net/geneve.c:279 [inline]
     BUG: KMSAN: uninit-value in geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391
      IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
      geneve_rx drivers/net/geneve.c:279 [inline]
      geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391
      udp_queue_rcv_one_skb+0x1d39/0x1f20 net/ipv4/udp.c:2108
      udp_queue_rcv_skb+0x6ae/0x6e0 net/ipv4/udp.c:2186
      udp_unicast_rcv_skb+0x184/0x4b0 net/ipv4/udp.c:2346
      __udp4_lib_rcv+0x1c6b/0x3010 net/ipv4/udp.c:2422
      udp_rcv+0x7d/0xa0 net/ipv4/udp.c:2604
      ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205
      ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233
      NF_HOOK include/linux/netfilter.h:314 [inline]
      ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
      dst_input include/net/dst.h:461 [inline]
      ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
      NF_HOOK include/linux/netfilter.h:314 [inline]
      ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569
      __netif_receive_skb_one_core net/core/dev.c:5534 [inline]
      __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648
      process_backlog+0x480/0x8b0 net/core/dev.c:5976
      __napi_poll+0xe3/0x980 net/core/dev.c:6576
      napi_poll net/core/dev.c:6645 [inline]
      net_rx_action+0x8b8/0x1870 net/core/dev.c:6778
      __do_softirq+0x1b7/0x7c5 kernel/softirq.c:553
      do_softirq+0x9a/0xf0 kernel/softirq.c:454
      __local_bh_enable_ip+0x9b/0xa0 kernel/softirq.c:381
      local_bh_enable include/linux/bottom_half.h:33 [inline]
      rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
      __dev_queue_xmit+0x2768/0x51c0 net/core/dev.c:4378
      dev_queue_xmit include/linux/netdevice.h:3171 [inline]
      packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
      packet_snd net/packet/af_packet.c:3081 [inline]
      packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg net/socket.c:745 [inline]
      __sys_sendto+0x735/0xa10 net/socket.c:2191
      __do_sys_sendto net/socket.c:2203 [inline]
      __se_sys_sendto net/socket.c:2199 [inline]
      __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x63/0x6b
    
    Uninit was created at:
      slab_post_alloc_hook mm/slub.c:3819 [inline]
      slab_alloc_node mm/slub.c:3860 [inline]
      kmem_cache_alloc_node+0x5cb/0xbc0 mm/slub.c:3903
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
      __alloc_skb+0x352/0x790 net/core/skbuff.c:651
      alloc_skb include/linux/skbuff.h:1296 [inline]
      alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6394
      sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2783
      packet_alloc_skb net/packet/af_packet.c:2930 [inline]
      packet_snd net/packet/af_packet.c:3024 [inline]
      packet_sendmsg+0x70c2/0x9f10 net/packet/af_packet.c:3113
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg net/socket.c:745 [inline]
      __sys_sendto+0x735/0xa10 net/socket.c:2191
      __do_sys_sendto net/socket.c:2203 [inline]
      __se_sys_sendto net/socket.c:2199 [inline]
      __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x63/0x6b
    
    Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
    Reported-and-tested-by: syzbot+6a1423ff3f97159aae64@syzkaller.appspotmail.com
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
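
The geneve commit above relies on a general pattern: when a helper may move a buffer, remember positions as offsets and recompute pointers afterwards. The sketch below models that in userspace. `struct toy_skb`, `toy_may_pull()` and `toy_read_inner()` are hypothetical and far simpler than the kernel's sk_buff and `pskb_inet_may_pull()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a packet: a linear area plus not-yet-pulled data. */
struct toy_skb {
    unsigned char *head;        /* linear area, owned by the skb */
    size_t linear_len;
    const unsigned char *frag;  /* remaining bytes outside the linear area */
    size_t frag_len;
};

/* Make sure `need` bytes are in the linear area; this may relocate head. */
static int toy_may_pull(struct toy_skb *skb, size_t need)
{
    size_t extra;
    unsigned char *p;

    if (need <= skb->linear_len)
        return 1;
    extra = need - skb->linear_len;
    if (extra > skb->frag_len)
        return 0;               /* packet is too short */
    p = malloc(need);
    if (!p)
        return 0;
    memcpy(p, skb->head, skb->linear_len);
    memcpy(p + skb->linear_len, skb->frag, extra);
    free(skb->head);            /* old pointers into head are now invalid */
    skb->head = p;
    skb->linear_len = need;
    skb->frag += extra;
    skb->frag_len -= extra;
    return 1;
}

/* Read the first byte of the inner header located at offset nh_off. */
static int toy_read_inner(struct toy_skb *skb, size_t nh_off, size_t hdr_len,
                          unsigned char *out)
{
    assert(hdr_len > 0);
    if (!toy_may_pull(skb, nh_off + hdr_len))
        return 0;
    /* Recompute the pointer from the saved offset only after the pull. */
    *out = *(skb->head + nh_off);
    return 1;
}
```

A pointer computed before `toy_may_pull()` would dangle after the relocation, which is the class of bug the saved offset avoids.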

i40e: disable NAPI right after disabling irqs when handling xsk_pool [+ + +]

Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date:   Tue Feb 20 22:45:52 2024 +0100

    i40e: disable NAPI right after disabling irqs when handling xsk_pool
    
    [ Upstream commit d562b11c1eac7d73f4c778b4cbe5468f86b1f20d ]
    
    Disable NAPI before shutting down queues that this particular NAPI
    contains so that the order of actions in i40e_queue_pair_disable()
    mirrors what we do in i40e_queue_pair_enable().
    
    Fixes: 123cecd427b6 ("i40e: added queue pair disable/enable functions")
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
    Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: fix uninitialized dplls mutex usage [+ + +]

Author: Michal Schmidt <mschmidt@redhat.com>
Date:   Fri Mar 1 14:37:08 2024 +0100

    ice: fix uninitialized dplls mutex usage
    
    [ Upstream commit 9224fc86f1776193650a33a275cac628952f80a9 ]
    
    The pf->dplls.lock mutex is initialized too late, after its first use.
    Move it to the top of ice_dpll_init.
    Note that the "err_exit" error path destroys the mutex. And the mutex is
    the last thing destroyed in ice_dpll_deinit.
    This fixes the following warning with CONFIG_DEBUG_MUTEXES:
    
     ice 0000:10:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.36.0
     ice 0000:10:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
     ice 0000:10:00.0: PTP init successful
     ------------[ cut here ]------------
     DEBUG_LOCKS_WARN_ON(lock->magic != lock)
     WARNING: CPU: 0 PID: 410 at kernel/locking/mutex.c:587 __mutex_lock+0x773/0xd40
     Modules linked in: crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ice(+) nvme nvme_c>
     CPU: 0 PID: 410 Comm: kworker/0:4 Not tainted 6.8.0-rc5+ #3
     Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 10/19/2023
     Workqueue: events work_for_cpu_fn
     RIP: 0010:__mutex_lock+0x773/0xd40
     Code: c0 0f 84 1d f9 ff ff 44 8b 35 0d 9c 69 01 45 85 f6 0f 85 0d f9 ff ff 48 c7 c6 12 a2 a9 85 48 c7 c7 12 f1 a>
     RSP: 0018:ff7eb1a3417a7ae0 EFLAGS: 00010286
     RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
     RDX: 0000000000000002 RSI: ffffffff85ac2bff RDI: 00000000ffffffff
     RBP: ff7eb1a3417a7b80 R08: 0000000000000000 R09: 00000000ffffbfff
     R10: ff7eb1a3417a7978 R11: ff32b80f7fd2e568 R12: 0000000000000000
     R13: 0000000000000000 R14: 0000000000000000 R15: ff32b7f02c50e0d8
     FS:  0000000000000000(0000) GS:ff32b80efe800000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 000055b5852cc000 CR3: 000000003c43a004 CR4: 0000000000771ef0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     PKRU: 55555554
     Call Trace:
      <TASK>
      ? __warn+0x84/0x170
      ? __mutex_lock+0x773/0xd40
      ? report_bug+0x1c7/0x1d0
      ? prb_read_valid+0x1b/0x30
      ? handle_bug+0x42/0x70
      ? exc_invalid_op+0x18/0x70
      ? asm_exc_invalid_op+0x1a/0x20
      ? __mutex_lock+0x773/0xd40
      ? rcu_is_watching+0x11/0x50
      ? __kmalloc_node_track_caller+0x346/0x490
      ? ice_dpll_lock_status_get+0x28/0x50 [ice]
      ? __pfx_ice_dpll_lock_status_get+0x10/0x10 [ice]
      ? ice_dpll_lock_status_get+0x28/0x50 [ice]
      ice_dpll_lock_status_get+0x28/0x50 [ice]
      dpll_device_get_one+0x14f/0x2e0
      dpll_device_event_send+0x7d/0x150
      dpll_device_register+0x124/0x180
      ice_dpll_init_dpll+0x7b/0xd0 [ice]
      ice_dpll_init+0x224/0xa40 [ice]
      ? _dev_info+0x70/0x90
      ice_load+0x468/0x690 [ice]
      ice_probe+0x75b/0xa10 [ice]
      ? _raw_spin_unlock_irqrestore+0x4f/0x80
      ? process_one_work+0x1a3/0x500
      local_pci_probe+0x47/0xa0
      work_for_cpu_fn+0x17/0x30
      process_one_work+0x20d/0x500
      worker_thread+0x1df/0x3e0
      ? __pfx_worker_thread+0x10/0x10
      kthread+0x103/0x140
      ? __pfx_kthread+0x10/0x10
      ret_from_fork+0x31/0x50
      ? __pfx_kthread+0x10/0x10
      ret_from_fork_asm+0x1b/0x30
      </TASK>
     irq event stamp: 125197
     hardirqs last  enabled at (125197): [<ffffffff8416409d>] finish_task_switch.isra.0+0x12d/0x3d0
     hardirqs last disabled at (125196): [<ffffffff85134044>] __schedule+0xea4/0x19f0
     softirqs last  enabled at (105334): [<ffffffff84e1e65a>] napi_get_frags_check+0x1a/0x60
     softirqs last disabled at (105332): [<ffffffff84e1e65a>] napi_get_frags_check+0x1a/0x60
     ---[ end trace 0000000000000000 ]---
    
    Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
    Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
    Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
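
The ordering problem in the dplls mutex commit above can be modelled with POSIX threads. This is a sketch under assumptions: all `toy_*` names are hypothetical, and `toy_register()` merely stands in for a registration step that immediately calls back into a lock-taking function, as the call trace suggests.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical miniature of the driver state guarded by the lock. */
struct toy_dplls {
    pthread_mutex_t lock;
    int state;
};

/* Callback that takes the lock; registration may call it right away. */
static int toy_lock_status_get(struct toy_dplls *d)
{
    int s;

    pthread_mutex_lock(&d->lock);
    s = d->state;
    pthread_mutex_unlock(&d->lock);
    return s;
}

/* Stand-in for a registration step that invokes the callback immediately. */
static int toy_register(struct toy_dplls *d)
{
    return toy_lock_status_get(d);
}

static int toy_dpll_init(struct toy_dplls *d, bool simulate_failure)
{
    /* Initialize the mutex first, before anything can try to take it. */
    if (pthread_mutex_init(&d->lock, NULL) != 0)
        return -1;
    d->state = 1;
    if (simulate_failure)
        goto err_exit;
    (void)toy_register(d);
    return 0;

err_exit:
    /* The error path destroys the mutex it initialized. */
    pthread_mutex_destroy(&d->lock);
    return -1;
}

static void toy_dpll_deinit(struct toy_dplls *d)
{
    /* Tear down everything else first; the mutex goes last. */
    pthread_mutex_destroy(&d->lock);
}
```

Initializing the lock at the top of the init function means every later step, including callbacks triggered during registration, finds a valid mutex.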

ice: reconfig host after changing MSI-X on VF [+ + +]

Author: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Date:   Fri Feb 23 07:40:24 2024 +0100

    ice: reconfig host after changing MSI-X on VF
    
    [ Upstream commit 4035c72dc1ba81a96f94de84dfd5409056c1d9c9 ]
    
    During VSI reconfiguration, the filters and VSI config set in
    ice_vf_init_host_cfg() are lost. Call the host configuration function
    again to restore them.
    
    Without this config, a VF whose MSI-X amount was changed might have
    connection problems.
    
    Fixes: 4d38cb44bd32 ("ice: manage VFs MSI-X using resource tracking")
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: reorder disabling IRQ and NAPI in ice_qp_dis [+ + +]

Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date:   Tue Feb 20 22:45:53 2024 +0100

    ice: reorder disabling IRQ and NAPI in ice_qp_dis
    
    [ Upstream commit 99099c6bc75a30b76bb5d6774a0509ab6f06af05 ]
    
    ice_qp_dis() currently does things in a very mixed order. Tx is stopped
    before disabling IRQ on related queue vector, then it takes care of
    disabling Rx and finally NAPI is disabled.
    
    Let us start with disabling IRQs in the first place followed by turning
    off NAPI. Then it is safe to handle queues.
    
    One subtle change on top of that is that even though ice_qp_ena() looks
    more sane, clear ICE_CFG_BUSY as the last thing there.
    
    Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
    Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi() [+ + +]

Author: Jacob Keller <jacob.e.keller@intel.com>
Date:   Tue Nov 28 11:42:15 2023 -0800

    ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()
    
    [ Upstream commit 2a2cb4c6c18130e9f14d2e39deb75590744d98ef ]
    
    The ice_vf_create_vsi() function and its VF ops helper introduced by commit
    a4c785e8162e ("ice: convert vf_ops .vsi_rebuild to .create_vsi") are used
    during an individual VF reset to re-create the VSI. This was done in order
    to ensure that the VSI gets properly reconfigured within the hardware.
    
    This is somewhat heavy handed as we completely release the VSI memory and
    structure, and then create a new VSI. This can also potentially force a
    change of the VSI index as we will re-use the first open slot in the VSI
    array which may not be the same.
    
    As part of implementing devlink reload, commit 6624e780a577 ("ice: split
    ice_vsi_setup into smaller functions") split VSI setup into smaller
    functions, introducing both ice_vsi_cfg() and ice_vsi_decfg() which can be
    used to configure or deconfigure an existing software VSI structure.
    
    Rather than completely removing the VSI and adding a new one using the
    .create_vsi() VF operation, simply use ice_vsi_decfg() to remove the
    current configuration. Save the VSI type and then call ice_vsi_cfg() to
    reconfigure the VSI as the same type that it was before.
    
    The existing reset logic assumes that all hardware filters will be removed,
    so also call ice_fltr_remove_all() before re-configuring the VSI.
    
    This new operation does not re-create the VSI, so rename it to
    ice_vf_reconfig_vsi().
    
    The new approach can safely share the exact same flow for both SR-IOV VFs
    as well as the Scalable IOV VFs being worked on. This uses less code and is
    a better abstraction over fully deleting the VSI and adding a new one.
    
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Reviewed-by: Petr Oros <poros@redhat.com>
    Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Stable-dep-of: 4035c72dc1ba ("ice: reconfig host after changing MSI-X on VF")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: virtchnl: stop pretending to support RSS over AQ or registers [+ + +]

Author: Jacob Keller <jacob.e.keller@intel.com>
Date:   Wed Jan 31 13:51:58 2024 -0800

    ice: virtchnl: stop pretending to support RSS over AQ or registers
    
    [ Upstream commit 2652b99e43403dc464f3648483ffb38e48872fe4 ]
    
    The E800 series hardware uses the same iAVF driver as older devices,
    including the virtchnl negotiation scheme.
    
    This negotiation scheme includes a mechanism to determine what type of RSS
    should be supported, including RSS over PF virtchnl messages, RSS over
    firmware AdminQ messages, and RSS via direct register access.
    
    The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if it's
    supported by the VF driver. However, if an older VF driver is loaded, it
    may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ.
    
    The ice driver happily agrees to support these methods. Unfortunately, the
    underlying hardware does not support these mechanisms. The E800 series VFs
    don't have the appropriate registers for RSS_REG. The mailbox queue used by
    VFs for VF to PF communication blocks messages which do not have the
    VF-to-PF opcode.
    
    Stop lying to the VF that it could support RSS over AdminQ or registers, as
    these interfaces do not work when the hardware is operating on an E800
    series device.
    
    In practice this is unlikely to be hit by any normal user. The iAVF driver
    has supported RSS over PF virtchnl commands since 2016, and always defaults
    to using RSS_PF if possible.
    
    In principle, nothing actually stops the existing VF from attempting to
    access the registers or send an AQ command. However a properly coded VF
    will check the capability flags and will report a more useful error if it
    detects a case where the driver does not support the RSS offloads that it
    does.
    
    Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support")
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Reviewed-by: Alan Brady <alan.brady@intel.com>
    Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
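
The capability negotiation described in the virtchnl commit above amounts to masking the VF's request down to what the hardware can actually do. The sketch below is illustrative only: the `TOY_OFFLOAD_*` bit values are placeholders rather than the real virtchnl definitions, and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the real flags are defined by virtchnl. */
#define TOY_OFFLOAD_RSS_AQ   (1u << 0)  /* RSS over firmware AdminQ */
#define TOY_OFFLOAD_RSS_REG  (1u << 1)  /* RSS via direct register access */
#define TOY_OFFLOAD_RSS_PF   (1u << 2)  /* RSS over PF virtchnl messages */

/* PF side: grant only the RSS method that works on this hardware. */
static uint32_t toy_negotiate_rss(uint32_t vf_requested)
{
    return vf_requested & TOY_OFFLOAD_RSS_PF;
}

/* VF side: a well-behaved VF checks the granted flags before using RSS. */
static bool toy_vf_rss_usable(uint32_t granted)
{
    return (granted & (TOY_OFFLOAD_RSS_AQ | TOY_OFFLOAD_RSS_REG |
                       TOY_OFFLOAD_RSS_PF)) != 0;
}
```

An older VF that asks only for the AQ or register methods then receives no RSS capability at all and can report that clearly, instead of being told a non-working method is available.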

idpf: disable local BH when scheduling napi for marker packets [+ + +]

Author: Emil Tantilov <emil.s.tantilov@intel.com>
Date:   Wed Feb 7 16:42:43 2024 -0800

    idpf: disable local BH when scheduling napi for marker packets
    
    [ Upstream commit 330068589389ccae3452db15ecacc3e147ac9c1c ]
    
    Fix softirqs not being handled during napi_schedule() call when
    receiving marker packets for queue disable by disabling local bottom
    half.
    
    The issue can be seen on ifdown:
    NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
    
    Using ftrace to catch the failing scenario:
    ifconfig   [003] d.... 22739.830624: softirq_raise: vec=3 [action=NET_RX]
    <idle>-0   [003] ..s.. 22739.831357: softirq_entry: vec=3 [action=NET_RX]
    
    No interrupt and CPU is idle.
    
    After the patch when disabling local BH before calling napi_schedule:
    ifconfig   [003] d.... 22993.928336: softirq_raise: vec=3 [action=NET_RX]
    ifconfig   [003] ..s1. 22993.928337: softirq_entry: vec=3 [action=NET_RX]
    
    Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support")
    Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
    Signed-off-by: Alan Brady <alan.brady@intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
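
The effect of the idpf change above can be pictured with a toy model of pending softirq work. Everything here is a deliberate simplification with hypothetical names: it only captures the idea that work raised from process context stays pending until something processes it, and that re-enabling bottom halves is one such point.

```c
#include <assert.h>
#include <stdbool.h>

static int toy_bh_disabled;       /* nesting count of disabled bottom halves */
static bool toy_softirq_pending;  /* work raised but not yet handled */
static int toy_softirq_runs;      /* number of times pending work was handled */

static void toy_local_bh_disable(void)
{
    toy_bh_disabled++;
}

/* Re-enabling bottom halves handles any work raised in the meantime. */
static void toy_local_bh_enable(void)
{
    assert(toy_bh_disabled > 0);
    if (--toy_bh_disabled == 0 && toy_softirq_pending) {
        toy_softirq_pending = false;
        toy_softirq_runs++;
    }
}

/* Models scheduling NAPI from process context: it only marks work pending. */
static void toy_napi_schedule(void)
{
    toy_softirq_pending = true;
}

/* Before the fix: the work waits for some later, unrelated event. */
static void toy_marker_unprotected(void)
{
    toy_napi_schedule();
}

/* After the fix: the work is handled as soon as bottom halves are re-enabled. */
static void toy_marker_protected(void)
{
    toy_local_bh_disable();
    toy_napi_schedule();
    toy_local_bh_enable();
}
```

This mirrors the two ftrace excerpts in the commit message: in the first, the entry happens later in the idle task; in the second, it follows the raise immediately in the same task.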

igc: avoid returning frame twice in XDP_REDIRECT [+ + +]

Author: Florian Kauer <florian.kauer@linutronix.de>
Date:   Mon Feb 19 10:08:43 2024 +0100

    igc: avoid returning frame twice in XDP_REDIRECT
    
    [ Upstream commit ef27f655b438bed4c83680e4f01e1cde2739854b ]
    
    When a frame cannot be transmitted in XDP_REDIRECT
    (e.g. due to a full queue), it is necessary to free
    it by calling xdp_return_frame_rx_napi.
    
    However, this is the responsibility of the caller of
    the ndo_xdp_xmit (see for example bq_xmit_all in
    kernel/bpf/devmap.c) and thus calling it inside
    igc_xdp_xmit (which is the ndo_xdp_xmit of the igc
    driver) as well will lead to memory corruption.
    
    In fact, bq_xmit_all expects that it can return all
    frames after the last successfully transmitted one.
    Therefore, break at the first frame that is not transmitted,
    but do not call xdp_return_frame_rx_napi in igc_xdp_xmit.
    Other Intel drivers, such as igb, implement this
    the same way.
    
    There are two alternatives to this that were rejected:
    1. Return num_frames as if all the frames had been
       transmitted and release them inside igc_xdp_xmit.
       While it might work technically, it is not what
       the return value is meant to represent (i.e. the
       number of SUCCESSFULLY transmitted packets).
    2. Rework kernel/bpf/devmap.c and all drivers to
       support non-consecutively dropped packets.
       Besides being complex, it likely has a negative
       performance impact without a significant gain
       since it is anyway unlikely that the next frame
       can be transmitted if the previous one was dropped.
    
    The memory corruption can be reproduced with
    the following script which leads to a kernel panic
    after a few seconds.  It basically generates more
    traffic than an i225 NIC can transmit and pushes it
    via XDP_REDIRECT from a virtual interface to the
    physical interface where frames get dropped.
    
       #!/bin/bash
       INTERFACE=enp4s0
       INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`
    
       sudo ip link add dev veth1 type veth peer name veth2
       sudo ip link set up $INTERFACE
       sudo ip link set up veth1
       sudo ip link set up veth2
    
       cat << EOF > redirect.bpf.c
    
       SEC("prog")
       int redirect(struct xdp_md *ctx)
       {
           return bpf_redirect($INTERFACE_IDX, 0);
       }
    
       char _license[] SEC("license") = "GPL";
       EOF
       clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
       sudo ip link set veth2 xdp obj redirect.bpf.o
    
       cat << EOF > pass.bpf.c
    
       SEC("prog")
       int pass(struct xdp_md *ctx)
       {
           return XDP_PASS;
       }
    
       char _license[] SEC("license") = "GPL";
       EOF
       clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
       sudo ip link set $INTERFACE xdp obj pass.bpf.o
    
       cat << EOF > trafgen.cfg
    
       {
         /* Ethernet Header */
         0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
         0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
         const16(ETH_P_IP),
    
         /* IPv4 Header */
         0b01000101, 0,   # IPv4 version, IHL, TOS
         const16(1028),   # IPv4 total length (UDP length + 20 bytes (IP header))
         const16(2),      # IPv4 ident
         0b01000000, 0,   # IPv4 flags, fragmentation off
         64,              # IPv4 TTL
         17,              # Protocol UDP
         csumip(14, 33),  # IPv4 checksum
    
         /* UDP Header */
         10,  0, 1, 1,    # IP Src - adapt as needed
         10,  0, 1, 2,    # IP Dest - adapt as needed
         const16(6666),   # UDP Src Port
         const16(6666),   # UDP Dest Port
         const16(1008),   # UDP length (UDP header 8 bytes + payload length)
         csumudp(14, 34), # UDP checksum
    
         /* Payload */
         fill('W', 1000),
       }
       EOF
    
       sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp
    
    Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action")
    Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
    Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Tested-by: Naama Meir <naamax.meir@linux.intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
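
The ownership rule described in the igc message above (the driver reports how many frames it sent and stops at the first failure, and the caller frees everything after that point) can be sketched in plain C. All toy_* names and the "room" parameter are hypothetical stand-ins, not the igc or devmap code.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often a frame was handed back to its allocator. */
struct toy_frame {
    int freed;
};

static void toy_return_frame(struct toy_frame *f)
{
    f->freed++;
}

/*
 * Driver side: try to queue n frames on a ring with `room` free slots.
 * Stops at the first frame that does not fit and frees nothing itself.
 * Returns the number of frames that were queued.
 */
static int toy_xdp_xmit(struct toy_frame **frames, int n, int room)
{
    int sent = 0;

    (void)frames; /* the toy ring does not keep the frame pointers */
    for (int i = 0; i < n; i++) {
        if (room == 0)
            break;
        room--;
        sent++;
    }
    return sent;
}

/* Caller side: frees every frame after the last successfully sent one. */
static int toy_flush(struct toy_frame **frames, int n, int room)
{
    int sent = toy_xdp_xmit(frames, n, room);

    for (int i = sent; i < n; i++)
        toy_return_frame(frames[i]);
    return sent;
}
```

If toy_xdp_xmit() also called toy_return_frame() on the frame that did not fit, that frame would end up with freed == 2, which is the double return the patch removes.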

ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able [+ + +]

Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date:   Tue Feb 20 22:45:51 2024 +0100

    ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
    
    [ Upstream commit cbf996f52c4e658b3fb4349a869a62fd2d4c3c1c ]
    
    Currently, the routines that are supposed to toggle the state of a ring
    pair do not take care of the interrupt associated with the queue vector
    that these rings belong to. This causes issues such as a dead interface
    due to irq misconfiguration, as per Pavel's report in the Closes: tag.
    
    Add a function responsible for disabling a single IRQ in the EIMC
    register and call it as the very first thing when disabling a ring pair
    during xsk_pool setup. For enabling, reuse ixgbe_irq_enable_queues().
    Besides this, disable/enable NAPI as the first/last thing when closing
    or opening the ring pair that the xsk_pool is being configured on.
    
    Reported-by: Pavel Vazharov <pavel@x3me.net>
    Closes: https://lore.kernel.org/netdev/CAJEV1ijxNyPTwASJER1bcZzS9nMoZJqfR86nu_3jFFVXzZQ4NA@mail.gmail.com/
    Fixes: 024aa5800f32 ("ixgbe: added Rx/Tx ring disable/enable functions")
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests [+ + +]

Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date:   Mon Mar 11 12:29:43 2024 -0700

    KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
    
    commit 2a0180129d726a4b953232175857d442651b55a0 upstream.
    
    Mitigation for RFDS requires RFDS_CLEAR capability which is enumerated
    by MSR_IA32_ARCH_CAPABILITIES bit 28. If the host has it set, export it
    to guests so that they can deploy the mitigation.
    
    RFDS_NO indicates that the system is not vulnerable to RFDS. Export it
    to guests so that they don't deploy the mitigation unnecessarily. When
    the host is not affected by X86_BUG_RFDS, but has RFDS_NO=0, synthesize
    RFDS_NO for the guest.
    
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
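
The export rule in the KVM message above can be written down as a small pure function. This is a sketch under assumptions: the TOY_* names are hypothetical, the bit positions (27 for RFDS_NO, 28 for RFDS_CLEAR) should be checked against the kernel's msr-index.h, and the real KVM code passes through many more capability bits than the two modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in MSR_IA32_ARCH_CAPABILITIES; verify before use. */
#define TOY_ARCH_CAP_RFDS_NO    (UINT64_C(1) << 27)
#define TOY_ARCH_CAP_RFDS_CLEAR (UINT64_C(1) << 28)

/*
 * Compute the RFDS-related capability bits a guest would see.
 * host_caps:         the host's capability MSR value
 * host_has_rfds_bug: whether the host is considered affected by RFDS
 */
static uint64_t toy_guest_arch_caps(uint64_t host_caps, bool host_has_rfds_bug)
{
    /* Pass through only the two bits modelled here. */
    uint64_t guest = host_caps &
                     (TOY_ARCH_CAP_RFDS_NO | TOY_ARCH_CAP_RFDS_CLEAR);

    /* Host not affected but RFDS_NO possibly clear: synthesize RFDS_NO. */
    if (!host_has_rfds_bug)
        guest |= TOY_ARCH_CAP_RFDS_NO;

    return guest;
}
```

For example, an unaffected host that reports neither bit yields a guest value with only RFDS_NO set, so the guest can skip the mitigation.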

Linux: Linux 6.7.10 [+ + +]

Author: Sasha Levin <sashal@kernel.org>
Date:   Wed Mar 13 07:41:55 2024 -0400

    Linux 6.7.10
    
    Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: kernelci.org bot <bot@kernelci.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/ipv6: avoid possible UAF in ip6_route_mpath_notify() [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Sun Mar 3 14:48:00 2024 +0000

    net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
    
    [ Upstream commit 685f7d531264599b3f167f1e94bbd22f120e5fab ]
    
    syzbot found another use-after-free in ip6_route_mpath_notify() [1]
    
    Commit f7225172f25a ("net/ipv6: prevent use after free in
    ip6_route_mpath_notify") was not able to fix the root cause.
    
    We need to defer the fib6_info_release() calls until after
    ip6_route_mpath_notify(), in the cleanup phase.
    
    [1]
    BUG: KASAN: slab-use-after-free in rt6_fill_node+0x1460/0x1ac0
    Read of size 4 at addr ffff88809a07fc64 by task syz-executor.2/23037
    
    CPU: 0 PID: 23037 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-01035-gea7f3cfaa588 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:377 [inline]
      print_report+0x167/0x540 mm/kasan/report.c:488
      kasan_report+0x142/0x180 mm/kasan/report.c:601
     rt6_fill_node+0x1460/0x1ac0
      inet6_rt_notify+0x13b/0x290 net/ipv6/route.c:6184
      ip6_route_mpath_notify net/ipv6/route.c:5198 [inline]
      ip6_route_multipath_add net/ipv6/route.c:5404 [inline]
      inet6_rtm_newroute+0x1d0f/0x2300 net/ipv6/route.c:5517
      rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597
      netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
      netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
      netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
      netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x221/0x270 net/socket.c:745
      ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
      ___sys_sendmsg net/socket.c:2638 [inline]
      __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
     do_syscall_64+0xf9/0x240
     entry_SYSCALL_64_after_hwframe+0x6f/0x77
    RIP: 0033:0x7f73dd87dda9
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f73de6550c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007f73dd9ac050 RCX: 00007f73dd87dda9
    RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
    RBP: 00007f73dd8ca47a R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    R13: 000000000000006e R14: 00007f73dd9ac050 R15: 00007ffdbdeb7858
     </TASK>
    
    Allocated by task 23037:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      poison_kmalloc_redzone mm/kasan/common.c:372 [inline]
      __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:389
      kasan_kmalloc include/linux/kasan.h:211 [inline]
      __do_kmalloc_node mm/slub.c:3981 [inline]
      __kmalloc+0x22e/0x490 mm/slub.c:3994
      kmalloc include/linux/slab.h:594 [inline]
      kzalloc include/linux/slab.h:711 [inline]
      fib6_info_alloc+0x2e/0xf0 net/ipv6/ip6_fib.c:155
      ip6_route_info_create+0x445/0x12b0 net/ipv6/route.c:3758
      ip6_route_multipath_add net/ipv6/route.c:5298 [inline]
      inet6_rtm_newroute+0x744/0x2300 net/ipv6/route.c:5517
      rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597
      netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
      netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
      netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
      netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x221/0x270 net/socket.c:745
      ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
      ___sys_sendmsg net/socket.c:2638 [inline]
      __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
     do_syscall_64+0xf9/0x240
     entry_SYSCALL_64_after_hwframe+0x6f/0x77
    
    Freed by task 16:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      kasan_save_free_info+0x4e/0x60 mm/kasan/generic.c:640
      poison_slab_object+0xa6/0xe0 mm/kasan/common.c:241
      __kasan_slab_free+0x34/0x70 mm/kasan/common.c:257
      kasan_slab_free include/linux/kasan.h:184 [inline]
      slab_free_hook mm/slub.c:2121 [inline]
      slab_free mm/slub.c:4299 [inline]
      kfree+0x14a/0x380 mm/slub.c:4409
      rcu_do_batch kernel/rcu/tree.c:2190 [inline]
      rcu_core+0xd76/0x1810 kernel/rcu/tree.c:2465
      __do_softirq+0x2bb/0x942 kernel/softirq.c:553
    
    Last potentially related work creation:
      kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
      __kasan_record_aux_stack+0xae/0x100 mm/kasan/generic.c:586
      __call_rcu_common kernel/rcu/tree.c:2715 [inline]
      call_rcu+0x167/0xa80 kernel/rcu/tree.c:2829
      fib6_info_release include/net/ip6_fib.h:341 [inline]
      ip6_route_multipath_add net/ipv6/route.c:5344 [inline]
      inet6_rtm_newroute+0x114d/0x2300 net/ipv6/route.c:5517
      rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597
      netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
      netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
      netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
      netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x221/0x270 net/socket.c:745
      ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
      ___sys_sendmsg net/socket.c:2638 [inline]
      __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
     do_syscall_64+0xf9/0x240
     entry_SYSCALL_64_after_hwframe+0x6f/0x77
    
    The buggy address belongs to the object at ffff88809a07fc00
     which belongs to the cache kmalloc-512 of size 512
    The buggy address is located 100 bytes inside of
     freed 512-byte region [ffff88809a07fc00, ffff88809a07fe00)
    
    The buggy address belongs to the physical page:
    page:ffffea0002681f00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9a07c
    head:ffffea0002681f00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
    flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
    page_type: 0xffffffff()
    raw: 00fff00000000840 ffff888014c41c80 dead000000000122 0000000000000000
    raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    page_owner tracks the page as allocated
    page last allocated via order 2, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 23028, tgid 23027 (syz-executor.4), ts 2340253595219, free_ts 2339107097036
      set_page_owner include/linux/page_owner.h:31 [inline]
      post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
      prep_new_page mm/page_alloc.c:1540 [inline]
      get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
      __alloc_pages+0x255/0x680 mm/page_alloc.c:4567
      __alloc_pages_node include/linux/gfp.h:238 [inline]
      alloc_pages_node include/linux/gfp.h:261 [inline]
      alloc_slab_page+0x5f/0x160 mm/slub.c:2190
      allocate_slab mm/slub.c:2354 [inline]
      new_slab+0x84/0x2f0 mm/slub.c:2407
      ___slab_alloc+0xd17/0x13e0 mm/slub.c:3540
      __slab_alloc mm/slub.c:3625 [inline]
      __slab_alloc_node mm/slub.c:3678 [inline]
      slab_alloc_node mm/slub.c:3850 [inline]
      __do_kmalloc_node mm/slub.c:3980 [inline]
      __kmalloc+0x2e0/0x490 mm/slub.c:3994
      kmalloc include/linux/slab.h:594 [inline]
      kzalloc include/linux/slab.h:711 [inline]
      new_dir fs/proc/proc_sysctl.c:956 [inline]
      get_subdir fs/proc/proc_sysctl.c:1000 [inline]
      sysctl_mkdir_p fs/proc/proc_sysctl.c:1295 [inline]
      __register_sysctl_table+0xb30/0x1440 fs/proc/proc_sysctl.c:1376
      neigh_sysctl_register+0x416/0x500 net/core/neighbour.c:3859
      devinet_sysctl_register+0xaf/0x1f0 net/ipv4/devinet.c:2644
      inetdev_init+0x296/0x4d0 net/ipv4/devinet.c:286
      inetdev_event+0x338/0x15c0 net/ipv4/devinet.c:1555
      notifier_call_chain+0x18f/0x3b0 kernel/notifier.c:93
      call_netdevice_notifiers_extack net/core/dev.c:1987 [inline]
      call_netdevice_notifiers net/core/dev.c:2001 [inline]
      register_netdevice+0x15b2/0x1a20 net/core/dev.c:10340
      br_dev_newlink+0x27/0x100 net/bridge/br_netlink.c:1563
      rtnl_newlink_create net/core/rtnetlink.c:3497 [inline]
      __rtnl_newlink net/core/rtnetlink.c:3717 [inline]
      rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3730
    page last free pid 11583 tgid 11583 stack trace:
      reset_page_owner include/linux/page_owner.h:24 [inline]
      free_pages_prepare mm/page_alloc.c:1140 [inline]
      free_unref_page_prepare+0x968/0xa90 mm/page_alloc.c:2346
      free_unref_page+0x37/0x3f0 mm/page_alloc.c:2486
      kasan_depopulate_vmalloc_pte+0x74/0x90 mm/kasan/shadow.c:415
      apply_to_pte_range mm/memory.c:2619 [inline]
      apply_to_pmd_range mm/memory.c:2663 [inline]
      apply_to_pud_range mm/memory.c:2699 [inline]
      apply_to_p4d_range mm/memory.c:2735 [inline]
      __apply_to_page_range+0x8ec/0xe40 mm/memory.c:2769
      kasan_release_vmalloc+0x9a/0xb0 mm/kasan/shadow.c:532
      __purge_vmap_area_lazy+0x163f/0x1a10 mm/vmalloc.c:1770
      drain_vmap_area_work+0x40/0xd0 mm/vmalloc.c:1804
      process_one_work kernel/workqueue.c:2633 [inline]
      process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706
      worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787
      kthread+0x2ef/0x390 kernel/kthread.c:388
      ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
      ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
    
    Memory state around the buggy address:
     ffff88809a07fb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88809a07fb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    >ffff88809a07fc00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                           ^
     ffff88809a07fc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     ffff88809a07fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    
    Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240303144801.702646-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
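
The ordering described in the net/ipv6 message above (keep the references until the notification has read the routes, then release them in the cleanup phase) can be sketched with a toy reference count. The toy_* names are hypothetical and the model ignores RCU and the references held by the FIB tree itself.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_route {
    int refcnt;
    int metric;
};

static int toy_live_routes; /* number of routes not yet freed */

static struct toy_route *toy_route_alloc(int metric)
{
    struct toy_route *rt = malloc(sizeof(*rt));

    if (!rt)
        return NULL;
    rt->refcnt = 1;
    rt->metric = metric;
    toy_live_routes++;
    return rt;
}

/* Drop one reference; the route is freed when the last one goes away. */
static void toy_route_release(struct toy_route *rt)
{
    if (--rt->refcnt == 0) {
        toy_live_routes--;
        free(rt);
    }
}

/* Stands in for the notifier: it reads from the route it is given. */
static int toy_notify(const struct toy_route *rt)
{
    return rt->metric;
}

/* Add up to 8 routes, notify about the last one, then release them all. */
static int toy_multipath_add(const int *metrics, int n)
{
    struct toy_route *held[8];
    int count = 0;
    int seen = -1;

    if (n <= 0 || n > 8)
        return -1;

    for (int i = 0; i < n; i++) {
        held[i] = toy_route_alloc(metrics[i]);
        if (!held[i])
            break;
        count++;
    }

    /* Notification runs while every reference is still held. */
    if (count == n)
        seen = toy_notify(held[n - 1]);

    /* Cleanup phase: only now are the references dropped. */
    for (int i = 0; i < count; i++)
        toy_route_release(held[i]);

    return seen;
}
```

Releasing inside the first loop instead would let toy_notify() read a freed object, which is the shape of the KASAN report quoted in the message.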

net/mlx5: Check capability for fw_reset [+ + +]

Author: Moshe Shemesh <moshe@nvidia.com>
Date:   Sun Jan 28 20:43:58 2024 +0200

    net/mlx5: Check capability for fw_reset
    
    [ Upstream commit 5e6107b499f3fc4748109e1d87fd9603b34f1e0d ]
    
    Functions which can't access the MFRL (Management Firmware Reset Level)
    register have no use for fw_reset structures or events. Remove the
    fw_reset structure allocation and the registration for fw reset event
    notifications for these functions.
    
    Having the devlink param enable_remote_dev_reset on functions that don't
    have this capability is misleading as these functions are not allowed to
    influence the reset flow. Hence, this patch removes this parameter for
    such functions.
    
    In addition, return not supported on devlink reload action fw_activate
    for these functions.
    
    Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
    Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
    Reviewed-by: Aya Levin <ayal@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: E-switch, Change flow rule destination checking [+ + +]

Author: Jianbo Liu <jianbol@nvidia.com>
Date:   Thu Jan 11 01:27:47 2024 +0000

    net/mlx5: E-switch, Change flow rule destination checking
    
    [ Upstream commit 85ea2c5c5ef5f24fe6e6e7028ddd90be1cb5d27e ]
    
    The checking in the cited commit is not accurate. In the common case,
    VF destination is internal, and uplink destination is external.
    However, uplink destination with packet reformat is considered as
    internal because firmware uses LB+hairpin to support it. Update the
    checking so header rewrite rules with both internal and external
    destinations are not allowed.
    
    Fixes: e0e22d59b47a ("net/mlx5: E-switch, Add checking for flow rule destinations")
    Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
    Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: Fix fw reporter diagnose output [+ + +]

Author: Aya Levin <ayal@nvidia.com>
Date:   Tue Jan 16 20:13:34 2024 +0200

    net/mlx5: Fix fw reporter diagnose output
    
    [ Upstream commit ac8082a3c7a158640a2c493ec437dd9da881a6a7 ]
    
    Restore fw reporter diagnose to print the syndrome even if it is zero.
    Following the cited commit, in this case (syndrome == 0) the command
    returns no output at all.
    
    This fix restores command output in case syndrome is cleared:
    $ devlink health diagnose pci/0000:82:00.0 reporter fw
        Syndrome: 0
    
    Fixes: d17f98bf7cc9 ("net/mlx5: devlink health: use retained error fmsg API")
    Signed-off-by: Aya Levin <ayal@nvidia.com>
    Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Change the warning when ignore_flow_level is not supported [+ + +]

Author: Jianbo Liu <jianbol@nvidia.com>
Date:   Mon Dec 25 01:47:05 2023 +0000

    net/mlx5e: Change the warning when ignore_flow_level is not supported
    
    [ Upstream commit dd238b702064b21d25b4fc39a19699319746d655 ]
    
    Downgrade the print from mlx5_core_warn() to mlx5_core_dbg(), as it
    is just a statement of fact that firmware doesn't support ignore flow
    level.
    
    And change the wording to "firmware flow level support is missing", to
    make it more accurate.
    
    Fixes: ae2ee3be99a8 ("net/mlx5: CT: Remove warning of ignore_flow_level support for VFs")
    Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
    Suggested-by: Elliott, Robert (Servers) <elliott@hpe.com>
    Reviewed-by: Roi Dayan <roid@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Fix MACsec state loss upon state update in offload path [+ + +]

Author: Emeel Hakim <ehakim@nvidia.com>
Date:   Mon Mar 13 17:03:03 2023 +0200

    net/mlx5e: Fix MACsec state loss upon state update in offload path
    
    [ Upstream commit a71f2147b64941efee156bfda54fd6461d0f95df ]
    
    The packet number attribute of the SA is incremented by the device rather
    than the software stack when enabling hardware offload. Because the packet
    number attribute is managed by the hardware, the software has no insight
    into the value of the packet number attribute actually written by the
    device.
    
    Previously when MACsec offload was enabled, the hardware object for
    handling the offload was destroyed when the SA was disabled. Re-enabling
    the SA would lead to a new hardware object being instantiated. This new
    hardware object would not have any recollection of the correct packet
    number for the SA. Instead, destroy the flow steering rule when
    deactivating the SA and recreate it upon reactivation, preserving the
    original hardware object.
    
    Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support")
    Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
    Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
    Reviewed-by: Gal Pressman <gal@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context [+ + +]

Author: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Date:   Thu Feb 8 15:09:34 2024 -0800

    net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context
    
    [ Upstream commit 90502d433c0e7e5483745a574cb719dd5d05b10c ]
    
    The NAPI poll context is a softirq context. Do not use normal spinlock API
    in this context to prevent concurrency issues.
    
    Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
    Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    CC: Vadim Fedorenko <vadfed@meta.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map [+ + +]

Author: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Date:   Mon Feb 5 13:12:28 2024 -0800

    net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map
    
    [ Upstream commit b7cf07586c40f926063d4d09f7de28ff82f62b2a ]
    
    Simply reordering the functions mlx5e_ptp_metadata_map_put and
    mlx5e_ptpsq_track_metadata in the mlx5e_txwqe_complete context is not
    enough, since both the compiler and the CPU are free to reorder these two
    functions. If reordering does occur, the issue that was supposedly fixed by
    7e3f3ba97e6c ("net/mlx5e: Track xmit submission to PTP WQ after populating
    metadata map") will be seen. This will lead to NULL pointer dereferences in
    mlx5e_ptpsq_mark_ts_cqes_undelivered in the NAPI polling context due to the
    tracking list being populated before the metadata map.
    
    Fixes: 7e3f3ba97e6c ("net/mlx5e: Track xmit submission to PTP WQ after populating metadata map")
    Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    CC: Vadim Fedorenko <vadfed@meta.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
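
The ordering requirement in the mlx5e message above (the metadata map must be populated before the entry becomes visible through the tracking list) is an instance of the publish pattern. The sketch below uses C11 release/acquire atomics as a userspace stand-in; the message does not say which kernel barrier primitive the patch uses, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_SLOTS 4

static void *toy_metadata_map[TOY_SLOTS];   /* plain data, filled first */
static _Atomic int toy_tracked[TOY_SLOTS];  /* "entry is tracked" flag */

/* Producer: populate the map, then publish the flag with release order. */
static void toy_submit(int id, void *skb)
{
    assert(id >= 0 && id < TOY_SLOTS);
    toy_metadata_map[id] = skb;
    /* Release: the map store above cannot be reordered after this store. */
    atomic_store_explicit(&toy_tracked[id], 1, memory_order_release);
}

/* Consumer: read the map only after an acquire load has seen the flag. */
static void *toy_lookup(int id)
{
    assert(id >= 0 && id < TOY_SLOTS);
    if (!atomic_load_explicit(&toy_tracked[id], memory_order_acquire))
        return NULL;
    return toy_metadata_map[id];
}
```

With plain stores in toy_submit(), a consumer on another CPU could observe the flag before the map entry and dereference a NULL pointer, which matches the failure mode the message describes.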

net/rds: fix WARNING in rds_conn_connect_if_down [+ + +]

Author: Edward Adam Davis <eadavis@qq.com>
Date:   Tue Mar 5 08:13:08 2024 +0800

    net/rds: fix WARNING in rds_conn_connect_if_down
    
    [ Upstream commit c055fc00c07be1f0df7375ab0036cebd1106ed38 ]
    
    If the connection isn't established yet, get_mr() will fail, so trigger
    the connection after get_mr().
    
    Fixes: 584a8279a44a ("RDS: RDMA: return appropriate error on rdma map failures")
    Reported-and-tested-by: syzbot+d4faee732755bba9838e@syzkaller.appspotmail.com
    Signed-off-by: Edward Adam Davis <eadavis@qq.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: microchip: fix register write order in ksz8_ind_write8() [+ + +]

Author: Tobias Jakobi (Compleo) <tobias.jakobi.compleo@gmail.com>
Date:   Mon Mar 4 16:41:35 2024 +0100

    net: dsa: microchip: fix register write order in ksz8_ind_write8()
    
    [ Upstream commit b7fb7729c94fb2d23c79ff44f7a2da089c92d81c ]
    
    This bug was noticed while re-implementing parts of the kernel
    driver in userspace using spidev. The goal was to enable some
    of the errata workarounds that Microchip describes in their
    errata sheet [1].
    
    Both the errata sheet and the regular datasheet of e.g. the KSZ8795
    imply that you need to do this for indirect register accesses:
    - write a 16-bit value to a control register pair (this value
      consists of the indirect register table, and the offset inside
      the table)
    - either read or write an 8-bit value from the data storage
      register (indicated by REG_IND_BYTE in the kernel)
    
    The current implementation has the order swapped. It can be
    proven, by reading back some indirect register with known content
    (the EEE register modified in ksz8_handle_global_errata() is one of
    these), that this implementation does not work.
    
    Private discussion with Oleksij Rempel of Pengutronix has revealed
    that the workaround was apparently never tested on actual hardware.
    
    [1] https://ww1.microchip.com/downloads/aemDocuments/documents/OTH/ProductDocuments/Errata/KSZ87xx-Errata-DS80000687C.pdf
    
    Signed-off-by: Tobias Jakobi (Compleo) <tobias.jakobi.compleo@gmail.com>
    Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
    Fixes: 7b6e6235b664 ("net: dsa: microchip: ksz8795: handle eee specif erratum")
    Link: https://lore.kernel.org/r/20240304154135.161332-1-tobias.jakobi.compleo@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink() [+ + +]

Author: Rand Deeb <rand.sec96@gmail.com>
Date:   Wed Feb 28 18:54:48 2024 +0300

    net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()
    
    [ Upstream commit 06e456a05d669ca30b224b8ed962421770c1496c ]
    
    The function ice_bridge_setlink() may encounter a NULL pointer dereference
    if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently
    in nla_for_each_nested(). To address this issue, add a check to ensure that
    br_spec is not NULL before proceeding with the nested attribute iteration.
    
    Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
    Signed-off-by: Rand Deeb <rand.sec96@gmail.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
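
The guard described in the ice message above follows a common pattern: a lookup that may return NULL must be checked before its result is iterated or dereferenced. A minimal sketch with hypothetical toy_* names follows; the error value returned here is an assumption, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ATTR_BRIDGE_SPEC 1
#define TOY_EINVAL 22 /* assumed error code for "attribute missing" */

struct toy_attr {
    int type;
    int value;
};

/* Returns a pointer to the first attribute of the given type, or NULL. */
static const struct toy_attr *toy_find_attr(const struct toy_attr *attrs,
                                            size_t n, int type)
{
    for (size_t i = 0; i < n; i++)
        if (attrs[i].type == type)
            return &attrs[i];
    return NULL;
}

/* Reads the bridge mode from the message, failing cleanly if it is absent. */
static int toy_bridge_setlink(const struct toy_attr *attrs, size_t n,
                              int *mode_out)
{
    const struct toy_attr *spec =
        toy_find_attr(attrs, n, TOY_ATTR_BRIDGE_SPEC);

    if (!spec) /* the check the patch adds, in toy form */
        return -TOY_EINVAL;

    *mode_out = spec->value;
    return 0;
}
```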

net: lan78xx: fix runtime PM count underflow on link stop [+ + +]

Author: Oleksij Rempel <o.rempel@pengutronix.de>
Date:   Wed Feb 28 13:45:17 2024 +0100

    net: lan78xx: fix runtime PM count underflow on link stop
    
    [ Upstream commit 1eecc7ab82c42133b748e1895275942a054a7f67 ]
    
    The current driver has some asymmetry in the runtime PM calls. In
    lan78xx_open() it calls usb_autopm_get() and then unconditionally
    usb_autopm_put(), and in lan78xx_stop() it calls only usb_autopm_put().
    So far this worked only because the driver does not activate autosuspend
    by default, so it showed up only as the warning "Runtime PM usage count underflow!".
    
    Since, with the current driver, we can't use runtime PM with an active
    link, execute lan78xx_open()->usb_autopm_put() only in the error case.
    Otherwise, keep the usage count elevated as long as the interface is open.
    
    Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
    Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
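
The balanced get/put scheme described in the lan78xx message above can be sketched with a plain counter. The toy_* names are hypothetical and the counter only stands in for the runtime PM usage count.

```c
#include <assert.h>
#include <stdbool.h>

static int toy_pm_usage; /* stands in for the runtime PM usage count */

static void toy_pm_get(void)
{
    toy_pm_usage++;
}

static void toy_pm_put(void)
{
    toy_pm_usage--;
}

/* Open: take a reference and drop it again only if opening fails. */
static int toy_open(bool fail)
{
    toy_pm_get();
    if (fail) {
        toy_pm_put();
        return -1;
    }
    return 0; /* the reference stays held while the interface is open */
}

/* Stop: drops the reference that a successful open left behind. */
static void toy_stop(void)
{
    toy_pm_put();
}
```

With an unconditional put at the end of toy_open(), the put in toy_stop() would take the counter to -1, which corresponds to the underflow warning quoted in the message.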

net: pds_core: Fix possible double free in error handling path [+ + +]

Author: Yongzhi Liu <hyperlyzcs@gmail.com>
Date:   Wed Mar 6 18:57:14 2024 +0800

    net: pds_core: Fix possible double free in error handling path
    
    [ Upstream commit ba18deddd6d502da71fd6b6143c53042271b82bd ]
    
    When auxiliary_device_add() returns an error and auxiliary_device_uninit()
    is then called, the callback function pdsc_auxbus_dev_release() calls
    kfree(padev) to free the memory. We shouldn't call kfree(padev)
    again in the error handling path.
    
    Fix this by cleaning up the redundant kfree() and putting
    the error handling back to where the errors happened.
    
    Fixes: 4569cce43bc6 ("pds_core: add auxiliary_bus devices")
    Signed-off-by: Yongzhi Liu <hyperlyzcs@gmail.com>
    Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
    Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
    Link: https://lore.kernel.org/r/20240306105714.20597-1-hyperlyzcs@gmail.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
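
The ownership hand-over described in the pds_core message above can be sketched as follows: once the device has been initialised with a release callback, tearing it down goes through that callback, and the error path must not free the memory a second time. All toy_* names are hypothetical, and toy_init() always succeeds in this sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_auxdev {
    void (*release)(struct toy_auxdev *dev);
};

static int toy_free_calls; /* how many times the release callback freed */

/* Release callback: the single place that frees an initialised device. */
static void toy_release(struct toy_auxdev *dev)
{
    toy_free_calls++;
    free(dev);
}

static int toy_init(struct toy_auxdev *dev)
{
    dev->release = toy_release;
    return 0;
}

/* Drops the last reference, which invokes the release callback. */
static void toy_uninit(struct toy_auxdev *dev)
{
    dev->release(dev);
}

/* Stand-in for adding the device to a bus; can be forced to fail. */
static int toy_add(struct toy_auxdev *dev, int fail)
{
    (void)dev;
    return fail ? -1 : 0;
}

static struct toy_auxdev *toy_register(int fail_add)
{
    struct toy_auxdev *dev = malloc(sizeof(*dev));

    if (!dev)
        return NULL;

    if (toy_init(dev)) {
        free(dev); /* not initialised yet: a direct free is correct here */
        return NULL;
    }

    if (toy_add(dev, fail_add)) {
        toy_uninit(dev); /* release callback frees dev; no free() here */
        return NULL;
    }

    return dev;
}
```

Handling each failure at the point where it happens, as the message suggests, keeps the "direct free" and "free via release" cases from sharing a single error label.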

net: sparx5: Fix use after free inside sparx5_del_mact_entry [+ + +]

Author: Horatiu Vultur <horatiu.vultur@microchip.com>
Date:   Fri Mar 1 09:06:08 2024 +0100

    net: sparx5: Fix use after free inside sparx5_del_mact_entry
    
    [ Upstream commit 89d72d4125e94aa3c2140fedd97ce07ba9e37674 ]
    
    Based on static analysis of the code, it looks like when an entry
    from the MAC table was removed, the entry was still used after being
    freed. More precisely, the vid of the mac_entry was used after calling
    devm_kfree on the mac_entry.
    The fix consists of first using the vid of the mac_entry to delete the
    entry from the HW and only then freeing it.
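
The ordering described above can be illustrated with a small userspace sketch. The names `mac_entry_sketch`, `hw_forget()` and `del_entry()` are hypothetical and only model the idea; the real driver uses devm_kfree() and its own hardware accessors.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a MAC table entry; only the vid field matters here. */
struct mac_entry_sketch {
	uint16_t vid;
};

uint16_t last_hw_vid; /* records the vid passed to the "hardware" */

/* Stand-in for the call that removes the entry from the hardware table. */
void hw_forget(uint16_t vid)
{
	last_hw_vid = vid;
}

/* Fixed ordering: use the entry's vid first, free the entry last. */
void del_entry(struct mac_entry_sketch *e)
{
	assert(e != NULL);
	hw_forget(e->vid); /* e is still valid here */
	free(e);           /* nothing may touch e after this line */
}
```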
    
    Fixes: b37a1bae742f ("net: sparx5: add mactable support")
    Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240301080608.3053468-1-horatiu.vultur@microchip.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nf_conntrack_h323: Add protection for bmp length out of range [+ + +]

Author: Lena Wang <lena.wang@mediatek.com>
Date:   Tue Mar 5 11:38:55 2024 +0000

    netfilter: nf_conntrack_h323: Add protection for bmp length out of range
    
    [ Upstream commit 767146637efc528b5e3d31297df115e85a2fd362 ]
    
    UBSAN load reports an exception of BRK#5515 SHIFT_ISSUE:Bitwise shifts
    that are out of bounds for their data type.
    
    vmlinux   get_bitmap(b=75) + 712
    <net/netfilter/nf_conntrack_h323_asn1.c:0>
    vmlinux   decode_seq(bs=0xFFFFFFD008037000, f=0xFFFFFFD008037018, level=134443100) + 1956
    <net/netfilter/nf_conntrack_h323_asn1.c:592>
    vmlinux   decode_choice(base=0xFFFFFFD0080370F0, level=23843636) + 1216
    <net/netfilter/nf_conntrack_h323_asn1.c:814>
    vmlinux   decode_seq(f=0xFFFFFFD0080371A8, level=134443500) + 812
    <net/netfilter/nf_conntrack_h323_asn1.c:576>
    vmlinux   decode_choice(base=0xFFFFFFD008037280, level=0) + 1216
    <net/netfilter/nf_conntrack_h323_asn1.c:814>
    vmlinux   DecodeRasMessage() + 304
    <net/netfilter/nf_conntrack_h323_asn1.c:833>
    vmlinux   ras_help() + 684
    <net/netfilter/nf_conntrack_h323_main.c:1728>
    vmlinux   nf_confirm() + 188
    <net/netfilter/nf_conntrack_proto.c:137>
    
    Due to abnormal data in skb->data, the extension bitmap length
    exceeds 32 when decoding a RAS message, and the length is then used
    in a shift operation. The shift count becomes negative after several
    loop iterations. UBSAN can detect a negative shift as undefined
    behaviour and reports an exception.
    So we add protection so that the length cannot exceed 32; if it does,
    decoding returns an out-of-range error and stops.
    
    Fixes: 5e35941d9901 ("[NETFILTER]: Add H.323 conntrack/NAT helper")
    Signed-off-by: Lena Wang <lena.wang@mediatek.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nft_ct: fix l3num expectations with inet pseudo family [+ + +]

Author: Florian Westphal <fw@strlen.de>
Date:   Fri Mar 1 13:38:15 2024 +0100

    netfilter: nft_ct: fix l3num expectations with inet pseudo family
    
    [ Upstream commit 99993789966a6eb4f1295193dc543686899892d3 ]
    
    The following is rejected but should be allowed:
    
    table inet t {
            ct expectation exp1 {
                    [..]
                    l3proto ip
    
    Valid combos are:
    table ip t, l3proto ip
    table ip6 t, l3proto ip6
    table inet t, l3proto ip OR l3proto ip6
    
    Disallow the inet pseudo family; the l3num must be an on-wire protocol
    known to conntrack.
    
    Retain the NFPROTO_INET case to make it clear it's rejected
    intentionally rather than as an oversight.
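
The list of valid combinations above can be written as a small validation function. This is a sketch, not the nft_ct code: the `family` enum, `check_l3num()` and the choice of error codes are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>

static_assert(EINVAL > 0 && EAFNOSUPPORT > 0,
	      "errno codes are positive, so their negations signal failure");

/* Hypothetical family constants for this sketch (not the kernel's values). */
enum family { FAM_IPV4, FAM_IPV6, FAM_INET, FAM_OTHER };

/*
 * table_family: family of the enclosing table.
 * l3num:        l3proto requested for the expectation.
 * Returns 0 if the combination is accepted, a negative errno otherwise.
 */
int check_l3num(enum family table_family, enum family l3num)
{
	switch (l3num) {
	case FAM_IPV4:
	case FAM_IPV6:
		/* on-wire protocol: must match the table, or the table is inet */
		if (l3num == table_family || table_family == FAM_INET)
			return 0;
		return -EINVAL;
	case FAM_INET:
		/* kept explicit: inet is a pseudo family, rejected on purpose */
	default:
		return -EAFNOSUPPORT;
	}
}
```

Keeping the explicit `FAM_INET` label next to `default` mirrors the commit's intent of documenting that the rejection is deliberate.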
    
    Fixes: 8059918a1377 ("netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations")
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_default_path_quality [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:35 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_default_path_quality
    
    [ Upstream commit 958d6145a6d9ba9e075c921aead8753fb91c9101 ]
    
    We need to protect the reader reading sysctl_netrom_default_path_quality
    because the value can be changed concurrently.
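
The reader-side pattern used by this series of netrom fixes can be approximated in userspace. `READ_ONCE_SKETCH`, `sysctl_quality` and `clamp_quality()` are hypothetical names; the macro is only a rough stand-in for the kernel's READ_ONCE() and relies on the GCC/Clang `__typeof__` extension.

```c
#include <assert.h>

/* Rough userspace approximation of a single load that the compiler may
 * not repeat or merge; the kernel's READ_ONCE() is more elaborate. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

int sysctl_quality = 10; /* hypothetical tunable, may be written concurrently */

/* Take one snapshot and use it consistently for the whole computation. */
int clamp_quality(int limit)
{
	int q;

	assert(limit >= 0);
	q = READ_ONCE_SKETCH(sysctl_quality); /* read exactly once */
	return q > limit ? limit : q;         /* both uses see the same value */
}
```

Without the snapshot, the compiler would be free to load the variable twice, and a concurrent writer could make the comparison and the returned value disagree.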
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_link_fails_count [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:45 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_link_fails_count
    
    [ Upstream commit bc76645ebdd01be9b9994dac39685a3d0f6f7985 ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:36 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
    
    [ Upstream commit cfd9f4a740f772298308b2e6070d2c744fb5cf79 ]
    
    We need to protect the reader reading the sysctl value
    because the value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_routing_control [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:44 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_routing_control
    
    [ Upstream commit b5dffcb8f71bdd02a4e5799985b51b12f4eeaf76 ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:40 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
    
    [ Upstream commit 806f462ba9029d41aadf8ec93f2f99c5305deada ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_transport_busy_delay [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:41 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
    
    [ Upstream commit 43547d8699439a67b78d6bb39015113f7aa360fd ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:39 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
    
    [ Upstream commit e799299aafed417cc1f32adccb2a0e5268b3f6d5 ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:43 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
    
    [ Upstream commit f99b494b40431f0ca416859f2345746199398e2b ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:42 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
    
    [ Upstream commit a2e706841488f474c06e9b33f71afc947fb3bf56 ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix a data-race around sysctl_netrom_transport_timeout [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:38 2024 +0800

    netrom: Fix a data-race around sysctl_netrom_transport_timeout
    
    [ Upstream commit 60a7a152abd494ed4f69098cf0f322e6bb140612 ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix data-races around sysctl_net_busy_read [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:46 2024 +0800

    netrom: Fix data-races around sysctl_net_busy_read
    
    [ Upstream commit d380ce70058a4ccddc3e5f5c2063165dc07672c6 ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser [+ + +]

Author: Jason Xing <kernelxing@tencent.com>
Date:   Mon Mar 4 16:20:37 2024 +0800

    netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
    
    [ Upstream commit 119cae5ea3f9e35cdada8e572cc067f072fa825a ]
    
    We need to protect the reader reading the sysctl value because the
    value can be changed concurrently.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Jason Xing <kernelxing@tencent.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

readahead: avoid multiple marked readahead pages [+ + +]

Author: Jan Kara <jack@suse.cz>
Date:   Thu Jan 4 09:58:39 2024 +0100

    readahead: avoid multiple marked readahead pages
    
    [ Upstream commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ]
    
    ra_alloc_folio() marks a page that should trigger next round of async
    readahead.  However it rounds up computed index to the order of page being
    allocated.  This can however lead to multiple consecutive pages being
    marked with readahead flag.  Consider situation with index == 1, mark ==
    1, order == 0.  We insert order 0 page at index 1 and mark it.  Then we
    bump order to 1, index to 2, mark (still == 1) is rounded up to 2 so page
    at index 2 is marked as well.  Then we bump order to 2, index is
    incremented to 4, mark gets rounded to 4 so page at index 4 is marked as
    well.  The fact that multiple pages get marked within a single readahead
    window confuses the readahead logic and results in readahead window being
    trimmed back to 1.  This situation is triggered in particular when maximum
    readahead window size is not a power of two (in the observed case it was
    768 KB) and as a result sequential read throughput suffers.
    
    Fix the problem by rounding 'mark' down instead of up.  Because the index
    is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
    iff 'mark' is within the page we are allocating at 'index' and thus
    exactly one page is marked with readahead flag as required by the
    readahead code and sequential read performance is restored.
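
The index == 1, mark == 1 example from the description can be replayed with a few lines of C. This is a sketch of the arithmetic only, assuming a page is marked when the rounded mark equals its index; `round_up_pow2()`, `round_down_pow2()` and `count_marked()` are hypothetical helpers, not the readahead code.

```c
#include <assert.h>

/* Round x up to a multiple of align; align must be a power of two. */
unsigned long round_up_pow2(unsigned long x, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Round x down to a multiple of align; align must be a power of two. */
unsigned long round_down_pow2(unsigned long x, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Replay the sequence from the commit message: pages of order 0, 1, 2
 * at index 1, 2, 4 with mark == 1. Returns how many pages get marked.
 */
int count_marked(int use_round_up)
{
	const unsigned long index[] = { 1, 2, 4 };
	const unsigned int order[] = { 0, 1, 2 };
	const unsigned long mark = 1;
	int marked = 0;

	for (int i = 0; i < 3; i++) {
		unsigned long align = 1UL << order[i];
		unsigned long rounded = use_round_up ?
			round_up_pow2(mark, align) : round_down_pow2(mark, align);

		if (rounded == index[i])
			marked++;
	}
	return marked;
}
```

With rounding up, all three pages match (the mark becomes 1, 2, then 4). With rounding down, the mark becomes 1, 0, 0, so only the first page is marked.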
    
    This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix
    readahead with large folios").  The commit changed the rounding with the
    rationale:
    
    "...  we were setting the readahead flag on the folio which contains the
    last byte read from the block.  This is wrong because we will trigger
    readahead at the end of the read without waiting to see if a subsequent
    read is going to use the pages we just read."
    
    Although this is true, the fact is this was always the case with read
    sizes not aligned to folio boundaries and large folios in the page cache
    just make the situation more obvious (and frequent).  Also for sequential
    read workloads it is better to trigger the readahead earlier rather than
    later.  It is true that the difference in the rounding and thus earlier
    triggering of the readahead can result in reading more for semi-random
    workloads.  However workloads really suffering from this seem to be rare.
    In particular I have verified that the workload described in commit
    b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") of reading
    random 100k blocks from a file like:
    
    [reader]
    bs=100k
    rw=randread
    numjobs=1
    size=64g
    runtime=60s
    
    is not impacted by the rounding change and achieves ~70MB/s in both cases.
    
    [jack@suse.cz: fix one more place where mark rounding was done as well]
      Link: https://lkml.kernel.org/r/20240123153254.5206-1-jack@suse.cz
    Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@suse.cz
    Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Guo Xuenan <guoxuenan@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "net/mlx5: Block entering switchdev mode with ns inconsistency" [+ + +]

Author: Gavin Li <gavinl@nvidia.com>
Date:   Thu Oct 19 04:49:54 2023 +0300

    Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"
    
    [ Upstream commit 8deeefb24786ea7950b37bde4516b286c877db00 ]
    
    This reverts commit 662404b24a4c4d839839ed25e3097571f5938b9b.
    The revert is required due to the suspicion that it is not good for
    anything and causes a crash.
    
    Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency")
    Signed-off-by: Gavin Li <gavinl@nvidia.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "net/mlx5e: Check the number of elements before walk TC rhashtable" [+ + +]

Author: Saeed Mahameed <saeedm@nvidia.com>
Date:   Wed Dec 13 17:07:08 2023 -0800

    Revert "net/mlx5e: Check the number of elements before walk TC rhashtable"
    
    [ Upstream commit b7bbd698c90591546d22093181e266785f08c18b ]
    
    This reverts commit 4e25b661f484df54b6751b65f9ea2434a3b67539.
    
    This commit was mistakenly applied by pulling the wrong tag; remove it.
    
    Fixes: 4e25b661f484 ("net/mlx5e: Check the number of elements before walk TC rhashtable")
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests/bpf: Fix up xdp bonding test wrt feature flags [+ + +]

Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue Mar 5 10:08:29 2024 +0100

    selftests/bpf: Fix up xdp bonding test wrt feature flags
    
    [ Upstream commit 0bfc0336e1348883fdab4689f0c8c56458f36dd8 ]
    
    Adjust the XDP feature flags for the bond device when no bond slave
    devices are attached. After 9b0ed890ac2a ("bonding: do not report
    NETDEV_XDP_ACT_XSK_ZEROCOPY"), the empty bond device must report 0
    as flags instead of NETDEV_XDP_ACT_MASK.
    
      # ./vmtest.sh -- ./test_progs -t xdp_bond
      [...]
      [    3.983311] bond1 (unregistering): (slave veth1_1): Releasing backup interface
      [    3.995434] bond1 (unregistering): Released all slaves
      [    4.022311] bond2: (slave veth2_1): Releasing backup interface
      #507/1   xdp_bonding/xdp_bonding_attach:OK
      #507/2   xdp_bonding/xdp_bonding_nested:OK
      #507/3   xdp_bonding/xdp_bonding_features:OK
      #507/4   xdp_bonding/xdp_bonding_roundrobin:OK
      #507/5   xdp_bonding/xdp_bonding_activebackup:OK
      #507/6   xdp_bonding/xdp_bonding_xor_layer2:OK
      #507/7   xdp_bonding/xdp_bonding_xor_layer23:OK
      #507/8   xdp_bonding/xdp_bonding_xor_layer34:OK
      #507/9   xdp_bonding/xdp_bonding_redirect_multi:OK
      #507     xdp_bonding:OK
      Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED
      [    4.185255] bond2 (unregistering): Released all slaves
      [...]
    
    Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Message-ID: <20240305090829.17131-2-daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: mptcp: decrease BW in simult flows [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Jan 31 22:49:51 2024 +0100

    selftests: mptcp: decrease BW in simult flows
    
    [ Upstream commit 5e2f3c65af47e527ccac54060cf909e3306652ff ]
    
    When running the simult_flow selftest in slow environments -- e.g. QEMU
    without KVM support --, the results can be unstable. This selftest
    checks if the aggregated bandwidth is (almost) fully used as expected.
    
    To help improve the stability while still keeping the same validation
    in place, the BW and the delay are reduced to lower the pressure on the
    CPU.
    
    Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
    Fixes: 219d04992b68 ("mptcp: push pending frames when subflow has free space")
    Cc: stable@vger.kernel.org
    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-6-4c1c11e571ff@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Thu Feb 29 14:34:44 2024 -0500

    tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string
    
    [ Upstream commit 51270d573a8d9dd5afdc7934de97d66c0e14b5fd ]
    
    I'm updating __assign_str() and will be removing the second parameter. To
    make sure that it does not break anything, I make sure that it matches the
    __string() field, as that is where the string is actually going to be
    saved in. To make sure there's nothing that breaks, I added a WARN_ON() to
    make sure that what was used in __string() is the same that is used in
    __assign_str().
    
    In doing this change, an error was triggered as __assign_str() now expects
    the string passed in to be a char * value. I instead had the following
    warning:
    
    include/trace/events/qdisc.h: In function ‘trace_event_raw_event_qdisc_reset’:
    include/trace/events/qdisc.h:91:35: error: passing argument 1 of 'strcmp' from incompatible pointer type [-Werror=incompatible-pointer-types]
       91 |                 __assign_str(dev, qdisc_dev(q));
    
    That's because the qdisc_enqueue() and qdisc_reset() pass in qdisc_dev(q)
    to __assign_str() and to __string(). But that function returns a pointer
    to struct net_device and not a string.
    
    It appears that these events are just saving the pointer as a string and
    then reading it as a string as well.
    
    Use qdisc_dev(q)->name to save the device instead.
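
The difference between saving a structure pointer and saving the device name can be shown with a userspace model. `fake_net_device`, `NAME_LEN` and `save_dev_name()` are hypothetical; the real tracepoints use the __string() and __assign_str() macros.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 16 /* hypothetical buffer size for this sketch */

struct fake_net_device {
	unsigned long state; /* arbitrary non-string leading field */
	char name[NAME_LEN];
};

/* Copy the device *name* into the trace buffer, never the struct pointer. */
void save_dev_name(char *buf, size_t len, const struct fake_net_device *dev)
{
	size_t n;

	assert(len > 0);
	n = strlen(dev->name);
	if (n >= len)
		n = len - 1; /* truncate, leaving room for the terminator */
	memcpy(buf, dev->name, n);
	buf[n] = '\0';
}
```

Treating `dev` itself as a `char *` would instead copy the raw bytes of the leading fields, which is what the old tracepoints effectively recorded.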
    
    Fixes: a34dac0b90552 ("net_sched: add tracepoints for qdisc_reset() and qdisc_destroy()")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set [+ + +]

Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date:   Mon Mar 11 12:29:43 2024 -0700

    x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
    
    commit e95df4ec0c0c9791941f112db699fae794b9862a upstream.
    
    Currently MMIO Stale Data mitigation for CPUs not affected by MDS/TAA is
    to only deploy VERW at VMentry by enabling mmio_stale_data_clear static
    branch. No mitigation is needed for kernel->user transitions. If such
    CPUs are also affected by RFDS, its mitigation may set
    X86_FEATURE_CLEAR_CPU_BUF to deploy VERW at kernel->user and VMentry.
    This could result in duplicate VERW at VMentry.
    
    Fix this by disabling mmio_stale_data_clear static branch when
    X86_FEATURE_CLEAR_CPU_BUF is enabled.
    
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/rfds: Mitigate Register File Data Sampling (RFDS) [+ + +]

Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date:   Mon Mar 11 12:29:43 2024 -0700

    x86/rfds: Mitigate Register File Data Sampling (RFDS)
    
    commit 8076fcde016c9c0e0660543e67bff86cb48a7c9c upstream.
    
    RFDS is a CPU vulnerability that may allow userspace to infer kernel
    stale data previously used in floating point registers, vector registers
    and integer registers. RFDS only affects certain Intel Atom processors.
    
    Intel released a microcode update that uses VERW instruction to clear
    the affected CPU buffers. Unlike MDS, none of the affected cores support
    SMT.
    
    Add RFDS bug infrastructure and enable the VERW based mitigation by
    default, that clears the affected buffers just before exiting to
    userspace. Also add sysfs reporting and cmdline parameter
    "reg_file_data_sampling" to control the mitigation.
    
    For details see:
    Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
    
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xdp, bonding: Fix feature flags when there are no slave devs anymore [+ + +]

Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Tue Mar 5 10:08:28 2024 +0100

    xdp, bonding: Fix feature flags when there are no slave devs anymore
    
    [ Upstream commit f267f262815033452195f46c43b572159262f533 ]
    
    Commit 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
    changed the driver from reporting everything as supported before a device
    was bonded into having the driver report that no XDP feature is supported
    until a real device is bonded as it seems to be more truthful given
    eventually real underlying devices decide what XDP features are supported.
    
    The change however did not take into account when all slave devices get
    removed from the bond device. In this case after 9b0ed890ac2a, the driver
    keeps reporting a feature mask of 0x77, that is, NETDEV_XDP_ACT_MASK &
    ~NETDEV_XDP_ACT_XSK_ZEROCOPY whereas it should have reported a feature
    mask of 0.
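
The 0x77 value can be checked with a short sketch. The flag values below are reproduced from memory of the netdev uapi header and should be treated as assumptions; they are at least consistent with the 0x77 quoted above. `bond_xdp_features()` is a hypothetical helper that models the intended reporting, not the bonding driver's code.

```c
#include <assert.h>

/* Assumed flag values; check include/uapi/linux/netdev.h before relying on them. */
enum {
	NETDEV_XDP_ACT_BASIC        = 1,
	NETDEV_XDP_ACT_REDIRECT     = 2,
	NETDEV_XDP_ACT_NDO_XMIT     = 4,
	NETDEV_XDP_ACT_XSK_ZEROCOPY = 8,
	NETDEV_XDP_ACT_HW_OFFLOAD   = 16,
	NETDEV_XDP_ACT_RX_SG        = 32,
	NETDEV_XDP_ACT_NDO_XMIT_SG  = 64,
	NETDEV_XDP_ACT_MASK         = 127,
};

static_assert((NETDEV_XDP_ACT_MASK & ~NETDEV_XDP_ACT_XSK_ZEROCOPY) == 0x77,
	      "matches the mask quoted in the commit message");

/*
 * Feature mask a bond should report. slave_features is the set of
 * features common to all slaves; it is ignored for an empty bond.
 */
unsigned int bond_xdp_features(unsigned int nr_slaves,
			       unsigned int slave_features)
{
	if (nr_slaves == 0)
		return 0; /* empty bond: no XDP feature is supported */
	return slave_features & NETDEV_XDP_ACT_MASK &
	       ~(unsigned int)NETDEV_XDP_ACT_XSK_ZEROCOPY;
}
```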
    
    Fix it by resetting XDP feature flags in the same way as if no XDP program
    is attached to the bond device. This was uncovered by the XDP bond selftest
    which let BPF CI fail. After adjusting the starting masks on the latter
    to 0 instead of NETDEV_XDP_ACT_MASK the test passes again together with
    this fix.
    
    Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Magnus Karlsson <magnus.karlsson@intel.com>
    Cc: Prashant Batra <prbatra.mail@gmail.com>
    Cc: Toke Høiland-Jørgensen <toke@redhat.com>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Message-ID: <20240305090829.17131-1-daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

xfrm: Clear low order bits of ->flowi4_tos in decode_session4(). [+ + +]

Author: Guillaume Nault <gnault@redhat.com>
Date:   Wed Jan 3 16:06:32 2024 +0100

    xfrm: Clear low order bits of ->flowi4_tos in decode_session4().
    
    [ Upstream commit 1982a2a02c9197436d4a8ea12f66bafab53f16a0 ]
    
    Commit 23e7b1bfed61 ("xfrm: Don't accidentally set RTO_ONLINK in
    decode_session4()") fixed a problem where decode_session4() could
    erroneously set the RTO_ONLINK flag for IPv4 route lookups. This
    problem was reintroduced when decode_session4() was modified to
    use the flow dissector.
    
    Fix this by clearing again the two low order bits of ->flowi4_tos.
    Found by code inspection, compile tested only.
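
The masking step can be sketched as below. The macro names are hypothetical; the values assume that the flag occupies bit 0 of the tos field and that the two low-order bits are the ones to clear, as the description states.

```c
#include <assert.h>
#include <stdint.h>

#define LOW_BITS_MASK_SKETCH 0x3u /* the two low-order bits of the TOS byte */
#define ONLINK_FLAG_SKETCH   0x1u /* assumed position of the RTO_ONLINK flag */

static_assert((ONLINK_FLAG_SKETCH & ~LOW_BITS_MASK_SKETCH) == 0,
	      "the flag bit lies inside the cleared bits");

/* Clear the two low-order bits so packet data can never set the flag. */
uint8_t sanitize_tos(uint8_t tos)
{
	return (uint8_t)(tos & ~LOW_BITS_MASK_SKETCH);
}
```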
    
    Fixes: 7a0207094f1b ("xfrm: policy: replace session decode with flow dissector")
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

xfrm: Pass UDP encapsulation in TX packet offload [+ + +]

Author: Leon Romanovsky <leon@kernel.org>
Date:   Wed Jan 24 00:13:54 2024 -0800

    xfrm: Pass UDP encapsulation in TX packet offload
    
    [ Upstream commit 983a73da1f996faee9997149eb05b12fa7bd8cbf ]
    
    In addition to the commit cited in the Fixes line, allow UDP
    encapsulation in the TX path too.
    
    Fixes: 89edf40220be ("xfrm: Support UDP encapsulation in packet offload mode")
    CC: Steffen Klassert <steffen.klassert@secunet.com>
    Reported-by: Mike Yu <yumike@google.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Changelog for Linux 6.7.10