[BPF] forwarding to peer with kubevirt does not work #10058

Closed
Yoda317 opened this issue Mar 25, 2025 · 6 comments · Fixed by #10308
Yoda317 commented Mar 25, 2025

We run KubeVirt with the Calico eBPF dataplane as the CNI. A virtual machine connected to the pod network in bridge mode cannot communicate with anything outside the node it is running on.
If eBPF is disabled on this node via FelixConfiguration, everything works fine.
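
For context, Calico allows a FelixConfiguration named node.<nodename> to override the default settings for a single node, so a per-node disable like the one mentioned above might look roughly like the sketch below. The resource name node.node63144 is an assumption based on the hostname in the VM manifest's nodeSelector; the resource actually used in this cluster is not shown in the report.

```yaml
# Sketch only: per-node override that turns the eBPF dataplane off on one node.
# The name "node.node63144" is assumed from the hostname used in this report.
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: node.node63144
spec:
  bpfEnabled: false
```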

Expected Behavior

The virtual machine is reachable from outside the node where it is running.

Current Behavior

The virtual machine is reachable only from the host network and from other VMs on the same node.

Possible Solution

Steps to Reproduce (for bugs)

  1. Calico with eBPF enabled
    felixconfig
kind: FelixConfiguration
metadata:
  name: default
spec:
  bpfDataIfacePattern: ^((en|wl|ww|sl|ib)[Popsx].*|(wlan|wwan).*|tunl0$|vxlan.calico$|vxlan-v6.calico$|wireguard.cali$|wg-v6.cali$|egress.calico$|(eth|bond)[0-9]+.[0-9]+$)
  bpfEnabled: true
  bpfKubeProxyEndpointSlicesEnabled: true
  bpfKubeProxyIptablesCleanupEnabled: false
  bpfLogLevel: ""
  floatingIPs: Disabled
  logSeverityScreen: Warning
  prometheusGoMetricsEnabled: false
  prometheusProcessMetricsEnabled: false
  reportingInterval: 0s
  usageReportingEnabled: false
  vxlanPort: 4790
  xdpEnabled: false

  2. Kubevirt virtual machine connected to pod network in bridge mode.
    vm manifest
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vmp-06
  namespace: kubevirt
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/size: medium 
        kubevirt.io/domain: testvm
    spec:
      domain:
        devices:
          disks:
            - name: disk0 
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
          - name: default 
            bridge: {}
        resources:
          requests:
            memory: 2G
      networks:
      - name: default
        pod: {}
      nodeSelector:
        kubernetes.io/hostname: node63144
      volumes:
        - name: disk0
          persistentVolumeClaim:
            claimName: pvc-vmp-06
        - name: cloudinitdisk
          cloudInitNoCloud:
            userData: |-
              #cloud-config
                             

Context

NAME                         READY   STATUS    RESTARTS   AGE    IP               NODE                NOMINATED NODE   READINESS GATES
virt-launcher-vmp-06-qx5sl   2/2     Running   0          3d6h   10.237.163.238  node63144   <none>           1/1

ping from outside

ping 10.237.163.238
PING 10.237.163.238 (10.237.163.238) 56(84) bytes of data.
64 bytes from 10.237.163.238: icmp_seq=1 ttl=56 time=234 ms
^C
--- 10.237.163.238 ping statistics ---
5 packets transmitted, 1 received, 80% packet loss, time 4065ms
rtt min/avg/max/mdev = 234.781/234.781/234.781/0.000 ms

Only ONE ICMP echo reply is returned.

on wire:

# pods if
tcpdump -i cali3986e885c88 icmp
listening on cali3986e885c88, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:07:07.989340 IP 10.220.141.102 > 10.237.163.238: ICMP echo request, id 12963, seq 1, length 64
20:07:07.989455 IP 10.237.163.238 > 10.220.141.102: ICMP echo reply, id 12963, seq 1, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

# host if
tcpdump -i eth0 icmp and host 10.237.163.238
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:08:43.989263 IP 10.220.141.102 > 10.237.163.238: ICMP echo request, id 13033, seq 1, length 64
20:08:43.989415 IP 10.237.163.238 > 10.220.141.102: ICMP echo reply, id 13033, seq 1, length 64
20:08:44.861817 IP 10.220.141.102 > 10.237.163.238: ICMP echo request, id 13033, seq 2, length 64
20:08:45.885376 IP 10.220.141.102 > 10.237.163.238: ICMP echo request, id 13033, seq 3, length 64
20:08:46.899469 IP 10.220.141.102 > 10.237.163.238: ICMP echo request, id 13033, seq 4, length 64
20:08:47.929243 IP 10.220.141.102 > 10.237.163.238: ICMP echo request, id 13033, seq 5, length 64
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel

ping from vm

vmp-06:~$ ping 10.123.198.45
PING 10.123.198.45 (10.123.198.45) 56(84) bytes of data.
^C
--- 10.123.198.45 ping statistics ---
19 packets transmitted, 0 received, 100% packet loss, time 18439ms

# on pods if
 tcpdump -i cali3986e885c88 icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cali3986e885c88, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:24:57.819991 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 1, length 64
20:24:58.851370 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 2, length 64
20:24:59.875369 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 3, length 64
20:25:00.899433 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 4, length 64
20:25:01.923426 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 5, length 64
20:25:02.947438 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 6, length 64
20:25:03.971410 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 7, length 64
20:25:04.995431 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 8, length 64
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel

# on host if
tcpdump -i eth0 icmp and host 10.237.163.238
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:25:11.139506 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 14, length 64
20:25:11.140549 IP 10.123.198.45 > 10.237.163.238: ICMP echo reply, id 23, seq 14, length 64
20:25:12.163532 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 15, length 64
20:25:12.164581 IP 10.123.198.45 > 10.237.163.238: ICMP echo reply, id 23, seq 15, length 64
20:25:13.187516 IP 10.237.163.238 > 10.123.198.45: ICMP echo request, id 23, seq 16, length 64
20:25:13.188563 IP 10.123.198.45 > 10.237.163.238: ICMP echo reply, id 23, seq 16, length 64
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel

Once the first packet passes, a conntrack record appears in the ct map, and after that all subsequent packets are dropped.

kubectl -n kube-system exec -it calico-node-jq7bn -- calico-node -bpf conntrack dump | grep 10.237.163.238
Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init), mount-bpffs (init)
ICMP 10.220.141.102:0 -> 10.237.163.238:0  Age: 7.087960938s Active ago 127.655102ms

After 15s, once the record in the ct map is cleared, the first ICMP packet passes again.

Your Environment

  • Calico version: 3.29.2
  • Calico dataplane (iptables, windows etc.): eBPF
  • Orchestrator version (e.g. kubernetes, mesos, rkt): k8s v1.27.6
  • Operating System and version: Ubuntu 22.04.4 LTS, kernel 6.2.0-34-generic
  • Kubevirt: 1.2.2
tomastigera added the kind/support and area/bpf (eBPF Dataplane issues) labels Mar 25, 2025
@tomastigera
Contributor

After 15s when record in ct map is cleared, again first icmp packet passes well

That is certainly strange 🤔 TCP gets through? Partially?

@tomastigera
Contributor

@Yoda317 would you be able to set

bpfLogLevel: "Debug"
bpfLogFilters:
  all: host 10.237.163.238

and provide the bpf logs using tc exec bpf debug in the calico node pod on the vm's host? The bpfLogFilters is optional and is meant to reduce output. We need to capture the ping in the logs.

https://docs.tigera.io/calico/latest/operations/ebpf/troubleshoot-ebpf#ebpf-program-debug-logs

@Yoda317
Author

Yoda317 commented Mar 25, 2025

That is certainly strange 🤔 TCP gets through? Partially?

Yep, partially.
The SYN and SYN-ACK are visible, but the TCP handshake does not complete.

 tcpdump -i cali3986e885c88 port 22
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cali3986e885c88, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:17:10.989015 IP 10.220.144.18.44146 > 10.237.163.238.ssh: Flags [S], seq 1290241275, win 65280, options [mss 1360,sackOK,TS val 2416759283 ecr 0,nop,wscale 7], length 0
21:17:10.989141 IP 10.237.163.238.ssh > 10.220.144.18.44146: Flags [S.], seq 3872542047, ack 1290241276, win 65160, options [mss 1460,sackOK,TS val 1826158281 ecr 2416759283,nop,wscale 7], length 0
21:17:12.002332 IP 10.237.163.238.ssh > 10.220.144.18.44146: Flags [S.], seq 3872542047, ack 1290241276, win 65160, options [mss 1460,sackOK,TS val 1826159295 ecr 2416759283,nop,wscale 7], length 0
21:17:14.018356 IP 10.237.163.238.ssh > 10.220.144.18.44146: Flags [S.], seq 3872542047, ack 1290241276, win 65160, options [mss 1460,sackOK,TS val 1826161311 ecr 2416759283,nop,wscale 7], length 0
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel

# but in bpf  ct table it appears as EST
TCP 10.220.144.18:44146 -> 10.237.163.238:22  Age: 2m20.433036606s Active ago 29.954759989s ESTABLISHED

@Yoda317
Author

Yoda317 commented Mar 25, 2025

@tomastigera

provide the bpf logs using tc exec bpf debug in the calico node pod on the vm's host?

ping_bpf.log

@tomastigera
Contributor

The second packet gets to the VM as well:

  napi/eth0-8720-2620    [070] D..2. 1757103.584294: bpf_trace_printk: eth0.401--------I: Redirect to peer interface (1709) succeeded.
  napi/eth0-8720-2620    [070] D..2. 1757103.584295: bpf_trace_printk: eth0.401--------I: Traffic is towards host namespace, marking with 0x3000000.
  napi/eth0-8720-2620    [070] D..2. 1757103.584296: bpf_trace_printk: eth0.401--------I: Final result=ALLOW (0). Program execution time: 23828ns

This takes a different path and you won't see it in the tcpdump on this device. The packet may have the wrong MAC/type and may get dropped by the VM.

Could you set bpfRedirectToPeer: Disabled in felixconfiguration to see if it fixes your problem? https://docs.tigera.io/calico/latest/reference/resources/felixconfig#bpfRedirectToPeer
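
A minimal sketch of that change, assuming it is made on the default FelixConfiguration shown in the report. Only the new field matters here; it should be merged into the existing spec, since the other fields from the report are omitted for brevity. The apiVersion is the usual one for Calico resources and is not quoted from the thread.

```yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  bpfEnabled: true
  # Workaround under test: do not forward packets straight to the peer
  # (workload) side of the device; deliver them via the host-side interface.
  bpfRedirectToPeer: Disabled
```

With this in place, the second and later packets should show up again in a tcpdump on the cali interface, which is a quick way to check that the setting took effect.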

@Yoda317
Author

Yoda317 commented Mar 26, 2025

Yes, it works.
Thank you

tomastigera changed the title from "kubevirt virtual machine network issue with calico eBPF" to "[BPF] forwarding to peer with kubevirt does not work" Mar 26, 2025
sridhartigera self-assigned this Mar 27, 2025