Serializable enums: should comparison of values of two different enum types work? (or enum and underlying type) #1369

vlstill · 2025-05-05T06:07:11Z

enum bit<8> E1 { A = 1, B = 2, C = 10 }
enum bit<8> E2 { X = 2, Y = 1, Z = 3 }
enum bit<2> E3 { P = 1, Q = 2 }

// ...

control c(inout headers hdr) {
    apply {
        bit<8> a = 10;
        if (a == E1.C) { ... } // (1) ??
        if (E1.A == E2.X) { ... } // (2) ??
        if (E1.A == E3.Q) { ... } // (3) ??
    }
}

The question is if any of the case (1), (2), (3) should work.

As @ChrisDodd pointed out,

an enum with an underlying type can be thought of as being a type derived from the underlying type carrying equality, assignment, and casts to/from the underlying type.

I would ready it as permitting (1) for sure and maybe (2), but not (3), because there the underlying type is different.

I think the case of different underlying type is clear (it is not allowed). Personally, I would prefer all of them being disallowed, but I am not sure if P4C actually disallows (1) currently, or only (2), so doing that might be a breaking change. Maybe at least a consideration for "P4-26".

vlstill · 2025-05-12T16:47:00Z

Discussed on P4 LDWG 2025-05-12 (with @rcgoodfellow, @jafingerhut, @ajwalton), we agreed that the comparison of enums of different types, and even enums with integers can be dangerous. We have also discussed the implicit cast from serializable enum type to its underlying type and tended to disallowing even that. Of course, that would be a breaking change, forcing addition of explicit casts into existing P4 programs that use these features. We have discussed potentially treating these as warnings first, before introducing them as errors in future version of P4 spec & P4C.

Follow on:

check what would need to change in spec to disallow implicit cast and heterogeneous comparison for serializable enums [@vstill];
check how the compiler behaves before Allow constant folding of serializable enums of different types p4c#5246 [@vstill] & for non-constant values;
proceed with collection of further discussion inputs/opinions.

Example of dangerous uses:

Consider e.g. a serializable enum for ether-type values -- it does not make much sense to compare e.g. EthType.IPv4 with arbitrary bit<16> values, nor does it make sense to do something like EthType.IPv4 + 4.

@vgurevich, by chance do you remember why the implicit casting and comparison for serializable enums were added? Do you know about uses cases that rely on it?

vgurevich · 2025-05-12T19:55:37Z

The original intent behind serializable enums was to provide convenient named constants to the specific integer types. Nothing more, nothing less. I think that changing this direction is quite dangerous and unnecessary.

We did have situations where we needed to allow flexibility/ Although EthType.IPv4+4 might look non-sensical , EthType.TPID & 0xFEFF == EthType.TPID is a fairly standard idiom. For that same reason, it was agreed that the variables of the enum types can hold any value representable by the underlying integer and not be restricted to only those explicityly named.

If there is a desire to have safer enums, they can be derived from type-based types. Note, BTW, that these type-based types are not widely used precisely because they are too safe and thus people use typedef-based types in pretty much all the cases.

vlstill · 2025-05-13T07:17:29Z

OK, thanks @vgurevich for filling in the details that noone on the LDWG remembered. So the original intent is basically a fancy typedef with constants? I think if this is the intention, it should be at least clearly stated in the spec.

Overall, modern languages tend to be more restrictive by default and have the less safe behavior (such as automatic casting/lifting of operations) opt-in (note that C++ is not a modern language in this regard). Maybe we might consider such approach for the P4-next and leave the current behavior for P4-16? I will still double-check the current P4C behavior in more details.

vgurevich · 2025-05-13T16:49:42Z

@vlstill , you are correct -- the original intent was a fancy typedef with constants that can also be nicely displayed, e.g. in a CLI.

As for the safety, let's think of it this way: the main reason we have serializable enums is because they can be put into the headers, is it not? (otherwise non-serializable would do, correct)? However, no matter how safe the language is made, it cannot control the contents of the headers it receives from the network, right? So, that means that either:

Serializable enums must have names for all values from 0 to 2^n-1
Or if there is a header that has a serializable enum type field inside and packet with such a field is received, it MUST contain only the defined values or else there must be a very serious exception pretty much immediately
Or We actually allow serializable enums to contain any value represented by the underlying integer with all the consequences
Or serializable enums will probably not be very useful as field types inside the headers

I hope that your choice will be (3) and that I didn't miss other important options. If we agree on that, then implicit casts between serializable enums and integers (and remember, they only work one way, i.e. from enum to integer, but not backwards) becomes a nice convenience feature that doesn't affect safety one way or another.

rcgoodfellow · 2025-05-14T17:33:03Z

Restricting an enumeration to a specific underlying type representation is not unique to P4. Rust, Java, Scala, Zig and many others have this ability. The entire point of doing this is allowing the enumeration to be used as a boundary between unstructured and structured data representation. Once the data is in a structured representation the portions of a program operating on that structured representation can depend on a well defined domain of values for that data.

As for the safety, let's think of it this way: the main reason we have serializable enums is because they can be put into the headers, is it not? (otherwise non-serializable would do, correct)? However, no matter how safe the language is made, it cannot control the contents of the headers it receives from the network, right?

Headers in P4 do not come out of nowhere, we are not just casting a pointer to a buffer onto a struct like you might in C. Headers are parsed. If an enumeration is part of a header, and the data from the wire cannot be parsed into a valid representation for that enum, then parsing has failed. In cases where unstructured data is permissible, modern lanaguages allow the compsition of enums to partition structured data from unstructured data in a way that requires the code consuming the enum to explicitly deal with unstructured cases (example). Compositional enums is something we are planning on proposing for p4-next.

vgurevich · 2025-05-14T20:19:16Z

@rcgoodfellow,

Headers in P4 do not come out of nowhere, we are not just casting a pointer to a buffer onto a struct like you might in C. Headers are parsed. If an enumeration is part of a header, and the data from the wire cannot be parsed into a valid representation for that enum, then parsing has failed.

This is an interesting idea, but it doesn't seem to match the current definition of packet.extract().

Specifically, assuming this code fragment:

enum bit<4> ip_version_t { IPv4=4, IPv6=6 };

header ipv4_h {
    ip_version_t version; 
    . . .
}

header ipv6_h {
   ip_version_t version;
   . . .
}

struct headers_t {
    . . .
    ipv4_h ipv4;
    ipv6_h ipv6;
    . . . 
}

parser IngressParser(...)
{
   . . . 
   state parse_ipv4 {
       pkt.extract(hdr.ipv4);
       . . .
   }
   . . .
}

Is the definition of the enum legit? (I think so)
Is it useful and does it make sense to use it the way I just did? (I think so)

In tat case, what should exactly happen if the first nibble of the packet buffer contains something other than 4 or 6? Currently, the header will be extracted and the check will need to be done elsewhere. Do you think this is wrong? (I do not think so).

Overall, from the very beginning P4 was designed to be a language where a C programmer would feel more-or-less at home. That was done on purpose, partially because of the background of the people who started it, but mostly because that makes total sense to do in a restricted environment. All behind-the scenes checks cost a lot.

So, I would suggest that radical changes like this will go in a new generation of P4, which might be based on completely different principles. I also hope that we'll have the hardware to try those ideas on :)

vlstill added the ldwg-discussion Plan to discuss at next LDWG label May 5, 2025

vlstill mentioned this issue May 13, 2025

BUG: Unexpected comparison X.B; with 1 p4lang/p4c#5269

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Serializable enums: should comparison of values of two different enum types work? (or enum and underlying type) #1369

Serializable enums: should comparison of values of two different enum types work? (or enum and underlying type) #1369

vlstill commented May 5, 2025

vlstill commented May 12, 2025

Uh oh!

vgurevich commented May 12, 2025

Uh oh!

vlstill commented May 13, 2025

Uh oh!

vgurevich commented May 13, 2025

Uh oh!

rcgoodfellow commented May 14, 2025

Uh oh!

vgurevich commented May 14, 2025

Uh oh!

Serializable enums: should comparison of values of two different enum types work? (or enum and underlying type) #1369

Serializable enums: should comparison of values of two different enum types work? (or enum and underlying type) #1369

Comments

vlstill commented May 5, 2025

vlstill commented May 12, 2025

Uh oh!

vgurevich commented May 12, 2025

Uh oh!

vlstill commented May 13, 2025

Uh oh!

vgurevich commented May 13, 2025

Uh oh!

rcgoodfellow commented May 14, 2025

Uh oh!

vgurevich commented May 14, 2025

Uh oh!