Skip to content

Serializable enums: should comparison of values of two different enum types work? (or enum and underlying type) #1369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
vlstill opened this issue May 5, 2025 · 6 comments
Labels
ldwg-discussion Plan to discuss at next LDWG

Comments

@vlstill
Copy link
Contributor

vlstill commented May 5, 2025

See discussion from p4lang/p4c#5246.

enum bit<8> E1 { A = 1, B = 2, C = 10 }
enum bit<8> E2 { X = 2, Y = 1, Z = 3 }
enum bit<2> E3 { P = 1, Q = 2 }

// ...

control c(inout headers hdr) {
    apply {
        bit<8> a = 10;
        if (a == E1.C) { ... } // (1) ??
        if (E1.A == E2.X) { ... } // (2) ??
        if (E1.A == E3.Q) { ... } // (3) ??
    }
}

The question is if any of the case (1), (2), (3) should work.

As @ChrisDodd pointed out,

an enum with an underlying type can be thought of as being a type derived from the underlying type carrying equality, assignment, and casts to/from the underlying type.

I would ready it as permitting (1) for sure and maybe (2), but not (3), because there the underlying type is different.

I think the case of different underlying type is clear (it is not allowed). Personally, I would prefer all of them being disallowed, but I am not sure if P4C actually disallows (1) currently, or only (2), so doing that might be a breaking change. Maybe at least a consideration for "P4-26".

@vlstill vlstill added the ldwg-discussion Plan to discuss at next LDWG label May 5, 2025
@vlstill
Copy link
Contributor Author

vlstill commented May 12, 2025

Discussed on P4 LDWG 2025-05-12 (with @rcgoodfellow, @jafingerhut, @ajwalton), we agreed that the comparison of enums of different types, and even enums with integers can be dangerous. We have also discussed the implicit cast from serializable enum type to its underlying type and tended to disallowing even that. Of course, that would be a breaking change, forcing addition of explicit casts into existing P4 programs that use these features. We have discussed potentially treating these as warnings first, before introducing them as errors in future version of P4 spec & P4C.

Follow on:


Example of dangerous uses:

  • Consider e.g. a serializable enum for ether-type values -- it does not make much sense to compare e.g. EthType.IPv4 with arbitrary bit<16> values, nor does it make sense to do something like EthType.IPv4 + 4.

@vgurevich, by chance do you remember why the implicit casting and comparison for serializable enums were added? Do you know about uses cases that rely on it?

@vgurevich
Copy link
Contributor

The original intent behind serializable enums was to provide convenient named constants to the specific integer types. Nothing more, nothing less. I think that changing this direction is quite dangerous and unnecessary.

We did have situations where we needed to allow flexibility/ Although EthType.IPv4+4 might look non-sensical , EthType.TPID & 0xFEFF == EthType.TPID is a fairly standard idiom. For that same reason, it was agreed that the variables of the enum types can hold any value representable by the underlying integer and not be restricted to only those explicityly named.

If there is a desire to have safer enums, they can be derived from type-based types. Note, BTW, that these type-based types are not widely used precisely because they are too safe and thus people use typedef-based types in pretty much all the cases.

@vlstill
Copy link
Contributor Author

vlstill commented May 13, 2025

OK, thanks @vgurevich for filling in the details that noone on the LDWG remembered. So the original intent is basically a fancy typedef with constants? I think if this is the intention, it should be at least clearly stated in the spec.

Overall, modern languages tend to be more restrictive by default and have the less safe behavior (such as automatic casting/lifting of operations) opt-in (note that C++ is not a modern language in this regard). Maybe we might consider such approach for the P4-next and leave the current behavior for P4-16? I will still double-check the current P4C behavior in more details.

@vgurevich
Copy link
Contributor

@vlstill , you are correct -- the original intent was a fancy typedef with constants that can also be nicely displayed, e.g. in a CLI.

As for the safety, let's think of it this way: the main reason we have serializable enums is because they can be put into the headers, is it not? (otherwise non-serializable would do, correct)? However, no matter how safe the language is made, it cannot control the contents of the headers it receives from the network, right? So, that means that either:

  1. Serializable enums must have names for all values from 0 to 2^n-1
  2. Or if there is a header that has a serializable enum type field inside and packet with such a field is received, it MUST contain only the defined values or else there must be a very serious exception pretty much immediately
  3. Or We actually allow serializable enums to contain any value represented by the underlying integer with all the consequences
  4. Or serializable enums will probably not be very useful as field types inside the headers

I hope that your choice will be (3) and that I didn't miss other important options. If we agree on that, then implicit casts between serializable enums and integers (and remember, they only work one way, i.e. from enum to integer, but not backwards) becomes a nice convenience feature that doesn't affect safety one way or another.

@rcgoodfellow
Copy link
Collaborator

Restricting an enumeration to a specific underlying type representation is not unique to P4. Rust, Java, Scala, Zig and many others have this ability. The entire point of doing this is allowing the enumeration to be used as a boundary between unstructured and structured data representation. Once the data is in a structured representation the portions of a program operating on that structured representation can depend on a well defined domain of values for that data.

As for the safety, let's think of it this way: the main reason we have serializable enums is because they can be put into the headers, is it not? (otherwise non-serializable would do, correct)? However, no matter how safe the language is made, it cannot control the contents of the headers it receives from the network, right?

Headers in P4 do not come out of nowhere, we are not just casting a pointer to a buffer onto a struct like you might in C. Headers are parsed. If an enumeration is part of a header, and the data from the wire cannot be parsed into a valid representation for that enum, then parsing has failed. In cases where unstructured data is permissible, modern lanaguages allow the compsition of enums to partition structured data from unstructured data in a way that requires the code consuming the enum to explicitly deal with unstructured cases (example). Compositional enums is something we are planning on proposing for p4-next.

@vgurevich
Copy link
Contributor

@rcgoodfellow,

Headers in P4 do not come out of nowhere, we are not just casting a pointer to a buffer onto a struct like you might in C. Headers are parsed. If an enumeration is part of a header, and the data from the wire cannot be parsed into a valid representation for that enum, then parsing has failed.

This is an interesting idea, but it doesn't seem to match the current definition of packet.extract().

Specifically, assuming this code fragment:

enum bit<4> ip_version_t { IPv4=4, IPv6=6 };

header ipv4_h {
    ip_version_t version; 
    . . .
}

header ipv6_h {
   ip_version_t version;
   . . .
}

struct headers_t {
    . . .
    ipv4_h ipv4;
    ipv6_h ipv6;
    . . . 
}

parser IngressParser(...)
{
   . . . 
   state parse_ipv4 {
       pkt.extract(hdr.ipv4);
       . . .
   }
   . . .
}

Is the definition of the enum legit? (I think so)
Is it useful and does it make sense to use it the way I just did? (I think so)

In tat case, what should exactly happen if the first nibble of the packet buffer contains something other than 4 or 6? Currently, the header will be extracted and the check will need to be done elsewhere. Do you think this is wrong? (I do not think so).

Overall, from the very beginning P4 was designed to be a language where a C programmer would feel more-or-less at home. That was done on purpose, partially because of the background of the people who started it, but mostly because that makes total sense to do in a restricted environment. All behind-the scenes checks cost a lot.

So, I would suggest that radical changes like this will go in a new generation of P4, which might be based on completely different principles. I also hope that we'll have the hardware to try those ideas on :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ldwg-discussion Plan to discuss at next LDWG
Projects
None yet
Development

No branches or pull requests

3 participants