Using Klustron Enum Data Type
Using Klustron Enum Data Type
In database design, we often need to add a "type" field to many tables, such as "male" and "female" for the gender of a person, "kindergarten", "elementary school", "middle school", and "university" for the type of a school, and "sedan", "SUV", "MPV", etc. for the type of a car. The essential characteristic of these fields is that they contain a "finite set of discrete values."
For this type of field, using the enum data type is the most appropriate choice, and both MySQL and PostgreSQL have this data type. The enum type has the following characteristics:
The actual value stored in the enum field in the table is its ordinal value (integer), not the string.
When searching and comparing by enum value, it is done according to the ordinal value, not the string of the enum item. However, MySQL has a problem in this regard (see screenshot below): when doing range comparison (>, <, >=, <=) with a string that is the same as a certain enum item, it does not compare according to the ordinal value, but according to the literal value. However, when sorting by the enum column, MySQL correctly sorts by the ordinal values of the enum items.
In PostgreSQL, enum types need to be defined in advance and then used in create table statements. The advantage of this approach is that multiple tables can use the same enum type, avoiding inconsistencies that may arise from defining them multiple times.
Indexes can be defined on enum columns, and the index key is sorted based on the ordinal value of the enum items, which is also an integer value.
The following screenshots are examples using Klustron. The MySQL examples are using Klustron storage nodes (MySQL 8.0.15).

Create a table using the enum data type, and insert data.
Perform range and equality searches on the enum column, with the ordinal value determining the ordering of the enum items.
When sorting by an enum column, the sorting is based on the ordinal value, not the literal string. The enum type metadata can be viewed in the metadata table.
MySQL's enum usage: it can be used for equality searches, but for range searches, it compares based on the literal string of the enum item, which is actually incorrect.
The enum column is defined in MySQL by assigning each enum item an ordinal value based on the order in which they appear, starting from 0.
When sorting by the enum column, the sorting is based on the ordinal value of the enum item. From the metadata dictionary, you can see the ordinal value of the "rank" enum item.

After inserting more rows, the sorting is still based on the ordinal value of the enum column, rather than the literal string of the items.

MySQL also sorts by the ordinal value of the enum column, but when comparing by size with enum strings, it cannot cleverly compare by ordinal value.
After giving examples of how to use the enum data type, I will now list some poor design practices in database design that do not use enum. These practices should be avoided and not emulated.
Some people directly use string types, such as columns with varchar(N) type, to store these field values. The problem with this approach is that it is possible that an unexpected field value may be entered due to errors in the upper-layer application. For example, in the school type field in the previous example, if the data processing in the application layer is insufficient and a row Rx.type = 'University' is inserted, then this row Rx can also be inserted into the table. So when you search for rows where type='University', you won't find Rx. Another problem is the space utilization problem. In MySQL and PostgreSQL, enum fields actually store the numerical values of enum values, which usually saves space compared to storing strings, and numerical comparison is faster than string comparison during search and comparison.
Some people use numbers to represent types in the tables of the database and complete the conversion between numbers and type strings in the application layer. For example, in the school type mentioned earlier, 1 represents 'Kindergarten' and 2 represents 'Primary school', and then the conversion between this number and the string is completed in the application layer. This approach adds workload and maintenance overhead to the application layer development. If more type values need to be added later, the application layer code needs to be modified. Also, without the application layer code, it is impossible to understand the meaning of the field value (of course, comments can be added for explanation), which affects data readability.
From this, we can see several advantages of using enum types:
Data validation rejects illegal values.
Efficient storage and calculation.
Strong data readability, without relying on application code to interpret.
No numerical interpretation or conversion is required by the application program, reducing the cost of application software development and maintenance.
Therefore, it is strongly recommended to use enum types appropriately when designing databases and application systems.