[#845] Interval syntax implementation

This is the implementation of interval syntax which is a subset of interval type implementation.

Following features are implemented

  • DATE_ADD function with interval argument
  • DATE_SUB function with interval argument
  • + operator with interval operand
  • - operator with interval operand
  • Interval literal syntax that will be used to define the operands for the above operators

Refer to doc/grammar.rst for interval function specification.

MySQL and Postgres handling of intervals is different in some cases

In case of Postgres

  • Return value for an expression involving date or timestamp and an interval is a timestamp
  • In case the operand has a type with time zone then return value will also have time zone
  • Return value for an expression having time and interval is time irrespective of whether the interval has date fields or not.
  • Interval literals allow year to second specification

In case of MySQL

  • Return value of DATE_ADD and DATE_SUB
    • Depends on the arguments
    • NULL when date is NULL
    • DATE if calculation doesn't involve time part
    • TIME if calculation only involves time part
    • TIMESTAMP if calculation involves time part and date part
    • For MySQL 8.0.22 through 8.0.27 prepared statements, these functions returned DATETIME values regardless of argument types
  • Return value of + or - expressions are similar to DATE_ADD and DATE_SUB
  • Interval literals only allow year to month or day to second specification

Octo's behavior is similar to Postgres. Additionally field specification can have delimiter _ instead of to.

Design

  • Interval literal is parsed and validated by grammar rules and a value_STATEMENT type SqlStatement is formed with SqlValue->type being INTERVAL_LITERAL. SqlValue is updated to include v.interval_stmt field. This field will hold a SqlStatement of type interval_STATEMENT. An interval_STATEMENT has an SqlInterval. All places in the parser code interacts with this value_STATEMENT to determine the validity of its usage in the query.
  • After the parsing stage src/optimization_transforms/lp_generate_where.c stores the SqlValue of type INTERVAL_LITERAL in a LogicalPlan of type LP_VALUE. plan->v.lp_value.value will hold the SqlValue.
  • After logical planning stage, the logical plan and the SqlValue it holds is referred to by the M plan generation code to write M plans for interval operations. During M plan generation, SQL operators + and - are translated to invocations of $$AddInterval^%ydboctoplanhelpers() and $$SubInterval^%ydboctoplanhelpers(). Similarly, SQL DATE_ADD and DATE_SUB functions are translated to invocations of $$PostgreSQL^%ydboctofDATEADD/ $$PostgreSQL^%ydboctofDATESUB and $$MySQL^%ydboctofDATEADD/$$MySQL^%ydboctofDATESUB based on Octo's --emulation config. All of these functions internally call $$AddInterval^%ydboctoplanhelpers() and $$SubInterval^%ydboctoplanhelpers().
  • src/m_templates/tmpl_print_expression.ctemplate upon recognizing the function to be added to M plan as INTERVAL related it carefully adds all the fields of the interval operand as individual arguments to the function.
  • The INTERVAL M routines in ^%ydboctoplanhelpers call the newly added external functions ydboctoAddIntervalC and ydboctoSubIntervalC to do the actual operation.
  • Following is an example of the M code generated for the the query select date'2023-01-01' + interval '1 year 1 month';
     20     . . SET %ydboctoexpr=$$AddInterval^%ydboctoplanhelpers($$Transform2UnixTime^%ydboctoplanhelpers($GET(%ydboctocursor(cursorId,"parameters",1)),17,36,"%Y-%m-%d"),17,$GET(%ydboctocursor(cursorId,"parameters",2)),$GET(%ydboctocursor(cursorId,"parameters",3)),$GET(%ydboctocursor(cursorId,"parameters",4)),$GET(%ydboctocursor(cursorId,"parameters",5)),$GET(%ydboctocursor(cursorId,"parameters",6)),$GET(%ydboctocursor(cursorId,"parameters",7)),$GET(%ydboctocursor(cursorId,"parameters",8)))
    • $$AddInterval is the interval function which implements the + operation.
    • The DATE is processed by $$Transform2UnixTime^%ydboctoplanhelpers($GET(%ydboctocursor(cursorId,"parameters",1)),17,36,"%Y-%m-%d") and placed as the first argument
    • The next argument is type info related to 1st argument17
    • The next set of arguments are individual interval fields. They are stored as parameter values. IN this example they will be 1,1,0,0,0,0,0. They represent year,month,day,hour,minute,second,microsecond in the same order.

Implementation details

  • Implementation is similar to date/time type
  • Parser i.e. .y files have been updated to include grammar rules which enable interval usage in expressions and functions. At present interval value is accepted by DATE_ADD/DATE_SUB functions and +/- operators. Any other usage of interval will result an ERROR.
  • DATE_ADD and DATE_SUB are included in Octo seed file.
  • src/parser/function_definition.c prevents functions other than DATE_ADD and DATE_SUB to include INTERVAL type parameters. It also prevents specifying return type as INTERVAL for all functions.
  • Code to prevent INTERVAL usage in other places is added to src/populate_data_type.c (most of it is here) and few other places.
  • Changes to src/ensure_same_type.c prevents functions like COALESCE which have their own type and are not classified as a function call from having INTERVAL arguments.
  • DAY and YEAR are included as a token with this change and are used to specify INTERVAL's fields. Since these keywords can also be used to specify a function and column name they have been appropriately added to the rules in src/parser/sql_identifier.y.
  • Note that INTERVAL type parameter for a function is defined by adding an additional rule function_parameter_type_list, ideally data_type should have been modified to include INTERVAL but this leads to plugging all the usages of data_type with error handling as INTERVAL is not allowed as a data type in any other place. This has to be revisited when full blown INTERVAL support is needed for Octo.
  • Change to src/get_sqlvaluetype_from_sqldatatype.c is to allow hashing of the arguments of an interval function
  • Changes to src/get_user_visible_type_string.c, src/emit_check_constraint.c and src/emit_create_function.c are used to form the text definitions for INTERVAL usages.
  • An INTERVAL is represented by SqlInterval structure and it consists of the following fields
    • enum SqlIntervalType type; /* Field specification */
    • struct SqlStatement *year; // SqlValue
    • struct SqlStatement *month; // SqlValue
    • struct SqlStatement *day; // SqlValue
    • struct SqlStatement *hour; // SqlValue
    • struct SqlStatement *minute; // SqlValue
    • struct SqlStatement *second; // SqlValue
    • struct SqlStatement *microsecond; // SqlValue
  • The above structure is populated by src/interval_literal.c which is called by the grammar rules for INTERVAL in src/parser.y. Most of the interval literal processing happens here.
  • The range of these fields is same as integer field. Since we do not allow storing of intervals this structure holds the values temporarily. During M plan generation these values are added as operands to interval expressions.
  • src/compress_statement.c and src/decompress_statement.c had to updated as intervals can be used in views and constraints. These are stored in as text and binary definition so the changes in these files are needed to be able to load tables and views properly.
  • src/copy_sql_statement.c had to be modified as this is needed when interval is used in an expression of BETWEEN operation.
  • Interval type is identified at different stages of Octo by LP_INTERVAL (LPActionType), interval_STATEMENT (SqlStatementType) and INTERVAL_LITERAL (SqlValueType)
  • The external functions which process intervals are added to ydbocto.xc.in
  • New error types are introduced to handle error cases of interval usages. Refer to doc/errors.rst for more info on them.
  • Some structures related to Postgres tables are updated to process interval as function types
    • PSQL_TypeOid_interval = 1186 is added to PSQL_TypeOid. This helps src/store_function_in_pg_proc.c to store the parameter type of DATE_ADD and DATE_SUB. 1186 was obtained by running the following query in Postgres.
      names=> select typname,oid from pg_catalog.pg_type where typname='interval';
       typname  | oid
      ----------+------
       interval | 1186
      (1 row)
  • Text definition of interval usages in create table is as shown below
    create table test (id integer, dob date check (dob < date'2023-01-01'+interval'100 years'));
    
    Check constraints:
        "test_dob_check" CHECK ((dob < (DATE'2023-01-01' + INTERVAL'100 year 0 month 0 day 0 hour 0 minute 0.0 second')))
      
    text definition of the table:
    ^%ydboctoschema("test","text",0)="CREATE TABLE `test` (`id` INTEGER PIECE 1, `dob` DATE CONSTRAINT ""test_dob_check"" CHECK ((dob < (DATE'2023-01-01' + INTERVAL'100 year 0 month 0 day 0 hour 0 minute 0.0 second'))) PIECE 2) GLOBAL ""^%ydboctoD5wMIlvjjFxqqShZd97oEED(keys(""""%yo_keycol""""))"" 
    DELIM ""|"" READWRITE;"
  • Text definition of interval usages in create view is as shown below
    View definition:
    create view v1 as select date'2023-01-01' + interval'1 year 1 month';
    ^%ydboctoschema("v1","text",0)="create view v1 as select date'2023-01-01' + interval'1 year 1 month';"
    
    View definition:
    create view v2 as select date'2023-01-01' + interval '1-1';
    ^%ydboctoschema("v2","text",0)="create view v2 as select date'2023-01-01' + interval '1-1';"
  • Text definition of INTERVAL functions are shown below
    ^%ydboctoocto("functions","date_add","%ydboctoF8SktjV95v5wgwzll0pJK5J","text",0)="CREATE FUNCTION `date_add`( DATE,  INTERVAL) RETURNS TIMESTAMP AS $$^%ydboctofDATEADD;"
    ^%ydboctoocto("functions","date_sub","%ydboctoFWuOLsam5mSfzav4hOpH2EA","text",0)="CREATE FUNCTION `date_sub`( DATE,  INTERVAL) RETURNS TIMESTAMP AS $$^%ydboctofDATESUB;"

Testing

  • tests/test_interval_syntax.bats.in is added to validate this implementation

Future Enhancements

  • Current structure holds interval fields in 7 integers. When interval type is implemented this can be reduced to 3 fields (months, days, and microseconds) like how its done in Postgres.
Edited by Ganesh Mahesh

Merge request reports

Loading