[#845] Interval syntax implementation
This is the implementation of interval syntax which is a subset of interval type implementation.
Following features are implemented
-
DATE_ADDfunction with interval argument -
DATE_SUBfunction with interval argument -
+operator with interval operand -
-operator with interval operand - Interval literal syntax that will be used to define the operands for the above operators
Refer to doc/grammar.rst for interval function specification.
MySQL and Postgres handling of intervals is different in some cases
In case of Postgres
- Return value for an expression involving date or timestamp and an interval is a timestamp
- In case the operand has a type with time zone then return value will also have time zone
- Return value for an expression having time and interval is time irrespective of whether the interval has date fields or not.
- Interval literals allow year to second specification
In case of MySQL
- Return value of
DATE_ADDandDATE_SUB- Depends on the arguments
- NULL when date is NULL
- DATE if calculation doesn't involve time part
- TIME if calculation only involves time part
- TIMESTAMP if calculation involves time part and date part
- For MySQL 8.0.22 through 8.0.27 prepared statements, these functions returned DATETIME values regardless of argument types
- Return value of
+or-expressions are similar toDATE_ADDandDATE_SUB - Interval literals only allow year to month or day to second specification
Octo's behavior is similar to Postgres. Additionally field specification can have delimiter _ instead of to.
Design
- Interval literal is parsed and validated by grammar rules and a
value_STATEMENTtypeSqlStatementis formed withSqlValue->typebeingINTERVAL_LITERAL.SqlValueis updated to includev.interval_stmtfield. This field will hold aSqlStatementof typeinterval_STATEMENT. Aninterval_STATEMENThas anSqlInterval. All places in the parser code interacts with thisvalue_STATEMENTto determine the validity of its usage in the query. - After the parsing stage
src/optimization_transforms/lp_generate_where.cstores theSqlValueof typeINTERVAL_LITERALin aLogicalPlanof typeLP_VALUE.plan->v.lp_value.valuewill hold theSqlValue. - After logical planning stage, the logical plan and the
SqlValueit holds is referred to by the M plan generation code to write M plans for interval operations. During M plan generation, SQL operators+and-are translated to invocations of$$AddInterval^%ydboctoplanhelpers()and$$SubInterval^%ydboctoplanhelpers(). Similarly, SQLDATE_ADDandDATE_SUBfunctions are translated to invocations of$$PostgreSQL^%ydboctofDATEADD/$$PostgreSQL^%ydboctofDATESUBand$$MySQL^%ydboctofDATEADD/$$MySQL^%ydboctofDATESUBbased on Octo's--emulationconfig. All of these functions internally call$$AddInterval^%ydboctoplanhelpers()and$$SubInterval^%ydboctoplanhelpers(). -
src/m_templates/tmpl_print_expression.ctemplateupon recognizing the function to be added to M plan as INTERVAL related it carefully adds all the fields of the interval operand as individual arguments to the function. - The INTERVAL M routines in
^%ydboctoplanhelperscall the newly added external functionsydboctoAddIntervalCandydboctoSubIntervalCto do the actual operation. - Following is an example of the M code generated for the the query
select date'2023-01-01' + interval '1 year 1 month';20 . . SET %ydboctoexpr=$$AddInterval^%ydboctoplanhelpers($$Transform2UnixTime^%ydboctoplanhelpers($GET(%ydboctocursor(cursorId,"parameters",1)),17,36,"%Y-%m-%d"),17,$GET(%ydboctocursor(cursorId,"parameters",2)),$GET(%ydboctocursor(cursorId,"parameters",3)),$GET(%ydboctocursor(cursorId,"parameters",4)),$GET(%ydboctocursor(cursorId,"parameters",5)),$GET(%ydboctocursor(cursorId,"parameters",6)),$GET(%ydboctocursor(cursorId,"parameters",7)),$GET(%ydboctocursor(cursorId,"parameters",8)))-
$$AddIntervalis the interval function which implements the+operation. - The
DATEis processed by$$Transform2UnixTime^%ydboctoplanhelpers($GET(%ydboctocursor(cursorId,"parameters",1)),17,36,"%Y-%m-%d")and placed as the first argument - The next argument is type info related to 1st argument
17 - The next set of arguments are individual interval fields. They are stored as parameter values. IN this example they will be
1,1,0,0,0,0,0. They representyear,month,day,hour,minute,second,microsecondin the same order.
-
Implementation details
- Implementation is similar to date/time type
- Parser i.e.
.yfiles have been updated to include grammar rules which enable interval usage in expressions and functions. At present interval value is accepted byDATE_ADD/DATE_SUBfunctions and+/-operators. Any other usage of interval will result an ERROR. -
DATE_ADDandDATE_SUBare included in Octo seed file. - src/parser/function_definition.c prevents functions other than
DATE_ADDandDATE_SUBto include INTERVAL type parameters. It also prevents specifying return type as INTERVAL for all functions. - Code to prevent INTERVAL usage in other places is added to
src/populate_data_type.c(most of it is here) and few other places. - Changes to
src/ensure_same_type.cprevents functions like COALESCE which have their own type and are not classified as a function call from having INTERVAL arguments. -
DAYandYEARare included as a token with this change and are used to specify INTERVAL's fields. Since these keywords can also be used to specify a function and column name they have been appropriately added to the rules insrc/parser/sql_identifier.y. - Note that INTERVAL type parameter for a function is defined by adding an additional rule
function_parameter_type_list, ideallydata_typeshould have been modified to include INTERVAL but this leads to plugging all the usages ofdata_typewith error handling as INTERVAL is not allowed as a data type in any other place. This has to be revisited when full blown INTERVAL support is needed for Octo. - Change to
src/get_sqlvaluetype_from_sqldatatype.cis to allow hashing of the arguments of an interval function - Changes to
src/get_user_visible_type_string.c,src/emit_check_constraint.candsrc/emit_create_function.care used to form the text definitions for INTERVAL usages. - An INTERVAL is represented by
SqlIntervalstructure and it consists of the following fieldsenum SqlIntervalType type; /* Field specification */struct SqlStatement *year; // SqlValuestruct SqlStatement *month; // SqlValuestruct SqlStatement *day; // SqlValuestruct SqlStatement *hour; // SqlValuestruct SqlStatement *minute; // SqlValuestruct SqlStatement *second; // SqlValuestruct SqlStatement *microsecond; // SqlValue
- The above structure is populated by
src/interval_literal.cwhich is called by the grammar rules for INTERVAL insrc/parser.y. Most of the interval literal processing happens here. - The range of these fields is same as integer field. Since we do not allow storing of intervals this structure holds the values temporarily. During M plan generation these values are added as operands to interval expressions.
-
src/compress_statement.candsrc/decompress_statement.chad to updated as intervals can be used in views and constraints. These are stored in as text and binary definition so the changes in these files are needed to be able to load tables and views properly. -
src/copy_sql_statement.chad to be modified as this is needed when interval is used in an expression of BETWEEN operation. - Interval type is identified at different stages of Octo by
LP_INTERVAL(LPActionType),interval_STATEMENT(SqlStatementType) andINTERVAL_LITERAL(SqlValueType) - The external functions which process intervals are added to ydbocto.xc.in
- New error types are introduced to handle error cases of interval usages. Refer to doc/errors.rst for more info on them.
- Some structures related to Postgres tables are updated to process interval as function types
-
PSQL_TypeOid_interval = 1186is added toPSQL_TypeOid. This helpssrc/store_function_in_pg_proc.cto store the parameter type ofDATE_ADDandDATE_SUB.1186was obtained by running the following query in Postgres.names=> select typname,oid from pg_catalog.pg_type where typname='interval'; typname | oid ----------+------ interval | 1186 (1 row)
-
- Text definition of interval usages in create table is as shown below
create table test (id integer, dob date check (dob < date'2023-01-01'+interval'100 years')); Check constraints: "test_dob_check" CHECK ((dob < (DATE'2023-01-01' + INTERVAL'100 year 0 month 0 day 0 hour 0 minute 0.0 second'))) text definition of the table: ^%ydboctoschema("test","text",0)="CREATE TABLE `test` (`id` INTEGER PIECE 1, `dob` DATE CONSTRAINT ""test_dob_check"" CHECK ((dob < (DATE'2023-01-01' + INTERVAL'100 year 0 month 0 day 0 hour 0 minute 0.0 second'))) PIECE 2) GLOBAL ""^%ydboctoD5wMIlvjjFxqqShZd97oEED(keys(""""%yo_keycol""""))"" DELIM ""|"" READWRITE;" - Text definition of interval usages in create view is as shown below
View definition: create view v1 as select date'2023-01-01' + interval'1 year 1 month'; ^%ydboctoschema("v1","text",0)="create view v1 as select date'2023-01-01' + interval'1 year 1 month';" View definition: create view v2 as select date'2023-01-01' + interval '1-1'; ^%ydboctoschema("v2","text",0)="create view v2 as select date'2023-01-01' + interval '1-1';" - Text definition of INTERVAL functions are shown below
^%ydboctoocto("functions","date_add","%ydboctoF8SktjV95v5wgwzll0pJK5J","text",0)="CREATE FUNCTION `date_add`( DATE, INTERVAL) RETURNS TIMESTAMP AS $$^%ydboctofDATEADD;" ^%ydboctoocto("functions","date_sub","%ydboctoFWuOLsam5mSfzav4hOpH2EA","text",0)="CREATE FUNCTION `date_sub`( DATE, INTERVAL) RETURNS TIMESTAMP AS $$^%ydboctofDATESUB;"
Testing
-
tests/test_interval_syntax.bats.inis added to validate this implementation
Future Enhancements
- Current structure holds interval fields in 7 integers. When interval type is implemented this can be reduced to 3 fields (months, days, and microseconds) like how its done in Postgres.
Edited by Ganesh Mahesh