Skip to content

[#1050] Support timestamp values of the format 2024-02-21T13:31:48.05098021+07:00

Issue

  • Requested value format is 2024-02-21T13:31:48.05098021+07:00.
  • In the given value
    • T is used as a separator between date and time
    • Greater than 6 sub-second precision value is specified
  • The issue specifies sub-second information greater than 6 places can be ignored
  • Octo currently generates the following error for such input
    OCTO> select timestamp with time zone'2024-02-21T13:31:48.05098021+07:00';
    [ERROR]: ERR_INVALID_DATE_TIME_VALUE: "2024-02-21T13:31:48.05098021+07:00" is invalid for type "TIMESTAMP WITH TIME ZONE", format "TEXT" and datestyle "ISO, YMD"
    LINE 1:1: ...ct timestamp with time zone'2024-02-21T13:31:48.05098021+07:00';
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  • The T usage follows ISO 8601 (https://en.wikipedia.org/wiki/ISO_8601) specification. It is allowed for T to separate date and time (ex:'2024-02-21T13:31:48') in a timestamp. Additionally, time can also have values starting with T (ex:`'T01:01:01')
  • Such inputs need to be allowed by Octo

Analysis

  • In Octo,
    • Only space is allowed as the separator between date and time in a timestamp
    • Time is not allowed to begin with T
    • Sub-second data cannot be greater than 6 in precision
  • To allow the requested values first validate_date_time_value() in src/aux/ydboctodateoperations.c needs to be updated. All date/time literals go through this code, even readonly table values which are accessed through a SELECT is validated by this code.
  • Following considers all the possible query processing elements that needs to be reviewed for the requested change
    • Literal
      • M plan generated to convert to internal format:
        $Text2UnixTime^%ydboctoplanhelpers($GET(%ydboctocursor(cursorId,"parameters",1)),21,"%Y-%m-%d %H:%M:%S%z")
    • Readonly table column processing
      • A simple select of a table without any optimization will make use of M code shown below to convert date/time values to internal format
        SET %ydboctoexpr=$$Text2UnixTime^%ydboctoplanhelpers($GET(^test2(%ydboctocursor(cursorId,"keys",1,"test2","id"))),21,"%Y-%m-%d %H:%M:%S%z") SET:(""=%ydboctoexpr) %ydboctoexpr=$ZYSQLNULL
      • A select undergoes key fixing when WHERE clause like WHERE column = date'01-01-2023' is used. Example of such a query's physical plan is shown below.
        16     DO
        17     . SET %ydboctoexpr=$$Text2UnixTime^%ydboctoplanhelpers($GET(%ydboctocursor(cursorId,"parameters",1)),17,"%Y-%m-%d")
        18     SET %ydboctoexpr=$$DateTimeCast^%ydboctoplanhelpers(%ydboctoexpr,17,10)
        19     SET %ydboctoexpr=$$PrintDateTimeResultColumnValue^%ydboctoplanhelpers(%ydboctoexpr,21,36,"%Y-%m-%d %H:%M:%S%z")
        20     SET %ydboctocursor(cursorId,"keys",1,"test2","dob")=%ydboctoexpr IF '$ZYISSQLNULL(%ydboctocursor(cursorId,"keys",1,"test2","dob")) IF $DATA(^%ydbAIMDzhoKBhqIsfTKdEzkTp0Y0D(0,%ydboctocursor(cursorId,"keys",1,"test2","dob"))) DO
        22     . FOR  SET %ydboctocursor(cursorId,"keys",1,"test2","id")=$ORDER(^%ydbAIMDzhoKBhqIsfTKdEzkTp0Y0D(0,%ydboctocursor(cursorId,"keys",1,"test2","dob"),%ydboctocursor(cursorId,"keys",1,"test2","id"))) QUIT:(%ydboctocursor(cursorId,"keys",1,"test2","id")="")  DO
        29     . . . SET %ydboctoexpr=$$Text2UnixTime^%ydboctoplanhelpers(%ydboctocursor(cursorId,"keys",1,"test2","dob"),21,"%Y-%m-%d %H:%M:%S%z")
        Since, a TIMESTAMP or TIMESTAMP WITH TIME ZONE column of TEXT format can have both 'date time' and 'dateTtime' values this optimization cannot be applied anymore. This is true for columns of type TIME and format TEXT too. Other formats and types can still make use of this optimization.
      • For a select having WHERE clause like WHERE column > date'01-01-2023', optimization was not applied before this change (because string comparison for a date/time column value leads to wrong results) so no change needed here.
    • Readwrite table column processing
      • All date/time values are stored in internal format so no additional handling needed for a SELECT on such a table
      • WHERE clause processing makes use of internal format values so no additional handling needed here too
    • Cast operation
      • Following code handles the conversion.Text2Unix is called before DateTimeCast. So DateTimeCast doesn't have to worry about the new value.
        3 ;  select timestamp with time zone'2023-01-01 01:01:01-06'::date;
        20     . . SET %ydboctoexpr=$$Text2UnixTime^%ydboctoplanhelpers($GET(%ydboctocursor(cursorId,"parameters",1)),21,"%Y-%m-%d %H:%M:%S%z")
        21     . . SET %ydboctoexpr=$$DateTimeCast^%ydboctoplanhelpers((%ydboctoexpr),21,6)
  • From the above cases it is clear that
    • Text2UnixTime (converts date/time value (of TEXT format) to internal format) needs to be updated
    • Key fixing optimization needs to be disallowed for readonly table columns of type TIME/TIME WITH TIME ZONE /TIMESTAMP/TIMESTAMP WITH TIME ZONE and format TEXT
    • Rest of the code will work fine because they all work with the internal format value and doesn't care about the original value
  • Making these changes will handle the new value

Implementation

  • validate_date_time_value() is defined in src/aux/ydboctodateoperations.c. This is called for literal validation as well as readonly column value validation. This function invokes is_date_time_literal_in_valid_format() to do the validation of date/time values of format TEXT. The called function is updated along with the macros it uses to allow more than 6 digits for sub-seconds and, T is allowed to be the starting character of HOUR and as a separator between date and time instead of a space.
  • src/optimization_transforms/lp_optimize_where_multi_equals_ands.c is responsible for determining whether key fixing optimization needs to be applied for a query or not. This file is updated to disallow optimization for equals and IS operation when the expression is formed by values explained in the Analysis section. date/time specific code in LP_BOOLEAN_LESS_THAN case block is updated to avoid unnecessary IF checks when the left or right type is not a column value.
  • Text2UnixTime is defined in src/aux/_ydboctoplanhelpers.m. It is used to convert a TEXT format date/time value to internal format. This M routine calls ydboctoText2InternalFormatM() (defined in src/aux/ydboctodateoperations.c as ydboctoText2InternalFormatC()) an external function. Before this commit the external function processed timestamp with only space separating date/time and microseconds of max 6 precision. To support the new value the external function now calls a helper at the beginning which converts the given input to a value with space separator and max 6 sub-second precision. Then rest of the function processes the converted value as before. This handles the new input.
  • Following is the description of additional changes done
    • Following function definition is added to octo-seed.sql. This enables TIMESTAMP usage shown in the example section below. Makes this usage more user friendly.
      Function Definition
      ---------------------
      CREATE FUNCTION date_format(TIMESTAMP, VARCHAR) RETURNS VARCHAR AS $$^%ydboctofDATEFORMAT;;
      
      Example usages
      ----------------
      OCTO> select date_format(timestamp with time zone'2023-01-01 01:01:01-05', '%m-%d-%Y');
      date_format
      01-01-2023
      (1 row)
      OCTO> select date_format(timestamp'2023-01-01 01:01:01', '%m-%d-%Y');
      date_format
      01-01-2023
      (1 row)
      Because of this change FMT_SEED_DEFINITION is updated from 9 to 10. This bump ensures that the new definition is added.
    • Space previously was only allowed as a separator between date and time. Only one space was allowed. With this change arbitrary spaces are allowed, even at the beginning and end of date/time text format value. To enable this, is_date_time_literal_in_valid_format() and convert_to_std_time_lit() along with macros which they invoke have been updated. Before reaching is_date_time_literal_in_valid_format(), Text2UnixTime^ydboctoplanhelpers does some string processing to help the later external functions. Among the processing done one is to add :00 if timezone information is expressed only as two digits (for ex:-05). In order to do this space between date and time was used to separate time value and obtain time zone. This code is modified by this change to handle multiple spaces. Mainly, input is trimmed using $$FUNC^%TRIM(inputStr) this removes spaces at the start and end. Next, while obtaining the timezone info, negative time zone processing code path is modified to use a : instead of as delimiter for $piece call. This ensures arbitrary spaces in-between date and time of a timestamp is handled correctly.

Test

  • TDTT068, TDTT040, TDTT083, TDTT087, TDTT044, TDTT087, TDTT068, TDTT002, TDTT061, TDTT088, TDTT055 and TDTT084 subtests are updated to test the new value
  • Following are keywords that are related to what tests were added. These might help in doing more of such changes in the future.
    1. Sub-seconds exceeding certain threshold (this issues an error in Postgres but Octo will accept this ignoring values greater than 6 decimal places) -> TDTT068
       2024-02-21T13:31:48.0509802111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111+07:00
    2. No subsecond, no timezone, mixed input values in literals and readonly table,  -> TDTT040
    3. Literals -> TDTT040, TDTT083
    4. Readonly tables -> TDTT040, TDTT087
    5. WHERE clause usage and optimization checks -> TDTT044
    6. Comparison of literals and columns, literals and literals, literals and columns of same type, literals and columns of different type -> TDTT099, TDTT087, TDTT044, TDTT088
    7. Readwrite tables -> TDTT087, TDTT068, tdtt044, tdtt002, tdtt061, tdtt088, TDTT055
      (Insert with the new type, Insert with the new and old type)
    8. Type and format cast using functions -> TDTT084
  • As a result of the octo-seed.sql addition doneTYIO01.ref, TSSCQ20.ref, TSQ011.ref, TSQ004.ref, TJC021.ref, TDCQ01.ref, TCF022.ref, TC033.ref,TC029.ref reference files had to be updated.
  • TDTT044 reference file had complete M plans of queries. This made the file long and hard to maintain. This change replaces the M plans with relevant parts from it.
  • octo288.m previously only tested VARCHAR with this function. It is updated with this change to test TIMESTAMP also
  • TDTT102 validates the space handling done in this commit. Also, it validates arbitrary T specification between date and time of a timestamp.
  • tests/outref/TDTT040_2.ref is changed as more than 6 precision is ignored now 23:59:59.1000000 is valid. The next error case in the readonly values show up in the error message 24:00:00.000000.

Doc

  • doc/grammar.rst is updated to include the new value supported
Edited by Ganesh Mahesh

Merge request reports

Loading