Skip to content

[#288] Added support for SQL functions `now()`, `lpad()`, `date_format()`, `truncate()`, and `day()`

Jon Badiali requested to merge jonbadiali/YDBOcto:octo288 into master

Background

The following functions required by #288 (closed) were implemented prior to this commit:

The following function required by #288 (closed) remains to be implemented after this commit:

  • DATE_ADD()

Changes

This commit adds support for various SQL functions that were not previously supported by Octo. The functions added are:

  • CURRENT_TIME()
  • CURRENT_TIMESTAMP()
  • DAY()
  • DAYOFMONTH() (alias of DAY())
  • DATE_FORMAT()
  • LOCALTIME()
  • LOCALTIMESTAMP() (alias of CURRENT_TIMESTAMP())
  • LPAD()
  • NOW() (alias of CURRENT_TIMESTAMP())
  • TRUNCATE() (alias of the previously implemented TRUNC())

Note that some functions listed above were not requested by #288 (closed), but were nonetheless implemented as they didn't need much additional work to support and they contribute to Octo's MySQL compatibility, which was the intent behind #288 (closed).

A new M extrinsic function was added to the pre-existing _ydboctosqlfunctions.m file:

  • $$DAY^%ydboctosqlfunctions()

The remaining SQL functions are implemented by the following new M extrinsic functions:

  • $$^%ydboctofCURRENTTIME
  • $$^%ydboctofCURRENTTIMESTAMP
  • $$^%ydboctofDATEFORMAT
  • $$^%ydboctofLOCALTIME
  • $$^%ydboctofLPAD
  • $$^%ydboctofTRUNCATE

Note that these last three extrinsic functions specify only a routine name, but no label. This reflects a restructuring of how extrinsic functions for SQL functions are maintained. Previously, all SQL functions were maintained in a single _ydboctosqlfunctions.m file. Now, SQL functions that behave differently depending on which SQL database is being emulated have their own M routine/file. Each such file contains one or more labels implementating the given SQL function for one or more SQL database emulations, e.g. PostgreSQL and/or MySQL. The individual SQL function files follow a naming convention of _ydboctof*functionname*.m

This restructuring was done to accomodate differences in behavior for SQL functions of the same name between different SQL database emulations, e.g. PostgreSQL, MySQL, etc. That is, some SQL functions, such as LOCALTIME(), CURRENT_TIME(), and CURRENT_TIMESTAMP() have the same interface in both PostgreSQL and MySQL, but behave differently (produce different output) in each case. By moving each SQL function into its own file, it is now possible to select between different SQL function implementations at plan generation time based on the specified database emulation (currently only PostgreSQL or MySQL). More on this below.

Specifically, each M routine for a SQL function contains one or more labels, each containing an M implementation of the given SQL function. The number of labels present depends on the number of different implementations for the given function in supported database emulations. For example, the ROUND() function behaves the same in both MySQL and PostgreSQL and, since these are the only emulations currently supported, only one label and M implementation for this function is present in _ydboctosqlfunctions.m. On the other hand, CURRENT_TIME() is implemented differently in MySQL and PostgreSQL, so two labels are present in _ydboctofCURRENTTIME.m.

The selection of the proper label with the M routine for a given SQL function is done at plan generation time by calling a new get_emulation_string() helper function from tmpl_print_expression.ctemplate and using this to help construct the extrinsic function name, which is composed of the routine name plus the emulation setting string defined by config->database_emulation. More on that below.

For example, a query that contains a call to CURRENT_TIME() will yield an M plan that contains a reference to an M extrinsic function that implements CURRENT_TIME(). Per the aforementioned naming convention, this extrinsic function will reference the routine _ydboctofCURRENTTIME.m. This routine contains a total of four labels: one label each for the two different implementations of the SQL function needed to replicate behavior in both MySQL and PostgreSQL, and one label for each of the two emulations that maps the emulation name to the label name of the implementation. New code in tmpl_print_expression.ctemplate will use the get_emulation_string() the correct emulation-label mapping from _ydboctofCURRENTTIME.m and use the resulting label to construct a fully qualified extrinsic function name to insert into the generated plan.

To make emulation mode lookup and extrinsic function name construction more maintainable, new OCTOLIT_MYSQL, OCTOLIT_POSTGRESQL, and MAX_EMULATION_STRING_LEN macros were added for use in database emulation selection code.

To force the generation of different M plans in case of different emulations and prevent reuse of plans with calls to the same SQL function but in different emulations, the current database emulation setting is included in the plan name hash in hash_canonical_query.c

The database emulation is specified at Octo startup using a new emulate configuration setting in octo.conf or by using the new -e/--emulate command line option. Each of these methods accepts the same possible emulation settings, namely MYSQL and POSTGRES. The emulation setting is read in octo_init.c and, if specified, converted to the corresponding value of a new DatabaseEmulation enum and stored in a new database_emulation member of the OctoConfig struct. These additions were made in config.h. If no database emulation is specified, a default emulation is used. The default database emulation is POSTGRES.

Among the newly supported functions, DATE_FORMAT() is a special case in that its M routine requires call-out to a new ydboctoDateFormat() C helper function defined in ydboctodateformat.c. This was done to allow access to various time-related C library functions that are not present in M and make the porting of portions of the MySQL implementation, which is written in C++, to Octo more straightforward.

The general approach of the new ydboctoDateFormat() implementation is to read the format specified in the DATE_FORMAT() call, populate any format fields that are uniquely defined in MySQL, and then to populate the remaining fields using the strftime C library function. Various supporting functions were ported from MySQL source code to help in those cases where the behavior for a given format code is unique to MySQL, which proved necessary as the MySQL code does not explain many of its date calculations or where its algorithms originated.

The new ydboctoDateFormat() function is made accessible to Octo via a mapping in a new ydbocto.xc YottaDB call-out file, which references a new libcocto.so shared library that exposes this helper function. Various other files were updated to account for ydbocto.xc and libcocto.so:

  • needs_copyright.sh to omit .xc files
  • CMakeLists.txt:
    • Updated to include ydbocto.xc installation in make install
    • Updated to build and install libcocto.so
  • build.sh was updated to include ydbocto.xc and libcocto.so in binary tarball, and exclude mysql-*-.sql symbolic links from auto-upgrade cleanup.
  • octoinstall.sh.in was updated to include ydbocto.xc and libcocto.so in installation script

To support the new _ydboctofLPAD.m routine, which handles LPAD calls in different emulations, the following changes were also made: - Added new %ydboctoerror error code UNKNOWNFUNCTION for use with LPAD in MySQL emulation - Added new ERR_UNKNOWN_FUNCTION_EMULATION error message for use in ydb_error_check.c in case of this new M error code + Updated docs accordingly - Revised _ydboctofLPAD.m to use this new error code, revised logic to avoid LVUNDEF error - Revised load_fixture helper function to allow passage of additional options to octo, in this case to allow specifying an emulation mode - Moved LPAD M label to _ydboctofLPAD.m

Test Changes

To test these changes, a new test_date_time_functions.bats.in module was added to the test suite. Existing date function related tests were moved from TSCP16 to TDTF01 in the new test_date_time_functions.bats.in and the relevant fixture was renamed accordingly.

TDTF02 tests Octo's MySQL function implementations against a MySQL test database, while TDTF03 does the same, but for PostgreSQL emulation. Each of these randomly generates queries using the DATE_FORMAT() function to cover a full range of dates and times and maximize test coverage, with one notable exception. That is, in the case of years between 1 and 100 AD, MySQL produces seemingly correct results, such that depending on the given year, the results will indicate a year in one of several different centuries, even though all should occur in the first century. Since Octo produces results that do indeed reflect the correct, i.e. first, century, the year range 1-100 AD is omitted from these tests.

In addition to the above, TDTF02 also makes use of a new octo288.m fixture, which randomly generates DATE_FORMAT() queries with random patterns and values, providing additional test coverage of DATE_FORMAT().

CURRENT_TIME(), CURRENT_TIMESTAMP(), LOCALTIME(), LOCALTIMESTAMP(), LPAD(), NOW(), and TRUNCATE() are tested by TDTF01.sql. DATE_FORMAT() is tested by TDTF02-04. CURRENT_USER, DAY(), and DAYOFMONTH()are tested byTDTF05. TDTF06-TDTF16` test various edge cases in date string input.

Also:

  • TERR037 was added to confirm error issued when DATE_FORMAT() is called when Octo is run with PostgreSQL emulation
  • TCF029 was added to test correct handling of the maximum length of extrinsic functions during CREATE FUNCTION execution

Additionally, MySQL crosscheck functionality was added for testing Octo MySQL emulation against upstream MySQL implementations, roughly mirroring the behavior used for similar PostgreSQL crosscheck functionality. To this end, several new MySQL-related helper functions were added to test_helpers.bash.in:

  • run_query_in_octo_and_mysql_and_crosscheck_multiple_queries(), roughly mirroring run_query_in_octo_and_postgres_and_crosscheck_multiple_queries(), with some MySQL-specific differences
  • run_query_in_octo_and_mysql_and_crosscheck(), roughly mirroring run_query_in_octo_and_postgres_and_crosscheck(), with some MySQL-specific differences
    • Includes checktime flag to allow running special checks for time functions
  • run_query_in_two_emulations_and_crosscheck_multiple_queries() for crosschecking behavior between two Octo emulations, e.g. Octo's MySQL implementation checked against Octo's PostgreSQL emulation
    • a supporting run_query_in_two_emulations_and_crosscheck() helper function
  • run_mysql for reducing duplication when running (and configuring) a call to mysql
  • create_mysql_database() for creating new MySQL databases
  • load_mysql_fixture() for running .sql fixtures against a MySQL database
  • Conditional timestamp-relative query omission flag is included in mysql crosscheck function to prevent erroneous failures due to time discrepancies

Similarly, a new set_ydb_xc_octo helper function was added to test_helpers.bash.in reduce duplication when setting the ydb_xc_octo variable testing, and to handle the DISABLE_INSTALL build case.

To initialize a MySQL installation for testing using these new tests, a new hello_db.bats.in file was added. This file initializes both MySQL and PostgreSQL installations. To this end, hello_psql.bats.in was removed and its logic rolled into hello_db.bats.in. The following changes were made to allow the usage of a single outref for testing the initialization of both databases, avoiding much duplication:

  • Precision and scale specifications were added for the NUMERIC columns in postgres-nullnames.sql
    • Added trim_trailing_zeroes option to crosscheck function call in TTA006 to account for trailing 0s added due to addition of precision and scale specifications
  • Added new mysql-boolean.sql fixture that differs from postgres-boolean.sql
    • Needed since MySQL only supports 1/0 and TRUE/FALSE for boolean types
  • Added composite schema to hello_db.bats.in
  • Added sort call to various outputs in hello_db.bats.in to ensure consistency in output
  • Revised row selection by head and tail to:
    • Ensure consistent capitalization, i.e. all-lowercase
    • Emit four tail rows to remove discrepancies in output where Postgres includes the number of rows returned
    • Handle a special case, i.e. Shippers table, wherein tail returns too many rows, leading to duplication of column headers
  • Added forced lower-casing to hello_db.bats.in table names to ensure consistency with Postgres lowercase table names
  • Added sed search and replace calls to hello_db.bats.in to ensure consistency of the following across database outputs:
    • Column delimiters
    • Decimal representations
    • NULL value representations
    • An edge case where MySQL forces a decimal to an integer in the case of one row only (not sure why)
  • Renamed HP01.ref to HDB01.ref and removed all references to HP01.ref

To support the MySQL implementation of LPAD(), which has a different interface than PostgreSQL, an additional CREATE FUNCTION definition for LPAD() was added to octo-seed.sql and its corresponding M routine was revised to accept 3 parameters and handle both the 2 and 3 parameter cases.

To test the new emulate configuration setting changes, a new TOC02 test case was added.

Also:

  • A new TERR035 test case was also added to validate errors are raised if an invalid extrinsic function name is used for a SQL function
  • A new TERR036 test case was also added to validate errors are raised if an invalid number of parameters is passed to a SQL function, i.e. LPAD
  • TCF004, TCF011, and TCF012 were expanded to also test support and error reporting for M intrinsic functions that use abbreviated syntax, e.g. $L[ENGTH]
  • bats-tests.cmake was updated to run MySQL tests when mysql is present on the system

To support MySQL testing in the pipeline, .gitlab-ci.yml was updated to install MySQL for the ubuntu and rocky jobs, and to run the GitLab MySQL service. This service is run to avoid additional complexity and difficulty in configuration and maintenance of MySQL daemons in a Docker environment, which is not straightforward.

Also, the outrefs of some existing tests were updated to reflect the above changes:

  • Additional functions now stored in the database
    • tests/outref/TC029.ref
    • tests/outref/TC033.ref
    • tests/outref/TCF022.ref
    • tests/outref/TSSCQ20.ref
    • tests/outref/TSQ004.ref
    • tests/outref/TSQ011.ref
  • Change in extrinisic function name referenced by SQL function:
    • tests/outref/TSCP04_errors.ref
    • tests/outref/TSCP05_errors.ref
  • Additional command line options:
    • tests/outref/THO00.ref
    • tests/outref/THO01.ref
  • Different error message line number:
    • tests/outref/TERR001.ref
  • Different plan hash:
    • tests/outref/TCR01.ref
  • Different plan order:
    • tests/outref/TCT011.ref

Additionally, the various new SQL functions were added to the documentation, i.e.:

  • TRUNCATE
  • NOW
  • DAY
  • DAYOFMONTH
  • LOCALTIME
  • LOCALTIMESTAMP
  • CURRENT_TIME
  • CURRENT_TIMESTAMP
  • DATE_FORMAT
  • LPAD

Several pre-existing but undocumented functions were also added to the documentation:

  • CURRENT_CATALOG
  • CURRENT_DATABASE()
  • CURRENT_ROLE
  • CURRENT_SCHEMA
  • CURRENT_USER
  • SESSION_USER

The following functions were also added to the documentation, but are commented out as they are not fully implemented:

  • PG_ENCODING_TO_CHAR
  • PG_IS_IN_RECOVERY
  • PG_IS_XLOG_REPLAY_PAUSED

Similarly, the new -e/--emulate option was added to both intro.rst and rocto.rst, and the new 'emulate' config file option was added to config.rst.

Also, README.md was updated to include with MySQL install and setup instructions, as well as a Building the documentation section.

Miscellaneous Changes

  • Lexer updates to support case-insensitivity for parenless functions
    • Parser update to support use of parenless function names in CREATE FUNCTION (used in seed data)
    • Added TCF028 test case for case-insensitive parenless functions
  • Added links to CentOS MySQL installation instructions to .gitlab-ci.yml
  • Revised function documentation prototype format to be generic and not an example by omitting SELECT and ...
  • Added make clean documentation instructions to README
  • Revised CentOS reference to Rocky Linux
  • Added UTF check to pre-commit hook fail commit if UTF characters are detected in committed files
    • SQL, ZWR, and reference files are excluded from this check, as they may contain UTF data for testing purposes
  • Added new EXTRINSIC_PREFIX_LEN macro
Edited by Jon Badiali

Merge request reports