[#288] Added support for SQL functions `now()`, `lpad()`, `date_format()`, `truncate()`, and `day()`
Background
The following functions required by #288 (closed) were implemented prior to this commit:
CURRENT_USER()
-
COALESCE()
(!696 (merged)) -
ISNULL()
(!710 (merged))
The following function required by #288 (closed) remains to be implemented after this commit:
DATE_ADD()
Changes
This commit adds support for various SQL functions that were not previously supported by Octo. The functions added are:
CURRENT_TIME()
CURRENT_TIMESTAMP()
DAY()
-
DAYOFMONTH()
(alias ofDAY()
) DATE_FORMAT()
LOCALTIME()
-
LOCALTIMESTAMP()
(alias ofCURRENT_TIMESTAMP()
) LPAD()
-
NOW()
(alias ofCURRENT_TIMESTAMP()
) -
TRUNCATE()
(alias of the previously implementedTRUNC()
)
Note that some functions listed above were not requested by #288 (closed), but were nonetheless implemented as they didn't need much additional work to support and they contribute to Octo's MySQL compatibility, which was the intent behind #288 (closed).
A new M extrinsic function was added to the pre-existing _ydboctosqlfunctions.m
file:
$$DAY^%ydboctosqlfunctions()
The remaining SQL functions are implemented by the following new M extrinsic functions:
$$^%ydboctofCURRENTTIME
$$^%ydboctofCURRENTTIMESTAMP
$$^%ydboctofDATEFORMAT
$$^%ydboctofLOCALTIME
$$^%ydboctofLPAD
$$^%ydboctofTRUNCATE
Note that these last three extrinsic functions specify only a routine name, but no label. This reflects a restructuring of how extrinsic functions for SQL functions are maintained. Previously, all SQL functions were maintained in a single _ydboctosqlfunctions.m
file. Now, SQL functions that behave differently depending on which SQL database is being emulated have their own M routine/file. Each such file contains one or more labels implementating the given SQL function for one or more SQL database emulations, e.g. PostgreSQL and/or MySQL. The individual SQL function files follow a naming convention of _ydboctof*functionname*.m
This restructuring was done to accomodate differences in behavior for SQL functions of the same name between different SQL database emulations, e.g. PostgreSQL, MySQL, etc. That is, some SQL functions, such as LOCALTIME()
, CURRENT_TIME()
, and CURRENT_TIMESTAMP()
have the same interface in both PostgreSQL and MySQL, but behave differently (produce different output) in each case. By moving each SQL function into its own file, it is now possible to select between different SQL function implementations at plan generation time based on the specified database emulation (currently only PostgreSQL or MySQL). More on this below.
Specifically, each M routine for a SQL function contains one or more labels, each containing an M implementation of the given SQL function. The number of labels present depends on the number of different implementations for the given function in supported database emulations. For example, the ROUND()
function behaves the same in both MySQL and PostgreSQL and, since these are the only emulations currently supported, only one label and M implementation for this function is present in _ydboctosqlfunctions.m
. On the other hand, CURRENT_TIME()
is implemented differently in MySQL and PostgreSQL, so two labels are present in _ydboctofCURRENTTIME.m
.
The selection of the proper label with the M routine for a given SQL function is done at plan generation time by calling a new get_emulation_string()
helper function from tmpl_print_expression.ctemplate
and using this to help construct the extrinsic function name, which is composed of the routine name plus the emulation setting string defined by config->database_emulation
. More on that below.
For example, a query that contains a call to CURRENT_TIME()
will yield an M plan that contains a reference to an M extrinsic function that implements CURRENT_TIME()
. Per the aforementioned naming convention, this extrinsic function will reference the routine _ydboctofCURRENTTIME.m
. This routine contains a total of four labels: one label each for the two different implementations of the SQL function needed to replicate behavior in both MySQL and PostgreSQL, and one label for each of the two emulations that maps the emulation name to the label name of the implementation. New code in tmpl_print_expression.ctemplate
will use the get_emulation_string()
the correct emulation-label mapping from _ydboctofCURRENTTIME.m
and use the resulting label to construct a fully qualified extrinsic function name to insert into the generated plan.
To make emulation mode lookup and extrinsic function name construction more maintainable, new OCTOLIT_MYSQL
, OCTOLIT_POSTGRESQL
, and MAX_EMULATION_STRING_LEN
macros were added for use in database emulation selection code.
To force the generation of different M plans in case of different emulations and prevent reuse of plans with calls to the same SQL function but in different emulations, the current database emulation setting is included in the plan name hash in hash_canonical_query.c
The database emulation is specified at Octo startup using a new emulate
configuration setting in octo.conf
or by using the new -e
/--emulate
command line option. Each of these methods accepts the same possible emulation settings, namely MYSQL
and POSTGRES
. The emulation setting is read in octo_init.c
and, if specified, converted to the corresponding value of a new DatabaseEmulation
enum and stored in a new database_emulation
member of the OctoConfig
struct. These additions were made in config.h
. If no database emulation is specified, a default emulation is used. The default database emulation is POSTGRES
.
Among the newly supported functions, DATE_FORMAT()
is a special case in that its M routine requires call-out to a new ydboctoDateFormat()
C helper function defined in ydboctodateformat.c
. This was done to allow access to various time-related C library functions that are not present in M and make the porting of portions of the MySQL implementation, which is written in C++, to Octo more straightforward.
The general approach of the new ydboctoDateFormat()
implementation is to read the format specified in the DATE_FORMAT()
call, populate any format fields that are uniquely defined in MySQL, and then to populate the remaining fields using the strftime
C library function. Various supporting functions were ported from MySQL source code to help in those cases where the behavior for a given format code is unique to MySQL, which proved necessary as the MySQL code does not explain many of its date calculations or where its algorithms originated.
The new ydboctoDateFormat()
function is made accessible to Octo via a mapping in a new ydbocto.xc
YottaDB call-out file, which references a new libcocto.so
shared library that exposes this helper function. Various other files were updated to account for ydbocto.xc
and libcocto.so
:
-
needs_copyright.sh
to omit.xc
files -
CMakeLists.txt
:- Updated to include
ydbocto.xc
installation inmake install
- Updated to build and install
libcocto.so
- Updated to include
-
build.sh
was updated to includeydbocto.xc
andlibcocto.so
in binary tarball, and excludemysql-*-.sql
symbolic links fromauto-upgrade
cleanup. -
octoinstall.sh.in
was updated to includeydbocto.xc
andlibcocto.so
in installation script
To support the new _ydboctofLPAD.m
routine, which handles LPAD
calls in different emulations, the following changes were also made:
- Added new %ydboctoerror
error code UNKNOWNFUNCTION
for use with LPAD
in MySQL emulation
- Added new ERR_UNKNOWN_FUNCTION_EMULATION
error message for use in ydb_error_check.c
in case of this new M error code
+ Updated docs accordingly
- Revised _ydboctofLPAD.m
to use this new error code, revised logic to avoid LVUNDEF
error
- Revised load_fixture
helper function to allow passage of additional options to octo, in this case to allow specifying an emulation mode
- Moved LPAD
M label to _ydboctofLPAD.m
Test Changes
To test these changes, a new test_date_time_functions.bats.in
module was added to the test suite. Existing date function related tests were moved from TSCP16
to TDTF01
in the new test_date_time_functions.bats.in
and the relevant fixture was renamed accordingly.
TDTF02
tests Octo's MySQL function implementations against a MySQL test database, while TDTF03
does the same, but for PostgreSQL emulation. Each of these randomly generates queries using the DATE_FORMAT()
function to cover a full range of dates and times and maximize test coverage, with one notable exception. That is, in the case of years between 1 and 100 AD, MySQL produces seemingly correct results, such that depending on the given year, the results will indicate a year in one of several different centuries, even though all should occur in the first century. Since Octo produces results that do indeed reflect the correct, i.e. first, century, the year range 1-100 AD is omitted from these tests.
In addition to the above, TDTF02
also makes use of a new octo288.m
fixture, which randomly generates DATE_FORMAT()
queries with random patterns and values, providing additional test coverage of DATE_FORMAT()
.
CURRENT_TIME()
, CURRENT_TIMESTAMP()
, LOCALTIME()
, LOCALTIMESTAMP()
, LPAD()
, NOW()
, and TRUNCATE()
are tested by TDTF01.sql
. DATE_FORMAT()
is tested by TDTF02-04
. CURRENT_USER
, DAY(), and
DAYOFMONTH()are tested by
TDTF05.
TDTF06-
TDTF16` test various edge cases in date string input.
Also:
-
TERR037
was added to confirm error issued whenDATE_FORMAT()
is called when Octo is run with PostgreSQL emulation -
TCF029
was added to test correct handling of the maximum length of extrinsic functions duringCREATE FUNCTION
execution
Additionally, MySQL crosscheck functionality was added for testing Octo MySQL emulation against upstream MySQL implementations, roughly mirroring the behavior used for similar PostgreSQL crosscheck functionality. To this end, several new MySQL-related helper functions were added to test_helpers.bash.in
:
-
run_query_in_octo_and_mysql_and_crosscheck_multiple_queries()
, roughlymirroring run_query_in_octo_and_postgres_and_crosscheck_multiple_queries()
, with some MySQL-specific differences -
run_query_in_octo_and_mysql_and_crosscheck()
, roughlymirroring run_query_in_octo_and_postgres_and_crosscheck()
, with some MySQL-specific differences- Includes
checktime
flag to allow running special checks for time functions
- Includes
-
run_query_in_two_emulations_and_crosscheck_multiple_queries()
for crosschecking behavior between two Octo emulations, e.g. Octo's MySQL implementation checked against Octo's PostgreSQL emulation- a supporting
run_query_in_two_emulations_and_crosscheck()
helper function
- a supporting
-
run_mysql
for reducing duplication when running (and configuring) a call tomysql
-
create_mysql_database()
for creating new MySQL databases -
load_mysql_fixture()
for running.sql
fixtures against a MySQL database - Conditional timestamp-relative query omission flag is included in mysql crosscheck function to prevent erroneous failures due to time discrepancies
Similarly, a new set_ydb_xc_octo
helper function was added to test_helpers.bash.in
reduce duplication when setting the ydb_xc_octo
variable testing, and to handle the DISABLE_INSTALL
build case.
To initialize a MySQL installation for testing using these new tests, a new hello_db.bats.in
file was added. This file initializes both MySQL and PostgreSQL installations. To this end, hello_psql.bats.in
was removed and its logic rolled into hello_db.bats.in
. The following changes were made to allow the usage of a single outref for testing the initialization of both databases, avoiding much duplication:
- Precision and scale specifications were added for the
NUMERIC
columns inpostgres-nullnames.sql
- Added
trim_trailing_zeroes
option to crosscheck function call inTTA006
to account for trailing 0s added due to addition of precision and scale specifications
- Added
- Added new
mysql-boolean.sql
fixture that differs frompostgres-boolean.sql
- Needed since MySQL only supports 1/0 and TRUE/FALSE for boolean types
- Added
composite
schema tohello_db.bats.in
- Added
sort
call to various outputs inhello_db.bats.in
to ensure consistency in output - Revised row selection by
head
andtail
to:- Ensure consistent capitalization, i.e. all-lowercase
- Emit four tail rows to remove discrepancies in output where Postgres includes the number of rows returned
- Handle a special case, i.e.
Shippers
table, whereintail
returns too many rows, leading to duplication of column headers
- Added forced lower-casing to
hello_db.bats.in
table names to ensure consistency with Postgres lowercase table names - Added
sed
search and replace calls tohello_db.bats.in
to ensure consistency of the following across database outputs:- Column delimiters
- Decimal representations
- NULL value representations
- An edge case where MySQL forces a decimal to an integer in the case of one row only (not sure why)
- Renamed
HP01.ref
toHDB01.ref
and removed all references toHP01.ref
To support the MySQL implementation of LPAD()
, which has a different interface than PostgreSQL, an additional CREATE FUNCTION
definition for LPAD()
was added to octo-seed.sql
and its corresponding M routine was revised to accept 3 parameters and handle both the 2 and 3 parameter cases.
To test the new emulate
configuration setting changes, a new TOC02
test case was added.
Also:
- A new
TERR035
test case was also added to validate errors are raised if an invalid extrinsic function name is used for a SQL function - A new
TERR036
test case was also added to validate errors are raised if an invalid number of parameters is passed to a SQL function, i.e.LPAD
-
TCF004
,TCF011
, andTCF012
were expanded to also test support and error reporting for M intrinsic functions that use abbreviated syntax, e.g.$L[ENGTH]
-
bats-tests.cmake
was updated to run MySQL tests whenmysql
is present on the system
To support MySQL testing in the pipeline, .gitlab-ci.yml
was updated to install MySQL for the ubuntu
and rocky
jobs, and to run the GitLab MySQL service. This service is run to avoid additional complexity and difficulty in configuration and maintenance of MySQL daemons in a Docker environment, which is not straightforward.
Also, the outrefs of some existing tests were updated to reflect the above changes:
- Additional functions now stored in the database
tests/outref/TC029.ref
tests/outref/TC033.ref
tests/outref/TCF022.ref
tests/outref/TSSCQ20.ref
tests/outref/TSQ004.ref
tests/outref/TSQ011.ref
- Change in extrinisic function name referenced by SQL function:
tests/outref/TSCP04_errors.ref
tests/outref/TSCP05_errors.ref
- Additional command line options:
tests/outref/THO00.ref
tests/outref/THO01.ref
- Different error message line number:
tests/outref/TERR001.ref
- Different plan hash:
tests/outref/TCR01.ref
- Different plan order:
tests/outref/TCT011.ref
Additionally, the various new SQL functions were added to the documentation, i.e.:
TRUNCATE
NOW
DAY
DAYOFMONTH
LOCALTIME
LOCALTIMESTAMP
CURRENT_TIME
CURRENT_TIMESTAMP
DATE_FORMAT
LPAD
Several pre-existing but undocumented functions were also added to the documentation:
CURRENT_CATALOG
CURRENT_DATABASE()
CURRENT_ROLE
CURRENT_SCHEMA
CURRENT_USER
SESSION_USER
The following functions were also added to the documentation, but are commented out as they are not fully implemented:
PG_ENCODING_TO_CHAR
PG_IS_IN_RECOVERY
PG_IS_XLOG_REPLAY_PAUSED
Similarly, the new -e
/--emulate
option was added to both intro.rst
and rocto.rst
, and the new 'emulate' config file option was added to config.rst
.
Also, README.md
was updated to include with MySQL install and setup instructions, as well as a Building the documentation
section.
Miscellaneous Changes
- Lexer updates to support case-insensitivity for parenless functions
- Parser update to support use of parenless function names in CREATE FUNCTION (used in seed data)
- Added
TCF028
test case for case-insensitive parenless functions
- Added links to CentOS MySQL installation instructions to
.gitlab-ci.yml
- Revised function documentation prototype format to be generic and not an example by omitting
SELECT
and...
- Added
make clean
documentation instructions toREADME
- Revised
CentOS
reference toRocky Linux
- Added UTF check to pre-commit hook fail commit if UTF characters are detected in committed files
- SQL, ZWR, and reference files are excluded from this check, as they may contain UTF data for testing purposes
- Added new
EXTRINSIC_PREFIX_LEN
macro