gkcore codebase improvements discussion

This issue discusses the current GNUKhata setup and possible improvements that would make development easier and the codebase more scalable.

Current Scenario

JWT based access management

A token (gkusertoken) is sent to the user after a successful login. This token contains the userid and username.
A new token (gktoken) is sent when the user selects an organization on the organization selection screen. This token contains the userid and orgcode.
gktoken is sent in the headers of every GET, POST and PUT request.
Inside each API function/method, the user ID and organization code are extracted from the token. If the token is absent or the user ID cannot be extracted, gkstatus 2 is returned as the response.
No user-role-based permission management could be found.

Database Connection Management

The database engine is created in gkcore/models/meta.py and returned from the dbconnect function.
dbconnect is called in gkcore/__init__.py and its result is assigned to eng (the secret string is also created there, inside a try-except block).
This engine is held as an API class attribute. Inside each API method, connections are used in a try-except block that returns a "connection failed" gkstatus on any exception. Some exceptions are handled specially to return a duplicate entry response.
We mostly use the Core features of SQLAlchemy; the ORM functionality is untouched.

Note: We currently use SQLAlchemy 1.3.20, which is now a legacy release.

Refactoring steps

A branch with the refactored code can be seen here - https://gitlab.com/gnukhata/gkcore/-/tree/backend-refactoring-poc. The project configuration and the Unit of Measurement API have been refactored in that POC branch.

Organize code

Move database-related code to gkcore/models/meta.py - 90ba2df6
Move routing to a separate file gkcore/routes.py - 5e47a2b8. There is a circular import between gkcore/__init__.py and gkcore/utils.py.

Add security policy

Add a basic security policy that checks for authentication and allows all permissions to the admin user role. The security policy strictly follows the current authentication mechanism - 490cc306

Refactor Unit of Measurement API

This API was selected because it is small: 245 lines of code, reduced to 165 after refactoring - 29f3b5df

Apply the newly added security policy to the API by adding a permission string to the @view_config decorator.
Use a context manager to reliably handle DBAPI connections.
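The context manager idea can be illustrated without any dependencies. The sketch below uses the standard library sqlite3 DBAPI module as a stand-in for the SQLAlchemy engine used in gkcore. The table layout and the function names are hypothetical. With SQLAlchemy, a `with engine.begin() as conn:` block gives comparable commit-or-rollback behaviour.

```python
import sqlite3
from contextlib import closing


def create_schema(conn):
    # Hypothetical, simplified table for the example.
    with conn:
        conn.execute(
            "CREATE TABLE unitofmeasurement ("
            "uomid INTEGER PRIMARY KEY, unitname TEXT UNIQUE NOT NULL)"
        )


def add_units(conn, names):
    """Insert all names in one transaction; any failure rolls back every insert."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        for name in names:
            conn.execute(
                "INSERT INTO unitofmeasurement (unitname) VALUES (?)", (name,)
            )


def list_units(conn):
    # closing() guarantees the cursor is released even if the query fails.
    with closing(conn.cursor()) as cur:
        cur.execute("SELECT unitname FROM unitofmeasurement ORDER BY unitname")
        return [row[0] for row in cur.fetchall()]
```

Note that the exception is not swallowed: a duplicate unit name raises sqlite3.IntegrityError to the caller after the rollback, so it can be mapped to a response in one central place rather than in every API method.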

Consequences

Since exceptions are no longer suppressed, the server will raise 5xx errors. The same applies to permission management: failures will be reported through HTTP response status codes instead of always sending status 200 with a gkstatus value. The front end will have to respect HTTP status codes.
userAuthCheck and authCheck will be deprecated. Although the current POC supports both the old and the new authentication handling, it is suggested to eventually remove the old one and to add proper support for user-role-based permission management.

Benefits

API-level, user-role-based permission management can be implemented.
API code complexity and line count can be reduced.
APIs will be debuggable, since exceptions are not suppressed.
With proper rollback support and isolation, the codebase can be refactored to be ACID compliant, which will make the stored data more reliable.

Migration planning

The refactored project configuration supports both the old and the new API implementations simultaneously.
We could test the new API structure in a couple of frequently used APIs.
If the API tests pass, we can migrate the APIs one by one to the new structure over time.

Extra

Overriding exception handling

The view_config decorator can also be used to handle exceptions, so we can handle each exception type individually if we want. The code below sends response status 500 for any exception raised by an API, including the exception text in the response when DEBUG is set, and prints the traceback to the console.

Even if we decide to keep sending status 200 with a gkstatus value, this can be used to send custom responses on exceptions.

diff --git a/gkcore/views/__init__.py b/gkcore/views/__init__.py
index e69de29b..c8159909 100644
--- a/gkcore/views/__init__.py
+++ b/gkcore/views/__init__.py
@@ -0,0 +1,20 @@
+import sys
+import traceback
+
+from pyramid.view import view_config
+
+# Assumption: DEBUG is a module-level flag; the original snippet does not define it.
+DEBUG = False
+
+
+@view_config(context=Exception)
+def exception_view(error, request):
+    # Print the traceback to the console.
+    traceback.print_exc(file=sys.stdout)
+    request.response.status = 500
+    if DEBUG:
+        # Show the exception text in the response only in debug mode.
+        request.response.json_body = {"error": f"{error}"}
+    else:
+        request.response.json_body = {"error": "Oh no! Something terrible happened."}
+    return request.response

Query output serializer

Currently we manually assign key-value pairs to build the response JSON, because a SQLAlchemy result is not always serializable. Multiple strategies are listed here - https://stackoverflow.com/questions/5022066/how-to-serialize-sqlalchemy-result-to-json.
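One such strategy can be sketched as follows, under the assumption that the non-serializable values are mostly Decimal and date/datetime columns. The function names and sample columns are hypothetical. Plain dicts stand in for result rows here; with SQLAlchemy 1.3 a row can usually be converted with dict(row), while newer versions expose row._mapping instead.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def json_default(value):
    """Convert values that json.dumps cannot handle on its own."""
    if isinstance(value, Decimal):
        return str(value)  # keep exact precision instead of casting to float
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def rows_to_json(rows):
    """Serialize an iterable of mapping-like rows to a JSON array."""
    return json.dumps([dict(row) for row in rows], default=json_default)
```

For example, rows_to_json([{"unitname": "kg", "rate": Decimal("12.50")}]) produces a JSON array with the rate rendered as the string "12.50". Whether amounts should travel as strings or numbers is a design choice the front end would need to agree on.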

Using SQLAlchemy ORM

Currently we use only the Core features (https://docs.sqlalchemy.org/en/20/intro.html). The benefits of using the SQLAlchemy ORM need to be evaluated.

Migrations Management

Currently we add migrations to the db_migrate.py file as raw SQL. This does not scale and makes database changes difficult to manage. A dedicated migration management tool would let us split the current single migration file into versioned migrations. The main advantage is the ability to move up and down through these versions.

Alembic is a great migration manager that is commonly used with SQLAlchemy - https://alembic.sqlalchemy.org/en/latest/.
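The idea of moving up and down through versions can be illustrated with a small, dependency-free sketch. This is not Alembic's API: Alembic expresses the same idea through revision files that each define upgrade() and downgrade() functions. The MIGRATIONS list, the schema_version table and both function names are hypothetical, and sqlite3 stands in for the real database.

```python
import sqlite3

# Each entry is (upgrade SQL, downgrade SQL); list index + 1 is the version number.
MIGRATIONS = [
    (
        "CREATE TABLE unitofmeasurement (uomid INTEGER PRIMARY KEY, unitname TEXT)",
        "DROP TABLE unitofmeasurement",
    ),
    (
        "CREATE TABLE unit_alias (aliasid INTEGER PRIMARY KEY, alias TEXT)",
        "DROP TABLE unit_alias",
    ),
]


def current_version(conn):
    """Read the stored schema version, initialising it to 0 on first use."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        return 0
    return row[0]


def migrate(conn, target):
    """Apply upgrades or downgrades until the schema is at the target version."""
    if not 0 <= target <= len(MIGRATIONS):
        raise ValueError(f"unknown target version: {target}")
    version = current_version(conn)
    while version < target:  # move up one version at a time
        conn.execute(MIGRATIONS[version][0])
        version += 1
    while version > target:  # move down one version at a time
        conn.execute(MIGRATIONS[version - 1][1])
        version -= 1
    conn.execute("UPDATE schema_version SET version = ?", (version,))
    conn.commit()
    return version
```

Calling migrate(conn, 2) applies both migrations, and a later migrate(conn, 1) undoes only the second one. A dedicated tool adds what this sketch leaves out, such as revision identifiers, branching and autogeneration.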

Back-end validation

Currently we do not have a project-wide back-end validation mechanism. Such a mechanism would help reduce data errors and improve security. Suggestions:
Colander with ColanderAlchemy - Designed for use in Pyramid with SQLAlchemy, with automatic schema generation capabilities (https://docs.pylonsproject.org/projects/colander/en/latest/, https://colanderalchemy.readthedocs.io/)
Pydantic - One of the most widely used Python validation libraries (https://docs.pydantic.dev/latest/)
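To show what such a mechanism would do, here is a dependency-free stand-in written with dataclasses. It uses neither Colander nor Pydantic, both of which provide this declaratively. The payload fields (unitname, conversionrate), the ValidationError class and validate_unit are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


class ValidationError(ValueError):
    """Raised with a dict mapping field names to error messages."""

    def __init__(self, errors):
        super().__init__(errors)
        self.errors = errors


@dataclass(frozen=True)
class UnitPayload:
    unitname: str
    conversionrate: float


def validate_unit(data):
    """Check a request payload and return a cleaned UnitPayload."""
    errors = {}
    unitname = data.get("unitname")
    if not isinstance(unitname, str) or not unitname.strip():
        errors["unitname"] = "must be a non-empty string"
    try:
        rate = float(data.get("conversionrate", 1))
        if rate <= 0:
            errors["conversionrate"] = "must be greater than zero"
    except (TypeError, ValueError):
        errors["conversionrate"] = "must be a number"
    if errors:
        # Report every problem at once instead of stopping at the first one.
        raise ValidationError(errors)
    return UnitPayload(unitname=unitname.strip(), conversionrate=rate)
```

An API view would call validate_unit(request.json_body) before touching the database, and a central exception view could turn ValidationError into a 400 response.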

References

Pyramid security documentation: https://docs.pylonsproject.org/projects/pyramid/en/latest/narr/security.html
SQLAlchemy Core Connections documentation: https://docs.sqlalchemy.org/en/13/core/connections.html
SQLAlchemy Session API documentation: https://docs.sqlalchemy.org/en/20/orm/session_api.html
SQLAlchemy ORM Session creation FAQ: https://docs.sqlalchemy.org/en/13/orm/session_basics.html#session-faq-whentocreate
Pyramid Configurator API documentation: https://docs.pylonsproject.org/projects/pyramid/en/2.0-branch/api/config.html
View Config documentation: https://docs.pylonsproject.org/projects/pyramid/en/2.0-branch/narr/viewconfig.html

Edited Jul 03, 2024 by Kannan V M