Calibrae is a distributed database system with a financial ledger, aimed at eliminating redundant and antiquated security mechanisms and replacing them with up to date knowledge in the field of computer science.
Ostensibly, its primary functions are a cryptocurrency and a forum system. But it will become so much more than this.
The first planned addition is that the code of the platform itself will be subject to the rewards system, with its own distinct reputation regulation system.
After that will come a unified protocol for instant and email messaging, an exchange system, a classifieds advertising system for fostering market activity, privacy protection and, eventually, a cloud storage and application execution platform.
Ultimately, the goal is to protocol-ize everything, and write the code in a clear, understandable way that minimises the cost of maintenance, and subjects coders to peer review instead of ignorant managers who only see flashy outputs and don't have any concept of how much more snappy it could be if it were written properly, or how quickly it could be written, if it were not knitted out of obfuscation that would make the bureaucracy of Eastern Europe circa mid-to-late 20th century proud.
Calibrae is a project which intentionally breaks the mould. It implements a cryptocurrency, but without a blockchain ledger. It implements a distributed forum like Steem's, and a chat system, again without a blockchain, and it exploits the extreme resilience of the SporeDB design, which allows the network to function even if only one node has not gone byzantine (erroneous or malicious).
It will also, ultimately, implement its own version of Gitlab, complete with a rewards system related to management functions of controlling branches and merge rights.
It is a very ambitious project, and it is being created by people who are new to the business of software development, because we believe that the software development business has become so encrusted with meaningless rituals called 'conventions' and 'idiom' that it no longer follows the guidance of the relevant fields of Computer Science, such as theory about naming systems, source code layout for readability, documentation, development management, and most importantly, the cutting edge of the database and distributed systems fields.
Calibrae was initiated by Loki, a long time amateur student of computer science who has had many periods of hiatus away from the field for various reasons (mainly mental and physical health, as well as poverty). He brings his wealth of experience in more mundane aspects of life, especially the more private and personal ones, to bear upon what is essentially a total rethink and realignment of the goals of computer science and the higher, more philosophical goals related to fostering social order and boosting the power of arbitration systems, so that vicious monopolistic and oligopolistic camarillas finally have a Nemesis that will take their dirty little games and hurl them into the Abyss.
1.1. Implementation language choice
Calibrae is written in Go, the language from Google, chosen because it has the cleanest language specification and is the most conformant with the latest knowledge from the field of Computer Science. Go has traditionally been associated primarily with writing server software. It is very easy to learn, and it enforces and encourages good coding practices.
1.2. Future plans to fork Golang
It is intended that, down the track, Golang will be forked, and I want to announce on this page that the fork will eventually be called 'g', a language name that has not been taken, unless you consider that case is not important. The small letter G is written differently in formal typefaces, where it resembles an 8, and in Cyrillic the small letter G is like a backwards S. In aid of this project, standard Google Go libraries are being changed for use in Calibrae to eliminate errors in linting conventions and naming schemes that are, not at all ironically, supposedly not 'idiomatic' in Go, such as stutter, and names that are generic and thus meaningless.
1.3. Based on SporeDB
The SporeDB project is being trawled for its excellent protocol designs, and a large part of the Calibrae codebase for the server will be refactored from SporeDB and adapted to the purposes of Calibrae. SporeDB currently lacks a dynamic cluster/swarm membership system, for one thing, and its network protocol library will also be swapped out for a reliable UDP protocol so that it is made more resilient to network partitioning.
Calibrae rejects the model of the blockchain as a mechanism for securing transactions, both because of its lack of ability to prune, and because of the processing intensive process of synchronisation. The various database components in Calibrae will be designed to be separable wherever possible, and the process of synchronisation will involve simply requesting the data from nodes trusted by a node operator, optimised by using BitTorrent-style checksummed blocks to minimise traffic, so that out of date versions of the same data store can be quickly synchronised.
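The checksummed-block idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the block size, the use of SHA-256, and the names `blockHashes` and `staleBlocks` are hypothetical and are not taken from SporeDB or from any Calibrae code.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// blockSize is a hypothetical fixed block length chosen to keep the demo small.
const blockSize = 4

// blockHashes splits data into fixed-size blocks and returns the SHA-256
// checksum of each block, in order.
func blockHashes(data []byte) [][32]byte {
	var hashes [][32]byte
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		hashes = append(hashes, sha256.Sum256(data[start:end]))
	}
	return hashes
}

// staleBlocks returns the indexes of the blocks the local store has to fetch,
// because they are missing locally or differ from the trusted peer's checksum.
func staleBlocks(local, remote [][32]byte) []int {
	var stale []int
	for i, h := range remote {
		if i >= len(local) || local[i] != h {
			stale = append(stale, i)
		}
	}
	return stale
}

func main() {
	local := blockHashes([]byte("aaaabbbbcccc"))
	remote := blockHashes([]byte("aaaaBBBBccccdddd"))
	fmt.Println(staleBlocks(local, remote)) // prints [1 3]
}
```

Only the blocks whose checksums differ need to cross the network, which is what keeps the traffic low when a store is merely out of date.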
1.4. Scalability and Privacy
The network will also have built in sharding logic so that node operators can select which parts of the Calibrae database are being operated by a node, such as languages used in the forum, and nodes that specialise only in certifying the ledger, or even the membership database, or even just function as read-only cache nodes or proxies, which will be programmable to enable various Tor and I2P like location obfuscation/counter traffic analysis methods.
Note that nodes that do not mirror the complete forum database do not therefore calculate rewards from votes on posts, they merely record them.
The ledger will be extended in the future to enable shielded transactions based on the Zcash anonymous ledger scheme as well.
Privacy and security are significant targets for Calibrae, but they are future elements whose integration will be considered in the design and protocols from the very beginning, so that there are no obstacles to adding these features.
1.5. The forum
There will be a forum with a peer-review reward allocation system that functions as the first and primary method of issuing new tokens. It is based on the Steem model, which uses 'Inflation Tax' in a different way to that used by Central Banks and Governments: it allocates the share of the cost of running the network and directs it towards the most ethical and consistent individuals, as a primary filter above all.
Thus, in the implementation of various 'forums' in Calibrae, which will eventually also include a Gitlab like codebase management system for the code of Calibrae itself, which enables directly paying coders for their contributions as well as managing the code effectively without a central management team, the importance of a functioning reputation calculation and modulation algorithm will stand alongside stake (holdings of vested tokens) as equally important factors in the governance of the network.
The forum will also have special 'corporate' user accounts which will enable more centralised management of control of the content, similar to concepts talked about in the Steem system that they are now calling "Communities". This management system will also extend to the messaging system to enable moderated chat systems.
1.6. Reputation systems
Reputation is very much a field-constrained value, and thus for each type of forum there is a separate user reputation score. The voting power is calculated as a combination of vested stake and reputation relative to the highest reputation score in the field. Thus the discussion forum (and chat/messaging systems) have separate reputation scores to the code repository management system, for the simple reason that being highly respected in general does not mean competence in computer programming.
The reputation scores across the language shards of the forum will not be separated, however, because despite cultural differences, in general the language being used is irrelevant to the matter of the regard of the group towards individuals.
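A minimal sketch of per-field voting power might look like the following. The text says only that voting power combines vested stake with reputation relative to the highest score in the field; the multiplicative formula here is an assumption, and the type and function names are hypothetical.

```go
package main

import "fmt"

// Field names a reputation domain. The two values mirror the forum and the
// code repository mentioned in the text.
type Field string

const (
	FieldForum Field = "forum"
	FieldCode  Field = "code"
)

// Account holds vested stake and one reputation score per field.
type Account struct {
	Stake      float64
	Reputation map[Field]float64
}

// votingPower scales stake by the account's reputation relative to the
// highest reputation in that field. The multiplicative form is an assumption.
func votingPower(a Account, field Field, highest float64) float64 {
	if highest <= 0 {
		return 0
	}
	return a.Stake * (a.Reputation[field] / highest)
}

func main() {
	a := Account{
		Stake:      1000,
		Reputation: map[Field]float64{FieldForum: 50, FieldCode: 25},
	}
	fmt.Println(votingPower(a, FieldForum, 100)) // prints 500
	fmt.Println(votingPower(a, FieldCode, 100))  // prints 250
}
```

The same account carries different weight in each field, which is the point of keeping the scores separate.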
Users will have configurable filters that they can use to decide based on reputation whether they want to see a user, but also, when a user is being harassed by another, they can 'shun' this user, which will make the database know not to retrieve data related to the shunned user when fulfilling queries.
The shunning is recorded openly on the database, and has a reputation diminishment effect on the target, based on the voting power of the shunner.
There is no corresponding reputation boost based on following, because this is an empty activity that should not improve reputation by itself, but rather, the interest shown by upvoting a user's content is the only way to increase reputation score.
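The query-side effect of shunning could be sketched as below. This covers only the filtering of results for the shunning user; the reputation diminishment is left out because the text does not give a formula for it. `ShunList`, `Post` and the method names are hypothetical.

```go
package main

import "fmt"

// Post is a minimal stand-in for a forum record.
type Post struct {
	Author string
	Body   string
}

// ShunList records, for each user, the set of accounts that user has shunned.
type ShunList map[string]map[string]bool

// Shun records that user no longer wants data related to target.
func (s ShunList) Shun(user, target string) {
	if s[user] == nil {
		s[user] = make(map[string]bool)
	}
	s[user][target] = true
}

// Visible returns the posts whose author the reader has not shunned.
func (s ShunList) Visible(reader string, posts []Post) []Post {
	var out []Post
	for _, p := range posts {
		if !s[reader][p.Author] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	shuns := ShunList{}
	shuns.Shun("alice", "troll")
	posts := []Post{{"bob", "hello"}, {"troll", "spam"}}
	fmt.Println(len(shuns.Visible("alice", posts))) // prints 1
	fmt.Println(len(shuns.Visible("bob", posts)))   // prints 2
}
```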
1.7. Messaging system
Calibrae is VERY ambitious - we intend to replace the many, sometimes clunky, sometimes insecure short (instant message) and long (email) human-to-human messaging systems. The rewards and up/down voting system will be implemented both in moderated, and potentially thereby encrypted, multi user messaging systems, and in transparent, public messaging systems. In the latter, schemes similar to the voting system, based on user account reputations in the forum, will allow client-side moderation based on Web of Trust style subjective reputation as well as the global forum reputation system, at the user's option.
1.8. Media and other file hosting
Initially there will be no method of distributing the hosting of files, but eventually a special URL format and bittorrent-style distribution system will allow nodes to cache files uploaded by users, in a way that allows resilience against network partitioning (disconnection of segments of the network) or attempts at censoring media file content.
There is a lot of work being done in this field by projects like MaidSafe and Storj and a mushrooming number of competing projects like Sia and others.
In the forum, these objects will also be subject to voting schemes, as ultimately every media file is at first metadata, such as a summary, and the file object is then distributed and identified by its hash and by human-readable identifiers based on uploading user, tags, and so on, using various kinds of database search algorithms.
In the process of reading and considering the foregoing general conceptual map of what Calibrae will be about, and looking at the codebase of SporeDB, I am slowly building up what are the necessary first steps and highest level abstractions in the architecture.
2.1. Trust and Reputation
The first thing that immediately stands out is that SporeDB uses a manual Web of Trust system similar to that used in PGP. This is quite labor intensive for manual assignment of trust levels, and does not lend itself to the formation of reputations simply.
This is very problematic if it is intended that a large proportion of users be running servers. Managing a Web of Trust with hundreds if not thousands of server accounts to check is an impossible task.
2.1.1. Accounts database is the primary object
So, first thing that has to be said is that the account database comes before anything else. The account database also includes the swarm membership database system, which keeps track of which nodes are up, which are down, who owns them, and what their reputation is.
2.1.2. Server operator reputation incentives
The way that the reputation system is integrated into the trust system is that nodes have a number of procedures they use to alter their reputation.
2.1.2.1. Liveness bonus
When a server is up, this fact is recorded in the swarm membership database, and this applies a small boost to the reputation, of 1% of the user's base reputation acquired through other means (though initially, reputation comes only from the act of having a live server online). This is an incentive to keep the server up, because as the user builds reputation, having their node live boosts that reputation further, which reduces the limiter on their stake in calculating voting power.
2.1.2.2. Application authenticity and currency
When the web application service is added, a current consensus that the node is serving up a canonical, and ideally current, version of the app is another reputation booster.
2.1.2.3. Forum participation
When the forum is running, the posts made by the server operator will also contribute to the development of their reputation score. Presumably part of the activity of a server node operator on the forum will relate to their service, as well as ancillary projects that develop the utility of the network, both directly as developers and indirectly in developing ancillary services. Because the early userbase will be dominated by server operators, it is likely that over time server operators will have a top tier position in the reputation system, and so they should.
The result of misbehaviour in the forum will be the diminishment of their effective position in the hierarchy of server operators, ranked by their reputations. Popularity is the usual metric for reputation growth of other types of user accounts, but for server operators it has to do with how faithful the operator is to the seriousness of their task, and their dedication to the development of the network, improving its stability and utility, and thereby attracting more users to form the constituency and to buy in and thus raise the exchange value of the tokens.
2.2. Schema of the accounts database
2.2.1. Human readable identifier code
The account creation process pulls a random value from the system entropy gatherer consisting of 20 bits (1048576 possibilities). The system will cap the values at 999,999, so they stay within the constraint of a 6 digit number, and reroll if a value above this size appears.
The second component is user chosen. This consists of a 3-6 character alphabetical prefix code, such as 'ani', 'chaos', 'loki', 'andy', 'brett', 'andrew', and so on. It is not important exactly what is chosen, but if there is an existing account with the same prefix and suffix, the 6 digit suffix will be re-rolled.
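The two components described above could be generated along these lines. This is a sketch under assumptions: restricting the prefix to ASCII letters is a guess at what "alphabetical" means, and `taken` is a hypothetical stand-in for a lookup in the accounts database. The sketch returns the suffix on its own and does not assert how the combined identifier is written.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
)

// randomSuffix draws 20 bits from the system entropy source and rerolls
// until the value fits in six decimal digits (0 to 999,999).
func randomSuffix() (uint32, error) {
	var buf [4]byte
	for {
		if _, err := rand.Read(buf[:]); err != nil {
			return 0, err
		}
		v := binary.BigEndian.Uint32(buf[:]) & 0xFFFFF // keep the low 20 bits
		if v <= 999999 {
			return v, nil
		}
	}
}

// validPrefix reports whether p is 3 to 6 ASCII letters.
// ASCII-only is an assumption made for this sketch.
func validPrefix(p string) bool {
	if len(p) < 3 || len(p) > 6 {
		return false
	}
	for _, r := range p {
		if !(r >= 'a' && r <= 'z') && !(r >= 'A' && r <= 'Z') {
			return false
		}
	}
	return true
}

// newIdentifier rerolls the suffix until the (prefix, suffix) pair is unused.
func newIdentifier(prefix string, taken func(string, uint32) bool) (uint32, error) {
	if !validPrefix(prefix) {
		return 0, errors.New("prefix must be 3 to 6 letters")
	}
	for {
		s, err := randomSuffix()
		if err != nil {
			return 0, err
		}
		if !taken(prefix, s) {
			return s, nil
		}
	}
}

func main() {
	suffix, err := newIdentifier("loki", func(string, uint32) bool { return false })
	if err != nil {
		panic(err)
	}
	fmt.Println("prefix:", "loki", "suffix:", suffix)
}
```

Rerolling instead of reducing the 20-bit value modulo 1,000,000 keeps every suffix equally likely.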
The reason for using these identifiers, which will be universal for all users, is that they are like car registration plates or passport/licence codes. They can be remembered relatively easily by humans. These are the primary identifiers of accounts, and will be written in this format:
By making these codes human-readable and memorable, it enables users to identify other users more easily. This value is immutable and cannot be changed.
It is extremely unlikely that a prefix of at most 6 characters and a 6 digit decimal suffix will exhaust reasonable choices of memorable prefixes, since there are currently only about 6 billion people, meaning 6000 distinct prefix codes would have to be fully used to exhaust it. With the wide variety of languages and their associated names, even if we constrain this to people choosing names, there are still likely over 5000 names to every language, and then we can add another 40-50,000 words that are not names.
This gives a total of 178,867,906,207,629,399,300,243,456 possible identifiers, all of which will be short enough to be memorable by humans. It is more humanised and friendly than a purely numeric code, and by giving the user the choice of the first 6 characters of the code it gives users some choice as to their immutable identifier, in human language.
2.2.2. Mutable username
Next value in the schema for user accounts is an arbitrary, up to 32 character long, case sensitive, space containing value, which works as the user's 'handle'.
Combined with the date of a post this acts as a partial unique key, which can easily narrow down the account in question to a small enough set that the previous identifier can easily distinguish between accounts. Quite often it will be as distinctive as the unique identifier itself, but if multiple identical mutable usernames come up in a search result, then the users can refer to a post and complete the uniqueness constraint.
A similar account system is used in the chat system Discord, where users create an original unique identifier, but can assign display values that differ. Where identical usernames appear within a single scope, such as any given result of a query of posts in the forum, like a trending list, or the comment thread under a post, the client application will render the identifier code prominently, directly after the username, to highlight the fact that multiple users are using the same handle in the list.
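The client-side rendering rule could be sketched as follows. The `User` type, the bracketed label layout and the placeholder identifier strings are all hypothetical; the text specifies only that the identifier code is shown after the handle when handles collide within one list.

```go
package main

import "fmt"

// User pairs the immutable identifier code with the mutable handle.
type User struct {
	ID     string // identifier code; placeholder values are used in this demo
	Handle string
}

// displayNames returns one label per entry. When a handle is shared by more
// than one distinct account in the list, the identifier is appended.
func displayNames(users []User) []string {
	ids := make(map[string]map[string]bool)
	for _, u := range users {
		if ids[u.Handle] == nil {
			ids[u.Handle] = make(map[string]bool)
		}
		ids[u.Handle][u.ID] = true
	}
	labels := make([]string, len(users))
	for i, u := range users {
		if len(ids[u.Handle]) > 1 {
			labels[i] = fmt.Sprintf("%s [%s]", u.Handle, u.ID)
		} else {
			labels[i] = u.Handle
		}
	}
	return labels
}

func main() {
	thread := []User{
		{ID: "id-1", Handle: "Satoshi"},
		{ID: "id-2", Handle: "Satoshi"},
		{ID: "id-3", Handle: "Loki"},
	}
	for _, label := range displayNames(thread) {
		fmt.Println(label)
	}
}
```

Counting distinct accounts per handle, and not occurrences, means one user commenting twice in a thread does not trigger the disambiguation.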
Since the handles have no limits except no double spaces, prefix spaces or terminal spaces, and can use any other printable character, there is no real reason why people should choose identical handles. The principle here is much like the assignment of names between humans. Individual names can collide, and often do, but very rarely does the combination of first name and last name, or even first name and birthplace, or other variants. These are ancient database schemes that humans have been using for millennia.
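The handle constraints stated in this section can be checked with a short function like this one. Counting the 32 character limit in Unicode code points and rejecting empty handles are assumptions of the sketch; `validHandle` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// validHandle checks the constraints from the text: at most 32 characters,
// no leading or trailing space, no double spaces, printable characters only.
func validHandle(h string) bool {
	if !utf8.ValidString(h) {
		return false
	}
	n := utf8.RuneCountInString(h)
	if n == 0 || n > 32 {
		return false
	}
	if strings.HasPrefix(h, " ") || strings.HasSuffix(h, " ") || strings.Contains(h, "  ") {
		return false
	}
	for _, r := range h {
		if !unicode.IsPrint(r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(validHandle("Loki of Asgard")) // prints true
	fmt.Println(validHandle(" leading space")) // prints false
	fmt.Println(validHandle("double  space"))  // prints false
}
```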
In fact, the use of birthdates as unique key components by government databases is quite stupid. This is why it is so easy to steal government issued identities. This system combines a potential 178 SEPTILLION possibilities just in the unique identifier, and then uses 2^256 possibilities for the secret identifier.
2.2.3. Public Key (and private key)
The public key is a secondary identifier. It will also be checked for uniqueness, but will likely never collide, which is why keys of this size, based on Elliptic Curves, have become a common method in cryptography.
A 256 bit elliptic curve key also has a word code consisting of 12 words. When creating the code, the user should have the option of choosing the words used, and suggestions made when the choice falls outside of the constraints. These word codes usually consist of a specific database where each word forms a hash based on its sequence of letters, forming the string, and usually eliminate the use of short words like articles, prepositions, and the like. They are shorter than a haiku, and by giving users the choice to design their word code, the security of the login is dramatically increased, since it is thus ensured strongly against loss, a factor in cryptographic security that is often forgotten by mathematically minded programmers.
This word code then derives the secret key, which then derives the public key. The public key is stored in the database, and the private key should never be stored on accessible networked devices, outside of a memory protection system, which should automatically lock after some reasonable time period, between 5 minutes and 1 hour, depending on the user's preference. Possibly a simple hardware solution can be created that enables the user to plug in, and then immediately unplug a flash memory device, that contains the key, protected by a short password, for security.
But fundamentally, it is 256 bits of EC private key, which can be used to test the identity of the account.
For encryption, this key is used to certify the identity of the counterparty, and used to encrypt the symmetric key that is used for the message. For longer amounts of data, a Diffie Hellman PFS rekeying system will be used to break up the data to avoid patterns forming in the payload data.
2.2.4. Arbitrary Secondary data
This is a feature that depends on external verification methods, whether by checking an external account capable of posting permalinked content, or the use of foreign key verification methods as used in cryptocurrencies for wallet addresses.
This can include external account verifications. To do this, there would be a format including the account name and a permalink verifiably linked to the account name. A message is posted by this account, at the permalink, which bears a signature from the user's current or past public key.
When these keys are changed, the user should also have to create a new entry with a new signature to keep the external accounts verification data consistent and up to date, as they will flag as invalid once the signing key expires.
The purpose of adding this data to the database includes the ability to ensure when another user is making dealings with this user, they can inspect external information to further certify trust.
This can include asserting control of a particular cryptocurrency address, by signing a message in this field with the private key of the wallet address corresponding to the public key. The message would need to be a random string of at least 256 bytes of data, likely encoded in base58, including the public key as the prefix.
2.3. Swarm Membership Database
The next element, subordinate to and dependent upon the prior database, is the swarm membership system. This consists of transient, expiring service advertisements, and each node updates the last seen field, which keeps the node live. These updates are also propagated as the last step in processing requests from another node. The node itself monitors the knowledge of its live status coming in from other nodes, and knows that, if it is otherwise idle, it must update this before the timeout (Time to Live) of its swarm member node record.
2.3.1. Swarm Membership Database Schema
2.3.1.1. Service Advertisement
The server will periodically publish 'heartbeat' service advertisements with a prescribed TTL. These records are transient, and can be augmented with a 'Last Seen' record, which is a signed declaration that the node was last communicated with at a certain time. However, even if the node is seen by other nodes, it still must update this advertisement with a new signed broadcast saying it is still live.
The service advertisement also includes the operational mode of the node. It corresponds to the execMode of the node, which is an immutable configuration at each launch. This can be one of router, cache, witness, ledger, mirror and RPC.
Firstly, all nodes store the account database. They need this to have access to the keys of other accounts so they know who is where and how to encrypt messages to them.
Encryption is performed at the protocol level, and requires the public keys of recipients. It does not depend on the use of a secondary encryption protocol. This is because the use cases of SSL/TLS encryption do not apply to the Reliable UDP protocol that is used by all nodes to pass information. The reliable UDP protocol is enhanced with a DH PFS key sharing algorithm that is triggered every 256 messages passed, by the protocol buffer layer.
The reason for doing it this way is that Routers will also have a function to relay a message, which will enable the later implementation of an onion routing protocol to hide the location of the client from nodes. Regardless of this, the network will have a 5 minute mutex against transactions of an account originating somewhere else. This does not reveal the location of the client sending the message; it only mandates that they can change their relay point only every 5 minutes or so.
Routers just pass messages around and that's all they do. These messages can include relay requests, which can bear encrypted payloads that can be relay requests for other nodes. Thus, it is built into the system that it is possible to create a Tor like relay system.
Caches only store data for clients querying them, they do not store the whole database. These are used by client applications, that do not pass through the web application interface. They can be configured to subscribe to accounts or posts in the forum as well.
Witnesses only certify transactions and pass them around. They store the whole database but they will not respond to requests to replicate the whole database.
Ledger nodes only store the financial ledger of the transferable tokens of the system, as well as the accounts database. This type of node does not concern itself with any other type of transaction. The additions to the accounts from forum rewards are akin to newly mined blocks; the authority to mint is built into this via the authority from the rewards pool, which is a consensus, deterministic system.
Mirrors only provide a subset of the database. They must hold the Ledger and Account database, the latter as all others must, but in the Forums they can select a set of forums they mirror. They advertise which forums they are interested in, and other nodes thereby know which data they are not interested in knowing about. They can be requested to replicate part of the database for other nodes that need to synchronise.
RPC nodes are full nodes, and can certify and provide answers to queries about the contents of the whole database, including requests to mirror it. The most important distinction between them and witnesses is in their response to queries, which they share in common with Mirrors, in their more limited capacity. The other thing that they do is serve up the web application, and check up on other RPC nodes that they are serving the correct application (which they will do through external proxy relay systems to prevent those being checked from knowing that it's just a check).
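The six operational modes could be represented as a small enumeration, as in the sketch below. The mode names come from the text. The `ExecMode` type, the parser, and the two helper methods are hypothetical, and they encode only what the descriptions above state directly: witnesses and RPC nodes hold the whole database, and mirrors and RPC nodes answer replication requests.

```go
package main

import (
	"fmt"
	"strings"
)

// ExecMode is the operational mode a node is launched with.
type ExecMode int

const (
	Router ExecMode = iota
	Cache
	Witness
	Ledger
	Mirror
	RPC
)

var modeNames = []string{"router", "cache", "witness", "ledger", "mirror", "rpc"}

// String returns the lowercase mode name.
func (m ExecMode) String() string {
	if m < 0 || int(m) >= len(modeNames) {
		return "unknown"
	}
	return modeNames[m]
}

// ParseExecMode converts a configuration string into an ExecMode.
func ParseExecMode(s string) (ExecMode, error) {
	for i, name := range modeNames {
		if strings.EqualFold(s, name) {
			return ExecMode(i), nil
		}
	}
	return 0, fmt.Errorf("unknown execMode %q", s)
}

// StoresWholeDatabase is true for witnesses and RPC (full) nodes.
func (m ExecMode) StoresWholeDatabase() bool { return m == Witness || m == RPC }

// ServesReplication is true for mirrors (in part) and RPC nodes (in whole).
func (m ExecMode) ServesReplication() bool { return m == Mirror || m == RPC }

func main() {
	m, err := ParseExecMode("witness")
	if err != nil {
		panic(err)
	}
	fmt.Println(m, m.StoresWholeDatabase(), m.ServesReplication()) // prints witness true false
}
```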
2.3.1.2. Last Seen
Heartbeat liveness systems are weak by themselves, thus the need for the secondary record. The secondary records that can be created by other nodes augment the primary advertisement, but do not entirely replace it. A node may miss the expiry deadline of its advertisement, yet still have a Last Seen that is newer than this. This field is a subtable or attached list to the first field.