SQL Naming Conventions
Naming Conventions
These conventions will be rigorously enforced without exceptions allowed for any reason. Whilst experts frequently disagree on the best naming convention they almost always agree that any naming convention is better than none and that a strictly enforced naming convention is better than a loosely enforced one.
For readabiltiy reasons, add an underscore between words. If you run things together, you can't tell the difference between "API rate sheet" and "a pirate sheet". Never start or end a word with an underscore.
Never use underscores for a naming convention. Think of a word to describe what the thing is. For example pod_id_member_event_relationship_id is not as good as handshake. If you were to use underscores in a naming convention it would create confusion as to which parts of the the name are the convention and which are for readibility.
All primary keys should be named tablename_id. All foreign keys should also be labelled foreigntable_id. This means that a column named person_id_id will contain a person_id ID wherever it appears in the schema which enables you to simplify joins by using the USING keyword as well as simplifying application code by removing ambiguity.
If a table has a single-column primary key, such as an auto-increment column, then that column is named the same thing as the table followed by the characters _id. For example, the user table's primary key is called user. Naming it id or user or anything else may be logical and consistent too, but in my experience it leads to a lot more code. If this doesn't feel natural, maybe you should consider the naming of your table. What if the thing you'd call table_id would naturally have a name, for example a person_id_id is usually called a username, in that case is it OK to break the naming convention? No. It is more important that the naming is consistent than that it feels totally natural in the english language. Bending to English grammer or English naming conventions simply makes the code more complex.
If a column expresses a relationship_id among tables, name it the same as the related table but add the characters _id. For example, a table of blog posts should have a column called user_id which is a foreign key to the user table's user_id field. This convention ensures that primary key and foreign key names are unique across the schema which in turn allows you to simplify the writing of joins by using the USING keyword, and ensures the same variable name in your application code always refers to the same thing.
Use a natural primary key if the primary key is very unlikely to change, otherwise use an autoincrementing ID. If you wouldn't use the column value in part of a URL, don't use it as a primary key.
Never use SQL or Python reserved words in table or column names, think of something else. For example, group could be pod_id, date could be posted, occurred, saved etc. You can often use the name of an action rather of a thing.
All table, column, view, function and other schema objects must be all lowercase. This helps with consistency, readibility and type-ability
SQL keywords should be in UPPERCASE to make it easier to differentiate them when reading SQL
All SQL should be written fully expanded to start with, it saves time in the long run. Structure anything other than the main keywords by indenting four spaces. Every entry gets its own line unless there is only one entry in which case it can directly follow the keyword. Nested clauses within the WHERE clause are further indented by four spaces making the grouping easily identifiable. This convention allows parts of the query to be easily commented out. Here's an example:
SELECT foo.foo_id AS something, f.name, b.b_id, b.name, b.description FROM foo AS f JOIN bar AS b USING (foo_id) WHERE f.name = 'some' AND ( b.name LIKE '%fiz%' OR b.name LIKE '%fiz%' ) ORDER BY b.b_idContinuing in this vain you could put each comma in the select list on its own line as well, but that's rather ugly.
Where the same primary key is referenced twice in the same table you have to break the foreign key naming convention. eg if two people are related you might label the columns person_id_inviter_id and person_id_invitee_id so that the person_id bit is what the eye reads first (it is the most important part).
Extra Reasons for Applying These Rules in the Context of a Python Application
- Variables in Python are traditionally lowercase so it makes sense to have table and column names lowercase in the schema so that Python variables can have the same names as the SQL equivalents
- The indentation recommended for SQL is very similar to the Python indentation style I recommend for Python code.
- I always prefer naming things uid rather than id to avoid the collision with the Python builtin function id() but since no column should exist with this convention anyway, I see no problem with table_id rather than table_uid.
Discussion of Primary Key Column Naming Convention
Three choices for the naming convention of primary keys are:
- id/uid/oid
- tablename
- tablename_id
Using id, uid or oid is generally a bad idea because it means that in application code you end up using id as a variable and when you are dealing with multiple tables it quickly becomes very confusing and hard-to-track-down bugs get introduced. Also it is unclear what foreign key columns referencing the ID should be called because the tables containing them will have an id column of their own. Naming conventions end up being introduced such as using tablename_id as the foreign key column name but this has the unpleasant effect that joins end up looking like:
ON table.foreign_id = foreign.id
Using the name of the table for the primary key column at first seems like a very good idea. Any ORM can easily work out the primary key from the table name and almost always the name of the table has some relevance to the primary key anyway. You also get the benefit that JOINS are very easy to write becuase you can use USING rather than ON. The drawback is actually quite subtle. Imagine you have a table named person_id with a primary key named person_id. In your application you are likley to refer to an object representing a row in that table with a person_id variable. At the same time you are likely to name the a variable holding the primary key person_id. If you write functions or methods which take a person_id argument it is then not clear whether a primary key or row object is expected and the consequence is that although the SQL code is clean, the application code becomes confused.
The third choice is to use tablename_id as the primary key column. This has all the advantages of using the table name on its own as the name of the primary key, including the ability to use USING and the clarity with which foreign keys can be named but it doesn't result in the same application variable naming problems. The only drawback is that why referencing the primary key of a row object in application code you'll typically have to write something along the lines of person_id.person_id_id rather than person_id.id. This is clearly the least bad option.
Modifying AuthKit Tables
You are not allowed to. User account management is kept completely isolated from other database tasks, allowing horizontal partitioning of functionality as well as the oppurtunity to switch the authentication back-end to LDAP or Active Directory if necessary.
Use of Triggers
Triggers are not allowed anywhere:
- Everything you use triggers for can be handled equally well in a well-designed application model
- Code in the model API layer can be run on any machine, can scale and will not affect database performance
- It is easier to spot and debug errors if the code is in the application layer since it can easily be logged
- It is easier to code logic in Python than SQL
- You can correct problems without having to take down the database to remove the triggers
- You don't end up calling triggers when you don't need to
- We might want to replicate the database at some point and replication of trigger-based systems is significantly more complex and less reliable
If we were writing a banking application where many different applications modified the data then a trigger-based approach would be better as it would ensure consistency but all database changes come via HTTP through the application layer so this is not a concern for our application. What is more, our model layer exposes an API which can easily be turned into a web service which we could expose so we get the benefits of trigger security with the downsides of complex maintence.
Enums
Enums are all handled in Python code for the following reasons:
- Not all databases support them
- The different database have different implementations
- There is no reason to have them implemented in the database apart from habit
- It doesn't add any complexity to store them in the application and it improves performance