Annotation Database: Difference between revisions
No edit summary |
|||
Line 46: | Line 46: | ||
Some entities, such as Abalone observations, should be hidden from the public. The database includes a system of group-based user access restrictions which apply to every entity in the <code>rov.event</code> table. | Some entities, such as Abalone observations, should be hidden from the public. The database includes a system of group-based user access restrictions which apply to every entity in the <code>rov.event</code> table. | ||
Each user can have zero or more memberships in groups assigned by an administrator, for example, the <code>msea_admin</code> group, which allows a user to use the administration tools on the website. The groups table is inherited from the Django <code>public.auth_group</code> table and modified with a 32-bit binary column. Each group (up to a maximum of 31 groups) has a unique mask, corresponding to the group's ID by the function < | Each user can have zero or more memberships in groups assigned by an administrator, for example, the <code>msea_admin</code> group, which allows a user to use the administration tools on the website. The groups table is inherited from the Django <code>public.auth_group</code> table and modified with a 32-bit binary column. Each group (up to a maximum of 31 groups) has a unique mask, corresponding to the group's ID by the function <code>2^id</code>. So for the ID <code>1</code> the mask value (with extraneous high-order bits elided) is <code>0010 (2)</code>, for ID <code>2</code> the mask is <code>0100 (4)</code>, etc. Every group has the first bit set, so the actual values are <code>0011</code> and <code>0101</code>. | ||
Each row in the <code>rov.event</code> has a corresponding column, <code>restriction_mask</code>, which, when binary [https://en.wikipedia.org/wiki/Bitwise_operation#Bitwise_operators ANDed] with the group mask results in a value that determines whether a user has access rights. By default, every bit in the mask is set, so the restriction mask's default value is <code>1111</code>. ORing this mask with any group mask gives a non-zero result. | Each row in the <code>rov.event</code> has a corresponding column, <code>restriction_mask</code>, which, when binary [https://en.wikipedia.org/wiki/Bitwise_operation#Bitwise_operators ANDed] with the group mask results in a value that determines whether a user has access rights. By default, every bit in the mask is set, so the restriction mask's default value is <code>1111</code>. ORing this mask with any group mask gives a non-zero result. |
Latest revision as of 21:30, 27 June 2025
Software
The database runs on PostgreSQL with the PostGIS spatial extension. The database and Web server (Nginx) are hosted on a cloud-based server running the Debian Linux operating system.
Organisation
There are several databases on the server:
msea
stores the annotation database.specimen
stores the specimen database.pmls
stores the PMLS database.intertidal
stores the intertidal database.
- This wiki resides in the
wiki
schema on in themsea
but should be moved in the future.
Annotation Database Deployment and Upgrades
The annotation database definition consists of a baseline schema dump and a series of versioned upgrade files. The version information is stored in the table public.db_version
.
The db_manager.py
script in the model
folder applies the necessary upgrades. The usage is
python db_manage.py [-p] [-c] <config file>
The -p
switch applies permissions to any new entities (schemas, tables, views, etc.) and the -c
switch commits the changes. Without -c
the script performs a "dry run" to confirm that the updates will complete successfully before permanent upgrades are applied.
The script will read the version table and only apply scripts after the last upgrade version. The version is parsed from the filename itself, which will be in the form, db_<major version>_<minor version>_<patch>_<sequence>_<description>.sql
, where:
major version
is the major version of the database.minor version
is the minor version.patch
is the patch version.sequence
sequence is a number that determines the order in which files for a specific version are applied.description
is a very short string giving a hint about what's in the file. An example of such a file would be:
db_2_1_11_0_status_detail_name.sql
Where the version is 2.1.11
; this file would be applied first in a sequence of files with the same version; and the purpose of the file has something to do with the status_detail
table.
The upgrade script is in the model
folder, and the version files are in model/ddl
.
Design
The database is relational and normalized which maximizes data integrity while minimizing duplication. It was originally designed using the industry-standard Visual Paradigm software, but is now updated by the incremental application of DDL by a script, db_manager.py (see below). The database was developed alongside the Website backend using Django but the Django models do not manage the table structure directly in the usual way: the database is intended to be framework-agnostic, so the Django models must be manually updated to agree with the database structure. Nevertheless, it is sometimes necessary to apply migrations using the Django manager.
Since the database is inherently spatial, the PostGIS spatial extension is installed and all spatial entities are stored using PostGIS geometry columns with the spherical geography type, which handles spatial representation and calculations on the spheroid better than the standard Cartesian geometry type.
Restricted Access to Entities
Some entities, such as Abalone observations, should be hidden from the public. The database includes a system of group-based user access restrictions which apply to every entity in the rov.event
table.
Each user can have zero or more memberships in groups assigned by an administrator, for example, the msea_admin
group, which allows a user to use the administration tools on the website. The groups table is inherited from the Django public.auth_group
table and modified with a 32-bit binary column. Each group (up to a maximum of 31 groups) has a unique mask, corresponding to the group's ID by the function 2^id
. So for the ID 1
the mask value (with extraneous high-order bits elided) is 0010 (2)
, for ID 2
the mask is 0100 (4)
, etc. Every group has the first bit set, so the actual values are 0011
and 0101
.
Each row in the rov.event
has a corresponding column, restriction_mask
, which, when binary ANDed with the group mask results in a value that determines whether a user has access rights. By default, every bit in the mask is set, so the restriction mask's default value is 1111
. ORing this mask with any group mask gives a non-zero result.
An example. Say the user is a member of group A, whose mask is 0011
and group B, whose mask is 1001
. If we OR these masks together, we get 1011
, which is the user's access mask. Say the user would like to view a record which only allows access to members of group C, whose mask is 0101
. Therefore, the record's restriction mask is set to 0100
. If we AND the user's mask and the restriction mask, we get,
1011 AND 0100 -------- = 0000
Since the result is zero, the user has no access. However, if the record's mask allows access to members of group B, the calculation is,
1011 AND 0010 -------- = 0010
Since this mask is non-zero, the user has access to the record.
This system is much more efficient than a purely relational solution (i.e., linked tables), with little impact on queries. However, the masks must be re-compiled whenever the restrictions are changed, or groups or events are added. The masks are compiled using the procedure compile_restriction_masks(boolean)
. The boolean argument, if true
forces the recompilation of all masks, otherwise only the new masks (which default to zero) are compiled.
Connecting
There are a variety of scripts for accessing, importing and managing data, in addition to the Website itself, which is based on Django. All of these load connection parameters from configuration files in the configs
directory in the user's home directory, whether on Windows or Linux. Typically, the configuration files are accessed by a path similar to ~/configs/db_prod.conf
, were ~
is the standard shorthand for the user's home directory and db_prod.conf
is the file containing connection parameters for the production database's non-administrative user. A standard set of files in the configs
might look like:
~/configs/db_prod.conf
-- The production database non-administrative user. Used by the Website to access data.~/configs/db_prod_admin.conf
-- The production database administrative user. Used bydb_manager.py
and other scripts that can change the structure of the database or its contents.~/configs/db_stage.conf
-- Similar todb_prod.conf
but used for the staging database.~/configs/db_stage_admin.conf
-- Similar todb_prod_admin.conf
but used for the staging database.~/configs/db_dev.conf
-- Used on the developer's local machine for access to a development database.
The configuration files will contain colon-delimited pairs of strings on each line:
host:localhost port:5432 dbname:[database name] username:[user name] password:[password]
The database name will be one of msea_dev
for the local development database (though the developer can choose any name); msea
for the production database; or msea_stage
for the staging database.
Upgrades
As above, the database is upgraded by running a script, which applies the necessary DDL files.
With no switches, the script will perform a dry run, applying the DDL to the database but not committing the changes. If it succeeds, the upgrade can be applied permanently by using the -c
switch. The script checks a table to determine the current database version and applies upgrades from that version forward (the versions are encoded in the DDL file names). If the version is given as an argument, upgrades are applied only up to that version (inclusive). The final argument to the script is the path to a configuration file which contains colon-delimited database connection parameters.
An example invocation of the script for a dry run is,
./db_manager.py ~/configs/db_prod_admin.conf
To apply the changes permanently, one would invoke,
./db_manager.py -c -p ~/configs/db_prod_admin.conf
In both of these invocations, the configuration file is for the production database with admin permissions (which are required to alter the structure). The -c
flag commits the changes and the -p
flag applies the correct permissions to any created entities.
Backups
A cron job runs a backup on the production database every night, saving a compressed dump to a designated folder. The backup is actually performed by the db_maintenance.sh script, which is installed by the database install script, install.sh. More information can be found on the Deployment page.
Entity Documentation
Full database documentation, generated from the production database, can be found here: Database Entity Documentation.
Helpful Database Tools
A few tools that can be useful for navigating and visualising the database: