Annotation Database Import App
Latest revision as of 16:30, 10 July 2024
Data for import is produced by principal investigators, annotators and NDST. As a final step, the importer will combine all of this data into a single importable dataset.
There are three steps for this final import process.
Downloading Import Data
The database import queue page displays lists of jobs configured by principal investigators, annotators and NDST using their respective tools. You will see a table containing jobs for Principal Investigators and Annotators. Click the download button beside each one to download a JSON file containing the job data for the cruise you wish to import.
The NDST section has a button labeled "Refresh from NDST Database". This will load the data from the Dive Logging App into a transitional table in the main database. If new equipment or personnel have been created in the Dive Logging App, a table for each will appear, allowing you to map the new equipment onto an extant piece of hardware in the database, or the new person onto an entry in the personnel table. If the person or equipment haven't been created yet, you can create them by clicking the "+" button.
Below this, you will find a table marked "Current Cruises" containing complete metadata for the cruise, dives, transects, personnel and equipment configurations. Download the file for the cruise you'll be importing.
The Import Program
Cruises can be imported on the command line using the `importer.py` program, or with the graphical utility by passing the `-g` switch. The program lets the user load data into the Development, Staging and Production environments using a loaded configuration file, and select which types of entities are to be dropped and/or (re)created.
The import process is much faster when run on the server using the command-line interface, but note the Access database limitation described under Importing on Linux.
The program provides a dry-run feature, so the import can be checked for validity before it is committed.
Import Program Algorithm
The import program proceeds through these steps:
- Load the handlers.
- Load the main configuration file.
- Assemble a list of required steps based on user inputs, such as which entities to replace and/or load.
- Load the import job configuration for principal investigators.
- Create and configure the cruise entity to which all other entities are attached.
- Load the import job configuration for annotators.
- Configure the label tree that will be used to map incoming annotation labels to database entities.
- Create and/or configure the annotation protocol used for annotations represented by the label tree.
- Load the personnel mapping for Biigle annotators.
- Load the import job configuration for NDST.
- Update the cruise with additional metadata.
- Create or update the dives.
- Create or update the transects.
- Create or update the platform (ship, submersible) and equipment configurations.
- Create or update the personnel roles on the cruise, dives and transects.
- Generate the interval trees used in subsequent phases to locate, by temporal correspondence, the dives and transects during which observations occurred.
- Resolve the label map -- loads the necessary lookups to satisfy the references in mapped labels.
- Process Biigle annotations.
- Process VideoMiner annotations.
- Process CSV annotations.
- Process data streams (navigation, telemetry, water properties, etc.).
- Process comment events from CSV files.
- Process status events from CSV files.
- Process measurement events from CSV files.
- Commit or roll back changes, depending on whether the dry-run option is selected.
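The commit-or-rollback behaviour of the final step can be sketched as follows. This is an illustrative sketch only; the class and method names are assumptions, not the importer's actual API:

```python
class ImportRun:
    """Sketch of the dry-run commit/rollback control flow (illustrative, not the real API)."""

    def __init__(self, dry_run=True):
        self.dry_run = dry_run
        self.steps = []

    def add_step(self, fn):
        # Each step is one phase of the algorithm above (load configs, create dives, ...).
        self.steps.append(fn)

    def run(self, tx):
        # Execute every step inside a single transaction.
        try:
            for fn in self.steps:
                fn()
        except Exception:
            tx.rollback()
            raise
        # A dry run validates the whole import, then discards the changes.
        if self.dry_run:
            tx.rollback()
        else:
            tx.commit()
```

Because every phase runs inside one transaction, a dry run exercises the full pipeline and still leaves the database untouched.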
Configuration Files
The import program uses JSON configuration files. The main configuration file is named `config.json` by convention.
Main Configuration File
The main config file provides some configuration values, and also provides links to other configuration files, which load and configure specific chunks of importable data. This is the file that is selected for loading in the import program.
Name | Type | Required | Description |
---|---|---|---|
iqa_file | string | Yes, if there are annotations. | The configuration file generated by the Annotation Database Import for Annotators app. |
iqpi_file | string | Yes | The configuration file generated by the Annotation Database Import for Principal Investigators app. |
ndst_file | string | Yes | The configuration file generated from the NDST Dive Logging App. |
biigle_configs | string[] | Yes, if there are Biigle annotations | One or more files containing configurations for annotations in Biigle format. |
vm_configs | string[] | Yes, if there are VideoMiner annotations | One or more files containing configurations for annotations in VideoMiner format. |
csv_configs | string[] | Yes, if there are CSV annotations | One or more files containing configurations for annotations in CSV format. |
status_configs | string[] | Yes, if there are status events | One or more files containing configurations for status events in CSV format. |
measurement_configs | string[] | Yes, if there are measurement events | One or more files containing configurations for measurement events in CSV format. |
stream_configs | string[] | Yes, if there are navigation or telemetry streams | One or more files containing configurations for data streams in CSV format. |
comment_configs | string[] | Yes, if there are comments | One or more files containing configurations for comments in CSV format. |
media_path | string | No | The root URL for a site to which relative media paths can be appended to access the media. |
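Assembled from the table above, a `config.json` might look like the following; all filenames and the media URL are illustrative:

```json
{
  "iqa_file": "annotator_job.json",
  "iqpi_file": "pi_job.json",
  "ndst_file": "ndst_cruise.json",
  "biigle_configs": ["biigle_dive1.json"],
  "vm_configs": ["videominer_dive2.json"],
  "csv_configs": ["csv_annotations.json"],
  "status_configs": ["status_events.json"],
  "measurement_configs": ["measurements.json"],
  "stream_configs": ["nav_stream.json"],
  "comment_configs": ["comments.json"],
  "media_path": "https://example.org/media"
}
```

Optional sections (for example `vm_configs` when there are no VideoMiner annotations) can simply be omitted.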
Principal Investigator Import Job Configuration
The Principal Investigator's import job configuration is generated by the Web app using inputs provided by the operator, and is not manually created or edited. The configuration object is contained in a top-level `result` property. Unimportant properties are excluded.
Name | Type | Required | Description |
---|---|---|---|
id | integer | True | The database ID of the job. |
mseauser | object | True | Contains the user's Biigle credentials and other information. |
cruise | object | True | Contains information about the cruise and subsidiary objects. |
The `cruise` object has these properties:

Name | Type | Required | Description |
---|---|---|---|
id | integer | True | The database ID of the cruise. |
programs | object[] | True | Contains a list of associated programs. |
first_nation_contacts | object[] | True | Contains a list of First Nation contacts related to the cruise. |
dives | object[] | True | Contains a list of dives. |
Each entry in `dives` is a dive object:

Name | Type | Required | Description |
---|---|---|---|
id | integer | True | The database ID of the dive. |
name | string | True | The name of the dive. |
start_time | datetime | True | The start time of the dive. |
end_time | datetime | True | The end time of the dive. |
objective | string | False | The objective of the dive. |
summary | string | False | A summary of the dive. |
note | string | False | Notes about the dive. |
cruise | integer | True | The database ID of the cruise. |
sub_config | object | True | Contains the platform configuration for the dive. |
ship_config | object | True | Contains the platform configuration for the ship. |
site | object | True | Contains an object which represents the survey site. |
transects | object[] | True | Contains a list of transects. |
crew | object[] | True | Contains a list of crew members. |
Each entry in `transects` is a transect object:

Name | Type | Required | Description |
---|---|---|---|
id | integer | True | The database ID of the transect. |
name | string | True | The name of the transect. |
start_time | datetime | True | The start time of the transect. |
end_time | datetime | True | The end time of the transect. |
objective | string | False | The objective of the transect. |
summary | string | False | A summary of the transect. |
note | string | False | Notes about the transect. |
Each entry in `crew` is a crew item:

Name | Type | Required | Description |
---|---|---|---|
id | integer | True | The database ID of the crew item. |
dive | object | True | An object representing the dive. |
person | object | True | An object representing the crew member. |
dive_role | object | True | An object representing the crew member's role. |
note | string | False | A note about this crew member. |
`sub_config` and `ship_config` are platform configuration objects:

Name | Type | Required | Description |
---|---|---|---|
id | integer | True | The database ID of the platform configuration. |
platform | object | True | An object representing the platform. |
instrument_configs | object[] | True | A list of instrument configurations. |
configuration | object | False | A free-form JSON object containing configured properties of the platform. |
Each entry in `instrument_configs` is an instrument configuration object:

Name | Type | Required | Description |
---|---|---|---|
id | integer | True | The database ID of the instrument configuration. |
instrument | object | True | An object representing the instrument. |
configuration | object | False | A free-form JSON object containing configured properties of the instrument. |
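Assembled from the tables above, a heavily abbreviated job file might look like the following; all values are illustrative:

```json
{
  "result": {
    "id": 12,
    "mseauser": { "biigle_credentials": "..." },
    "cruise": {
      "id": 34,
      "programs": [],
      "first_nation_contacts": [],
      "dives": [
        {
          "id": 56,
          "name": "Dive 1",
          "start_time": "2024-07-01T14:00:00Z",
          "end_time": "2024-07-01T16:30:00Z",
          "cruise": 34,
          "sub_config": { "id": 7, "platform": {}, "instrument_configs": [] },
          "ship_config": { "id": 8, "platform": {}, "instrument_configs": [] },
          "site": {},
          "transects": [
            {
              "id": 90,
              "name": "T1",
              "start_time": "2024-07-01T14:10:00Z",
              "end_time": "2024-07-01T14:40:00Z"
            }
          ],
          "crew": [
            { "id": 1, "dive": {}, "person": {}, "dive_role": {} }
          ]
        }
      ]
    }
  }
}
```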
Annotator Import Job Configuration
NDST Import Job Configuration
Biigle Annotation Configuration
VideoMiner Annotation Configuration
VideoMiner data is usually stored in Microsoft Access databases. In the standard layout, data are stored in the `data` table, with lookups in tables prefixed `lu_*`. Scientists frequently change the structure of the database, add or remove items from the lookups, or remove the lookups altogether, which makes reconstructing the dataset very difficult. Older versions of VideoMiner use different naming constructions.
The VideoMiner configuration file has the following fields. (TBD)
Importing on Linux
A reliable Access driver doesn't currently exist for Linux, but Access databases may be converted to SQLite using DBeaver. In this case the configuration is identical, but `db_file` must point to a file with the extension `.sqlite`.
Note: When converting timestamps using DBeaver, it may be necessary to set the `timezone_offset` property in the configuration, even though it may not be necessary when using the Access file directly (SQLite does not preserve the timezone offset). Also note that some VideoMiner databases place the date and time in separate columns; this cannot work once the fields are converted to integer timestamps, so they should be combined into a single column before conversion.
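As a sketch of the column-combining step, assuming a SQLite copy with a `data` table and separate `date` and `time` text columns (the actual table and column names vary by database):

```python
import sqlite3


def combine_date_time(con, table="data"):
    """Add a combined timestamp column built from separate date and time columns.

    Illustrative sketch: assumes text columns named "date" and "time";
    adjust the names to match the actual VideoMiner schema.
    """
    con.execute(f"ALTER TABLE {table} ADD COLUMN timestamp TEXT")
    # SQLite's || operator concatenates strings.
    con.execute(f"UPDATE {table} SET timestamp = date || ' ' || time")
    con.commit()
```

Run this on the converted SQLite file before the import, so the importer sees a single timestamp column.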
CSV Annotation Configuration
Video and photo annotations can be stored in CSV files (as opposed to VideoMiner, Biigle, etc.) which may not have a formal structure. They can be converted into label trees and mapped using the Label Mapping app. The annotation file and label tree can then be loaded using the `csv_configs` section.
Name | Type | Required | Description |
---|---|---|---|
db_file | string | Yes | The name of a file containing observations (of habitat, species, etc.) which have been mapped using the Label Mapping app. The mappings are contained in the iqa_file file. |
label_tree_file | string | Yes | The name of a JSON file containing labels mapped by the Annotation Database Import for Annotators |
id_column | string | Yes | The name of a column containing the original ID of the row. |
label_column | string | Yes | The name of the column containing the mapped label. |
timestamp_column | string | Yes | The name of the column containing the timestamp in the standard format. |
medium_filename_column | string | No | The name of the column containing the filenames of media referenced by the record. |
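A `csv_configs` entry might look like the following; the filenames and column names are illustrative:

```json
{
  "db_file": "dive1_annotations.csv",
  "label_tree_file": "dive1_label_tree.json",
  "id_column": "row_id",
  "label_column": "mapped_label",
  "timestamp_column": "timestamp",
  "medium_filename_column": "filename"
}
```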
Comment Event Configuration
Status Event Configuration
Status events can be represented in a CSV file.
TODO: At present only the on-bottom event is implemented.
Name | Type | Required | Description |
---|---|---|---|
db_file | string | Yes | The name of a file containing status events. |
id_column | string | Yes | The name of a column containing the original ID of the row. |
on_bottom_column | string | No | If the on-bottom status event is used, this column contains a 0 for off the bottom, and 1 for on the bottom. The event is created at the first record, a change of the value, and at the end. |
timestamp_column | string | Yes | The name of the column containing the timestamp in the standard format. |
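The on-bottom rule described above (an event at the first record, at each change of value, and at the end) can be sketched as follows; the function name and row layout are illustrative assumptions:

```python
def on_bottom_events(rows):
    """Derive on-bottom status events from (timestamp, value) rows.

    value is 0 (off the bottom) or 1 (on the bottom). An event is emitted
    for the first record, for every change of value, and for the last record.
    Illustrative sketch only.
    """
    events = []
    previous = None
    for i, (ts, value) in enumerate(rows):
        last = i == len(rows) - 1
        if previous is None or value != previous or last:
            events.append((ts, value))
        previous = value
    return events
```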
Measurement Event Configuration
Measurement events are human-recorded measurements which can be represented in a CSV file. The configuration contains a property, `configs`, which is a list of objects with the properties listed below. Each configuration saves a single measurement as a measurement event.
Name | Type | Required | Description |
---|---|---|---|
db_file | string | Yes | The name of a file containing measurement events. |
timestamp_column | string | Yes | The timestamp column, in the standard format. |
configs | object[] | Yes | The list of measurement configuration objects (below). |
Name | Type | Required | Description |
---|---|---|---|
column_name | string | Yes | The name of the column containing the quantity. |
id_column | string | Yes | The name of a column containing the original ID of the row. |
measurement_type | string | Yes | The short code of a measurement type. |
Name | Type | Required | Description |
---|---|---|---|
name | string | Yes | The name of the measurement configuration. |
id_column | string | Yes | The name of a column containing the original ID of the row. |
instrument_config | string | Yes, if instrument_config_map is not configured. | The name of an instrument configuration as created in the NDST Dive Logging App. |
instrument_config_map | object | Yes, if instrument_config is not configured. | A mapping from instruments in a given second column to instrument configurations. Allows a stream of measurements to be generated by multiple instruments, selected for the highest quality. |
data_config | string[] | Yes | A tuple containing the type ("measurement", "position" or "orientation") and the short code of a MeasurementType, PositionType or OrientationType. |
columns | string[] | Yes | The names of the columns containing the quantity. If multiple columns are provided, the quantity is a tuple assembled from multiple values, as in the case of a position or orientation. In most cases, this will contain one item. |
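A measurement event configuration built from the tables above might look like the following; the filenames, column names and measurement-type short codes are illustrative:

```json
{
  "db_file": "dive1_measurements.csv",
  "timestamp_column": "timestamp",
  "configs": [
    {
      "column_name": "depth_m",
      "id_column": "row_id",
      "measurement_type": "depth"
    },
    {
      "column_name": "temp_c",
      "id_column": "row_id",
      "measurement_type": "temperature"
    }
  ]
}
```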
Stream Configuration
Streams of machine-generated navigation, telemetry and water-properties data are represented in CSV files. The configuration contains a property, `stream_configs`, which is a list of objects with the properties listed below. Each configuration saves a single measurement as a measurement event. There may be many individual measurement and/or navigation configurations.
Name | Type | Required | Description |
---|---|---|---|
db_file | string | Yes | The name of a file containing data. |
timestamp_column | string | Yes | The timestamp column, in the standard format. |
stream_configs | object[] | Yes | The list of stream configuration objects. |
Handlers
Handlers are callable classes that process annotation labels, tags and the information associated with them. There are two types of labels: tag labels and data labels. Each type of handler declares a list of tags which, when they appear, will trigger a call to the handler.
All available handlers are loaded from the `tag_handler`s and `property_handler`s directories in the `importer` constructor.
Tag Handlers
Tag handlers are called when a tag appears and do not expect to handle or process data. Each has a `matches()` method, which returns true if the given label satisfies the handler's requirements. The handler's `__call__()` method is called with the current input row, the event context and other information to trigger configuration of the event.
The current list of tag handlers is:
- `comment_tag_handler` -- Handles comments in the input data.
- `habitat_tag_handler` -- Handles habitat annotations.
- `ignore_tag_handler` -- Flags the record to be ignored.
- `laser_point_tag_handler` -- Handles a laser point annotation.
- `not_annotated_tag_handler` -- Marks a non-annotated region.
- `observation_tag_handler` -- Handles observation annotations.
- `on_bottom_tag_handler` -- Handles updates on the state of the platform: on or off the bottom.
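A handler following the interface described above might be sketched like this; the class name, tag values and call parameters are illustrative assumptions, not the importer's actual code:

```python
class OnBottomTagHandler:
    """Illustrative sketch of a tag handler (not the actual implementation)."""

    # The tags that trigger a call to this handler.
    tags = {"on bottom", "off bottom"}

    def matches(self, label):
        # True if the given label satisfies this handler's requirements.
        return label.lower() in self.tags

    def __call__(self, row, context):
        # Configure the event context from the current input row.
        context["on_bottom"] = row["label"].lower() == "on bottom"
        return context
```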
Property Handlers
Property handlers expect to handle or process data associated with the tag.
The current list of tag data handlers is:
- `habitat_property_handler` -- Applies habitat data to the habitat event context from the input.
- `observation_property_handler` -- Applies observation data to the habitat event context from the input.
- `status_property_handler` -- Applies status event data to the habitat event context from the input.