Annotation Database Import App
Revision as of 18:44, 5 January 2024
Background
Data for import is produced by principal investigators, annotators and NDST. As a final step, the importer will combine all of this data into a single importable dataset.
There are three steps for this final import process.
Downloading Import Data
The database import queue page displays lists of jobs configured by principal investigators, annotators and NDST using their respective tools. You will see a table containing jobs for Principal Investigators and Annotators. Click the download button beside each one to download a JSON file containing the job data for the cruise you wish to import.
The NDST section has a button labeled "Refresh from NDST Database". This loads the data from the Dive Logging App into a transitional table in the main database. If new equipment or personnel have been created in the Dive Logging App, a table for each will appear, allowing you to map new equipment onto an existing piece of hardware in the database, or a new person onto an existing entry in the personnel table. If the person or equipment hasn't been created yet, you can create them by clicking the "+" button.
Below this, you will find a table marked "Current Cruises" containing complete metadata for the cruise, dives, transects, personnel and equipment configurations. Download the file for the cruise you'll be importing.
The Import Program
Cruises can be imported on the command line using the importer.py program, or with the graphical utility importer_gui.py. The program lets the user load data into the Development, Staging and Production environments using a selected configuration file, and choose which types of entities are to be dropped and/or (re)created.
The program provides a dry-run feature, so the import can be checked for validity before it is committed.
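The program's actual command-line options are not documented here. As a purely hypothetical sketch (every flag name below is an assumption, not the importer's real interface), a front end covering the environment selection, entity recreation and dry-run features described above might be built with argparse:

```python
import argparse

def build_parser():
    # Hypothetical CLI sketch -- flag names are illustrative assumptions,
    # not the importer's documented options.
    parser = argparse.ArgumentParser(prog="importer.py")
    parser.add_argument("config",
                        help="Path to the main config.json file.")
    parser.add_argument("--env",
                        choices=["development", "staging", "production"],
                        default="development",
                        help="Target database environment.")
    parser.add_argument("--dry-run", action="store_true",
                        help="Validate the import, then roll back instead of committing.")
    parser.add_argument("--recreate", action="append", default=[],
                        help="Entity type to drop and recreate (repeatable).")
    return parser

# Example invocation with sample arguments.
args = build_parser().parse_args(
    ["config.json", "--env", "staging", "--dry-run", "--recreate", "dives"]
)
print(args.env, args.dry_run, args.recreate)
```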
Handlers
Handlers are callable classes that process labels, tags and information associated with them. There are two types of labels: tag labels and data labels. Each type of handler declares a list of tags which, when they appear, trigger calling of the handler.
All available handlers are loaded from the tag_handlers and tag_data_handlers directories in the importer constructor.
Tag Handlers
Tag handlers are called when a tag appears and do not expect to handle or process data. Each has a matches() method, which returns true if the given label satisfies the handler's requirements. The handler's __call__() method is then invoked with the current input row, the event context and other information to trigger configuration of the event.
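As a minimal sketch of this pattern (class, tag and field names are illustrative assumptions, not the actual code in the tag_handlers directory):

```python
class CommentTagHandler:
    """Illustrative sketch of a tag handler -- not the actual implementation."""

    # Tags that trigger this handler when they appear on a label.
    tags = {"comment"}

    def matches(self, label):
        # True if the label carries any tag this handler declares.
        return bool(self.tags & set(label.get("tags", [])))

    def __call__(self, row, context, **extra):
        # Configure the event context from the current input row.
        context.setdefault("comments", []).append(row.get("comment", ""))
        return context

# Usage: the importer would call matches() first, then the handler itself.
handler = CommentTagHandler()
label = {"tags": ["comment"]}
if handler.matches(label):
    ctx = handler({"comment": "kelp in frame"}, {})
```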
The current list of tag handlers is:
- comment_tag_handler -- Handles comments in the input data.
- habitat_tag_handler -- Handles habitat annotations.
- ignore_tag_handler -- Flags the record to be ignored.
- laser_point_tag_handler -- Handles a laser point annotation.
- not_annotated_tag_handler -- Marks a non-annotated region.
- observation_tag_handler -- Handles observation annotations.
- on_bottom_tag_handler -- Handles updates on the state of the platform: on or off the bottom.
Tag Data Handlers
Tag data handlers expect to handle or process data associated with the tag.
The current list of tag data handlers is:
- habitat_tag_data_handler -- Applies habitat data to the habitat event context from the input.
- observation_tag_data_handler -- Applies observation data to the habitat event context from the input.
- status_tag_data_handler -- Applies status event data to the habitat event context from the input.
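In the same illustrative style as the tag handler sketch above, a tag data handler differs in that it copies data carried by the row onto the event context (all names here are assumptions, not the actual code):

```python
class StatusTagDataHandler:
    """Illustrative sketch of a tag data handler -- not the actual implementation."""

    tags = {"status"}

    def matches(self, label):
        # True if the label carries any tag this handler declares.
        return bool(self.tags & set(label.get("tags", [])))

    def __call__(self, row, context):
        # Apply the status fields carried by the row to the event context.
        context["status"] = {
            "value": row.get("status_value"),
            "timestamp": row.get("timestamp"),
        }
        return context
```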
Import Program Algorithm
The import program progresses through these steps:
- Load the handlers.
- Load the main configuration file.
- Assemble a list of required steps based on user inputs, such as which entities to replace and/or load.
- Load the import job configuration for principal investigators.
- Create and configure the cruise entity to which all other entities are attached.
- Load the import job configuration for annotators.
- Configure the label tree that will be used to map incoming annotation labels to database entities.
- Create and/or configure the annotation protocol used for annotations represented by the label tree.
- Load the personnel mapping for Biigle annotators.
- Load the import job configuration for NDST.
- Update the cruise with additional metadata.
- Create or update the dives.
- Create or update the transects.
- Create or update the platform (ship, submersible) and equipment configurations.
- Create or update the personnel roles on the cruise, dives and transects.
- Generate the interval trees used in subsequent phases to locate dives and transects during which observations have occurred, by temporal correspondence.
- Resolve the label map -- load the necessary lookups to satisfy the references in mapped labels.
- Process Biigle annotations.
- Process VideoMiner annotations.
- Process CSV annotations.
- Process data streams (navigation, telemetry, water properties, etc.)
- Process comment events from CSV files.
- Process status events from CSV files.
- Process measurement events from CSV files.
- Commit or roll back changes, depending on whether the dry run state is chosen.
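The temporal-correspondence step (locating the dive or transect during which an observation occurred) can be illustrated with a minimal interval lookup. This is a simplified stand-in for the interval trees the importer actually builds; the class and field names are assumptions, and it assumes non-overlapping spans:

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Span:
    start: datetime
    end: datetime
    name: str

class IntervalIndex:
    """Minimal stand-in for an interval tree over non-overlapping dive/transect spans."""

    def __init__(self, spans):
        self.spans = sorted(spans, key=lambda s: s.start)
        self.starts = [s.start for s in self.spans]

    def find(self, ts):
        # Locate the last span starting at or before ts, then check containment.
        i = bisect_right(self.starts, ts) - 1
        if i >= 0 and self.spans[i].start <= ts <= self.spans[i].end:
            return self.spans[i]
        return None

# Usage: map an observation timestamp onto the dive it occurred in.
dives = IntervalIndex([
    Span(datetime(2023, 7, 1, 9), datetime(2023, 7, 1, 12), "dive-01"),
    Span(datetime(2023, 7, 1, 14), datetime(2023, 7, 1, 17), "dive-02"),
])
hit = dives.find(datetime(2023, 7, 1, 10, 30))  # -> the "dive-01" span
```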
Configuration Files
The import program uses JSON configuration files. The main configuration file is named config.json by convention.
Main Configuration File
The main config file provides some configuration values, and also provides links to other configuration files, which load and configure specific chunks of importable data. This is the file that is selected for loading in the import program.
Name | Type | Required | Description |
---|---|---|---|
iqa_file | string | Yes, if there are annotations. | The configuration file generated by the Annotation Database Import for Annotators app. |
iqpi_file | string | Yes | The configuration file generated by the Annotation Database Import for Principal Investigators app. |
ndst_file | string | Yes | The configuration file generated from the NDST Dive Logging App. |
csv_configs | string[] | Yes, if there are CSV annotations | One or more files containing configurations for annotation in CSV format. See the config file description below. |
status_configs | string[] | Yes, if there are status events | One or more files containing configurations for status events in CSV format. See the config file description below. |
measurement_configs | string[] | Yes, if there are measurement events | One or more files containing configurations for measurement events in CSV format. See the config file description below. |
stream_configs | string[] | Yes, if there are navigation or telemetry streams | One or more files containing configurations for data streams in CSV format. See the config file description below. |
comment_configs | string[] | Yes, if there are comments | One or more files containing configurations for comments in CSV format. See the config file description below. |
media_path | string | No | The root URL for a site to which relative media paths can be appended to access the media. |
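Putting the fields above together, a minimal config.json might look like the following. All file names and the URL are illustrative placeholders, not real paths:

```json
{
  "iqpi_file": "iqpi_cruise.json",
  "iqa_file": "iqa_cruise.json",
  "ndst_file": "ndst_cruise.json",
  "csv_configs": ["csv_annotations.json"],
  "status_configs": ["status_events.json"],
  "measurement_configs": ["measurement_events.json"],
  "stream_configs": ["nav_stream.json"],
  "comment_configs": ["comments.json"],
  "media_path": "https://example.org/media/"
}
```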
CSV Annotation Configuration
Video and photo annotations can be stored in CSV files (as opposed to VideoMiner, Biigle, etc.) which may not have a formal structure. They can be converted into label trees and mapped using the Label Mapping app. The annotation file and label tree can then be loaded using the csv_configs section.
Name | Type | Required | Description |
---|---|---|---|
db_file | string | Yes | The name of a file containing observations (of habitat, species, etc.) which have been mapped using the Label Mapping app. The mappings are contained in the iqa_file file. |
label_tree_file | string | Yes | The name of a JSON file containing labels mapped by the Annotation Database Import for Annotators |
id_column | string | Yes | The name of a column containing the original ID of the row. |
label_column | string | Yes | The name of the column containing the mapped label. |
timestamp_column | string | Yes | The name of the column containing the timestamp in the standard format. |
medium_filename_column | string | No | The name of the column containing the filenames of media referenced by the record. |
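A single csv_configs entry might therefore look like this (file and column names are illustrative assumptions):

```json
{
  "db_file": "annotations.csv",
  "label_tree_file": "label_tree.json",
  "id_column": "record_id",
  "label_column": "mapped_label",
  "timestamp_column": "timestamp",
  "medium_filename_column": "filename"
}
```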
Status Event Configuration
Status events can be represented in a CSV file.
TODO: At present only the on-bottom event is implemented.
Name | Type | Required | Description |
---|---|---|---|
db_file | string | Yes | The name of a file containing status events. |
id_column | string | Yes | The name of a column containing the original ID of the row. |
on_bottom_column | string | No | If the on-bottom status event is used, this column contains a 0 for off the bottom, and 1 for on the bottom. The event is created at the first record, at a change of the value, and at the end. |
timestamp_column | string | Yes | The name of the column containing the timestamp in the standard format. |
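A minimal status_configs entry, using illustrative file and column names, might look like:

```json
{
  "db_file": "status_events.csv",
  "id_column": "record_id",
  "timestamp_column": "timestamp",
  "on_bottom_column": "on_bottom"
}
```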
Measurement Event Configuration
Measurement events are human-recorded measurements which can be represented in a CSV file. The configuration contains a property, configs, which is a list of objects with the properties listed below. Each configuration saves a single measurement as a measurement event.
Name | Type | Required | Description |
---|---|---|---|
db_file | string | Yes | The name of a file containing measurement events. |
timestamp_column | string | Yes | The timestamp column, in the standard format. |
configs | object[] | Yes | The list of measurement configuration objects (below). |
Name | Type | Required | Description |
---|---|---|---|
column_name | string | Yes | The name of the column containing the quantity. |
id_column | string | Yes | The name of a column containing the original ID of the row. |
measurement_type | string | Yes | The short code of a measurement type. |
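Combining the two tables above, a measurement_configs entry might look like the following. The file name, column names and measurement-type short codes are illustrative assumptions:

```json
{
  "db_file": "measurements.csv",
  "timestamp_column": "timestamp",
  "configs": [
    {
      "column_name": "depth_m",
      "id_column": "record_id",
      "measurement_type": "depth"
    },
    {
      "column_name": "temperature_c",
      "id_column": "record_id",
      "measurement_type": "temperature"
    }
  ]
}
```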
Stream Configuration
Streams of machine-generated navigation, telemetry and water properties data are represented in CSV files. The configuration contains a property, stream_configs, which is a list of objects with the properties listed below. Each configuration saves a single measurement as a measurement event. There may be many individual measurement and/or navigation configurations.
Name | Type | Required | Description |
---|---|---|---|
db_file | string | Yes | The name of a file containing data. |
timestamp_column | string | Yes | The timestamp column, in the standard format. |
stream_configs | object[] | Yes | The list of measurement configuration objects (below). |
Name | Type | Required | Description |
---|---|---|---|
name | string | Yes | The name of the measurement configuration. |
id_column | string | Yes | The name of a column containing the original ID of the row. |
instrument_config | string | Yes, if instrument_config_map is not configured. | The name of an instrument configuration as created in the NDST Dive Logging App. |
instrument_config_map | object | Yes, if instrument_config is not configured. | A mapping from instruments in a given second column to instrument configurations. Allows a stream of measurements to be generated by multiple instruments, selected for the highest quality. |
data_config | string[] | Yes | A tuple containing the type ("measurement", "position" or "orientation") and the short code of a MeasurementType, PositionType or OrientationType. |
columns | string[] | Yes | The names of the columns containing the quantity. If multiple columns are provided, the quantity is a tuple assembled from multiple values, as in the case of a position or orientation. In most cases, this will contain one item. |
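A stream_configs file combining the two tables above might look like the following. All names, including the instrument configuration and the short codes in data_config, are illustrative assumptions:

```json
{
  "db_file": "nav_stream.csv",
  "timestamp_column": "timestamp",
  "stream_configs": [
    {
      "name": "ship_position",
      "id_column": "record_id",
      "instrument_config": "ship_gps_primary",
      "data_config": ["position", "gps"],
      "columns": ["latitude", "longitude"]
    },
    {
      "name": "ctd_temperature",
      "id_column": "record_id",
      "instrument_config": "ctd_primary",
      "data_config": ["measurement", "temperature"],
      "columns": ["temperature_c"]
    }
  ]
}
```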