Annotation Database Import App

Revision as of 19:36, 5 December 2023 by Rob

Background

Data for import is produced by principal investigators, annotators and NDST. As a final step, the importer will combine all of this data into a single importable dataset.

There are three steps for this final import process.

Downloading Import Data

The database import queue page lists the jobs that principal investigators, annotators and NDST have configured using their respective tools. A table lists the jobs for principal investigators and annotators. For the cruise you wish to import, click the download button beside each relevant job to download a JSON file containing its data.

The NDST section has a button labeled "Refresh from NDST Database", which loads the data from the Dive Logging App into a transitional table in the main database. If new equipment or personnel have been created in the Dive Logging App, a table appears for each, allowing you to map each new piece of equipment to an existing piece of hardware in the database and each new person to an entry in the personnel table. If the person or equipment does not exist in the database yet, you can create it by clicking the "+" button.

Below this, a table labeled "Current Cruises" contains complete metadata for each cruise, including its dives, transects, personnel and equipment configurations. Download the file for the cruise you will be importing.

Configuration Files

The import program uses JSON configuration files. The main configuration file is named config.json by convention.

Main Configuration File

The main configuration file provides some configuration values and links to other configuration files, each of which loads and configures a specific chunk of importable data. This is the file you select for loading in the import program.

Main Configuration File Contents
Name | Type | Required | Description
iqa_file | string | Yes, if there are annotations | The configuration file generated by the Annotation Database Import for Annotators app.
iqpi_file | string | Yes | The configuration file generated by the Annotation Database Import for Principal Investigators app.
ndst_file | string | Yes | The configuration file generated from the NDST Dive Logging App.
csv_configs | string[] | Yes, if there are CSV annotations | One or more configuration files describing annotations stored in CSV format. See the CSV annotation configuration below.
status_configs | string[] | Yes, if there are status events | One or more configuration files describing status events stored in CSV format. See the status event configuration below.
measurement_configs | string[] | Yes, if there are measurement events | One or more configuration files describing measurement events stored in CSV format. See the measurement event configuration below.
stream_configs | string[] | Yes, if there are navigation or telemetry streams | One or more configuration files describing data streams stored in CSV format. See the config file description below.
comment_configs | string[] | Yes, if there are comments | One or more configuration files describing comments stored in CSV format. See the config file description below.
media_path | string | No | The root URL of a site; relative media paths are appended to it to access the media.
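As a rough illustration, the Python sketch below checks a parsed config.json against the table above. The file names and URL are hypothetical, and the rule that iqa_file is required whenever csv_configs is non-empty is an assumption inferred from the CSV annotation section (the CSV label mappings are stored in iqa_file). The import program's own validation may differ.

```python
import json

# Keys the table lists as always required.
REQUIRED_KEYS = ("iqpi_file", "ndst_file")

# Keys that hold lists of configuration file names (string[]).
LIST_KEYS = ("csv_configs", "status_configs", "measurement_configs",
             "stream_configs", "comment_configs")


def check_main_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed main configuration.

    Assumption: iqa_file is treated as required whenever csv_configs is
    non-empty, because the CSV label mappings live in iqa_file.
    """
    problems = []
    for key in REQUIRED_KEYS:
        if not isinstance(config.get(key), str):
            problems.append(f"{key} must be a file name string")
    for key in LIST_KEYS:
        value = config.get(key, [])  # optional lists default to empty
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of file name strings")
    if config.get("csv_configs") and not isinstance(config.get("iqa_file"), str):
        problems.append("iqa_file is required when csv_configs is present")
    if "media_path" in config and not isinstance(config["media_path"], str):
        problems.append("media_path must be a URL string")
    return problems


# Hypothetical config.json contents; every file name here is made up.
example = json.loads("""
{
  "iqa_file": "iqa_cruise.json",
  "iqpi_file": "iqpi_cruise.json",
  "ndst_file": "ndst_cruise.json",
  "csv_configs": ["csv_annotations.json"],
  "status_configs": ["status.json"],
  "measurement_configs": ["measurements.json"],
  "media_path": "https://media.example.org/cruise/"
}
""")
print(check_main_config(example))  # []
```

An empty list means no problems were found; a config missing iqpi_file, for example, would report "iqpi_file must be a file name string".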

CSV Annotation Configuration

Video and photo annotations are sometimes stored in CSV files rather than in annotation tools such as VideoMiner or Biigle, and these files may not have a formal structure. They can be converted into label trees and mapped using the Label Mapping app. The annotation file and label tree can then be loaded using the csv_configs section.

CSV Configuration File Contents
Name | Type | Required | Description
db_file | string | Yes | The name of a file containing observations (of habitat, species, etc.) which have been mapped using the Label Mapping app. The mappings are contained in the iqa_file file.
label_tree_file | string | Yes | The name of a JSON file containing labels mapped by the Annotation Database Import for Annotators app.
id_column | string | Yes | The name of the column containing the original ID of the row.
label_column | string | Yes | The name of the column containing the mapped label.
timestamp_column | string | Yes | The name of the column containing the timestamp in the standard format.
medium_filename_column | string | No | The name of the column containing the filenames of media referenced by the record.
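The sketch below shows how the column names in a CSV annotation configuration might be used to pull fields out of each row. It is a hypothetical illustration: the file names, column names and ISO 8601 timestamps are made up, the rows are read from a string rather than from db_file, and label mapping through label_tree_file is not shown.

```python
import csv
import io

# Hypothetical CSV annotation configuration; only the keys come from the
# table above, the values are illustrative.
csv_config = {
    "db_file": "annotations.csv",
    "label_tree_file": "label_tree.json",
    "id_column": "RowID",
    "label_column": "MappedLabel",
    "timestamp_column": "Timestamp",
    "medium_filename_column": "Image",
}


def read_annotations(config: dict, csv_text: str) -> list[dict]:
    """Extract the configured columns from each CSV row."""
    media_col = config.get("medium_filename_column")  # optional column
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "original_id": row[config["id_column"]],
            "label": row[config["label_column"]],
            "timestamp": row[config["timestamp_column"]],
            "medium_filename": row.get(media_col) if media_col else None,
        })
    return records


sample = """RowID,MappedLabel,Timestamp,Image
1,Sponge,2023-06-01T12:00:00Z,img_0001.jpg
2,Rockfish,2023-06-01T12:00:05Z,img_0002.jpg
"""
for record in read_annotations(csv_config, sample):
    print(record)
```

With a real file, the string would be replaced by the contents of db_file, e.g. opened with open(config["db_file"], newline="").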

Status Event Configuration

Status events can be represented in a CSV file.

TODO: At present only the on-bottom event is implemented.

Status Configuration File Contents
Name | Type | Required | Description
db_file | string | Yes | The name of a file containing status events.
id_column | string | Yes | The name of the column containing the original ID of the row.
on_bottom_column | string | No | If the on-bottom status event is used, this column contains 0 for off the bottom and 1 for on the bottom. An event is created at the first record, at each change of value, and at the last record.
timestamp_column | string | Yes | The name of the column containing the timestamp in the standard format.
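The on-bottom rule can be sketched in Python as follows. This is an illustration, not the import program's code: it assumes the rows are already sorted by timestamp, and the column names, IDs and the shape of each event dictionary are hypothetical.

```python
def on_bottom_events(rows: list[dict], id_column: str,
                     timestamp_column: str, on_bottom_column: str) -> list[dict]:
    """Emit on-bottom status events from rows sorted by timestamp.

    An event is emitted for the first row, for every row whose on-bottom
    value differs from the previous row, and for the last row.
    """
    events = []
    previous = None
    for i, row in enumerate(rows):
        value = int(row[on_bottom_column])  # 0 = off bottom, 1 = on bottom
        if i == 0 or value != previous or i == len(rows) - 1:
            events.append({
                "original_id": row[id_column],
                "timestamp": row[timestamp_column],
                "on_bottom": bool(value),
            })
        previous = value
    return events


# Hypothetical rows as they might come from csv.DictReader.
rows = [
    {"ID": "1", "Time": "2023-06-01T12:00:00Z", "OnBottom": "0"},
    {"ID": "2", "Time": "2023-06-01T12:01:00Z", "OnBottom": "1"},
    {"ID": "3", "Time": "2023-06-01T12:02:00Z", "OnBottom": "1"},
    {"ID": "4", "Time": "2023-06-01T12:03:00Z", "OnBottom": "1"},
    {"ID": "5", "Time": "2023-06-01T12:04:00Z", "OnBottom": "0"},
    {"ID": "6", "Time": "2023-06-01T12:05:00Z", "OnBottom": "0"},
]
events = on_bottom_events(rows, "ID", "Time", "OnBottom")
print([e["original_id"] for e in events])  # ['1', '2', '5', '6']
```

Rows 3 and 4 repeat the previous value, so they produce no events; row 6 produces one only because it is the last record.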

Measurement Event Configuration

Measurement events can be represented in a CSV file. The configuration contains a property, configs, which is a list of objects with the properties listed below. Each configuration saves a single measurement as a measurement event.

Measurement Configuration File Contents
Name | Type | Required | Description
db_file | string | Yes | The name of a file containing measurement events.
configs | object[] | Yes | The list of measurement configuration objects (below).


Measurement Configuration File Contents -- Config Objects
Name | Type | Required | Description
column_name | string | Yes | The name of the column containing the quantity.
id_column | string | Yes | The name of the column containing the original ID of the row.
measurement_type | string | Yes | The short code of a measurement type.
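A minimal Python sketch of how a measurement configuration might turn CSV columns into measurement events is shown below. The file name, column names and measurement type codes ("TEMP", "DEPTH") are hypothetical, the rows are read from a string rather than from db_file, and skipping blank cells is an assumption rather than documented behaviour. Only the fields named in the tables above are used.

```python
import csv
import io

# Hypothetical measurement configuration file contents.
measurement_file_config = {
    "db_file": "ctd.csv",
    "configs": [
        {"column_name": "Temp_C", "id_column": "RowID", "measurement_type": "TEMP"},
        {"column_name": "Depth_m", "id_column": "RowID", "measurement_type": "DEPTH"},
    ],
}


def measurement_events(file_config: dict, csv_text: str) -> list[dict]:
    """Create one measurement event per (config object, non-blank cell)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    events = []
    for cfg in file_config["configs"]:
        for row in rows:
            raw = row[cfg["column_name"]].strip()
            if raw == "":
                continue  # assumption: blank cells produce no event
            events.append({
                "original_id": row[cfg["id_column"]],
                "measurement_type": cfg["measurement_type"],
                "value": float(raw),
            })
    return events


sample = "RowID,Temp_C,Depth_m\n1,8.4,152.0\n2,,155.5\n"
for event in measurement_events(measurement_file_config, sample):
    print(event)
```

Because each config object handles a single column, adding another measured quantity from the same file only requires another entry in configs.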