Annotation Database Import: Difference between revisions

Revision as of 02:22, 19 March 2025

The database is intended as a retrospective view on data collected by many potentially narrowly-focused surveys. Each survey produces a large number of entities with distinctive forms and purposes that must be mapped onto entities in the database, whose structure is roughly static and which must remain comparable through time.

As a trivial example, one survey might identify a fine substrate as comprising silt, sand and fine gravel while another might identify fine sediment as only silt. Ideally, the database will provide a way to preserve this distinction without introducing so much complexity that it becomes impossible to use and maintain.
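One way to picture this is a mapping table that keeps each survey's own label next to the database entities it covers. The sketch below is illustrative only: the survey names, label strings and entity names are hypothetical, and it does not describe the database's actual schema.

```python
# Hypothetical label mapping: (survey, survey label) -> set of database entities.
# None of these names come from the real database; they mirror the example above.
LABEL_MAP = {
    ("survey_a", "fine substrate"): frozenset({"silt", "sand", "fine gravel"}),
    ("survey_b", "fine sediment"): frozenset({"silt"}),
}


def entities_for(survey: str, label: str) -> frozenset:
    """Return the database entities that a survey-specific label maps to."""
    return LABEL_MAP[(survey, label)]


def surveys_reporting(entity: str) -> list:
    """List the (survey, label) pairs whose mapping includes the given entity."""
    return sorted(key for key, entities in LABEL_MAP.items() if entity in entities)
```

Because the original label is kept as part of the key, a query for silt can find both surveys while a query for sand finds only the first, so the distinction between the two surveys is not lost.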

The database import process consists of four main steps:

Prepare and clean data.
Map annotation labels to database entities.
Import data.
Regenerate caches.

Data Preparation

Data to be loaded into the database must be cleaned and prepared. There are several types of data that can be imported:

Metadata.
Streams.
1. 3D Positions.
2. 2D Positions.
3. Measurements.
Annotations.
Comments.
Statuses.

Metadata

Metadata is the only type of data that cannot be imported automatically from files; it must be entered into the form manually. The entities that can accept metadata are:

The cruise.
Dives.
Transects.
Platform configurations.
Instrument configurations.

The cruise, dives and transects require the same inputs:

Start and end times.
Objectives.
Summaries.
Notes.

The cruise and dives can also have a crew roster, and a dive can have attached documents.
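The shared inputs could be modelled as a single record type. This is a minimal sketch under stated assumptions: the class name, field names and the start/end check are all hypothetical and are not taken from the actual form or schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SurveyMetadata:
    """Inputs shared by the cruise, dives and transects (field names are assumed)."""
    start_time: datetime
    end_time: datetime
    objectives: str = ""
    summary: str = ""
    notes: str = ""
    crew: list = field(default_factory=list)       # cruise and dives only
    documents: list = field(default_factory=list)  # dives only

    def __post_init__(self):
        # A basic sanity check; the real form may validate differently.
        if self.end_time < self.start_time:
            raise ValueError("end_time precedes start_time")
```

For example, `SurveyMetadata(datetime(2025, 3, 1, 8, 0), datetime(2025, 3, 1, 12, 30), objectives="Map sponge reef extent")` would describe a hypothetical dive of four and a half hours.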

Platform and instrument configurations can accept a JSON object that describes the configuration of the associated device. This object has no restrictions on its form. Each configuration must also be associated with a platform or an instrument, respectively.
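Since the object is free-form, the only structural check that makes sense is that the text parses as a JSON object. The sketch below assumes a hypothetical camera configuration; every key in it is invented for illustration, and `parse_configuration` is not a function of the actual import tools.

```python
import json

# Hypothetical configuration for a camera; every key here is invented.
raw = '{"frame_rate": 30, "resolution": [1920, 1080], "lasers": {"spacing_cm": 10}}'


def parse_configuration(text: str) -> dict:
    """Parse a configuration and check only that it is a JSON object."""
    value = json.loads(text)
    if not isinstance(value, dict):
        raise ValueError("configuration must be a JSON object")
    return value


config = parse_configuration(raw)
```

Nested objects and arrays pass through unchanged, which matches the statement that the form of the object is unrestricted.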

Streams

(Edit in progress.)

The Chief Scientist or Principal Investigator is responsible for providing high-level metadata concerning the cruise, its scientific objectives, summaries and associated programs, personnel and contacts.

NDST (Non-Destructive Survey Tools) is responsible for providing operational metadata for the cruise, dives and transects, along with platform and instrument configuration information and navigation and water properties data streams. NDST maintains the video/photo media library, but those data are not imported into the database.
Annotators are responsible for providing the video and photo annotations, information about annotation protocols and species guides and for label mapping.
The importer combines the information provided by the other three actors to actually import a cruise and all of its associated data into the database.

Chief Scientists/Principal Investigators

Chief Scientists can use the online tool to enter information about a cruise: its start and end dates, the vessel used, scientific objectives, operational notes and a summary. The tool also permits the assignment of personnel and roles, the optional selection of the scientific programs under whose auspices the research is occurring, and an optional list of First Nations contacts.

More information about the Annotation Database Import for Principal Investigators tool.

Non-Destructive Survey Tools

NDST crews can use the Dive Logging App to record cruise, dive and transect metadata, personnel and equipment configuration. The app is available online, but is intended to be used in the field. The online import tools currently only allow import from the online Dive Logging App, but an issue has been created in GitLab to allow uploading from field computers.

More information about the Dive Logging App.

Annotators

Annotators can use the online tool to create an annotation project, which includes selecting or creating an annotation protocol and mapping the annotation labels (i.e., those generated by Biigle) to database entities.

More information about the Annotation Database Import for Annotators tool.

Importer

The final step in the import process is for the importer to run a script which combines the information compiled by the other actors into a single unified dataset.

More information about the Annotation Database Importer tool.

@@ Line 1: / Line 1: @@
-= Background =
 The database is intended as a retrospective view on data collected by many potentially narrowly-focused surveys. Each survey produces a large number of entities with distinctive forms and purposes that must be mapped onto entities in the database, whose structure is roughly static and which must remain comparable through time.
 As a trivial example, one survey might identify a fine substrate as comprising silt, sand and fine gravel while another might identify fine sediment as only silt. Ideally, the database will provide a way to preserve this distinction without introducing so much complexity that it becomes impossible to use and maintain.
-The database import process consists of three main steps, each the responsibility of a distinct actor:
+The database import process consists of three main steps:
+# Prepare and clean data.
+# Map annotation labels to database entities.
+# Import data.
+# Regenerate caches.
+== Data Preparation ==
+Data to be loaded into the database must be cleaned and prepared. There are several types of data that can be imported:
+# Metadata.
+# Streams.
+## 3D Positions.
+## 2D Positions.
+## Measurements.
+# Annotations.
+# Comments.
+# Statuses.
+=== Metadata ===
+Metadata are the single type of data that cannot be automatically imported from files, but must be input into the form manually. The entities that can accept metadata are,
-* The Chief Scientist or Principal Investigator is responsible for providing high-level metadata concerning the cruise, its scientific objectives, summaries and associated programs, personnel and contacts.
+* The cruise.
+* Dives.
+* Transects.
+* Platform configurations.
+* Instrument configurations.
+The cruise, dives and transects require the same inputs:
+* Start and end times.
+* Objectives.
+* Summaries.
+* Notes.
+The cruise and dives can also have a crew roster and the dive can have attached documents.
+Platform and instrument configurations can accept a JSON object which describes the configuration of the associated device. This object has no restrictions on its form. The configuration must also accept an instrument or a platform, respectively.
+=== Streams ===
+(Edit in progress.)
+The Chief Scientist or Principal Investigator is responsible for providing high-level metadata concerning the cruise, its scientific objectives, summaries and associated programs, personnel and contacts.
 * NDST (Non-Destructive Survey Tools) is responsible for providing operational metadata for the cruise, dives and transects, along with platform and instrument configuration information and navigation and water properties data streams. NDST maintains the video/photo media library, but those data are not imported into the database.
 * Annotators are responsible for providing the video and photo annotations, information about annotation protocols and species guides and for label mapping.
 * The importer combines the information provided by the other three actors to actually import a cruise and all of its associated data into the database.
 == Chief Scientists/Principal Investigators ==