Preparation of Import Data

Revision as of 23:37, 6 May 2025 by Rob

Data to be loaded into the database must be cleaned and prepared. There are several types of data that can be imported:

  1. Metadata.
  2. Streams.
    1. 3D Positions.
    2. 2D Positions.
    3. Measurements.
  3. Annotations.
  4. Comments.
  5. Statuses.

Metadata

Metadata are the only type of data that cannot be imported automatically from files; they must be entered manually in the import form. The entities that can accept metadata include cruises, dives, transects, platform and instrument configurations, and annotation protocol and job configurations.

Cruises, dives and transects require the same inputs:

  • Start and end times.
  • Objectives.
  • Summaries.
  • Notes.

Cruises and dives can also have crew members, and dives can have attached documents.

Platform and instrument configurations can accept a JSON object which describes the configuration of the associated device. This object has no restrictions on its form. Each configuration must also be associated with a platform or an instrument, respectively.
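Because the configuration object is free-form, the only thing worth checking before pasting it into the import form is that it is valid JSON with an object at the top level. The sketch below shows one way to do that in Python; the helper name as_config_json and every key in the example are invented for illustration and are not required by the import form.

```python
import json


def as_config_json(config):
    """Serialize a configuration, insisting that the top level is a JSON object."""
    if not isinstance(config, dict):
        raise TypeError("configuration must be a JSON object (a dict in Python)")
    return json.dumps(config, indent=2, sort_keys=True)


# Hypothetical instrument configuration: all keys and values are invented.
example = as_config_json(
    {
        "model": "example-ctd",
        "sample_rate_hz": 4,
        "sensors": ["conductivity", "temperature", "pressure"],
    }
)
print(example)
```

The printed text can be copied into the configuration field as-is.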

Annotation protocol and job configurations are described on the annotation protocols page.

Streams

Data streams are usually machine-generated measurements or positions, such as water salinity, depth, temperature, or 2D and 3D coordinates generated by a GPS or USBL. These are loaded as CSV files with two columns for measurements, three for 2D coordinates, or four for 3D coordinates.

Every file has a timestamp column in ISO 8601 format with the UTC designator Z, in the form 2000-01-01T12:00:00Z.
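Instrument logs often record time in other forms, so timestamps usually need to be normalized before import. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the trailing Z means UTC and that whole seconds are sufficient.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_timestamp(moment: datetime) -> str:
    """Render a timezone-aware datetime in the form 2000-01-01T12:00:00Z."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before formatting")
    return moment.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp in the form above; raises ValueError for any other form."""
    return datetime.strptime(text, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
```

Running parse_timestamp over the timestamp column is a quick way to find rows that would not match the expected form.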

The header format for each type of file is:

Measurements

The timestamp and value columns are required. The value column is always a floating-point number.

Measurement Data Stream
timestamp value
2000-01-01T12:00:00Z 123.45
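A measurement file in this layout can be produced with Python's csv module. This is a sketch only: the function name is hypothetical, and it assumes the importer expects a comma as the delimiter, which this page does not state.

```python
import csv
import io


def write_measurements(rows):
    """Serialize (timestamp, value) pairs to CSV text with the required header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "value"])
    for timestamp, value in rows:
        # float() rejects non-numeric values before they reach the file.
        writer.writerow([timestamp, float(value)])
    return buffer.getvalue()


csv_text = write_measurements([("2000-01-01T12:00:00Z", "123.45")])
```

Converting each value with float() means a stray unit suffix or empty cell fails during preparation instead of during import.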

2D Coordinates

The timestamp, x and y columns are required. x and y are floating-point numbers representing the coordinate as longitude and latitude, respectively, as in the example below, where x is -130.56789. The coordinate reference system is assumed to be WGS84.

2D Coordinate Data Stream
timestamp x y
2000-01-01T12:00:00Z -130.56789 48.56789
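Swapped coordinate columns are an easy mistake to make, and a range check catches most cases. The sketch below assumes that x is longitude and y is latitude, as the example row suggests (an x of -130.56789 cannot be a latitude), and that the file is comma-delimited; the function name is hypothetical.

```python
import csv
import io


def check_2d_stream(text):
    """Return a list of problems found in 2D coordinate CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != ["timestamp", "x", "y"]:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        try:
            x, y = float(row["x"]), float(row["y"])
        except (TypeError, ValueError):
            problems.append(f"line {line_number}: x and y must be numbers")
            continue
        if not -180.0 <= x <= 180.0:
            problems.append(f"line {line_number}: x (longitude) out of range: {x}")
        if not -90.0 <= y <= 90.0:
            problems.append(f"line {line_number}: y (latitude) out of range: {y}")
    return problems
```

An empty list means no problems were found. A range check cannot detect swapped columns when both values happen to fall within -90 to 90.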

3D Coordinates

The timestamp, x, y and z columns are required. x, y and z are floating-point numbers representing the coordinate as longitude, latitude and depth in meters, respectively. The coordinate reference system is assumed to be WGS84.

3D Coordinate Data Stream
timestamp x y z
2000-01-01T12:00:00Z -130.56789 48.56789 123.45
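When preparing 3D positions in a script, it can help to give each row a typed record so that the meaning of each column stays explicit. The sketch below is illustrative only: Position3D and parse_3d_row are hypothetical names, and the reading of x as longitude and y as latitude is taken from the example row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position3D:
    timestamp: str
    x: float  # longitude in degrees (assumed from the example row)
    y: float  # latitude in degrees
    z: float  # depth in meters


def parse_3d_row(row):
    """Convert one data row (a list of four strings) into a Position3D."""
    if len(row) != 4:
        raise ValueError(f"expected 4 columns, got {len(row)}")
    timestamp, x, y, z = row
    return Position3D(timestamp, float(x), float(y), float(z))


position = parse_3d_row(["2000-01-01T12:00:00Z", "-130.56789", "48.56789", "123.45"])
```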

Annotations

Annotations are downloaded from Biigle as CSV reports, which are imported directly through the import form once label mapping for the project is complete.