Annotation Database Import for Annotators
Latest revision as of 16:42, 17 June 2024

Background

This application produces a configuration for annotation data, which the importer uses to load that data into the database. The importer downloads the configuration produced by this app and combines it with other data sources to assemble the import.

The database contains a standardized set of fields and tags for observations of species, recorded either in real time from video feeds or from video and photo annotations, in particular those generated using Biigle. The labels and descriptions used in the field or during annotation must be mapped to entities in the database to create a coherent dataset across multiple surveys.

The labeling strategy is documented in the annotation protocol (configured on the Annotation Protocol page), which tells future consumers of this data about the annotators' objectives and provides context for the results.

The Database Import for Annotators app consists of several sequential pages which are completed in order. When the current page's requirements are satisfied, the Next button becomes enabled. The Next, Previous and intermediate buttons can be used to navigate through the app.
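You can think of the page-by-page flow as a small state machine. The sketch below is a hypothetical Python model, not the app's code. The page names come from this page. The class name ImportWizard is an assumption, and so is the rule that an intermediate button only reaches a page once every earlier page is complete.

```python
from dataclasses import dataclass, field

# Page order as described on this page.
PAGES = ["Start", "Sign In", "Select Project", "Label Mapping",
         "Personnel", "Annotation Protocol", "Complete"]


@dataclass
class ImportWizard:
    """Hypothetical model of the app's page navigation (not its actual code)."""
    current: int = 0
    satisfied: set = field(default_factory=set)  # indices of completed pages

    def can_go_next(self) -> bool:
        # Next is enabled only once the current page's requirements are met.
        return self.current in self.satisfied and self.current < len(PAGES) - 1

    def next(self) -> str:
        if not self.can_go_next():
            raise RuntimeError(f"Complete '{PAGES[self.current]}' first")
        self.current += 1
        return PAGES[self.current]

    def previous(self) -> str:
        # Going back never requires anything.
        self.current = max(0, self.current - 1)
        return PAGES[self.current]

    def go_to(self, index: int) -> str:
        # Intermediate buttons: assume a page is reachable once every earlier page is complete.
        if any(i not in self.satisfied for i in range(index)):
            raise RuntimeError(f"Pages before '{PAGES[index]}' are incomplete")
        self.current = index
        return PAGES[index]


wizard = ImportWizard()
wizard.satisfied.add(0)  # the Start page has been read
print(wizard.next())     # Sign In
```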

Note: this application is under development and changes will probably occur. Feedback is always welcome.

Start

The start page provides a brief description of the app and what is required from the user.

Sign In

Here, the user will provide their login credentials and their Biigle username and API token.

Biigle API tokens are created on the Access Tokens page of the Biigle user profile.
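The sketch below shows roughly how a request authenticated with an API token might be built. It rests on several assumptions taken from Biigle's REST API documentation:

- HTTP Basic authentication, with the Biigle username as the user name and the API token as the password.
- The public instance at https://biigle.de.
- The /api/v1/projects endpoint for listing a user's projects.

Check these against your own Biigle instance. The request is built but not sent.

```python
import base64
import urllib.request

BIIGLE_URL = "https://biigle.de"  # assumption: use the address of your Biigle instance


def biigle_request(path: str, username: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a Biigle API request using HTTP Basic authentication."""
    credentials = f"{username}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        BIIGLE_URL + path,
        headers={"Authorization": f"Basic {encoded}", "Accept": "application/json"},
    )


# The kind of request the Select Project page needs (endpoint assumed).
request = biigle_request("/api/v1/projects", "annotator@example.org", "API-TOKEN")
# To send it, pass it to urllib.request.urlopen() and parse the JSON response.
```

Keep the token private. Anyone who has it can act on your Biigle account.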

Once the user logs in, a list of previously completed import jobs is displayed at the bottom of the page. These jobs can be viewed, edited and deleted.

To create a new import job, click the New Import Job button.

Select Project

On this page, if the user has provided correct Biigle credentials, the Select Biigle Project drop-down will be populated with all Biigle projects available to the user.

The Load a Label Tree button allows the user to load a JSON file representing an artificial label tree derived from a database or spreadsheet file, like those generated by VideoMiner. More information about this can be found on the Label Trees page. This is not used for Biigle annotations.

Label Mapping

This is the page where most of the work is done.

The Project Label Trees drop-down provides a listing of label trees in the selected Biigle project. If a non-Biigle label tree file has been loaded, there will be one tree in the list. Select a label tree from the list and click the + button to add it to the app. In most cases it will be desirable to add all of the available label trees for the project.

The All Label Trees list contains all label trees available in Biigle. If a label tree was used in a project but later removed from it, its labels must still be mapped even though the tree no longer appears in the Project Label Trees list. You can add such a tree from the All Label Trees list, provided it still exists in Biigle. (There is currently no solution for label trees that have been deleted, so do not delete a label tree before this process is complete!)

If it is not clear which label tree a specific label comes from, copy the label's ID into the Find a Label's Tree text box and click the search button. This will return the name of the tree that owns the label, if it exists. You can then add that tree using the All Label Trees drop-down.
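Conceptually, the Find a Label's Tree search asks which tree's label list contains a given ID. Here is a minimal sketch. It assumes the trees have already been fetched into a dictionary. TREES and find_label_tree are hypothetical names, and the real app may query Biigle directly instead.

```python
from typing import Dict, List, Optional

# Hypothetical trees already fetched from Biigle, keyed by tree name.
TREES: Dict[str, List[dict]] = {
    "Demo fauna": [{"id": 11, "name": "Animalia", "parent_id": None},
                   {"id": 12, "name": "Porifera", "parent_id": 11}],
    "Demo habitat": [{"id": 21, "name": "Substrate", "parent_id": None},
                     {"id": 22, "name": "Mud", "parent_id": 21}],
}


def find_label_tree(label_id: int, trees: Dict[str, List[dict]]) -> Optional[str]:
    """Return the name of the tree that owns label_id, or None if no tree does."""
    for tree_name, labels in trees.items():
        if any(label["id"] == label_id for label in labels):
            return tree_name
    return None


print(find_label_tree(22, TREES))  # Demo habitat
```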

The next section of the page contains the label trees and their labels. Each label tree is listed in order, beginning with its title. If the tree has more than 50 labels, a page navigation strip is displayed below the title. Below the page navigation are the tree's controls: the Show hidden checkbox displays individual labels that have been hidden; the Show label hierarchy checkbox toggles between showing a label's full hierarchical path and just its name; the Delete button removes the tree from the label mapping app; and the Save button saves your current progress (this also happens automatically every thirty seconds).

The Refresh button re-loads the label tree: if the tree has changed (e.g., on Biigle), this updates the version shown in the Label Mapper. The Download and Upload buttons let you download a JSON version of the label map and upload it again. This is useful if you perform a label mapping job on the staging site and want to replicate it on the production site without redoing the entire job.
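The Show label hierarchy option amounts to walking from a label up through its parents. Here is a minimal sketch. It assumes each label record carries id, name and parent_id, as Biigle label records do, and that the path is joined with " > ". display_name is a hypothetical helper.

```python
from typing import Dict, Optional

# Hypothetical labels in the shape Biigle uses for label tree labels (id, name, parent_id).
LABELS: Dict[int, dict] = {
    1: {"id": 1, "name": "Habitat", "parent_id": None},
    2: {"id": 2, "name": "Substrate", "parent_id": 1},
    3: {"id": 3, "name": "Mud", "parent_id": 2},
}


def display_name(label_id: int, labels: Dict[int, dict], show_hierarchy: bool) -> str:
    """Return the label's name, or its full path from the root if show_hierarchy is set."""
    label: Optional[dict] = labels[label_id]
    if not show_hierarchy:
        return label["name"]
    parts = []
    while label is not None:
        parts.append(label["name"])
        parent_id = label["parent_id"]
        # Stop at the root, or at a parent that has not been loaded.
        label = labels.get(parent_id) if parent_id is not None else None
    return " > ".join(reversed(parts))


print(display_name(3, LABELS, show_hierarchy=True))   # Habitat > Substrate > Mud
print(display_name(3, LABELS, show_hierarchy=False))  # Mud
```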

Note that if you leave the page or delete a label tree from the app, the app will remember whatever progress has been made on that tree: if the tree is added again, the previous configurations should appear. When a new label tree is loaded, any label that has been mapped before is pre-populated with its most recently used mapping. Once that mapping is edited, the edited version is kept.
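The pre-population rule can be sketched as "take the newest saved mapping for the same label ID". The SavedMapping record and the prepopulate function below are hypothetical, because the page does not describe how the app actually stores mappings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SavedMapping:
    """One saved mapping for a label (hypothetical record, not the app's schema)."""
    label_id: int
    job_name: str
    saved_at: float  # e.g., a Unix timestamp written by Save or the autosave
    config: dict


def prepopulate(label_id: int, history: List[SavedMapping]) -> Optional[dict]:
    """Return a copy of the most recently saved mapping for label_id, if any."""
    previous = [m for m in history if m.label_id == label_id]
    if not previous:
        return None
    latest = max(previous, key=lambda m: m.saved_at)
    # Copy so that editing the new job leaves earlier jobs' saved mappings untouched.
    return dict(latest.config)
```

Returning a copy matches the behaviour described above: once you edit the pre-populated mapping, your edits belong to the current job.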

The Event Type drop-down lets you select which type of event you're mapping to. Your options are:

  • Ignore -- the label is simply ignored.
  • Not Annotated -- the time span covered by this label is not annotated and no events will be recorded.
  • Observation -- an organism or anthropogenic object or impact has been observed.
  • Habitat -- observations of habitat characteristics.
  • Status -- events describing the status of the ROV (e.g., on or off the bottom), the mission (e.g., on or off transect) or some other event (e.g., a change in image quality).
  • Measurement -- a user-generated measurement (e.g., field of view).
  • Comment -- signifies the presence of a comment that can be recorded.

When an event type is selected, a series of input fields will be displayed, allowing you to enter the relevant information. For example, if you choose Habitat, drop-downs for biocover, coverage and relief will be shown.

The Tags selection box allows you to select multiple tags to attach to a label. For example, if you choose Observation you can add the tag Dead to signify that the organism is dead. When you select an event type, the list of tags is pre-populated with suggested tags, or you can click the Add button to add more.
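A mapping row can be modelled as an event type plus the fields and suggested tags that go with it. This sketch is hypothetical:

- The event types are those listed above.
- The field names come from the fields mentioned on this page.
- The suggested tags are only the examples given here (Dead, Dominant, Subdominant).

Event types whose fields are not yet documented get no fields.

```python
from enum import Enum
from typing import Dict, List


class EventType(Enum):
    IGNORE = "Ignore"
    NOT_ANNOTATED = "Not Annotated"
    OBSERVATION = "Observation"
    HABITAT = "Habitat"
    STATUS = "Status"
    MEASUREMENT = "Measurement"
    COMMENT = "Comment"


# Field names mentioned on this page (hypothetical identifiers).
FIELDS: Dict[EventType, List[str]] = {
    EventType.OBSERVATION: ["scientific_name", "common_name", "aphia_id",
                            "inaturalist_id", "hart_code", "otu"],
    EventType.HABITAT: ["substrate", "biocover", "coverage", "relief", "complexity"],
}

# Suggested tags: only the examples given on this page.
SUGGESTED_TAGS: Dict[EventType, List[str]] = {
    EventType.OBSERVATION: ["Dead"],
    EventType.HABITAT: ["Dominant", "Subdominant"],
}


def new_mapping(event_type: EventType) -> dict:
    """Start an empty mapping row for the chosen event type."""
    return {
        "event_type": event_type.value,
        "fields": {name: None for name in FIELDS.get(event_type, [])},
        # Copy the suggestions so that adding tags to one row doesn't change the defaults.
        "tags": list(SUGGESTED_TAGS.get(event_type, [])),
    }
```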

Observation Events

If the label is an observation of an organism:

  1. Select the Observation event type. Text boxes and drop-downs will appear in the row to the right.
  2. Select the species name in the original label text and click the Search button. A panel will appear containing matching records from the WoRMS, iNaturalist and Hart databases. You can also enter text directly into the text field to search for something that is not in the label.
  3. Click the best scientific or common name to paste it into the corresponding text field. Click the Aphia, iNaturalist or Hart ID to paste the ID into the appropriate field. It is advisable to always include an Aphia ID, since this links the taxon to an official taxonomic database.
  4. Select or enter values for the other fields as required.
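The search step uses either the selected part of the label or the whole label. The sketch below shows that rule. As an assumption about how you might cross-check a name by hand, it also builds a URL for the AphiaRecordsByName lookup of the public WoRMS REST service. The app's own search panel draws on its local taxonomy tables (see Taxonomic Databases) and may not call WoRMS directly.

```python
from typing import Optional, Tuple
from urllib.parse import quote, urlencode

WORMS_REST = "https://www.marinespecies.org/rest"


def search_text(label: str, selection: Optional[Tuple[int, int]] = None) -> str:
    """Search on the selected span of the label if there is one, else the whole label."""
    text = label[selection[0]:selection[1]] if selection else label
    return " ".join(text.split())  # collapse stray whitespace


def worms_search_url(text: str) -> str:
    """URL for a fuzzy scientific-name lookup against the public WoRMS REST service."""
    params = urlencode({"like": "true", "marine_only": "true"})
    return f"{WORMS_REST}/AphiaRecordsByName/{quote(text)}?{params}"


label = "Metridium farcimen (large, white)"
# Prints .../AphiaRecordsByName/Metridium%20farcimen?like=true&marine_only=true
print(worms_search_url(search_text(label, (0, 18))))
```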

For anthropogenic observations, the procedure is the same as for organisms. There are several entries in the Hart database (along with codes) for human-made objects, but in most cases you can just type a description of the object into the common name field.

The fields for observation labels have specific purposes which dictate the kinds of information entered into them.

  • The Scientific Name field is required and intended to hold the accepted taxonomic name (e.g., according to WoRMS) of an organism. Ideally the species or subspecies is known, but this field can contain a higher taxonomic label for organisms that cannot be identified at the species level. For example, if only the genus or phylum can be determined, use that name.

    Note: how to deal with standard suffixes is not entirely resolved. For example, spp. (a grouping of species) and sp. (a single, unknown species) could be used where an organism cannot be positively identified. However, each label is intended to represent a single organism or aggregation of similar organisms, so the grouping designator seems inappropriate. In any case, if a genus is given without specification, the suffix is implied and can be excluded.

  • The Common Name field is required and intended for a vernacular name that a non-biologist might use for a taxon. For example, if the phylum Porifera is entered into the Scientific Name field, an amateur user might not understand what is meant. Putting "Sponge" in the Common Name field makes the label more meaningful to those users. Of course there are many scientific designations that do not have a vernacular counterpart. We do the best we can, either by re-using the scientific name or using whatever label makes the most sense.
  • The OTU (operational taxonomic unit) field is for alphanumeric codes only.
  • Aphia and iNaturalist IDs and Hart codes identify a taxon with a record in the WoRMS, iNaturalist and Hart databases, respectively (the Hart database is included with VideoMiner but is frequently modified for specific studies). Although these identifiers are not enforced, at a minimum an Aphia ID should be provided to link the record to at least one widely used, up-to-date taxonomic database. Many entries in the Hart database use unaccepted or out-of-date names (according to WoRMS), but they preserve historical data and help tie newer records to older systems of identification.
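The suffix convention is still open, so the following is only one possible rule. It follows the note above that an implied suffix can be excluded: strip a trailing sp. or spp. from a genus-level name. strip_open_nomenclature is a hypothetical helper.

```python
import re

# Matches a trailing "sp." or "spp." qualifier, with or without the period.
_SUFFIX = re.compile(r"\s+spp?\.?$", re.IGNORECASE)


def strip_open_nomenclature(name: str) -> str:
    """Drop a trailing sp./spp. from a genus-level name, since the suffix is implied."""
    return _SUFFIX.sub("", name.strip())


print(strip_open_nomenclature("Metridium sp."))  # Metridium
```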

Putting extra information (e.g., clarifications or alternative labels in parentheses) in any of the above fields should be avoided. If more information about an identifier exists, it can be located through the annotation protocol or using the OTU. The labels themselves should provide the minimal, most specific means of identifying an organism. This will become important when searching or producing reports of observations that do not include such context as the protocol or cruise report.
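The rules for the observation fields can be summarized as a checklist. Here is a minimal sketch with hypothetical field names (scientific_name, common_name, otu, aphia_id). A missing Aphia ID is reported as a warning because the app does not enforce it.

```python
import re
from typing import List

REQUIRED = ("scientific_name", "common_name")


def check_observation(fields: dict) -> List[str]:
    """Return problems with an observation mapping (hypothetical field names)."""
    problems = []
    for name in REQUIRED:
        value = (fields.get(name) or "").strip()
        if not value:
            problems.append(f"{name} is required")
        elif "(" in value or ")" in value:
            # Clarifications belong in the protocol or the OTU, not in the name fields.
            problems.append(f"{name} should not contain parenthetical notes")
    otu = fields.get("otu")
    if otu and not re.fullmatch(r"[A-Za-z0-9]+", otu):
        problems.append("otu must be alphanumeric")
    if not fields.get("aphia_id"):
        # Not enforced by the app, but strongly advised.
        problems.append("warning: no Aphia ID")
    return problems
```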

At some point, it will be possible to search the taxonomic tree, for example for any species within a genus or family. Hopefully, careful adherence to a set of rules (even if it isn't this one) will help with that.

Any feedback on the above is very welcome!

Habitat Events

If the label is a habitat:

  1. Select the Habitat event type.
  2. A list of drop-downs and text fields will appear for the relevant characteristics.

There are two main types of habitat observation: substrate and biocover. The fields shown here allow either type to be configured, but some fields apply to only one habitat type. For example, relief generally applies only to substrate, not biocover.

It may happen that a label contains multiple characteristics. For example, the label "Habitat (Subdominant) > Substrate: Mud" describes a substrate of type "mud" and also marks it as subdominant. Many survey protocols record both the dominant and subdominant habitat substrate, so the user can select the Subdominant or Dominant tag to flag the habitat. A label may also include complexity or relief information; selecting the appropriate Complexity or Relief value from its drop-down configures those characteristics as well.
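A label like the example above can be split into its parts mechanically. This sketch assumes labels follow the shape of that single example, "Habitat (Qualifier) > Type: Value". Real label trees vary, so treat parse_habitat_label (a hypothetical helper) as an illustration of the mapping, not a general parser. It also encodes the rule that relief generally applies only to substrate.

```python
import re
from typing import Optional

# Assumed label shape, taken from the single example on this page.
_HABITAT = re.compile(
    r"^Habitat(?:\s*\((?P<rank>Dominant|Subdominant)\))?\s*>\s*"
    r"(?P<kind>Substrate|Biocover)\s*:\s*(?P<value>.+)$",
    re.IGNORECASE,
)


def parse_habitat_label(label: str) -> Optional[dict]:
    """Split a habitat label into type, value and Dominant/Subdominant tag."""
    m = _HABITAT.match(label.strip())
    if m is None:
        return None
    kind = m["kind"].lower()
    return {
        "type": kind,
        "value": m["value"].strip().lower(),
        "tags": [m["rank"].capitalize()] if m["rank"] else [],
        # Relief generally applies only to substrate, so only offer it there.
        "allows_relief": kind == "substrate",
    }


print(parse_habitat_label("Habitat (Subdominant) > Substrate: Mud"))
```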

Status Events

Measurement Events

Comment Events

Taxonomic Databases

The Label Mapper creates references to external taxonomic databases:

  1. The WoRMS/OBIS databases are linked by the Aphia ID. The data for the local table (taxonomy.worms_taxon) are extracted from the OBIS Canada Node, which is maintained by DFO.
  2. The iNaturalist database is linked by the iNaturalist ID. The taxa are drawn from the Marine Life of the Northeast Pacific collection managed by members of DFO. The specific query used to extract entries for the local table (taxonomy.inaturalist_taxon) is https://inaturalist.ca/observations/export?quality_grade=any&identifications=any&projects%5B%5D=marine-life-of-the-northeast-pacific.
  3. The Hart database is the lookup used in VideoMiner. This database has been modified and extended over time by individual researchers. The version in the local table (taxonomy.hart_taxon) is extracted from one of these.
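To show how the references resolve, the sketch below recreates the three local tables in an in-memory SQLite database. Only the table names come from this page. The column names (aphia_id, inaturalist_id, hart_code, scientific_name) and the placeholder rows are assumptions, and the production database may differ.

```python
import sqlite3
from typing import Optional


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database named "taxonomy" so the schema-qualified
    # table names used on this page also resolve in SQLite.
    conn.execute("ATTACH DATABASE ':memory:' AS taxonomy")
    conn.execute("CREATE TABLE taxonomy.worms_taxon (aphia_id INTEGER PRIMARY KEY, scientific_name TEXT)")
    conn.execute("CREATE TABLE taxonomy.inaturalist_taxon (inaturalist_id INTEGER PRIMARY KEY, scientific_name TEXT)")
    conn.execute("CREATE TABLE taxonomy.hart_taxon (hart_code TEXT PRIMARY KEY, scientific_name TEXT)")
    # Placeholder rows only; real IDs come from the OBIS, iNaturalist and Hart extracts.
    conn.execute("INSERT INTO taxonomy.worms_taxon VALUES (1, 'Taxon A')")
    conn.execute("INSERT INTO taxonomy.hart_taxon VALUES ('H1', 'Synonym of Taxon A')")
    return conn


def resolve(conn: sqlite3.Connection, aphia_id: Optional[int] = None,
            inaturalist_id: Optional[int] = None, hart_code: Optional[str] = None) -> dict:
    """Look up the name each external reference on a mapped label points to."""
    row = conn.execute(
        """SELECT
             (SELECT scientific_name FROM taxonomy.worms_taxon WHERE aphia_id = ?),
             (SELECT scientific_name FROM taxonomy.inaturalist_taxon WHERE inaturalist_id = ?),
             (SELECT scientific_name FROM taxonomy.hart_taxon WHERE hart_code = ?)""",
        (aphia_id, inaturalist_id, hart_code),
    ).fetchone()
    return dict(zip(("worms", "inaturalist", "hart"), row))


print(resolve(make_demo_db(), aphia_id=1, hart_code="H1"))
```

The Hart name differing from the WoRMS name mirrors the point made earlier: Hart entries often carry older names, while the Aphia ID ties the record to the accepted one.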

Personnel

On this page, a list of annotators will appear alongside a list of matching users already stored in the database. If the user doesn't yet exist in the database, they can be added. The page will make an attempt to automatically map names, but the user may have to verify and adjust the result.

Note: there are no personnel for label trees loaded through a label tree file (yet).
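Automatic name matching of this kind can be approximated with fuzzy string comparison. Here is a minimal sketch using Python's difflib. The normalization rule (turning "Last, First" into "first last") and the 0.8 similarity cutoff are assumptions. Unmatched names come back as None so the user can add them.

```python
import difflib
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lowercase, collapse whitespace and turn "Last, First" into "first last"."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return " ".join(name.lower().split())


def suggest_matches(annotators: List[str], db_users: List[str],
                    cutoff: float = 0.8) -> Dict[str, Optional[str]]:
    """Propose a database user for each annotator; None means 'add a new user'."""
    by_key = {normalize(user): user for user in db_users}
    suggestions = {}
    for annotator in annotators:
        best = difflib.get_close_matches(normalize(annotator), list(by_key), n=1, cutoff=cutoff)
        suggestions[annotator] = by_key[best[0]] if best else None
    return suggestions
```

The user still has to check each suggestion. A short handle such as "jsmith" falls below the cutoff and would need to be matched by hand.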

Annotation Protocol

This section describes the protocol used for annotation. Fields are available for the protocol name, the person who originated the protocol, the observation interval, etc. There is space at the bottom of the form for uploading annotation protocol documents, species guides or other files.

Complete

Here, the user can enter a unique name for the annotation job, and the name of the cruise to which it applies. Notes can be added for future reference or as a guide to the person who does the final import into the database.

Click Submit to submit the job. The app will switch back to the sign-in page, where the job will appear in the list.
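The uniqueness requirement for the job name can be sketched as a case-insensitive check against the names already in the sign-in list. The ImportJob fields mirror this page, but the JSON payload shape and the submit function are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class ImportJob:
    """Fields entered on the Complete page (hypothetical payload shape)."""
    name: str
    cruise: str
    notes: str = ""


def submit(job: ImportJob, existing_names: Iterable[str]) -> str:
    """Check that the job name is unique, then serialize the job for the importer."""
    key = job.name.strip().casefold()
    if not key:
        raise ValueError("A job name is required")
    if key in {name.strip().casefold() for name in existing_names}:
        raise ValueError(f"A job named {job.name!r} already exists")
    return json.dumps(asdict(job))
```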