
Concept help - Data Set

A Data Set describes a record of data, including any location or time boundaries for the data, that has been captured and is available for use under a specific licence. A Data Set may be included in a Data Catalog, and can reference multiple Distributions that record different parts or formats of the data that are available to download.

A dataset in DCAT is defined as a "collection of data, published or curated by a single agent, and available for access or download in one or more formats". A dataset does not have to be available as a downloadable file. For example, a dataset that is available via an API can be defined as an instance of dcat:Dataset and the API can be defined as an instance of dcat:Distribution. DCAT itself does not define properties specific to API description. These are considered out of the scope of this version of the vocabulary. Nevertheless, they can be defined as a profile of the DCAT vocabulary.
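The API case described above can be sketched as a small JSON-LD style record. This is only an illustration, not the Registry's export format: the function name build_api_dataset, the title and the example.org URL are placeholders, and the choice of dct:title and dcat:accessURL assumes the standard DCAT and Dublin Core Terms namespaces.

```python
import json


def build_api_dataset(title: str, api_url: str) -> dict:
    """Describe a dataset whose only distribution is an API endpoint.

    Hypothetical helper for illustration; field values are supplied by the caller.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        # The API is modelled as a distribution rather than a downloadable file.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": api_url},
            }
        ],
    }


# Placeholder values, not real Registry content.
example = build_api_dataset("Example dataset", "https://api.example.org/v1/records")
print(json.dumps(example, indent=2))
```

Any API-specific properties (authentication, rate limits and so on) would have to come from a profile, since DCAT itself does not define them.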

Fields available on this metadata type

Field ISO definition and Registry Help (where available)
Name The primary name used for human identification purposes.

The title is a unique, clear and descriptive name for the data asset. People searching for data should gain a basic understanding of the business use/intent of the data asset from the title. If a new data asset record is required due to an error in the data, the title should note that it is a revised version.

Free text (max. 200 char) e.g. Team productivity modelling based on APS Employee Census results for 2000-2022

Definition Representation of a concept by a descriptive statement which serves to differentiate it from related concepts. (3.2.39)

Easy to read information about the data asset to enable users to find and evaluate the data asset for their needs. The Description attribute is typically several sentences long and is used to search for the data asset so keywords should be carefully considered. Agencies are encouraged to include field names (e.g. gender/sex, age, address) collected within the data asset. This will help answer any specific research, policy or program questions a user may have, and help manage requests for additional information about the data asset received by your agency.

This attribute is supplemented by the Title, Keyword and Purpose attributes in the ONDC data catalogue.

Free text (max. 500 char) e.g. This model provides the breakdown of team productivity across the APS by the team job functions, providing management with richer insights into employee perceptions on a range of key indicators. These Census indicators include staff engagement, leadership, communication and change management, workplace conditions, health and wellbeing among others.
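The character limits stated for Name (200) and Definition (500) lend themselves to a simple pre-submission check. The sketch below is one possible way to do that; the dictionary keys "name" and "definition" and the function check_lengths are hypothetical, and the Registry may count characters differently.

```python
# Limits taken from the field guidance above; the key names are assumptions.
MAX_LENGTHS = {"name": 200, "definition": 500}


def check_lengths(record: dict) -> list:
    """Return one message per free-text field that exceeds its limit."""
    problems = []
    for field, limit in MAX_LENGTHS.items():
        value = record.get(field, "")
        if len(value) > limit:
            problems.append(
                f"{field}: {len(value)} characters exceeds limit of {limit}"
            )
    return problems


draft = {
    "name": "Team productivity modelling based on APS Employee Census results for 2000-2022",
    "definition": "x" * 501,  # deliberately too long
}
print(check_lengths(draft))
```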

Target Stewardship Organisation The Stewardship Organisation the user selected when they created the item, which is where they wanted the item to end up.
Is Federated
Is Not Federable
Version Unique version identifier of this metadata item.
References Significant documents that contributed to the development of the metadata item which were not the direct source for the metadata content.
Origin The source (e.g. document, project, discipline or model) for the item (8.1.2.2.3.5)
Comments Descriptive comments about the metadata item (8.1.2.2.3.4)
Deleted The date after which the item has been soft deleted and is no longer visible in the registry
License Information about the license document under which the dataset is made available.

A legal document under which the data asset can be distributed or is made available.
This information may be sourced through the agency’s legal department. If information is not yet available, fill in “To be determined” and update the licence once it becomes available.

Creative Commons Attribution 3.0 Australia.

e.g. Creative Commons Attribution 4.0 International Licence

GNU Free Documentation License 1.3 with no cover texts and no invariant sections

MIT license (MIT)

Other (Copyright/Closed)

The BSD License

Rights Information about rights held in and over the dataset.

Specifies access (or restrictions) to the data asset. Access is based on the agency’s privacy, security, or other policy approaches that apply to the data asset. Access can be:
• Open - Data that is publicly accessible online (account registration may be required).
• Conditional - Data that is publicly accessible subject to condition(s) that the user must meet to access the data. For example: a fee-for-service model applies to access the data; the user must have a .gov.au email to create an account and access the data; or the data is only accessible at a specific physical location.
• Restricted - Data access is limited for reasons such as legal, privacy and sensitivity. For example: during an embargo period; security classification is PROTECTED and above; access can only be provided under the DATA Scheme.

This attribute is supplemented by the Security Classification and Sensitive Data attributes.

Choose term from: Open, Conditional, Restricted

Release Date Date of formal publication of the dataset.

The date on which the underlying data asset was made available for use, consumption, or analysis. This date should change whenever the underlying data is updated. This is not to be confused with Date Modified, which is the date recorded for the metadata in an agency’s data inventory.

Date/Time in format: AS/NZS ISO 8601.1:2021

e.g. 1973-09

1973-09-17

1973-09-17T23:20:30+04:00
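The three example values above (year and month, full date, and date-time with a UTC offset) can be checked with the Python standard library. This sketch only recognises those three shapes; ISO 8601 permits other representations that are deliberately not handled here, and the function names are hypothetical.

```python
from datetime import datetime

# One pattern per example shown above: YYYY-MM, YYYY-MM-DD, and
# YYYY-MM-DDThh:mm:ss with a UTC offset such as +04:00.
_ACCEPTED_FORMATS = ("%Y-%m", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S%z")


def parse_registry_date(text: str) -> datetime:
    """Parse one of the three example shapes, or raise ValueError."""
    for fmt in _ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"not in an accepted format: {text!r}")


def is_registry_date(text: str) -> bool:
    """Return True if the text matches one of the accepted shapes."""
    try:
        parse_registry_date(text)
    except ValueError:
        return False
    return True


for sample in ("1973-09", "1973-09-17", "1973-09-17T23:20:30+04:00", "17/09/1973"):
    print(sample, is_registry_date(sample))
```

A year-and-month value parses to the first day of that month, so callers that care about precision should keep the original string alongside the parsed value.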

Modification Date Most recent date on which the dataset was changed, updated or modified.

The most recent date on which the data asset record was created, changed, updated or modified. This date refers to when the metadata of the data asset changes or is first recorded in the data inventory, not a date pertaining to the underlying data asset itself. This attribute is critical for agencies in managing their data assets and is supplemented by the ONDC Publish Date attribute.

Date/Time in format: AS/NZS ISO 8601.1:2021
e.g. 2023-09
2023-09-17
2023-09-17T23:20:30+04:00

Frequency The frequency at which the dataset is published.

The frequency at which new, revised, or updated versions of the underlying data are made available. For data assets that are regularly released, one data asset record will represent a series of underlying data. Agencies will determine when a new record is required for a data asset, based on changes in methodology, collection and related policies or to correct typographical errors in the underlying data. Further information can be found in Dublin Core™ Collection Description Frequency Vocabulary.

Choose term from:
Triennial
Biennial
Annual
Semiannual
Three times a year
Quarterly
Bimonthly
Monthly
Semimonthly
Biweekly
Three times a month
Weekly
Semiweekly
Daily
Continuous
Irregular 
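Because Frequency is a controlled list, user input can be matched against the terms above before it is stored. In this sketch the case-insensitive, whitespace-tolerant matching is an assumption made for convenience, not documented Registry behaviour, and normalise_frequency is a hypothetical name.

```python
from typing import Optional

# Terms copied from the list above, in the same order.
FREQUENCY_TERMS = (
    "Triennial", "Biennial", "Annual", "Semiannual", "Three times a year",
    "Quarterly", "Bimonthly", "Monthly", "Semimonthly", "Biweekly",
    "Three times a month", "Weekly", "Semiweekly", "Daily", "Continuous",
    "Irregular",
)

_LOOKUP = {term.lower(): term for term in FREQUENCY_TERMS}


def normalise_frequency(value: str) -> Optional[str]:
    """Return the listed spelling of a frequency term, or None if not listed."""
    key = " ".join(value.split()).lower()  # collapse whitespace, ignore case
    return _LOOKUP.get(key)


print(normalise_frequency(" quarterly "))
print(normalise_frequency("fortnightly"))
```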

Spatial Coverage Spatial or geographic coverage of the dataset.

The geographic area or location that the data asset covers. Location represents the geographic area of the entire data asset (e.g. “Australia”). It is not intended to represent specific location values contained within the data asset itself. For example, if the data asset is about installation of solar panels, the location would describe the entire coverage of the data asset such as “Australia” or “Australian Capital Territory” or a mesh block from the Australian Statistical Geography Standard. Location values contained in the data asset such as specific suburbs or regions can be captured within the Keyword, Description or Purpose attributes.

Choose term from:
Australia
New South Wales
Victoria
Queensland
South Australia
Western Australia
Tasmania
Northern Territory
Australian Capital Territory
Other Territories*
International

*Other territories include Jervis Bay Territory, Territory of Christmas Island, Territory of the Cocos (Keeling) Islands and Norfolk Island

OR

Provide at least one area from: Australian Statistical Geography Standard (ASGS) Edition 3, July 2021 - June 2026

Temporal Coverage The temporal or time period that the dataset covers.

Combination of the two ONDC fields Temporal coverage from/to:

The start period for the underlying data. Temporal coverage refers to the time period that the data asset covers. This field is related to the attribute Temporal coverage to. 

The end period for the underlying data. Temporal coverage refers to the time period that the data asset covers. The data asset may not have an end date if it is being continually added to, in which case, a value is not required. This field is related to the attribute Temporal coverage from.
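The relationship between the two temporal coverage fields can be sketched as a small consistency check. The guidance above only says that the end value may be omitted for data that is still being added to; the rule that a supplied end must not precede the start is an assumption added here, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional


def coverage_is_consistent(coverage_from: date,
                           coverage_to: Optional[date] = None) -> bool:
    """Check a Temporal coverage from/to pair.

    An open-ended asset (no 'to' value) is accepted. When both values are
    present, 'from' must not be later than 'to' (assumed rule).
    """
    if coverage_to is None:
        return True
    return coverage_from <= coverage_to


print(coverage_is_consistent(date(2000, 1, 1), date(2022, 12, 31)))
print(coverage_is_consistent(date(2000, 1, 1)))
```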

Catalog An entity responsible for making the dataset available.
Landing Page A Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information

The Uniform Resource Locator (URL) that links to the data asset. If the Access Rights of the data asset is “open”, this could be a publicly accessible permanent URL that provides direct access to the data asset. If the Access Rights of the data asset is “conditional” or “restricted”, some agencies may choose to have one URL to a website explaining how to access the data asset for external users (including through Dataplace) and a second URL to an internal system location for internal users.

Contact Point Relevant contact information for the Dataset.

An email address or a contact web form for users to request additional information related to the data asset. A group email address or contact web form is preferred because it is generic and enduring compared to an individual’s contact. This minimises the need to regularly update this attribute. Some agencies may choose to have a different point of contact for the Australian Government Data Catalogue and for internal purposes.

Group Email (or URL to contact web form) for the point of contact e.g. data.discovery@finance.gov.au

Conforming Specification An established standard to which the described resource conforms.
Item Base

Custom Fields

Field Short definition Long definition
Security Classification Choose term from: UNOFFICIAL, OFFICIAL, OFFICIAL: Sensitive, PROTECTED, SECRET, TOP SECRET

The security classification applied to the data asset is specified by the Australian Government Protective Security Policy Framework - Policy 8.

The originator of the data asset is responsible for applying the relevant Security Classification. This attribute is supplemented by the Sensitive Data and Access Rights attributes.

Data Custodian Free text selected from the following: • For Government department and agency • For non-government organisation • For research organisation e.g. Department of Finance

The data custodian(s) is the agency that is responsible for the data asset and has the authority for sharing and disclosure. The data custodian can differ from the publisher (see Publisher attribute).

An agency may also be a data custodian under the Data Availability and Transparency Act 2022 if:

(a) “[they are] a Commonwealth body; and

(b) [are] not an excluded entity; and

(c) either:

      (i) controls public sector data (whether alone or jointly with another entity), including by having the right to deal with that data; or

      (ii) has become the data custodian of output of a project in accordance with section 20F.”

The data custodian value must be consistent with the Government Directory, Non-Government Organisation (NGO) List or Research Organisations Register. This field is related to Publisher attribute.

Keyword Free text

Australian Governments’ Interactive Functions Thesaurus (AGIFT) high-level terms are:
Business Support and Regulation
Civic Infrastructure
Communications
Community Services
Cultural Affairs
Defence
Education and Training
Employment
Environment
Finance Management
Governance
Health Care
Immigration
Indigenous Affairs
International Relations
Justice Administration
Maritime Services
Natural Resources
Primary Industries
Science
Security
Sport and Recreation
Statistical Services
Tourism
Trade
Transport

Data assets with information on people should include (if applicable):
Gender
Sex
Disability
First Nations people
Aboriginal and Torres Strait Islander

e.g. A data asset containing APS Employee Census results can have the keywords: Governance, Public service, Gender, Disability, First Nations people, Aboriginal and Torres Strait Islander, APS, census

Word(s) or terms that describe the data asset subject matter.

It answers the question “what is in this data asset?” and supports the discovery of the data asset.

This is a critical component in helping users find your data asset. Consider your keywords carefully and include as many relevant keywords as you can.

As an absolute minimum, agencies must include:

• At least one high-level term from the Australian Governments’ Interactive Functions Thesaurus (AGIFT) to enhance the user search experience in the Australian Government Data Catalogue, followed by detailed AGIFT terms that better describe the data asset.

• Terms that describe enduring Government priorities.
For example:
▪ Indigenous related data are tagged with the keywords ‘First Nations people’ and ‘Aboriginal and Torres Strait Islander’
▪ Disability related data are tagged with the keyword ‘Disability’
▪ Data assets containing sex (information collected on sex characteristics observed at birth or infancy) or gender (information collected as a result of gender identity, expression and/or experience) are tagged with the relevant term. Further information can be found on ABS Standard for Sex, Gender, Variations of Sex Characteristics and Sexual Orientation Variables.

Other keywords to supplement the AGIFT terms can be selected from:
• Vocabularies used by your agency
• ANZSRC Field Of Research Code 2020
• Description of functions and sub-functions (Department of Finance).

Where multiple keywords apply, separate the terms with a comma.
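The comma-separated keyword convention and the minimum requirement of at least one high-level AGIFT term can be sketched as follows. The function names are hypothetical, the case-insensitive comparison is an assumption, and the check covers only the high-level AGIFT terms listed on this page, not the detailed AGIFT terms or the other vocabularies mentioned above.

```python
# High-level AGIFT terms as listed on this page.
AGIFT_HIGH_LEVEL_TERMS = {
    "Business Support and Regulation", "Civic Infrastructure", "Communications",
    "Community Services", "Cultural Affairs", "Defence", "Education and Training",
    "Employment", "Environment", "Finance Management", "Governance", "Health Care",
    "Immigration", "Indigenous Affairs", "International Relations",
    "Justice Administration", "Maritime Services", "Natural Resources",
    "Primary Industries", "Science", "Security", "Sport and Recreation",
    "Statistical Services", "Tourism", "Trade", "Transport",
}

_AGIFT_LOWER = {term.lower() for term in AGIFT_HIGH_LEVEL_TERMS}


def split_keywords(raw: str) -> list:
    """Split a comma-separated keyword string and drop empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def has_agift_term(keywords: list) -> bool:
    """Return True if at least one keyword is a high-level AGIFT term."""
    return any(keyword.lower() in _AGIFT_LOWER for keyword in keywords)


# Keyword example taken from the guidance above.
raw = ("Governance, Public service, Gender, Disability, First Nations people, "
       "Aboriginal and Torres Strait Islander, APS, census")
keywords = split_keywords(raw)
print(keywords)
print(has_agift_term(keywords))
```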

Resource Type Choose a term from: collection, dataset, event, image, interactive resource, model, party, physical object, place, service, software, sound, text

Specifies the type of data asset.

The most common types of data asset applicable are listed below with their definitions. Further information can be found in Dublin Core - List of Resource Types.

collection
an aggregation of items. The term collection means that the resource is described as a group; its parts may be separately described and navigated.

dataset
structured information encoded in lists, tables, databases, etc., which will normally be in a format available for direct machine processing. For example - spreadsheets, databases, GIS data, midi data. Note that unstructured numbers and words would be considered as text.

image
the content is primarily symbolic visual representation other than text. For example - images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation. Note that image may include both electronic and physical representations.

interactive resource
a resource which requires interaction from the user to be understood, executed, or experienced. For example - forms on web pages, applets, multimedia learning objects, virtual reality.

model
an abstraction of the real thing, i.e. some generalisation and interpretation. Models could be considered a symbolic representation. Examples include performance models, cost models, mechanical models, etc.

This attribute could be supplemented by attribute Format.

Purpose Free text (max. 500 char) e.g. The APS Employee Census results for 2000-2022 enables richer insights to transform the APS and improve productivity.

A descriptive summary of the purposes for which the data asset was developed and is intended to be used.

This field supplements the attribute Description.

Sensitive Data Choose term from: N/A [e.g. open data], Legislative secrecy, Personal privacy, Legal privilege

The type of sensitivity of the data asset, where applicable.

If the Security Classification is “OFFICIAL: Sensitive” or above, the type of sensitivity should be provided.

For further guidance, refer to Australian Government Protective Security Policy Framework - Policy 8.

This attribute is supplemented by the Security Classification and Access Rights attributes.
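The rule that a sensitivity type should be provided when the Security Classification is “OFFICIAL: Sensitive” or above can be sketched as a cross-field check. The sketch assumes that the classification terms on this page are listed in ascending order and that "N/A" is stored without the bracketed note; both function names are hypothetical.

```python
# Assumed to be in ascending order, as listed for Security Classification.
CLASSIFICATIONS = (
    "UNOFFICIAL", "OFFICIAL", "OFFICIAL: Sensitive",
    "PROTECTED", "SECRET", "TOP SECRET",
)
SENSITIVITY_TYPES = ("Legislative secrecy", "Personal privacy", "Legal privilege")

_THRESHOLD = CLASSIFICATIONS.index("OFFICIAL: Sensitive")


def sensitivity_type_expected(classification: str) -> bool:
    """True when the classification is OFFICIAL: Sensitive or above.

    Raises ValueError for a classification that is not in the list.
    """
    return CLASSIFICATIONS.index(classification) >= _THRESHOLD


def sensitive_data_is_consistent(classification: str, sensitive_data: str) -> bool:
    """Check the Sensitive Data value against the Security Classification."""
    if sensitivity_type_expected(classification):
        return sensitive_data in SENSITIVITY_TYPES
    # Below the threshold, "N/A" or any listed type is accepted.
    return sensitive_data == "N/A" or sensitive_data in SENSITIVITY_TYPES


print(sensitive_data_is_consistent("OFFICIAL: Sensitive", "Personal privacy"))
print(sensitive_data_is_consistent("SECRET", "N/A"))
```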

Demographic
Related Entities Organisations or individuals involved in the creation, ownership or administration of the underlying data asset

This attribute has been added to help Catalogue users filter search results to the data sources in which they are interested. 

Legal Authority Free text (max. 200 char) For Legislation as the legal authority, select from https://www.legislation.gov.au/ e.g. Legal authority for APS Employee Census results is Public Service Act 1999.

All legal mandates under which the data asset was collected, created, received, used or disclosed.

Legal mandates could include Memorandum of Understanding; Legislation; Machinery of Government; Government policies or acts; etc.

Where multiple legal mandates exist, separate their URLs with a comma.

This information may be sourced through the agency’s legal department.

If information is not yet available, fill in “To be determined” and update the legal authority once it becomes available.

Disposal Free text: e.g. “Destroy 7 years after last entry” “Destroy 75 years after date of birth of employee” “Retain as national archives”

Information on the correct retention or disposal action of the data asset.

This is important because agencies are legally required to appropriately manage and dispose of their data.

Where multiple disposal actions exist within the data asset, provide the longest retention period.

For further guidance, refer to “18.3 Disposal Action” within Australian Government Recordkeeping Metadata Standard.

This information may be sourced through the agency’s legal department. If information is not yet available, fill in “To be determined” and update the disposal date as soon as it becomes known.

Data Status Choose term from: Under Development, Completed

This refers to the status of the data asset registration within the data inventory, not the status of the underlying data asset itself.

• Under Development - Registration is in progress (i.e. metadata has not been cleared by the relevant decision-maker or not all core metadata attributes are populated in the data inventory)

• Completed - Registration complete.

File size Free text e.g. 4MB 5GB TB PB 10 data tables in SQL

A measure of the digital storage needed by the data asset.

For digital assets, ideally you should provide a number and the unit. If the file size is constantly changing, then you can provide an indicative size or an indicative size range for your data asset. If size cannot be determined, fill in the number of data tables stored for the data asset.

Additionally, if the data asset is a data service or interactive resource, this field may not be relevant (fill in N/A).

This information may be sourced through the agency’s IT or data management departments.

Format Free text e.g. CSV DataCube GeographicData JPEG MP4 WebPage WebApplication

The distribution format of the data asset.

This information may be sourced through your agency’s IT or data management departments.

This attribute supplements the Resource Type attribute.

Language Free text selected from the following: Australian Standard Classification of Languages (ASCL) e.g. English

Refers to the language used within the data asset - e.g. “English”.

The default value may be set to “English”. Some agencies may have assets containing languages other than English, in which case the Australian Standard Classification of Languages (ASCL) can be used.

Publisher Free text selected from the following: • For Government department and agency • For non-government organisation • For research organisation e.g. Department of Finance

The agency that made the data asset formally available and may control any future version release. The publisher can differ from the data custodian (see Data Custodian attribute).

The publisher value must be consistent with the Government Directory, Non-Government Organisation (NGO) List or Research Organisations Register.

This field is related to Data Custodian attribute.

Date Modified
Temporal coverage from
Temporal coverage to
Publish date
Identifier Free text (max. 200 char) e.g. FIN000077

The identifier is used to distinguish the data asset as unique and different to another data asset.

It is key to finding the data asset and ensuring the specific data asset can be referenced without confusion.

Ideally it is globally unique, such as a Digital Object Identifier (DOI). However, the identifier can start with an acronym relevant to your agency followed by letters, numbers or symbols.

Official Definition

A representation of a dataset in a catalog. Data Catalog Vocabulary (DCAT): 5.3 Class: Dataset