Concept help - Data Set
A dataset in DCAT is defined as a "collection of data, published or curated by a single agent, and available for access or download in one or more formats". A dataset does not have to be available as a downloadable file. For example, a dataset that is available via an API can be defined as an instance of dcat:Dataset, and the API can be defined as an instance of dcat:Distribution. DCAT itself does not define properties specific to describing APIs; these are considered out of scope for this version of the vocabulary. Such properties can, however, be defined in a profile of the DCAT vocabulary.
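As an illustration of the API case described above, the Python sketch below builds a minimal JSON-LD description of a dataset whose only distribution is an API endpoint. The `dcat:` and `dct:` terms and namespace URIs are from DCAT and Dublin Core; the dataset URI, titles and API URL are hypothetical placeholders. The endpoint is given with `dcat:accessURL` rather than `dcat:downloadURL`, because there is no file to download.

```python
import json

# Hypothetical dataset reachable only through an API. The API is modelled as a
# dcat:Distribution whose endpoint is given with dcat:accessURL.
# All URIs and titles are illustrative placeholders.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "https://example.org/dataset/solar-installations",
    "@type": "dcat:Dataset",
    "dct:title": "Solar panel installations",
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dct:title": "Query API",
            # An API endpoint, not a downloadable file, so no dcat:downloadURL.
            "dcat:accessURL": {"@id": "https://api.example.org/solar"},
        }
    ],
}

print(json.dumps(dataset, indent=2))
```

Keeping the API as a distribution (rather than a separate resource type) matches the modelling choice the paragraph describes; any API-specific properties would come from a DCAT profile.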
Fields available on this metadata type
| Field | ISO definition and Registry Help (where available) |
|---|---|
| Name | The primary name used for human identification purposes.<br>The title is a unique, clear and descriptive name for the data asset. People searching for data should gain a basic understanding of the business use/intent of the data asset from the title. If a new data asset record is required due to an error in the data, the title should note that it is a revised version. |
| Definition | Representation of a concept by a descriptive statement which serves to differentiate it from related concepts. (3.2.39)<br>Easy-to-read information about the data asset that enables users to find it and evaluate whether it meets their needs. The Description attribute is typically several sentences long and is used when searching for the data asset, so keywords should be chosen carefully. Agencies are encouraged to include the names of fields collected within the data asset (e.g. gender/sex, age, address). This helps answer specific research, policy or program questions a user may have, and helps manage requests your agency receives for additional information about the data asset. |
| Target Stewardship Organisation | The Stewardship Organisation the user selected when creating the item, i.e. the organisation where they want the item to end up. |
| Is Federated | |
| Is Not Federable | |
| Version | Unique version identifier of this metadata item. |
| References | Significant documents that contributed to the development of the metadata item which were not the direct source for the metadata content. |
| Origin | The source (e.g. document, project, discipline or model) for the item (8.1.2.2.3.5) |
| Comments | Descriptive comments about the metadata item (8.1.2.2.3.4) |
| Deleted | The date after which the item has been soft deleted and is no longer visible in the registry |
| License | Information about the license document under which the dataset is made available.<br>A legal document under which the data asset can be distributed or is made available, e.g. Creative Commons Attribution 3.0 Australia; Creative Commons Attribution 4.0 International Licence; GNU Free Documentation License 1.3 with no cover texts and no invariant sections; MIT license (MIT); Other (Copyright/Closed); The BSD License. |
| Rights | Information about rights held in and over the dataset.<br>Specifies access to (or restrictions on) the data asset. Access is based on the agency’s privacy, security or other policies that apply to the data asset. Access can be:<br>• Open - Data that is publicly accessible online (account registration may be required).<br>• Conditional - Data that is publicly accessible subject to condition(s) that the user must meet to access the data. For example: a fee-for-service model applies to access the data; the user must have a .gov.au email address to create an account and access the data; or the data is only accessible at a specific physical location.<br>• Restricted - Data access is limited for legal, privacy or sensitivity reasons. For example: during an embargo period; the security classification is PROTECTED or above; or access can only be provided under the DATA Scheme.<br>This attribute is supplemented by the Security Classification and Sensitive Data attributes. |
| Release Date | Date of formal publication of the dataset.<br>The date on which the underlying data asset was made available for use, consumption or analysis. This date should be updated whenever the underlying data is updated. It is not to be confused with Date Modified, which is the date on which the metadata was recorded in an agency’s data inventory. Format: date/time as per AS/NZS ISO 8601.1:2021, e.g. 1973-09, 1973-09-17 or 1973-09-17T23:20:30+04:00. |
| Modification Date | Most recent date on which the dataset was changed, updated or modified.<br>The most recent date on which the data asset record was created, changed, updated or modified. This date refers to when the metadata of the data asset changes or is first recorded in the data inventory, not to a date pertaining to the underlying data asset itself. This attribute is critical for agencies in managing their data assets and is supplemented by the ONDC Publish Date attribute. |
| Frequency | The frequency at which the dataset is published.<br>The frequency at which new, revised or updated versions of the underlying data are made available. For data assets that are released regularly, one data asset record represents a series of underlying data. Agencies determine when a new record is required for a data asset, based on changes in methodology, collection and related policies, or to correct typographical errors in the underlying data. Further information can be found in the Dublin Core™ Collection Description Frequency Vocabulary. |
| Spatial Coverage | Spatial or geographic coverage of the dataset.<br>The geographic area or location that the data asset covers. Location represents the geographic area of the entire data asset (e.g. “Australia”); it is not intended to represent specific location values contained within the data asset itself. For example, if the data asset is about the installation of solar panels, the location would describe the coverage of the entire data asset, such as “Australia”, “Australian Capital Territory” or a mesh block from the Australian Statistical Geography Standard. Location values contained in the data asset, such as specific suburbs or regions, can be captured in the Keyword, Description or Purpose attributes. Alternatively, provide at least one area from the Australian Statistical Geography Standard (ASGS) Edition 3, July 2021 - June 2026. |
| Temporal Coverage | The temporal or time period that the dataset covers.<br>A combination of the two ONDC fields Temporal coverage from and Temporal coverage to. |
| Catalog | An entity responsible for making the dataset available. |
| Landing Page | A Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information.<br>The Uniform Resource Locator (URL) that links to the data asset. If the Access Rights of the data asset is “open”, this could be a publicly accessible permanent URL that provides direct access to the data asset. If the Access Rights of the data asset is “conditional” or “restricted”, some agencies may choose to have one URL to a website explaining how to access the data asset for external users (including through Dataplace) and a second URL to an internal system location for internal users. |
| Contact Point | Relevant contact information for the Dataset.<br>An email address or a contact web form for users to request additional information related to the data asset. A group email address or contact web form is preferred because it is generic and more enduring than an individual’s contact details. This minimises the need to regularly update this attribute. Some agencies may choose to have a different point of contact for the Australian Government Data Catalogue and for internal purposes. |
| Conforming Specification | An established standard to which the described resource conforms. |
| Item Base |
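The Release Date help above gives three example date forms, and Temporal Coverage combines two "from"/"to" fields. The Python sketch below checks a value against exactly those three example forms and joins a from/to pair into a single value. Two assumptions are involved: only the three example forms are accepted (AS/NZS ISO 8601.1:2021 allows many more), and the combined value uses the ISO 8601 interval notation `start/end`, which the registry does not specify. The function names are hypothetical.

```python
from datetime import datetime, timezone

# Formats taken from the three Release Date examples (1973-09, 1973-09-17,
# 1973-09-17T23:20:30+04:00). Accepting only these is an assumption.
DATE_FORMATS = [
    ("%Y-%m-%dT%H:%M:%S%z", "date-time"),
    ("%Y-%m-%d", "date"),
    ("%Y-%m", "year-month"),  # parsed as the first day of the month
]


def parse_iso_date(value: str) -> tuple[datetime, str]:
    """Parse a value in one of DATE_FORMATS; return (datetime, precision)."""
    for fmt, precision in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt), precision
        except ValueError:
            continue  # try the next, less precise format
    raise ValueError(f"unsupported date format: {value!r}")


def _as_utc_naive(dt: datetime) -> datetime:
    # Values without an offset are treated as UTC so they can be compared
    # with offset-aware date-times; this is a simplification.
    if dt.tzinfo is None:
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


def combine_temporal_coverage(start: str, end: str) -> str:
    """Hypothetical helper: join 'Temporal coverage from' and 'Temporal
    coverage to' into an ISO 8601 interval 'start/end'."""
    start_dt, _ = parse_iso_date(start)
    end_dt, _ = parse_iso_date(end)
    if _as_utc_naive(start_dt) > _as_utc_naive(end_dt):
        raise ValueError("'from' date is later than 'to' date")
    return f"{start}/{end}"


print(parse_iso_date("1973-09-17T23:20:30+04:00")[1])  # date-time
print(combine_temporal_coverage("2000-01", "2022-12-31"))  # 2000-01/2022-12-31
```

Returning the precision alongside the parsed value keeps a year-month release date distinguishable from a full date, since both parse to a `datetime`.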
Custom Fields
| Field | Short definition | Long definition |
|---|---|---|
| Security Classification | Choose a term from: UNOFFICIAL, OFFICIAL, OFFICIAL: Sensitive, PROTECTED, SECRET, TOP SECRET | The security classification applied to the data asset, using the classifications specified in the Australian Government Protective Security Policy Framework - Policy 8. The originator of the data asset is responsible for applying the relevant Security Classification. This attribute is supplemented by the Sensitive Data and Access Rights attributes. |
| Data Custodian | Free text, selected from the register for the organisation type:<br>• Government department or agency<br>• Non-government organisation<br>• Research organisation<br>e.g. Department of Finance | The data custodian is the agency (or agencies) responsible for the data asset and with authority over its sharing and disclosure. The data custodian can differ from the publisher (see the Publisher attribute). An agency may also be a data custodian under the Data Availability and Transparency Act 2022 if: (a) “[they are] a Commonwealth body; and (b) [are] not an excluded entity; and (c) either: (i) controls public sector data (whether alone or jointly with another entity), including by having the right to deal with that data; or (ii) has become the data custodian of output of a project in accordance with section 20F.” The data custodian value must be consistent with the Government Directory, Non-Government Organisation (NGO) List or Research Organisations Register. |
| Keyword | Free text.<br>Australian Governments’ Interactive Functions Thesaurus (AGIFT) high-level terms are: Business Support and Regulation, Civic Infrastructure, Communications, Community Services, Cultural Affairs, Defence, Education and Training, Employment, Environment, Finance Management, Governance, Health Care, Immigration, Indigenous Affairs, International Relations, Justice Administration, Maritime Services, Natural Resources, Primary Industries, Science, Security, Sport and Recreation, Statistical Services, Tourism, Trade, Transport.<br>Data assets with information on people should include (if applicable): Gender, Sex, Disability, First Nations people, Aboriginal and Torres Strait Islander.<br>e.g. A data asset containing APS Employee Census results can have the keywords: Governance, Public service, Gender, Disability, First Nations people, Aboriginal and Torres Strait Islander, APS, census | Word(s) or terms that describe the subject matter of the data asset. It answers the question “what is in this data asset?” and supports discovery of the data asset. This is a critical component in helping users find your data asset, so choose keywords carefully and use as many relevant keywords as you can. As an absolute minimum, agencies must include:<br>• At least one high-level term from the Australian Governments’ Interactive Functions Thesaurus (AGIFT) to enhance the user search experience in the Australian Government Data Catalogue, followed by detailed AGIFT terms that better describe the data asset.<br>• Terms that describe enduring Government priorities.<br>Other keywords to supplement the AGIFT terms can be selected from: Where multiple keywords apply, separate the terms with a comma. |
| Resource Type | Choose a term from: collection, dataset, event, image, interactive resource, model, party, physical object, place, service, software, sound, text | Specifies the type of data asset. The most common applicable types of data asset are collection, dataset, image, interactive resource and model. Further information can be found in Dublin Core - List of Resource Types. This attribute may be supplemented by the Format attribute. |
| Purpose | Free text (max. 500 characters), e.g. “The APS Employee Census results for 2000-2022 enable richer insights to transform the APS and improve productivity.” | A descriptive summary of the purposes for which the data asset was developed and is intended to be used. This field supplements the Description attribute. |
| Sensitive Data | Choose a term from: N/A (e.g. open data), Legislative secrecy, Personal privacy, Legal privilege | The type of sensitivity of the data asset, where applicable. If the Security Classification is “OFFICIAL: Sensitive” or above, the type of sensitivity should be provided. For further guidance, refer to the Australian Government Protective Security Policy Framework - Policy 8. This attribute is supplemented by the Security Classification and Access Rights attributes. |
| Demographic | ||
| Related Entities | Organisations or individuals involved in the creation, ownership or administration of the underlying data asset. | This attribute has been added to help Catalogue users filter search results to the data sources in which they are interested. |
| Legal Authority | Free text (max. 200 characters). Where the legal authority is legislation, select it from https://www.legislation.gov.au/<br>e.g. The legal authority for the APS Employee Census results is the Public Service Act 1999. | All legal mandates under which the data asset was collected, created, received, used or disclosed. Legal mandates could include: Memorandum of Understanding; legislation; Machinery of Government; Government policies or acts. Where multiple legal mandates exist, separate their URLs with a comma. This information may be sourced through the agency’s legal department. If the information is not yet available, enter “To be determined” and update the legal authority once it becomes available. |
| Disposal | Free text, e.g. “Destroy 7 years after last entry”, “Destroy 75 years after date of birth of employee”, “Retain as national archives” | Information on the correct retention or disposal action of the data asset. This is important because agencies are legally required to appropriately manage and dispose of their data. Where multiple disposal actions exist within the data asset, provide the longest retention period. For further guidance, refer to “18.3 Disposal Action” within the Australian Government Recordkeeping Metadata Standard. This information may be sourced through the agency’s legal department. If information is not yet available, fill in “To be determined” and update the disposal date as soon as it becomes known. |
| Data Status | Choose a term from: Under Development, Completed | This refers to the status of the data asset’s registration within the data inventory, not the status of the underlying data asset itself.<br>• Under Development - Registration is in progress (i.e. the metadata has not been cleared by the relevant decision-maker, or not all core metadata attributes are populated in the data inventory).<br>• Completed - Registration is complete. |
| File size | Free text, e.g. 4MB, 5GB, TB, PB, 10 data tables in SQL | A measure of the digital storage needed by the data asset. For digital assets, ideally provide a number and a unit. If the file size is constantly changing, you can provide an indicative size or size range for the data asset. If the size cannot be determined, enter the number of data tables stored for the data asset. If the data asset is a data service or interactive resource, this field may not be relevant (enter N/A). This information may be sourced through the agency’s IT or data management departments. |
| Format | Free text, e.g. CSV, DataCube, GeographicData, JPEG, MP4, WebPage, WebApplication | The distribution format of the data asset. This information may be sourced through your agency’s IT or data management departments. This attribute supplements the Resource Type attribute. |
| Language | Free text, selected from the Australian Standard Classification of Languages (ASCL), e.g. English | The language used within the data asset, e.g. “English”. The default value may be set to “English”. Some agencies may have data assets containing languages other than English, in which case the Australian Standard Classification of Languages (ASCL) can be used. |
| Publisher | Free text, selected from the register for the organisation type:<br>• Government department or agency<br>• Non-government organisation<br>• Research organisation<br>e.g. Department of Finance | The agency that made the data asset formally available and may control any future version release. The publisher can differ from the data custodian (see the Data Custodian attribute). The publisher value must be consistent with the Government Directory, Non-Government Organisation (NGO) List or Research Organisations Register. |
| Date Modified | ||
| Temporal coverage from | ||
| Temporal coverage to | ||
| Publish date | ||
| Identifier | Free text (max. 200 characters), e.g. FIN000077 | The identifier distinguishes the data asset from all other data assets. It is key to finding the data asset and ensures that the specific data asset can be referenced without confusion. Ideally it is globally unique, such as a Digital Object Identifier (DOI). Alternatively, the identifier can start with an acronym relevant to your agency, followed by letters, numbers or symbols. |
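Several custom fields above carry rules that can be checked mechanically: character limits on Purpose, Legal Authority and Identifier; controlled term lists for Security Classification and Sensitive Data; a sensitivity type at “OFFICIAL: Sensitive” or above; and at least one AGIFT high-level term among comma-separated keywords. The Python sketch below gathers these checks in one place. The record layout (a plain dict keyed by field name), the function names and the integer ordering of classifications are assumptions for illustration, not part of the registry.

```python
from enum import IntEnum


class SecurityClassification(IntEnum):
    """Security Classification terms, lowest to highest. Mapping them to
    integers so they can be compared is an assumption of this sketch."""
    UNOFFICIAL = 0
    OFFICIAL = 1
    OFFICIAL_SENSITIVE = 2  # displayed as "OFFICIAL: Sensitive"
    PROTECTED = 3
    SECRET = 4
    TOP_SECRET = 5


SENSITIVE_DATA_TERMS = {"N/A", "Legislative secrecy", "Personal privacy", "Legal privilege"}

# Character limits given in the Short definition column.
MAX_LENGTHS = {"Purpose": 500, "Legal Authority": 200, "Identifier": 200}

# AGIFT high-level terms listed under Keyword (compared case-insensitively).
AGIFT_HIGH_LEVEL = {t.lower() for t in [
    "Business Support and Regulation", "Civic Infrastructure", "Communications",
    "Community Services", "Cultural Affairs", "Defence", "Education and Training",
    "Employment", "Environment", "Finance Management", "Governance", "Health Care",
    "Immigration", "Indigenous Affairs", "International Relations",
    "Justice Administration", "Maritime Services", "Natural Resources",
    "Primary Industries", "Science", "Security", "Sport and Recreation",
    "Statistical Services", "Tourism", "Trade", "Transport",
]}


def split_keywords(value: str) -> list[str]:
    """Split a comma-separated Keyword value, trimming spaces and dropping blanks."""
    return [k.strip() for k in value.split(",") if k.strip()]


def check_record(record: dict) -> list[str]:
    """Return the problems found in a record held as a plain dict keyed by
    field name (a hypothetical layout, not the registry's storage format)."""
    problems = []
    for field, limit in MAX_LENGTHS.items():
        if len(record.get(field, "")) > limit:
            problems.append(f"{field} exceeds {limit} characters")

    keywords = split_keywords(record.get("Keyword", ""))
    if not any(k.lower() in AGIFT_HIGH_LEVEL for k in keywords):
        problems.append("Keyword needs at least one AGIFT high-level term")

    level = record["Security Classification"]
    sensitivity = record.get("Sensitive Data", "N/A")
    if sensitivity not in SENSITIVE_DATA_TERMS:
        problems.append(f"unknown Sensitive Data term: {sensitivity!r}")
    elif level >= SecurityClassification.OFFICIAL_SENSITIVE and sensitivity == "N/A":
        problems.append("Sensitive Data type expected at OFFICIAL: Sensitive or above")
    return problems


record = {
    "Security Classification": SecurityClassification.PROTECTED,
    "Sensitive Data": "Personal privacy",
    "Keyword": "Governance, Public service, Gender, APS, census",
    "Identifier": "FIN000077",
}
print(check_record(record))  # []
```

Returning a list of problems rather than raising on the first one lets a record that is still “Under Development” be reviewed in a single pass.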
Official Definition
A representation of a dataset in a catalog. Source: Data Catalog Vocabulary (DCAT), 5.3 Class: Dataset.
