Some of the best practices for contribution are as follows:
- Data should be stored in widely used file formats that are suitable for machine processing.
- Released datasets should clearly reflect “what is recorded about a particular subject”.
- Timely release of datasets is an important factor in maximizing the utility of the information people can obtain.
- Data should be provided in freely available formats which can be accessed without the need for a software license.
- Data elements should be in de-normalized form.
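To illustrate the last point, the following is a minimal Python sketch of flattening two normalized tables into one de-normalized table, so that every row can be read on its own. The table contents, column names and the helper functions `denormalize` and `to_csv` are hypothetical and use made-up figures; nothing here is prescribed by the platform.

```python
import csv
import io

# Hypothetical normalized data (made-up figures): a lookup table of states
# and a table of observations that refers to states only by code.
STATES = {"KA": "Karnataka", "MH": "Maharashtra"}
OBSERVATIONS = [
    {"state_code": "KA", "year": 2012, "tubewells": 120},
    {"state_code": "MH", "year": 2012, "tubewells": 95},
]


def denormalize(observations, states):
    """Return rows in which the state name is repeated on every row."""
    rows = []
    for obs in observations:
        rows.append(
            {
                "state_code": obs["state_code"],
                "state_name": states[obs["state_code"]],
                "year": obs["year"],
                "tubewells": obs["tubewells"],
            }
        )
    return rows


def to_csv(rows):
    """Serialize de-normalized rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


flat_rows = denormalize(OBSERVATIONS, STATES)
csv_text = to_csv(flat_rows)
```

Repeating the state name on each row costs some redundancy, but a data consumer can then process the file without joining it against a separate lookup table.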
The main activities of SODAAP, PMU would be to manage the Open Government Data Platform India, provide technical advice to the departments, handhold them in dataset contribution, and build the capacity of Data Contributors and Chief Data Officers.
Different types of datasets generated both in geospatial and non-spatial form by Ministries/Departments shall be classified as shareable data and non-shareable data.
Derived statistics such as national accounts statistics, indicators such as price indices, and databases from censuses and surveys are the types of data produced by a statistical mechanism. Geospatial data, by contrast, consists primarily of satellite data, maps, etc. In such a mechanism, it becomes important to maintain standards in respect of metadata, data layout and data access policy.
Title (Required): A unique name of the Dataset viz. Current Population Survey <Year>, Consumer Price Index <Year>, Variety-wise Daily Market Prices Data, State-wise Construction of Deep Tubewells over the years, etc.
Description (Required): Provide a detailed description of the Dataset, e.g., an abstract describing the nature and purpose of the catalog.
Keywords (Required): A list of terms, separated by commas, that describe and indicate the content of the dataset. Example: rainfall, weather, monthly statistics.
Group Name: This field allows agencies to provide a Group Name to closely related catalogs in order to show that they may be presented as a group or a set.
Sector & Sub-Sector (Required): Choose the sector(s)/sub-sector(s) that most closely apply to your catalog.
Asset Jurisdiction (Required): This is a required field to identify the exact location or area to which the catalog and resources (datasets/apps) cater, viz. entire country, state/province, district, city, etc.
2 Resources (Datasets/Apps)
Category (Required): Choose from the drop-down options whether the resource is a Dataset or an Application.
Title (Required): A unique name of the resource viz. Consumer Price Index for
Access Method (Required): This could be “Upload a Dataset” or “Single Click Link to Dataset”.
Reference URLs: This may include descriptions of the study design, instrumentation, implementation, limitations, and appropriate use of the dataset or tool. In the case of multiple documents or URLs, please delimit them with commas or enter them on separate lines.
Access Type: It mentions the type of access viz. Open, Priced, Registered Access or Restricted Access (G2G).
Date Released: It mentions the release date of the Dataset/App.
Note: Any additional information about the resource that the contributor/controller wishes to provide to the data consumer.
NDSAP Policy Compliance: This field is to indicate if this dataset is in conformity with the National Data Sharing and Access Policy of the Govt. of India.
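The required fields listed above can be checked before a contribution is submitted. The following is a minimal Python sketch of such a check, not the platform's actual validation: the dictionary keys (`title`, `sector`, `asset_jurisdiction`, and so on), the function names and the sample values are hypothetical. Only the "(Required)" markings and the allowed values for Category and Access Method are taken from the field descriptions.

```python
# Field names below are hypothetical keys chosen for this sketch; only the
# "(Required)" markings and the allowed values come from the text above.
REQUIRED_CATALOG_FIELDS = (
    "title", "description", "keywords", "sector", "asset_jurisdiction",
)
REQUIRED_RESOURCE_FIELDS = ("category", "title", "access_method")
ALLOWED_CATEGORIES = {"Dataset", "Application"}
ALLOWED_ACCESS_METHODS = {"Upload a Dataset", "Single Click Link to Dataset"}


def parse_keywords(text):
    """Split a comma-separated keyword string into clean terms."""
    return [term.strip() for term in text.split(",") if term.strip()]


def missing_fields(record, required):
    """Return the required field names that are absent or blank."""
    return [n for n in required if not str(record.get(n) or "").strip()]


def validate_catalog(catalog):
    """Return a list of problems found in a catalog metadata record."""
    problems = [
        "missing catalog field: " + n
        for n in missing_fields(catalog, REQUIRED_CATALOG_FIELDS)
    ]
    if catalog.get("keywords") and not parse_keywords(catalog["keywords"]):
        problems.append("keywords contains no usable terms")
    return problems


def validate_resource(resource):
    """Return a list of problems found in a resource metadata record."""
    problems = [
        "missing resource field: " + n
        for n in missing_fields(resource, REQUIRED_RESOURCE_FIELDS)
    ]
    category = resource.get("category")
    if category and category not in ALLOWED_CATEGORIES:
        problems.append("unknown category: " + category)
    method = resource.get("access_method")
    if method and method not in ALLOWED_ACCESS_METHODS:
        problems.append("unknown access method: " + method)
    return problems


# Hypothetical sample records for illustration only.
example_catalog = {
    "title": "Variety-wise Daily Market Prices Data",
    "description": "Daily market prices reported variety-wise.",
    "keywords": "market prices, daily statistics",
    "sector": "Agriculture",
    "asset_jurisdiction": "entire country",
}
example_resource = {
    "category": "Dataset",
    "title": "",
    "access_method": "Upload a Dataset",
}

catalog_problems = validate_catalog(example_catalog)
resource_problems = validate_resource(example_resource)
```

Here the sample catalog passes, while the sample resource is reported as missing its title. Optional fields such as Group Name, Reference URLs and Note are deliberately left unchecked.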
SODAAP recommends that datasets be published in an open, machine-readable format. Considering the current analysis of data formats prevalent in Government, it is proposed that data should be published in any of the following formats:
CSV (Comma separated values)
XLS (Spreadsheet - Excel)
ODS/OTS (Open Document Formats for Spreadsheets)
XML (Extensible Markup Language)
RDF (Resource Description Framework)
KML (Keyhole Markup Language used for Maps)
GML (Geography Markup Language)
RSS/ATOM (Fast changing data e.g. hourly/daily)
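A contributor could check a file against the list above before uploading it. The following minimal Python sketch does so by file extension. The mapping from extensions to formats and the function name `recommended_format` are assumptions made for illustration (in particular, RSS and ATOM feeds are often served without a `.rss` or `.atom` extension), and an extension check says nothing about whether the file contents are well formed.

```python
from pathlib import PurePath

# Assumed mapping from file extension to the formats listed above.
RECOMMENDED_EXTENSIONS = {
    ".csv": "CSV",
    ".xls": "XLS",
    ".ods": "ODS",
    ".ots": "OTS",
    ".xml": "XML",
    ".rdf": "RDF",
    ".kml": "KML",
    ".gml": "GML",
    ".rss": "RSS",
    ".atom": "ATOM",
}


def recommended_format(filename):
    """Return the recommended format name for a file, or None if the
    extension is not in the list. The comparison ignores letter case."""
    suffix = PurePath(filename).suffix.lower()
    return RECOMMENDED_EXTENSIONS.get(suffix)


# Hypothetical file names for illustration only.
csv_result = recommended_format("market_prices.CSV")
pdf_result = recommended_format("annual_report.pdf")
```

With these sample names, the CSV file is recognized and the PDF is not, since PDF is not among the proposed formats.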
Data about the data is called metadata. It is information about the datasets being published, provided through a standard structure comprising controlled vocabularies on government sectors, dataset types, jurisdictions, access mode, etc. Apart from facilitating easy access to data, it is useful for federation and integration of data catalogs.
The main role & responsibilities of a Chief Data Officer are as follows:
- Lead the open data initiative of the Department/Organization.
- Nominate Data Contributors.
- Create login IDs for Data Contributors using the Chief Data Officer’s login account.
- Take the initiative to release as many datasets as possible on a proactive basis.
- Identify the High Value Datasets and schedule their release on OGD Platform.
- Prepare the Negative List for the Department as per the directions in NDSAP.
- Ensure that the datasets being published, through a workflow process, are in compliance with NDSAP.
- Periodically monitor the release of datasets as per the predefined schedule.
- Take relevant action on the feedback/suggestions received from citizens for the datasets belonging to the Ministry/Department/Organization.
- Ensure the correctness of his/her contact details on the OGD Platform by sending a mail/letter to ndsap [at] gov [dot] in, in case of any change.
- Take action on suggestions for new datasets made by the public through the OGD Platform.
In order to cater to the contribution of datasets from offices/organizations under the Ministries/Departments, the Chief Data Officer can nominate a number of Data Contributors who would be responsible for contributing the datasets along with their metadata.
A Data Contributor could be an officer of the Ministry/Department who would be responsible for his/her unit/division.
The responsibilities of the Data Contributor are as follows:
- Ensuring the quality and correctness of the datasets of his/her unit/division.
- Preparing and contributing the metadata in the predefined format for the datasets.
As per SODAAP, every Department has to identify datasets by the following categories:
Negative List: Datasets which are confidential in nature and would compromise the country’s security if made public are put into this list. Datasets which contain personal information are also included in this list.
Open List: This list comprises datasets which don’t fall under the negative list. These datasets shall be prioritized into high value datasets and non-high value datasets.
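The two lists can be expressed as a simple decision rule. The following is a minimal Python sketch under stated assumptions: the boolean flags `confidential`, `contains_personal_information` and `high_value`, as well as the function name `classify`, are hypothetical. In practice the Department makes these determinations; the sketch only shows how the categories relate to each other.

```python
def classify(dataset):
    """Place a dataset on the negative list or the open list.

    `dataset` is a dict of hypothetical boolean flags; missing flags are
    treated as False.
    """
    # Confidential datasets and datasets with personal information go on
    # the negative list, regardless of how valuable they are.
    if dataset.get("confidential") or dataset.get("contains_personal_information"):
        return "negative list"
    # Everything else is on the open list, prioritized by value.
    if dataset.get("high_value"):
        return "open list (high value)"
    return "open list (non-high value)"


# Hypothetical examples for illustration only.
examples = [
    {"name": "A", "confidential": True},
    {"name": "B", "contains_personal_information": True, "high_value": True},
    {"name": "C", "high_value": True},
    {"name": "D"},
]
labels = [classify(d) for d in examples]
```

Checking the negative-list conditions first means a dataset is never marked as open merely because it is considered high value.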
The principles on which data sharing and accessibility need to be based include: Openness, Flexibility, Transparency, Quality, Security and Machine-readability.
Login functionality is enabled using a central auth account.