# Preparing and publishing data

## Ways to publish data on HDX

Organizations can choose between three ways to publish data on HDX. These dictate who can access the data and how it is shared. Organizations should choose the appropriate setting based on the sensitivity of the data, the intended audience, and internal policies or restrictions. All data on HDX must be published with comprehensive metadata (information about the data).&#x20;

#### Open data publishing

**Public:** Public datasets are visible to and can be downloaded by anyone visiting HDX. This option is the most common way to publish data on HDX.

Opt for ‘public’ when:&#x20;

* A resource contains non-sensitive data
* You want your data to be visible and available for download to anyone visiting HDX

#### Controlled access publishing

**HDX Connect / By Request:** HDX Connect is a feature that allows organizations to publish only the metadata for a resource. By using HDX Connect, organizations can signal that data exists, while ensuring access is adequately controlled in line with the data’s sensitivity.&#x20;

Registered HDX users can request access to the resource through a “Request Access” button. The contributing organization can then decide whether to share the data that has been requested. The data is not hosted on HDX servers, and therefore, if access is approved, the contributing organization will share the data directly with the requester. The Centre recommends using secure (encrypted) channels for data sharing between contributing organizations and requesters.

Opt for HDX Connect / ‘by request’ when:

* A resource contains sensitive data
* You want to approve access requests individually

Example: [IOM Cameroon multi-sector needs assessment results](https://data.humdata.org/dataset/2025-cameroon-multi-sector-needs-assessment-msna-dataset) (September 2025). Registered users can request access to the underlying data by clicking the ‘Request Access’ button. They will be asked to share information about who they are and how the data will be used in order to inform the decision by the contributing organization on whether to share the requested data.&#x20;

**Private:** Private datasets are visible only to members of your HDX organization when they are logged in. They are not included in HDX search results or listed under datasets. Private datasets may not contain personal data or other data that is classified as ‘strictly confidential’ in an Information Sharing Protocol.

Opt for ‘private’ when:

* You are sharing data solely for internal collaboration&#x20;
* Datasets are not approved for public release

#### Comparing data publication options

When selecting how to share data on HDX, it is important to consider the data sensitivity and potential risks to affected people. You can also change how the data is shared at any time.&#x20;

<table data-header-hidden data-full-width="false"><thead><tr><th></th><th width="186"></th><th></th><th></th></tr></thead><tbody><tr><td><br></td><td><strong>Open data</strong></td><td><strong>Controlled access</strong></td><td></td></tr><tr><td><strong>Sharing option</strong></td><td><strong>Public</strong></td><td><strong>HDX Connect (By request)</strong></td><td><strong>Private</strong></td></tr><tr><td>Who can see the dataset? </td><td>Data and metadata visible and accessible to anyone visiting HDX</td><td>Metadata visible to anyone visiting HDX</td><td>Metadata and data visible only to members of the contributing organization account on HDX</td></tr><tr><td>Is data downloadable from HDX?</td><td>Yes</td><td>No. The data stays on the  organization’s infrastructure and is shared directly with approved users</td><td>Yes, only for logged-in members of the organization.</td></tr><tr><td>What is the data sensitivity level?</td><td>Low / No - data is unlikely to cause any harm to affected people or humanitarian organizations</td><td>High / Severe - data is likely to cause significant harm or negative impacts to affected people, humanitarian organizations or a response</td><td>Moderate - data is likely to cause minor harm or negative impacts to affected people, humanitarian organizations or a response</td></tr><tr><td>Best for</td><td>Non-sensitive, open data</td><td>Sensitive or proprietary data</td><td>Internal collaboration or draft data</td></tr><tr><td>Examples</td><td>Road network data; population statistics</td><td>Locations of at-risk communities; detailed data on humanitarian access incidents </td><td>Early drafts; internal analysis</td></tr></tbody></table>

## Preparing data to publish

Before uploading data to HDX, take time to prepare your data and metadata. Well-prepared data is easier to find, understand, and use. Work through the following checks:

#### Is the data safe to publish?

When uploading data to HDX, consider whether the data is sensitive, i.e. whether exposure of the data could lead to harm or negative impacts for affected people or impede the work of humanitarian organizations.

The [HDX Terms of Service](https://data.humdata.org/faqs/terms) set out how users should share data responsibly. Review section 7 under ‘Data Scope and Criteria’ for instructions on responsible data sharing via HDX.

**How HDX defines data sensitivity**

* **Low or No sensitivity:** exposure of the data is unlikely to cause any harm to affected people or humanitarian organizations. Data at this sensitivity level is suitable for sharing publicly on HDX. Examples include data on road networks, geospatial information on (undisputed) boundaries, and country-level population statistics.&#x20;
* **Moderate sensitivity:** exposure of the data is likely to cause minor harm or negative impacts and/or be disadvantageous to affected people or humanitarian organizations. Data at this sensitivity level may only be shared privately or through HDX Connect. Examples include preliminary results of assessments that have not been cleared for publication, as well as operational presence mapping data at the district-level for some response contexts.
* **High sensitivity:** exposure of the data is likely to cause serious harm or negative impacts to affected people or humanitarian organizations, and/or damage a humanitarian response. Data at this sensitivity level may only be shared through HDX Connect. Examples include data on the movement of humanitarian convoys, and community-level survey results covering sensitive subjects such as Gender-Based Violence.
* **Severe sensitivity:** exposure of the data is likely to cause severe harm or negative impacts to affected people or humanitarian organizations, and/or damage a humanitarian response. Data at this sensitivity level may only be shared through HDX Connect. Once a request is approved, the data should only be shared with strict data security measures in place, and under a data sharing agreement if required. Examples include personal data of beneficiaries, locations of at-risk communities, and detailed accounts of access incidents.
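The mapping from sensitivity level to sharing option described above can be sketched as a small lookup. This is an illustration only, not part of any HDX tool: the option labels (`public`, `hdx_connect`, `private`) and function names are made up for this example, and the sketch assumes that more restrictive options remain available for low-sensitivity data, which the list above does not state explicitly.

```python
from enum import Enum


class Sensitivity(Enum):
    LOW = "low or no"
    MODERATE = "moderate"
    HIGH = "high"
    SEVERE = "severe"


# Sharing options permitted at each level, transcribed from the list above.
# Assumption: low-sensitivity data may also use the more restrictive options.
ALLOWED_SHARING = {
    Sensitivity.LOW: {"public", "hdx_connect", "private"},
    Sensitivity.MODERATE: {"hdx_connect", "private"},
    Sensitivity.HIGH: {"hdx_connect"},
    Sensitivity.SEVERE: {"hdx_connect"},
}


def can_share(level: Sensitivity, option: str) -> bool:
    """Return True if the sharing option is permitted for this sensitivity level."""
    return option in ALLOWED_SHARING[level]


def needs_strict_security(level: Sensitivity) -> bool:
    """Severe data additionally requires strict security measures when shared."""
    return level is Sensitivity.SEVERE
```

For example, `can_share(Sensitivity.MODERATE, "public")` is `False`, while `can_share(Sensitivity.MODERATE, "private")` is `True`. The sensitivity level itself still has to come from the relevant Information Sharing Protocol or a manual assessment.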

**How to assess and manage risk**

The sensitivity of datasets varies depending on the crisis context. For example, the exact location of a health facility may be safe to share in a natural disaster, while sharing similar data in certain conflict zones can lead to significant harm.

To assess and manage risks:&#x20;

* Review the Information Sharing Protocol (ISP) for the response context to understand the sensitivity level of your data. An overview of endorsed ISPs is available [here](https://centre.humdata.org/data-responsibility/). If no ISP is available for your response context and you are not sure about the sensitivity level, reach out to the HDX team via <hdx@un.org> for support.
* For microdata, apply Statistical Disclosure Control (SDC). This reduces re-identification risk. HDX uses the open-source tool sdcMicro. The Centre for Humanitarian Data’s [Guidance Note on SDC](https://centre.humdata.org/guidance-note-statistical-disclosure-control/?_gl=1*j95yp3*_ga*OTU2MzMzNjgxLjE2NzQ2OTkwMzc.*_ga_E60ZNX2F68*czE3NjMxMDEyNTQkbzQ1OCRnMSR0MTc2MzEwMTMxMCRqNCRsMCRoMA..) outlines the process, tools, and best practices for managing microdata responsibly.

#### Check the quality of the data before publishing

Key dataset preparation tips to consider:

* **File naming:** Use descriptive filenames (e.g. population\_by\_admin2\_2024\_Q1.csv) to help users understand what each file contains. Write a clear dataset title and description.
* **File formats:** Use open formats like CSV, XLSX, GeoTIFF, GeoJSON or shapefiles where possible and try to avoid proprietary formats.
* **Structure:** Use clear, consistent column headers. Do not mix data types in a single column. Avoid merged cells or heavy formatting.
* **Clarity:** Ensure clear caveats and limitations are explained and documented. Provide a link to the methodology, codebook, data dictionary or any existing documentation that can help the user make the best use of your data.
* **P-codes:** Where appropriate, use [p-codes](https://knowledge.base.unocha.org/wiki/spaces/imtoolbox/pages/222265609/P-codes?_gl=1*1omi4so*_ga*OTU2MzMzNjgxLjE2NzQ2OTkwMzc.*_ga_E60ZNX2F68*czE3NjMxMDEyNTQkbzQ1OCRnMSR0MTc2MzEwMTQwOCRqNTgkbDAkaDA.) to ensure interoperability. You can access the global p-code list here: [Global P-Codes Dataset](https://data.humdata.org/dataset/global-pcodes).
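Some of the structure tips above can be checked automatically before upload. The following is a minimal sketch using only the Python standard library; the function name, the messages, and the sample values (including the p-code-like strings) are made up for illustration, and it only covers missing headers, duplicate headers, and columns that mix numbers with text.

```python
import csv
import io


def _is_number(value: str) -> bool:
    """Return True if the cell parses as a plain number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_csv_structure(text: str) -> list:
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    header = [name.strip() for name in rows[0]]
    data = rows[1:]
    problems = []

    # Every column should have a header.
    for index, name in enumerate(header):
        if not name:
            problems.append(f"column {index + 1} has no header")

    # Headers should be unique.
    seen = set()
    for name in header:
        if name and name in seen:
            problems.append(f"duplicate header: {name}")
        seen.add(name)

    # A column should not mix numbers and text (empty cells are ignored).
    for index, name in enumerate(header):
        kinds = set()
        for row in data:
            if index < len(row) and row[index].strip():
                kinds.add("number" if _is_number(row[index].strip()) else "text")
        if len(kinds) > 1:
            problems.append(f"column '{name}' mixes numbers and text")

    return problems


sample = "admin2_pcode,population\nCM001001,12000\nCM001002,about 9000\n"
```

Calling `check_csv_structure(sample)` reports that the `population` column mixes numbers and text. A check like this does not replace a manual review of caveats, documentation, and formatting.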

#### Check the quality of the metadata before publishing

Prepare metadata carefully. Accurate and complete metadata helps users find, understand and trust your data.

Here are the most important metadata fields to complete:

* **Description:** Provide specific details about the dataset, including its purpose, context, and potential use.&#x20;
* **Source of the data:** Clearly name the entity that created or collected the data.
* **Time period covered:** Add the start and end dates that apply to all files in the dataset.
* **Expected Update frequency:** Indicate how often the data is expected to be updated (e.g. daily, monthly, quarterly, never). The dataset will show a green leaf mark when it is up to date based on the update frequency provided and on the time period end date. Data maintainers will receive reminders to update based on this frequency.
* **License:** Confirm you have the right to share the data and select an appropriate open data license.
* **Methodology:** Briefly explain how the data was collected or produced. Link to related documentation if available.
* **Tags and locations:** Add tags and geographic locations.
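If you prepare metadata in a script or spreadsheet before filling in the form, a simple completeness check can catch empty fields early. The sketch below is an assumption-laden illustration: the field names are hypothetical labels for the fields listed above, not the keys used by HDX itself.

```python
# Hypothetical field names for a local pre-upload check; they are not HDX keys.
REQUIRED_FIELDS = (
    "description",
    "source",
    "time_period_start",
    "time_period_end",
    "update_frequency",
    "license",
    "methodology",
    "tags",
    "locations",
)


def is_blank(value) -> bool:
    """Treat None, whitespace-only strings, and empty collections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_metadata(metadata: dict) -> list:
    """Return the required fields that are absent or blank, in a fixed order."""
    return [field for field in REQUIRED_FIELDS if is_blank(metadata.get(field))]


# Illustrative draft with two fields left empty.
draft = {
    "description": "Population by admin level 2, first quarter of 2024.",
    "source": "Example source",
    "time_period_start": "2024-01-01",
    "time_period_end": "2024-03-31",
    "update_frequency": "quarterly",
    "license": "CC BY",
    "methodology": "  ",
    "tags": [],
    "locations": ["Cameroon"],
}
```

Here `missing_metadata(draft)` returns `["methodology", "tags"]`, the two fields still to be completed.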

## Getting data onto HDX

There are two ways to get data onto HDX:

* Manually
* Programmatically&#x20;

#### Publishing data on HDX manually (how to use the share data form)

To publish manually, upload the data file directly to the HDX platform through the share data form, following the steps outlined below.

To add data, you must be logged in and belong to an HDX organization with publishing permissions. To start, click on ‘Add Data’ from anywhere on HDX.

{% stepper %}
{% step %}
**Choose how you want to share your data**

<div align="left" data-full-width="false" data-with-frame="true"><figure><img src="/files/1NvQ53HdpijPHv6BKPS8" alt=""><figcaption><p>Choose how to share</p></figcaption></figure></div>

You have three options:

**Public:** Data is published openly on the platform.

**HDX Connect:** Only the metadata will be available on HDX and no files will be uploaded to the platform. If you select this option, you only need to complete steps 1, 2 and 3 and then submit the dataset.

**Private:** Data is only visible to members of your organization.
{% endstep %}

{% step %}
**Describe the dataset**

Provide a clear and concise Title and Description for your dataset.

<div align="left" data-full-width="false" data-with-frame="true"><figure><img src="/files/v2dSLqNA8Yyvs5onGtKv" alt=""><figcaption><p>Describe dataset</p></figcaption></figure></div>
{% endstep %}

{% step %}
**Include additional information**

Provide additional information to help users of HDX understand the origins and limitations of your data.

<div data-with-frame="true"><figure><img src="/files/V6WzJORcCtmd6QbPBwKH" alt=""><figcaption><p>Add additional information</p></figcaption></figure></div>

**Source:** The source of the data is the entity that collected or created it, rather than the organization that published the data. Use acronyms rather than full names of the source where possible. Where there are multiple sources, list them all, separated by commas. If there are more than five sources, enter ‘Multiple sources’ in this field instead.

**Organization:** The organization sharing this dataset on HDX.

**Maintainer:** The data maintainer is the point person for a specific dataset. They will receive any messages from users who have questions or comments about that dataset.&#x20;

**Time Period:** The time period covered by the dataset or the data collection period. This time period should cover all files in the dataset, from the earliest start date to the latest end date.&#x20;

**Update Frequency:** Choose how often this dataset is expected to be updated on HDX.

<div data-with-frame="true"><figure><img src="/files/FxSJz7qQkkzit82sIhQ5" alt=""><figcaption><p>Location and Licence</p></figcaption></figure></div>

**Location:** Detail the location(s) or countries that this dataset covers.

**License:** Choose the license under which the data can be used. In general, we recommend the CC BY-IGO or CC-BY licenses for data on HDX, as they allow others to use data freely as long as they credit your organization.

<div data-with-frame="true"><figure><img src="/files/0SAuhEinasxDC5rIul1L" alt=""><figcaption><p>Methodology, Caveats and Tags</p></figcaption></figure></div>

**Methodology:** Select how this data was collected or generated.

**Caveats / Comments:** Note any limitations or special considerations that relate to this data.

**Tags:** Add tags to help others find your dataset. Tags are auto-generated from an [approved list](https://docs.google.com/spreadsheets/d/1fTO8T8ZVXU9eoh3EIrw490Z2pX7E59MhHmCvT_cXmNs/edit?gid=1261258630#gid=1261258630).
{% endstep %}

{% step %}
**Click the ‘Add more files’ button**

<div data-with-frame="true"><figure><img src="/files/d3Sxf5BoluQSFagcsBgS" alt=""><figcaption><p>Upload files</p></figcaption></figure></div>

The files you add here will be published in a list, in the order you arrange them. You can upload multiple files by clicking the ‘Add More Files’ button. Order the files so that the latest resource comes first. We accept all machine-readable data formats but only offer preview features for CSV, TXT, XLS, JSON, zipped shapefiles, KML, and GeoJSON.

There are several ways to upload your file(s):

**Upload file (default):** You can drag and drop files from your computer into HDX. Select your files and drop them into the light grey upload area. A new dataset form will open with some fields pre-filled based on your files. You can also click ‘Browse’ and locate the files you want to upload.

**Import from URL:** HDX can host your data directly, but it works equally well with externally hosted data. If your organization already has a platform, API, or repository for downloads, you can include a link to that file as a resource in your HDX dataset. The HDX version will automatically point to the live version, keeping everything up to date without re-uploading files.

<div data-with-frame="true"><figure><img src="/files/kN2dYa7IL8XCAyUOa8j3" alt=""><figcaption><p>Upload files - Import from URL</p></figcaption></figure></div>

1. Add the URL of the remote file location (mandatory).
2. Enter the file name and file format (mandatory).
3. Add notes about this file (optional).
4. Check ‘contains personal data’ if the data includes information that could identify people.
5. Check ‘contains Microdata’ if the data includes information such as detailed survey responses or disaggregated needs assessment data.

**Dropbox:** HDX can link to and preview files stored in Dropbox, including CSV or XLS files. To do this:

1. Log in to Dropbox via the web
2. Locate and select the desired file
3. Click “Share link” and copy the generated link

Dropbox links usually end in dl=0. To ensure the file is downloadable and previewable in HDX, change dl=0 to dl=1 before adding it to the dataset. For example: <https://www.dropbox.com/etc/etc/your_file_name.csv?dl=1>

The HDX resource will reflect updates made to the original file as long as the filename and path stay the same.
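If you add many Dropbox links, the `dl=0` to `dl=1` change described above can be scripted. This is a minimal sketch using the Python standard library; the function name is made up for illustration, and it assumes the link is a regular Dropbox share URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dropbox_direct_link(url: str) -> str:
    """Return the same Dropbox URL with its dl parameter set to 1."""
    parts = urlsplit(url)
    # Keep every query parameter except any existing dl value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


link = dropbox_direct_link("https://www.dropbox.com/etc/etc/your_file_name.csv?dl=0")
```

With the example URL from above, `link` ends in `your_file_name.csv?dl=1`. Other query parameters in the share link are preserved, and a link without a `dl` parameter gets one added.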

**Google Drive:** To share data from Google Drive, make sure your files are set to be publicly visible or accessible via link.&#x20;

How to make a file shareable in Google Drive:

1. Open Google Drive and locate the file you want to share.
2. Right-click the file and select ‘Share’.
3. In the pop-up window, under General access, click the dropdown menu.
4. Select ‘Anyone with the link’ and choose ‘Viewer’.
5. Click Copy link to get the shareable link.
6. Click Done.

In HDX, select the Google Drive option. A pop-up window will let you browse and select your files. These files aren’t copied to HDX; the Download button on HDX will link directly to the live version on Google Drive.

The HDX Resource Picker will only access your Google Drive file list during the file selection process. You can revoke this access anytime through the Google Drive App Manager, although this won’t affect files already published on HDX.
{% endstep %}

{% step %}
**Click the tick box to enable automatic data previews (Optional)**

<div data-with-frame="true"><figure><img src="/files/0nfqsG1RBa6wgYTxIMR1" alt=""><figcaption><p>Add automatic Data Previews and other data visualizations</p></figcaption></figure></div>

1. \[check box] Enable Data Preview.
2. Select the file to preview - the default selection is the first resource.
3. Add a link to other data visualizations (optional).\
   \
   If you have published visualizations of this data elsewhere, you can embed them on the HDX dataset page by providing their URL, for example the URL of a public Power BI dashboard. Only URLs starting with ‘https’ are supported; embed code cannot be used. Your visualization will appear under ‘Interactive Data’ above the files you have shared. To add more visualizations or remove an existing one, go back and edit the dataset.<br>
4. Click the tick box to confirm the dataset does not contain personal data or otherwise violate the HDX Terms of Service.
   {% endstep %}

{% step %}
**Click submit dataset**
{% endstep %}
{% endstepper %}

## Publishing data by API

If you manage frequently updated datasets, you can automate publishing and updating data through the HDX Python API. This library, built by HDX, offers a simple, Pythonic way to interact with the HDX endpoints. It makes it easier to push, pull, and manage datasets programmatically.

Why share data via HDX Python API?

* Automatically manages metadata structure for HDX datasets and resources
* Integrates easily into existing ETL or data pipeline workflows
* Simplifies code for dataset and resource creation and updates

See the full [HDX Python API documentation](https://hdx-python-api.readthedocs.io/en/latest/). The [HDX GitHub organization](https://github.com/ocha-dap?q=scraper\&type=all\&language=\&sort=) also lists example scrapers that use the HDX Python API to get data onto HDX.

We are in the process of expanding our technical documentation to include step-by-step examples of how to use the HDX development tools, such as the Python API. These resources will help users better understand how to automate updates and integrate data pipelines with HDX.
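Until those examples are available, the rough shape of a publishing script might look like the sketch below. Treat it as an unverified outline rather than a reference: the import paths and method names reflect recent releases of the HDX Python API as far as we can tell and may differ between versions, the dictionary keys are assumed to follow HDX's CKAN-style field names, and the user agent string and the `HDX_KEY` environment variable are hypothetical. The field list is also deliberately incomplete; a real dataset needs the remaining metadata (organization, maintainer, time period, update frequency, methodology, locations, and tags) before HDX will accept it. Check everything against the library documentation before use.

```python
import os


def build_dataset_fields(name: str, title: str, description: str,
                         source: str, license_id: str,
                         private: bool = False) -> dict:
    """Assemble core dataset fields as a plain dict.

    The key names are assumptions based on HDX's CKAN-style schema.
    """
    return {
        "name": name,
        "title": title,
        "notes": description,
        "dataset_source": source,
        "license_id": license_id,
        "private": private,
    }


def publish(fields: dict, file_path: str, file_format: str = "csv") -> None:
    """Create the dataset with one uploaded file (not called in this sketch)."""
    # Imported lazily so the helper above works without the library installed.
    from hdx.api.configuration import Configuration
    from hdx.data.dataset import Dataset
    from hdx.data.resource import Resource

    Configuration.create(
        hdx_site="stage",                # assumption: try the staging site first
        user_agent="my_org_pipeline",    # hypothetical identifier
        hdx_key=os.environ["HDX_KEY"],   # hypothetical variable holding your API key
    )
    dataset = Dataset(fields)
    resource = Resource({
        "name": os.path.basename(file_path),
        "description": fields["title"],
        "format": file_format,
    })
    resource.set_file_to_upload(file_path)
    dataset.add_update_resource(resource)
    dataset.create_in_hdx()


fields = build_dataset_fields(
    name="example-population-by-admin2",
    title="Example: population by admin level 2",
    description="Illustrative dataset used to demonstrate the workflow.",
    source="Example source",
    license_id="cc-by",
)
```

Keeping the metadata in a plain dictionary makes it easy to validate or version-control before any call to HDX is made.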

## After data is uploaded

#### HDX Quality Assurance (QA) Process

When a new dataset is published on HDX, it goes through a quality assurance (QA) review to ensure it meets HDX standards for quality, completeness, and responsible data sharing.

Every resource is checked against standard QA criteria:

* Metadata completeness (title, description, source, dates, etc.)
* No personally identifiable information
* File integrity (accessible, readable, appropriate format)

If a resource contains microdata, it will be placed Under Review while HDX assesses the disclosure risk before public release.

This process exists to ensure compliance with the [HDX Terms of Service](https://docs.humdata.org/about/hdx-terms-of-service), which prohibit the sharing of personal data.

**What we do when data is Under Review**

Datasets flagged as sensitive are placed Under Review and assessed for:

* Metadata completeness
* Relevance to humanitarian action
* File integrity
* Sensitivity (personal data, microdata, etc.)

Microdata is assessed in a secure environment against a disclosure risk threshold determined by the HDX team. The default global disclosure risk threshold is 3%, and the HDX team may adjust this threshold for specific datasets based on the content of the data and the response context. If the global disclosure risk is below the HDX threshold, the dataset is approved for public sharing. If the risk is above the threshold, the HDX team will contact the data contributor and work with them to reduce the risk by applying Statistical Disclosure Control or publishing via HDX Connect.
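The threshold decision described above can be expressed as a short function. This is only an illustration of the rule, not HDX's actual tooling: the function name and return labels are made up, and treating a risk exactly equal to the threshold as requiring follow-up is an assumption, since the text only covers values below and above it.

```python
DEFAULT_THRESHOLD = 0.03  # the default global disclosure risk threshold of 3%


def review_outcome(global_risk: float, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Classify a dataset by its global disclosure risk, given as a fraction."""
    if not 0.0 <= global_risk <= 1.0:
        raise ValueError("global_risk must be a fraction between 0 and 1")
    if global_risk < threshold:
        return "approve"
    # Assumption: a risk equal to the threshold is handled like one above it.
    return "follow_up"
```

For example, `review_outcome(0.01)` returns `"approve"` and `review_outcome(0.05)` returns `"follow_up"`. Passing a different `threshold` mirrors the HDX team adjusting it for a specific dataset or response context.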

We can work with you to:

* Improve metadata or formatting
* Apply risk mitigation techniques to microdata
* Switch to HDX Connect if needed

#### **QA resources**

* [HDX QA Checklist](https://data.humdata.org/dataset/2048a947-5714-4220-905b-e662cbcd14c8/resource/658d5c4f-1680-4cb5-9fbf-10a0a64e2c39/download/hdx-qa-checklist.pdf) - An overview of the quality criteria reviewed by the HDX team during the QA process.&#x20;
* [HDX Statistical Disclosure Control Process](https://humanitarian.atlassian.net/wiki/spaces/HDXKB/pages/1381498881/Statistical+Disclosure+Control+Process+to+handle+individual+survey+data+on+HDX) - An overview of why and how the HDX team applies Statistical Disclosure Control to all microdata uploaded on the platform.&#x20;

