EurostatAPI.jl

A Julia package for accessing and processing data from the Eurostat API.

Overview

EurostatAPI.jl provides a simple and robust interface to fetch data from the Eurostat API. It handles all the complexities of working with the Eurostat SDMX (Statistical Data and Metadata eXchange) API, including:

Making HTTP requests to Eurostat API endpoints with automatic retries
Parsing complex JSON responses from the SDMX format
Correctly interpreting dimension references and metadata
Converting raw API data to clean, structured DataFrames
Handling special values and missing data appropriately

Eurostat is the statistical office of the European Union, providing high-quality statistics on Europe covering areas such as:

Economy and finance
Population and social conditions
Industry, trade and services
Agriculture and fisheries
Environment and energy
Science, technology and digital society

This package provides programmatic access to any Eurostat dataset available through their unified SDMX API.

Features

Simple Interface: Fetch any Eurostat dataset with just a dataset ID and year
Advanced Filtering: Filter by indicators, geographic regions, and product codes to reduce data size
Automatic Fallback: Handles "response too large" errors by automatically applying smart filters
Chunked Fetching: Fetch very large datasets in manageable chunks
Robust Error Handling: Automatic retries, timeout handling, and informative error messages
Efficient Data Processing: Handles large datasets with progress logging and memory-efficient processing
Clean Data Output: Automatic conversion to DataFrame format with proper data types
Special Value Handling: Proper interpretation of confidential (:C), not available (:) and not applicable (-) values
Flexible Time Periods: Support for any time period supported by the underlying dataset
Metadata Support: Query dataset dimensions and available codes before fetching
Comprehensive Logging: Detailed processing information for transparency and debugging

Installation

using Pkg
Pkg.add("EurostatAPI")

Or from the NILU Julia registry:

using Pkg
Pkg.Registry.add(url="https://git.nilu.no/julia/registry")
Pkg.add("EurostatAPI")

Quick Start

using EurostatAPI
using DataFrames

# Fetch a dataset for a specific year
# Example: European GDP data
df = fetch_dataset("nama_10_gdp", 2022)

# Fetch with filters - only specific countries
df_filtered = fetch_dataset("nama_10_gdp", 2022; geo=["EU27_2020", "DE", "FR"])

# Automatic handling of large datasets
df_auto = fetch_with_fallback("nama_10_gdp", 2022)  # Adds filters if response too large

# Fetch large dataset in chunks
df_chunked = fetch_dataset_chunked("nama_10_gdp", 2022; chunk_by=:geo, chunk_size=10)

# Display the first few rows
first(df, 5)

# Check the structure of the data
describe(df)

# Basic analysis examples
# Count records by country (if geo dimension exists)
if "geo" in names(df)  # names(df) returns column names as strings, not symbols
    country_counts = combine(groupby(df, :geo), nrow => :count)
    sort!(country_counts, :count, rev=true)
    println("Top 5 countries by record count:")
    println(first(country_counts, 5))
end

# Find non-missing values
non_missing_data = filter(row -> !ismissing(row.value), df)
println("Records with actual values: $(nrow(non_missing_data))")

Finding Dataset IDs

To use EurostatAPI.jl, you need the ID of the Eurostat dataset you want to fetch. There are several ways to find it:

Browse the Eurostat Data Explorer
Look at the URL or dataset information - the ID is typically shown
Check the Eurostat API documentation

Common dataset patterns:

nama_* - National accounts
demo_* - Demography and migration
env_* - Environment
nrg_* - Energy
t2020_* - Europe 2020 indicators

Understanding the Data

Eurostat datasets are multi-dimensional, typically organized along dimensions such as:

Geographic units (geo): Countries, regions, etc.
Time periods (time): Years, quarters, months
Statistical indicators (indic_*): What is being measured
Economic sectors (nace_*): Industry classifications
Demographics (age, sex): Population breakdowns
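
To make the multi-dimensional layout concrete, here is a minimal sketch of how a flat observation index can be mapped back to one code per dimension. It assumes a JSON-stat-style response in which observations are stored in row-major order (the last dimension varies fastest). The function name `decode_observation` and the sample dimensions are illustrative and are not part of EurostatAPI.jl.

```julia
# Hypothetical helper, not part of EurostatAPI.jl.
# Assumes row-major ordering: the last dimension varies fastest.
function decode_observation(flat_index::Integer,
                            dims::Vector{Pair{String,Vector{String}}})
    codes = Dict{String,String}()
    remaining = flat_index
    # Walk the dimensions from last to first, peeling off one position at a time.
    for (name, categories) in Iterators.reverse(dims)
        n = length(categories)
        codes[name] = categories[remaining % n + 1]  # +1: Julia arrays are 1-based
        remaining ÷= n
    end
    return codes
end

# Illustrative dimensions (a made-up subset of codes).
dims = ["geo" => ["DE", "FR"], "time" => ["2021", "2022", "2023"]]

decode_observation(4, dims)  # Dict("geo" => "FR", "time" => "2022")
```

With two geo codes and three time periods, index 4 is the second time period of the second geo code, which is why each row of the resulting DataFrame carries one value per dimension column.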

Special Values

Eurostat data uses special codes for missing or restricted data:

:C or :c - Confidential data
: - Not available
- - Not applicable
0 - Zero or rounded to zero

These special values are converted to missing in the DataFrame, but the original codes are preserved in the original_value column when present.
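
The conversion described above can be sketched as follows. This is an illustration of the convention only, not the package's actual implementation; `parse_observation` is a hypothetical name, and treating unparseable text as missing is an assumption.

```julia
# Hypothetical helper, not the package's actual implementation.
# Returns a (value, original_value) pair following the convention described above.
function parse_observation(raw::AbstractString)
    s = strip(raw)
    if s in (":", "-") || lowercase(s) == ":c"
        return (missing, String(s))  # special code: keep the original text
    end
    number = tryparse(Float64, s)
    # Assumption: unparseable text is also treated as missing, with the raw text kept.
    return number === nothing ? (missing, String(s)) : (number, missing)
end

parse_observation("12.5")  # (12.5, missing)
parse_observation(":C")    # (missing, ":C")
parse_observation("0")     # (0.0, missing)
```

Note that `0` stays a genuine numeric value; only the special codes become `missing`.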

Data Structure

The DataFrame returned by fetch_dataset contains:

Column	Description
`dataset`	The Eurostat dataset ID
`year`	The year requested
`value`	The actual data value (numeric or missing)
`original_value`	Original string value for special codes
`fetch_date`	When the data was retrieved
`original_key`	Internal API key reference
Various dimensions	Depends on dataset (geo, time, indicators, etc.)
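
As a small illustration of how these columns work together, the sketch below builds a hand-made DataFrame that mimics the documented layout (the rows are invented, and only a subset of the columns is shown) and separates usable observations from those flagged with a special code.

```julia
using DataFrames

# Hand-built rows that mimic the documented columns; values are invented.
df = DataFrame(
    dataset = fill("nama_10_gdp", 4),
    year = fill(2022, 4),
    geo = ["DE", "FR", "IT", "ES"],
    value = [100.0, missing, 80.5, missing],
    original_value = [missing, ":C", missing, ":"],
)

# Rows that carry a usable numeric value.
observed = dropmissing(df, :value)

# Tally why the remaining rows have no value, using the preserved codes.
flagged = filter(:value => ismissing, df)
reasons = combine(groupby(flagged, :original_value), nrow => :count)
```

Here `observed` keeps the two rows with numbers, while `reasons` has one row per special code with its frequency.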

Advanced Usage

Filtering Large Datasets

fetch_dataset accepts filter keyword arguments that reduce the size of the response:

# Filter by geographic regions
df_germany = fetch_dataset("nama_10_gdp", 2022; geo=["DE"])

# Filter by several countries at once
df_de_fr = fetch_dataset("nama_10_gdp", 2022;
                         geo=["DE", "FR"])

# Include an EU aggregate alongside individual countries
df_filtered = fetch_dataset("nama_10_gdp", 2022;
                           geo=["EU27_2020", "DE", "FR"])

Automatic Fallback

Use fetch_with_fallback to automatically handle "response too large" errors:

# Automatically adds filters if the response is too large
df = fetch_with_fallback("nama_10_gdp", 2022)
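
The general idea behind such a fallback can be sketched as trying progressively narrower requests until one succeeds. The code below is a generic pattern under stated assumptions, not how fetch_with_fallback is implemented: `ResponseTooLarge`, `fetch_narrowing`, and `mock_fetch` are all hypothetical names, and the mock stands in for a real network call.

```julia
# Hypothetical error type standing in for whatever signals an oversized response.
struct ResponseTooLarge <: Exception end

# Try progressively narrower requests until one succeeds.
# `fetcher` is any function that accepts filter keyword arguments.
function fetch_narrowing(fetcher, filter_sets)
    for filters in filter_sets
        try
            return fetcher(; filters...)
        catch err
            err isa ResponseTooLarge || rethrow()  # only swallow the size error
            @warn "Response too large, narrowing the request" filters
        end
    end
    error("Every filter set still produced an oversized response")
end

# Mock fetcher: pretends that unfiltered requests, or requests covering
# more than two countries, are too large.
mock_fetch(; geo = String[]) =
    (isempty(geo) || length(geo) > 2) ? throw(ResponseTooLarge()) : geo

result = fetch_narrowing(mock_fetch, [
    NamedTuple(),                         # no filters: the full dataset
    (geo = ["EU27_2020", "DE", "FR"],),   # still too large for the mock
    (geo = ["DE", "FR"],),                # small enough
])
```

Errors other than the size error are rethrown unchanged, so genuine failures such as a wrong dataset ID are not hidden by the fallback.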

Chunked Fetching

For extremely large datasets, fetch data in chunks:

# Get metadata first
metadata = get_dataset_metadata("nama_10_gdp")
println("Total geographic codes: $(length(metadata.geo_codes))")

# Fetch in chunks
df_chunked = fetch_dataset_chunked("nama_10_gdp", 2022;
                                  chunk_by=:geo,
                                  chunk_size=10)
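
To illustrate what chunking by a dimension means, the sketch below splits a list of geo codes into groups of a fixed size using only Base Julia. The code list and the chunk size are made up for the example; how fetch_dataset_chunked partitions its requests internally is not documented here, so treat this as an assumption about the general approach.

```julia
# Illustrative list of geo codes; in practice these would come from the dataset metadata.
geo_codes = ["AT", "BE", "BG", "CY", "CZ", "DE", "DK", "EE", "EL", "ES", "FI", "FR"]

# Split the codes into groups of at most `chunk_size` entries.
chunk_size = 5
chunks = [collect(c) for c in Iterators.partition(geo_codes, chunk_size)]

for (i, chunk) in enumerate(chunks)
    println("Request $i would cover: ", join(chunk, ", "))
end
```

Each group could then be passed as a `geo` filter to a separate request, with the resulting DataFrames concatenated afterwards.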

Error Handling

using EurostatAPI
using DataFrames  # for nrow
using HTTP        # for the exception types checked below

try
    df = fetch_dataset("nama_10_gdp", 2023)
    println("Successfully fetched $(nrow(df)) records")
catch e
    if e isa HTTP.StatusError
        println("HTTP error: ", e.status)
        if e.status == 404
            println("Dataset not found - check the dataset ID")
        elseif e.status == 400
            println("Bad request - check the year parameter")
        end
    elseif isa(e, HTTP.TimeoutError)
        println("Request timed out - try again or check your connection")
    else
        println("Unexpected error: ", e)
    end
end

Working with Time Series

# Get data for multiple recent years
years = [2020, 2021, 2022, 2023]
all_data = DataFrame()

for year in years
    try
        yearly_data = fetch_dataset("nama_10_gdp", year)
        append!(all_data, yearly_data; cols=:union)  # tolerate columns that differ between years
        println("Added data for $year: $(nrow(yearly_data)) records")
    catch e
        println("Failed to get data for $year: $e")
    end
end

println("Total records collected: $(nrow(all_data))")

Memory Management

For very large datasets:

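# Monitor memory usage (approximate: live heap size as reported by the garbage collector)
println("Memory before: $(round(Base.gc_live_bytes() / 1024^2, digits=1)) MB")
df = fetch_dataset("large_dataset_id", 2022)
GC.gc()  # Force garbage collection
println("Memory after: $(round(Base.gc_live_bytes() / 1024^2, digits=1)) MB")
println("Dataset size: $(nrow(df)) rows, $(round(Base.summarysize(df) / 1024^2, digits=1)) MB")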

Available Years

Most Eurostat datasets provide historical data, but availability varies:

# Check what years might be available (returns conservative estimate)
available_years = get_dataset_years("nama_10_gdp")
println("Potentially available years: $(first(available_years, 5))...$(last(available_years, 5))")

Note: get_dataset_years provides a conservative estimate. The actual available years depend on the specific dataset and are determined by Eurostat's data release schedule.

Performance Considerations

Large datasets: Some Eurostat datasets contain millions of observations
Network timeouts: The package includes automatic retries with 120-second timeouts
Memory usage: Large datasets may require substantial RAM
API limits: Eurostat may have rate limiting (though not typically restrictive)
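
If an application needs its own retry policy on top of the package's built-in retries, for example to back off when rate limits are hit, Base Julia's `retry` and `ExponentialBackOff` are enough for a minimal sketch. The helper name `with_retries`, the delay schedule, and the mock operation are illustrative choices, not part of EurostatAPI.jl.

```julia
# Wrap any zero-argument function so that failures are retried with growing pauses.
# The delay schedule is an arbitrary choice for illustration.
function with_retries(f; attempts = 3, first_delay = 0.1)
    delays = ExponentialBackOff(n = attempts - 1,
                                first_delay = first_delay,
                                max_delay = 10.0)
    return retry(f; delays = delays)()
end

# Mock operation that fails twice before succeeding.
calls = Ref(0)
flaky() = (calls[] += 1) < 3 ? error("temporary failure") : "ok"

result = with_retries(flaky)
```

A real call could be wrapped the same way, for example `with_retries(() -> fetch_dataset("nama_10_gdp", 2022))`.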

Citation

If you use EurostatAPI.jl in your work, please cite it as:

Boero, R. (2025). EurostatAPI.jl: A Julia package for accessing Eurostat data.

And acknowledge the data source:

Eurostat. European Statistics. https://ec.europa.eu/eurostat
