EurostatAPI.jl
A Julia package for accessing and processing data from the Eurostat API.
Overview
EurostatAPI.jl provides a simple and robust interface to fetch data from the Eurostat API. It handles all the complexities of working with the Eurostat SDMX (Statistical Data and Metadata eXchange) API, including:
- Making HTTP requests to Eurostat API endpoints with automatic retries
- Parsing complex JSON responses from the SDMX format
- Correctly interpreting dimension references and metadata
- Converting raw API data to clean, structured DataFrames
- Handling special values and missing data appropriately
Eurostat is the statistical office of the European Union, providing high-quality statistics on Europe covering areas such as:
- Economy and finance
- Population and social conditions
- Industry, trade and services
- Agriculture and fisheries
- Environment and energy
- Science, technology and digital society
This package provides programmatic access to any Eurostat dataset available through their unified SDMX API.
Features
- Simple Interface: Fetch any Eurostat dataset with just a dataset ID and year
- Advanced Filtering: Filter by indicators, geographic regions, and product codes to reduce data size
- Automatic Fallback: Handles "response too large" errors by automatically applying smart filters
- Chunked Fetching: Fetch very large datasets in manageable chunks
- Robust Error Handling: Automatic retries, timeout handling, and informative error messages
- Efficient Data Processing: Handles large datasets with progress logging and memory-efficient processing
- Clean Data Output: Automatic conversion to DataFrame format with proper data types
- Special Value Handling: Proper interpretation of confidential (:C), not available (:) and not applicable (-) values
- Flexible Time Periods: Support for any time period supported by the underlying dataset
- Metadata Support: Query dataset dimensions and available codes before fetching
- Comprehensive Logging: Detailed processing information for transparency and debugging
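The retry behaviour listed above can be pictured with Julia's built-in retry helper. This is only a sketch of the general technique, not the package's actual implementation: flaky_request is a made-up stand-in for a network call, and the delay settings are arbitrary.

```julia
# Hypothetical stand-in for a network call: fails twice, then succeeds.
attempts = Ref(0)
function flaky_request()
    attempts[] += 1
    attempts[] < 3 && error("simulated transient failure")
    return "payload"
end

# Base.retry wraps a function and calls it again when it throws.
# The delays iterator sets how many retries are allowed and how long to wait.
robust_request = retry(flaky_request;
                       delays = ExponentialBackOff(n = 3, first_delay = 0.01, max_delay = 0.1))

result = robust_request()
println("Got $(repr(result)) after $(attempts[]) attempts")
```

With n = 3 the wrapped function is attempted at most four times, so a request that keeps failing still surfaces its error to the caller.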
Installation
using Pkg
Pkg.add("EurostatAPI")
Or from the NILU Julia registry:
using Pkg
Pkg.Registry.add(url="https://git.nilu.no/julia/registry")
Pkg.add("EurostatAPI")
Quick Start
using EurostatAPI
using DataFrames
# Fetch a dataset for a specific year
# Example: European GDP data
df = fetch_dataset("nama_10_gdp", 2022)
# Fetch with filters - only specific countries
df_filtered = fetch_dataset("nama_10_gdp", 2022; geo=["EU27_2020", "DE", "FR"])
# Automatic handling of large datasets
df_auto = fetch_with_fallback("nama_10_gdp", 2022) # Adds filters if response too large
# Fetch large dataset in chunks
df_chunked = fetch_dataset_chunked("nama_10_gdp", 2022; chunk_by=:geo, chunk_size=10)
# Display the first few rows
first(df, 5)
# Check the structure of the data
describe(df)
# Basic analysis examples
# Count records by country (if geo dimension exists)
if "geo" in names(df)  # names(df) returns strings, not symbols
    country_counts = combine(groupby(df, :geo), nrow => :count)
    sort!(country_counts, :count, rev=true)
    println("Top 5 countries by record count:")
    println(first(country_counts, 5))
end
# Find non-missing values
non_missing_data = filter(row -> !ismissing(row.value), df)
println("Records with actual values: $(nrow(non_missing_data))")
Finding Dataset IDs
To use EurostatAPI.jl, you need the Eurostat dataset ID. To find one:
- Browse the Eurostat Data Explorer
- Look at the dataset's URL or information page, where the ID is typically shown
- Check the Eurostat API documentation
Common dataset patterns:
- nama_*: National accounts
- demo_*: Demography and migration
- env_*: Environment
- nrg_*: Energy
- t2020_*: Europe 2020 indicators
Understanding the Data
Eurostat datasets are multi-dimensional, typically organized along dimensions such as:
- Geographic units (geo): Countries, regions, etc.
- Time periods (time): Years, quarters, months
- Statistical indicators (indic_*): What is being measured
- Economic sectors (nace_*): Industry classifications
- Demographics (age, sex): Population breakdowns
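As a rough illustration of how a multi-dimensional dataset can be flattened into rows, the sketch below decodes a flat observation index into one code per dimension. It assumes a row-major layout in which the last dimension varies fastest (as in JSON-stat style responses); whether EurostatAPI.jl works this way internally is not confirmed here. The helper name unravel_index and the toy codes and values are made up for illustration.

```julia
# Hypothetical helper: convert a zero-based flat index into one zero-based
# position per dimension, assuming the last dimension varies fastest.
function unravel_index(flat::Integer, sizes::Vector{Int})
    positions = Vector{Int}(undef, length(sizes))
    remaining = flat
    for d in length(sizes):-1:1
        positions[d] = remaining % sizes[d]
        remaining ÷= sizes[d]
    end
    return positions
end

# Toy cube: 2 geo codes x 3 time periods (invented values, not real Eurostat data).
dim_names = ["geo", "time"]
dim_sizes = [2, 3]
dim_codes = Dict("geo" => ["DE", "FR"], "time" => ["2020", "2021", "2022"])
cells = Dict("0" => 1.0, "4" => 2.5)  # sparse: only observed cells are stored

for (key, value) in sort(collect(cells); by = first)
    positions = unravel_index(parse(Int, key), dim_sizes)
    # Positions are zero-based, Julia arrays are one-based, hence the + 1.
    labels = [dim_codes[name][pos + 1] for (name, pos) in zip(dim_names, positions)]
    println(join(labels, ", "), " => ", value)
end
```

Each decoded cell becomes one row with a column per dimension, which is the long format the returned DataFrame uses.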
Special Values
Eurostat data uses special codes for missing or restricted data:
- :C or :c (confidential data)
- : (not available)
- - (not applicable)
- 0 (zero or rounded to zero)
These special values are converted to missing in the DataFrame, but the original codes are preserved in the original_value column when present.
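A minimal sketch of this conversion is shown below. The helper parse_cell and the constant SPECIAL_CODES are hypothetical names, not part of the package's API, and treating a literal 0 as the number zero rather than as missing is an assumption made for this example.

```julia
# Codes treated as "no usable number", following the list above.
const SPECIAL_CODES = Set([":C", ":c", ":", "-"])

# Hypothetical helper: returns (value, original_value) for one raw cell.
function parse_cell(raw::Union{Number, AbstractString})
    raw isa Number && return (Float64(raw), missing)
    text = String(strip(raw))
    text in SPECIAL_CODES && return (missing, text)
    number = tryparse(Float64, text)
    # Unparseable text is kept as original_value so no information is lost.
    return number === nothing ? (missing, text) : (number, missing)
end

for raw in Any[12.5, "0", ":C", ":", "-"]
    value, original = parse_cell(raw)
    println(rpad(repr(raw), 6), " -> value = ", value, ", original_value = ", original)
end
```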
Data Structure
The DataFrame returned by fetch_dataset contains:
Column | Description |
---|---|
dataset | The Eurostat dataset ID |
year | The year requested |
value | The actual data value (numeric or missing) |
original_value | Original string value for special codes |
fetch_date | When the data was retrieved |
original_key | Internal API key reference |
Various dimensions | Depends on dataset (geo, time, indicators, etc.) |
Advanced Usage
Filtering Large Datasets
fetch_dataset accepts filter keyword arguments that reduce the response size:
# Filter by geographic regions
df_germany = fetch_dataset("nama_10_gdp", 2022; geo=["DE"])
# Filter by several countries
df_countries = fetch_dataset("nama_10_gdp", 2022;
                             geo=["DE", "FR"])
# Include an EU aggregate alongside individual countries
df_filtered = fetch_dataset("nama_10_gdp", 2022;
                            geo=["EU27_2020", "DE", "FR"])
Automatic Fallback
Use fetch_with_fallback to automatically handle "response too large" errors:
# Automatically adds filters if the response is too large
df = fetch_with_fallback("nama_10_gdp", 2022)
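The general pattern behind such a fallback can be sketched as follows. This is not the package's implementation: ResponseTooLarge, toy_fetch and fetch_or_narrow are hypothetical names, and the default filter is an arbitrary choice for the example.

```julia
# Hypothetical error type standing in for a "response too large" failure.
struct ResponseTooLarge <: Exception end

# Hypothetical fetcher: refuses unfiltered requests, accepts filtered ones.
function toy_fetch(; geo = nothing)
    geo === nothing && throw(ResponseTooLarge())
    return ["row for $code" for code in geo]
end

# Try the broad request first; narrow it only when the size error occurs.
function fetch_or_narrow(fetch; default_geo = ["EU27_2020"])
    try
        return fetch()
    catch err
        err isa ResponseTooLarge || rethrow()  # other errors are not swallowed
        @warn "Response too large, retrying with a geo filter" default_geo
        return fetch(geo = default_geo)
    end
end

rows = fetch_or_narrow(toy_fetch)
println(rows)
```

Because the fallback changes which rows come back, logging the applied filter (as the warning does here) keeps the narrowing visible to the caller.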
Chunked Fetching
For extremely large datasets, fetch data in chunks:
# Get metadata first
metadata = get_dataset_metadata("nama_10_gdp")
println("Total geographic codes: $(length(metadata.geo_codes))")
# Fetch in chunks
df_chunked = fetch_dataset_chunked("nama_10_gdp", 2022;
chunk_by=:geo,
chunk_size=10)
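Conceptually, chunking by a dimension means splitting its code list into groups and issuing one request per group. The sketch below shows that idea with Iterators.partition; the code list and the fetch_chunk placeholder are invented for illustration and do not reflect how fetch_dataset_chunked is implemented.

```julia
# Hypothetical list of geo codes; in practice these would come from dataset metadata.
geo_codes = ["AT", "BE", "BG", "DE", "ES", "FR", "IT"]
chunk_size = 3

# Iterators.partition yields consecutive slices of at most chunk_size items.
chunks = [collect(part) for part in Iterators.partition(geo_codes, chunk_size)]

# Placeholder for a per-chunk request; it returns one dummy row per code.
fetch_chunk(chunk) = [(geo = code, value = length(code)) for code in chunk]

# Fetch each chunk separately and concatenate the results.
rows = reduce(vcat, [fetch_chunk(chunk) for chunk in chunks])
println("Fetched $(length(rows)) rows in $(length(chunks)) chunks")
```

A smaller chunk size means more requests but smaller responses, so it trades request count against the risk of hitting a size limit.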
Error Handling
using EurostatAPI
using DataFrames  # provides nrow
using HTTP        # provides the exception types checked below

try
    df = fetch_dataset("nama_10_gdp", 2023)
    println("Successfully fetched $(nrow(df)) records")
catch e
    if e isa HTTP.StatusError
        println("HTTP error: ", e.status)
        if e.status == 404
            println("Dataset not found - check the dataset ID")
        elseif e.status == 400
            println("Bad request - check the year parameter")
        end
    elseif e isa HTTP.TimeoutError
        println("Request timed out - try again or check your connection")
    else
        println("Unexpected error: ", e)
    end
end
Working with Time Series
# Get data for multiple recent years
years = [2020, 2021, 2022, 2023]
all_data = DataFrame()
for year in years
    try
        yearly_data = fetch_dataset("nama_10_gdp", year)
        append!(all_data, yearly_data)
        println("Added data for $year: $(nrow(yearly_data)) records")
    catch e
        println("Failed to get data for $year: $e")
    end
end
println("Total records collected: $(nrow(all_data))")
Memory Management
For very large datasets:
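# Monitor memory usage (Base.gc_live_bytes reports bytes currently held by live objects)
println("Live heap before: $(round(Base.gc_live_bytes() / 1024^2; digits=1)) MB")
df = fetch_dataset("large_dataset_id", 2022)
GC.gc() # Force garbage collection
println("Live heap after: $(round(Base.gc_live_bytes() / 1024^2; digits=1)) MB")
println("Dataset size: $(nrow(df)) rows, $(round(Base.summarysize(df) / 1024^2; digits=1)) MB")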
Available Years
Most Eurostat datasets provide historical data, but availability varies:
# Check what years might be available (returns conservative estimate)
available_years = get_dataset_years("nama_10_gdp")
println("Potentially available years: $(first(available_years, 5))...$(last(available_years, 5))")
Note: get_dataset_years provides a conservative estimate. The actual available years depend on the specific dataset and are determined by Eurostat's data release schedule.
Performance Considerations
- Large datasets: Some Eurostat datasets contain millions of observations
- Network timeouts: The package includes automatic retries with 120-second timeouts
- Memory usage: Large datasets may require substantial RAM
- API limits: Eurostat may have rate limiting (though not typically restrictive)
Contributing
This package is part of the CirQuant project. Contributions are welcome via the project's GitLab repository.
License
Licensed under the MIT License. See the LICENSE file for details.
Citation
If you use EurostatAPI.jl in your research, please cite:
Boero, R. (2025). EurostatAPI.jl: A Julia package for accessing Eurostat data.
And acknowledge the data source:
Eurostat. European Statistics. https://ec.europa.eu/eurostat