API Reference
This section provides detailed documentation for all functions available in EurostatAPI.jl.
Core Functions
Data Retrieval
EurostatAPI.fetch_dataset — Function
fetch_dataset(dataset::String, year::Int; indicators=String[], geo=String[], prodcode=String[])
Fetches Eurostat SDMX data from the Eurostat API for the specified dataset and year.
Parameters:
- dataset: The Eurostat dataset ID (e.g., "DS-056120", "DS-045409")
- year: The year to fetch data for (e.g., 2023)
- indicators: Optional vector of indicator codes to filter by
- geo: Optional vector of geographic codes to filter by
- prodcode: Optional vector of product codes to filter by
Returns a DataFrame containing the processed data.
EurostatAPI.fetch_with_fallback — Function
fetch_with_fallback(dataset::String, year::Int; kwargs...)
Fetches a dataset with an automatic retry using filters if the response is too large (HTTP 413 error).
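A brief usage sketch (the dataset ID is illustrative; forwarding of the keyword filters to fetch_dataset is assumed from the kwargs... signature):
using EurostatAPI
# Try the full request first; retries with filters only if the API returns 413
df = fetch_with_fallback("DS-056120", 2022)
# Additional keyword filters are assumed to be forwarded to fetch_dataset
df_de = fetch_with_fallback("DS-056120", 2022; geo=["DE", "FR"])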
EurostatAPI.fetch_dataset_chunked — Function
fetch_dataset_chunked(dataset::String, year::Int; chunk_by=:prodcode, chunk_size=100, kwargs...)
Fetches a large dataset in chunks to avoid size limitations.
Parameters:
- dataset: The Eurostat dataset ID
- year: The year to fetch data for
- chunk_by: Dimension to chunk by (:prodcode or :geo)
- chunk_size: Number of codes to include in each chunk
- kwargs...: Additional filter parameters passed to fetch_dataset
Returns a combined DataFrame containing all chunks.
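For example, a sketch of chunking by geography rather than by product code (dataset ID and chunk size are illustrative):
using EurostatAPI
# Split the request into smaller calls grouped by geographic code
df = fetch_dataset_chunked("DS-056120", 2022; chunk_by=:geo, chunk_size=20)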
EurostatAPI.get_dataset_metadata — Function
get_dataset_metadata(dataset::String)
Fetches metadata for a dataset to get available dimensions. Returns a NamedTuple with available codes for each dimension.
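A quick way to inspect the result before filtering (the field names of the NamedTuple depend on the dataset's dimensions, so list them first rather than assuming them):
using EurostatAPI
meta = get_dataset_metadata("DS-056120")
# Show which dimension code lists are available before indexing into them
println("Metadata fields: $(keys(meta))")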
fetch_dataset is the main function for retrieving data from Eurostat. It handles all the complexity of:
- Making HTTP requests with proper error handling and retries
- Parsing the SDMX JSON response format
- Converting multi-dimensional data to a flat DataFrame structure
- Handling special values and missing data
Parameters
- dataset::String: The Eurostat dataset identifier (e.g., "nama_10_gdp", "demo_pjan")
- year::Int: The year for which to retrieve data (e.g., 2023, 2022)
Returns
A DataFrame containing the processed dataset with the following standard columns:
- dataset: The dataset ID that was requested
- year: The year that was requested
- value: The actual data values (may contain missing for special codes)
- original_value: For missing values, contains the original Eurostat code
- fetch_date: Timestamp when the data was retrieved
- original_key: Internal reference key from the API response
- Additional columns for each dimension in the dataset (varies by dataset)
Example
using EurostatAPI
using DataFrames
# Fetch European GDP data for 2022
df = fetch_dataset("nama_10_gdp", 2022)
# Fetch with filters - only specific geographic regions
df_filtered = fetch_dataset("nama_10_gdp", 2022; geo=["EU27_2020", "DE", "FR"])
# Fetch PRODCOM data with specific indicators
df_prodcom = fetch_dataset("DS-056120", 2022;
indicators=["PRODQNT", "QNTUNIT"],
prodcode=["10110000", "10120000"])
# Examine the structure
println("Dataset shape: $(size(df))")
println("Column names: $(names(df))")
# Look at first few rows
first(df, 3)
# Filter for a specific country (if geo dimension exists)
if "geo" in names(df)
germany_data = filter(row -> row.geo == "DE", df)
println("Germany records: $(nrow(germany_data))")
end
# Find records with actual values (not missing)
actual_values = filter(row -> !ismissing(row.value), df)
println("Records with values: $(nrow(actual_values))")
Dataset Information
EurostatAPI.get_dataset_years — Function
get_dataset_years(dataset::String)
Returns typical available years for Eurostat datasets.
Returns a range of years that are typically available for Eurostat datasets. This provides a conservative estimate since actual data availability varies by dataset.
Parameters
- dataset::String: The Eurostat dataset identifier
Returns
An array of integers representing years from 1995 to the current year.
Example
# Get the range of potentially available years
years = get_dataset_years("nama_10_gdp")
println("Year range: $(first(years)) to $(last(years))")
# Try to fetch data for recent years
recent_years = years[end-2:end] # Last 3 years
for year in recent_years
try
df = fetch_dataset("nama_10_gdp", year)
println("Year $year: $(nrow(df)) records")
catch e
println("Year $year: Not available ($e)")
end
end
Internal Functions
The following functions are used internally by EurostatAPI.jl but are documented here for completeness and for users who may want to extend the package functionality.
EurostatAPI.process_eurostat_data — Function
process_eurostat_data(data, dataset, year)
Generic Eurostat SDMX JSON parser to DataFrame.
This function handles the core logic of converting Eurostat's SDMX JSON format into a structured DataFrame. It:
- Extracts dimension information and value mappings
- Converts linear indices to multi-dimensional coordinates
- Handles special values and missing data codes
- Creates a clean DataFrame with proper column types
Parameters
- data: The parsed JSON response from the Eurostat API
- dataset::String: The dataset identifier for metadata
- year::Int: The year for metadata
Returns
A processed DataFrame with all dimensions properly mapped.
EurostatAPI.linear_index_to_nd_indices — Function
linear_index_to_nd_indices(idx, dimensions)
Convert a linear index to n-dimensional indices based on the provided dimensions.
Utility function for converting linear array indices (as used in the API response) to multi-dimensional indices corresponding to the dataset's dimension structure.
Parameters
- idx::Int: The linear index to convert
- dimensions::Vector{Int}: Array of dimension sizes
Returns
A vector of indices corresponding to each dimension.
Example
# Convert linear index to multi-dimensional coordinates
dimensions = [3, 4, 2] # 3×4×2 array structure
coords = EurostatAPI.linear_index_to_nd_indices(15, dimensions)
println("Linear index 15 maps to coordinates: $coords")
Data Processing Details
Dimension Handling
EurostatAPI.jl automatically processes the multi-dimensional structure of Eurostat data:
- Dimension Detection: Automatically identifies all dimensions in the dataset
- Index Mapping: Converts 0-based API indices to 1-based Julia indices
- Label Resolution: Maps dimension codes to human-readable labels when available
- Missing Handling: Creates placeholder values for missing dimension mappings
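As a rough illustration, the dimension columns of a fetched DataFrame can be separated from the standard bookkeeping columns documented above (the dataset ID is illustrative):
using EurostatAPI, DataFrames
df = fetch_dataset("nama_10_gdp", 2022)
# Everything that is not a standard column is a dataset dimension
standard = ["dataset", "year", "value", "original_value", "fetch_date", "original_key"]
dim_cols = setdiff(names(df), standard)
println("Dimension columns: $dim_cols")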
Special Value Processing
The package handles Eurostat's special value codes:
| Original Code | Meaning | Converted To |
|---|---|---|
| :C or :c | Confidential | missing |
| : | Not available | missing |
| - | Not applicable | missing |
| Numeric values | Actual data | Preserved as numbers |
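For example, to see which special codes occurred in a fetched dataset, the value and original_value columns described earlier can be combined (the dataset ID is illustrative):
using EurostatAPI, DataFrames
df = fetch_dataset("nama_10_gdp", 2022)
# Rows whose value was a special code are converted to missing
special = filter(row -> ismissing(row.value), df)
println("Missing observations: $(nrow(special))")
# original_value keeps the code Eurostat reported for those rows
println("Codes seen: $(unique(skipmissing(special.original_value)))")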
Error Handling
The package implements several layers of error handling:
- HTTP Errors: Network timeouts, server errors, invalid responses
- Data Parsing Errors: Malformed JSON, unexpected data structures
- Index Conversion Errors: Out-of-bounds indices, dimension mismatches
- Type Conversion Errors: Invalid data types, parsing failures
Common Error Scenarios
using EurostatAPI
using HTTP   # provides the error types caught below
# Dataset not found (404 error)
try
df = fetch_dataset("invalid_dataset_id", 2022)
catch e
if isa(e, HTTP.ExceptionRequest.StatusError) && e.status == 404
println("Dataset not found - check the ID")
end
end
# Handle 413 errors (response too large) automatically
df = fetch_with_fallback("DS-056120", 2022) # Automatically adds filters if needed
# Fetch large datasets in chunks
df_chunked = fetch_dataset_chunked("DS-056120", 2022;
chunk_by=:prodcode,
chunk_size=100)
# Get dataset metadata to see available dimensions
metadata = get_dataset_metadata("DS-056120")
println("Available product codes: $(length(metadata.prodcodes))")
println("Available geo codes: $(metadata.geo_codes)")
# Year not available (may return empty dataset)
df = fetch_dataset("nama_10_gdp", 1900) # Very old year
if nrow(df) == 0
println("No data available for this year")
end
# Network timeout
try
df = fetch_dataset("large_dataset", 2022)
catch e
if isa(e, HTTP.TimeoutError)
println("Request timed out - try again later")
end
end
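For transient network failures, a simple retry wrapper can be layered on top of fetch_dataset. This is a sketch, not part of the package API:
using EurostatAPI
# Retry a fetch a few times with a growing pause between attempts
function fetch_with_retries(dataset, year; attempts=3)
    for attempt in 1:attempts
        try
            return fetch_dataset(dataset, year)
        catch e
            attempt == attempts && rethrow()
            wait_s = 2^attempt
            println("Attempt $attempt failed ($e); retrying in $wait_s seconds")
            sleep(wait_s)
        end
    end
end
df = fetch_with_retries("nama_10_gdp", 2022)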
Performance Optimization
Memory Management
For large datasets:
# Track allocation during processing (Base.gc_bytes() reports cumulative bytes allocated)
function fetch_with_monitoring(dataset, year)
    println("Allocated before: $(Base.gc_bytes() ÷ 1024^2) MB")
    df = fetch_dataset(dataset, year)
    println("Allocated after: $(Base.gc_bytes() ÷ 1024^2) MB")
    println("Retrieved $(nrow(df)) rows, $(ncol(df)) columns")
    return df
end
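When several large datasets are processed in one session, it can also help to drop references and trigger a garbage collection between fetches; a minimal sketch:
# Release a large DataFrame before fetching the next dataset
df = nothing
GC.gc()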
Processing Large Datasets
The package includes progress logging for large datasets:
# Enable detailed logging
using Logging
global_logger(ConsoleLogger(stderr, Logging.Info))
# Fetch large dataset with progress updates
df = fetch_dataset("large_dataset_id", 2022)
Batch Processing
For multiple years or datasets:
function fetch_multiple_years(dataset, years)
results = DataFrame()
for year in years
try
yearly_data = fetch_dataset(dataset, year)
if nrow(yearly_data) > 0
append!(results, yearly_data)
println("✓ Year $year: $(nrow(yearly_data)) records")
else
println("⚠ Year $year: No data")
end
catch e
println("✗ Year $year: Failed ($e)")
end
# Optional: add delay between requests
sleep(1)
end
return results
end
# Usage
multi_year_data = fetch_multiple_years("nama_10_gdp", 2020:2023)
Module Structure
EurostatAPI.jl is organized as a single module with the following exported functions:
- fetch_dataset(): Primary data retrieval function with optional filtering
- fetch_with_fallback(): Automatic retry with filters on 413 errors
- fetch_dataset_chunked(): Fetch large datasets in manageable chunks
- get_dataset_metadata(): Retrieve available dimensions and codes
- get_dataset_years(): Helper for year ranges
- process_eurostat_data(): Core data processing (also exported for advanced users)
The module depends on:
- HTTP.jl: For API requests
- JSON3.jl: For JSON parsing
- DataFrames.jl: For data structure output
- Dates.jl: For timestamp handling
Extending the Package
Advanced users can extend EurostatAPI.jl by:
- Custom Processing: Using process_eurostat_data() with modified JSON data
- Additional Endpoints: Building on the HTTP request patterns
- Data Transformations: Post-processing the returned DataFrames
- Caching: Implementing local data caching for frequently accessed datasets (a sketch follows the example below)
Example of custom processing:
using EurostatAPI
using HTTP, JSON3
# Custom function for a specific dataset type
function fetch_with_custom_processing(dataset, year)
# Use the same HTTP request pattern
url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/$dataset?time=$year"
response = HTTP.get(url, readtimeout=120)
if response.status == 200
data = JSON3.read(response.body)
# Apply custom processing here
# Then use the standard processor
df = process_eurostat_data(data, dataset, year)
# Additional custom transformations
# ...
return df
else
error("Request failed with status $(response.status)")
end
end
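Building on the Caching idea above, here is a minimal sketch of a local file cache around fetch_dataset; the fetch_cached helper, cache directory, and file naming are illustrative, not part of the package:
using EurostatAPI
using Serialization
# Cache fetched DataFrames on disk, keyed by dataset and year (illustrative helper)
function fetch_cached(dataset, year; cache_dir=joinpath(homedir(), ".eurostat_cache"))
    mkpath(cache_dir)
    path = joinpath(cache_dir, "$(dataset)_$(year).jls")
    isfile(path) && return deserialize(path)   # reuse a previous download
    df = fetch_dataset(dataset, year)          # otherwise hit the API
    serialize(path, df)                        # and store the result locally
    return df
end
df = fetch_cached("nama_10_gdp", 2022)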