API Reference

This section provides detailed documentation for all functions available in EurostatAPI.jl.

Core Functions

Data Retrieval

EurostatAPI.fetch_dataset (Function)
fetch_dataset(dataset::String, year::Int; indicators=String[], geo=String[], prodcode=String[])

Fetches SDMX data from the Eurostat API for the specified dataset and year.

Parameters:

  • dataset: The Eurostat dataset ID (e.g., "DS-056120", "DS-045409")
  • year: The year to fetch data for (e.g., 2023)
  • indicators: Optional vector of indicator codes to filter by
  • geo: Optional vector of geographic codes to filter by
  • prodcode: Optional vector of product codes to filter by

Returns a DataFrame containing the processed data.

EurostatAPI.fetch_with_fallback (Function)
fetch_with_fallback(dataset::String, year::Int; kwargs...)

Fetches dataset with automatic retry using filters if response is too large (413 error).

EurostatAPI.fetch_dataset_chunked (Function)
fetch_dataset_chunked(dataset::String, year::Int; chunk_by=:prodcode, chunk_size=100, kwargs...)

Fetches a large dataset in chunks to avoid size limitations.

Parameters:

  • dataset: The Eurostat dataset ID
  • year: The year to fetch data for
  • chunk_by: Dimension to chunk by (:prodcode or :geo)
  • chunk_size: Number of codes to include in each chunk
  • kwargs...: Additional filter parameters passed to fetch_dataset

Returns a combined DataFrame containing all chunks.

EurostatAPI.get_dataset_metadata (Function)
get_dataset_metadata(dataset::String)

Fetches metadata for a dataset to get available dimensions. Returns a NamedTuple with available codes for each dimension.


fetch_dataset is the main function for retrieving data from Eurostat. It handles all the complexity of:

  • Making HTTP requests with proper error handling and retries
  • Parsing the SDMX JSON response format
  • Converting multi-dimensional data to a flat DataFrame structure
  • Handling special values and missing data

Parameters

  • dataset::String: The Eurostat dataset identifier (e.g., "nama_10_gdp", "demo_pjan")
  • year::Int: The year for which to retrieve data (e.g., 2023, 2022)

Returns

A DataFrame containing the processed dataset with the following standard columns:

  • dataset: The dataset ID that was requested
  • year: The year that was requested
  • value: The actual data values (may contain missing for special codes)
  • original_value: For missing values, contains the original Eurostat code
  • fetch_date: Timestamp when the data was retrieved
  • original_key: Internal reference key from the API response
  • Additional columns for each dimension in the dataset (varies by dataset)

Example

using EurostatAPI
using DataFrames

# Fetch European GDP data for 2022
df = fetch_dataset("nama_10_gdp", 2022)

# Fetch with filters - only specific geographic regions
df_filtered = fetch_dataset("nama_10_gdp", 2022; geo=["EU27_2020", "DE", "FR"])

# Fetch PRODCOM data with specific indicators
df_prodcom = fetch_dataset("DS-056120", 2022; 
                          indicators=["PRODQNT", "QNTUNIT"],
                          prodcode=["10110000", "10120000"])

# Examine the structure
println("Dataset shape: $(size(df))")
println("Column names: $(names(df))")

# Look at first few rows
first(df, 3)

# Filter for a specific country (if geo dimension exists)
if "geo" in names(df)  # names(df) returns strings, not symbols
    germany_data = filter(row -> row.geo == "DE", df)
    println("Germany records: $(nrow(germany_data))")
end

# Find records with actual values (not missing)
actual_values = filter(row -> !ismissing(row.value), df)
println("Records with values: $(nrow(actual_values))")

Dataset Information

get_dataset_years(dataset::String)

Returns the range of years typically available for Eurostat datasets. This is a conservative estimate, since actual data availability varies by dataset.

Parameters

  • dataset::String: The Eurostat dataset identifier

Returns

An array of integers representing years from 1995 to the current year.

Example

# Get the range of potentially available years
years = get_dataset_years("nama_10_gdp")
println("Year range: $(first(years)) to $(last(years))")

# Try to fetch data for recent years
recent_years = years[end-2:end]  # Last 3 years
for year in recent_years
    try
        df = fetch_dataset("nama_10_gdp", year)
        println("Year $year: $(nrow(df)) records")
    catch e
        println("Year $year: Not available ($e)")
    end
end

Internal Functions

The following functions are used internally by EurostatAPI.jl but are documented here for completeness and for users who may want to extend the package functionality.

process_eurostat_data(data, dataset::String, year::Int)

This function handles the core logic of converting Eurostat's SDMX JSON format into a structured DataFrame. It:

  1. Extracts dimension information and value mappings
  2. Converts linear indices to multi-dimensional coordinates
  3. Handles special values and missing data codes
  4. Creates a clean DataFrame with proper column types

Parameters

  • data: The parsed JSON response from the Eurostat API
  • dataset::String: The dataset identifier for metadata
  • year::Int: The year for metadata

Returns

A processed DataFrame with all dimensions properly mapped.
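To make the conversion steps concrete, here is a simplified, illustrative sketch of the kind of transformation involved. This is not the package's actual implementation: the function names (sketch_process, sketch_unpack) are hypothetical, and the row-major index order is an assumption about the API's layout.

```julia
# Row-major unpacking of a 0-based linear index into 1-based Julia
# coordinates, one coordinate per dimension (illustrative only).
function sketch_unpack(idx::Int, sizes::Vector{Int})
    coords = similar(sizes)
    for d in length(sizes):-1:1
        coords[d] = idx % sizes[d] + 1
        idx ÷= sizes[d]
    end
    return coords
end

# Turn a sparse index => value map (as in an SDMX-JSON response) into
# rows, pairing each unpacked coordinate with its dimension code.
function sketch_process(values::Dict{Int,Float64},
                        dim_names::Vector{Symbol},
                        dim_codes::Vector{Vector{String}})
    sizes = length.(dim_codes)
    rows = NamedTuple[]
    for (idx, val) in sort(collect(values); by = first)
        coords = sketch_unpack(idx, sizes)
        labels = Tuple(dim_codes[d][coords[d]] for d in eachindex(dim_codes))
        nt = NamedTuple{Tuple(dim_names)}(labels)
        push!(rows, merge(nt, (value = val,)))
    end
    return rows
end
```

Each resulting NamedTuple corresponds to one row of the final DataFrame, with one column per dimension plus the value.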

linear_index_to_nd_indices(idx::Int, dimensions::Vector{Int})

Utility function for converting linear array indices (as used in the API response) to the multi-dimensional indices corresponding to the dataset's dimension structure.

Parameters

  • idx::Int: The linear index to convert
  • dimensions::Vector{Int}: Array of dimension sizes

Returns

A vector of indices corresponding to each dimension.

Example

# Convert linear index to multi-dimensional coordinates
dimensions = [3, 4, 2]  # 3×4×2 array structure
coords = EurostatAPI.linear_index_to_nd_indices(15, dimensions)
println("Linear index 15 maps to coordinates: $coords")

Data Processing Details

Dimension Handling

EurostatAPI.jl automatically processes the multi-dimensional structure of Eurostat data:

  1. Dimension Detection: Automatically identifies all dimensions in the dataset
  2. Index Mapping: Converts 0-based API indices to 1-based Julia indices
  3. Label Resolution: Maps dimension codes to human-readable labels when available
  4. Missing Handling: Creates placeholder values for missing dimension mappings
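Steps 2–4 above can be sketched in a few lines. This is a hedged illustration, not the package's internal code; the function name dim_value and the fall-back-to-raw-code behavior are assumptions.

```julia
# Shift a 0-based API index to Julia's 1-based indexing, then resolve
# the dimension code at that position to its human-readable label,
# falling back to the raw code when no label mapping exists.
function dim_value(api_index::Int, codes::Vector{String},
                   labels::Dict{String,String})
    code = codes[api_index + 1]    # 0-based API index -> 1-based Julia index
    return get(labels, code, code) # placeholder: raw code if unlabeled
end
```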

Special Value Processing

The package handles Eurostat's special value codes:

  Original Code     Meaning           Converted To
  :C or :c          Confidential      missing
  :                 Not available     missing
  -                 Not applicable    missing
  Numeric values    Actual data       Preserved as numbers
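A minimal sketch of this mapping (not the package's internal function; convert_special is a hypothetical name):

```julia
# Map Eurostat special-value codes to `missing`; parse anything else
# as a number, treating unparseable strings as missing as well.
function convert_special(raw::AbstractString)
    stripped = strip(raw)
    stripped in (":C", ":c", ":", "-") && return missing
    parsed = tryparse(Float64, stripped)
    return parsed === nothing ? missing : parsed
end
```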

Error Handling

The package implements several layers of error handling:

  1. HTTP Errors: Network timeouts, server errors, invalid responses
  2. Data Parsing Errors: Malformed JSON, unexpected data structures
  3. Index Conversion Errors: Out-of-bounds indices, dimension mismatches
  4. Type Conversion Errors: Invalid data types, parsing failures

Common Error Scenarios

# Dataset not found (404 error)
try
    df = fetch_dataset("invalid_dataset_id", 2022)
catch e
    if isa(e, HTTP.ExceptionRequest.StatusError) && e.status == 404
        println("Dataset not found - check the ID")
    end
end

# Handle 413 errors (response too large) automatically
df = fetch_with_fallback("DS-056120", 2022)  # Automatically adds filters if needed

# Fetch large datasets in chunks
df_chunked = fetch_dataset_chunked("DS-056120", 2022; 
                                  chunk_by=:prodcode,
                                  chunk_size=100)

# Get dataset metadata to see available dimensions
metadata = get_dataset_metadata("DS-056120")
println("Available product codes: $(length(metadata.prodcodes))")
println("Available geo codes: $(metadata.geo_codes)")

# Year not available (may return empty dataset)
df = fetch_dataset("nama_10_gdp", 1900)  # Very old year
if nrow(df) == 0
    println("No data available for this year")
end

# Network timeout
try
    df = fetch_dataset("large_dataset", 2022)
catch e
    if isa(e, HTTP.TimeoutError)
        println("Request timed out - try again later")
    end
end

Performance Optimization

Memory Management

For large datasets:

# Monitor memory usage during processing
function fetch_with_monitoring(dataset, year)
    println("Memory before: $(Base.gc_bytes() ÷ 1024^2) MB")
    
    df = fetch_dataset(dataset, year)
    
    println("Memory after: $(Base.gc_bytes() ÷ 1024^2) MB")
    println("Retrieved $(nrow(df)) rows, $(ncol(df)) columns")
    
    return df
end

Processing Large Datasets

The package includes progress logging for large datasets:

# Enable detailed logging
using Logging
global_logger(ConsoleLogger(stderr, Logging.Info))

# Fetch large dataset with progress updates
df = fetch_dataset("large_dataset_id", 2022)

Batch Processing

For multiple years or datasets:

function fetch_multiple_years(dataset, years)
    results = DataFrame()
    
    for year in years
        try
            yearly_data = fetch_dataset(dataset, year)
            if nrow(yearly_data) > 0
                append!(results, yearly_data; cols=:union)  # cols=:union tolerates differing columns
                println("✓ Year $year: $(nrow(yearly_data)) records")
            else
                println("⚠ Year $year: No data")
            end
        catch e
            println("✗ Year $year: Failed ($e)")
        end
        
        # Optional: add delay between requests
        sleep(1)
    end
    
    return results
end

# Usage
multi_year_data = fetch_multiple_years("nama_10_gdp", 2020:2023)

Module Structure

EurostatAPI.jl is organized as a single module with the following exported functions:

  • fetch_dataset(): Primary data retrieval function with optional filtering
  • fetch_with_fallback(): Automatic retry with filters on 413 errors
  • fetch_dataset_chunked(): Fetch large datasets in manageable chunks
  • get_dataset_metadata(): Retrieve available dimensions and codes
  • get_dataset_years(): Helper for year ranges
  • process_eurostat_data(): Core data processing (also exported for advanced users)

The module depends on:

  • HTTP.jl: For API requests
  • JSON3.jl: For JSON parsing
  • DataFrames.jl: For data structure output
  • Dates.jl: For timestamp handling

Extending the Package

Advanced users can extend EurostatAPI.jl by:

  1. Custom Processing: Using process_eurostat_data() with modified JSON data
  2. Additional Endpoints: Building on the HTTP request patterns
  3. Data Transformations: Post-processing the returned DataFrames
  4. Caching: Implementing local data caching for frequently accessed datasets

Example of custom processing:

using EurostatAPI
using HTTP, JSON3

# Custom function for a specific dataset type
function fetch_with_custom_processing(dataset, year)
    # Use the same HTTP request pattern
    url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/$dataset?time=$year"
    response = HTTP.get(url, readtimeout=120)
    
    if response.status == 200
        data = JSON3.read(response.body)
        
        # Apply custom processing here
        # Then use the standard processor
        df = process_eurostat_data(data, dataset, year)
        
        # Additional custom transformations
        # ...
        
        return df
    else
        error("Request failed with status $(response.status)")
    end
end
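
Caching (point 4 above) can be layered on without touching the package itself. The sketch below is one possible approach using the Serialization standard library; cached_fetch is a hypothetical helper, and `fetcher` is assumed to be any callable such as `(dataset, year) -> fetch_dataset(dataset, year)`.

```julia
using Serialization

# File-based cache wrapper around any fetch function. Results are
# serialized under `cache_dir`, keyed by dataset ID and year, and
# reused on subsequent calls instead of hitting the API again.
function cached_fetch(fetcher, dataset::String, year::Int;
                      cache_dir::String = joinpath(tempdir(), "eurostat_cache"))
    mkpath(cache_dir)
    path = joinpath(cache_dir, "$(dataset)_$(year).jls")
    isfile(path) && return deserialize(path)
    result = fetcher(dataset, year)
    serialize(path, result)
    return result
end
```

Note that serialized files are Julia-version-specific, so a long-lived cache may need invalidation after upgrades.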