API Reference
This section provides detailed documentation for all functions available in EurostatAPI.jl.
Core Functions
Data Retrieval
EurostatAPI.fetch_dataset — Function
fetch_dataset(dataset::String, year::Int; indicators=String[], geo=String[], prodcode=String[])
Fetches Eurostat SDMX data from the Eurostat API for the specified dataset and year.
Parameters:
- dataset: The Eurostat dataset ID (e.g., "DS-056120", "DS-045409")
- year: The year to fetch data for (e.g., 2023)
- indicators: Optional vector of indicator codes to filter by
- geo: Optional vector of geographic codes to filter by
- prodcode: Optional vector of product codes to filter by
Returns a DataFrame containing the processed data.
EurostatAPI.fetch_with_fallback — Function
fetch_with_fallback(dataset::String, year::Int; kwargs...)
Fetches a dataset with an automatic retry using filters if the response is too large (HTTP 413 error).
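A brief usage sketch (the dataset ID is illustrative; forwarding of the keyword filters to fetch_dataset is assumed from the kwargs... signature):
using EurostatAPI
# Try the full request first; retries with filters only if the API returns 413
df = fetch_with_fallback("DS-056120", 2022)
# Additional keyword filters are assumed to be forwarded to fetch_dataset
df_de = fetch_with_fallback("DS-056120", 2022; geo=["DE", "FR"])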
EurostatAPI.fetch_dataset_chunked — Function
fetch_dataset_chunked(dataset::String, year::Int; chunk_by=:prodcode, chunk_size=100, kwargs...)
Fetches a large dataset in chunks to avoid size limitations.
Parameters:
- dataset: The Eurostat dataset ID
- year: The year to fetch data for
- chunk_by: Dimension to chunk by (:prodcode or :geo)
- chunk_size: Number of codes to include in each chunk
- kwargs...: Additional filter parameters passed to fetch_dataset
Returns a combined DataFrame containing all chunks.
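For example, a sketch of chunking by geography rather than by product code (dataset ID and chunk size are illustrative):
using EurostatAPI
# Split the request into smaller calls grouped by geographic code
df = fetch_dataset_chunked("DS-056120", 2022; chunk_by=:geo, chunk_size=20)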
EurostatAPI.get_dataset_metadata — Function
get_dataset_metadata(dataset::String)
Fetches metadata for a dataset to get available dimensions. Returns a NamedTuple with available codes for each dimension.
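A quick way to inspect the result before filtering (the field names of the NamedTuple depend on the dataset's dimensions, so list them first rather than assuming them):
using EurostatAPI
meta = get_dataset_metadata("DS-056120")
# Show which dimension code lists are available before indexing into them
println("Metadata fields: $(keys(meta))")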
fetch_dataset is the main function for retrieving data from Eurostat. It handles all the complexity of:
- Making HTTP requests with proper error handling and retries
- Parsing the SDMX JSON response format
- Converting multi-dimensional data to a flat DataFrame structure
- Handling special values and missing data
Parameters
- dataset::String: The Eurostat dataset identifier (e.g., "nama_10_gdp", "demo_pjan")
- year::Int: The year for which to retrieve data (e.g., 2023, 2022)
Returns
A DataFrame containing the processed dataset with the following standard columns:
- dataset: The dataset ID that was requested
- year: The year that was requested
- value: The actual data values (may contain missing for special codes)
- original_value: For missing values, contains the original Eurostat code
- fetch_date: Timestamp when the data was retrieved
- original_key: Internal reference key from the API response
- Additional columns for each dimension in the dataset (varies by dataset)
Example
using EurostatAPI
using DataFrames
# Fetch European GDP data for 2022
df = fetch_dataset("nama_10_gdp", 2022)
# Fetch with filters - only specific geographic regions
df_filtered = fetch_dataset("nama_10_gdp", 2022; geo=["EU27_2020", "DE", "FR"])
# Fetch PRODCOM data with specific indicators
df_prodcom = fetch_dataset("DS-056120", 2022;
indicators=["PRODQNT", "QNTUNIT"],
prodcode=["10110000", "10120000"])
# Examine the structure
println("Dataset shape: $(size(df))")
println("Column names: $(names(df))")
# Look at first few rows
first(df, 3)
# Filter for a specific country (if geo dimension exists)
if "geo" in names(df)
germany_data = filter(row -> row.geo == "DE", df)
println("Germany records: $(nrow(germany_data))")
end
# Find records with actual values (not missing)
actual_values = filter(row -> !ismissing(row.value), df)
println("Records with values: $(nrow(actual_values))")
Dataset Information
EurostatAPI.get_dataset_years — Function
get_dataset_years(dataset::String)
Returns typical available years for Eurostat datasets.
Returns a range of years that are typically available for Eurostat datasets. This provides a conservative estimate since actual data availability varies by dataset.
Parameters
- dataset::String: The Eurostat dataset identifier
Returns
An array of integers representing years from 1995 to the current year.
Example
# Get the range of potentially available years
years = get_dataset_years("nama_10_gdp")
println("Year range: $(first(years)) to $(last(years))")
# Try to fetch data for recent years
recent_years = years[end-2:end] # Last 3 years
for year in recent_years
try
df = fetch_dataset("nama_10_gdp", year)
println("Year $year: $(nrow(df)) records")
catch e
println("Year $year: Not available ($e)")
end
end
Internal Functions
The following functions are used internally by EurostatAPI.jl but are documented here for completeness and for users who may want to extend the package functionality.
EurostatAPI.process_eurostat_data — Function
process_eurostat_data(data, dataset, year)
Generic Eurostat SDMX JSON parser to DataFrame.
This function handles the core logic of converting Eurostat's SDMX JSON format into a structured DataFrame. It:
- Extracts dimension information and value mappings
- Converts linear indices to multi-dimensional coordinates
- Handles special values and missing data codes
- Creates a clean DataFrame with proper column types
Parameters
- data: The parsed JSON response from the Eurostat API
- dataset::String: The dataset identifier for metadata
- year::Int: The year for metadata
Returns
A processed DataFrame with all dimensions properly mapped.
EurostatAPI.linear_index_to_nd_indices — Function
linear_index_to_nd_indices(idx, dimensions)
Convert a linear index to n-dimensional indices based on the provided dimensions.
Utility function for converting linear array indices (as used in the API response) to multi-dimensional indices corresponding to the dataset's dimension structure.
Parameters
- idx::Int: The linear index to convert
- dimensions::Vector{Int}: Array of dimension sizes
Returns
A vector of indices corresponding to each dimension.
Example
# Convert linear index to multi-dimensional coordinates
dimensions = [3, 4, 2] # 3×4×2 array structure
coords = EurostatAPI.linear_index_to_nd_indices(15, dimensions)
println("Linear index 15 maps to coordinates: $coords")
Data Processing Details
Dimension Handling
EurostatAPI.jl automatically processes the multi-dimensional structure of Eurostat data:
- Dimension Detection: Automatically identifies all dimensions in the dataset
- Index Mapping: Converts 0-based API indices to 1-based Julia indices
- Label Resolution: Maps dimension codes to human-readable labels when available
- Missing Handling: Creates placeholder values for missing dimension mappings
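As a rough illustration, the dimension columns of a fetched DataFrame can be separated from the standard bookkeeping columns documented above (the dataset ID is illustrative):
using EurostatAPI, DataFrames
df = fetch_dataset("nama_10_gdp", 2022)
# Everything that is not a standard column is a dataset dimension
standard = ["dataset", "year", "value", "original_value", "fetch_date", "original_key"]
dim_cols = setdiff(names(df), standard)
println("Dimension columns: $dim_cols")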
Special Value Processing
The package handles Eurostat's special value codes:
| Original Code | Meaning | Converted To |
|---|---|---|
| :C or :c | Confidential | missing |
| : | Not available | missing |
| - | Not applicable | missing |
| Numeric values | Actual data | Preserved as numbers |
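For example, to see which special codes occurred in a fetched dataset, the value and original_value columns described earlier can be combined (the dataset ID is illustrative):
using EurostatAPI, DataFrames
df = fetch_dataset("nama_10_gdp", 2022)
# Rows whose value was a special code are converted to missing
special = filter(row -> ismissing(row.value), df)
println("Missing observations: $(nrow(special))")
# original_value keeps the code Eurostat reported for those rows
println("Codes seen: $(unique(skipmissing(special.original_value)))")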
Error Handling
The package implements several layers of error handling:
- HTTP Errors: Network timeouts, server errors, invalid responses
- Data Parsing Errors: Malformed JSON, unexpected data structures
- Index Conversion Errors: Out-of-bounds indices, dimension mismatches
- Type Conversion Errors: Invalid data types, parsing failures
Common Error Scenarios
using EurostatAPI
using HTTP   # provides the error types caught below
# Dataset not found (404 error)
try
df = fetch_dataset("invalid_dataset_id", 2022)
catch e
if isa(e, HTTP.ExceptionRequest.StatusError) && e.status == 404
println("Dataset not found - check the ID")
end
end
# Handle 413 errors (response too large) automatically
df = fetch_with_fallback("DS-056120", 2022) # Automatically adds filters if needed
# Fetch large datasets in chunks
df_chunked = fetch_dataset_chunked("DS-056120", 2022;
chunk_by=:prodcode,
chunk_size=100)
# Get dataset metadata to see available dimensions
metadata = get_dataset_metadata("DS-056120")
println("Available product codes: $(length(metadata.prodcodes))")
println("Available geo codes: $(metadata.geo_codes)")
# Year not available (may return empty dataset)
df = fetch_dataset("nama_10_gdp", 1900) # Very old year
if nrow(df) == 0
println("No data available for this year")
end
# Network timeout
try
df = fetch_dataset("large_dataset", 2022)
catch e
if isa(e, HTTP.TimeoutError)
println("Request timed out - try again later")
end
end
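For transient network failures, a simple retry wrapper can be layered on top of fetch_dataset. This is a sketch, not part of the package API:
using EurostatAPI
# Retry a fetch a few times with a growing pause between attempts
function fetch_with_retries(dataset, year; attempts=3)
    for attempt in 1:attempts
        try
            return fetch_dataset(dataset, year)
        catch e
            attempt == attempts && rethrow()
            wait_s = 2^attempt
            println("Attempt $attempt failed ($e); retrying in $wait_s seconds")
            sleep(wait_s)
        end
    end
end
df = fetch_with_retries("nama_10_gdp", 2022)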
Performance Optimization
Memory Management
For large datasets:
# Track allocation during processing (Base.gc_bytes() reports cumulative bytes allocated)
function fetch_with_monitoring(dataset, year)
    println("Allocated before: $(Base.gc_bytes() ÷ 1024^2) MB")
    df = fetch_dataset(dataset, year)
    println("Allocated after: $(Base.gc_bytes() ÷ 1024^2) MB")
    println("Retrieved $(nrow(df)) rows, $(ncol(df)) columns")
    return df
end
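When several large datasets are processed in one session, it can also help to drop references and trigger a garbage collection between fetches; a minimal sketch:
# Release a large DataFrame before fetching the next dataset
df = nothing
GC.gc()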
Processing Large Datasets
The package includes progress logging for large datasets:
# Enable detailed logging
using Logging
global_logger(ConsoleLogger(stderr, Logging.Info))
# Fetch large dataset with progress updates
df = fetch_dataset("large_dataset_id", 2022)
Batch Processing
For multiple years or datasets:
function fetch_multiple_years(dataset, years)
results = DataFrame()
for year in years
try
yearly_data = fetch_dataset(dataset, year)
if nrow(yearly_data) > 0
append!(results, yearly_data)
println("✓ Year $year: $(nrow(yearly_data)) records")
else
println("⚠ Year $year: No data")
end
catch e
println("✗ Year $year: Failed ($e)")
end
# Optional: add delay between requests
sleep(1)
end
return results
end
# Usage
multi_year_data = fetch_multiple_years("nama_10_gdp", 2020:2023)
Module Structure
EurostatAPI.jl is organized as a single module with the following exported functions:
- fetch_dataset(): Primary data retrieval function with optional filtering
- fetch_with_fallback(): Automatic retry with filters on 413 errors
- fetch_dataset_chunked(): Fetch large datasets in manageable chunks
- get_dataset_metadata(): Retrieve available dimensions and codes
- get_dataset_years(): Helper for year ranges
- process_eurostat_data(): Core data processing (also exported for advanced users)
The module depends on:
- HTTP.jl: For API requests
- JSON3.jl: For JSON parsing
- DataFrames.jl: For data structure output
- Dates.jl: For timestamp handling
Extending the Package
Advanced users can extend EurostatAPI.jl by:
- Custom Processing: Using process_eurostat_data() with modified JSON data
- Additional Endpoints: Building on the HTTP request patterns
- Data Transformations: Post-processing the returned DataFrames
- Caching: Implementing local data caching for frequently accessed datasets (a sketch follows the example below)
Example of custom processing:
using EurostatAPI
using HTTP, JSON3
# Custom function for a specific dataset type
function fetch_with_custom_processing(dataset, year)
# Use the same HTTP request pattern
url = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/$dataset?time=$year"
response = HTTP.get(url, readtimeout=120)
if response.status == 200
data = JSON3.read(response.body)
# Apply custom processing here
# Then use the standard processor
df = process_eurostat_data(data, dataset, year)
# Additional custom transformations
# ...
return df
else
error("Request failed with status $(response.status)")
end
end
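Building on the Caching idea above, here is a minimal sketch of a local file cache around fetch_dataset; the fetch_cached helper, cache directory, and file naming are illustrative, not part of the package:
using EurostatAPI
using Serialization
# Cache fetched DataFrames on disk, keyed by dataset and year (illustrative helper)
function fetch_cached(dataset, year; cache_dir=joinpath(homedir(), ".eurostat_cache"))
    mkpath(cache_dir)
    path = joinpath(cache_dir, "$(dataset)_$(year).jls")
    isfile(path) && return deserialize(path)   # reuse a previous download
    df = fetch_dataset(dataset, year)          # otherwise hit the API
    serialize(path, df)                        # and store the result locally
    return df
end
df = fetch_cached("nama_10_gdp", 2022)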