
DataStore Configuration

DataStore exposes configuration options for execution engine selection, compatibility mode, logging, caching, profiling, and dtype correction.

Quick Reference

from chdb.datastore.config import config

# Quick setup presets
config.enable_debug()           # Enable verbose logging
config.use_chdb()               # Force ClickHouse engine
config.use_pandas()             # Force pandas engine
config.use_auto()               # Auto-select engine (default)
config.use_performance_mode()   # SQL-first, max throughput
config.use_pandas_compat()      # Full pandas compatibility (default)
config.enable_profiling()       # Enable performance profiling

All Configuration Options

| Category  | Option                 | Values                        | Default  | Description                                  |
|-----------|------------------------|-------------------------------|----------|----------------------------------------------|
| Logging   | log_level              | DEBUG/INFO/WARNING/ERROR      | WARNING  | Log verbosity                                |
| Logging   | log_format             | "simple", "verbose"           | "simple" | Log message format                           |
| Cache     | cache_enabled          | True/False                    | True     | Enable result caching                        |
| Cache     | cache_ttl              | float (seconds)               | 0.0      | Cache time-to-live                           |
| Engine    | execution_engine       | "auto", "chdb", "pandas"      | "auto"   | Execution engine                             |
| Engine    | cross_datastore_engine | "auto", "chdb", "pandas"      | "auto"   | Cross-DataStore operations                   |
| Compat    | compat_mode            | "pandas", "performance"       | "pandas" | Pandas compatibility vs SQL-first throughput |
| Profiling | profiling_enabled      | True/False                    | False    | Enable profiling                             |
| Dtype     | correction_level       | NONE/CRITICAL/HIGH/MEDIUM/ALL | HIGH     | Dtype correction level                       |

Configuration Methods

Logging Configuration

from chdb.datastore.config import config
import logging

# Set log level
config.set_log_level(logging.DEBUG)
config.set_log_level(logging.INFO)
config.set_log_level(logging.WARNING)  # Default
config.set_log_level(logging.ERROR)

# Set log format
config.set_log_format("simple")   # Default
config.set_log_format("verbose")  # More details

# Quick enable debug mode
config.enable_debug()  # Sets DEBUG level + verbose format

See Logging for details.

Cache Configuration

# Enable/disable caching
config.set_cache_enabled(True)   # Default
config.set_cache_enabled(False)  # Disable caching

# Set cache TTL (time-to-live)
config.set_cache_ttl(60.0)  # Cache expires after 60 seconds
config.set_cache_ttl(0.0)   # No expiration (default)

# Check current settings
print(config.cache_enabled)
print(config.cache_ttl)
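
The TTL convention above (a positive value expires entries after that many seconds, 0.0 means no expiration) can be illustrated with a toy cache. This is a minimal sketch, not chDB's cache implementation: the `TTLCache` class and its `clock` parameter are hypothetical names introduced only for this example.

```python
import time


class TTLCache:
    """Toy cache (hypothetical, not chDB code) showing the TTL convention:
    ttl > 0 expires entries after that many seconds, ttl == 0.0 never expires."""

    def __init__(self, ttl=0.0, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries = {}           # key -> (value, time stored)

    def put(self, key, value):
        self._entries[key] = (value, self._clock())

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, stored_at = entry
        # Only positive TTLs can expire an entry.
        if self.ttl > 0 and self._clock() - stored_at > self.ttl:
            del self._entries[key]
            return default
        return value


# Demonstration with a fake clock so no real waiting is needed.
now = [0.0]
cache = TTLCache(ttl=60.0, clock=lambda: now[0])
cache.put("query", "result")
now[0] = 30.0
print(cache.get("query"))   # result
now[0] = 61.0
print(cache.get("query"))   # None
```

Injecting the clock keeps the example deterministic; a real cache would typically also bound its size.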

Engine Configuration

# Set execution engine
config.set_execution_engine('auto')    # Auto-select (default)
config.set_execution_engine('chdb')    # Force ClickHouse
config.set_execution_engine('pandas')  # Force pandas

# Quick presets
config.use_auto()     # Auto-select
config.use_chdb()     # Force ClickHouse
config.use_pandas()   # Force pandas

# Cross-DataStore engine (for operations between different DataStores)
config.set_cross_datastore_engine('auto')
config.set_cross_datastore_engine('chdb')
config.set_cross_datastore_engine('pandas')

# Check current engine
print(config.execution_engine)

See Execution Engine for details.

Compatibility Mode

# Performance mode: SQL-first, no pandas compatibility overhead
config.use_performance_mode()
# or: config.set_compat_mode('performance')

# Pandas compatibility mode (default)
config.use_pandas_compat()
# or: config.set_compat_mode('pandas')

# Check current mode
print(config.compat_mode)  # 'pandas' or 'performance'

See Performance Mode for details.

Profiling Configuration

# Enable profiling
config.enable_profiling()
# or: config.set_profiling_enabled(True)

# Disable profiling
config.set_profiling_enabled(False)

# Check if profiling is enabled
print(config.profiling_enabled)

See Profiling for details.

Dtype Correction

from chdb.datastore.dtype_correction.config import CorrectionLevel

# Set correction level
config.set_correction_level(CorrectionLevel.NONE)      # No correction
config.set_correction_level(CorrectionLevel.CRITICAL)  # Critical types only
config.set_correction_level(CorrectionLevel.HIGH)      # Default
config.set_correction_level(CorrectionLevel.MEDIUM)    # More corrections
config.set_correction_level(CorrectionLevel.ALL)       # All corrections

Using config Object

The config object is a singleton that manages all settings:

import logging

from chdb.datastore.config import config

# Read settings
print(config.log_level)
print(config.execution_engine)
print(config.cache_enabled)
print(config.profiling_enabled)

# Modify settings
config.set_log_level(logging.DEBUG)
config.set_execution_engine('chdb')
config.set_cache_enabled(False)
config.enable_profiling()
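
Because the settings are readable as plain attributes, it can be handy to capture them in a dictionary before and after a change, for example when logging the effective configuration. The sketch below assumes only the attribute names shown on this page; `snapshot`, `diff_snapshots`, and the `SimpleNamespace` stand-in are hypothetical helpers for illustration. In real use you would pass the `config` object instead of the stand-in.

```python
import logging
from types import SimpleNamespace

# Attribute names read elsewhere on this page.
SETTING_NAMES = (
    "log_level",
    "execution_engine",
    "cache_enabled",
    "cache_ttl",
    "compat_mode",
    "profiling_enabled",
)


def snapshot(cfg, names=SETTING_NAMES):
    """Return a dict of the current values; missing attributes map to None."""
    return {name: getattr(cfg, name, None) for name in names}


def diff_snapshots(before, after):
    """Return {name: (old, new)} for every setting whose value changed."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


# Stand-in object so the example runs without chDB installed (assumption:
# the values mirror the defaults listed in the options table).
stand_in = SimpleNamespace(
    log_level=logging.WARNING,
    execution_engine="auto",
    cache_enabled=True,
    cache_ttl=0.0,
    compat_mode="pandas",
    profiling_enabled=False,
)

before = snapshot(stand_in)
stand_in.execution_engine = "chdb"
print(diff_snapshots(before, snapshot(stand_in)))
# {'execution_engine': ('auto', 'chdb')}
```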

Configuration in Code

Per-Script Configuration

from chdb import datastore as pd
from chdb.datastore.config import config

# Configure at script start
config.enable_debug()
config.use_chdb()
config.enable_profiling()

# Your DataStore code
ds = pd.read_csv("data.csv")
result = ds.filter(ds['age'] > 25).groupby('city').agg({'salary': 'mean'})

Context Manager (Future)

# Planned feature: temporary configuration
with config.override(execution_engine='pandas'):
    result = ds.process()
# Original settings restored
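
Until a built-in override is available, a similar effect can be approximated with `contextlib` on top of the documented `execution_engine` attribute and `set_execution_engine()` setter. This is a sketch under assumptions, not the planned API: `temporary_engine` and `StandInConfig` are hypothetical names, and the stand-in exists only so the example runs without chDB installed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_engine(cfg, engine):
    """Switch the execution engine for the duration of a with-block.

    Works with any object exposing `execution_engine` and
    `set_execution_engine()`, as the config object on this page does.
    """
    previous = cfg.execution_engine
    cfg.set_execution_engine(engine)
    try:
        yield cfg
    finally:
        # Restore the earlier engine even if the block raised.
        cfg.set_execution_engine(previous)


class StandInConfig:
    """Hypothetical stand-in mimicking the two members used above."""

    def __init__(self):
        self.execution_engine = "auto"

    def set_execution_engine(self, engine):
        self.execution_engine = engine


cfg = StandInConfig()
with temporary_engine(cfg, "pandas"):
    print(cfg.execution_engine)  # pandas
print(cfg.execution_engine)      # auto
```

Since the config object is a process-wide singleton, a helper like this affects every DataStore operation that runs inside the block, including work on other threads.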

Common Configuration Scenarios

Development/Debugging

from chdb.datastore.config import config

config.enable_debug()        # Verbose logging
config.enable_profiling()    # Performance tracking
config.set_cache_enabled(False)  # Disable caching for fresh results

Production

from chdb.datastore.config import config
import logging

config.set_log_level(logging.WARNING)  # Minimal logging
config.set_execution_engine('auto')    # Optimal engine selection
config.set_cache_enabled(True)         # Enable caching
config.set_profiling_enabled(False)    # Disable profiling overhead
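
The development and production setups above can be selected at startup from an environment variable. The following is a sketch only: the variable name `DATASTORE_PROFILE`, the functions `apply_profile` and `profile_from_env`, and the `RecordingConfig` stand-in are assumptions made for this example, while the configuration calls themselves are the ones shown on this page.

```python
import logging
import os


def apply_profile(cfg, profile):
    """Apply one of the scenario presets described above to `cfg`."""
    if profile == "development":
        cfg.enable_debug()
        cfg.enable_profiling()
        cfg.set_cache_enabled(False)
    elif profile == "production":
        cfg.set_log_level(logging.WARNING)
        cfg.set_execution_engine("auto")
        cfg.set_cache_enabled(True)
        cfg.set_profiling_enabled(False)
    else:
        raise ValueError(f"unknown profile: {profile!r}")


def profile_from_env(environ=os.environ, default="production"):
    """Read the profile name from DATASTORE_PROFILE (hypothetical variable)."""
    return environ.get("DATASTORE_PROFILE", default)


class RecordingConfig:
    """Hypothetical stand-in that records method calls instead of acting."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Reached only for attributes that do not exist, i.e. config methods.
        def method(*args):
            self.calls.append((name, args))
        return method


cfg = RecordingConfig()
apply_profile(cfg, "development")
print([name for name, _ in cfg.calls])
# ['enable_debug', 'enable_profiling', 'set_cache_enabled']
```

In a real script, `apply_profile(config, profile_from_env())` would be called once, before any DataStore work starts.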

Maximum Throughput

from chdb.datastore.config import config

config.use_performance_mode()    # SQL-first, no pandas overhead
config.set_cache_enabled(False)  # Disable cache for streaming

Performance Testing

from chdb.datastore.config import config

config.use_chdb()            # Force ClickHouse for benchmarks
config.enable_profiling()    # Track performance
config.set_cache_enabled(False)  # Disable cache for accurate timing
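
With caching disabled, each repetition of a query does the full work again, so repeated wall-clock measurements become meaningful. A minimal timing harness might look like the sketch below. It uses only the Python standard library and is independent of DataStore's own profiler; `time_call` is a hypothetical helper, and the lambda stands in for whatever DataStore operation you want to measure.

```python
import statistics
import time


def time_call(fn, repeats=5, warmup=1):
    """Run `fn` several times and summarise the wall-clock durations."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not measured
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "runs": repeats,
    }


# Placeholder workload; replace with the operation under test.
stats = time_call(lambda: sum(range(100_000)), repeats=5)
print(f"min={stats['min']:.6f}s median={stats['median']:.6f}s")
```

Reporting the minimum alongside the median helps separate the cost of the operation from scheduling noise on the machine.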

Pandas Compatibility Testing

from chdb.datastore.config import config

config.use_pandas()          # Force pandas engine
config.enable_debug()        # See what operations are used