# Configuration
This guide covers all configuration options for customizing faker-news behavior.
## Provider Configuration

### NewsProvider Parameters
Configure the provider when creating it:
```python
from faker import Faker
from faker_news import NewsProvider, LLMClientConfig

fake = Faker()
provider = NewsProvider(
    fake,
    db_path="/custom/path/cache.sqlite3",  # Custom cache location
    min_headline_pool=50,   # Minimum unused headlines threshold
    headline_batch=60,      # Headlines per batch
    intro_batch=30,         # Intros per batch
    article_batch=15,       # Articles per batch
    llm_config=None,        # LLM configuration (optional)
)
fake.add_provider(provider)
```
### Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `db_path` | `str` | Platform-specific | SQLite database file path |
| `min_headline_pool` | `int` | `30` | Minimum unused headlines before auto-refill |
| `headline_batch` | `int` | `40` | Headlines generated per batch |
| `intro_batch` | `int` | `20` | Intros generated per batch |
| `article_batch` | `int` | `10` | Articles generated per batch |
| `llm_config` | `LLMClientConfig` | `None` | LLM client configuration |
### Default Database Paths

If `db_path` is not specified, platform-specific defaults are used:
| Platform | Default Path |
|---|---|
| Linux | `~/.cache/faker-news/cache.sqlite3` |
| macOS | `~/Library/Caches/faker-news/cache.sqlite3` |
| Windows | `%LOCALAPPDATA%\faker-news\cache\cache.sqlite3` |
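If you need the cache location programmatically, for example to clear it between runs, here is a minimal sketch that mirrors the table above (purely illustrative; faker-news resolves this path internally):

```python
import os
import platform
from pathlib import Path

# Reproduce the documented default cache paths; illustrative only.
system = platform.system()
if system == "Linux":
    cache_file = Path.home() / ".cache" / "faker-news" / "cache.sqlite3"
elif system == "Darwin":  # macOS
    cache_file = Path.home() / "Library" / "Caches" / "faker-news" / "cache.sqlite3"
else:  # Windows
    cache_file = Path(os.environ["LOCALAPPDATA"]) / "faker-news" / "cache" / "cache.sqlite3"

print(cache_file)
```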
## LLM Configuration

### LLMClientConfig Parameters
Configure the LLM client:
```python
from faker_news import LLMClientConfig

llm_config = LLMClientConfig(
    api_key="sk-your-api-key",             # API key
    base_url="https://api.openai.com/v1",  # API endpoint
    model_headlines="gpt-4o-mini",         # Model for headlines
    model_writing="gpt-4o-mini",           # Model for intros/articles
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | Auto-detected | LLM API key |
| `base_url` | `str` | Auto-detected | API endpoint URL |
| `model_headlines` | `str` | Provider-specific | Model for headline generation |
| `model_writing` | `str` | Provider-specific | Model for intro/article generation |
### Auto-Detection

If not specified, values are auto-detected:

- **API key**: checked in order:
    1. System keyring (`keyring.get_password("faker-news", "openai")`)
    2. Environment variable (`OPENAI_API_KEY` or `DASHSCOPE_API_KEY`)
    3. Explicit config
- **Base URL**: checked in order:
    1. Environment variable (`OPENAI_BASE_URL` or `DASHSCOPE_BASE_URL`)
    2. Default based on detected provider
- **Models**: auto-selected based on provider:
    - OpenAI: `gpt-4o-mini`
    - DashScope/Qwen: `qwen-flash`
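To use the keyring option, store the key once with the `keyring` package, using the service and username names listed above:

```python
# One-time setup: store the API key in the system keyring so it is
# picked up automatically (service "faker-news", username "openai").
import keyring

keyring.set_password("faker-news", "openai", "sk-your-api-key")
```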
## Configuration Examples

### Development Setup
Fast, cheap models for development:
```python
from faker import Faker
from faker_news import NewsProvider, LLMClientConfig

llm_config = LLMClientConfig(
    model_headlines="gpt-4o-mini",  # Fast and cheap
    model_writing="gpt-4o-mini",
)

fake = Faker()
provider = NewsProvider(
    fake,
    llm_config=llm_config,
    min_headline_pool=20,  # Lower threshold
    headline_batch=30,     # Smaller batches
    db_path="/tmp/dev-cache.sqlite3",
)
fake.add_provider(provider)
```
### Production Setup
Better models for production:
```python
llm_config = LLMClientConfig(
    model_headlines="gpt-4o-mini",  # Fast for headlines
    model_writing="gpt-4o",         # Better quality for articles
)

fake = Faker()
provider = NewsProvider(
    fake,
    llm_config=llm_config,
    min_headline_pool=100,  # Larger buffer
    headline_batch=150,     # Bigger batches
    db_path="/var/cache/faker-news/cache.sqlite3",
)
fake.add_provider(provider)
```
### High-Volume Setup
Optimized for generating lots of content:
```python
provider = NewsProvider(
    fake,
    min_headline_pool=200,  # Large buffer
    headline_batch=250,     # Large batches
    intro_batch=50,
    article_batch=25,
    db_path="/data/faker-news-cache.sqlite3",
)
```
### Testing Setup
Isolated environment for testing:
```python
import tempfile
from pathlib import Path

# Use a temporary database for an isolated cache
temp_db = Path(tempfile.mkdtemp()) / "test-cache.sqlite3"

provider = NewsProvider(
    fake,
    db_path=str(temp_db),
    min_headline_pool=10,  # Small for tests
    headline_batch=15,
)
```
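In a test suite, the same idea fits naturally into a pytest fixture. A sketch, assuming pytest and its built-in `tmp_path` fixture (the fixture name `news_fake` is an arbitrary choice for this example):

```python
import pytest
from faker import Faker
from faker_news import NewsProvider

@pytest.fixture
def news_fake(tmp_path):
    """Faker instance with an isolated, throwaway cache per test."""
    fake = Faker()
    provider = NewsProvider(
        fake,
        db_path=str(tmp_path / "test-cache.sqlite3"),
        min_headline_pool=10,
        headline_batch=15,
    )
    fake.add_provider(provider)
    return fake
```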
### Multi-Model Setup
Use different models for different content types:
```python
llm_config = LLMClientConfig(
    api_key="sk-...",
    model_headlines="gpt-4o-mini",  # Fast, simple model for headlines
    model_writing="gpt-4o",         # Better model for full articles
)

provider = NewsProvider(fake, llm_config=llm_config)
```
## Tuning Performance

### Batch Sizes

Larger batches mean fewer API calls, but each call takes longer to return:

| Use Case | Batch Size | Pros | Cons |
|---|---|---|---|
| Development | 20-40 | Fast feedback | More API calls |
| Production | 100-200 | Fewer API calls | Longer waits |
| High-volume | 200+ | Fewest API calls | Higher memory usage |
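For example, filling a pool with 1,000 headlines takes 25 API calls at `headline_batch=40` but only 5 calls at `headline_batch=200`.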
```python
# Conservative (more API calls, but faster feedback)
provider = NewsProvider(
    fake,
    headline_batch=30,
    intro_batch=15,
    article_batch=8,
)

# Aggressive (fewer API calls, but longer waits)
provider = NewsProvider(
    fake,
    headline_batch=200,
    intro_batch=50,
    article_batch=30,
)
```
### Pool Threshold

The `min_headline_pool` threshold controls when auto-refill happens:
```python
# Low threshold (more frequent refills)
provider = NewsProvider(fake, min_headline_pool=10)
# Refills when < 10 unused headlines

# High threshold (less frequent refills, but larger buffer)
provider = NewsProvider(fake, min_headline_pool=100)
# Refills when < 100 unused headlines
```
Recommendations:
- Interactive use: Low threshold (10-30)
- Batch processing: High threshold (50-200)
- Background jobs: Very high threshold (200+)
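The rule itself is simple. This illustrative sketch (not faker-news internals) shows how the threshold and the batch size interact, using the documented defaults:

```python
# Illustrative sketch of the refill rule, not faker-news internals:
# when the unused-headline count drops below the threshold, one batch
# of `headline_batch` new headlines is requested from the LLM.
def headlines_to_request(unused: int, min_pool: int = 30, batch: int = 40) -> int:
    return batch if unused < min_pool else 0

assert headlines_to_request(unused=29) == 40  # below threshold: refill one batch
assert headlines_to_request(unused=30) == 0   # at threshold: no API call
```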
### Memory Considerations
Large batches use more memory during generation:
```python
# Memory-efficient (smaller batches)
provider = NewsProvider(
    fake,
    headline_batch=40,
    article_batch=10,
)

# Memory-intensive (larger batches)
provider = NewsProvider(
    fake,
    headline_batch=500,
    article_batch=100,
)
```
## Provider-Specific Configuration

### OpenAI
```python
llm_config = LLMClientConfig(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",
    model_headlines="gpt-4o-mini",
    model_writing="gpt-4o",
)
```
### Alibaba DashScope/Qwen
```python
llm_config = LLMClientConfig(
    api_key="sk-...",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    model_headlines="qwen-flash",  # Fast model
    model_writing="qwen-plus",     # Better quality
)
```
### Azure OpenAI
```python
llm_config = LLMClientConfig(
    api_key="your-azure-key",
    base_url="https://your-resource.openai.azure.com/openai/deployments/your-deployment",
    model_headlines="gpt-4o-mini",
    model_writing="gpt-4",
)
```
### Custom OpenAI-Compatible API
```python
llm_config = LLMClientConfig(
    api_key="your-api-key",
    base_url="https://custom-api.example.com/v1",
    model_headlines="custom-model-fast",
    model_writing="custom-model-quality",
)
```
## Environment Variables

### Setting Credentials

Instead of hardcoding credentials, use environment variables:
```bash
# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1"  # Optional

# DashScope
export DASHSCOPE_API_KEY="sk-..."
export DASHSCOPE_BASE_URL="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"  # Optional
```
Then use without explicit config:
```python
from faker import Faker
from faker_news import NewsProvider

# Auto-detects from environment
fake = Faker()
fake.add_provider(NewsProvider(fake))
```
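For local development, these variables can also come from a `.env` file via the third-party `python-dotenv` package. This is an assumption for convenience; faker-news itself does not load `.env` files:

```python
# Load OPENAI_API_KEY etc. from a local .env file before creating the provider.
# python-dotenv is a third-party package, not a faker-news dependency.
from dotenv import load_dotenv
from faker import Faker
from faker_news import NewsProvider

load_dotenv()  # reads .env from the current directory into os.environ

fake = Faker()
fake.add_provider(NewsProvider(fake))  # auto-detects from the loaded environment
```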
## Advanced Configuration

### Multiple Providers
Use different configurations for different purposes:
```python
from faker import Faker
from faker_news import NewsProvider, LLMClientConfig

# Fast provider for headlines only
fast_fake = Faker()
fast_config = LLMClientConfig(model_headlines="gpt-4o-mini")
fast_fake.add_provider(NewsProvider(
    fast_fake,
    llm_config=fast_config,
    db_path="/tmp/headlines.sqlite3",
))

# Quality provider for articles
quality_fake = Faker()
quality_config = LLMClientConfig(model_writing="gpt-4o")
quality_fake.add_provider(NewsProvider(
    quality_fake,
    llm_config=quality_config,
    db_path="/tmp/articles.sqlite3",
))
```
### Dynamic Configuration

Adjust configuration at runtime:
```python
from faker import Faker
from faker_news import NewsProvider

fake = Faker()

def create_provider(env="development"):
    """Create a provider based on the environment."""
    if env == "production":
        return NewsProvider(
            fake,
            min_headline_pool=100,
            headline_batch=200,
            db_path="/var/cache/faker-news.sqlite3",
        )
    else:
        return NewsProvider(
            fake,
            min_headline_pool=20,
            headline_batch=30,
            db_path="/tmp/dev-cache.sqlite3",
        )

# Use it
provider = create_provider(env="production")
fake.add_provider(provider)
```
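In practice, the environment name usually comes from an environment variable rather than a hardcoded string. A small variant (the `APP_ENV` variable name is an assumption for this example, not something faker-news reads):

```python
import os

# APP_ENV is an assumed variable name for this example.
provider = create_provider(env=os.environ.get("APP_ENV", "development"))
fake.add_provider(provider)
```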
## Next Steps
- LLM Providers - Detailed guides for specific LLM providers
- API Reference - Complete API documentation
- Architecture - Understanding the internal architecture