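Quickwit User Guide: A Complete Walkthrough of the Cloud-Native Search Engine
Quickwit is a cloud-native search engine designed for observability data (logs and traces). It is an open-source alternative to Datadog, Elasticsearch, Loki, and Tempo, written in Rust and built on the Tantivy full-text search library.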
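- Sub-second search: sub-second query latency directly on object storage
- Cloud-native architecture: native support for object stores such as S3, Azure Blob, and GCS
- Cost efficiency: can cut storage costs by 10x or more compared with traditional solutions
- Distributed search: horizontally scalable distributed search clusters
- Multi-tenancy: efficient tenant isolation through tags and partitions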
graph TB
    subgraph "Ingestion layer"
        A[Log sources] --> B[Quickwit Indexer]
        C[Trace data] --> B
        D[OTEL Collector] --> B
    end
    subgraph "Quickwit cluster"
        B --> E[Metastore]
        B --> F[Object storage S3/OSS]
        G[Searcher 1] --> E
        H[Searcher 2] --> E
        I[Searcher 3] --> E
        G --> F
        H --> F
        I --> F
    end
    subgraph "Query layer"
        J[REST API] --> G
        J --> H
        J --> I
        K[Grafana] --> J
        L[Applications] --> J
    end
    style B fill:#f9f,stroke:#333
    style E fill:#bbf,stroke:#333
    style F fill:#bfb,stroke:#333

1. Install Quickwit
Option 1: Use the install script

    # Download and install
    curl -L https://install.quickwit.io | sh
    cd quickwit-v*/

    # Verify the installation
    ./quickwit --version

Option 2: Use Docker

    # Create the data directory
    mkdir qwdata

    # Run the Docker container
    docker run --rm quickwit/quickwit --version

    # Apple Silicon Macs need an explicit platform
    docker run --rm --platform linux/amd64 quickwit/quickwit --version

2. Start the Quickwit service

    # Start locally
    ./quickwit run

    # Start with Docker
    docker run --rm \
        -v $(pwd)/qwdata:/quickwit/qwdata \
        -p 127.0.0.1:7280:7280 \
        quickwit/quickwit run

Open the UI at http://localhost:7280

3. Create an index
Create the index configuration file index-config.yaml:

    version: 0.7
    index_id: logs
    doc_mapping:
      field_mappings:
        - name: timestamp
          type: datetime
          input_formats:
            - unix_timestamp
          output_format: unix_timestamp_secs
          fast_precision: seconds
          fast: true
        - name: level
          type: text
          tokenizer: raw
        - name: message
          type: text
          tokenizer: default
          record: position
        - name: service
          type: text
          tokenizer: raw
      tag_fields: [service]
      timestamp_field: timestamp
    search_settings:
      default_search_fields: [message]

Create the index:

    ./quickwit index create --index-config index-config.yaml

4. Ingest data
Prepare the log data in NDJSON format (one JSON object per line) as logs.json:

    {"timestamp": 1710691200, "level": "INFO", "message": "Application started", "service": "api"}
    {"timestamp": 1710691201, "level": "ERROR", "message": "Database connection failed", "service": "api"}
    {"timestamp": 1710691202, "level": "WARN", "message": "High memory usage detected", "service": "worker"}

Ingest the data:

    ./quickwit index ingest \
        --index logs \
        --input-path logs.json \
        --force
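
If the log records originate in application code rather than in a file, the NDJSON file can be produced programmatically. The following is a minimal sketch, assuming the records are plain Python dictionaries with the same four fields as the sample above; the helper name `to_ndjson` is illustrative and not part of Quickwit.

```python
import json
from typing import Iterable, Mapping

# Sample records mirroring the logs.json example in this guide.
SAMPLE_LOGS = [
    {"timestamp": 1710691200, "level": "INFO", "message": "Application started", "service": "api"},
    {"timestamp": 1710691201, "level": "ERROR", "message": "Database connection failed", "service": "api"},
    {"timestamp": 1710691202, "level": "WARN", "message": "High memory usage detected", "service": "worker"},
]


def to_ndjson(records: Iterable[Mapping[str, object]]) -> str:
    """Serialize records as NDJSON: one JSON object per line, newline-terminated."""
    lines = [json.dumps(rec, ensure_ascii=False) for rec in records]
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


ndjson_text = to_ndjson(SAMPLE_LOGS)
# Write the result where the ingest command expects it, for example:
# with open("logs.json", "w", encoding="utf-8") as fh:
#     fh.write(ndjson_text)
```

Because `json.dumps` escapes embedded newlines, each record is guaranteed to stay on a single line, which is what the NDJSON format requires.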
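
Object storage integration
One of Quickwit's core strengths is native object storage support, which can cut storage costs substantially.

Supported storage providers
- Amazon S3 and S3-compatible stores (MinIO, Garage, and others)
- Alibaba Cloud OSS (through its S3-compatible API)
- Azure Blob Storage
- Google Cloud Storage
- Local file system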

Configure S3/OSS storage
1. Create the configuration file
Create config.yaml:

    version: 0.7
    node_id: quickwit-node-1
    listen_address: 0.0.0.0

    # Metastore location
    metastore_uri: s3://my-bucket/quickwit/indexes

    # Default index root
    default_index_root_uri: s3://my-bucket/quickwit/indexes

    # S3 storage settings
    storage:
      s3:
        region: us-east-1
        # Optional: custom endpoint (for OSS or MinIO)
        endpoint: https://oss-cn-hangzhou.aliyuncs.com
        # Optional: force path-style access (required by MinIO)
        force_path_style_access: false

2. Configure Alibaba Cloud OSS
Alibaba Cloud OSS is accessed through its S3-compatible API:

    version: 0.7
    node_id: quickwit-oss-node
    listen_address: 0.0.0.0

    metastore_uri: s3://my-oss-bucket/quickwit/indexes
    default_index_root_uri: s3://my-oss-bucket/quickwit/indexes

    storage:
      s3:
        region: cn-hangzhou
        endpoint: https://oss-cn-hangzhou.aliyuncs.com
        access_key_id: ${OSS_ACCESS_KEY_ID}
        secret_access_key: ${OSS_SECRET_ACCESS_KEY}

3. Set environment variables

    # Set the OSS credentials
    export OSS_ACCESS_KEY_ID="your-access-key-id"
    export OSS_SECRET_ACCESS_KEY="your-secret-access-key"

    # Or use the AWS environment variables (S3-compatible)
    export AWS_ACCESS_KEY_ID="your-access-key-id"
    export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
    export AWS_REGION="cn-hangzhou"

    # Start Quickwit
    ./quickwit run --config config.yaml
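
The OSS configuration above relies on `${...}` placeholders being filled from the environment, so a missing variable only surfaces when the node starts. The sketch below is a hypothetical pre-flight check, not a Quickwit feature: it scans a config file's text for `${NAME}` placeholders and reports the ones that are unset. It assumes the `${NAME:-default}` form carries its own fallback, and it does not skip placeholders that appear inside YAML comments.

```python
import re
from typing import List, Mapping

# Matches ${NAME} and ${NAME:-default}.
# Group 1 is the variable name, group 2 the optional default value.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def missing_env_vars(config_text: str, env: Mapping[str, str]) -> List[str]:
    """Return placeholder names that have no default and are absent from env."""
    missing: List[str] = []
    for match in _PLACEHOLDER.finditer(config_text):
        name, default = match.group(1), match.group(2)
        if default is None and name not in env and name not in missing:
            missing.append(name)
    return missing


EXAMPLE_CONFIG = """\
storage:
  s3:
    region: ${OSS_REGION:-cn-hangzhou}
    access_key_id: ${OSS_ACCESS_KEY_ID}
    secret_access_key: ${OSS_SECRET_ACCESS_KEY}
"""

# Only one of the two credentials is set in this illustrative environment.
print(missing_env_vars(EXAMPLE_CONFIG, {"OSS_ACCESS_KEY_ID": "your-access-key-id"}))
```

In real use the second argument would be `os.environ`; `OSS_REGION` is a made-up variable used only to show the default form.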

4. MinIO configuration example

    version: 0.7
    node_id: quickwit-minio
    listen_address: 0.0.0.0

    metastore_uri: s3://quickwit-indexes/indexes
    default_index_root_uri: s3://quickwit-indexes/indexes

    storage:
      s3:
        flavor: minio
        endpoint: http://minio:9000
        access_key_id: minioadmin
        secret_access_key: minioadmin

5. Google Cloud Storage configuration

    version: 0.7
    node_id: quickwit-gcs
    listen_address: 0.0.0.0

    metastore_uri: s3://my-gcs-bucket/quickwit
    default_index_root_uri: s3://my-gcs-bucket/quickwit

    storage:
      s3:
        flavor: gcs
        region: us-east1
        endpoint: https://storage.googleapis.com

Docker Compose deployment example
Create docker-compose.yml:

    version: '3.8'

    services:
      # MinIO object storage
      minio:
        image: minio/minio:latest
        command: server /data --console-address ":9001"
        ports:
          - "9000:9000"
          - "9001:9001"
        environment:
          MINIO_ROOT_USER: minioadmin
          MINIO_ROOT_PASSWORD: minioadmin
        volumes:
          - minio-data:/data
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
          interval: 30s
          timeout: 20s
          retries: 3

      # Quickwit service
      quickwit:
        image: quickwit/quickwit:latest
        # Pass --config so that the mounted file is the one Quickwit loads
        command: run --config /quickwit/config.yaml
        ports:
          - "7280:7280"
        environment:
          AWS_ACCESS_KEY_ID: minioadmin
          AWS_SECRET_ACCESS_KEY: minioadmin
          QW_S3_ENDPOINT: http://minio:9000
        volumes:
          - ./config.yaml:/quickwit/config.yaml
        depends_on:
          - minio

    volumes:
      minio-data:

The configuration file config.yaml:

    version: 0.7
    node_id: quickwit-docker
    listen_address: 0.0.0.0

    metastore_uri: s3://quickwit/indexes
    default_index_root_uri: s3://quickwit/indexes

    storage:
      s3:
        flavor: minio
        endpoint: http://minio:9000

Start the services:

    # Start all services
    docker-compose up -d

    # Follow the logs
    docker-compose logs -f quickwit

    # Create the MinIO bucket
    docker-compose exec minio mc mb /data/quickwit

Full-text search
Quickwit provides powerful full-text search with support for complex query syntax.

1. Simple queries

    # Search for documents containing "error"
    curl "http://localhost:7280/api/v1/logs/search?query=error"

    # Search a specific field
    curl "http://localhost:7280/api/v1/logs/search?query=level:ERROR"

    # Use the CLI
    ./quickwit index search --index logs --query "error"

2. Boolean queries

    # AND query
    curl "http://localhost:7280/api/v1/logs/search?query=error+AND+database"

    # OR query
    curl "http://localhost:7280/api/v1/logs/search?query=error+OR+warning"

    # NOT query
    curl "http://localhost:7280/api/v1/logs/search?query=error+NOT+timeout"

    # Combined query
    curl "http://localhost:7280/api/v1/logs/search?query=(error+OR+warning)+AND+service:api"

3. Phrase search

    # Exact phrase match
    curl "http://localhost:7280/api/v1/logs/search?query=\"database+connection+failed\""
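
The curl examples above encode spaces by hand as `+`. When queries are assembled in code, it is safer to let a library do the percent-encoding so that parentheses, quotes, and colons cannot break the URL. A minimal sketch, assuming the same local endpoint as the examples; `BASE_URL` and the helper name `search_url` are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:7280"  # REST address used throughout this guide


def search_url(index_id: str, query: str, **params: object) -> str:
    """Build a search URL with the query and any extra parameters percent-encoded."""
    query_string = urlencode({"query": query, **params})
    return f"{BASE_URL}/api/v1/{index_id}/search?{query_string}"


# Boolean query with grouping and a field filter.
combined = search_url("logs", "(error OR warning) AND service:api")

# Phrase query; the double quotes are encoded as %22.
phrase = search_url("logs", '"database connection failed"', max_hits=5)

print(combined)
print(phrase)
```

The server decodes `+` and `%XX` sequences back to the original query text, so the encoded form is equivalent to the hand-written URLs.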

1. Time-range queries

    # Filter by timestamp (Unix seconds)
    curl "http://localhost:7280/api/v1/logs/search?query=error&start_timestamp=1710691200&end_timestamp=1710777600"
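
The `start_timestamp` and `end_timestamp` parameters take Unix seconds, which are easy to get wrong by hand. The sketch below derives a one-day window from a calendar date. The values used in this guide, 1710691200 and 1710777600, correspond to 2024-03-17T16:00:00Z and 2024-03-18T16:00:00Z; reading them as one local day in UTC+8 is an assumption made here for illustration, and the helper names are not part of Quickwit.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def to_epoch_seconds(dt: datetime) -> int:
    """Convert a timezone-aware datetime to Unix seconds."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp())


def day_window(year: int, month: int, day: int, tz: timezone) -> Tuple[int, int]:
    """Return (start, end) Unix seconds covering one local calendar day."""
    start = datetime(year, month, day, tzinfo=tz)
    return to_epoch_seconds(start), to_epoch_seconds(start + timedelta(days=1))


UTC_PLUS_8 = timezone(timedelta(hours=8))  # assumed local zone for this example
start_ts, end_ts = day_window(2024, 3, 18, UTC_PLUS_8)
print(start_ts, end_ts)  # prints: 1710691200 1710777600
```

Rejecting naive datetimes avoids silently using the machine's local timezone, which is a common source of off-by-hours search windows.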

2. Aggregation queries
Send the aggregation request as a JSON body. The aggregated field (level in this example) must be declared as a fast field (fast: true) in the doc mapping:

    curl -X POST "http://localhost:7280/api/v1/logs/search" \
        -H 'Content-Type: application/json' \
        -d '{
          "query": "*",
          "max_hits": 0,
          "aggs": {
            "log_levels": {
              "terms": { "field": "level", "size": 10 }
            }
          }
        }'

Example response:

    {
      "num_hits": 1000,
      "hits": [],
      "aggregations": {
        "log_levels": {
          "buckets": [
            {"key": "ERROR", "doc_count": 450},
            {"key": "WARN", "doc_count": 300},
            {"key": "INFO", "doc_count": 250}
          ]
        }
      }
    }
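
Once the response has been parsed, the buckets are ordinary dictionaries. As a small sketch of post-processing, the code below turns the example response shown above into per-level shares of the total; the helper name `bucket_shares` is illustrative and the numbers are the sample values from this guide, not real measurements.

```python
from typing import Dict

# The example aggregation response from this guide, as a Python dict.
RESPONSE = {
    "num_hits": 1000,
    "hits": [],
    "aggregations": {
        "log_levels": {
            "buckets": [
                {"key": "ERROR", "doc_count": 450},
                {"key": "WARN", "doc_count": 300},
                {"key": "INFO", "doc_count": 250},
            ]
        }
    },
}


def bucket_shares(response: dict, agg_name: str) -> Dict[str, float]:
    """Map each bucket key of a terms aggregation to its fraction of all bucketed docs."""
    buckets = response["aggregations"][agg_name]["buckets"]
    total = sum(bucket["doc_count"] for bucket in buckets)
    if total == 0:
        return {}
    return {bucket["key"]: bucket["doc_count"] / total for bucket in buckets}


shares = bucket_shares(RESPONSE, "log_levels")
for level, share in shares.items():
    print(f"{level}: {share:.0%}")
```

The shares are computed against the sum of the returned buckets rather than `num_hits`, because a terms aggregation limited by `size` may not return every distinct value.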

3. Multi-dimensional aggregations

    curl -X POST "http://localhost:7280/api/v1/logs/search" \
        -H 'Content-Type: application/json' \
        -d '{
          "query": "level:ERROR",
          "max_hits": 10,
          "aggs": {
            "by_service": {
              "terms": { "field": "service", "size": 5 },
              "aggs": {
                "by_hour": {
                  "date_histogram": { "field": "timestamp", "fixed_interval": "1h" }
                }
              }
            }
          }
        }'

4. Range queries

    # Numeric range (-g stops curl from treating [ ] as a URL glob pattern)
    curl -g "http://localhost:7280/api/v1/logs/search?query=response_time:[100+TO+500]"

    # Date range (bounds in RFC 3339 format)
    curl -g "http://localhost:7280/api/v1/logs/search?query=timestamp:[2024-01-01T00:00:00Z+TO+2024-12-31T23:59:59Z]"

5. Wildcards and regular expressions

    # Wildcard query
    curl "http://localhost:7280/api/v1/logs/search?query=message:data*"

    # Fuzzy match
    curl "http://localhost:7280/api/v1/logs/search?query=message:databse~1"

Search stream API
For large result sets, the search stream API is more efficient. It streams the values of a single fast field for every matching document rather than whole documents:

    # Stream the timestamp fast field of all matching documents as CSV
    curl "http://localhost:7280/api/v1/logs/search/stream?query=level:ERROR&fast_field=timestamp&output_format=csv"

Query optimization tips
1. Use tag fields to speed up queries
Declare tag fields in the index configuration:

    doc_mapping:
      field_mappings:
        - name: tenant_id
          type: u64
        - name: service
          type: text
          tokenizer: raw
      tag_fields: [tenant_id, service]

At query time Quickwit automatically skips splits that cannot contain a match:

    curl "http://localhost:7280/api/v1/logs/search?query=tenant_id:123+AND+error"

2. Time-based split pruning
Set timestamp_field to enable time-based pruning:

    doc_mapping:
      timestamp_field: timestamp

Adding a time range to a query can greatly reduce the amount of data scanned:

    curl "http://localhost:7280/api/v1/logs/search?query=error&start_timestamp=1710691200&end_timestamp=1710777600"
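
To make the idea of pruning concrete, here is a toy model of how tags and time ranges can rule out splits before any data is read. This is an illustration of the concept only, not Quickwit's implementation: the `Split` class, its field names, and the treatment of the end timestamp as exclusive are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Split:
    """Hypothetical per-split metadata; field names are illustrative only."""
    split_id: str
    tags: FrozenSet[str]  # tag values seen in the split, e.g. "tenant_id:123"
    min_ts: int           # smallest document timestamp (Unix seconds)
    max_ts: int           # largest document timestamp (Unix seconds)


def prune(splits: List[Split],
          required_tag: Optional[str] = None,
          start_ts: Optional[int] = None,
          end_ts: Optional[int] = None) -> List[Split]:
    """Keep only the splits that could contain a matching document."""
    kept = []
    for split in splits:
        if required_tag is not None and required_tag not in split.tags:
            continue  # tag pruning: the split never saw this tag value
        if start_ts is not None and split.max_ts < start_ts:
            continue  # time pruning: the split ends before the window starts
        if end_ts is not None and split.min_ts >= end_ts:
            continue  # time pruning: the split starts at or after the window end
        kept.append(split)
    return kept


SPLITS = [
    Split("a", frozenset({"tenant_id:123"}), 1710600000, 1710690000),
    Split("b", frozenset({"tenant_id:123"}), 1710691200, 1710700000),
    Split("c", frozenset({"tenant_id:456"}), 1710691200, 1710700000),
]

survivors = prune(SPLITS, required_tag="tenant_id:123",
                  start_ts=1710691200, end_ts=1710777600)
print([s.split_id for s in survivors])  # prints: ['b']
```

In this example, split a is dropped by the time window and split c by the tag filter, so only one of three splits would need to be opened.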

3. Partitioning strategy
For multi-tenant workloads, set a partition key:

    doc_mapping:
      partition_key: tenant_id
      max_num_partitions: 200

Or partition on a combination of fields:

    doc_mapping:
      partition_key: tenant_id,service
      max_num_partitions: 500

Distributed search cluster deployment
Distributed deployment on AWS S3
1. Preparation

    # Set the S3 path
    export S3_PATH=s3://my-bucket/quickwit/indexes

    # Configure AWS credentials
    export AWS_ACCESS_KEY_ID="your-access-key"
    export AWS_SECRET_ACCESS_KEY="your-secret-key"
    export AWS_REGION="us-east-1"

2. Configure the first node
Create config-node1.yaml:

    version: 0.7
    node_id: searcher-1
    listen_address: 0.0.0.0
    rest:
      listen_port: 7280

    metastore_uri: ${S3_PATH}
    default_index_root_uri: ${S3_PATH}

Start the node:

    ./quickwit run --config config-node1.yaml

3. Configure the other nodes
Node 2 configuration, config-node2.yaml:

    version: 0.7
    node_id: searcher-2
    listen_address: 0.0.0.0
    rest:
      listen_port: 7280

    metastore_uri: ${S3_PATH}
    default_index_root_uri: ${S3_PATH}

    # Connect to the first node
    peer_seeds:
      - <node1-ip>:7280

Node 3 configuration, config-node3.yaml:

    version: 0.7
    node_id: searcher-3
    listen_address: 0.0.0.0
    rest:
      listen_port: 7280

    metastore_uri: ${S3_PATH}
    default_index_root_uri: ${S3_PATH}

    peer_seeds:
      - <node1-ip>:7280

Start the other nodes:

    # On node 2
    ./quickwit run --service searcher --config config-node2.yaml

    # On node 3
    ./quickwit run --service searcher --config config-node3.yaml

4. Verify cluster state

    # List the cluster members
    curl "http://localhost:7280/api/v1/cluster"

Kubernetes deployment
Create quickwit-deployment.yaml:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: quickwit-config
    data:
      config.yaml: |
        version: 0.7
        node_id: ${POD_NAME}
        listen_address: 0.0.0.0

        metastore_uri: s3://my-bucket/quickwit/indexes
        default_index_root_uri: s3://my-bucket/quickwit/indexes

        peer_seeds:
          - quickwit-0.quickwit-headless:7280
          - quickwit-1.quickwit-headless:7280
          - quickwit-2.quickwit-headless:7280

    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: quickwit-headless
    spec:
      clusterIP: None
      selector:
        app: quickwit
      ports:
        - name: rest
          port: 7280
        - name: grpc
          port: 7281

    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: quickwit
    spec:
      type: LoadBalancer
      selector:
        app: quickwit
      ports:
        - name: rest
          port: 7280
          targetPort: 7280

    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: quickwit
    spec:
      serviceName: quickwit-headless
      replicas: 3
      selector:
        matchLabels:
          app: quickwit
      template:
        metadata:
          labels:
            app: quickwit
        spec:
          containers:
            - name: quickwit
              image: quickwit/quickwit:latest
              # --config points Quickwit at the file mounted from the ConfigMap
              command: ["quickwit", "run", "--service", "searcher", "--config", "/quickwit/config.yaml"]
              env:
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: access-key-id
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: secret-access-key
              ports:
                - containerPort: 7280
                  name: rest
                - containerPort: 7281
                  name: grpc
              volumeMounts:
                - name: config
                  mountPath: /quickwit/config.yaml
                  subPath: config.yaml
              resources:
                requests:
                  memory: "2Gi"
                  cpu: "1"
                limits:
                  memory: "4Gi"
                  cpu: "2"
          volumes:
            - name: config
              configMap:
                name: quickwit-config

    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: aws-credentials
    type: Opaque
    stringData:
      access-key-id: "your-access-key-id"
      secret-access-key: "your-secret-access-key"

Deploy to Kubernetes:

    # Apply the manifests
    kubectl apply -f quickwit-deployment.yaml

    # Check Pod status
    kubectl get pods -l app=quickwit

    # Follow the logs
    kubectl logs -f quickwit-0

    # Access the service
    kubectl port-forward svc/quickwit 7280:7280

Production best practices
1. Performance tuning
Index configuration tuning

    version: 0.7
    index_id: production-logs

    doc_mapping:
      field_mappings:
        - name: timestamp
          type: datetime
          fast: true
          fast_precision: seconds
        - name: level
          type: text
          tokenizer: raw
          fast: true  # fast field, used for aggregations
        - name: message
          type: text
          tokenizer: default
          record: position  # record positions, used for phrase queries
        - name: trace_id
          type: text
          tokenizer: raw
          indexed: true
          stored: true

      # Tag fields used for split pruning
      tag_fields: [service, environment, tenant_id]

      # Timestamp field used for time-based pruning
      timestamp_field: timestamp

      # Partitioning strategy
      partition_key: tenant_id
      max_num_partitions: 200

    indexing_settings:
      # Commit timeout
      commit_timeout_secs: 60

      # Target split size (number of documents, not bytes)
      split_num_docs_target: 10000000

      # Merge policy
      merge_policy:
        type: stable_log
        merge_factor: 10
        max_merge_factor: 12

    search_settings:
      default_search_fields: [message]

Node configuration tuning

    version: 0.7
    node_id: production-node

    # Listen settings
    listen_address: 0.0.0.0
    rest:
      listen_port: 7280

    # Storage locations
    metastore_uri: s3://production-bucket/quickwit/indexes
    default_index_root_uri: s3://production-bucket/quickwit/indexes

    # Indexer settings
    indexer:
      # CPU capacity (number of cores)
      cpu_capacity: 8
      # Maximum number of concurrent split uploads
      max_concurrent_split_uploads: 4

    # Searcher settings
    searcher:
      # Fast field cache size
      fast_field_cache_capacity: 10GB
      # Split footer (metadata) cache size
      split_footer_cache_capacity: 1GB
      # Partial request cache size
      partial_request_cache_capacity: 512MB
      # Maximum number of concurrent split streams
      max_num_concurrent_split_streams: 100

    # Storage settings
    storage:
      s3:
        region: us-east-1
        # Limit the number of concurrent requests
        max_concurrent_requests: 100
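
Cache capacities add up quickly, so it is worth checking them against the memory actually available to the node. The sketch below sums the three searcher cache settings from the configuration above and compares them with the 4Gi container limit used in this guide's Kubernetes manifest. It assumes decimal units (1 GB = 10^9 bytes) for the config values, which is an assumption about how the sizes are interpreted, and `parse_size` is a hypothetical helper.

```python
import re

# Decimal and binary unit multipliers; which family a given tool uses is an assumption.
_UNITS = {
    "B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40,
}


def parse_size(text: str) -> int:
    """Parse a human-readable size such as '10GB' or '512MB' into bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if match is None or match.group(2).upper() not in _UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2).upper()])


# Searcher cache settings from the node configuration in this guide.
SEARCHER_CACHES = {
    "fast_field_cache_capacity": "10GB",
    "split_footer_cache_capacity": "1GB",
    "partial_request_cache_capacity": "512MB",
}

total_cache_bytes = sum(parse_size(value) for value in SEARCHER_CACHES.values())
memory_limit_bytes = 4 * 2**30  # the 4Gi container limit from the StatefulSet

print(f"caches: {total_cache_bytes / 1e9:.3f} GB")
print(f"limit:  {memory_limit_bytes / 1e9:.3f} GB")
print("fits" if total_cache_bytes < memory_limit_bytes else "caches exceed the memory limit")
```

With these values the caches alone exceed the container limit, so either the cache capacities or the Pod's memory limit would need adjusting before combining the two examples.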
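
2. Monitoring and observability
Expose Prometheus metrics
Quickwit automatically exposes Prometheus metrics on the /metrics endpoint:

    curl http://localhost:7280/metrics

Grafana integration
Create the Prometheus data source configuration. The data source must point at a Prometheus server that scrapes Quickwit's /metrics endpoint, not at Quickwit itself:

    apiVersion: 1
    datasources:
      - name: Quickwit-Metrics
        type: prometheus
        access: proxy
        # Prometheus server scraping quickwit:7280/metrics (hostname is an assumption)
        url: http://prometheus:9090

Commonly monitored metrics:
- quickwit_indexing_throughput_bytes: indexing throughput
- quickwit_search_request_duration_seconds: search latency
- quickwit_split_count: number of splits
- quickwit_cache_hit_rate: cache hit rate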
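
For a quick check without a full Prometheus setup, the text returned by /metrics can be parsed directly. This is a minimal sketch of a parser for the simple `name value` lines of the Prometheus text format. It assumes samples carry no trailing timestamp, and the sample payload is made up for illustration using metric names from the list above, not captured from a real node.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition into {sample name (with labels): value}."""
    samples: Dict[str, float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        if not name:
            continue  # not a "name value" pair
        samples[name] = float(value)
    return samples


# Illustrative payload only; the values are invented.
SAMPLE_METRICS = """\
# HELP quickwit_split_count Number of splits
# TYPE quickwit_split_count gauge
quickwit_split_count 42
quickwit_cache_hit_rate 0.93
"""

metrics = parse_metrics(SAMPLE_METRICS)
print(metrics["quickwit_split_count"])
```

Splitting on the last space keeps label sets such as `{index="logs"}` attached to the sample name, so labelled series remain distinct keys.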

3. Security configuration
Enable TLS

    version: 0.7
    node_id: secure-node

    rest:
      listen_port: 7280
      tls:
        enabled: true
        cert_path: /path/to/cert.pem
        key_path: /path/to/key.pem

Use a reverse proxy (such as Nginx) for authentication:

    upstream quickwit {
        server quickwit-1:7280;
        server quickwit-2:7280;
        server quickwit-3:7280;
    }

    server {
        listen 443 ssl;
        server_name quickwit.example.com;

        ssl_certificate /path/to/cert.pem;
        ssl_certificate_key /path/to/key.pem;

        location / {
            auth_basic "Quickwit Access";
            auth_basic_user_file /etc/nginx/.htpasswd;

            proxy_pass http://quickwit;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
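
With basic authentication enabled on the proxy, clients must send an `Authorization` header with every request. A minimal sketch of building that header and attaching it to a search request, using the `quickwit.example.com` host from the Nginx example; the credentials are placeholders and the request is constructed but not sent.

```python
import base64
from typing import Dict
from urllib.request import Request


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build the HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials; in practice these match an entry in .htpasswd.
headers = basic_auth_header("user", "pass")

request = Request(
    "https://quickwit.example.com/api/v1/logs/search?query=error",
    headers=headers,
)
# urllib.request.urlopen(request) would send it through the proxy.
print(request.get_header("Authorization"))  # prints: Basic dXNlcjpwYXNz
```

Basic authentication only encodes the credentials, it does not encrypt them, which is why the proxy example terminates TLS on port 443.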
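
4. Backup and recovery
Back up the index configuration

    # Keep a copy of the config the index was created from
    # (index describe prints summary statistics, not a reusable config file)
    cp index-config.yaml logs-index-backup.yaml

    # Back up to Git
    git add logs-index-backup.yaml
    git commit -m "Backup index configuration"

Because the data lives in object storage, recovery is straightforward:

    # 1. Make sure the object storage data is intact
    # 2. Re-create the index (from the backed-up config)
    ./quickwit index create --index-config logs-index-backup.yaml

    # 3. Start Quickwit; it loads data from object storage automatically
    ./quickwit run --config config.yaml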

5. Cost optimization tips
Object storage cost optimization
Archive old data with a lifecycle policy (AWS S3 example):

    {
      "Rules": [
        {
          "Id": "ArchiveOldLogs",
          "Status": "Enabled",
          "Transitions": [
            { "Days": 30, "StorageClass": "STANDARD_IA" },
            { "Days": 90, "StorageClass": "GLACIER" }
          ]
        }
      ]
    }

Store fewer fields and use a more aggressive merge policy:

    # Store fewer fields
    doc_mapping:
      field_mappings:
        - name: message
          type: text
          indexed: true
          stored: false  # index only; do not store the original content

    # Use a more aggressive merge policy
    indexing_settings:
      merge_policy:
        type: stable_log
        merge_factor: 15  # larger merge factor

Case 1: Log analytics system
A complete log analytics deployment:

    # 1. Create the index
    cat > app-logs-config.yaml <<EOF
    version: 0.7
    index_id: app-logs
    doc_mapping:
      field_mappings:
        - name: timestamp
          type: datetime
          fast: true
        - name: level
          type: text
          tokenizer: raw
        - name: service
          type: text
          tokenizer: raw
        - name: message
          type: text
          tokenizer: default
        - name: trace_id
          type: text
          tokenizer: raw
      tag_fields: [service, level]
      timestamp_field: timestamp
    search_settings:
      default_search_fields: [message]
    EOF

    ./quickwit index create --index-config app-logs-config.yaml

    # 2. Collect logs with Filebeat
    cat > filebeat.yml <<EOF
    filebeat.inputs:
      - type: log
        paths:
          - /var/log/app/*.log
        json.keys_under_root: true

    # Note: stock Filebeat does not ship an http output;
    # this section assumes an HTTP output plugin is installed.
    output.http:
      hosts: ["http://quickwit:7280"]
      index: "app-logs"
      path: "/api/v1/%{[index]}/ingest"
    EOF

    # 3. Query error logs
    curl "http://localhost:7280/api/v1/app-logs/search?query=level:ERROR&start_timestamp=1710691200"

Case 2: Multi-tenant SaaS logs

    version: 0.7
    index_id: saas-logs

    doc_mapping:
      field_mappings:
        - name: tenant_id
          type: u64
        - name: timestamp
          type: datetime
          fast: true
        - name: event_type
          type: text
          tokenizer: raw
        - name: user_id
          type: text
          tokenizer: raw
        - name: data
          type: json  # flexible JSON field

      tag_fields: [tenant_id, event_type]
      timestamp_field: timestamp
      partition_key: tenant_id
      max_num_partitions: 1000

Query a specific tenant's data:

    curl "http://localhost:7280/api/v1/saas-logs/search?query=tenant_id:123+AND+event_type:purchase"
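
Q1: How do I choose a suitable split size?
A: The default of 10 million documents per split is a good starting point. Adjust it for your workload:
- High query rate: use smaller splits (5 million documents)
- High write throughput: use larger splits (20 million documents)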
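
To see what these targets mean in practice, the number of fully sized splits produced per day can be estimated from the daily document volume. This is a rough sketch: the 500 million documents per day figure is a made-up workload, and the estimate ignores partitioning and merges still in progress, both of which increase the real split count.

```python
def splits_per_day(docs_per_day: int, split_num_docs_target: int) -> int:
    """Estimate how many target-sized splits one day of documents fills (rounded up)."""
    if split_num_docs_target <= 0:
        raise ValueError("split_num_docs_target must be positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (docs_per_day + split_num_docs_target - 1) // split_num_docs_target


DOCS_PER_DAY = 500_000_000  # hypothetical workload

for target in (5_000_000, 10_000_000, 20_000_000):
    print(f"target {target:>10,}: about {splits_per_day(DOCS_PER_DAY, target)} splits/day")
```

For this workload the three targets give 100, 50, and 25 splits per day, which shows the trade-off: smaller targets mean more splits to track and open, larger targets mean fewer but heavier splits.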
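
Q2: What if object storage costs are too high?
A:
- Enable object storage lifecycle policies
- Reduce the number of stored fields
- Use a more aggressive merge policy
- Delete old indexes regularly

Q3: How do I improve poor search performance?
A:
- Make sure tag_fields and timestamp_field are configured
- Add more searcher nodes
- Tune the cache size settings
- Use more specific query conditions

Q4: How do I achieve high availability?
A:
- Deploy at least 3 searcher nodes
- Distribute requests with a load balancer
- Rely on the high availability that object storage itself provides
- Use Kubernetes to restart failed Pods automatically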
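
Quickwit is a powerful cloud-native search engine that is particularly well suited to the following scenarios:
✅ Use cases
- Large-scale log analysis and search
- Distributed tracing data storage
- Multi-tenant SaaS application logs
- Observability stacks where cost matters
✅ Core strengths
- Native object storage support at low cost
- Sub-second search performance
- Strong horizontal scalability
- Simple operations, with no storage cluster to manage
✅ Key features
- Full-text search and aggregations
- Split pruning by time and tags
- Distributed search clusters
- Integration with tools such as Grafana and Jaeger
After working through this guide, you should be able to:
- Deploy a Quickwit service quickly
- Configure object storage (S3/OSS) integration
- Run efficient full-text searches
- Build a production-grade distributed search cluster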