You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
10 KiB
10 KiB
OceanBase Vector Database Integration Guide
Overview
This document provides a comprehensive guide to the integration of OceanBase vector database in Coze Studio, including architectural design, implementation details, configuration instructions, and usage guidelines.
Integration Background
Why Choose OceanBase?
- Transaction Support: OceanBase provides complete ACID transaction support, ensuring data consistency
- Simple Deployment: Compared to specialized vector databases like Milvus, OceanBase deployment is simpler
- MySQL Compatibility: Compatible with MySQL protocol, low learning curve
- Vector Extensions: Native support for vector data types and indexing
- Operations Friendly: Low operational costs, suitable for small to medium-scale applications
Comparison with Milvus
Feature | OceanBase | Milvus |
---|---|---|
Deployment Complexity | Low (Single Machine) | High (Requires etcd, MinIO) |
Transaction Support | Full ACID | Limited |
Vector Search Speed | Medium | Faster |
Storage Efficiency | Medium | Higher |
Operational Cost | Low | High |
Learning Curve | Gentle | Steep |
Architectural Design
Overall Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Coze Studio │ │ OceanBase │ │ Vector Store │
│ Application │───▶│ Client │───▶│ Manager │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐
│ OceanBase │
│ Database │
└─────────────────┘
Core Components
1. OceanBase Client (backend/infra/impl/oceanbase/
)
Main Files:
oceanbase.go
- Delegation client, providing backward-compatible interfaceoceanbase_official.go
- Core implementation, based on official documentationtypes.go
- Type definitions
Core Functions:
type OceanBaseClient interface {
CreateCollection(ctx context.Context, collectionName string) error
InsertVectors(ctx context.Context, collectionName string, vectors []VectorResult) error
SearchVectors(ctx context.Context, collectionName string, queryVector []float64, topK int) ([]VectorResult, error)
DeleteVector(ctx context.Context, collectionName string, vectorID string) error
InitDatabase(ctx context.Context) error
DropCollection(ctx context.Context, collectionName string) error
}
2. Search Store Manager (backend/infra/impl/document/searchstore/oceanbase/
)
Main Files:
oceanbase_manager.go
- Manager implementationoceanbase_searchstore.go
- Search store implementationfactory.go
- Factory pattern creationconsts.go
- Constant definitionsconvert.go
- Data conversionregister.go
- Registration functions
Core Functions:
type Manager interface {
Create(ctx context.Context, collectionName string) (SearchStore, error)
Get(ctx context.Context, collectionName string) (SearchStore, error)
Delete(ctx context.Context, collectionName string) error
}
3. Application Layer Integration (backend/application/base/appinfra/
)
File: app_infra.go
Integration Point:
case "oceanbase":
// Build DSN
dsn := fmt.Sprintf("%s:%s@tcp(%s:%s)/%s?charset=utf8mb4&parseTime=True&loc=Local",
user, password, host, port, database)
// Create client
client, err := oceanbaseClient.NewOceanBaseClient(dsn)
// Initialize database
if err := client.InitDatabase(ctx); err != nil {
return nil, fmt.Errorf("init oceanbase database failed, err=%w", err)
}
Configuration Instructions
Environment Variable Configuration
Required Configuration
# Vector store type
VECTOR_STORE_TYPE=oceanbase
# OceanBase connection configuration
OCEANBASE_HOST=localhost
OCEANBASE_PORT=2881
OCEANBASE_USER=root
OCEANBASE_PASSWORD=coze123
OCEANBASE_DATABASE=test
Optional Configuration
# Performance optimization configuration
OCEANBASE_VECTOR_MEMORY_LIMIT_PERCENTAGE=30
OCEANBASE_BATCH_SIZE=100
OCEANBASE_MAX_OPEN_CONNS=100
OCEANBASE_MAX_IDLE_CONNS=10
# Cache configuration
OCEANBASE_ENABLE_CACHE=true
OCEANBASE_CACHE_TTL=300
# Monitoring configuration
OCEANBASE_ENABLE_METRICS=true
OCEANBASE_ENABLE_SLOW_QUERY_LOG=true
# Retry configuration
OCEANBASE_MAX_RETRIES=3
OCEANBASE_RETRY_DELAY=1
OCEANBASE_CONN_TIMEOUT=30
Docker Configuration
docker-compose-oceanbase.yml
oceanbase:
image: oceanbase/oceanbase-ce:latest
container_name: coze-oceanbase
environment:
MODE: SLIM
OB_DATAFILE_SIZE: 1G
OB_SYS_PASSWORD: ${OCEANBASE_PASSWORD:-coze123}
OB_TENANT_PASSWORD: ${OCEANBASE_PASSWORD:-coze123}
ports:
- '2881:2881'
volumes:
- ./data/oceanbase/ob:/root/ob
- ./data/oceanbase/cluster:/root/.obd/cluster
deploy:
resources:
limits:
memory: 4G
reservations:
memory: 2G
Usage Guide
1. Quick Start
# Clone the project
git clone https://github.com/coze-dev/coze-studio.git
cd coze-studio
# Setup OceanBase environment
make oceanbase_env
# Start OceanBase debug environment
make oceanbase_debug
2. Verify Deployment
# Check container status
docker ps | grep oceanbase
# Test connection
mysql -h localhost -P 2881 -u root -p -e "SELECT 1;"
# View databases
mysql -h localhost -P 2881 -u root -p -e "SHOW DATABASES;"
3. Create Knowledge Base
In the Coze Studio interface:
- Enter knowledge base management
- Select OceanBase as vector storage
- Upload documents for vectorization
- Test vector retrieval functionality
4. Performance Monitoring
# View container resource usage
docker stats coze-oceanbase
# View slow query logs
docker logs coze-oceanbase | grep "slow query"
# View connection count
mysql -h localhost -P 2881 -u root -p -e "SHOW PROCESSLIST;"
Integration Features
1. Design Principles
Architecture Compatibility Design
- Strictly follow Coze Studio core architectural design principles, ensuring seamless integration of OceanBase adaptation layer with existing systems
- Adopt delegation pattern (Delegation Pattern) to achieve backward compatibility, ensuring stability and consistency of existing interfaces
- Maintain complete compatibility with existing vector storage interfaces, ensuring smooth system migration and upgrade
Performance First
- Use HNSW index to achieve efficient approximate nearest neighbor search
- Batch operations reduce database interaction frequency
- Connection pool management optimizes resource usage
Easy Deployment
- Single machine deployment, no complex cluster configuration required
- Docker one-click deployment
- Environment variable configuration, flexible and easy to use
2. Technical Highlights
Delegation Pattern Design
type OceanBaseClient struct {
official *OceanBaseOfficialClient
}
func (c *OceanBaseClient) CreateCollection(ctx context.Context, collectionName string) error {
return c.official.CreateCollection(ctx, collectionName)
}
Intelligent Configuration Management
func DefaultConfig() *Config {
return &Config{
Host: getEnv("OCEANBASE_HOST", "localhost"),
Port: getEnvAsInt("OCEANBASE_PORT", 2881),
User: getEnv("OCEANBASE_USER", "root"),
Password: getEnv("OCEANBASE_PASSWORD", ""),
Database: getEnv("OCEANBASE_DATABASE", "test"),
// ... other configurations
}
}
Error Handling Optimization
func (c *OceanBaseOfficialClient) setVectorParameters() error {
params := map[string]string{
"ob_vector_memory_limit_percentage": "30",
"ob_query_timeout": "86400000000",
"max_allowed_packet": "1073741824",
}
for param, value := range params {
if err := c.db.Exec(fmt.Sprintf("SET GLOBAL %s = %s", param, value)).Error; err != nil {
log.Printf("Warning: Failed to set %s: %v", param, err)
}
}
return nil
}
Troubleshooting
1. Common Issues
Connection Issues
# Check container status
docker ps | grep oceanbase
# Check port mapping
docker port coze-oceanbase
# Test connection
mysql -h localhost -P 2881 -u root -p -e "SELECT 1;"
Vector Index Issues
-- Check index status
SHOW INDEX FROM test_vectors;
-- Rebuild index
DROP INDEX idx_test_embedding ON test_vectors;
CREATE VECTOR INDEX idx_test_embedding ON test_vectors(embedding)
WITH (distance=cosine, type=hnsw, lib=vsag, m=16, ef_construction=200, ef_search=64);
Performance Issues
-- Adjust memory limit
SET GLOBAL ob_vector_memory_limit_percentage = 50;
-- View slow queries
SHOW VARIABLES LIKE 'slow_query_log';
2. Log Analysis
# View OceanBase logs
docker logs coze-oceanbase
# View application logs
tail -f logs/coze-studio.log | grep -i "oceanbase\|vector"
Summary
The integration of OceanBase vector database in Coze Studio has achieved the following goals:
- Complete Functionality: Supports complete vector storage and retrieval functionality
- Good Performance: Achieves efficient vector search through HNSW indexing
- Simple Deployment: Single machine deployment, no complex configuration required
- Operations Friendly: Low operational costs, easy monitoring and management
- Strong Scalability: Supports horizontal and vertical scaling
Through this integration, Coze Studio provides users with a simple, efficient, and reliable vector database solution, particularly suitable for scenarios requiring transaction support, simple deployment, and low operational costs.