# OceanBase Vector Database Integration Guide

## Overview

This document is a comprehensive guide to the OceanBase vector database integration in Coze Studio. It covers the architectural design, implementation details, configuration, and usage.

## Integration Background

### Why Choose OceanBase?

1. **Transaction Support**: OceanBase provides full ACID transaction support, ensuring data consistency.
2. **Simple Deployment**: OceanBase is simpler to deploy than specialized vector databases such as Milvus.
3. **MySQL Compatibility**: OceanBase is compatible with the MySQL protocol, so the learning curve is low.
4. **Vector Extensions**: OceanBase natively supports vector data types and vector indexing.
5. **Operations-Friendly**: Operational costs are low, making it suitable for small- to medium-scale applications.

### Comparison with Milvus

| Feature                   | OceanBase            | Milvus                      |
| ------------------------- | -------------------- | --------------------------- |
| **Deployment Complexity** | Low (Single Machine) | High (Requires etcd, MinIO) |
| **Transaction Support**   | Full ACID            | Limited                     |
| **Vector Search Speed**   | Medium               | Faster                      |
| **Storage Efficiency**    | Medium               | Higher                      |
| **Operational Cost**      | Low                  | High                        |
| **Learning Curve**        | Gentle               | Steep                       |

## Architectural Design

### Overall Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Coze Studio   │    │    OceanBase    │    │  Vector Store   │
│   Application   │───▶│     Client      │───▶│     Manager     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │    OceanBase    │
                       │    Database     │
                       └─────────────────┘
```

### Core Components

#### 1. OceanBase Client (`backend/infra/impl/oceanbase/`)

**Main Files**:

- `oceanbase.go` - Delegating client that provides a backward-compatible interface
- `oceanbase_official.go` - Core implementation, based on the official documentation
- `types.go` - Type definitions

**Core Functions**:

```go
type OceanBaseClient interface {
	CreateCollection(ctx context.Context, collectionName string) error
	InsertVectors(ctx context.Context, collectionName string, vectors []VectorResult) error
	SearchVectors(ctx context.Context, collectionName string, queryVector []float64, topK int) ([]VectorResult, error)
	DeleteVector(ctx context.Context, collectionName string, vectorID string) error
	InitDatabase(ctx context.Context) error
	DropCollection(ctx context.Context, collectionName string) error
}
```

#### 2. Search Store Manager (`backend/infra/impl/document/searchstore/oceanbase/`)

**Main Files**:

- `oceanbase_manager.go` - Manager implementation
- `oceanbase_searchstore.go` - Search store implementation
- `factory.go` - Factory-pattern construction
- `consts.go` - Constant definitions
- `convert.go` - Data conversion
- `register.go` - Registration functions

**Core Functions**:

```go
type Manager interface {
	Create(ctx context.Context, collectionName string) (SearchStore, error)
	Get(ctx context.Context, collectionName string) (SearchStore, error)
	Delete(ctx context.Context, collectionName string) error
}
```

#### 3. Application Layer Integration (`backend/application/base/appinfra/`)

**File**: `app_infra.go`

**Integration Point**:

```go
case "oceanbase":
	// Build a MySQL-protocol DSN from the connection settings
	dsn := fmt.Sprintf("%s:%s@tcp(%s:%s)/%s?charset=utf8mb4&parseTime=True&loc=Local",
		user, password, host, port, database)

	// Create the client and fail fast if it cannot be constructed
	client, err := oceanbaseClient.NewOceanBaseClient(dsn)
	if err != nil {
		return nil, fmt.Errorf("create oceanbase client failed, err=%w", err)
	}

	// Initialize the database
	if err := client.InitDatabase(ctx); err != nil {
		return nil, fmt.Errorf("init oceanbase database failed, err=%w", err)
	}
```

## Configuration Instructions

### Environment Variable Configuration

#### Required Configuration

```bash
# Vector store type
VECTOR_STORE_TYPE=oceanbase

# OceanBase connection configuration
OCEANBASE_HOST=localhost
OCEANBASE_PORT=2881
OCEANBASE_USER=root
OCEANBASE_PASSWORD=coze123
OCEANBASE_DATABASE=test
```

#### Optional Configuration

```bash
# Performance tuning
OCEANBASE_VECTOR_MEMORY_LIMIT_PERCENTAGE=30
OCEANBASE_BATCH_SIZE=100
OCEANBASE_MAX_OPEN_CONNS=100
OCEANBASE_MAX_IDLE_CONNS=10

# Cache configuration
OCEANBASE_ENABLE_CACHE=true
OCEANBASE_CACHE_TTL=300

# Monitoring configuration
OCEANBASE_ENABLE_METRICS=true
OCEANBASE_ENABLE_SLOW_QUERY_LOG=true

# Retry configuration
OCEANBASE_MAX_RETRIES=3
OCEANBASE_RETRY_DELAY=1
OCEANBASE_CONN_TIMEOUT=30
```

### Docker Configuration

#### docker-compose-oceanbase.yml

```yaml
oceanbase:
  image: oceanbase/oceanbase-ce:latest
  container_name: coze-oceanbase
  environment:
    MODE: SLIM
    OB_DATAFILE_SIZE: 1G
    OB_SYS_PASSWORD: ${OCEANBASE_PASSWORD:-coze123}
    OB_TENANT_PASSWORD: ${OCEANBASE_PASSWORD:-coze123}
  ports:
    - '2881:2881'
  volumes:
    - ./data/oceanbase/ob:/root/ob
    - ./data/oceanbase/cluster:/root/.obd/cluster
  deploy:
    resources:
      limits:
        memory: 4G
      reservations:
        memory: 2G
```

## Usage Guide

### 1. Quick Start

```bash
# Clone the project
git clone https://github.com/coze-dev/coze-studio.git
cd coze-studio

# Set up the OceanBase environment
make oceanbase_env

# Start the OceanBase debug environment
make oceanbase_debug
```

### 2. Verify Deployment

```bash
# Check container status
docker ps | grep oceanbase

# Test the connection
mysql -h localhost -P 2881 -u root -p -e "SELECT 1;"

# List databases
mysql -h localhost -P 2881 -u root -p -e "SHOW DATABASES;"
```

### 3. Create a Knowledge Base

In the Coze Studio interface:

1. Open knowledge base management.
2. Select OceanBase as the vector store.
3. Upload documents for vectorization.
4. Test vector retrieval.

### 4. Performance Monitoring

```bash
# View container resource usage
docker stats coze-oceanbase

# View slow query logs
docker logs coze-oceanbase | grep "slow query"

# List active connections
mysql -h localhost -P 2881 -u root -p -e "SHOW PROCESSLIST;"
```

## Helm Deployment Guide (Kubernetes)

### 1. Environment Preparation

Make sure the following tools are installed:

- Kubernetes cluster (recommended: k3s or kind)
- Helm 3.x
- kubectl

### 2. Install Dependencies

#### Install cert-manager

```bash
# Add the cert-manager Helm repository (optional: the install step below applies the static manifest with kubectl)
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.2/cert-manager.yaml

# Wait for cert-manager to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=cert-manager -n cert-manager --timeout=300s
```

#### Install ob-operator

```bash
# Add the ob-operator Helm repository
helm repo add ob-operator https://oceanbase.github.io/ob-operator/
helm repo update

# Install ob-operator
helm install ob-operator ob-operator/ob-operator \
  --set reporter=cozeAi \
  --namespace=oceanbase-system \
  --create-namespace

# Wait for ob-operator to be ready
kubectl wait --for=condition=ready pod -l control-plane=controller-manager -n oceanbase-system --timeout=300s
```

### 3. Deploy OceanBase

#### Using the Integrated Helm Chart

```bash
# Deploy the complete Coze Studio application (including OceanBase)
helm install coze-studio helm/charts/opencoze \
  --set oceanbase.enabled=true \
  --namespace coze-studio \
  --create-namespace

# Or deploy only the OceanBase component
helm install oceanbase-only helm/charts/opencoze \
  --set oceanbase.enabled=true \
  --set mysql.enabled=false \
  --set redis.enabled=false \
  --set minio.enabled=false \
  --set elasticsearch.enabled=false \
  --set milvus.enabled=false \
  --set rocketmq.enabled=false \
  --namespace oceanbase \
  --create-namespace
```

#### Custom Configuration

Create an `oceanbase-values.yaml` file:

```yaml
oceanbase:
  enabled: true
  port: 2881
  targetPort: 2881
  clusterName: 'cozeAi'
  clusterId: 1
  image:
    repository: oceanbase/oceanbase-ce
    tag: 'latest'
  obAgentVersion: '4.2.2-100000042024011120'
  monitorEnabled: true
  storageClass: ''
  observerConfig:
    resource:
      cpu: 2
      memory: 8Gi
    storages:
      dataStorage: 10G
      redoLogStorage: 5G
      logStorage: 5G
  monitorResource:
    cpu: 100m
    memory: 256Mi
  generateUserSecrets: true
  userSecrets:
    root: 'coze123'
    monitor: 'coze123'
    operator: 'coze123'
    proxyro: 'coze123'
  topology:
    - zone: zone1
      replica: 1
  parameters:
    - name: system_memory
      value: '4G'
    - name: '__min_full_resource_pool_memory'
      value: '4294967296'
  annotations: {}
  backupVolumeEnabled: false
```

Deploy with the custom configuration:

```bash
helm install oceanbase-custom helm/charts/opencoze \
  -f oceanbase-values.yaml \
  --namespace oceanbase \
  --create-namespace
```

### 4. Verify Deployment

```bash
# Check OBCluster status
kubectl get obcluster -n oceanbase

# Check OceanBase pods
kubectl get pods -n oceanbase

# Check services
kubectl get svc -n oceanbase

# View detailed status
kubectl describe obcluster -n oceanbase
```

### 5. Connection Testing

#### Port Forwarding

```bash
# Forward the OceanBase port
kubectl port-forward svc/oceanbase-service -n oceanbase 2881:2881
```

#### Connecting with obclient

```bash
# Connect from within the cluster
kubectl exec -it deployment/oceanbase-obcluster-zone1 -n oceanbase -- obclient -h127.0.0.1 -P2881 -uroot@test -pcoze123 -Dtest

# Connect from outside the cluster (requires port forwarding)
obclient -h127.0.0.1 -P2881 -uroot@test -pcoze123 -Dtest
```

#### Connecting with the MySQL Client

```bash
# Connect with the MySQL client (requires port forwarding)
mysql -h127.0.0.1 -P2881 -uroot@test -pcoze123 -Dtest
```

### 6. Monitoring and Management

#### View Logs

```bash
# View OceanBase logs
kubectl logs -f deployment/oceanbase-obcluster-zone1 -n oceanbase

# View ob-operator logs
kubectl logs -f deployment/oceanbase-controller-manager -n oceanbase-system
```

#### Scaling

```bash
# Scale the replica count
kubectl patch obcluster oceanbase-obcluster -n oceanbase --type='merge' -p='{"spec":{"topology":[{"zone":"zone1","replica":2}]}}'

# Adjust the resource configuration
kubectl patch obcluster oceanbase-obcluster -n oceanbase --type='merge' -p='{"spec":{"observer":{"resource":{"cpu":4,"memory":"16Gi"}}}}'
```

#### Backup and Recovery

```bash
# Create a backup
kubectl apply -f - <