Google BigQuery Vector Search

Google Cloud BigQuery Vector Search lets you run semantic search with GoogleSQL, using vector indexes for fast approximate results or brute force for exact results.

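This tutorial shows how to work with an end-to-end data and embedding management system in LangChain, and how to run scalable semantic search in BigQuery with the BigQueryVectorStore class. This class is one of a set of two classes that provide a unified data store and flexible vector search on Google Cloud: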

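BigQuery Vector Search: uses the BigQueryVectorStore class. Well suited to rapid prototyping with no infrastructure setup, and to batch retrieval.
Feature Store Online Store: uses the VertexFSVectorStore class. Supports low-latency retrieval with manual or scheduled data sync, and is well suited to production-ready, user-facing GenAI applications.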

Getting started

Install the library

pip install -qU  langchain langchain-google-vertexai "langchain-google-community[featurestore]"

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. Running the cell below restarts the current kernel.

import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

Before you begin

Set your project ID

If you don't know your project ID, try the following:

Run gcloud config list.
Run gcloud projects list.
See the support page: Locate the project ID.

PROJECT_ID = ""  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}
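
If you would rather not hard-code the project ID in the notebook, one option is to fall back to an environment variable. The sketch below is only an illustration: the helper name resolve_project_id is hypothetical, and it assumes the project ID has been exported as GOOGLE_CLOUD_PROJECT, a variable that Google Cloud tooling commonly consults.

```python
import os


def resolve_project_id(explicit: str = "") -> str:
    """Return the explicit project ID, or fall back to GOOGLE_CLOUD_PROJECT.

    Hypothetical helper for illustration; not part of LangChain or gcloud.
    """
    project_id = explicit.strip() or os.environ.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project_id:
        raise ValueError(
            "No project ID found: set PROJECT_ID or export GOOGLE_CLOUD_PROJECT."
        )
    return project_id
```

Calling resolve_project_id(PROJECT_ID) right after the assignment makes an empty PROJECT_ID fail early with a clear message rather than later inside a BigQuery call.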

Set the region

You can also change the REGION variable used by BigQuery. Learn more about BigQuery regions.

REGION = "us-central1"  # @param {type: "string"}

Set the dataset and table names

The dataset and table named here will back your BigQuery vector store.

DATASET = "my_langchain_dataset"  # @param {type: "string"}
TABLE = "doc_and_vectors"  # @param {type: "string"}

Authenticate your notebook environment

If you are running this notebook in Colab, uncomment the cell below and continue.
If you are using Vertex AI Workbench, see the setup instructions here.

# from google.colab import auth as google_auth

# google_auth.authenticate_user()

Demo: BigQueryVectorStore

Create an embedding class instance

You may need to enable the Vertex AI API in your project: gcloud services enable aiplatform.googleapis.com --project {PROJECT_ID} (replace {PROJECT_ID} with your project ID). You can use any LangChain embeddings model.

from langchain_google_vertexai import VertexAIEmbeddings

embedding = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

Initialize BigQueryVectorStore

The BigQuery dataset and table are created automatically if they do not exist. See the class definition for all optional parameters.

from langchain_google_community import BigQueryVectorStore

store = BigQueryVectorStore(
    project_id=PROJECT_ID,
    dataset_name=DATASET,
    table_name=TABLE,
    location=REGION,
    embedding=embedding,
)

Add texts

all_texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
metadatas = [{"len": len(t)} for t in all_texts]

store.add_texts(all_texts, metadatas=metadatas)

Search for documents

query = "I'd like a fruit."
docs = store.similarity_search(query)
print(docs)

Search for documents by vector

query_vector = embedding.embed_query(query)
docs = store.similarity_search_by_vector(query_vector, k=2)
print(docs)
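
To make the idea of an exact (brute-force) vector search concrete, the sketch below scores every stored vector against the query and keeps the k best. It is a local illustration only, not how BigQuery executes the search: the function names and the toy 2-D vectors are made up, and cosine similarity is used purely as an example, since the distance measure actually applied depends on how the store is configured.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(
    query: Sequence[float], vectors: List[Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Score every stored vector against the query and keep the k best.

    Returns (index, similarity) pairs, most similar first.
    """
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings (made-up values).
stored = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
top = brute_force_top_k([1.0, 0.1], stored, k=2)
print([index for index, _ in top])
```

Because every vector is scored, the cost grows linearly with the number of stored rows, which is the trade-off a vector index avoids by returning approximate results.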

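Search for documents with a metadata filter

The vector store supports two ways of applying filters to metadata fields when searching for documents:

Dictionary-based filters
- You can pass a dictionary (dict) in which each key is a metadata field and each value is the filter condition. This applies an equality filter between each key and its value. When several key-value pairs are given, they are combined with a logical AND.
SQL-based filters
- For more complex filtering conditions, you can pass a string representing a SQL WHERE clause. It can use SQL expressions, including comparison and logical operators, which makes it more flexible. Learn more about BigQuery operators.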

# Dictionary-based Filters
# This should only return "Banana" document.
docs = store.similarity_search_by_vector(query_vector, filter={"len": 6})
print(docs)

# SQL-based Filters
# This should return "Banana", "Apples and oranges" and "Cars and airplanes" documents.
# OR is required here: no single value can be both equal to 6 and greater than 17.
docs = store.similarity_search_by_vector(query_vector, filter="len = 6 OR len > 17")
print(docs)
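
The semantics of the two filter styles can be checked locally, without touching BigQuery. The sketch below reuses the sample texts from this tutorial and evaluates the filters in plain Python; the helper names matches_dict_filter and select are hypothetical, and the lambdas are hand-written stand-ins for SQL conditions such as len = 6 OR len > 17, not something the vector store runs.

```python
from typing import Any, Callable, Dict, List

all_texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
metadatas = [{"len": len(t)} for t in all_texts]


def matches_dict_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Equality on every key of the filter, combined with logical AND."""
    return all(metadata.get(key) == value for key, value in flt.items())


def select(predicate: Callable[[Dict[str, Any]], bool]) -> List[str]:
    """Texts whose metadata satisfies the predicate, in insertion order."""
    return [t for t, m in zip(all_texts, metadatas) if predicate(m)]


# Dictionary-style filter: {"len": 6}
dict_hits = select(lambda m: matches_dict_filter(m, {"len": 6}))

# Local equivalent of the SQL condition: len = 6 OR len > 17
or_hits = select(lambda m: m["len"] == 6 or m["len"] > 17)

# The same two conditions joined with AND can never both hold.
and_hits = select(lambda m: m["len"] == 6 and m["len"] > 17)

print(dict_hits, or_hits, and_hits)
```

Only "Banana" has length 6, the two 18-character texts satisfy len > 17, and the AND variant matches nothing, which is why the choice of logical operator matters in SQL-based filters.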

Batch search

BigQueryVectorStore provides a batch_search method for scalable vector similarity search.

results = store.batch_search(
    embeddings=None,  # pass precomputed query embeddings here, or
    queries=["search_query", "search_query"],  # pass query texts instead
)

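Add texts with embeddings

You can also store embeddings that you generated yourself by using the add_texts_with_embeddings method. This is particularly useful for multimodal data that needs custom preprocessing before embeddings are generated.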

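items = ["some text"]
# Compute the embeddings yourself, one vector per item
embs = embedding.embed(items)

# Store the same items alongside their precomputed embeddings
ids = store.add_texts_with_embeddings(
    texts=items, embs=embs, metadatas=[{"len": len(t)} for t in items]
)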
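
When you supply your own embeddings, the texts, vectors, and metadata must line up one-to-one. The sketch below is a hypothetical pre-flight check you might run before the call; check_aligned is not part of BigQueryVectorStore, and whether the store performs similar validation itself is not assumed here.

```python
from typing import Any, Dict, Optional, Sequence


def check_aligned(
    texts: Sequence[str],
    embs: Sequence[Sequence[float]],
    metadatas: Optional[Sequence[Dict[str, Any]]] = None,
) -> int:
    """Validate that inputs line up one-to-one; return the embedding dimension.

    Hypothetical helper for illustration; not part of BigQueryVectorStore.
    """
    if len(texts) != len(embs):
        raise ValueError(f"{len(texts)} texts but {len(embs)} embeddings")
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError(f"{len(texts)} texts but {len(metadatas)} metadata dicts")
    dims = {len(e) for e in embs}
    if len(dims) != 1:
        # Also triggers for an empty input, where no dimension can be inferred.
        raise ValueError(f"embeddings must share one dimension, got {sorted(dims)}")
    return dims.pop()
```

A mismatch caught locally is cheaper to diagnose than a failed or partially applied insert job.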

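Low-latency serving with Feature Store

You can obtain a VertexFSVectorStore object by calling the .to_vertex_fs_vector_store() method, which provides low-latency retrieval for online use cases. All required parameters are carried over automatically from the existing BigQueryVectorStore instance. See the class definition for the other parameters you can use. Moving back to BigQueryVectorStore is just as easy with the .to_bq_vector_store() method.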

store.to_vertex_fs_vector_store()  # pass optional VertexFSVectorStore parameters as arguments
