Google Cloud Storage는 비구조화된 데이터를 저장하기 위한 관리형 서비스입니다.
이 문서는 Google Cloud Storage (GCS) directory (bucket)에서 document object를 로드하는 방법을 다룹니다.
pip install -qU  langchain-google-community[gcs]
from langchain_google_community import GCSDirectoryLoader
loader = GCSDirectoryLoader(project_name="aist", bucket="testing-hwc")
loader.load()
/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpz37njh7u/fake.docx'}, lookup_index=0)]

prefix 지정하기

더 세밀하게 로드할 파일을 제어하기 위해 prefix를 지정할 수도 있습니다 -특정 폴더의 모든 파일을 로드하는 것을 포함하여-.
loader = GCSDirectoryLoader(project_name="aist", bucket="testing-hwc", prefix="fake")
loader.load()
/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpylg6291i/fake.docx'}, lookup_index=0)]

단일 파일 로드 실패 시 계속 진행하기

GCS bucket의 파일들은 처리 중에 오류를 발생시킬 수 있습니다. continue_on_failure=True 인자를 활성화하면 자동으로 실패를 처리할 수 있습니다. 이는 단일 파일 처리 실패가 함수를 중단시키지 않고 대신 warning을 로그에 기록한다는 것을 의미합니다.
loader = GCSDirectoryLoader(
    project_name="aist", bucket="testing-hwc", continue_on_failure=True
)
loader.load()

Connect these docs programmatically to Claude, VSCode, and more via MCP for real-time answers.
I