Azure AI Data

Azure AI Foundry (formerly Azure AI Studio) lets you upload data assets to cloud storage and register existing data assets from the following sources:

Microsoft OneLake

Azure Blob Storage

Azure Data Lake gen 2

The advantage of this approach over AzureBlobStorageContainerLoader and AzureBlobStorageFileLoader is that authentication to cloud storage is handled seamlessly. You can use either identity-based data access control or credential-based access (for example, a SAS token or an account key). With credential-based data access, you do not need to specify secrets in your code or set up a key vault; the system handles that for you. This notebook covers how to load document objects from a data asset in AI Studio.

pip install -qU azureml-fsspec azure-ai-generative

from azure.ai.resources.client import AIClient
from azure.identity import DefaultAzureCredential
from langchain_community.document_loaders import AzureAIDataLoader

# Create a connection to your project
client = AIClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription_id>",
    resource_group_name="<resource_group_name>",
    project_name="<project_name>",
)

# get the latest version of your data asset
data_asset = client.data.get(name="<data_asset_name>", label="latest")

# load the data asset
loader = AzureAIDataLoader(url=data_asset.path)

loader.load()

[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpaa9xl6ch/fake.docx'}, lookup_index=0)]
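Each returned document records the file it came from in `metadata['source']`, as the output above shows. As a quick sanity check on what was loaded, the sketch below tallies documents by source-file extension. It is an illustration under assumptions, not part of the loader's API: `count_by_extension` is a hypothetical helper, the sample paths are made up, and `SimpleNamespace` objects stand in for the `Document` objects that `loader.load()` would return so the snippet runs without Azure access.

```python
from collections import Counter
from pathlib import PurePosixPath
from types import SimpleNamespace


def count_by_extension(docs):
    """Count documents per source-file extension, read from metadata['source']."""
    counts = Counter()
    for doc in docs:
        source = doc.metadata.get("source", "")
        # Files without an extension are grouped under "<none>".
        counts[PurePosixPath(source).suffix.lower() or "<none>"] += 1
    return counts


# Hypothetical stand-ins for the objects returned by loader.load();
# only the attributes used above (page_content, metadata) are modeled.
docs = [
    SimpleNamespace(page_content="Lorem ipsum", metadata={"source": "/tmp/example/a.pdf"}),
    SimpleNamespace(page_content="Dolor sit", metadata={"source": "/tmp/example/b.PDF"}),
    SimpleNamespace(page_content="Amet", metadata={"source": "/tmp/example/fake.docx"}),
]

counts = count_by_extension(docs)
print(dict(counts))
```

With real data, passing the result of `loader.load()` to the same helper should work as long as each document exposes a `metadata` dictionary with a `source` key.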

Specifying a glob pattern

You can also specify a glob pattern for finer-grained control over which files to load. In the example below, only files with a pdf extension are loaded.

loader = AzureAIDataLoader(url=data_asset.path, glob="*.pdf")

loader.load()

[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpujbkzf_l/fake.docx'}, lookup_index=0)]
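To get a feel for what a given pattern selects before pointing it at a data asset, you can try it against a local directory. The sketch below uses Python's standard `pathlib` globbing on a temporary folder. It assumes the loader's remote file system follows broadly similar glob rules, which may not hold in every detail; `matching_files` and the file names are hypothetical and used only for illustration.

```python
import tempfile
from pathlib import Path


def matching_files(root, pattern):
    """Return sorted POSIX-style paths, relative to root, of files matching pattern."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reports").mkdir()
    # Create a few empty placeholder files to match against.
    for name in ["a.pdf", "b.docx", "reports/c.pdf"]:
        (root / name).write_bytes(b"")

    # "*.pdf" matches only files directly under the root.
    top_level = matching_files(root, "*.pdf")
    # "**/*.pdf" also descends into subdirectories.
    recursive = matching_files(root, "**/*.pdf")

print(top_level)
print(recursive)
```

In this local sketch, `*.pdf` picks up only `a.pdf`, while `**/*.pdf` also picks up `reports/c.pdf`. If a data asset contains nested folders, it is worth confirming which of the two behaviors you get from the loader.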
