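Polaris AI DataInsight is a document parser that extracts document elements (text, images, complex tables, charts, and more) from a variety of file formats into structured JSON, so they can be easily integrated into RAG systems.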
Installation
Install the langchain-polaris-ai-datainsight package.
pip install langchain-polaris-ai-datainsight
Environment Setup
Set the following environment variable:
POLARIS_AI_DATA_INSIGHT_API_KEY: Your Polaris AI DataInsight API key. To obtain an API key, see the Polaris AI DataInsight Documentation.
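In a non-interactive setting (a script or a CI job), it can help to fail early when the variable is missing rather than at load time. The following is a minimal sketch; the helper name `require_api_key` is hypothetical and not part of the langchain-polaris-ai-datainsight package. It only reads the environment variable named above.

```python
import os

# Name of the environment variable described above.
ENV_VAR = "POLARIS_AI_DATA_INSIGHT_API_KEY"


def require_api_key(env=os.environ):
    """Return the API key from `env`, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of the package.
    """
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{ENV_VAR} is not set; export it before creating the loader."
        )
    return key
```

Calling `require_api_key()` once at startup gives a readable error message before any document is sent for parsing.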
Usage
import getpass
import os

os.environ["POLARIS_AI_DATA_INSIGHT_API_KEY"] = getpass.getpass(
    "Enter your PolarisAIDataInsight API key: "
)
from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader

loader = PolarisAIDataInsightLoader(
    file_path="example_data/polaris_ai_example.docx",
    resources_dir="example_data/tmp",
    mode="page",  # "element", "page", or "single". (default is "single")
)

docs = loader.load()  # or loader.lazy_load()

# Print the content and metadata of the first three documents.
for doc in docs[:3]:
    print(" --------- < Page Content > --------- ")
    print(doc.page_content)
    print(" --------- < Metadata > --------- ")
    print(doc.metadata)
    print("\n")
--------- < Page Content > ---------
2025 Seed Program Application
I. Funding Information by Track
1. Beginning and Advanced Track Comparison Overview
<table><tbody><tr><td>Category</td><td>Beginning Track*</td><td>Advanced Track*</td></tr><tr><td>Funding target</td><td>A university located outside Korea that has a Central Grant Management Department, an existing Korean Studies infrastructure, and plans to establish an education foundation.</td><td>A non-Korean university with a Central Grant Management Department, at least one full-time Korean Studies faculty member, an undergraduate Korean Studies major or department, and commitment to supporting Korean Studies.</td></tr><tr><td>Funding period</td><td>3 years</td><td>5 years<3+2years></td></tr><tr><td>Funding size</td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 200 million</td></tr><tr><td>B</td><td>Up to KRW 50 million</td></tr></tbody></table></td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 150 million</td></tr><tr><td>B</td><td>Up to KRW 90 million</td></tr></tbody></table></td></tr><tr><td>Required project content</td><td>· Fund 2 or more scholarship students<br>· Offer 1 or more regular Korean Studies lecture courses (Excluding Korean language courses)<br>· Hold 1 or more workshops per year in which that students may participate</td><td>· Hire 1 or more Korean Studies full-time faculty<br>· Fund 1 or more scholarship student for Korean Studies<br>· Offer 2 or more regular graduate-level Korean Studies lecture courses (Excluding Korean language courses)<br>· Hold 1 or more international Korean Studies conference<br>· Establish and manage a website, blog, or social media relating to the program </td></tr><tr><td>Recommended content</td><td>· Foster talent (education)<br>· Establish a Korean Studies research institute/center<br>· Establish Korean Studies undergraduate department/major & 
program<br>· Develop Korean Studies textbooks<br>· Hold academic activities</td><td>· Foster talent (education)<br>· Establish a Korean Studies research institute/center<br>· Establish Korean Studies M.A/Ph.D. department/major & program<br>· Develop Korean Studies textbooks<br>· Hold academic activities</td></tr></tbody></table>
<img id="di.image.im12" data-category="image"/>
2 / 3
--------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te0': {'id': 'di.text.te0', 'type': 'text'}, 'di.text.te2': {'id': 'di.text.te2', 'type': 'text'}, 'di.table.ta9': {'id': 'di.table.ta9', 'type': 'table'}, 'di.image.im12': {'id': 'di.image.im12', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image12.png'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
--------- < Page Content > ---------
2025 Seed Program Application
II. Review and Selection
1. Review Process
<img id="di.image.im13" data-category="image"/>
Review of whether the basic requirements for application have been met
Review of the Project Proposal
Admistered by the Expert Review Team
Final review and decision
Admistered by the Comprehensive Review Committee
1. Preliminary Review
2. Content Review (80 pts)
3. Comprehensive Review (20 pts)
2. Review Stages and Content
Stage 1: Preliminary Review
Conducted by Main Department
● Verifies document submission, eligibility, and overlapping support.
● Applications missing required documents, signatures, or failing to meet eligibility do not proceed.
● Applications with Indirect Expenses over 10% of Direct Expenses (including Labor Expenses) are rejected.
Stage 2: Content Review
Conducted by Expert Review Team
● Online review: Points given individually
● Panel review: Points determined by consensus
● Assesses leadership potential, capacity, and project plans.
● Items and scores assigned for evaluation.
<table><tbody><tr><td>Areas</td><td>Items (Points)</td><td>Content</td></tr></tbody></table>
2 / 3
--------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te10': {'id': 'di.text.te10', 'type': 'text'}, 'di.text.te12': {'id': 'di.text.te12', 'type': 'text'}, 'di.image.im13': {'id': 'di.image.im13', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image13.png'}, 'di.text.sh15': {'id': 'di.text.sh15', 'type': 'text'}, 'di.text.sh16': {'id': 'di.text.sh16', 'type': 'text'}, 'di.text.sh16te0': {'id': 'di.text.sh16te0', 'type': 'text'}, 'di.text.sh17': {'id': 'di.text.sh17', 'type': 'text'}, 'di.text.sh18': {'id': 'di.text.sh18', 'type': 'text'}, 'di.text.sh19': {'id': 'di.text.sh19', 'type': 'text'}, 'di.text.sh19te0': {'id': 'di.text.sh19te0', 'type': 'text'}, 'di.text.sh19te1': {'id': 'di.text.sh19te1', 'type': 'text'}, 'di.text.sh20': {'id': 'di.text.sh20', 'type': 'text'}, 'di.text.sh21': {'id': 'di.text.sh21', 'type': 'text'}, 'di.text.sh22': {'id': 'di.text.sh22', 'type': 'text'}, 'di.text.sh22te0': {'id': 'di.text.sh22te0', 'type': 'text'}, 'di.text.sh22te1': {'id': 'di.text.sh22te1', 'type': 'text'}, 'di.text.sh23': {'id': 'di.text.sh23', 'type': 'text'}, 'di.text.sh23te0': {'id': 'di.text.sh23te0', 'type': 'text'}, 'di.text.sh24': {'id': 'di.text.sh24', 'type': 'text'}, 'di.text.sh24te0': {'id': 'di.text.sh24te0', 'type': 'text'}, 'di.text.sh25': {'id': 'di.text.sh25', 'type': 'text'}, 'di.text.sh25te0': {'id': 'di.text.sh25te0', 'type': 'text'}, 'di.text.te15': {'id': 'di.text.te15', 'type': 'text'}, 'di.text.te16': {'id': 'di.text.te16', 'type': 'text'}, 'di.text.te17': {'id': 'di.text.te17', 'type': 'text'}, 'di.text.te18': {'id': 'di.text.te18', 'type': 'text'}, 'di.text.te19': {'id': 'di.text.te19', 'type': 'text'}, 'di.text.te20': {'id': 'di.text.te20', 'type': 'text'}, 'di.text.te21': {'id': 'di.text.te21', 'type': 'text'}, 'di.text.te22': {'id': 'di.text.te22', 'type': 'text'}, 'di.text.te23': {'id': 'di.text.te23', 'type': 
'text'}, 'di.text.te24': {'id': 'di.text.te24', 'type': 'text'}, 'di.text.te25': {'id': 'di.text.te25', 'type': 'text'}, 'di.text.te26': {'id': 'di.text.te26', 'type': 'text'}, 'di.table.ta26': {'id': 'di.table.ta26', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
--------- < Page Content > ---------
2025 Seed Program Application
<table><tbody><tr><td rowspan="3">Evaluation of the Basis for the Project (40)</td><td>Potential to lead Korean Studies (20)</td><td>- Assess whether the university has a distinguished reputation in terms of history and academic disciplines.<br>- Evaluate the strength of the network between the Project Director and local researchers.</td></tr><tr><td>Performance capacity (20)<br>Eligibility criteria (10)</td><td>- Determine if the project director possesses the skills and commitment to execute the project (e.g., Korean language proficiency, influence within the institution, management skills).<br>- Review the achievements of collaborative researchers in Korean Studies.<br>- Confirm whether personnel (Beginning/Advanced) or coursework (Advanced) meet eligibility criteria.</td></tr><tr><td>University support (10)</td><td>- Measure the institution's willingness to support Korean Studies (financial, spatial, and human resources, appropriate indirect expense ratio).<br>- Assess the competency of the Central Grant Management Department.</td></tr><tr><td rowspan="2">Evaluation of the Project Content (40)</td><td>Project plans (30)</td><td>- Ensure that the project objectives are realistic and well-defined.<br>- Verify that the plan aligns with local conditions.<br>- Review the suitability of the Project Team’s structure.<br>- Assess whether the budget plan reflects local price levels.</td></tr></tbody></table>
2 / 3
--------- < Metadata > ---------
{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.table.ta29': {'id': 'di.table.ta29', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}
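In the page-mode output above, each metadata dictionary maps an element id to an entry with `id` and `type`, and image entries also carry a `src` path; the page content refers to images through `<img id="...">` placeholders. The sketch below shows one way to summarize element types and match image placeholders to their files. It assumes the metadata layout seen in this output (other modes may differ), the helper names are hypothetical, and the sample data is a shortened stand-in rather than real loader output.

```python
import re
from collections import Counter

# Matches placeholders such as <img id="di.image.im12" data-category="image"/>.
IMG_TAG = re.compile(r'<img\s+id="([^"]+)"')


def count_element_types(metadata):
    """Count metadata entries by their 'type' field (text, table, image, ...)."""
    return Counter(
        entry.get("type", "unknown")
        for entry in metadata.values()
        if isinstance(entry, dict)
    )


def resolve_images(page_content, metadata):
    """Pair each <img> placeholder id in the content with its 'src' path.

    The path is None when the id has no metadata entry or no 'src' key.
    """
    pairs = []
    for element_id in IMG_TAG.findall(page_content):
        entry = metadata.get(element_id)
        src = entry.get("src") if isinstance(entry, dict) else None
        pairs.append((element_id, src))
    return pairs


# Shortened, hypothetical sample mirroring the shape of the output above.
sample_content = (
    "2025 Seed Program Application\n"
    '<img id="di.image.im12" data-category="image"/>\n'
)
sample_metadata = {
    "di.text.he2te0": {"id": "di.text.he2te0", "type": "text"},
    "di.table.ta9": {"id": "di.table.ta9", "type": "table"},
    "di.image.im12": {
        "id": "di.image.im12",
        "type": "image",
        "src": "example_data/tmp/sample_image12.png",  # hypothetical path
    },
}

print(count_element_types(sample_metadata))
print(resolve_images(sample_content, sample_metadata))
```

With real loader output, the same helpers could be called as `resolve_images(doc.page_content, doc.metadata)` for each document in `docs`.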