Advanced RAG with Llama 3 in Langchain | Chat with PDF using Free Embeddings, Reranker & LlamaParse
inspirit941
2024. 5. 21. 18:12
https://youtu.be/HkG06wBbTPM?si=-UFRBpyWJ_tZMohJ
RAG Architecture
Knowledge Base
- Take the PDF text and parse it -> split it into chunks
- Convert the chunks into embedding vectors -> store them in a vector DB
- Find the documents most similar to the user query
Reranker
- Use a reranker for pairwise ranking: filter out irrelevant docs and sort the rest
LLM with Custom Prompt
- Generate the answer with prompt engineering (a rough sketch of the full flow is below)
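A minimal sketch of how the three stages connect, assuming the retriever, reranker (compressor), prompt, and LLM that are built step by step later in this post. The function name and wiring here are illustrative only, not from the video:
# Illustrative glue code only; the real objects (Qdrant retriever, FlashrankRerank
# compressor, PromptTemplate, ChatGroq llm) are created in the sections below.
def rag_answer(query, retriever, compressor, prompt, llm):
    candidates = retriever.invoke(query)                         # 1. knowledge base: vector similarity search
    relevant = compressor.compress_documents(candidates, query)  # 2. reranker: drop irrelevant docs, keep the best
    context = "\n\n".join(doc.page_content for doc in relevant)
    return llm.invoke(prompt.format(context=context, question=query))  # 3. LLM with custom prompt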
Notes on the example code
PDF used for testing
- A mix of text and tables.
https://docs.llamaindex.ai/en/stable/module_guides/loading/connector/llama_parse/
- LangChain also supports a variety of PDF parsers, but LlamaParse performed best at parsing complex PDFs, so this walkthrough uses LlamaParse.
- https://news.hada.io/topic?id=13466
- You need an API key from https://cloud.llamaindex.ai. It is currently free.
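The setup cell is not shown in the video, so here is a hedged sketch of the dependencies and keys the code below appears to need. The exact pip package names and the LLAMA_CLOUD_API_KEY / GROQ_API_KEY environment variables are my assumptions, not from the original notebook:
# Assumed setup cell (not shown in this post); package names may differ by version.
#   pip install llama-parse langchain langchain-community langchain-groq \
#       qdrant-client fastembed flashrank unstructured markdown
import os

# In Colab, the code below reads the LlamaParse key from userdata.get("LLAMA_PARSE");
# outside Colab, LlamaParse and ChatGroq can also pick their keys up from env vars.
os.environ.setdefault("LLAMA_CLOUD_API_KEY", "<llamaindex-cloud-api-key>")
os.environ.setdefault("GROQ_API_KEY", "<groq-api-key>")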
Parse Document with LlamaParse
from pathlib import Path

from google.colab import userdata
from IPython.display import Markdown
from llama_parse import LlamaParse

## Prompt that tells LlamaParse how to parse and extract the text.
instruction = """The provided document is Meta First Quarter 2024 Results.
This form provides detailed financial information about the company's performance for a specific quarter.
It includes unaudited financial statements, management discussion and analysis, and other relevant disclosures required by the SEC.
It contains many tables.
Try to be precise while answering the questions"""

parser = LlamaParse(
    api_key=userdata.get("LLAMA_PARSE"),
    result_type="markdown",  # respond in markdown
    parsing_instruction=instruction,
    max_timeout=5000,
)

llama_parse_documents = await parser.aload_data("./data/meta-earnings.pdf")  # async task
parsed_doc = llama_parse_documents[0]

Markdown(parsed_doc.text[:4096])  ## renders as Markdown in Jupyter

document_path = Path("data/parsed_document.md")
with document_path.open("a") as f:
    f.write(parsed_doc.text)
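One caveat with the cell above: document_path.open("a") appends, so re-running the notebook will duplicate the parsed text. A small guard like the one below (my addition, not in the video) also avoids re-calling the LlamaParse API when the markdown file is already on disk:
# Only hit the LlamaParse API when there is no cached markdown file yet.
if not document_path.exists():
    llama_parse_documents = await parser.aload_data("./data/meta-earnings.pdf")
    document_path.write_text(llama_parse_documents[0].text)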
Vector Embedding
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.vectorstores import Qdrant

# Load the parsed markdown with the markdown loader
loader = UnstructuredMarkdownLoader(document_path)
loaded_documents = loader.load()

# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=128)
docs = text_splitter.split_documents(loaded_documents)
len(docs)

# English-specific embedding model; a multilingual version of this model also exists.
# Lightweight (about 220MB) and very fast.
embeddings = FastEmbedEmbeddings(model_name="BAAI/bge-base-en-v1.5")

# Vector DB. Open source, with a cloud-hosted version available as well.
qdrant = Qdrant.from_documents(
    docs,
    embeddings,
    # location=":memory:",
    path="./db",  # persist locally
    collection_name="document_embeddings",
)
%%time
query = "What is the most important innovation from Meta?"
similar_docs = qdrant.similarity_search_with_score(query)
## wall time 470ms. Similarity search is fast.

for doc, score in similar_docs:
    print(f"text: {doc.page_content[:256]}\n")
    print(f"score: {score}")
    print("-" * 80)
    print()
## Each hit also comes with its similarity score.

%%time
retriever = qdrant.as_retriever(search_kwargs={"k": 5})
retrieved_docs = retriever.invoke(query)
## Wrap the vector store as a LangChain retriever interface. wall time 570ms
Reranker
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank

# FlashRank: open-source library that provides several reranking models.
# The chosen model is only ~22MB, so it is very small and fast.
compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")

# Wrap the base retriever so its results are reranked before being returned.
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

%%time
reranked_docs = compression_retriever.invoke(query)
len(reranked_docs)  # returns 3
# wall time: 3.28s

for doc in reranked_docs:
    print(f"id: {doc.metadata['_id']}\n")
    print(f"text: {doc.page_content[:256]}\n")
    print(f"score: {doc.metadata['relevance_score']}")
    print("-" * 80)
    print()
# The reranked results likewise carry metadata, including a relevance score.
Q&A over Documents
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_groq import ChatGroq

## Use the Llama 3 model served by Groq.
## The prompt also includes an instruction against hallucination. Better than nothing.
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192")

prompt_template = """
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: {context}
Question: {question}
Answer the question and provide additional helpful information,
based on the pieces of information, if applicable. Be succinct.
Responses should be properly formatted to be easily read.
"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": True},
)
%%time
response = qa.invoke("What is the most significant innovation from Meta?")
# wall time: 7s

# Same chain, but with verbose output turned off.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": False},
)

response = qa.invoke("What is the revenue for 2024 and % change?")
print_response(response)  # print the response
Markdown(response["result"])  # render the answer as Markdown
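Because the chain was built with return_source_documents=True, the response dict also carries the reranked chunks that were fed to the model, so the answer can be traced back to the PDF. A quick way to inspect them (my addition, not shown in the video):
# The chain returns the retrieved chunks under "source_documents";
# useful for checking which parts of the PDF the answer is based on.
for doc in response["source_documents"]:
    print(f"score: {doc.metadata.get('relevance_score')}")
    print(doc.page_content[:200])
    print("-" * 80)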