Examples
🚧 Docs under construction 🚧
Below are some examples for checking and evaluating different chains.
📄️ String Evaluators
A string evaluator is a component in LangChain designed to assess the performance of a language model by comparing its generated output (the prediction) to a reference string or to the input. This comparison is a crucial step in evaluating language models, providing a measure of the accuracy or quality of the generated text.
📄️ Examples
🚧 Docs under construction 🚧
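The comparison these evaluators perform can be illustrated with a minimal sketch in plain Python (this mirrors the idea only, not the LangChain API; the `exact_match` and `token_f1` helper names are hypothetical):

```python
def exact_match(prediction: str, reference: str) -> bool:
    """Strict string comparison after basic normalization."""
    return prediction.strip().lower() == reference.strip().lower()

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1: a softer measure of similarity between
    a prediction and a reference string (duplicates ignored)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = set(pred_tokens) & set(ref_tokens)
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))                            # True
print(token_f1("the capital is Paris", "Paris is the capital"))  # 1.0
```

Real string evaluators typically go beyond lexical overlap (e.g. embedding distance or an LLM judging the comparison), but the input/prediction/reference contract is the same.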
📄️ Agent Benchmarking: Search + Calculator
Here we go over how to benchmark performance of an agent on tasks where it has access to a calculator and a search tool.
📄️ Agent VectorDB Question Answering Benchmarking
Here we go over how to benchmark performance on a question answering task using an agent to route between multiple vectordatabases.
📄️ Benchmarking Template
This is an example notebook that can be used to create a benchmarking notebook for a task of your choice. Evaluation is really hard, so we greatly welcome any contributions that make it easier for people to experiment.
📄️ Data Augmented Question Answering
This notebook uses some generic prompts/language models to evaluate a question answering system that uses other sources of data besides what is in the model. For example, this can be used to evaluate a question answering system over your proprietary data.
📄️ Generic Agent Evaluation
Good evaluation is key for quickly iterating on your agent's prompts and tools. Here we provide an example of how to use the TrajectoryEvalChain to evaluate your agent.
📄️ Using Hugging Face Datasets
This example shows how to use Hugging Face datasets to evaluate models. Specifically, we show how to load examples to evaluate models on from the Hugging Face datasets package.
📄️ LLM Math
Evaluating chains that know how to do math.
📄️ Evaluating an OpenAPI Chain
This notebook goes over ways to semantically evaluate an OpenAPI Chain, which calls an endpoint defined by the OpenAPI specification using purely natural language.
📄️ Question Answering Benchmarking: Paul Graham Essay
Here we go over how to benchmark performance on a question answering task over a Paul Graham essay.
📄️ Question Answering Benchmarking: State of the Union Address
Here we go over how to benchmark performance on a question answering task over a State of the Union address.
📄️ QA Generation
This notebook shows how to use the QAGenerationChain to come up with question-answer pairs over a specific document.
📄️ Question Answering
This notebook covers how to evaluate generic question answering problems. This is a situation where you have an example containing a question and its corresponding ground truth answer, and you want to measure how well the language model does at answering those questions.
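A rough sketch of this dataset-level grading, using a hand-rolled normalized exact-match grader (real evaluations often use an LLM as the grader instead; the `grade_answers` helper is an illustrative name, not a library function):

```python
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def grade_answers(examples: list[dict], predictions: list[str]) -> float:
    """Fraction of predictions that match the ground-truth answer."""
    correct = sum(
        normalize(pred) == normalize(ex["answer"])
        for ex, pred in zip(examples, predictions)
    )
    return correct / len(examples)

dataset = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]
print(grade_answers(dataset, ["4", "paris."]))  # 1.0
```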
📄️ SQL Question Answering Benchmarking: Chinook
Here we go over how to benchmark performance on a question answering task over a SQL database.
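The core of such a benchmark — running the reference SQL and checking the model's answer against the result — can be sketched with the standard library (an in-memory toy table stands in for the Chinook database; the table, columns, and `check_answer` helper are illustrative):

```python
import sqlite3

# Toy stand-in for a real database such as Chinook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ada", "Canada"), ("Bob", "Canada"), ("Cy", "Brazil")],
)

def check_answer(reference_sql: str, predicted_answer: str) -> bool:
    """Run the reference query and compare its single-value result
    to the model's predicted answer, ignoring case and whitespace."""
    (truth,) = conn.execute(reference_sql).fetchone()
    return str(truth).strip().lower() == predicted_answer.strip().lower()

# Question: "How many employees are based in Canada?"
print(check_answer(
    "SELECT COUNT(*) FROM employees WHERE country = 'Canada'", "2"
))  # True
```

A full benchmark would loop this check over a dataset of question/SQL pairs and report aggregate accuracy.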