
## How is continuous-eval different?

- **Modularized Evaluation**: Measure each module in the pipeline with tailored metrics.

- **Comprehensive Metric Library**: Covers Retrieval-Augmented Generation (RAG), Code Generation, Agent Tool Use, Classification and a variety of other LLM use cases. Mix and match Deterministic, Semantic and LLM-based metrics; see the sketch below.

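For illustration, here is a minimal sketch of scoring a single RAG datum with one retrieval metric and one generation metric side by side. The class names follow the library's metric modules, but the datum field names and the exact call signature are assumptions that may differ between versions:

```python
from continuous_eval.metrics.retrieval import PrecisionRecallF1
from continuous_eval.metrics.generation.text import DeterministicAnswerCorrectness

# One evaluation datum for a simple RAG pipeline
# (field names are assumptions and may vary by version).
datum = {
    "question": "What is the capital of France?",
    "retrieved_context": [
        "Paris is the capital and most populous city of France.",
        "Lyon is a major city in southeastern France.",
    ],
    "ground_truth_context": ["Paris is the capital of France."],
    "answer": "Paris",
    "ground_truths": ["Paris is the capital of France."],
}

# Score the retriever and the generator independently.
retrieval_metric = PrecisionRecallF1()
generation_metric = DeterministicAnswerCorrectness()
print(retrieval_metric(**datum))   # precision / recall / F1 over retrieved chunks
print(generation_metric(**datum))  # deterministic answer-correctness scores
```

Because each module is scored on its own, a regression in retrieval quality is visible even when the final answer happens to be correct.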
## Resources

- **Relari Blog:** Useful articles on how to evaluate LLM applications [link](https://www.relari.ai/blog)
- **Discord:** Join our community of LLM developers [Discord](https://discord.gg/GJnM8SRsHr)
- **Reach out to founders:** [Email](mailto:founders@relari.ai) or [Schedule a chat](https://cal.com/pasquale/continuous-eval)

--------------------------------------------------------------------------------
/docs/src/components/ThemeSelect.astro:
--------------------------------------------------------------------------------
---
import type { Props } from '@astrojs/starlight/props';
import Default from '@astrojs/starlight/components/ThemeSelect.astro';
import Select from '@astrojs/starlight/components/Select.astro';
import { defaultVersion, versions } from '../content/config.ts';

// Detect a versioned path such as /v0.3/...; otherwise fall back to the default version.
const url = new URL(Astro.url);
const currentVersion = /^\/v\d\.\d+/.test(url.pathname)
  ? url.pathname.split('/')[1]
  : defaultVersion;

// Build the version-picker options, marking the current version as selected.
const options = versions.map(([version, label]) => ({
  label,
  selected: currentVersion === version,
  value: `/${version}`,
}));
---

| Match Type | Component | Retrieved component is considered relevant if: |
|---|---|---|
| `ExactChunkMatch()` | Chunk | Exact match to a Ground Truth Context Chunk. |
| `ExactSentenceMatch()` | Sentence | Exact match to a Ground Truth Context Sentence. |
| `RougeChunkMatch()` | Chunk | Match to a Ground Truth Context Chunk with ROUGE-L Recall > `ROUGE_CHUNK_MATCH_THRESHOLD` (default 0.7). |
| `RougeSentenceMatch()` | Sentence | Match to a Ground Truth Context Sentence with ROUGE-L Recall > `ROUGE_SENTENCE_MATCH_THRESHOLD` (default 0.8). |
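
To use one of these match types, pass it to a retrieval metric when constructing it. A minimal sketch; the `matching_strategy` module path and the `threshold` keyword are assumptions based on the class names in the table above and may differ in your version:

```python
from continuous_eval.metrics.retrieval import PrecisionRecallF1
# Module path is an assumption; the class names come from the table above.
from continuous_eval.metrics.retrieval.matching_strategy import (
    ExactChunkMatch,
    RougeChunkMatch,
)

# Strict: a retrieved chunk counts only if it is identical to a ground-truth chunk.
strict_metric = PrecisionRecallF1(ExactChunkMatch())

# Fuzzy: a retrieved chunk counts if its ROUGE-L recall against a ground-truth
# chunk exceeds the threshold (the `threshold` keyword is an assumption; default 0.7).
fuzzy_metric = PrecisionRecallF1(RougeChunkMatch(threshold=0.7))
```

Exact matching is the stricter choice and suits fixed-chunk pipelines; the ROUGE-based matchers tolerate re-chunking and minor text differences between the retrieved and ground-truth contexts.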