Knowledge Base

1. Knowledge Management

Each bot comes with a dedicated knowledge base by default and can also be linked to multiple public knowledge bases from the workspace.
Knowledge is configured in exactly the same way in dedicated and public knowledge bases.

1.1 Knowledge List

Three types of knowledge can be imported: Files, Q&A, and Text blocks. The knowledge list page shows details for each knowledge item, including segment and character count, file size, updated by, update time, status, audit status, and actions. Use the controls in the bottom-right corner to navigate between pages and to set the number of knowledge items displayed per page.

By adding new documents, hierarchical categorization can be achieved, enabling structured management of knowledge content.

Add Category: Enter category name, description (up to 150 words), and reference questions (up to 1000 entries, supporting batch import). A maximum of 10 levels of categorization is supported.
Reference Questions: Questions that trigger this node, used for precise retrieval in Tree Retrieval. They can only be added after the Tree Retrieval function is enabled in the knowledge base settings.
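The category limits above can be checked before submitting a category tree. The following is a minimal sketch, not the product's implementation: the `Category` class and `validate_category` function are hypothetical names, and counting the description limit by whitespace-separated words is an assumption.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 10                   # maximum levels of categorization
MAX_DESCRIPTION_WORDS = 150      # description limit, counted in words (assumed)
MAX_REFERENCE_QUESTIONS = 1000   # reference questions per category


@dataclass
class Category:
    """Hypothetical in-memory model of a category node."""
    name: str
    description: str = ""
    reference_questions: list[str] = field(default_factory=list)
    children: list["Category"] = field(default_factory=list)


def validate_category(cat: Category, depth: int = 1) -> list[str]:
    """Return a list of problems found in a category and its descendants."""
    if depth > MAX_DEPTH:
        # Report the first offending level once and stop descending.
        return [f"{cat.name}: deeper than {MAX_DEPTH} levels"]
    problems = []
    if not cat.name.strip():
        problems.append(f"level {depth}: category name is empty")
    if len(cat.description.split()) > MAX_DESCRIPTION_WORDS:
        problems.append(f"{cat.name}: description exceeds {MAX_DESCRIPTION_WORDS} words")
    if len(cat.reference_questions) > MAX_REFERENCE_QUESTIONS:
        problems.append(f"{cat.name}: more than {MAX_REFERENCE_QUESTIONS} reference questions")
    for child in cat.children:
        problems.extend(validate_category(child, depth + 1))
    return problems
```

An empty result means the tree respects all three limits.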

Click the clock icon next to the “knowledge list” to view the operation history and perform a restore operation.

1.1.1 File

The File type covers three import modes: files, compressed packages, and webpages. Choose the import mode that matches your existing knowledge.

Files support uploading and importing PDF, DOC, DOCX, TXT, HTML, PPT, PPTX, MD, XLS, XLSX, CSV, PNG, JPEG, and JPG formats. A maximum of 100 files can be imported, and a single file must not exceed 20MB. If manual segmentation is required, you can download the template in advance (manual segmentation only supports the .TXT format). After upload and parsing, documents, spreadsheets, and images are previewed in segments before the final import.

Document Type Segmentation Preview
- Smart Segmentation: Uses an algorithm to automatically detect and parse webpage content, achieving the optimal segmentation effect. The segment length can be specified, with the recommended length being 300-600 characters. The default length is 300 characters.
- Fixed-length Segmentation: After uploading the file, it will be segmented based on the default length of 500 characters.
- Manual Segmentation: Before uploading the file, manual delimiters “-%@#&*&#@%-” can be added for segmentation. This option only supports .TXT format files, and a template can be downloaded.
Table Type Segmentation Preview
- Smart Table Content Segmentation: Intelligently segments based on table content, suitable for non-standardized tables. Only supports .xlsx and .xls formats. The default segment length is 5000 characters.
- Segment by Row: Segments each row into a separate knowledge block using the "header + content" format.
Image Type Segmentation Preview
- Image OCR Recognition: OCR (Optical Character Recognition) technology is used to recognize the text information in images, assisting with image retrieval.
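To make the segmentation modes above more concrete, here is a rough sketch of fixed-length, manual-delimiter, and row-based segmentation. The function names are hypothetical, and the exact "header + content" layout produced by the product is not documented here, so the `header: value` layout below is an assumption.

```python
MANUAL_DELIMITER = "-%@#&*&#@%-"  # delimiter used for manual segmentation


def split_fixed(text: str, length: int = 500) -> list[str]:
    """Fixed-length segmentation: cut the text every `length` characters."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [text[i:i + length] for i in range(0, len(text), length)]


def split_manual(text: str, delimiter: str = MANUAL_DELIMITER) -> list[str]:
    """Manual segmentation: split a .TXT body on the manual delimiter."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]


def rows_to_blocks(header: list[str], rows: list[list[str]]) -> list[str]:
    """Segment by row: one block per row, pairing each header with its cell.

    The "header: value; ..." layout is an assumed rendering.
    """
    return ["; ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in rows]
```

For example, a 1,200-character document split with the default fixed length yields segments of 500, 500, and 200 characters.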

Compressed packages can be extracted and uploaded online, but only one zip file is supported for upload, with a size limit of 1GB. The size of a single file cannot exceed 20MB, and files larger than this cannot be uploaded after extraction. The total folder hierarchy should not exceed 10 levels, and nested compression is not supported. The supported file formats within the compressed package are consistent with those of the file type.
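The package limits above can be pre-checked locally before upload. This is only a sketch under stated assumptions: `check_archive` is a hypothetical helper, the list of suffixes treated as nested archives is assumed, and folder depth is counted as the number of folders above each file.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_ARCHIVE_BYTES = 1024 ** 3        # 1GB limit for the zip file
MAX_FILE_BYTES = 20 * 1024 ** 2      # 20MB limit per extracted file
MAX_FOLDER_DEPTH = 10                # folder hierarchy limit
ALLOWED = {".pdf", ".doc", ".docx", ".txt", ".html", ".ppt", ".pptx",
           ".md", ".xls", ".xlsx", ".csv", ".png", ".jpeg", ".jpg"}
NESTED_ARCHIVES = {".zip", ".rar", ".7z", ".tar", ".gz"}  # assumed list


def check_archive(data: bytes) -> tuple[list[str], dict[str, str]]:
    """Return (accepted file names, {rejected file name: reason})."""
    if len(data) > MAX_ARCHIVE_BYTES:
        raise ValueError("archive exceeds 1GB")
    accepted, rejected = [], {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            suffix = path.suffix.lower()
            if suffix in NESTED_ARCHIVES:
                rejected[info.filename] = "nested compression is not supported"
            elif suffix not in ALLOWED:
                rejected[info.filename] = "unsupported format"
            elif info.file_size > MAX_FILE_BYTES:
                rejected[info.filename] = "file larger than 20MB"
            elif len(path.parts) - 1 > MAX_FOLDER_DEPTH:
                rejected[info.filename] = "folder hierarchy deeper than 10 levels"
            else:
                accepted.append(info.filename)
    return accepted, rejected
```

`info.file_size` is the uncompressed size, which matches the rule that oversized files cannot be uploaded after extraction.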

Select and segment the files in the compressed package by type before importing.

Webpages support URLs starting with http or https, and can be imported via two methods: individual URLs or sitemap URLs. By entering a page address, the system can automatically crawl and extract the text content for import. Enabling the "deep parsing" option allows the system to parse subpages and retrieve their content as well.

Individual URLs: You can add up to 10 website addresses at a time, separated by line breaks. Enabling the deep parsing option allows the system to parse subpages and retrieve their content.
Sitemap URLs: Allows retrieval of content from up to the first 20 pages listed in the website's sitemap.
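The individual-URL rules (http or https only, up to 10 addresses, one per line) can be sketched as a small input check. `parse_url_list` is a hypothetical helper name; the actual crawler behavior is not shown.

```python
from urllib.parse import urlparse

MAX_URLS = 10  # individual URLs accepted per import


def parse_url_list(raw: str) -> list[str]:
    """Split line-separated input into URLs and validate each one."""
    urls = [line.strip() for line in raw.splitlines() if line.strip()]
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs can be added at a time")
    for url in urls:
        parsed = urlparse(url)
        # Only http and https addresses with a host part are accepted.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"unsupported URL: {url}")
    return urls
```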

Example: upload and segmentation preview interfaces with deep parsing disabled.

Example: upload and segmentation preview interfaces with deep parsing enabled.

1.1.2 Q&A

Question-answer (Q&A) pairs can be imported in three modes: batch import, manual entry, and generate from file. In the Q&A generation confirmation panel, users can select, unselect, or delete generated Q&A pairs. For individual Q&A items that have already been published, the following operations are supported: edit, delete, move category, associate with files, and export.

Batch import

Batch import supports files in CSV, XLS, and XLSX formats, with up to 3,000 Q&A pairs allowed per file. The header row must include two columns: "question" and "answer".
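A batch file can be sanity-checked before upload. The sketch below handles the CSV case only; `load_qa_pairs` is a hypothetical name, and treating the header names as case-sensitive and skipping incomplete rows are assumptions.

```python
import csv
import io

MAX_PAIRS = 3000  # Q&A pairs allowed per file


def load_qa_pairs(csv_text: str) -> list[tuple[str, str]]:
    """Parse CSV text and return (question, answer) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"question", "answer"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing header column(s): {sorted(missing)}")
    pairs = []
    for row in reader:
        question = (row["question"] or "").strip()
        answer = (row["answer"] or "").strip()
        if question and answer:  # skip incomplete rows (assumed behavior)
            pairs.append((question, answer))
    if len(pairs) > MAX_PAIRS:
        raise ValueError(f"file contains more than {MAX_PAIRS} Q&A pairs")
    return pairs
```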

With batch import, multiple questions are imported in a single batch.

Manual Entry

Manual entry requires typing both the question and the answer. Image and file attachments can be added as supplementary answers.

Image Attachments: If the knowledge content is matched and referenced during a Q&A interaction, the image(s) will be displayed to the user after the response is generated. A maximum of 9 images can be uploaded, each no larger than 20MB. Supported formats include JPEG, PNG, JPG, and GIF.
File Attachments: If the knowledge content is matched and referenced during a Q&A interaction, the system will generate a response and also send the attached file(s) to the user. You can upload up to 9 files or images, each no larger than 20MB. Supported formats include PDF, DOC, DOCX, TXT, HTML, PPT, PPTX, MD, XLS, XLSX, CSV, PNG, JPEG, JPG, GIF, and MP3.

Example:

Imported Question: What’s your favorite animal?
Answer: Rabbit
Imported Files: An image of a rabbit and a document containing rabbit-related knowledge

Conversation:

User: What’s your favorite animal?

Response: I like rabbits. Rabbits are very cute animals with soft fur, long ears, and red eyes. Just like the one in the picture—doesn’t it look gentle and adorable? One of the reasons rabbits are so lovable is their calm nature. The way they nibble on carrots is especially amusing. Don’t you think rabbits are cute too?

[Rabbit Image]

[rabbit_knowledge.doc]

Generate from File: Uses a large language model to automatically parse the imported file and generate relevant questions and answers based on its content. Generated Q&A pairs can be reviewed in the Q&A confirmation panel.

You can perform batch operations on the generated Q&A pairs, including select, unselect, and delete. The associated file indicates the source file from which the Q&A was generated.

1.1.3 Text Blocks

Text blocks support two import methods: batch import and manual import. A template can be downloaded; its header field must be "content".

Batch Import: Supports CSV, XLS, and XLSX file formats, with up to 3,000 text blocks per file. The header must contain only one column.

Manual Import: Allows input of up to 50,000 characters of text. Image and file attachments can optionally be added as supplementary answers, following the same rules as manual entry in the Q&A module.

1.2 Retrieval Testing

Enter a retrieval term (up to 500 characters). Based on the given test parameters, the system displays the top 7 Q&A pairs ranked by relevance. The blue-highlighted number in the search results is the relevance score between the answer and the retrieval term. Q&A pairs with scores below the relevance threshold are excluded. When entries meet the threshold, the system references up to the configured maximum number of knowledge entries to generate a response using the LLM.
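The filtering and ranking described above amount to a threshold filter followed by a descending sort. A minimal sketch, assuming scores are already computed; `top_matches` is a hypothetical name, and the 0.65 default mirrors the default relevance threshold in the retrieval settings.

```python
def top_matches(scored: list[tuple[str, float]],
                threshold: float = 0.65,
                top_k: int = 7) -> list[tuple[str, float]]:
    """Drop entries below the threshold, then keep the top_k by score."""
    kept = [(item, score) for item, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```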

1.3 Knowledge Settings

Knowledge base settings include three categories: Retrieval Settings, Knowledge Comparison Settings, and Audit Settings. These settings offer flexible configurations to help you accurately access the knowledge you need.

1.3.1 Retrieval Settings

1.3.1.1 Regular Retrieval

In regular retrieval mode, the system first ensures that the recalled knowledge meets the relevance threshold. It then ranks the recalled items in descending order of relevance and selects the Top N most relevant knowledge items, where N is the maximum number of references allowed. This keeps the content used in a response both highly relevant to your query and limited to the most useful entries.

Retrieval Ratio: Allows adjustment of the balance between semantic retrieval and keyword retrieval via a slider. The default setting is 50% for each.

Relevance Threshold: Sets the minimum relevance score (ranging from 0 to 1) required for knowledge to be considered in retrieval. The default value is 0.65.

Maximum Number of Referenced Knowledge: Defines the maximum number of knowledge entries that can be referenced in a response (0–10). The default is 3.

Re-ranking Model Selection: Offers two models — BAAI/bge-rerank-large and bairong-inc/bge-rerank-large-hsbc. The latter is better suited for financial domain applications.

Recall Knowledge Block Filtering: When enabled, irrelevant or less relevant knowledge blocks are filtered out based on a predefined strategy. It applies absolute and relative comparisons to exclude knowledge that, while above the relevance threshold, differs significantly from the most relevant entries.
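The settings above can be pictured as a small pipeline. This is an illustrative sketch only: the linear blend of semantic and keyword scores, the `max_gap` parameter standing in for the "relative comparison" filter, and all names are assumptions, and the re-ranking model is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    text: str
    semantic: float  # semantic retrieval score in [0, 1]
    keyword: float   # keyword retrieval score in [0, 1]


def regular_retrieval(candidates: list[Candidate],
                      semantic_ratio: float = 0.5,
                      threshold: float = 0.65,
                      max_refs: int = 3,
                      max_gap: Optional[float] = None) -> list[tuple[Candidate, float]]:
    """Blend scores, apply the threshold, rank, optionally filter, take Top N."""
    scored = [(c, semantic_ratio * c.semantic + (1 - semantic_ratio) * c.keyword)
              for c in candidates]
    scored = [(c, s) for c, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if max_gap is not None and scored:
        # Assumed relative filter: drop items too far below the best score.
        best = scored[0][1]
        scored = [(c, s) for c, s in scored if best - s <= max_gap]
    return scored[:max_refs]
```

With the defaults, an item scoring 0.9 and one scoring 0.75 are both kept; passing `max_gap=0.1` would drop the second even though it clears the threshold.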

1.3.1.2 Structured Retrieval

In structured retrieval mode, the system recalls only structured knowledge within specified folders, while still ensuring the relevance threshold is met. This approach allows for more precise filtering of relevant content, reduces information overload, and improves retrieval efficiency. When generating retrieval results, the system takes into account both the 'Maximum Reference Folder Count' and the Maximum Number of Referenced Knowledge per folder. This ensures that the results are not only accurate but also highly targeted, helping you quickly locate relevant structured knowledge and improve work efficiency.

Maximum Reference Folder Count: From all recalled folders, take the Top N folders with the highest knowledge count and relevance.
Maximum Reference Count per Folder: Take the Top N knowledge items from each recalled folder.
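The two limits can be sketched as a group-then-truncate step. How the product combines knowledge count and relevance when ranking folders is not specified, so sorting by item count first and best score second is an assumption, as are the function name and the default values.

```python
from collections import defaultdict


def structured_retrieval(hits: list[tuple[str, str, float]],
                         threshold: float = 0.65,
                         max_folders: int = 2,
                         max_per_folder: int = 3) -> dict[str, list[tuple[str, float]]]:
    """hits are (folder, text, score); return the top items per top folder."""
    by_folder = defaultdict(list)
    for folder, text, score in hits:
        if score >= threshold:
            by_folder[folder].append((text, score))
    # Assumed folder ranking: more recalled items first, then best score.
    ranked = sorted(by_folder.items(),
                    key=lambda kv: (len(kv[1]), max(s for _, s in kv[1])),
                    reverse=True)
    result = {}
    for folder, items in ranked[:max_folders]:
        items.sort(key=lambda pair: pair[1], reverse=True)
        result[folder] = items[:max_per_folder]
    return result
```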

1.3.1.3 Tree Retrieval

Tree-structured retrieval mode ensures that you can quickly and accurately find the information you need. The system organizes knowledge into multiple hierarchical levels, presented in the form of folders—each level representing a specific knowledge domain, with the final answers or content stored in the leaf nodes. This structure allows you to navigate layer by layer, much like browsing through folders, to locate your topic of interest. When you submit a query, the system will guide you step by step through the relevant levels based on your question to ensure accurate intent recognition. If your question is not specific enough, the system will automatically perform intent clarification, asking follow-up questions until it can determine the final answer.

Reference Question Match Threshold: The minimum score for a reference question to be considered a match. Reference questions below this threshold are not matched. Similar to intent matching, it defines the similarity threshold between the user’s question and the corresponding knowledge block.
Leaf Node Relevance Threshold: For terminal leaf nodes, you can configure the relevance threshold for retrieval.
Maximum Knowledge References for Leaf Nodes: For terminal leaf nodes, you can configure the number of items to retrieve.
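The layer-by-layer navigation can be sketched as a walk down the tree that stops to clarify whenever the next step is ambiguous. Everything here is a simplified assumption: the `Node` class, the token-overlap `similarity` stand-in for the real matching model, and the rule of clarifying whenever zero or several children match. Leaf-level thresholds and reference counts are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    reference_questions: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)
    content: str = ""  # answer text, set on leaf nodes only


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (a stand-in for the real model)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def tree_retrieve(root: Node, query: str, match_threshold: float = 0.5):
    """Return ("answer", content) or ("clarify", [option names])."""
    node = root
    while node.children:
        matches = [child for child in node.children
                   if any(similarity(query, q) >= match_threshold
                          for q in child.reference_questions)]
        if len(matches) != 1:
            # Ambiguous or no match: ask the user to choose a branch.
            options = matches or node.children
            return ("clarify", [child.name for child in options])
        node = matches[0]
    return ("answer", node.content)
```

A query that matches exactly one branch at each level reaches a leaf and returns its content; anything else returns the list of branches to ask the user about.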

1.3.2 Knowledge Comparison Settings

The Knowledge comparison feature has a configurable prerequisite: you can choose to enable or disable automatic comparison. When enabled, files, Q&A pairs, and text blocks will be automatically compared upon upload. You can also initiate manual comparison tasks if needed. Knowledge comparison applies only to items of the same type (e.g., file-to-file, Q&A-to-Q&A). The knowledge comparison relevance is controlled by setting relevance thresholds separately for files, Q&A pairs, and text blocks.

File Relevance Threshold: Set the relevance threshold for files when comparing knowledge. Range: 0–1.
Q&A Relevance Threshold: Set the relevance threshold for Q&A when comparing knowledge. Range: 0–1.
Text Block Relevance Threshold: Set the relevance threshold for text blocks when comparing knowledge. Range: 0–1.

Example: Setting a lower relevance threshold for files, Q&A, or text blocks makes it easier to trigger automatic comparison; increasing the threshold makes automatic comparison less likely to occur.
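The effect of the per-type thresholds can be illustrated with a toy check. The product's relevance measure is not documented here, so `difflib.SequenceMatcher` is used purely as a stand-in, and the threshold values and names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical per-type thresholds, each in the 0-1 range.
THRESHOLDS = {"file": 0.8, "qa": 0.8, "text": 0.8}


def should_compare(kind_a: str, text_a: str, kind_b: str, text_b: str,
                   thresholds: dict[str, float] = THRESHOLDS) -> bool:
    """Trigger comparison only for same-type items at or above the threshold."""
    if kind_a != kind_b:
        return False  # comparison applies only to items of the same type
    score = SequenceMatcher(None, text_a, text_b).ratio()
    return score >= thresholds[kind_a]
```

Lowering a threshold makes the function return `True` for more pairs, which matches the behavior described in the example above.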

1.3.3 Audit Settings

You can choose to enable or disable knowledge audit. When enabled, files, Q&A, and text blocks must pass audit before being imported as knowledge. Audit operations can be performed in the Audit Center. Only space owners and space administrators have permission to configure this setting.

1.4 Knowledge Comparison

This feature can only be used after enabling automatic comparison in Knowledge Base Settings – Knowledge Comparison Settings.

File Comparison

After enabling the automatic comparison settings, the system will automatically compare files with duplicate names that meet the threshold. You can also manually create a comparison task by clicking the orange-highlighted button in the upper right corner. This allows you to upload two files of the same type for comparison. Only files in the "Pending" or "Active" status are supported. In the "View Comparison" column, you can view the comparison results. After analyzing the results, you can perform operations on the corresponding files, including: No Processing, Delete Old File, Delete New File, Old File Renamed, or New File Renamed. Click the "Start Processing" button to execute the selected operation.

Automatic Comparison: After enabling the automatic comparison settings, comparison will be automatically performed on parsed files.
Manual Comparison Creation: After enabling the automatic comparison settings, you can select two files from categorized folders to perform a specific comparison task.

After comparison, you can click the "View" button under the View Comparison column to read the detailed comparison results. Differences between the two files are highlighted in different colors for easy identification.
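As a rough picture of what a highlighted comparison conveys, the sketch below lists the lines that differ between two versions using Python's `difflib`. This is a stand-in for illustration; the product's own comparison algorithm and rendering are not described here, and `changed_lines` is a hypothetical name.

```python
import difflib


def changed_lines(old: str, new: str) -> list[str]:
    """Return only removed ("- ") and added ("+ ") lines between two texts."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line[:2] in ("- ", "+ ")]
```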

You can click the "View Results" button under the Operations column to see the corresponding feedback after performing operations on files and Q&A items.

Q&A comparison

After enabling automatic comparison, Q&A pairs that exceed the Q&A similarity threshold will be automatically compared. Highlighted areas indicate differences. Gray blocks with no content represent missing parts in the comparison. You can select one version as the correct answer to be entered into the knowledge base.

Text Blocks comparison

After enabling automatic comparison, texts that exceed the text block similarity threshold will be compared. You can select a text to submit as the final answer.

1.5 Audit Center

The Audit Center is a functional module for centrally managing content audit tasks, helping space owners and administrators efficiently handle the audit process for files, Q&A items, and text blocks. Users without audit permissions cannot access the Audit Center or the “Audit Settings” section in Knowledge Base Settings. The “Category” column supports search and filtering based on users’ personalized category structures, helping to identify the ownership of content pending review. You can also switch between the “File Audit,” “Q&A Audit,” and “Text Block Audit” tabs to view specific types of content. The list view displays detailed information and allows filtering by status, such as “All” or “Pending Audit.” “Batch Operation” improves efficiency and, combined with pagination, makes it easy to track audit progress and results.

File Audit

After enabling Audit in the Knowledge Base Settings, any addition or removal of files will require auditing. There are two options: Approve or Reject. Rejected files will not be imported into or removed from the knowledge base, and will be marked with a "Rejected" status. Files that have not yet been audited will display a "Pending" status.

Click the ✔ on the right to approve the change after a second confirmation. Click ✖ to reject the operation. A reason must be provided for rejection, with a maximum of 500 characters.

Click on a specific file item to view detailed audit information. You can scroll up and down to see the audit details for multiple sections within the file.

Q&A Audit

After enabling Audit in the Knowledge Base Settings, any addition or removal of Q&A will require auditing. There are two options: Approve or Reject. Rejected Q&A will not be imported into or removed from the knowledge base and will be marked with a "Rejected" status. Q&A that have not yet been audited will display a "Pending" status.

Similar to file audits, click the ✔ on the right to approve the audit after a second confirmation. Click ✖ to reject the operation. A reason must be provided for rejection, with a maximum of 500 characters. Click on a specific Q&A item to view detailed audit information.

Text blocks audit

After enabling audit in the Knowledge Base Settings, any addition or removal of text blocks will require auditing. There are two options: Approve or Reject. Rejected text blocks will not be imported into or removed from the knowledge base and will be marked with a "Rejected" status. Items that have not yet been audited will display a "Pending Audit" status.

Similar to file audits, click the ✔ on the right to approve the audit after a second confirmation. Click ✖ to reject the operation. A reason must be provided for the rejection, with a maximum of 500 characters. Click on a specific text block item to view detailed audit information.
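The audit flow shared by files, Q&A, and text blocks can be summarized as a small state machine. This is a sketch with hypothetical names; in particular, the "Approved" status label is an assumption, since only "Pending" and "Rejected" are named above.

```python
from dataclasses import dataclass
from enum import Enum

MAX_REASON_CHARS = 500  # rejection reason limit


class AuditStatus(Enum):
    PENDING = "Pending"
    APPROVED = "Approved"   # assumed label for approved items
    REJECTED = "Rejected"


@dataclass
class AuditItem:
    name: str
    status: AuditStatus = AuditStatus.PENDING
    reason: str = ""


def approve(item: AuditItem, confirmed: bool) -> AuditItem:
    """Approve a pending item, but only after the second confirmation."""
    if item.status is not AuditStatus.PENDING:
        raise ValueError("only pending items can be audited")
    if confirmed:
        item.status = AuditStatus.APPROVED
    return item


def reject(item: AuditItem, reason: str) -> AuditItem:
    """Reject a pending item; a reason of 1 to 500 characters is required."""
    if item.status is not AuditStatus.PENDING:
        raise ValueError("only pending items can be audited")
    if not reason.strip():
        raise ValueError("a rejection reason is required")
    if len(reason) > MAX_REASON_CHARS:
        raise ValueError(f"reason exceeds {MAX_REASON_CHARS} characters")
    item.status = AuditStatus.REJECTED
    item.reason = reason
    return item
```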


Last updated 4 months ago