How to Use the URL Scrape Feature in Volt Vectors

The URL Scrape feature in Volt Vectors lets you create embeddings from any publicly accessible web page. It is a quick way to add new content to your LLM Chat’s knowledge base.

Overview

The URL Scrape feature scrapes the content of a web page you specify and uses that content to generate embeddings. A selector option lets you limit the scrape to the part of the page you actually need.
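
To make the idea of selector-limited scraping concrete, here is a minimal sketch using only Python's standard library. It is not how Volt Vectors is implemented; the names `SelectorTextExtractor` and `extract_text` are hypothetical, and the sketch assumes well-formed HTML and supports only three simple selector forms (`tag`, `#id`, `.class`).

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SelectorTextExtractor(HTMLParser):
    """Collect text inside the first element matching a simple selector.

    Hypothetical helper: supports only "tag", "#id", and ".class".
    Assumes every non-void element has an explicit closing tag.
    """

    def __init__(self, selector):
        super().__init__()
        self.selector = selector
        self.depth = 0      # nesting depth inside the matched element
        self.done = False   # True once the matched element has closed
        self.parts = []     # text fragments collected so far

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        if self.selector.startswith("#"):
            return attrs.get("id") == self.selector[1:]
        if self.selector.startswith("."):
            classes = (attrs.get("class") or "").split()
            return self.selector[1:] in classes
        return tag == self.selector

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1          # nested element inside the match
        elif self._matches(tag, attrs):
            self.depth = 1           # entering the matched element

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True         # matched element fully closed

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html, selector):
    """Return the text of the first element matching the selector."""
    parser = SelectorTextExtractor(selector)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample = """
<html><body>
  <nav>Home | Docs</nav>
  <div id="content"><h1>Pricing</h1><p>Plans start at $5.<br>Cancel anytime.</p></div>
  <footer>Copyright</footer>
</body></html>
"""
print(extract_text(sample, "#content"))
```

With the selector `#content`, the navigation and footer text are left out and only the main content is returned, which is the effect the selector field is meant to achieve.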

Here’s a quick rundown of how to use the URL Scrape feature:

  • Navigate to the Add New page to add a new set of embeddings
  • Select the “URL Scrape” option
  • Enter the URL of the web page you want to scrape
  • Use the selector field to narrow down the content to scrape
  • Customize other options like the title and source URL
  • Publish the embeddings and then generate them

Once generated, the new embeddings created from the scraped content can be accessed by your LLM Chat.

Step-by-Step Instructions

Here is a more detailed walkthrough of using the URL Scrape feature:

  1. Go to the Add New page to add new embeddings
  2. Select URL Scrape as the source type
  3. Enter the URL of the web page you want to scrape in the URL field
    • This can be any publicly accessible web page
  4. Specify a selector to narrow down the content
    • The selector targets a specific HTML element on the page
    • This keeps the focus on relevant content and excludes sidebars, headers, footers, and similar elements
    • Use your browser’s inspector tool to find a good selector
  5. Customize additional options (optional):
    • Document Title: Change the default title extracted from the web page
    • Source: Defaults to the URL but can be customized
  6. Choose embedding settings like chunk size and embedding model
  7. Publish the new embeddings
  8. Generate the embeddings; this step scrapes the URL and builds them
  9. The new embeddings will now be available for your LLM Chat!
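
Step 6 mentions a chunk size setting. As a rough illustration of what such a setting controls, the sketch below splits scraped text into overlapping chunks before embedding. The function name `chunk_words`, the use of words as the unit, and the `overlap` parameter are all assumptions made for illustration; the unit Volt Vectors actually uses (characters, words, or tokens) is not stated here.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


scraped = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(scraped, chunk_size=4, overlap=1):
    print(chunk)
```

Smaller chunks tend to give more precise matches but less context per match, while larger chunks do the reverse, so it can be worth trying a couple of values for the pages you scrape.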

Pro Tips

Here are some additional pointers when using the URL Scrape feature:

  • Test different selectors to fine-tune the content being scraped
  • Generate separate embedding sets for different parts of a web page
  • Use the selector to exclude extraneous page elements such as menus and sidebars
  • Avoid scraping large websites wholesale; focus on specific, relevant content
  • Regenerate embeddings periodically to keep the content current
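
For the last tip, one possible way to decide when a regeneration is worthwhile is to keep a fingerprint of the page text and compare it on a schedule. The sketch below is an external helper idea, not a Volt Vectors feature; `content_fingerprint` and `needs_regeneration` are hypothetical names, and fetching the page text is left out.

```python
import hashlib


def content_fingerprint(text):
    """Return a SHA-256 hex digest of the text with whitespace normalized."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_regeneration(previous_fingerprint, current_text):
    """True when the current page text differs from the stored fingerprint."""
    return content_fingerprint(current_text) != previous_fingerprint


# Fingerprint stored when the embeddings were last generated.
stored = content_fingerprint("Plans start at $5. Cancel anytime.")

# Whitespace-only differences do not count as a change.
print(needs_regeneration(stored, "Plans  start at $5.\nCancel anytime."))
# A change in the actual wording does.
print(needs_regeneration(stored, "Plans start at $7. Cancel anytime."))
```

Normalizing whitespace before hashing avoids flagging a page as changed when only its markup or line breaks were reformatted.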

That covers the basics of the URL Scrape feature in Volt Vectors. With selective, well-targeted scraping, you can greatly expand your LLM Chat’s knowledge.