DocSearch can work with almost any website, but we found that some site structures yield more relevant results or faster indexing. On this page, we share some tips on how you can make the most of DocSearch.
If you provide a sitemap in your configuration, DocSearch will use it to browse the pages to index directly. Pages are still crawled, which means that we extract every compliant link.
We highly recommend that you add a sitemap.xml to your website if you don't have one already. This makes indexing faster and also gives you more control over which pages are included in the index.
Sitemaps are also considered good practice for other reasons, including SEO (more information on sitemaps).
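If your site does not have a sitemap yet and your build script already knows the list of page URLs, Python's standard library can emit one that follows the sitemaps.org protocol. The sketch below is not tied to any particular site generator, and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document listing each URL once, in order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # dict.fromkeys drops duplicates, keeps order
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes &, <, >
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs; replace them with your real documentation pages.
pages = [
    "https://docs.example.com/",
    "https://docs.example.com/installation.html",
    "https://docs.example.com/troubleshooting.html",
]
print(build_sitemap(pages))
```

Listing only real documentation pages here is also how the sitemap gives you control over what gets indexed.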
DocSearch works better on structured documentation. The relevance of results is based on the structural hierarchy of the content. In simpler terms, this means that we read the <h1> to <h6> headings of your page to infer the hierarchy of information. This hierarchy brings contextual information to your records.
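Here is a simplified sketch of that idea. It is not DocSearch's actual scraper. It reads the <h1> to <h6> headings of a page with Python's built-in html.parser and attaches to each heading the chain of headings above it, which is the contextual information a record carries. The sample page is made up.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every <h1>..<h6> element, in page order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:  # also captures text of nested tags
            self._text.append(data)


def hierarchy(html):
    """Return, for each heading, the list of headings leading to it."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    stack = []  # open headings as (level, text), shallowest first
    paths = []
    for level, text in parser.headings:
        # A new heading closes every heading at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, text))
        paths.append([t for _, t in stack])
    return paths


page = """
<h1>Installation</h1>
<h2>Requirements</h2>
<h2>Troubleshooting</h2>
<h3>Permission errors</h3>
"""
for path in hierarchy(page):
    print(" > ".join(path))
```

A search hit on "Permission errors" can then be shown with its context, "Installation > Troubleshooting", instead of as an isolated line.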
Documentation starts by explaining generic concepts and then goes deeper into specifics. This is represented in your HTML markup by the hierarchy of headings you use. For example, concepts discussed under an <h4> are more specific than concepts discussed under an <h2> on the same page. The earlier information appears within the page, the higher it is ranked.
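The toy sort key below illustrates the two ordering principles described above. Among matching records, it prefers shallower (more general) headings. Among records at the same level, it prefers those that appear earlier in the page. The Record fields are hypothetical, and this is an assumed key for illustration only, not DocSearch's actual ranking formula.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    level: int     # hypothetical field: 1 for <h1>, 2 for <h2>, ...
    position: int  # hypothetical field: order of appearance in the page, from 0


def rank(matches):
    """Order matching records: shallower headings first, then earlier ones."""
    return sorted(matches, key=lambda r: (r.level, r.position))


matches = [
    Record("Permission errors", level=4, position=3),
    Record("Troubleshooting", level=2, position=2),
    Record("Requirements", level=2, position=1),
]
print([r.text for r in rank(matches)])
# ['Requirements', 'Troubleshooting', 'Permission errors']
```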
DocSearch uses this structure to fine-tune the relevance of results as well as to provide potential filtering. Documentation that follows this pattern often has better relevance in its search results.
Finding the right depth for your documentation tree and deciding how to split up your content are two of the most complex tasks. For large documents, we recommend four levels (from lvl0 to lvl3). In any case, we recommend at least three levels.
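The sketch below checks a selector set against this recommendation. It assumes selectors are keyed lvl0, lvl1, and so on, as in the levels named above. The docs-content and docs-nav class names are hypothetical, and the helper itself is illustrative, not part of DocSearch.

```python
import re

# Hypothetical selector set for pages whose content lives in .docs-content.
selectors = {
    "lvl0": ".docs-nav .active",
    "lvl1": ".docs-content h1",
    "lvl2": ".docs-content h2",
    "lvl3": ".docs-content h3",
    "text": ".docs-content p, .docs-content li",
}

LEVEL_KEY = re.compile(r"lvl(\d)")


def hierarchy_depth(selectors):
    """Count lvlN keys, checking that they run from lvl0 without gaps."""
    levels = sorted(int(m.group(1)) for m in map(LEVEL_KEY.fullmatch, selectors) if m)
    if levels != list(range(len(levels))):
        raise ValueError(f"levels must start at lvl0 with no gaps, got {levels}")
    return len(levels)


depth = hierarchy_depth(selectors)
print(f"{depth} levels", "(ok)" if depth >= 3 else "(consider adding levels)")
```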
Note that you don't have to use <hX> tags; you can use <span class="title-X">, for example, instead. You will need to update your set of selectors accordingly.
DocSearch extracts content based on the HTML structure. We recommend that you add a custom class to the HTML element wrapping all your textual content. This helps narrow selectors down to the relevant content.
Having such a unique identifier makes your configuration more robust, as it ensures that all indexed content is relevant. We found that this is the most reliable way to exclude content in headers, sidebars, and footers that is not relevant to the search.
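The sketch below shows why a wrapper class helps. It keeps only the text found inside the element carrying a hypothetical docs-content class and ignores the header, sidebar, and footer around it. It uses Python's html.parser and assumes reasonably well-formed markup. A real crawler would rely on proper CSS selectors instead.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collect text that sits inside the element with the given class."""

    def __init__(self, content_class):
        super().__init__()
        self.content_class = content_class
        self.depth = 0  # > 0 while inside the content wrapper
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.content_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


page = """
<header>Home | Blog | Pricing</header>
<nav class="sidebar">Installation</nav>
<main class="docs-content">
  <h1>Installation</h1>
  <p>Run the installer.</p>
</main>
<footer>Copyright</footer>
"""
extractor = ContentExtractor("docs-content")
extractor.feed(page)
extractor.close()
print(extractor.chunks)  # ['Installation', 'Run the installer.']
```

The sidebar also says "Installation", but because it sits outside the wrapper it never produces a record.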
When using headings (as mentioned above), you should also try to add a custom anchor to each of them. Anchors are HTML attributes (id) added to headings that let the browser scroll directly to the right position on the page when the user clicks a link with a # in it.
DocSearch will honor such anchors and automatically bring your users to the anchor closest to the search result they selected.
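If your site generator does not create heading ids, something like the following sketch can add them. The slug rules (lowercase, runs of non-alphanumeric characters collapsed to hyphens, numeric suffix for repeats) are an assumption for illustration. DocSearch does not require this format, and any unique id works. The second function shows what "closest anchor" means: the id of the last heading at or before the matched content. The page name and positions are made up.

```python
import re


def slugify(text, seen):
    """Turn heading text into a unique id, e.g. "Set up & run" -> "set-up-run"."""
    base = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "section"
    slug, n = base, 1
    while slug in seen:  # de-duplicate repeated headings: faq, faq-2, ...
        n += 1
        slug = f"{base}-{n}"
    seen.add(slug)
    return slug


def closest_anchor(anchors, position):
    """Return the id of the last heading at or before `position`.

    `anchors` is a list of (position, id) pairs sorted by position.
    """
    best = None
    for pos, anchor_id in anchors:
        if pos > position:
            break
        best = anchor_id
    return best


seen = set()
ids = [slugify(h, seen) for h in ["Installation", "FAQ", "Set up & run", "FAQ"]]
print(ids)  # ['installation', 'faq', 'set-up-run', 'faq-2']
anchors = list(zip([0, 120, 300, 480], ids))
print(f"page.html#{closest_anchor(anchors, 350)}")  # page.html#set-up-run
```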
If you're using a multi-level navigation, we recommend that you mark each active level with a custom CSS class. This will make it easier for DocSearch to know where the current page fits in the website hierarchy.
For example, if your troubleshooting.html page is located under the Installation menu in your sidebar, we recommend that you add a custom CSS class to the Installation and Troubleshooting links in your sidebar.
The name of the CSS class does not matter, as long as it's something that can be used as part of a CSS selector.
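The sketch below assumes your site generator renders the sidebar from a nested structure. The hypothetical helper finds the path from the top-level menu to the current page and adds a class to every level on that path. The NAV structure and the nav-active class name are made up, and as noted above, the class name itself is arbitrary.

```python
from html import escape

# Hypothetical sidebar: section title -> {link title -> URL}.
NAV = {
    "Getting started": {"Introduction": "index.html"},
    "Installation": {
        "Requirements": "requirements.html",
        "Troubleshooting": "troubleshooting.html",
    },
}


def active_path(nav, current_url):
    """Return the titles from the top-level menu down to the current page."""
    for section, links in nav.items():
        for title, url in links.items():
            if url == current_url:
                return [section, title]
    return []


def render_sidebar(nav, current_url, active_class="nav-active"):
    """Render the sidebar, marking every level of the active path with a class."""
    path = active_path(nav, current_url)
    mark = f' class="{active_class}"'
    lines = []
    for section, links in nav.items():
        section_cls = mark if path[:1] == [section] else ""
        lines.append(f"<li{section_cls}>{escape(section)}<ul>")
        for title, url in links.items():
            link_cls = mark if url == current_url else ""
            lines.append(f'  <li{link_cls}><a href="{escape(url)}">{escape(title)}</a></li>')
        lines.append("</ul></li>")
    return "\n".join(lines)


print(render_sidebar(NAV, "troubleshooting.html"))
```

A selector such as .nav-active then matches exactly the Installation and Troubleshooting entries on that page.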
Consistency is a pillar of meaningful documentation. It increases a document's intelligibility and also shortens the time a user needs to find the information they are looking for. The document's topic should be identifiable and its outline clearly delimited.
The hierarchy should always have the same depth. Try to avoid orphan records, such as an implicit introduction or conclusion, or asides. The selectors must work for every document and highlight the proper hierarchy: they need to match the intended elements at each level. Be careful to avoid side effects caused by matching unexpected, superfluous elements.
Selectors should match information from real documentation pages and stay ineffective on other pages (e.g., landing pages, tables of contents, etc.). We urge maintainers to define a dedicated class for the main DOM container that includes the actual document content, such as
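One way to keep selectors inactive on landing pages and tables of contents is to scope every selector to the dedicated container class, so that pages without the container produce no records. The check below sketches that test with a hypothetical docs-content class. When a crawler is configured with container-scoped selectors, it gets the same effect without extra code.

```python
from html.parser import HTMLParser


class ContainerFinder(HTMLParser):
    """Record whether any element carries the given class."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.container_class in classes:
            self.found = True


def is_documentation_page(html, container_class="docs-content"):
    """True if the page contains the dedicated documentation container."""
    finder = ContainerFinder(container_class)
    finder.feed(html)
    finder.close()
    return finder.found


landing = '<main class="hero"><h1>Welcome</h1></main>'
doc = '<main class="docs-content"><h1>Installation</h1></main>'
print(is_documentation_page(landing), is_documentation_page(doc))  # False True
```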
Since documentation should be interactive, it is key to express concepts with standardized words. This consistent vocabulary, combined with the search experience (the dropdown), even enables a learn-as-you-type experience. The way users find information plays a key role in leading them to the knowledge itself. You can also use the synonym feature.
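The toy query expansion below illustrates the idea behind synonyms: each word of a query can be swapped for any word in its group of equivalent terms. The synonym groups are made-up examples. In practice, you declare synonyms in your search index settings rather than in application code, and this sketch does not reproduce how the search engine applies them.

```python
from itertools import product

# Made-up synonym groups: every word in a group is treated as equivalent.
SYNONYM_GROUPS = [
    {"install", "setup", "set-up"},
    {"delete", "remove"},
]


def equivalents(word):
    """Return the set of words equivalent to `word` (itself included)."""
    for group in SYNONYM_GROUPS:
        if word in group:
            return group
    return {word}


def expand_query(query):
    """Return every query variant obtained by swapping in synonyms."""
    options = [sorted(equivalents(w)) for w in query.lower().split()]
    return {" ".join(combo) for combo in product(*options)}


print(sorted(expand_query("Remove plugin")))
# ['delete plugin', 'remove plugin']
```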
The more time-consuming documentation is to read, the more painful it is to use and the more reluctant readers become. Avoid hazy points and catch-all pages: besides being unhelpful, a catch-all document can be confusing and counterproductive.
Duplicates introduce noise and mislead users. This is why you should always focus on the relevant content and avoid duplicating content within your site (for example, a landing page that contains all the information, summaries, etc.). Where duplicates are expected because they belong to another dataset (for example, a different version), you should use facets.
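The toy model below illustrates faceting. Each record carries a version attribute, and the search only considers records whose facet value matches the version the reader is browsing. As a result, the same page from another version never shows up as a duplicate. The record fields and the simple substring match are assumptions for illustration. With DocSearch, this filtering happens in the search engine rather than in your code.

```python
# Hypothetical records: the same page indexed for two versions.
RECORDS = [
    {"title": "Installation", "url": "/v1/installation.html", "version": "v1"},
    {"title": "Installation", "url": "/v2/installation.html", "version": "v2"},
    {"title": "Troubleshooting", "url": "/v2/troubleshooting.html", "version": "v2"},
]


def search(records, query, facet_filters):
    """Return records matching `query` whose facet values equal `facet_filters`."""
    query = query.lower()
    return [
        r for r in records
        if query in r["title"].lower()
        and all(r.get(name) == value for name, value in facet_filters.items())
    ]


hits = search(RECORDS, "install", {"version": "v2"})
print([h["url"] for h in hits])  # ['/v2/installation.html']
```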
What is clearly thought out is clearly and concisely expressed.
We highly recommend that you read this blog post about how to build a helpful search for technical documentation.