In the previous videos, I showed you how to create Search API servers using the Database service and the Solr service. The next thing we need to set up is the search index. A search index determines what gets indexed and what is searchable. In this video, I will show you how to set up a Search API index that indexes nodes and defines which fields are stored in the index. At the end of the video, we'll take a quick look at how the database search works, as promised in Video 3.
Create index
- Go to "Configuration = > Search and Metadata => Search API" (admin/config/search/search_api)
Notice that there is already a default index in the list, the "Default node index", which is automatically created by the Search API module. You could use this default index by clicking "edit" and configuring its settings to your specific needs, but let's create our own from scratch.
- Click "Add index"
- Index name: Node Index (You can name the index whatever you want. I'm going to name it "Node Index" because it will index nodes. You can create as many indexes as you'd like and have them index different types of entities, like comments and users.)
- Item type: Node (This specifies the type of data that can be indexed.)
- [x] Enabled (This enables the index, but it will only take effect if you select an enabled server.)
- (Optional) Index description: Used for indexing nodes.
- Server: Apache Solr search server (This specifies which server will be used to index the data. If you want to create an index that uses the Database service select "Database search server", but if you want to create an index that uses the Solr service, select "Apache Solr search server". Let's select "Apache Solr search server" for this index, since the Solr service supports more features than the database service.)
- [ ] Read only (If enabled, this will allow you to create an index that can only be used to search existing data on a remote server. Data will neither be indexed to the server nor deleted from the server by Search API.)
- [x] Index items immediately (This option indexes new or updated items immediately instead of waiting for the next cron run. If you have a large number of items to index, you may want to leave this disabled and rely on cron; that helps prevent timeouts. But since I won't be creating a lot of content on the site in these videos, I'll enable it so that everything is indexed immediately and we can search without having to run cron all the time.)
- Cron batch size: 50 (This is where you specify the number of items to be indexed with each cron run. A larger number here means more resources will be consumed for a longer period of time.)
- (Create index)
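As a quick aside for anyone who prefers working from code: below is a minimal sketch of how you could inspect the index we just created on a bootstrapped Drupal 7 site (for example via `drush php-eval` or a small custom module). It assumes the index ends up with the machine name "node_index", which matches the admin path we'll use later in this video, and the option keys are assumptions based on the Search API 7.x-1.x code base.

```php
<?php
// A rough sketch, not part of the video steps: inspect the index we just created.
// Assumes a bootstrapped Drupal 7 site with Search API enabled and that the
// index machine name is "node_index". The option keys below are assumptions
// based on the Search API 7.x-1.x code base.
$index = search_api_index_load('node_index');

if ($index) {
  print $index->name . "\n";       // "Node Index"
  print $index->item_type . "\n";  // "node"
  print $index->server . "\n";     // machine name of the server we selected
  // Settings such as the cron batch size and "index immediately" live in the
  // serialized options array.
  print $index->options['cron_limit'] . "\n";      // 50
  print $index->options['index_directly'] . "\n";  // 1
}
```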
After successfully creating the index, we're taken to the Fields settings, where we have to define which fields will be indexed.
Select fields to index
We see a list of all the fields available to nodes, and we can select the ones that we want indexed and searchable.
- Click "Add related fields" (This lets you add fields of entities related to those in the list above so that they can be indexed too. If you're familiar with Relationships in Views, this is similar.)
- Select "Author" (This will add items that are related to the node's author, like its name, email, etc.)
- (Add fields)
- Click "Add related fields"
- Select "The main body text"
- (Add fields)
- [x] Content type
- [x] Title
- Type: Fulltext (Fulltext allows you to find individual words contained in this field.)
- Boost: 8.0 (When searching for fulltext keywords, this determines how important a certain field is. The higher the value, the more important the field; the query sketch after this list shows where this comes into play.)
- [x] Comment count
- [x] Date created
- [x] Tags
- [x] Author » Name (We'll use this as an example in Video 9 when we add facets.)
- [x] The main body text » Text
- (Save changes)
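Before moving on, here's a hedged sketch of how a query against these fields could look from code, again on a bootstrapped Drupal 7 site. The field identifiers used below ("type", "created") are assumptions about how Search API names node fields; the Fields tab shows the exact machine names on your install.

```php
<?php
// A minimal sketch of a fulltext query against the index configured above.
// Field identifiers such as "type" and "created" are assumptions; check the
// Fields tab for the exact machine names on your site.
$query = search_api_query('node_index');

// Keys are matched against the fulltext fields, such as "Title", which carries
// the 8.0 boost we set above.
$query->keys('drupal');

// Non-fulltext fields (content type, date created, ...) can be used as filters.
$query->condition('type', 'article');
$query->sort('created', 'DESC');

$results = $query->execute();
print $results['result count'] . " results found\n";
foreach ($results['results'] as $id => $item) {
  // Each result carries the item ID and a relevance score.
  print $id . ' (score ' . $item['score'] . ")\n";
}
```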
Filter settings
The filter settings allow you to specify data alterations and processors to perform on the indexed data.
- Data Alterations
- [ ] Bundle filter (If enabled, this lets you filter out items based on their bundle, e.g., Article or Basic page.)
- [x] Node access (This will only return results that the current user has access to.)
- [x] Exclude unpublished nodes (This will exclude unpublished nodes from the index.)
- [x] Index hierarchy (This allows you to index hierarchical fields, such as taxonomy terms.)
- [ ] Complete entity view (This allows you to index exactly what the user sees when the content is viewed.)
- [ ] URL field (This adds the item's URL to the index. Item types like node already have a URL field, so we don't need to enable this.)
- [ ] Aggregated fields (This gives you the ability to add fields that have aggregated data from other fields.)
- Data alteration processing order (Here, you can set the order of data alterations)
- Callback settings
- Hierarchical fields
- Select "Parent terms" (This will lets you index taxonomy terms along with their parent terms.)
- Processors (Processors change how data is indexed. It is important to remember that some servers already have data processing built in, so enabling these processors is usually not needed; Solr already handles these preprocessing tasks. Enabling the HTML filter can still be useful, though, as the default config files included in this module don't handle stripping out HTML tags. The sketch after this list shows where these choices end up on the index.)
- [ ] Ignore case (This makes searches case-insensitive.)
- [x] HTML filter (This strips HTML tags from fulltext fields so that markup isn't indexed.)
- [ ] Tokenizer (This allows you to specify how indexed fulltext is split into separate tokens, which characters make up words and which characters should be ignored.)
- [ ] Stopwords (This can be used to prevent commonly used words from being indexed.)
- [ ] Highlighting (This highlights the search keywords in the results.)
- (Save Configuration)
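If you're curious where these choices are saved, here's a small sketch that reads the filter configuration back off the index. The option keys ("data_alter_callbacks", "processors") and the "status" flag are assumptions based on the Search API 7.x-1.x code base.

```php
<?php
// A rough sketch: list which data alterations and processors are enabled on
// the index after saving the Filters form. The option keys and the "status"
// flag are assumptions based on the Search API 7.x-1.x code base.
$index = search_api_index_load('node_index');

foreach (array('data_alter_callbacks', 'processors') as $group) {
  if (empty($index->options[$group])) {
    continue;
  }
  foreach ($index->options[$group] as $id => $settings) {
    if (!empty($settings['status'])) {
      // e.g. "processors: search_api_html_filter"
      print $group . ': ' . $id . "\n";
    }
  }
}
```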
To check if everything has been configured successfully:
- Click the "View" tab
Here, we can see whether the index is enabled, the item type, and which server is being used. We can also see the current index status: the number of items already indexed and the number of items still to be indexed. We also have the option to manually index items; let's do that (there's also a small code sketch after these steps for doing the same programmatically).
- Index now
- Index: all
- items in batches of: 50
- (Index now)
We've successfully indexed items!
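If you'd rather not click "Index now" by hand, indexing can also be triggered from code; a hedged sketch follows. The function name and behavior are taken from the Search API 7.x-1.x code base, so double-check them against the version you have installed. (Search API also ships with Drush integration for the same task, if you prefer the command line.)

```php
<?php
// A rough sketch: trigger indexing from code instead of the "Index now" button.
// search_api_index_items() is taken from the Search API 7.x-1.x code base;
// double-check its signature against the version you have installed.
$index = search_api_index_load('node_index');

if ($index) {
  // Index up to 50 items, mirroring the batch size used in the UI.
  $indexed = search_api_index_items($index, 50);
  print 'Indexed ' . $indexed . " items.\n";
}
```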
Demonstrating Search API Database Search
I mentioned in Video 3 that I'd demonstrate how the Database Server we created uses our default database. Let me show you that now.
- I'll open my database in phpMyAdmin.
- When we scroll down to the Search section, we'll see some Search API tables, but none named anything like "search_api_db_node_index"
- I'll go back to the "Node Index" page and click the "Edit" tab (admin/config/search/search_api/index/node_index/edit)
- Now, I'll change the "Server" field to "Database Server" and click "Save Settings"
- Now, I'll click "Index now" to index the site to the database.
- When that's done, I'll return to phpMyAdmin and refresh the page
- This time, we'll see a number of tables starting with "search_api_db_node_index"
This means that Search API is using our database to store the search data. That can work well for small sites with low traffic or not much content. On larger sites, though, it will cause performance issues, affecting page loads and general site speed. So, if you need your site to be performant, you'll want to use an external search service such as Apache Solr. (If you'd like to check for these tables from code rather than phpMyAdmin, see the sketch below.)
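For completeness, here's a small sketch that lists those tables from code. db_find_tables() is part of Drupal 7 core; the "search_api_db_node_index" prefix matches the tables shown above, and sites that use a database table prefix may need to include it in the pattern.

```php
<?php
// A rough sketch: list the tables the Database search service created for our
// index, without opening phpMyAdmin. db_find_tables() is Drupal 7 core; if your
// site uses a database table prefix, include it in the pattern.
$tables = db_find_tables('search_api_db_node_index%');
foreach ($tables as $table) {
  print $table . "\n";
}
```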
The rest of this series will use Apache Solr, so I'll go ahead and edit the Node Index and switch the server back to "Apache Solr search server". Then I'll clear the cache and click the "Index now" button. Now, when I refresh phpMyAdmin, the "search_api_db_node_index" tables are gone, and we're ready to move on to the next video, where I'll show you how to create a search form.