Hello, and welcome. I’m starting this blog with a technical article about an important feature of the site: the search. For a database site such as VocaDB, a properly working search function is essential. Users should be able to find what they’re looking for quickly and reliably. It might seem simple, but implementing a search engine can be a lot of work, and there are tradeoffs, especially between search accuracy and performance. Here I’ll try to explain how the search function works on VocaDB and also give a few tips on using it efficiently.
First, some basics. Because the search function is essential, it’s placed at the top of the page and stays visible when you scroll. By default you can search all types of entries (albums, artists and songs), but if you get too many results, you can narrow down your search to a specific category, such as albums. The search can also find users and tags. The autocomplete field suggests possible matches as you type. Clicking a suggestion or pressing the search button takes you to the detailed search results page, but if there’s only one matching entry, you’re redirected straight to that entry.
On VocaDB, text searches support a variety of “match modes”:
- Auto mode automatically selects the “best” match mode based on input. This is the default, but it isn’t a matching strategy of its own: Auto mode always resolves to one of the concrete match modes below.
- Exact only allows exact matches. For example, the search “Hatsune Miku” only matches entries with that title, not Hatsune Miku Append.
- Prefix (or StartsWith) allows matches that start with the given search term. For example, “Hatsune Miku” also matches “Hatsune Miku Append”, but not “ELECTLOID feat. Hatsune Miku”.
- Contains matches the term anywhere in the title, instead of just at the beginning. This mode has worse performance than exact or prefix, but it’s also a lot more powerful.
- Words is the most complex mode. It breaks the search term down by whitespace into words and searches by every word individually. In this mode, the order of words doesn’t matter. For example, this is the only mode that matches both “Hatsune Miku” and “Miku Hatsune” when searching with the term “Hatsune Miku”.
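To make the differences between the modes concrete, here is a minimal sketch of the four concrete match modes as plain string comparisons. This is not VocaDB’s actual code: the names MatchMode and matches are made up for illustration, the comparison is done case-insensitively, and I’m assuming that in the words mode each word may appear anywhere in the title.

```python
from enum import Enum


class MatchMode(Enum):
    EXACT = "exact"
    PREFIX = "prefix"
    CONTAINS = "contains"
    WORDS = "words"


def matches(name: str, term: str, mode: MatchMode) -> bool:
    """Return True if the entry name matches the term under the given mode."""
    # Compare case-insensitively in every mode.
    n, t = name.casefold(), term.casefold()
    if mode is MatchMode.EXACT:
        return n == t
    if mode is MatchMode.PREFIX:
        return n.startswith(t)
    if mode is MatchMode.CONTAINS:
        return t in n
    # WORDS: split the term on whitespace; every word must occur in the
    # name, in any order (assumption: each word is a substring match).
    return all(word in n for word in t.split())
```

With this sketch, matches("Miku Hatsune", "Hatsune Miku", MatchMode.WORDS) is true, while the same pair fails under MatchMode.CONTAINS because the words are in a different order.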
As said, in most cases auto mode is the default. For short terms (1 or 2 characters) the auto mode uses the prefix match mode. For longer terms it generally uses the words match mode.
If you’re only looking for entries whose title starts with the given term, you can force the search box to use the prefix match mode by adding an asterisk at the end. For example, “Hatsune Miku*” (note the asterisk) searches by the term “Hatsune Miku”, but using the prefix match mode instead of words. Therefore, you only get results that start with the term. Likewise, by surrounding the search term with quotation marks you only get exact matches, meaning that “Hatsune Miku” (in quotes) would find only one result. These can be very useful for short names that would otherwise match too many entries.
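The rules for picking a match mode from what you type can be sketched roughly like this. The function name resolve_query and the mode names are hypothetical, and the sketch only handles straight double quotes; the real search box may treat edge cases differently (remember that for longer terms auto mode only “generally” picks words).

```python
def resolve_query(query: str) -> tuple[str, str]:
    """Split a raw search box query into (term, match mode name)."""
    q = query.strip()
    # Surrounding quotation marks force the exact match mode.
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return q[1:-1], "exact"
    # A trailing asterisk forces the prefix match mode.
    if len(q) > 1 and q.endswith("*"):
        return q[:-1], "prefix"
    # Auto mode: short terms (1 or 2 characters) use prefix,
    # longer terms use words.
    return q, ("prefix" if len(q) <= 2 else "words")
```

For example, resolve_query('Hatsune Miku*') gives the term “Hatsune Miku” with the prefix mode, and resolve_query('Mi') also ends up in the prefix mode because the term is so short.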
An important note is that all matches, even with the exact match mode, are case-insensitive.
In addition to matching the title of the entry, most types of searches have additional filter options. In most cases we try to make these options available through the UI. For example, the song search is able to filter results by artist. Album, artist and song searches are also able to filter by tag, but at the moment there is no UI for that: you have to do it manually by typing “tag:name of tag” as the search term, for example “tag:vocarock” (note that this syntax is subject to change). Unfortunately, at the moment, for performance reasons, most search filters are mutually exclusive: you can’t filter by artist while also searching by keyword. We’re trying to make this possible in the future.
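As a rough illustration of the “tag:” syntax and of filters being mutually exclusive, the search box input could be split like this. SearchRequest and parse_search_box are hypothetical names, and treating the prefix case-insensitively is my assumption for the sketch.

```python
from typing import NamedTuple, Optional


class SearchRequest(NamedTuple):
    text: Optional[str]  # keyword search term, if any
    tag: Optional[str]   # tag filter, if any


TAG_PREFIX = "tag:"


def parse_search_box(query: str) -> SearchRequest:
    """Interpret the query as either a tag filter or a keyword search."""
    q = query.strip()
    if q.lower().startswith(TAG_PREFIX):
        # Everything after the prefix is the tag name; no keyword search.
        return SearchRequest(text=None, tag=q[len(TAG_PREFIX):].strip())
    return SearchRequest(text=q, tag=None)
```

Note that the result never has both fields set, which mirrors the current limitation: you get either a tag filter or a keyword search, not both.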
Artist search and the “P” suffix
It is very common to give Vocaloid artists a “P” (short for “producer”) name, usually based on some of their famous early works. Generally this means the capital letter “P” is appended to the end of an artist name that the fans have come up with, for example VocaliodP. However, there are a couple of issues with the P suffix. It is often completely acceptable to omit the P suffix and use the artist name without it. There is also some variation in how the P suffix is added. Sometimes it’s appended right after the artist name with no whitespace or separator characters, but it’s also common to see spellings such as Vocaliod P and Vocaliod-P. Because of this, when searching for artists, VocaDB search automatically removes the P suffix (including common separators) from the search term and matches without it. Thus, the terms “VocaliodP”, “Vocaliod” and “Vocaliod-P” are all equivalent and find the intended artist. This causes other issues when the artist’s name actually ends with a “p”, but these cases are a lot rarer than the other search issues caused by the P suffix.
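Stripping the suffix could look something like the sketch below. This is only an illustration: strip_p_suffix is a made-up name, I’m assuming the “common separators” are whitespace and hyphens, and I’m assuming the suffix is matched case-insensitively.

```python
import re

# Optional separators (whitespace or hyphens) followed by a final "P".
# Assumption: the suffix is matched regardless of case.
_P_SUFFIX = re.compile(r"[\s\-]*p$", re.IGNORECASE)


def strip_p_suffix(name: str) -> str:
    """Remove a trailing P suffix and its separator from an artist name."""
    trimmed = name.strip()
    stripped = _P_SUFFIX.sub("", trimmed)
    # Never reduce the term to nothing (for example, a bare "P").
    return stripped or trimmed
```

All three spellings, “VocaliodP”, “Vocaliod P” and “Vocaliod-P”, reduce to “Vocaliod” here. The sketch also shows the downside: any name that really ends with “p” loses its last letter too.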
Super secret unofficial wildcards
As a side effect of the database implementation, most searches accept the underscore “_” as a placeholder that matches any single character. For example, “Mik_” matches “Miku” and “Miki”. A percent sign “%” matches any number of characters. Finally, you’re able to specify a range of characters in brackets, for example “Mik[a-z]” also matches both “Miku” and “Miki”. It’s important to note that at the moment these wildcards are unofficial and might change at any time, so use them at your own risk.
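These wildcards look like the pattern syntax of a database LIKE comparison. To show how they behave, here is a small sketch that translates such a pattern into a Python regular expression. The helper names are hypothetical, the pattern is matched against the whole name for simplicity, and the database’s own matching may differ in details (escaping, collation and so on).

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern ("_", "%", "[a-z]") into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "_":
            parts.append(".")    # exactly one character
        elif ch == "%":
            parts.append(".*")   # any number of characters
        elif ch == "[" and pattern.find("]", i + 2) != -1:
            # Bracketed set or range, copied into a regex character class.
            end = pattern.find("]", i + 2)
            body = pattern[i + 1:end].replace("\\", "\\\\")
            parts.append("[" + body + "]")
            i = end
        else:
            parts.append(re.escape(ch))  # everything else is literal
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def like_match(pattern: str, name: str) -> bool:
    """Return True if the whole name matches the LIKE-style pattern."""
    return like_to_regex(pattern).fullmatch(name) is not None
```

Here like_match("Mik_", "Miku") and like_match("Mik[a-z]", "Miki") are both true, while like_match("Mik_", "Mik") is false because the underscore needs exactly one character.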
Main limitations and future developments
There are some issues with the current search feature. At the moment different types of apostrophes and various other special characters aren’t converted to a common form, which might cause confusion. Take the entry for koma’n, for example: both the ‘ and ’ characters are used in the name, but the system does not consider them equivalent.
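One possible way to address this (not something the search does today) would be to normalize the apostrophe variants in both the stored names and the search term before comparing. A minimal sketch, where the list of characters is my own assumption and certainly not complete:

```python
# Map a few typographic apostrophe variants to the plain ASCII apostrophe.
# The choice of characters is an assumption made for this sketch.
_APOSTROPHES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u02bc": "'",  # modifier letter apostrophe
    "\uff07": "'",  # fullwidth apostrophe
})


def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe look-alikes so that equivalent names compare equal."""
    return text.translate(_APOSTROPHES)
```

After normalization, the two spellings of the name with ‘ and ’ become the same string and would match each other.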
As mentioned before, combining different types of search filters is not possible. Filtering by multiple terms, each potentially returning a large number of results, can be very complicated and requires reworking the search engine. It’s not impossible, however. An experimental search engine for albums, called the “advanced search”, exists, but currently the performance can be really bad if one of the filters produces a huge number of results, and therefore the feature is still considered to be in development. We’re looking into the possibility of implementing something called a full text index, possibly based on a library called Lucene, that would allow rapid textual queries by multiple types of filters. The goal is to be able to write a query such as “artist:"hatsune miku" tag:trance”, which would find all trance songs with Hatsune Miku.
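To give an idea of what such a query syntax involves, here is a sketch of splitting a query like the one above into filters. This only covers the parsing step, not the full text index itself, and the syntax doesn’t exist yet, so everything here (the function name parse_query, the “text” key for plain keywords) is hypothetical.

```python
import re

# An optional "key:" followed by either a "quoted phrase" or a bare word.
_TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse_query(query: str) -> dict[str, list[str]]:
    """Group the parts of a query by filter name."""
    filters: dict[str, list[str]] = {}
    for key, quoted, bare in _TOKEN.findall(query):
        value = quoted if quoted else bare
        # Words without a "key:" prefix are plain keywords.
        filters.setdefault(key or "text", []).append(value)
    return filters
```

With this sketch, parse_query('artist:"hatsune miku" tag:trance') returns one “artist” filter with the value “hatsune miku” and one “tag” filter with the value “trance”. Each filter could then be handed to the index separately and the results combined.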