Seamlessly switch Lucene indexes to avoid search downtime during re-indexing

(For those of you who don’t already know, John West is the Sitecore person. He wrote the book on Sitecore development, and his blog in particular is a treasure trove of Sitecore knowledge. He’s also an all-around nice guy, which I suppose fits his role as a technological evangelist; he’s always lending a hand to someone.)

During our current Sitecore upgrade, I found this gem: using the Lucene subdirectory-switching feature, via Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, allows for zero downtime when re-indexing in a production environment. Note that because the index alternates between two directories, it may be necessary to rebuild twice when implementing indexing changes, so that both copies reflect the change.
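
As a rough sketch of what the switch looks like in configuration: assuming the default sitecore_web_index definition from a stock install (the index id, and the include file it lives in, may well differ in your installation), only the type attribute of the index element changes, and the child elements stay as they are.

```xml
<!-- Sketch only: the index id is an assumption based on a default install. -->
<!-- Before: <index id="sitecore_web_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider"> -->
<index id="sitecore_web_index"
       type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
  <!-- existing child elements of the index definition are left unchanged -->
</index>
```

With this type in place, a rebuild writes to the inactive directory while searches keep hitting the active one, and the two are swapped once the rebuild finishes.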


Some brief musings on search

We’re currently in the middle of upgrading a major codebase from Sitecore 6.6 to Sitecore 7.2 (and from there to 7.5 or directly to 8.x). This has prompted a lot of soul-searching: how did we get here? Are we alone in the universe? And after a cup of coffee, some search-related thoughts have begun to emerge as well…

Solr is an exciting prospect, for sure, and the ContentSearch namespace is definitely much improved over what it’s replaced. Any pluggable architecture for any data-layer component always gives me the warm fuzzies. It’s actually kind of hot, in a strictly nerdy, programmatic sense.

Using PredicateBuilder to put together detailed queries can be a breeze. Still, the API is a bit clunky (I am apparently not alone in feeling so) and documentation seems to be a bit scattered. It seems that with a more careful eye to design, or maybe adding a bit of syntactic sugar somehow, the need for ugliness like an anchoring true-or-false condition could have been avoided, and the resulting programming style more natural.

Another persistent gripe of mine is the lack of some unification for Sitecore Query and Lucene (and now Solr) searches. The beauty of Sitecore Query, of course, is the beauty of XPath (although it may be a somewhat twisted, hobbled version, the orc to XPath-proper’s elf): a declarative syntax. That’s why it’s used in tons of places within Sitecore, for data sources and more, as well as in third-party libraries like the wonderful Glass Mapper, which we use. You can’t beat the simplicity of referring, in your complex model, to related objects using an easily composed attribute path.

The fact is, in at least some situations where Lucene/Solr are valid choices for performance reasons, the actual search could be shoehorned into an XPath-like syntax too. Food for thought? I’m currently investigating something based on this… more later.

Quirkiness accessing the site root item from a descendant using Sitecore Query

Sitecore Query, with its XPath-like syntax, has its share of quirks. It doesn’t always return the expected results, and not all functions available in later versions of XPath can be used (though it does have some very useful Sitecore-specific functions). One can tackle these issues by extending Sitecore Query to include the desired functionality, but depending on the context this may not work well. Swapping out the built-in expression evaluation would be non-trivial, though useful: for example, to merge Sitecore Query functionality with non-Sitecore search while retaining a declarative programming style, to integrate with an ORM such as Glass Mapper, or to make non-Sitecore-Query results available everywhere in Sitecore that Sitecore Query is used. That is a topic for another exploration.

Everyone who’s used Sitecore extensively has had to access the site root item. This is not tough programmatically using the Sitecore API, and is often done using extension methods. But when it’s helpful to use Sitecore Query, especially where the context item is at an undetermined depth in the content tree, one can run up against the limitations of Sitecore Query.

I first discovered this on a project using Sitecore 7.2. The obvious approach is to use the ancestor axis in the query, but using index-based queries turned out to be funky; queries with an index of 1 would return the top three items (/sitecore, /sitecore/content, and the relevant child item of /sitecore/content). I found that using the following query would return the item at the specified level plus two, in this case actually returning the site-root item I was after:

./ancestor::*[ancestor::*[@@key='content']]/.[1]

Out of curiosity I determined that the following would also work, with varying degrees of hackiness:

./ancestor::*[@@key='home']/ancestor::*[@@key != 'sitecore' and @@key != 'content']
./ancestor::*[position() = 1 and @@key != 'sitecore' and @@key != 'content']
./ancestor::*[ancestor::*[@@key='content']]/.[1]
./ancestor::*[ancestor::*[parent::content]]/.[1]

These are written for readability; one could use @@templateid instead of @@key with the template ID of the /sitecore/content item as well, or simply restrict to the template of the site home/root item itself, where that template is guaranteed to be unique in the hierarchy as it often is. The latter is what I wound up doing and it worked fine, but I came away with an enhanced respect for the sheer individuality of Sitecore Query.

Temporarily disable Sitecore client notifications to suppress unwanted new-item redirects

The default behavior, when a user kicks off any action in the Sitecore client which adds a new item, is that the client redirects to the newly created item. In some (many) cases this may be desirable, but in others it is an unwanted side effect.

In order to suppress this behavior, two approaches may be helpful. One sometimes recommended is to suppress all events while creating the new item, but this should be done with caution, since any event handlers that other parts of the environment rely on will not fire either. To do this, enclose the item-creation code in a using statement like this one:

using (new Sitecore.Data.Events.EventDisabler())
{
    // item creation code here...
}

Even better, and an approach which should be sufficient by itself, is to suppress client notifications during the item creation, like so:

Sitecore.Client.Site.Notifications.Disabled = true;
try
{
    // item creation code here...
}
finally { Sitecore.Client.Site.Notifications.Disabled = false; }

To be safe, always re-enable client notifications in a finally block, so that they are restored even if the item creation throws. Also, depending on the Sitecore version, bucketable items may need a special tweak to suppress notifications as well.

Resetting the admin password in Sitecore, and some security considerations

As of Sitecore 7, one can still reset the admin password directly in the core database by executing the following statement:

UPDATE [aspnet_Membership] SET Password='qOvF8m8F2IcWMvfOBjJYHmfLABc='   
WHERE UserId IN (
     SELECT UserId FROM [aspnet_Users] WHERE UserName = 'sitecore\Admin'
)

This resets the admin password to the default value, “b”. Obviously the password should be changed to something secure right afterward, and this highlights the need for good security at the database level, but this tip can be helpful if the password has been set to a value that’s somehow lost.

Here’s the kicker: resetting the admin password may be necessary even in environments where the Sitecore client is disabled.  Perhaps the client-disabled environment was set up with the default password, or it simply needs to be changed due to security requirements. The password may still actually be used in a client-disabled environment as well, if the admin pages are not also disabled.

If that is the case, one easy approach may be to:

  1. Change the admin password in a non-client-disabled environment.
  2. Get the stored, hashed password (and its salt) by running the following query in the core database of that environment:
    SELECT Password, PasswordSalt FROM [aspnet_Membership]
    WHERE UserId IN (
         SELECT UserId FROM [aspnet_Users] WHERE UserName = 'sitecore\Admin'
    )
  3. Copy the hash to the client-disabled environment by running the UPDATE statement above, with the new hash in place of the default one, in the core database of the client-disabled environment. The hash only validates against the salt it was created with, so if the PasswordSalt values differ between the two environments, set PasswordSalt in the same statement.

Moving data in Sitecore: publishing vs. packages and third-party software

When I began my current job, where Sitecore is used heavily and is a core piece of the company’s technology strategy going forward, the internal software landscape was fairly different from what it is today. The previous, just-departed Sitecore engineer had recommended use of software including Hedgehog TDS. Moving content between environments was a clunky and error-prone task when I arrived, but today things have been streamlined considerably due to our switch to using publishing for most content moves.

There are three main ways I’ve discovered to move substantial amounts of data in Sitecore (aside from the transfer-item wizard and other only tangentially relevant methods): packages; third-party software, of which Hedgehog TDS is definitely a primary example; and publishing. A brief description of the advantages we’ve found with each follows.

Packages: built-in, dependable, but inconvenient for frequent small updates

At my company, due to our use of Hedgehog TDS at the beginning, no one had yet used packages when I first arrived. Since then we have changed somewhat; for very large content and template/system updates, or for other reasons, we do sometimes use packages. The built-in tools for working with packages work fine, but it’s easier to create them using Sitecore Rocks, which I may review in a separate post.

Advantages of packages include dependability; ability to move content and structure past a network boundary where for security or other reasons publishing can’t be used; and avoidance of tying up the publishing pipeline. (Packages have other uses as well, such as backing up subsets of data from a Sitecore container in an easy way, but that’s not relevant here.)

The main disadvantage with using packages is inconvenience. One will usually not need a data save/backup point for every deployment of content or code, and when one can publish items, including template items, in seconds, time spent creating packages may be wasted.

A third-party tool was unnecessary for us and relatively complicated

When I arrived on the scene, Hedgehog TDS was in use at my company. Eventually, after setting up a full set of publishing targets between our various environments, we stopped using TDS (though we are evaluating a later version for use again today).

When my workstation was set up, it turned out that we didn’t have an installation file for the version the rest of the team was using. I downloaded the latest version of TDS available on the Hedgehog site and installed it, which seemed to go smoothly. Unfortunately, when trying to use the TDS plugin from within Visual Studio against our mixed Sitecore 6.5 and 6.6 environment, I would get error messages related to a web service apparently being unavailable. We were unable to get this problem resolved quickly in our environment, though of course not all users would have had the same experience with TDS.

Such third-party software may offer other benefits besides just moving content. In the case of Hedgehog TDS, its scheme of exposing items as serialized data on the file system opens up the possibility of easy integration with source control systems such as git, and its merging tools were useful too. The director of my department also received information from a potential DMS consultant, whom we wound up not hiring, that TDS would greatly simplify working with DMS; I have no way of weighing in on this, and in any event it’s not relevant to simply moving data.

The current version of TDS may be much different, and mileage may vary. I’m not out to slam the product or the company; we simply had a poor experience with stability, likely a temporary one, on an older version of TDS.

Publishing between environments

Most Sitecore users are familiar with publishing content from content entry to content delivery environments. However, publishing can be used for more than that. Since system settings, layouts, templates, etc. are stored as items as well (though some may be stored in the core database instead of master), they are all inherently movable by the publishing mechanism built into Sitecore.

Publishing has a vast advantage in simplicity and ease of use over some other methods. When moving content and templates from development to content entry environments, for example, it’s usually simpler to click on an ancestor item and publish than to create packages. A main drawback, of course, is that one may lose version control (although the Sitecore databases should of course be backed up frequently and allow for disaster recovery accordingly). Also, unpublishable items cannot be moved this way without either 1) setting them to publishable, moving them, and resetting them again, or 2) finding a method of moving them that doesn’t require changing their publishability, such as using packages, which implies finding them first via a query in a complex data environment. And, of course, the Sitecore limitation of publishing one item at a time means that pushing large amounts of data this way could potentially interfere with content authors during the workday. As such, the publishing method is a “low-tech”, manual way of moving data in a snap, not a replacement for a fully automated solution.

To add new publishing targets, first create a database connection in ConnectionStrings.config to point to the target master database. Then add the database to the <databases> portion of Web.config. Create a new publishing target item in the content editor, and you’re ready to start moving data the easy way.
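
The two config-file additions might look roughly like the following. This is a sketch rather than a drop-in configuration: the name staging_master, the server, and the credentials are hypothetical placeholders, and the database element is modeled on the stock web definition in Web.config, which is usually the easiest thing to copy and rename.

```xml
<!-- App_Config/ConnectionStrings.config: connection to the target environment's master database.
     "staging_master" and all connection values are hypothetical placeholders. -->
<add name="staging_master"
     connectionString="user id=USER;password=PASSWORD;Data Source=TARGET_SERVER;Database=TARGET_Sitecore_Master" />

<!-- Web.config, inside the databases element: a copy of the existing "web" database
     definition with only the id changed. The id must match the connection string name above. -->
<database id="staging_master" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
  <param desc="name">$(id)</param>
  <!-- remaining child elements copied unchanged from the "web" definition -->
</database>
```

The publishing target item itself goes under /sitecore/system/Publishing targets, and its Target database field should contain the same id (staging_master in this sketch).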

In our current setup, we have two parallel environments, one on Sitecore 6.6 and one on Sitecore 7.2; each version has a development (master-core-web), staging/content entry (master-core-web), and content delivery (core-web) environment. Before upgrading to Sitecore 7, these environments were used to run 6.5 and 6.6 side-by-side; after setting up the websites, I created publishing targets between master databases of the same stage between environments, so that for instance, the content entry 6.5 container would have a pointer to the content entry 6.6 container. These publishing targets were used to migrate data between versions, which went amazingly well.

Today, we routinely use publishing to move data between our shared development Sitecore system and content entry system, as well as from content entry to content delivery. We move content, templates, layouts and sublayouts, media items, etc. and it works very well. Below is a simplified diagram showing our environment as it was when we ran side-by-side 6.5 and 6.6 installations; today the environment has changed a bit more and we use version 7.2 and 6.6, but this gives an idea of how it was originally set up.

[Diagram: publishing target overview]

Tips for setting up publishing targets to link Sitecore environments

1. For each pair of environments between which you must move data, both content and structure, consider setting up publishing targets to connect them. Decide whether forward-only publishing is desired, or whether bidirectional publishing may be helpful. (We have this set up between our content entry and development environments, so that any tweaks to the content entry environment can be back-published to development, and so that the content in the dev environment can be refreshed periodically in an easy way.) Consider any cross-version issues that may arise as well; run tests to see if publishing between major versions would result in unwanted effects.

2. In general, set up publishing targets to point to the master database in the target environment. Give each database connection and publishing target for a new inter-environment connection a descriptive name, to differentiate it from the source environment’s master database.

3. Since each database connection for publishing use will consume memory, consider setting caching options much lower than for other Sitecore database connections. See the applicable Sitecore caching reference for more info on the settings, which are beyond the scope of this article.

4. Hide the new publishing targets from non-admins by hiding or otherwise read-protecting the publishing target item in /sitecore/system.

5. Since publishing large amounts of content will block the publishing pipeline, postpone larger publishing jobs until off hours or consider using a package.

6. Remember, publishing will obviously not move unpublishable items. Consider running a Sitecore Rocks or other query to find those items and move them with a package instead (easier), or set them to publishable, move them, then set them unpublishable again (harder).

Looking forward to Continuous Integration

My team has been evaluating Continuous Integration (CI) tools, and one will be implemented in the coming weeks. This is a positive change for us, and among other benefits will further ease the deployment burden. During the evaluation phase the director of my unit asked whether Hedgehog TDS would be a wise choice to re-evaluate, and we are in the middle of doing that. The Sitecore serialization API provides an easy enough hook for CI tools, so there’s no obstacle preventing use of CI without third-party software, but TDS may be useful in other ways as well. After we implement CI, we will still use publishing to move content on an ad-hoc basis, but may use CI tools to deploy new code, templates and other components side-by-side.

To sum up, the publishing mechanism can be used to good effect for moving small-to-moderate quantities of content and system data. It offers great flexibility and ease of use, and is relatively easy to set up with a few config-file additions. While my team is going in the direction of a more regimented approach with CI, the publishing method has had its uses for quick, easy deployments.