Lars Nielsen's Discoveries

July 30, 2012

SharePoint 2010 Search Troubleshooting

Filed under: Search,SharePoint,Troubleshooting — Lars Nielsen @ 7:02 pm

Here are some troubleshooting tips for when you hit problems setting up SharePoint 2010 Search content sources.

If you’re using SSL (HTTPS) for the URL of your site then you need to use SPS3S, not SPS3, as the prefix for the address in the content source. Thanks to this thread for that tip. If you don’t, you get this error in the crawl log:

The object was not found
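
The prefix choice can be written down as a small helper. This is only an illustrative Python sketch: the function name profile_start_address and the host mysites.example.com are made up for the example, and all it does is swap the scheme of the site URL for sps3 or sps3s.

```python
from urllib.parse import urlsplit


def profile_start_address(site_url: str) -> str:
    """Return a content source start address for the given site URL.

    Hypothetical helper: HTTPS sites get the sps3s prefix,
    plain HTTP sites get sps3.
    """
    parts = urlsplit(site_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http or https URL, got {site_url!r}")
    prefix = "sps3s" if parts.scheme == "https" else "sps3"
    # Keep host (and any path), drop a trailing slash.
    return f"{prefix}://{parts.netloc}{parts.path}".rstrip("/")


if __name__ == "__main__":
    print(profile_start_address("https://mysites.example.com"))
    print(profile_start_address("http://mysites.example.com/"))
```

The value returned is what would go into the start address box of the content source in place of the https:// or http:// URL.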

You might also see the following error when trying to crawl My Sites and/or when using SPS3 or SPS3S:

Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has “Full Read” permissions on the SharePoint Web Application being crawled. ( HttpStatusCode Unauthorized The request failed with HTTP status 401: Unauthorized. )

In this case you need to give the crawler's content access account permission to access the User Profile Service Application. Thanks to this post from Cory Roth for the answer:

In Manage Service Applications, select the User Profile Service Application.

Click the Administrators button.

Add an entry for the search crawler Content Access account and give it the Retrieve People Data for Search Crawlers permission.

Search Crawl account on User Profile Service


February 19, 2012

Create a search scope for a Site Directory

Filed under: Administration,Search,SharePoint — Lars Nielsen @ 6:52 pm

Recently I needed to create a custom search page to search the Site Directory in SharePoint. The Site Directory contains a list called “Sites” (internal name SitesList) which tracks all new sites as they are created. I had already set up this list with columns to capture metadata about each site.

To get the search to work I wanted to create a new search scope in the site collection that searches only the Sites list in the Site Directory. To do this I created the new scope in the site collection settings, added a Folder rule pointing to the URL of the list (e.g. http://my-domain/SiteDirectory/SitesList) and set its behaviour to Require. When this scope updated, it brought in many more items than there are in the list. Perhaps it also brings in the list view ASPX pages; I’m not sure. Anyway, I’ve seen that before, and the solution I found is to add another rule that restricts the scope to actual items in the list.

To do this, you can create another rule that filters on either the ContentType or the contentclass property, and set this second rule to Require behaviour as well. The resulting scope will be the intersection of the two rules. I used the contentclass property, and found a good list of possible values here. But it took a while to find out what to use for contentclass for items in the Sites list. Eventually an answer from Steve Curran to this question gave me what I needed. You need to use a rule like this:

contentclass = STS_ListItem_300

So the whole scope configuration looks like this:

Site Directory Scopes screenshot

This will restrict the scope to items in the Sites list in a Site Directory.
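
To make the "intersection of two Require rules" idea concrete, here is a small Python model of it. It is not SharePoint code and does not call any SharePoint API: the IndexedItem class, the helper names and the sample URLs are invented for illustration, and the contentclass value given to the view page is a placeholder, since the post does not say what such pages actually carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedItem:
    """Hypothetical stand-in for one crawled item."""
    url: str
    contentclass: str


def in_folder(item: IndexedItem, folder_url: str) -> bool:
    """Model of the Folder rule: the item URL is at or below folder_url."""
    base = folder_url.rstrip("/").lower()
    url = item.url.lower()
    return url == base or url.startswith(base + "/")


def in_scope(item: IndexedItem, folder_url: str,
             required_class: str = "STS_ListItem_300") -> bool:
    """Both rules are Require, so an item must satisfy both."""
    return in_folder(item, folder_url) and item.contentclass == required_class


FOLDER_URL = "http://my-domain/SiteDirectory/SitesList"

# Made-up sample data; "PLACEHOLDER_ViewPage" is not a real contentclass.
SAMPLE_ITEMS = [
    IndexedItem(FOLDER_URL + "/DispForm.aspx?ID=1", "STS_ListItem_300"),
    IndexedItem(FOLDER_URL + "/AllItems.aspx", "PLACEHOLDER_ViewPage"),
    IndexedItem("http://my-domain/OtherDirectory/SitesList/DispForm.aspx?ID=7",
                "STS_ListItem_300"),
]

if __name__ == "__main__":
    for item in SAMPLE_ITEMS:
        print(in_scope(item, FOLDER_URL), item.url)
```

With the Folder rule alone, the first two sample items match; adding the contentclass rule leaves only the actual list item.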

April 15, 2011

Crawl log errors and resolutions in Search

Filed under: Search,SharePoint,Troubleshooting — Lars Nielsen @ 9:58 pm

I was recently going through the crawl log and fixing errors – here are some of the error messages in the crawl log, and resolutions:

Error in the Microsoft Windows SharePoint Services Protocol Handler

This was an attempt to crawl a non-existent ASPX page.  Click on the link and you get a 404 error.  I’m not sure why it’s trying to crawl a deleted page.  Perhaps a broken link somewhere?

The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly

This was an attempt to crawl a broken ASPX page: in this case a view of a calendar that gave a .NET error when you clicked on the link in the crawl log. The resolution is to delete the ASPX view page from the list using SharePoint Designer.

The parameter is incorrect

The crawler was trying to index a Word 2007 Template File (.dotx) instead of a regular Word .docx file.

Error in the Site Data Web Service

The target site was present (you can open it in SharePoint Designer) but its welcome page was pointing to a non-existent page, so you could not browse to it. The resolution is to go to the Site Settings page directly by appending “/_layouts/settings.aspx” to the end of the site root URL, e.g. “http://mydomain/site/subsite/_layouts/settings.aspx”, then click on “Welcome Page” and reset the welcome page to point to a real page so that you can browse to the site.
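
If several sites need this fix, the settings URL can be built rather than typed. A minimal Python sketch follows; the function name site_settings_url is hypothetical, and the only thing it encodes is the “/_layouts/settings.aspx” suffix mentioned above.

```python
def site_settings_url(site_root_url: str) -> str:
    """Append the Site Settings page path to a site root URL."""
    # Strip any trailing slash so the result has exactly one separator.
    return site_root_url.rstrip("/") + "/_layouts/settings.aspx"


if __name__ == "__main__":
    print(site_settings_url("http://mydomain/site/subsite/"))
```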

Finally, I had a problem where the crawler was crawling content OK. I could find documents and pages by their URL in the crawl log, and the log said that they had been crawled successfully. But when you searched for the items by keyword in the search box, no results were returned. The problem seemed to be that the index was not updating properly. I used the Reset All Crawled Content page in Search Administration in Central Administration, followed by a full crawl, and everything was OK.

January 13, 2010

Unrecognized HTTP response error on search crawl log

Filed under: Search,Troubleshooting — Lars Nielsen @ 6:34 pm

I recently created a web application using SSL and a wildcard certificate on the server.  After some debate with our network people we established that we needed a separate IP address for each web application.  I did the same for My Sites, creating a separate web app and attaching this to the SSP as the MySite provider using SSL, again with its own IP.  Everything went pretty well once I got my head around the new IIS GUI with IIS  7. 

The only problem I had came when I tried to get search to crawl the new sites. I was getting errors in the application event log:

Source: Office Server Search
Event ID:      2436
Description:  The start address <https://mywebapp> cannot be crawled.
Details:   An unrecognized HTTP response was received when attempting to crawl this item. Verify whether the item can be accessed using your browser.   (0x80041204)

I first tried going into the Search Administration in the SSP in Central Admin, and going to the Proxy and Timeouts screen, and putting in the IP address and port of our proxy server in the Proxy Server Settings section. But that didn’t resolve it, so I removed the proxy settings and instead checked the box at the bottom of the screen (Ignore SSL certificate name warnings):

Proxy and Timeouts

This has resolved the error for now, although I need to find out why the crawler is being blocked unless I set it to ignore certificate warnings – might be because we’re using wildcard certificates?
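
One thing worth checking when wildcard certificates are involved is whether the host name in the start address is actually covered by the certificate name. A wildcard stands in for a single leftmost label only, so a short host name, or one with an extra subdomain level, does not match. The Python sketch below illustrates just that matching rule; the function name wildcard_covers and the example.com names are invented, and it is not confirmed that this is what caused the error described here.

```python
def wildcard_covers(cert_name: str, host: str) -> bool:
    """Return True if cert_name (e.g. "*.example.com") covers host.

    Simplified illustration: a leading "*" matches exactly one
    non-empty label; all other labels must match exactly.
    """
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = host.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        return host_labels[0] != "" and cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


if __name__ == "__main__":
    for host in ("mysites.example.com", "example.com",
                 "a.b.example.com", "mywebapp"):
        print(host, wildcard_covers("*.example.com", host))
```

If the crawler is given a start address whose host name falls outside the wildcard, a certificate name warning would be the expected result, and the Ignore SSL certificate name warnings box would suppress it.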
