
February 14

Index as Dedicated Front End for Crawls?

Hmm, Joelo raises an interesting point regarding the use of the Index server as a dedicated WFE. There is already a known problem: dedicated WFEs are not everything they should be, due to a silly implementation that depends on the hosts file [which has also been discussed in detail]. Aside from that, IMHO you can manually configure a more robust solution. You can even use multiple front-end servers for dedicated crawling if necessary, just by using two hostnames ("crawlmeplease" vs. "companyportal") and load balancing access accordingly. But that's another story.

Generally, I look at the network as something "available" and CPU as something "scarce". However, I've been in situations where the opposite was the case [all traffic going through the same switch on a busted network card, i.e. 10 Mbit/sec]. Thus, in my opinion, trying to limit "hops" is not really important [there is no real impact on the SQL server, as the content has to be retrieved anyway].

Frankly, I think that the Index server is the weakest link in the V3 platform [MOSS 2007]. It should be given as much elbow room as possible, especially in large farms with a lot of difficult content to index. I also think a dedicated WFE should not be the index server, based on at least the following criteria:

  • Formula: measure the different activities your server performs, from crawling, to indexing, to propagation, along with general server stats [CPU, memory, disk I/O, network], on both servers in the mix.
  • I bet you $2 that crawling [even a full crawl] will take about 1/10 of the time, and the rest will be content retrieval and indexing. Propagation of the index will be fast as well. If you have a ton of content, separation may be advisable: it is not really a network problem but a CPU problem.
  • Depending on the crazy content stored in your site [e.g. PDF files], the indexing part may take forever. This is especially true if you have a lot of lame IFilters that are single-threaded and crash in the middle of a document, or can't address memory, or fail in some other way. Verdict: separate the index server from the WFE.
  • Perhaps another reason for slowness [this time in the WFE, not the index server] is poorly written ASPX pages and Web Parts that someone insists on indexing. The index server could perhaps deal with 20 of them at a time, but the WFE can only serve 5 simultaneously without spiking the CPU. Verdict: separate the WFE from the index server.

But don't trust my observations; please make these yourself. It may be that your Index server has enough capacity, and separation [or even a dedicated crawl WFE] is not necessary at all. Meanwhile, if you have a chance, encourage MS to invest some money in the ability to distribute:

  • crawling
  • indexing
  • index merging

into some kind of "computing cluster". That would be neat.

January 23

Boatload of resources released

Finally, the long-awaited documentation and samples have been released. I was expecting them to be posted at least a week ago, but they are welcome anyway.

Plus, I wanted to congratulate the MSDN and TechNet teams on providing excellent documentation. The current MOSS/WSS blogosphere still seems to be playing catch-up with the documentation, which IMHO means that they did a great job. My favorite starting point [when dealing with a task I haven't done before] is the "How Do I..." section. It typically contains a wealth of conceptual information along with a code sample to get a particular task done. Also, from past experience, when you send some feedback it typically gets implemented [although after some cycles].

My favorite developer/architect resources:

SharePoint Server 2007 SDK: Software Development Kit and Enterprise Content Management Starter Kit

The MOSS SDK provides a ton of useful starting points for workflow and records management and a couple of white papers. It also includes everything from the WSS SDK [see the links for the 3 fixes required for a complete installation with VS] and much more. NOTE: the online version of the MOSS SDK is here

Windows SharePoint Services 3.0: Software Development Kit (SDK)

Please read the detailed installation instructions for WSS [if you don't follow them, the workflow project types won't show up, among other things]. NOTE: the online version of the WSS SDK is here

WSS SDK contains help on:

  • Web Part Framework
  • Server-side object model
  • Web Services
  • CAML
  • Master Pages
  • Workflows [and workflow templates for VS 2005]
  • Custom Field Types
  • Information Rights Management
  • Document property info
  • Search

My favorite IT Pro/architect resources:

...well... there is just TechNet and many, many blogs.

Planning and architecture for Office SharePoint Server 2007

Deep inside this guide there is a very cool toolkit for load testing [among other things, such as pre-loading]. This toolkit deserves a special shout-out. If you are developing on SharePoint [and are not necessarily an admin], you can use it to populate portals and sites with list and document data, run your tests, and then delete the data. It's a great utility if you need to do heavy unit and regression testing. Read about the Tools for performance and capacity planning here.

Microsoft Office SharePoint Server TechCenter

In the past, the "admin" help was downloadable, but now it is not. I hope someone compiles the TechNet planning guides into a comprehensive admin help file.

Instead, it is packed into some weird books:
Planning and architecture for Office SharePoint Server 2007

Deployment for Office SharePoint Server 2007

Planning and architecture, deployment, and operations for Office SharePoint Server 2007 for Search

SharePoint vs. File Shares

Each time I venture into mid-size and enterprise consulting, the same question comes up: should SharePoint replace file shares, i.e. the typical Z: or G: drive mapped across an enterprise? Now, joelo has added some more discussion in his File Servers and SharePoint Doc Libraries entry. I just wanted to add some more caveats where SharePoint simply may not work as well. I also want to cover the places and reasons where I recommend SharePoint 200%.

Potential Problem Areas [not addressed before]

  1. Long Paths - WebDAV has a limit of 260 characters, and so does MOSS/WSS [or less]. I can't seem to find any references for 2007 on TechNet, so here is an old one. You can also run a quick test: create folders named "Really Long Folder Name 123456" nested 8 levels deep, and an error will occur. Such a long path name is rare, but it is still possible [although many other systems will have the same issue]. Recommendation: check paths/lengths before migrating.
  2. Weird files with inter-file references - the most common problem here is Excel spreadsheets peppered with references to data stored in other Excel files, or help-style references to other documents. These will be difficult to migrate into SharePoint [or any other system], as the references typically use an absolute file path. If you migrate these into SharePoint, expect to make a lot of modifications. If you discover these files, make sure you add proper clean-up time to your project. Recommendation: you will have to clean up no matter what solution you choose; this is a brittle design anyway.
  3. Weird file names - occasionally users will have file names like test...txt or test#%.txt, as a requirement of some application or as part of a file naming standard. All of these are a no-no [there is an official list of disallowed characters in the SPS 2003 admin guide, if I recall correctly]. Also, folder and file names must be 128 characters or less. Recommendation: check file names, and figure out a decent way of substituting these characters without making an embarrassing mistake.
  4. Weird applications that do crazy file locking à la Access - obviously Access is just one of the applications that depend on the file system beyond plain storage [others would include some engineering documents, or streaming audio]. These will simply not benefit from SharePoint because they depend on the OS file system to perform [e.g. Access creates lock files; streaming audio works better when only chunks of a file are sent]. For these types of files, I'd recommend using a specific application server, or retaining a standard file share.
  5. Weird applications that use and create files - occasionally, the files created by users are also reused by other enterprise applications. As part of your project plan, test that the 3rd-party application can actually work with SharePoint.
  6. Security - SharePoint security is simply different from what you're used to in the file system. You can't really set a "deny", and some other, rarely used options are missing as well. If you happen to rely on one of those security options, you must rethink. Recommendation: check if security can be refactored or updated to fit SharePoint; otherwise keep the file share.
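Both the path-length check (item 1) and the file-name check (item 3) are easy to script before a migration. Below is a minimal C# sketch. It assumes the 260-character URL limit and the 128-character name limit mentioned above. The list of invalid characters and the rules about periods are my assumption, based on the WSS 3.0 restrictions, so verify them against the official list. `MigrationPreCheck` and the target library URL are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pre-migration checker. The limits and the character list are
// assumptions to verify against the official WSS/MOSS documentation.
public static class MigrationPreCheck
{
    // Assumed limits: 260 characters for the full URL, 128 for a single file or folder name.
    public const int MaxUrlLength = 260;
    public const int MaxNameLength = 128;

    // Assumed set of characters that SharePoint rejects in file and folder names.
    private static readonly char[] InvalidChars =
        { '~', '"', '#', '%', '&', '*', ':', '<', '>', '?', '\\', '{', '|', '}' };

    // Returns human-readable problems for one file-share path (relative to the share),
    // given the (hypothetical) document library URL it will be migrated into.
    public static List<string> Check(string targetLibraryUrl, string relativePath)
    {
        var problems = new List<string>();

        // Normalize Windows separators to URL separators before measuring.
        string urlPath = relativePath.Replace('\\', '/').Trim('/');
        string fullUrl = targetLibraryUrl.TrimEnd('/') + "/" + urlPath;
        if (fullUrl.Length > MaxUrlLength)
            problems.Add($"URL too long ({fullUrl.Length} chars): {fullUrl}");

        // Check every folder level and the file name itself.
        foreach (string name in urlPath.Split('/'))
        {
            if (name.Length > MaxNameLength)
                problems.Add($"Name longer than {MaxNameLength} chars: {name}");
            if (name.IndexOfAny(InvalidChars) >= 0)
                problems.Add($"Invalid character in name: {name}");
            if (name.Contains("..") || name.StartsWith(".") || name.EndsWith("."))
                problems.Add($"Invalid use of periods in name: {name}");
        }
        return problems;
    }
}
```

In practice, you would feed it every path from a recursive listing of the share and review the reported problems before moving anything.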

My top 5 super-duper benefits of using SharePoint are below. Just one piece of advice: when planning on using these features, plan on training your users to leverage them. It should not take more than a 1-2 hour session [in the area of file usage].

  1. Versioning, publishing, workflow - especially useful when working in a group. Every place that uses a standard file system runs into file editing collisions [multiple simultaneous edits], or ends up with silly file names ["proposal version 1 tom.doc", "proposal Mary v2 -review.doc"]. This just leads to confusion and long-term disaster. Recommendation: use SharePoint [MOSS if workflow is needed] and deploy versioning [with some pruning]. Savings: 2-3 hours a week per person in reconciliation and approval.
  2. Backup/Recycle bin - a typical problem with a standard file share, or even a local My Documents folder, is accidental deletion. It takes 3 days for someone to realize a file was deleted, and then a week to get it back. With SharePoint, you get it back immediately [or fairly quickly, even if you deleted it from the recycle bin and have to ask an admin]. When you save files locally, most likely there is no backup at all. Recommendation: use OOB SharePoint [WSS works]. Savings: enormous if an accidental deletion occurs.
  3. Access - files stored in a share are typically accessible from only a few computers. To open them at home, or even at a different office, you have to do some crazy tricks to copy them. With SharePoint, files are available from the desktop, the intranet, or even the extranet if your company provides this feature. Recommendation: OOB SharePoint [WSS works] + extranet. Savings: 1-2 hours a week per person in synchronization.
  4. Metadata - this is not obvious to many people who do not deal with metadata, but there is always a ton of implicit metadata to be found, even in a plain file system. It starts with the folder/directory structure [people store proposals separately from invoices, separately from documentation, etc.]. It then moves on to file names ["proposal for XYZ LA branch for 2006.doc"]. Eventually, there are 300 folders with 1 document in each, which is not an optimal way of finding a document. It is so much easier to work with a collection of 10-200 documents and use metadata columns to browse and search them. Folders can still be used as security or team containers, and when the number of documents grows too large, you create more folders. Recommendation: rethink the organization of your documents, use metadata [preferably site columns], and roll out WSS. Savings: better reuse of data, better search. Cons: if decided by committee, this may take a long time to implement.
  5. Search - anyone who has ever tried to find a file on a file share knows it's like looking for a needle in a haystack. Whether it is 100 MB or 20 GB, it is almost always a very lengthy search [and typically ties up just about every resource available]. With SharePoint it is instantaneous, filtered, ranked and includes the full text of the documents. Recommendation: use SharePoint [MOSS]. Savings: 1-3 hours a week per knowledge worker.
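For item 4, the implicit metadata hiding in a folder structure can often be harvested during migration instead of being thrown away. Below is a minimal C# sketch. It assumes a hypothetical convention where the n-th folder level maps to the n-th metadata column, so that "Proposals\2006\..." becomes DocumentType and Year. The column names and the `ImplicitMetadata` helper are made up for illustration and are not part of any SharePoint API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns the implicit metadata in a folder path
// (e.g. "Proposals\2006\proposal for XYZ LA branch.doc") into column values.
public static class ImplicitMetadata
{
    // columnNames[i] receives the i-th folder level. Extra folder levels are ignored,
    // and missing levels simply leave the column unset. The file name goes into "FileName".
    public static Dictionary<string, string> FromPath(string relativePath, string[] columnNames)
    {
        string[] segments = relativePath.Replace('\\', '/').Trim('/').Split('/');
        var metadata = new Dictionary<string, string>();

        int folderCount = segments.Length - 1; // the last segment is the file itself
        for (int i = 0; i < folderCount && i < columnNames.Length; i++)
            metadata[columnNames[i]] = segments[i];

        metadata["FileName"] = segments[segments.Length - 1];
        return metadata;
    }
}
```

The resulting dictionary could then be written into site columns while uploading the file. The hard part is agreeing on the mapping, not the code.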

Now, the final question/problem I have: shell integration [also, why do Windows XP and Vista in general shy away from a tree display?]. Was the ball dropped due to the bundling problem? SharePoint 2001 had a much better integration story with the desktop [property inspection, check-in/check-out, etc.], and now it is gone...

When considering SharePoint as file storage, the benefits are clearly enormous, even with straight out-of-the-box implementations. Keep in mind, though, the few scenarios where you may run into difficulties, and check for them before ruining your reputation. Make sure your knowledge workers are trained to reap the benefits, which could be 5+ hours a week.

November 13

Will IT Departments be ready for Office Server 2007?

Wow. Has anyone explored the depths of Office 2007 administration?

As much as people complained about the administration of SPS 2003 [which was quite justified], there will be different kinds of complaints about Office 2007 administration.

In the past [or currently], the admin pages were scattered all over the place, and some pages were hard to navigate to. The current set of admin pages is laid out much better. Unfortunately, the new Office 2007 product has about 10x more features packed in. I have watched many skilled people get lost in simple security assignments, so I believe it may be a hard transition for some. Luckily, for those who are independently wealthy, or have wealthy companies, there is Admin 2007 training available from the Mindsharp guys. I've met them a couple of times at MS events in the past, and they were all quite talented. I haven't seen much other admin training [including from Microsoft], so I recommend that you, or someone you know, attend. One of my colleagues will be attending the December class in DC [sold out]; I'll update with his experience.

Well, speaking of SharePoint 2007 administration, there is a quick and simple rule to follow: learn the "logical" hierarchy:

  • Farm
    • Web Application
      • Site Collection
        • Site

After that, you'll need to figure out which features are available at each level, and voila, you are an admin. You'll also have to learn about special things like Shared Services and parts of the application services. Shared Services are a kind of SOA implementation that is consumed by other applications in the farm.

Most people forget about this hierarchy when setting admin security. Generally, you have to figure out which level you are currently trying to administer. People forget that being an app admin does not mean you can manage a site collection. You can "add yourself" as a site collection admin, but you can't manage the site before that happens [and if you do add yourself as a site collection admin, the operation gets logged in the system log!]. I'll do another entry on planning infrastructure and administration operations for a MOSS 2007 farm.
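To make the point about the hierarchy concrete, here is a toy C# model of a web application and its site collections. It uses plain classes, not the SharePoint object model, and leaves out the farm and site levels. The names `WebApplication`, `SiteCollection`, `CanManage` and `AddSelfAsSiteCollectionAdmin` are hypothetical. The model only illustrates the rule above: being an application admin does not grant site collection rights until you add yourself, and that step leaves a trace in a log.

```csharp
using System;
using System.Collections.Generic;

// Toy model (not the SharePoint API) of the Web Application > Site Collection levels,
// showing that an application admin is not automatically a site collection admin.
public class SiteCollection
{
    public string Url { get; }
    public HashSet<string> Admins { get; } = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    public SiteCollection(string url) { Url = url; }
}

public class WebApplication
{
    public HashSet<string> Admins { get; } = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    public List<SiteCollection> SiteCollections { get; } = new List<SiteCollection>();
    public List<string> AuditLog { get; } = new List<string>();

    // Managing a site collection requires site collection admin rights,
    // regardless of any rights held at the application level.
    public bool CanManage(string user, SiteCollection sc) => sc.Admins.Contains(user);

    // An application admin can promote himself, but the promotion is recorded.
    public void AddSelfAsSiteCollectionAdmin(string user, SiteCollection sc)
    {
        if (!Admins.Contains(user))
            throw new UnauthorizedAccessException($"{user} is not an application admin");
        sc.Admins.Add(user);
        AuditLog.Add($"{user} added to site collection admins of {sc.Url}");
    }
}
```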

Ah, I forgot about the infrastructure part. The infrastructure is a bit easier to remember, as it is far more flexible than in SPS 2003. Just keep the front-end servers separate from the app servers, and you're on the way to a good balance. Next, monitor performance [and some other counters], and add servers where they are most needed.

Next post should be on some development.. how about the admin APIs?

Say No to /3GB in boot.ini

Some time ago, we decided to optimize our server settings as recommended by MVPs and other industry gurus. Basically, if it was in a PowerPoint slide deck or in someone's blog, we tried it out. Unfortunately, we didn't see The Old New Thing blog entry that set things straight.

Unfortunately, the "/3GB" switch caused more grief than we anticipated. What happened [even MS Premier support seemed dumbfounded for a month] is that after some time we'd get crazy calls from people. They complained that they could not open "large Office documents" and got an "Internet Explorer cannot download ... from ..." error. Sometimes it would be a 1 MB file, sometimes a 512 KB one. We'd scratch our heads and observe that it only happened on 1 of 2 front-end servers. We'd reboot the machine [after office hours] and try to provide access via alternative means. All for naught: any browser, any protocol, any machine [remote or localhost] gave the same error [except it all worked with the other FE server!]. We called support, and nothing happened. We split a few portals off onto another set of servers [a new farm], and our problems went away... but only for a while. Traffic picked up, and once again the problems started appearing.

Luckily, we had properly documented all symptoms, settings and hardware, and probably got a different MS Support engineer. When we called again, the problem was solved in less than 30 minutes [including wait time]. Bottom line: as soon as memory consumption on the system exceeded 3 GB, the problems would start. Removing the /3GB switch, which we had carefully placed there 6 months earlier, was quite embarrassing.

Office 2007 released!

The wait is over: Office 2007 has been released [although only the desktop apps are available from MSDN at this time]...

May 04

Working in an Enterprise Environment

I've been quiet on my blog because I've spent the last half year busy at an enterprise customer, working 10-12 hour days. It is a definite change of pace, with the chance to see so many different scenarios [about which I will write soon]. As exhausting as it is, there are exciting moments when you come up with a solution. The sad part is that you start neglecting your family, and when you come home your kids are already in bed.
 
Piotr
 
 
May 19

Are some Site Definition Mods unsupported?

Wherever SPS folks are involved, there is always a discussion about which is better, site definitions or site templates. The prevailing opinion is that site definitions are superior because they should not break during upgrades or service packs. They are also upgradeable to some extent [modify + iisreset to see the changes]. Now we find out that the upgradeability of site definitions is iffy. The Support team does not really agree with what the MSDN team considers good practice for editing site definitions. Check this out:

http://support.microsoft.com/default.aspx?scid=kb;en-us;898631

This article states that the following activities are not supported:
---
You modify a custom site definition or a custom area definition after you deploy the custom site definition or the custom area definition.

Microsoft does not support modifying a custom site definition or a custom area definition after you create a new site or a new portal area by using that site definition or area definition. Additionally, Microsoft does not support modifying the .xml files or the .aspx files in the custom site definition or in the custom area definition after you deploy the custom site definition or the custom area definition.
---

On the other hand, MSDN tells us that some of this should be "fine":
http://msdn.microsoft.com/library/en-us/odc_sp2003_ta/html/ODC_SPSCustomizingSharePointSites2.asp
---
However, there is no easy way to modify site definitions once they are deployed. There is always the possibility of breaking existing deployed sites derived from the site definition once you modify an existing site definition. You can only add to the site definition once it is deployed.
---

In fact, many bloggers claim that the ability to modify a live site definition is a clear plus. If anyone knows site definitions, it is Kris:
http://weblogs.ilg.com/ksyverstad/archive/2005/02/26/658.aspx
---
Site Definitions have the benefit of allowing you to make changes across all sites created using that definition.
---

It turns out that any modifications may not be supported in the long run. Once you deploy, it's for good. I plan on writing the greatest migration tool ever, just so that I can edit one page on all sites and still believe in MS Support.

Seriously, I hope that someone at MS Support just goofed, and there is a supported way to modify existing sites. Otherwise, the only way to be supported is to:
- develop site definitions
- develop site templates from site definitions
- work on individual pages

Where am I - "Breadcrumb" redesigned

For the last few days, I've been working on 2 web parts that display a "breadcrumb", allowing people to navigate more easily.

Sure, there is an existing web part from Lead-It, but it wasn't exactly what I needed for the job:

- I wanted breadcrumb for the portal
- I wanted breadcrumb to work inside Lists
- I wanted breadcrumb to be able to navigate up the folder hierarchy in Lists that support folder structures [some variation of document libraries]....

There were a few things along the way worth mentioning from the perspective of quirky API discovery.

1. Portal Areas

First, I struggled a bit with the Portal API, trying to apply the same style of API to retrieve information about the current Area. On the WSS side of the API we have SPControl, which easily tells us where we are in terms of the Web or Site (or even a Module). Unfortunately, there is no corresponding API that does the same for Areas. The documentation does not yet cover quick samples for Portal Areas, so I was stuck going through most of the API looking for "Area" stuff...

A quick search, and voila: the Microsoft.SharePoint.Portal.SiteData namespace hides all the Area-related things. Browsing through the OM, we find the AreaManager. AreaManager.GetArea requires a context and a GUID. Getting the context is simply done via the static PortalContext.Current, and then all you need is the ID. There are two ways to get the ID. One is via PageInfo.CategoryID [which is undocumented]; the other, as it turns out, is via the old WSS API, using the Web.ID property. Since each Portal Area is simply a WSS site [w/ no subwebs], you can still use most of the WSS APIs, especially to get the GUID of the current web/area.

SPWeb _Web = SPControl.GetContextWeb(Context);
Area _Area = AreaManager.GetArea(PortalContext.Current, _Web.ID);

The next steps were simple. From the area, all we needed was to go up the chain of command (assuming that the user had access to the parent areas):

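Area currArea = _Area; // start from the current area retrieved above
while (currArea.ParentID != Guid.Empty)
{
    // move one level up; assumes the user can read the parent area
    currArea = AreaManager.GetArea(PortalContext.Current, currArea.ParentID);
    //do whatever, e.g. collect currArea.Title for the breadcrumb
}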
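The loop above walks from the current area towards the root, so for a breadcrumb the titles come out in reverse order. Below is a self-contained C# sketch of the same walk-then-reverse idea. It uses a hypothetical `AreaNode` class and a dictionary lookup in place of `AreaManager.GetArea`, so it can run without a portal. A missing parent in the lookup is treated like a parent the user cannot access: the trail simply stops there.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a portal Area: only the fields the breadcrumb needs.
public class AreaNode
{
    public Guid ID;
    public Guid ParentID; // Guid.Empty for the top-level area
    public string Title;
}

public static class Breadcrumb
{
    // Walks from the current node up to the root via ParentID, then reverses
    // the list so the breadcrumb reads from the root down to the current node.
    public static string Build(AreaNode current, IDictionary<Guid, AreaNode> lookup, string separator = " > ")
    {
        var trail = new List<string>();
        AreaNode node = current;
        while (node != null)
        {
            trail.Add(node.Title);
            // Stop at the root, or when the parent is unknown/inaccessible.
            if (node.ParentID == Guid.Empty || !lookup.TryGetValue(node.ParentID, out node))
                break;
        }
        trail.Reverse();
        return string.Join(separator, trail);
    }
}
```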

2. List name

It turns out that getting the name of the list was an equally arduous task. I'm not even sure I got it right. Basically, Google Groups had nothing on the subject, but I had a hunch I could use one of the built-in controls: the ListProperty control from the Microsoft.SharePoint.WebControls namespace.

I'd simply treat this control like any other ASP.NET control and retrieve its properties at runtime. So, after declaring it, I created it in CreateChildControls(), set its Property property (?) to Title, and added it to the web part's controls.

_lpTitle = new ListProperty();
_lpTitle.Property = "Title";
Controls.Add(_lpTitle);

The next step was to retrieve the title from the control. Unfortunately, the "List" property did not give me the expected results: somehow it always returned an empty string. Instead, I decided to retrieve the information that would be rendered on the page. To achieve this, I used a separate HtmlTextWriter based on a StringBuilder so I could get the information back...

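// requires: using System.IO; using System.Text; using System.Web.UI;
StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);

// render the ListProperty control into our own writer instead of the page
HtmlTextWriter output2 = new HtmlTextWriter(sw);
_lpTitle.RenderControl(output2);

output2.Flush();
string _listTitle = sb.ToString().Trim(); // Trim in case the control emits surrounding whitespace

SPList _list = web.Lists[_listTitle];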

This worked like a charm, until I realized that even on a custom ASPX page the control would still return the title of a list that existed on the page as a web part. To make things easier for myself, I just added a parameter that toggles the visibility of the list title in the web part...

3. Document library Folders

Last, but not least, was a way to get the folder hierarchy. Luckily for us, this is visible to the naked eye. While navigating through the folders, the page view is always the same, and only the query string at the top of the page changes. The query string parameter is "RootFolder", so if it is not empty, I could assume that someone was navigating through the folders.

string rootFolder = this.Context.Request.QueryString["RootFolder"];

Next, all I needed was to figure out what the folder represented, get all of its parents [as with areas] and build the breadcrumb. Right out of the box, I tried web.GetFolder(rootFolder) and it worked like a charm. Next, by comparing folder.Url to folder.Name, I was able to tell whether we had ended up at the actual root folder of the List.

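web = SPControl.GetContextWeb(Context);

if (!String.IsNullOrEmpty(rootFolder)) // only when the user is browsing inside folders
{
    SPFolder folder = web.GetFolder(rootFolder);

    // stop when Name equals Url, i.e. we have reached the list's root folder
    while (String.Compare(folder.Name, folder.Url, StringComparison.OrdinalIgnoreCase) != 0)
    {
        //do some work here
        folder = folder.ParentFolder;
    }
}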

And to get the parent webs:

while(web.ParentWeb != null)
{
    // DO SOME WORK  
    web = web.ParentWeb;
}
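The RootFolder query string already contains the folder's server-relative path, so the breadcrumb entries can also be derived with plain string handling, without touching the object model. Below is a minimal C# sketch. It assumes RootFolder holds a server-relative URL such as "/sites/team/Shared Documents/Projects/2006" (possibly still %-encoded) and that you know the list's root folder URL. `FolderCrumbs.Build` is a hypothetical helper. The object model approach above remains the safer way to confirm that the folders actually exist.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a RootFolder value (server-relative folder URL)
// into breadcrumb entries (name, url), starting at the list's root folder.
public static class FolderCrumbs
{
    public static List<KeyValuePair<string, string>> Build(string rootFolder, string listRootUrl)
    {
        var crumbs = new List<KeyValuePair<string, string>>();

        // Values taken straight from a raw URL may still be %-encoded.
        string folder = Uri.UnescapeDataString(rootFolder ?? "").TrimEnd('/');
        string listRoot = listRootUrl.TrimEnd('/');

        // Make sure the folder is really inside this list ("/Docs2" must not match "/Docs").
        bool insideList = folder.StartsWith(listRoot, StringComparison.OrdinalIgnoreCase)
            && (folder.Length == listRoot.Length || folder[listRoot.Length] == '/');
        if (folder.Length == 0 || !insideList)
            return crumbs; // not navigating inside this list's folders

        // First crumb: the list itself.
        crumbs.Add(new KeyValuePair<string, string>(
            listRoot.Substring(listRoot.LastIndexOf('/') + 1), listRoot));

        // One crumb per folder level below the list root, with a cumulative URL.
        string current = listRoot;
        foreach (string segment in folder.Substring(listRoot.Length)
                     .Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries))
        {
            current += "/" + segment;
            crumbs.Add(new KeyValuePair<string, string>(segment, current));
        }
        return crumbs;
    }
}
```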

This is pretty much it. Overall fairly simple once you can get through the clutter of the different pieces of the API.