Friday, December 16, 2011

Options to reinitialize subscriptions in SQL Server replication

Problem

I have transactional replication configured in my production environment with multiple subscribers. The business team has requested that one of the subscriptions be reinitialized, because they think there is some missing data. In this tip we look at the different options that you can use to reinitialize a subscription for transactional replication.

Solution

If we come across this requirement there are different ways by which we could achieve this. In this tip, we will accomplish this by using SQL Server Management Studio (SSMS) and Replication Monitor. Note, both options assume transactional replication is already configured in your server.

Option 1 : Using SQL Server Management Studio (SSMS).

In SSMS go to Replication -> Local Publications -> Locate your publication and expand it. The below screenshot shows the details of the subscriptions.

Suppose, you wish to reinitialize only the subscription [PHOENIX].[REP_S1]. To do this, right click that subscription and select the 'reinitialize' option and you will get a dialog window as shown in the below screenshot.

Here, we have the option to select either 'Use the current snapshot' or 'Use a new snapshot'. The use current snapshot will use the existing snapshot and the use a new snapshot will use a new current snapshot. Based on your requirement, select the desired option and then click on 'Mark for Reinitialization' which will enable you to reinitialize the subscription.

If you wish to reinitialize all subscriptions, you would need to right click on the publication and select 'Reinitialize All Subscriptions' as shown in the screenshot below, which would enable you to reinitialize all your subscriptions.

Option 2: Using Replication Monitor

In SSMS go to Replication -> right click on replication and select 'Launch Replication Monitor', as shown below.

The 'replication monitor' screen should open as shown below. On the left pane, under 'My publishers', click on the publisher node and expand to get a list of the subscriptions.

In the 'All Subscriptions' tab, you need to select the appropriate subscription and click on 'Reinitialize Subscription' which would enable you to reinitialize only that subscription in the list. Once done, you would encounter the same window (image 2) as shown in option 1, when you reinitialize a subscription and you would need to either select the existing snapshot or opt for a new one.

If you wish to reinitialize all your subscriptions using replication monitor, you could just right click on the publication node and select 'Reinitialize All Subscriptions' as shown below.

Both options could be tested easily by configuring a simple replication setup and performing the sequence of steps as shown above.

Things to note:

If you select "Use a new snapshot" the snapshot process will run automatically as long as the job is not disabled.
If you select the reinitialize option this will start automatically as long as the distribution agent is running continuously and the snapshot exists. If the distribution agent is scheduled to run at intervals then the snapshot would be applied to the subscription the next time this job runs.
It is important to know the implications of reinitializing a subscription. When this is run all data at the subscriber will be replaced with the new data for all articles in the subscription. Refer to this tip about implications of applying a new snapshot: http://hemant-vikram.blogspot.com/2011/12/space-impact-of-replication-snapshot.html
It is not possible to reinitialize a subscription using a backup. If you try you will get this error: "This subscription already exists".
The above steps were performed using SQL Server 2008 R2, but should be similar for SQL Server 2005 and later.

Next Steps

Consider testing this scenario through a simple transactional replication setup

Author: Mohammed Moinudheen

Space Impact of Replication Snapshot Agent Job in SQL Server

Problem

I have noticed that available disk space on my SQL Server is getting low and I see a number of files getting generated in the replication snapshot folder. Because of these files there is a space issue on my server. Why are these files here and what can I do to resolve this problem?

Solution

The snapshot agent job is used by default in all types of replication in SQL Server. It is used for generating an initial snapshot of all the articles that need to be replicated to subscribers. Refer to the screenshots below to check the location of the snapshot folder.

On clicking 'Properties', we can see the snapshot folder location.

Based on the schedule of the snapshot agent job, there will be snapshot agent files getting generated in the snapshot folder. There will be folders with names like 20110420061766 getting created. This depicts the time the snapshot agent was run. There are usually files similar to below that are generated in the snapshot folder for each article that is published.

Directory of E:\MSSQL2K5\MSSQL.1\MSSQL\repldata\unc\ServerName_R_PUB_R_PUB\20110421070872

NULL
04/21/2011 07:08 AM

.
04/21/2011 07:08 AM

          ..
04/21/2011 07:08 AM                55 emp_2.bcp
04/21/2011 07:08 AM               172 emp_2.idx
04/21/2011 07:08 AM               468 emp_2.pre
04/21/2011 07:08 AM               622 emp_2.sch

Some of these files are related to copying the schema of the article, indexes, constraints and the actual data in the BCP files through which the snapshot gets generated.

In this example, only one table 'emp' was selected to be published and we could well imagine the amount of files that would get generated if we are taking a snapshot of a large number of articles. Depending on the type of replication we chose; snapshot, transactional or merge, a new snapshot will get generated every time the snapshot agent job runs. Considering this, if a snapshot agent job is scheduled to run often on a server, then we would see similar set of files as shown above getting generated per published article every time the job runs. This would lead to considerable space constraints on the server if left unchecked.

Cleanup Jobs

In order to counter this, we need to be aware of the important replication maintenance jobs that gets created when replication is configured on the server. The maintenance job Distribution clean up: distribution is of considerable significance as it removes replicated transactions from the distribution database. This clean up job also deletes any old snapshot files that were created by the snapshot agent job keeping only the latest files that get generated the last time this job ran. So, we need to ensure that this job and the Agent history clean up: distribution job is scheduled to run periodically to mitigate space issues that arises due to the replication configuration.

Compression

Another option that is available to minimize the space requirement for snapshot files is the 'Compress snapshot files in this folder option'. Similar to previous screenshots, go to Replication-> Local Publication-> Right click and go to 'Properties' and click on 'Snapshot' tab as shown below.

As compression is not supported in the default snapshot folder, we need to enter the relevant path in the 'Put files in the following folder' option. Any UNC network share location or a relevant path with the necessary read/write privileges to the snapshot, distributor or merge agent could be provided in the 'Put files in the following folder' option. Enabling the 'Compress snapshot files in this folder' option would ensure that the snapshot files get compressed when they are generated. This would help in reducing the size of the files, but the trade off would be more CPU utilization with the additional overhead of uncompressing the files at the subscriber when applying the snapshot. This option must be thoroughly rehearsed based on your specific requirement before implementing in a production environment to verify the pros and cons of this feature.

Run Additional Scripts

Another notable feature that is available is the 'Run additional scripts' option as seen in the above screenshot. In this, we could run the scripts on the subscriber before and after applying the snapshot. Basically, we need to provide the location of the script in the 'Before applying the snapshot, execute this script' option. Similarly, provide the valid path in the 'After applying the snapshot' option. This would be handy especially in cases where the publisher, distributor and subscriber are on the same server. In this scenario, we could do a validation check of space availability on the server before and after applying the snapshot which would be a useful alerting mechanism in case of known space issues on the replication server. In the case where there were dedicated servers for publisher, distributor or subscriber, this option could be used based on the requirement to run relevant checks on any of the servers.

Time and Space Requirements

When dealing with large databases, it is recommended to verify the time and space requirement of generating a snapshot by performing a sample BCP of selected articles as the snapshots are created using the BCP utility. This gives better judgment for scheduling the snapshot agent appropriately as it would be quite resource intensive on the server while the snapshot agent is running. Estimating the space requirement of the snapshot agent job depends on the size and the number of articles that are chosen to be published. Below are some sample BCP commands for reference.

Run this command in command prompt if logged on using windows authentication. You would need to substitute the name of the table that you are testing with. This command will copy rows from a table to a data file.

bcp publisher_database.article_name out emp1.dat -T -c

The output for the command is as below. From this, we could get an estimate of the time taken using BCP and also the size of the BCP file.

It may not be easy to get an exact value of the space that would be needed by just issuing a BCP command. This is because additional files get generated by running the snapshot agent job directly depending on the items that are chosen for publication like indexes, constraints, stored procedures, etc. But running the sample BCP commands and maintaining a baseline when dealing with huge publisher databases would be useful especially if we face issues during initial replication set up. Similar to the above BCP command, we could copy the contents from the data file to the table in subscriber database using below command.

bcp subscriber_database.article_name in emp1.dat -T -c

Additional Note

Moreover, based on the kind of replication we configure; snapshot or transactional, the snapshot agent would be acquiring relevant locks on the replication tables. Keeping this in mind, it is advisable to schedule the snapshot agent job to run during non-business hours so as to enhance performance in replication.

Next Steps

Ensure to check the schedule of your snapshot agent job while configuring SQL Server replication
Ensure to check that the replication maintenance jobs are scheduled appropriately and are enabled
Ensure to configure alerting mechanism for the replication maintenance jobs in case it fails

Author: Mohammed Moinudheen

Wednesday, December 14, 2011

XML Methods in SQL Server

In my last article, “Working with the XML Data Type in SQL Server,” I included examples that returned data from columns and variables of the XML data type. Based on those examples, it might seem that retrieving XML data is similar to retrieving objects configured with other data types. However, that’s true only when returning the entire XML value. If instead you want to work with individual components within an XML instance, such as when retrieving the value of a single attribute, you must use one of the five methods that available to the XML data type—query(), value(), exist(), nodes(), or modify().

To use an XML method, you call it in association with the database object configured with the XML data type, as the following syntax illustrates:

DbObject.XmlMethod('XQuery'[, 'SqlType']) [AS TableAlias(ColumnAlias)]

The DbObject placeholder refers to a column, variable, or parameter configured with the XML data type. After the object name, you add a period, following by the name of the XML method. Next you provide, in parentheses, the arguments required for the specific method.

Most of the XML methods require only one argument: an XQuery expression that identifies the XML components to be retrieved or modified, and any actions to be taken on those components. XQuery, a powerful scripting language used specifically to access XML data, contains the elements necessary to create complex expressions that can include functions, operators, variables, and values. MSDN provides a complete XQuery language reference that you can refer to while you work with the XML methods.

In addition to the XQuery expression, an XML method might also require a Transact-SQL data type as a second argument or a table and column alias tagged on after the arguments. As we work through the article, you’ll see when and why these additional components are necessary.

To demonstrate how the various XML methods work, I’ve written a number of examples that use the methods to retrieve and modify data. The examples are based on the Stores table, which I created and populated with the following Transact-SQL script:

USE AdventureWorks2008R2

IF OBJECT_ID('Stores') IS NOT NULL

DROP TABLE Stores

CREATE TABLE Stores

(

StoreID INT PRIMARY KEY,

Survey_untyped XML,

Survey_typed XML(Sales.StoreSurveySchemaCollection)

);

INSERT INTO Stores

VALUES

(

292,

'<StoreSurvey>

<BankName>United Security</BankName>

<Specialty>Mountain</Specialty>

<Product>Mountain</Product>

<Product>Racing</Product>

</Products>

<Product>Jerseys</Product>

<Product>Jackets</Product>

<Product>Shorts</Product>

</Products>

</StoreSurvey>',

(SELECT Demographics FROM Sales.Store

WHERE BusinessEntityID = 292)

);

Notice that the table includes the Survey_untyped column, which is an XML untyped column, and the Survey_typed column, which is an XML typed column. The schema collection I associated with the typed column is already included in the AdventureWorks2008R2 database, which is where I’ve created the table. The INSERT statement in the Transact-SQL above also shows the XML document that I added to the untyped column. For the typed column, I retrieved the following XML document from the Demographics column (a typed XML column) in the Sales.Store table in the AdventureWorks2008R2 database:

<BankName>United Security</BankName>

<Specialty>Mountain</Specialty>

</StoreSurvey>

Notice that the XML includes the schema associated with that data. The schema comes from the same schema collection I used when I created the typed column in the Stores table.

The data I added to the untyped column is nearly identical to the typed XML. The untyped XML doesn’t include the namespace information, of course. However, it does include additional product information, which we’ll use to demonstrate the XML methods. So now that we have setup out of the way, let’s look at how those methods work.

The query() method

The query() method retrieves a subset of untyped XML from the target XML instance. It’s probably the simplest and most straightforward of the XML methods. You need only specify the database object, the method name, and an XQuery expression, as shown in the following syntax:

DbObject.query('XQuery')

You’ll notice that the XQuery expression is entirely enclosed in single quotes and those are enclosed in parentheses. The trick with any XML method is to get the XQuery expression right. But at its simplest, the expression must define the path of the element you want to retrieve.

In the following example, I use the query() method to return data from the <StoreSurvey> element in both the typed and untyped columns:

SELECT

Survey_untyped.query('/StoreSurvey')

AS Info_untyped,

Survey_typed.query('declare namespace

ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey";

/ns:StoreSurvey') AS Info_typed

FROM

Stores;

For the untyped column, I specified the column name (Survey_untyped) followed by a period and then the method name (query). Within the parentheses and single quotes, I defined the XQuery expression (/StoreSurvey). That’s all there is to it. Because the <StoreSurvey> element is the root node, the entire XML document is returned.

The XQuery expression for the Survey_typed column is a bit more complex. Because it is a typed column, the expression should be preceded by a namespace declaration. The namespace must be the same as the one referenced within the XML document stored in the typed column.

To declare a namespace, you specify the declare namespace keywords, followed by the alias name (in this case, ns). After the alias, you include an equal sign (=) and then the schema path and name. You then end the declaration with a semi-colon (;) to separate it from the main body of the XQuery expression. After you declare the namespace, you can use the alias within the expression to reference that namespace.

The expression itself is nearly identical to the one used for the untyped column, except that you must precede the element name with the namespace alias and a colon (ns:). As with the untyped column, the expression will return the entire XML document because it specifies only the root node.

Although the preceding example is helpful in demonstrating the basics of using the query() method, it’s not much use beyond that because you can just as easily retrieve all the column contents simply by specifying the column name. However, in the following example, I get more specific by limiting the results to the <AnnualSales> child element:

SELECT

Survey_untyped.query('/StoreSurvey/AnnualSales')

AS Info_untyped,

Survey_typed.query('declare namespace

ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey";

/ns:StoreSurvey/ns:AnnualSales') AS Info_typed

FROM

Stores;

The only difference between this example and the preceding one is that I added /AnnualSales to the XQuery expression for the untyped column and /ns:AnnualSales to the expression for the typed column. The XQuery for the untyped column returns the following results.

And the XQuery for the typed column returns these results:

<ns:AnnualSales xmlns:ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">800000</ns:AnnualSales>

The primary difference between the two is that the data returned by the typed column includes the namespace information. The element values themselves (800000) are the same in both columns. If the <AnnualSales> element had included its own child elements, those too would have been displayed. For instance, the following example retrieves the <Products> elements, which are child elements of <StoreSurvey>, just like <AnnualSales>:

SELECT

Survey_untyped.query('/StoreSurvey/Products')

AS Products

FROM

Stores;

Because the XML document includes two <Products> elements and those elements each include several <Product> child elements, the SELECT statement returns all product-related elements, as shown in the following results:

<Product>Mountain</Product>

<Product>Racing</Product>

</Products>

<Product>jerseys</Product>

<Product>jackets</Product>

<Product>shorts</Product>

</Products>

If you want to return a specific element and its child elements, you can do so by referencing one of its attributes. For example, suppose you want to include only products from the Bikes category. To do so, you can modify your XQuery expression as follows:

SELECT

Survey_untyped.query('/StoreSurvey/Products[@Type="Bikes"]')

AS BikeProducts

FROM

Stores;

Now the XQuery expression includes bracketed information that specifies the Type attribute and its value. The attribute name is preceded by the at (@) symbol, and the attribute value is preceded by an equal sign (=) and enclosed in double quotes. As a result, the SELECT statement now returns the following XML fragment:

<Product>Mountain</Product>

<Product>Racing</Product>

</Products>

As you would expect, only elements whose products are in the Bikes category are returned. You could have just as easily specified “Clothes” rather than “Bikes,” and you would have received the product elements related to clothing.

The value() method

The value() method returns a scalar value from the targeted XML document. The returned value is converted to the data type you specify when you call the method. The value() method makes it easier to work with individual values in order to compare or combine them, either with other XML values or values of different types.

As the following syntax illustrates, when you call the value() method, you must specify the XQuery expression and the Transact-SQL data type for the returned data:

DbObject.value('XQuery', 'SqlType')

For example, suppose you want to pull the amount of sales from the <AnnualSales> element in either the typed or untyped column in the Stores table. You can specify the path in the XQuery expression plus a numerical data type as the second argument, as shown in the following SELECT statement:

SELECT

Survey_untyped.value('(/StoreSurvey/AnnualSales)[1]', 'int')

AS Sales_untyped,

Survey_typed.value('declare namespace

ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey";

(/ns:StoreSurvey/ns:AnnualSales)[1]', 'int') AS Sales_typed

FROM

Stores;

For the untyped column, the XQuery expression includes the element path as you saw in earlier examples. However, after the path, you must add an integer in brackets that indicates which element of that name you should retrieve. More often than not, you will simply add [1], as I’ve done here. This assures that, if there are multiple elements with the same name, only the first one will be returned. It also assures that only one element is being referenced at a time. In fact, even when there is only one element, as is the case with <AnnualSales>, you must still include the [1] because the value() method expects a singleton value.

The second argument passed into the value() method is the name of the data type, in this case, int. That means an int value will be returned by the method. Note that the method’s two arguments must each be enclosed in single quotes and separated with a comma.

As for the typed column, the namespace declaration and path are also similar to what you’ve seen in earlier examples. The only difference is that this expression must also include the [1], just as we did it for the untyped column. As a result, for each column, the SELECT statement returns a single int value of 800000, without any additional element information.

You can also retrieve an attribute value, rather than an element value, by specifying the name of the attribute in your element path. For example, the following SELECT statement retrieves the value of the Type attribute for the second instance of the <Products> element:

SELECT

Survey_untyped.value('(/StoreSurvey/Products/@Type)[2]', 'varchar(10)')

AS ProductType

FROM

Stores;

The expression includes [2], rather than [1], in order to retrieve data from the second instance of <Products>. As a result, the SELECT statement now returns the varchar value Clothes.

As stated earlier, the value() method is also handy if you want to combine or compare data. In the following example, I create a calculated column that’s based on two values returned from the untyped column:

SELECT

Survey_untyped.value('(/StoreSurvey/AnnualSales)[1]', 'int') -

Survey_untyped.value('(/StoreSurvey/AnnualRevenue)[1]', 'int')

AS Expenses

FROM

Stores;

The first instance of value() retrieves the <AnnualSales> value. The second value() instance retrieves the <AnnualRevenue> value. The second value is then subtracted from the first value to return a scalar value of 720000.

You can also use XQuery functions in your expressions. For instance, in the following example, I use the concat function to add a string to the <Specialty> value:

SELECT

Survey_untyped.value('concat("Bike specialty: ",

(/StoreSurvey/Specialty)[1])', 'varchar(25)')

AS Specialty

FROM

Stores;

When you use the concat function, you specify each element that you want to concatenate as an argument to the function and separate those arguments with a comma. The statement returns the value Bike specialty: Mountain.

As you can see, the string has been concatenated with the <Specialty> value. I could have just as easily concatenated multiple element values or added more string values.

The exist() method

The exist() method lets you test for the existence of an element or one of its values. The method takes only one argument, the XQuery expression, as shown in the following syntax:

DbObject.exist('XQuery')

The key to using the exist() method is in understanding the values it returns. Unlike the query() and value() methods, the exist() method doesn’t return XML content. Rather, the method returns one of the following three values:

A BIT value of 1 if the XQuery expression returns a nonempty result
A BIT value of 0 if the XQuery expression returns an empty result.
A NULL value if the XML data type instance is null.

A good way to test how the exist() method works is to use a variable to capture the method’s results, as I’ve done in the following example:

DECLARE @xml XML;

DECLARE @exist BIT;

SET @xml = (SELECT Survey_untyped FROM Stores);

SET @exist = @xml.exist('/StoreSurvey[BusinessType="BM"]');

SELECT @exist;

First, I declared the @xml variable as type XML. Then I declared the @exist variable as type BIT. I set the value of @xml to equal the XML document in the Survey_untyped column. I then used the exist() method on the @xml variable to test for the existence of /StoreSurvey[BusinessType="BM"] within the XML document. In other words, the exist() methods checks whether the <BusinessType> child element exists and whether it contains a value of “BM”.

I then assigned the results returned by the exist() method to the @exist variable and used a SELECT statement to return the contents of the variable. Because the XML document contains this child element and that element has a value of “BM”, the method returns a value of 1. However, if I change “BM” to “BMX”, as I’ve done in the following example, the method returns 0 because the element and value don’t exist exactly as specified:

DECLARE @xml2 XML;

DECLARE @exist2 BIT;

SET @xml2 = (SELECT Survey_untyped FROM Stores);

SET @exist2 = @xml2.exist('/StoreSurvey[BusinessType="BMX"]');

SELECT @exist2;

Now that you have a sense of how the exist() method works, let’s look at how you might use it in your queries. In the following example, I include a WHERE clause that uses the exist() method to again test for the existence of /StoreSurvey[BusinessType="BM"]:

SELECT

Survey_untyped.query('/StoreSurvey/Products[@Type="Bikes"]')

AS BikeProducts

FROM

Stores

WHERE

Survey_untyped.exist('/StoreSurvey[BusinessType="BM"]') = 1;

The WHERE clause compares the results of the exist() method to the number 1. If the results equal 1, that is, if the XML contains the specified element and values, the WHERE clause evaluates to True and the SELECT statement returns the product information, as shown in the following results:

<Product>Mountain</Product>

<Product>Racing</Product>

</Products>

If the exist() method had specified a different <BusinessType> value, as it does in the following example, the SELECT statement would instead return an empty result set:

SELECT

Survey_untyped.query('/StoreSurvey/Products[@Type="Bikes"]')

AS BikeProducts

FROM

Stores

WHERE

Survey_untyped.exist('/StoreSurvey[BusinessType="BMX"]') = 1;

Again, the key to the exist() method is to remember that it returns only one of three values: 1, 0, or NULL.

The nodes() method

The nodes() method can be a bit more slippery to understand than the other XML methods. To begin with, rather than returning XML or scalar values, the nodes() method returns what is essentially a table that includes one column. That means you should use the method only in those parts of a statement that can handle rowset views, such as the FROM clause. It also means that, when you call the nodes() method, you must assign a table alias and column alias to the rowset view returned by the method, as shown in the following syntax:

DbObject.nodes('XQuery') AS TableAlias(ColumnAlias)

The table and column aliases let you reference the rowset view in other parts of the statement. The method is useful when you want to shred an XML document, that is, decompose the document into a relational format. To better understand how this works, let’s look at an example that uses the nodes() method on an XML variable:

DECLARE @bikes XML

SET @bikes =

'<Products>

<Product>Mountain</Product>

<Product>Racing</Product>

</Products>'

SELECT

Category.query('./text()')

AS BikeTypes

FROM

@bikes.nodes('/Products/Product')

AS Bike(Category);

First, I declared the @bikes variable with the XML data type. Then I set its value to equal the XML fragment. In the FROM clause, I invoked the nodes() method on the variable to shred the <Product> elements. My goal was to retrieve the value from each instance of that element. I then provided a table alias (Bike) and a column alias (Category) to identify my rowset view. Now I essentially have a table with a single column that contains a row for each <Product> element.

When you use the nodes() method to create a rowset view of the XML data, you have to use one of the other XML methods to retrieve the contents of that view. In this case, I used the query() method in the SELECT list to return the value of each row. Notice that to retrieve the value I used a period to reference the current context node and the text() node function to retrieve only the element values, as shown in the following results:

Mountain

Road

Racing

If you want to use the nodes() method to retrieve data from a table, you can use the CROSS APPLY operator in the FROM clause to associate the table with the rowset view returned by the method, as shown in the following example:

SELECT

Category.query('./text()')

AS BikeTypes

FROM

Stores CROSS APPLY

Survey_untyped.nodes('/StoreSurvey/Products[@Type="Bikes"]/Product')

AS Bike(Category);

In this case, I specified the XQuery path as StoreSurvey/Products[@Type="Bikes"]/Product' in order to return only the bike-related products from the Survey_untyped column. However the SELECT list itself is the same as the preceding example, so this statement returns the same results as that example.

The modify() method

The methods we’ve looked at up to this point have been concerned only with reading data in an XML document, but the modify() method lets you update that data. As the following syntax shows, the only argument you pass into the modify() method is the XQuery expression:

DbObject.modify('XQuery')

In this case, however, the XQuery expression is actually a special type of expression that uses the XML Data Modification Language (XML DML), which is a SQL Server extension to XQuery. The modify() method lets you define XML DML expressions that can add, update, or delete elements within an XML document.

NOTE: This section covers only the basics of the modify() method. You can find more information about the method and its uses in the XML Data Modification Language Workbench, which provides additional details and examples about modifying XML data.

Adding components to XML data

To add components to your XML data, you must specify the insert keyword and target location in your expression. For instance, the following UPDATE statement adds the <Comments> element to the typed and untyped XML columns in the Stores table:

UPDATE Stores

SET Survey_untyped.modify('

insert(<Comments>Largest bike store in region</Comments>)

after(/StoreSurvey/NumberEmployees)[1]'),

Survey_typed.modify('declare namespace ns=

"http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey";

insert(<ns:Comments>Largest bike store in region</ns:Comments>)

after(/ns:StoreSurvey/ns:NumberEmployees)[1]')

WHERE StoreID = 292;

For the Survey_untyped column, I first specified the modify() method name, followed by the XML DML expression, which I enclosed in parentheses and single quotes, just like the other XML methods. Within the expression, I included the insert keyword and the element I wanted to add: (<Comments>Largest bike store in region</Comments>). Notice that I enclosed the element in parenthesis and included the element’s value.

Next, I added the after keyword and specified the location of where to add the new element. I also included the [1] because the method requires a singleton value. Now the new element will be added after the first instance of the /StoreSurvey/NumberEmployees element.

As you would expect, modifying the Survey_typed column followed the same process, except that I also included the namespace declaration and aliases. An important issue to consider, however, when working with typed XML columns is that any changes you make must conform to the schema that governs the XML content. For example, if I had tried to add the <Comments> element to any other location within the document, I would have received a violation error. However, if you’ve set up your statement properly and adhered to the schema, the new element should be added to the XML document with no problem.

Updating components in XML data

To use the modify() method to update XML data, you must include the replace value of keywords, rather than the insert keyword. You must then specify the component you want to update as well as the updated information. For example, in the following UPDATE statement, I change the value of the <Comments> element:

UPDATE Stores

SET Survey_untyped.modify('

replace value of (/StoreSurvey/Comments/text())[1]

with "2nd largest bike store in region" ')

WHERE StoreID = 292;

After I specified the replace value of keywords, I added the element path along with the text() node function, which let me change only the element’s value (without affecting the element itself). And as with the previous example, I also included a [1] because a singleton value is expected. Next, I specified the with keyword and the new element value, enclosed in double quotes. As a result, when I ran the statement, it replaced the old element value with the new one.

Deleting components from XML data

To use the modify() method to remove data from an XML document, you must specify the delete keyword, followed by the component you want to delete, as shown in the following example:

UPDATE Stores

SET Survey_untyped.modify('delete(/StoreSurvey/Comments)[1]')

WHERE StoreID = 292;

In this case, it was simply a matter of specifying the path to the <Comments> element, in parentheses, after the delete keyword. Of course, all this was followed by [1] to keep the database engine happy.

The XML Methods

The methods available to the XML data type provide you with a set of powerful tools for working with XML data. And as you can see from the examples, most of that power rests in your ability to create XQuery expressions that target the information you want to access. Yet the expressions shown here are relatively basic when compared to how extensive the XQuery language is. In fact, to make the most of what the XML methods offer, you must invest the time necessary to understand the various elements that make up that language. Until then, what I’ve shown you here should provide you with the first steps necessary to start accessing and updating your XML data. Just know that there’s a much bigger universe out there waiting to be discovered.

Author: Robert Sheldon

Working with the XML Data Type in SQL Server

Not all SQL Server data types are created equal. Just look at the XML data type. On the surface, it might seem like your run-of-the-mill type, except, of course, being geared toward XML data; but the ways in which it’s used, how its data is queried, and when and how XML columns should be indexed quickly sets the type apart from the rest of the crowd. And those differences are what matter when working within the extensible world of XML.

The XML data type, in fact, lies at the heart of understanding how to store and query XML data in a SQL Server database. That’s not to suggest that all XML data should be stored with the XML type, but knowing how the type works will help you determine when to use it and how to effectively access its data.

In some cases, you shouldn’t use the XML data type, but instead use large object storage—VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX). For example, if you simply store your XML documents in the database and retrieve and update those documents as a whole—that is, if you never need to query or modify the individual XML components—you should consider using one of the large object data types. The same goes for XML files that you want to preserve in their original form, such as legal documents. If you need to retain an exact textual copy, use large object storage.

But the rest of the time, you should consider the XML data type. The type ensures that the data is well formed according to ISO standards, and it supports fine-grained queries and modifications to specific elements and attributes within the XML. You can also index an XML column and associate its data with an XML schema collection in order to preserve its content and structure. In addition, the XML data type lets you store data that follows a structure too fluid and complex to fit easily into a relational model.

However, when considering whether to use the XML data type, you should also be aware of its limitations. For instance, an XML column cannot be used as a key in an index, and a data value stored in an XML column cannot exceed 2 GB. You also cannot compare or sort data that uses the XML data type, nor can the data be used in a GROUP BY clause. For a complete description of the limitations on the XML data type, as well as other details about XML, see the topic “Implementing XML in SQL Server” in SQL Server Books Online.

Creating XML Database Objects

SQL Server lets you assign the XML data type to columns, variables, or parameters and store in any of these objects either XML documents or fragments. The data is considered a document if it has a single top-level element. Otherwise it falls under the category of fragment.

NOTE: You can also assign the XML data type to values returned by a function. However, function return values usually require XML components more complex than what we’ll cover in this article. For this reason, functions will be covered in a later article, after those XML components have been discussed.

When you assign the XML data type to a column, variable, or parameter, you can optionally associate an XML schema collection with the object, thus ensuring that data within that object conforms to schema specifications. In such cases, the object is referred to as typed. An XML object with no associated schema collection is considered untyped.

Creating Untyped XML Objects

An untyped XML object still requires that the data be well formed according to ISO standards; however, the data is not bound to an XML schema collection. You should choose untyped XML (the default) if you don’t have a schema to associate with the data or you don’t want to adhere to the constraints imposed by a schema. For instance, you might have a workable schema but might also need to store nonconforming fragments temporarily in the XML column.

To create an XML object in your database, you simply specify the XML data type as you would any other type. For instance, the following Transact-SQL code creates the Resumes tables, inserts data into the table, and then retrieves data from that table:

USE AdventureWorks2008R2;

IF OBJECT_ID('dbo.Resumes') IS NOT NULL

DROP TABLE dbo.Resumes;

CREATE TABLE dbo.Resumes

(

CandidateID INT IDENTITY PRIMARY KEY,

CandidateResume XML

);

INSERT INTO Resumes (CandidateResume)

SELECT Resume

FROM HumanResources.JobCandidate;

SELECT * FROM Resumes;

Notice that the table includes the CandidateResume column, which is configured with the XML data type. Because this is an untyped column, you don’t have to specific any other parameters related to the XML. You can, of course, specify the nullability or other column properties, but XML is all you need to include to create an XML column.

NOTE: I created the examples in this article within the AdventureWorks2008R2 database on a local instance of SQL Server 2008 R2.

When I ran the example above, the INSERT statement added 13 rows from the JobCandidate table into the Resumes table. I verified that the new rows had been added by running the SELECT statement, which also returned 13 rows. Each returned row included an XML value in the CandidateResume column. (Note that, in SQL Server Management Studio, you can click an XML value to open a window that displays the entire XML document or fragment.)

Because the CandidateResume column is untyped XML, the inserted data did not have to conform to a specific schema collection; however, the data still had to conform to the ISO standards that govern XML.

You can create an XML variable just as easily as you can create a column. In the following example, I declare the @Resume variable and set its value equal to the XML data retrieved from the Resumes table for candidate 1:

DECLARE @resume XML;

SELECT @resume = CandidateResume

FROM dbo.Resumes

WHERE CandidateID = 1;

SELECT @resume AS Resume;

As with the column definition, I simply specified the XML data type when I declared the variable. Again, the XML is untyped, which means that the data does not have to conform to a schema collection. When I ran the SELECT statement after declaring and setting the variable, it returned only a single value: the XML data I had inserted into the Resumes table for that candidate.

As mentioned above, you can also assign the XML data type to a parameter. In the next example, I create a stored procedure that’s defined with an XML input parameter:

USE AdventureWorks2008R2;

IF OBJECT_ID ( 'dbo.AddResume', 'P' ) IS NOT NULL

DROP PROCEDURE dbo.AddResume;

CREATE PROCEDURE AddResume

@resume XML

INSERT INTO Resumes (CandidateResume)

VALUES (@Resume);

Notice that the @Resume parameter is untyped XML. The stored procedure uses that parameter to insert data into the Resumes table. After I created the procedure, I declared an XML variable named @Resume2 and assigned XML data to the variable. (I retrieved the data from the JobCandidate table.) I then called the @Resume2 variable when I executed the stored procedure, as shown in the following example:

DECLARE @resume2 XML;

SELECT @resume2 = Resume

FROM HumanResources.JobCandidate

WHERE JobCandidateID = 4;

EXEC AddResume @resume2;

SELECT * FROM Resumes;

The stored procedure adds another row to the Resumes table. The row includes the XML data that I assigned to the @Resume2 variable. When I ran the SELECT statement this time, it returned 14 rows and showed the new row that was added to the table.

Creating Typed XML Objects

If you have a schema collection available to associate with your XML data, you should consider using that collection. Typed XML has several advantages over untyped. To begin with, it provides another level of validation. Not only must the XML data be well formed (based on ISO standards), but it must also conform to the validation constraints and data type specifications defined by the schema. For example, the type information enforces more precise semantics on the XML values. In addition, storage and queries are better optimized than on an untyped column.

However, before you can associate an XML object with a particularly schema, that schema must be registered in the database in which you’ll be defining the object. Only then can you reference that schema when you create your XML object.

NOTE: Registering schema collections is beyond the scope of this article. You can find details about how to register a collection in SQL Server Books Online, in the topic “CREATE XML SCHEMA COLLECTION (Transact-SQL).”

Once you’ve registered your schema collection, you can define your XML column, variable, or parameter. To do so, you must include the name of the collection in parenthesis after the XML data type name. For example, the following Transact-SQL re-creates the Resumes table with a typed XML column:

USE AdventureWorks2008R2;

IF OBJECT_ID('dbo.Resumes') IS NOT NULL

DROP TABLE dbo.Resumes;

CREATE TABLE dbo.Resumes

(

CandidateID INT IDENTITY PRIMARY KEY,

CandidateResume XML (HumanResources.HRResumeSchemaCollection)

);

INSERT INTO Resumes (CandidateResume)

SELECT Resume

FROM HumanResources.JobCandidate;

SELECT * FROM Resumes;

Notice that the CandidateResume column now includes a reference to the schema collection (in parentheses) after the data type name. In this case, I used HumanResources.HRResumeSchemaCollection, which is included in the AdventureWorks2008R2 database. This is the same collection associated with the Resume column in the JobCandidate table.

As you can see, the process for creating a typed XML object is fairly straightforward. As long as the schema collection is registered and you get the name right, you should have no problem. Something worth noting, though. By default, SQL Server permits typed XML objects to store data as either an XML document or fragment. You can override the default behavior and specify that the object be limited to only XML documents. To do so, precede the schema collection name with the DOCUMENT keyword. For example, the following column definition restricts the values to XML documents:

CandidateResume XML (DOCUMENT HumanResources.HRResumeSchemaCollection)

Regardless of whether you limit the XML to documents only, the process for creating a typed XML object is the same for columns, variables, and parameters. For instance, in the following example, I declare a typed XML variable and assign an XML value to the variable:

DECLARE @resume XML (HumanResources.HRResumeSchemaCollection);

SELECT @resume = CandidateResume

FROM dbo.Resumes

WHERE CandidateID = 1;

SELECT @resume AS Resume;

As you can see, this example is nearly identical to my earlier example in which I declared an untyped variable. All I’ve done is add the schema name. Now any XML document or fragment assigned to this variable must adhere to the structure defined in the schema collection. And just like before, if I were to run the SELECT statement along with the other statements, one row with one value would be returned, and that value would be the XML data.

At this point, you can probably see how easy it is to create a typed object. But just in case, here’s what it would look like if you re-created the AddResume stored procedure with a typed input parameter:

USE AdventureWorks2008R2;

IF OBJECT_ID ( 'dbo.AddResume', 'P' ) IS NOT NULL

DROP PROCEDURE dbo.AddResume;

CREATE PROCEDURE AddResume

@resume XML (HumanResources.HRResumeSchemaCollection)

INSERT INTO Resumes (CandidateResume)

VALUES (@Resume);

DECLARE @resume2 XML;

SELECT @resume2 = Resume

FROM HumanResources.JobCandidate

WHERE JobCandidateID = 4;

EXEC AddResume @resume2;

SELECT * FROM Resumes;

Again, there should be no surprises here. After I re-created the table and then re-created the stored procedure, I declared a variable to supply the parameter value to the stored procedure. When I ran the store procedure, it inserted the new row into the Resumes table, and the SELECT statement returned 14 rows.

Indexing an XML Column

Data in an XML column is stored as large binary objects (BLOBs). When no XML index is defined on the column, the database engine shreds the BLOBs at runtime during the query evaluation stage. This process can be quite time-consuming if your XML values are relatively large compared to the values being retrieved. In such cases, if you’re planning to generate a lot of queries, you should consider indexing your column.

There are two types of XML indexes that you can create on an XML column:

Primary: Indexes all tags, paths, and values, along with such details as node type and document order information.
Secondary: One of three types of indexes (PATH, VALUE, and PROPERTY) that target specific query types used to retrieve data from the XML column.

You must create a primary XML index before you can create any secondary XML indexes. In addition, the table that contains the XML column must be configured with a primary key and a clustered index based on that key. The database engine uses the primary key within the primary XML index to correlate rows.

NOTE: You can also create full-text indexes on XML columns. However, the indexes ignore the XML markup and include only element content.

Creating a Primary XML Index

The primary XML index provides a shredded and persisted representation of the data in an XML column. The query process uses the index for queries that target specific components within the XML data. Queries that retrieve the full XML instance—that is, the entire XML document or fragment—do not use the primary XML index, but instead retrieve the data directly from the XML column.

To create a primary XML index, you must specify the table and XML column where the index will be created, as shown in the following example:

CREATE PRIMARY XML INDEX idx_resumes_xml

ON Resumes(CandidateResume);

After you provide the CREATE PRIMARY XML INDEX keywords, you must specify a name for the index (in this case, idx_resumes_xml). Next, add the ON clause, which includes the name of the table (Resumes) and the name of the XML column (CandidateResume).

That’s all there is to creating a primary XML index. Once you’ve done that, you can create one or more secondary XML indexes.

Creating a Secondary XML Index

As mentioned above, SQL Server lets you define three types of secondary XML indexes:

PATH: Supports queries that use a significant number of path expressions.
PROPERTY: Supports queries that use path expressions to retrieve multiple values from individual XML instances.
VALUE: Supports queries that retrieve values without knowing the element or attribute names that contain those values.

You can define any combination of secondary XML indexes on your XML column, as long as a primary XML index has first been defined on that column. For instance, you can define PATH and VALUE indexes or PATH and PROPERTY indexes or only one index or all three.

NOTE: To fully understand the differences between the three types of secondary XML indexes, you need a basic understanding of how you query specific XML components (as opposed to querying the entire XML instance). In my next article on XML, I’ll cover the various methods used to access the individual components within an XML document or fragment.

Creating a secondary XML index is slightly more complicated than creating a primary one, but not too terribly difficult. As the following example illustrates, you must first specify an index name as well as the target table and column:

CREATE XML INDEX idx_resumes_xml_value

ON Resumes(CandidateResume)

USING XML INDEX idx_resumes_xml

FOR VALUE;

The first two lines of this statement should look similar to the statement you use to create a primary XML index. The only thing missing is the PRIMARY keyword. In the CREATE XML INDEX clause, you provide a name for the index (in this case, idx_resumes_xml_value). In the ON clause, you provide the name of the table (Resumes) and the name of the XML column (CandidateResume).

The next clause is USING XML INDEX. This is where you specify the name of the column’s primary XML index (idx_resumes_xml). Next, you define the FOR clause, which specifies the type of secondary index you want to create. In the example above, I created a VALUE secondary index.

That’s all there is to creating a secondary index. If you want to create a PROPERTY or PATH index on the same column, you simply change the index name and the value in the FOR clause.

You can verify that the primary and secondary XML indexes have been created by using the sys.xml_indexes catalog view to retrieve a list of indexes. For example, I used the following SELECT statement to retrieve those indexes whose names contain ‘resume’:

SELECT name AS IndexName

FROM sys.xml_indexes

WHERE name LIKE '%resumes%';

The statement should return the following results:

IndexName

idx_resumes_xml

idx_resumes_xml_value

You can remove these indexes from the database by using a DROP INDEX statement, as shown in the following example:

DROP INDEX idx_resumes_xml ON Resumes;

This statement drops the primary XML index from the XML column. When you drop a primary index, the secondary indexes are automatically removed. As a result, if I were to re-query the sys.xml_indexes catalog view, I would not see any of the XML indexes I created on the Resumes table.

Moving Forward with XML

As the examples have demonstrated, you can implement XML database objects and create XML indexes on columns with relative ease. And inserting data into those columns is little different from inserting data into other column types, assuming the data conforms to the type’s structure. The ingredient that’s missing from this mix is how you work with that data once you get it in there.

As it turns out, the XML data type supports several methods for querying and manipulating XML data. Unfortunately, that’s a discussion that must wait till my next article, in which I’ll cover each of these methods and show you how to use them to work with XML data. Until then, you should now have a basic overview of the XML data type and how to get started using it when creating columns, variables, and parameters. From this foundation, you’ll be ready to jump into the world of the XML methods in no time at all.

Author: Robert Sheldon