Master SQL Server 2005

Friday, November 26, 2010

Choosing Joins Over Subqueries

Hello Readers !!!

Colleagues @ my office always ask me - "When you can fetch the data from the database table using subqueries, then what is the need of joins".

And I am feed up of answering them again and again. And that has inspired me to write this post.

The data what you fetch is the same, irrespective of the methodology used. But the difference occurs in the time used to fetch the records.

In order to explain this properly, I have a simple example.

Example:

- Imagine you have a Employee table, with employee id as its primary key. And you also have Departments table with department id as its primary key respectively.

- Use the following example to see the difference by yourself:

BEGIN

DECLARE @START_TIME_1 DATETIME,@END_TIME_1 DATETIME,@START_TIME_2 DATETIME,
@END_TIME_2 DATETIME,@TOTAL_TIME_1 DATETIME,@TOTAL_TIME_2 DATETIME;

SELECT EMP.EMPID , EMP.EMPNAME, DEPT.DEPARTMENT_NAME
FROM EMPDETAILS EMP, DEPARTMENT DEPT
WHERE EXISTS
(
SELECT DEPTID FROM DEPARTMENT DEPT
WHERE EMP.DEPTID = EMP.DEPTID
);

SET @END_TIME_1 = GETDATE();
SET @TOTAL_TIME_1 = DATEDIFF(millisecond,@START_TIME_1,@END_TIME_1);

SET @START_TIME_2 = getdate();

SELECT EMP.EMPID, EMP.EMPNAME, DEPT.DEPARTMENT_NAME
FROM EMPDETAILS EMP
INNER JOIN DEPARTMENT DEPT
ON EMP.DEPT_ID = DEPT.DEPTID;
SET @END_TIME_2 = getdate();
SET @TOTAL_TIME_2 = DATEDIFF(millisecond,@START_TIME_2,@END_TIME_2);

SELECT @START_TIME_1 AS [FIRST_QUERY_START];
SELECT @END_TIME_1 AS [FIRST_QUERY_END];

SELECT @START_TIME_2 AS [SECOND_QUERY_START];
SELECT @END_TIME_2 AS [SECOND_QUERY_END];

SELECT @TOTAL_TIME_1 AS [FIRST_QUERY_TOTAL];
SELECT @TOTAL_TIME_2 AS [SECOND_QUERY_TOTAL];

END

Just see the difference in milliseconds, or even seconds in FIRST_QUERY_TOTAL and SECOND_QUERY_TOTAL columns.

Following are the proposed outputs.
1. Both the columns will display same values, if the amount of data (in the above tables described) is less.
2. But as the number of records in the tables increases, joins become more efficient than subqueries.

- So when the data to be fetched is to be fetched from a table with huge records, then prefer joins over sub-queries.

If you like this post, then please pray for the victims of 26/11 attack @ Mumbai.

Thanking You,
Akshay Sahu
(MCTS SQL Server 2005, OCJP [Java SE 6])SET @START_TIME_1 = GETDATE();

Sunday, July 18, 2010

Understanding Heap Structure in SQL Server 2005

Hello !!!

Hope the reader is in good state while going through this article!!!!

Indexing, a very important technique to master, when it comes to SQL Server DB Administration.

What is an Index????

English Definition: Something that serves as a guide or facilitates the path to the destination.

* The above definition, if understood properly, provides the entire structure of an index in SQL Server 2005.

*  An index is something, that facilitates the MS SQL Server to find the desired data by utlizing less amount of system resources and time.

Heap Data Structure present in SQL Server 2005:

*    In order to understand a heap structure, i'll take you across a scenario, which will inscribe heaps in your memory. This one is for the people who doesn't know what Indian City lanes are made up of !!

*    Imagine you're at Hyderabad, India and you're looking at Charminar and simultaneously adoring its art and beauty.

*    A person steals your cell phone (Say O2 or BlackBerry [With BlackBerry Messenger activated.] and you find that man running away. Your reflexes gets you started n you start chasing the theif. And after sometime the theif vanishes into the lanes around Charminar ( If anyone is unaware then always remember that the Charminar area contains around 200 lanes approx....).

*    Now you are panic stricken as you do not know your exact location. And again your reflexes compells you to start finding a way out.

*    Let the epicenter be some lane 10. From lane 10 you go across each and every lane present in that area to find a way out..

*    You come across 80 lanes, but in vain. And in the 81 lane, you find some way out. But that lane turns out to be a dead end lane and you again come back to lane 10.

*    And from lane 120, you again find a way out to the middle of the road. On following it, you're out of that area, safe and sound.

*    This is what happens to SQL Server, when the data to be searched is in a heap structure.

*    The server has to scan each and every data page to find the desired data. This becomes a time consuming process if the size of the database is huge (say yahoo mail users database).

*    The INDID (Index Id) of a heap structure is '0' (Zero). And you can find the heap structured objects in 'dbo.sysindexes' table.

SELECT
    OBJ.name AS [INDEX NAME],
    INDXDETS.indid AS [INDEX ID],
    INDXDETS.FirstIAM AS [FIRST PAGE ADDRESS],
    INDX.type_desc AS [STRUCTURE]
FROM
    dbo.sysindexes INDXDETS
    INNER JOIN sys.indexes INDX ON INDX.object_id = INDXDETS.id
    INNER JOIN dbo.sysobjects OBJ ON OBJ.id = INDXDETS.id
WHERE
    INDXDETS.indid = 0
GO

*    When the user issues a select statement against a heap, the server goes to dbo.sysindexes table to fetch the starting point(Address) to search. This address is given in FirstIAM column of dbo.sysindexes table.

*    The data page at that address contains a pointer to the address of subsequent data pages, which in turn will contain pointers pointing to the address of different data pages.

*    Unless server fetches the desired data, it keeps on scanning these pages. Thus wasting a lot of system resources as well as time.

*    Thus, in order to improve the performance of your server, try implementing database indexing on the highly used columns.

* The basic working of an heap is depicted in the following picture.

Akshay Sahu

Microsoft Certified Technology Specialist

(SQL Server 2005)

Thursday, July 15, 2010

Differences between User Defined Function(s) and Stored Procedure(s)

Hello!!! Hope the reader is in good state while reading this article!!!!

T-SQL FUNCTIONS:

A block of T-SQL code , that returns a single value / result set (Scalar Value, Table) after processing input (User Entered or Default value).

STORED PROCEDURE:

A common name given to a block of T-SQL code, which when executed, provides output to the user.

Aha!! The above two definitions look similar to each other and on a vigilant look, they both mean the same. Then what could be the difference. Are they really the same or is there any difference between them??

Yes, there are a lot of differences between a T-SQL function and a T-SQL Stored Procedure.

The following list provides you with most of the them.

1. Functions will always return a single value / result set (if the function is table-valued / inline). Whereas a stored procedure may (0....n) or may not return any value / result set to the user.

2. Functions are compiled only once in their entire lifespan, whereas you have an option of recompiling a stored procedure against each request (this may degrade your server performance).

3. Functions will only have input parameters, but no output parameters. Whereas a stored procedure can either have input parameters, output parameters or both.

4. During function execution, the user has to specify the names and values (Scalar Value / DEFAULT keyword) of all the input parameters present in its definition, even if the input parameter posses default value. If even a single input parameter is missed, then the server throws an error. This is not applicable to Stored Procedures.

Example:

* If suppose a function contains 4 input parameters and they all have a default value, and while calling that function, the user does not specifies the third input parameter, then the Server, instead of taking the default value of the 3 parameter from the definition, will throw an error and immediately halts the function execution.
* Whereas, if a stored procedure is having an input parameter with some default value, and a user while calling the sp does not specify that parameter, then the server takes the default value of that parameter from its definition and continues executing the stored procedure.

5. Functions cannot change the data present in permanent tables, it only supports DSL (Data Selection Language). Whereas you can perform all DML operations using a stored procedure.(DSL + DML).

6. Functions does not support TCL (Transaction Control Language) and EHL (Error Handling Language) . TCL includes (BEGIN TRANSACTION, COMMIT TRANSACTION, ROLLBACK) and EHL includes (TRY...CATCH).Whereas Stored Procedures provides complete support to TCL and EHL.

7. A user cannot use non-deterministic functions in User Defined functions, except for GETDATE() (this method got included from SQL Server 2005). Whereas this is not applicable to stored procedures.

8. Scope of a local or global temporary table in a function is limited to the parent function only. But the scope of a local or a global temporary table is accessible across the children in a stored procedure.

Example:
* If suppose a user has created a local temporary table in a function and is immediately calling another function. Then that particular temporary table is not accessible to the called(Child) function.
* Whereas in stored procedure, if a user creates a local temporary table and calls another stored procedure from there, then that local temporary table is accessible to the called(Child) SP.

9. You cannot call stored procedure using SELECT statement, whereas you can call a function using it.

I have found these differences while performing different operations. And if the reader finds some, then please share it!!!!!

Akshay Sahu
Microsoft Certified Technology Specialist (SQL Server 2005)

Tuesday, July 13, 2010

Introduction to SQL Server 2005

Hello!!!

Microsoft SQL Server 2005 is a database management system developed by Microsoft Corporation, for managing data.

The sentence above makes a naive reader feels that this software is another DBMS, developed by a better organization. But hold on!!! This DBMS is no odd software. The list of features shipped with this DBMS can help many administrators maintain and manage their production data in a structured manner.

My Liking towards SQL Server:

1. Implementation of Codd Rules:

- Microsoft's SQL Server obeys almost all the 12 Codd rules for RDBMS.
- And I have liked the implementation of Rule 4 (Dynamic Online Catalog Based on the Relational Model). 'Information_Schema' Schema, that contains views that opens up the entire database to the administrators and the users. This is the easiest and the most convenient way of fetching database details. I have worked around on different DBMSs like (Oracle 11g, MYSQL Server). But found Microsoft's style to be the most pragmatic and easy to master.

2.SQL Server Management Studio

SQL Server Management Studio is the GUI tool used for operating SQL Server 2005. An excellent software that instills great impression on the end users, when used. SSMS provides a professional approach of managing database objects and performing administration operation(s).

Key Features in SQL Server 2005:

1. Database Mirroring

Database Mirroring is the feature that makes the production database available at all times. This the most yearned feature since the inception of Databases in this world. SQL Server 2005 provides excellent options for implementing Database Mirroring,

2. Data Partitioning

A benchmark feature that utilizes multiple storage devices for storing data based on some bifurcating condition. Due to this, data archiving, data backup and brisk data read is made possible. And also the user is provided with an option of going either for horizontal or vertical partitioning.

3. Web Services

A standard protocol (HTTP , SOAP) can utilized to open the data to different applications.

4. Dedicated Administrator Connection(DAC)

When the user workstation runs out of memory, the administrator can still connect to the SQL Server instance by using the feature called DAC.

5. Online Database Restore / Online Indexing

Features like online database restore and online indexing helps administrators to hide the changes, that are made to the production databases, from the end users.

If I continue writing about all the features and their advantages, then Google will enjoin me from using Google Blogs.

But the usage of each and every feature will be posted later. My coming posts will describe and teach the readers the steps to perform several database administration activities.

But till then, take care

With Regards,
Akshay Sahu
(Microsoft Certified Technology Specialist: SQL Server 2005)