Monday, December 17, 2007

SQL Basic Question

Delete duplicate records: - Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateValueColumn1, DuplicateValueColumn2 and DuplicateValueColumn3.

DELETE
FROM     MyTable
WHERE   ID NOT IN
            (SELECT             MAX(ID)
        FROM                     MyTable
        GROUP BY   DuplicatevalueColumn1, DuplicateValueColumn2, DuplicateValueColumn2)

------------------------------------------------------------------------------------------UNION
The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.

UNION ALL
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.

The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.

A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.

Example:
Table 1 : First,Second,Third,Fourth,Fifth
Table 2 : First,Second,Fifth,Sixth

Result Set:
UNION: First,Second,Third,Fourth,Fifth,Sixth (This will remove duplicate values)
UNION ALL: First,First,Second,Second,Third,Fourth,Fifth,Fifth,Sixth,Sixth (This will repeat values)

------------------------------------------------------------------------------------------

NEWSEQUENTIALID() and NEWID() both generates the GUID of datatype of uniqueidentifier. NEWID() generates the GUID in random order whereas NEWSEQUENTIALID() generates the GUID in sequential order.

Let us see example first demonstrating both of the function.

USE AdventureWorks;
GO
--Create Test Table for with default columns values
CREATE TABLE TestTable
(NewIDCol uniqueidentifier DEFAULT NewID(),
NewSeqCol uniqueidentifier DEFAULT NewSequentialID())
--Inserting five default values in table
INSERT INTO TestTable DEFAULT VALUES
INSERT INTO TestTable DEFAULT VALUES
INSERT INTO TestTable DEFAULT VALUES
INSERT INTO TestTable DEFAULT VALUES
INSERT INTO TestTable DEFAULT VALUES
--Test Table to see NewID() is random
--Test Table to see NewSequentialID() is Incremental Sequence
SELECT * FROM TestTable
--Clean up database with droping column
DROP TABLE TestTable
GO

ResultSet:

NewIDCol                             NewSeqCol
------------------------------------ ------------------------------------
DC896759-8B6B-4A62-8EC8-970BE2F0F04C 3D540550-2138-DC11-BF85-00123FD0986A
71129CD4-3A25-470A-AE8E-9475379EC6A7 3E540550-2138-DC11-BF85-00123FD0986A
B915CCDB-F480-4D89-ADD7-40D5DDE4FD52 3F540550-2138-DC11-BF85-00123FD0986A
5920DA2E-6CC2-4FE3-9C13-F5155B1923A0 40540550-2138-DC11-BF85-00123FD0986A
6C60BFD9-ACE1-4F74-BB74-99343A3A707D 41540550-2138-DC11-BF85-00123FD0986A

It is clear from example of that NEWSEQUENTIALID() generates GUID in hexadecimal incremental interval. The hexadecimal number can be any placed in GUID. In the example above first two digits are incrementing sequential.

Function NEWSEQUENTIALID() can not be used in SQL queries and it can be only used in DEFAULT clause of table. NEWSEQUENTIALID() are predictable, in case of privacy or security use NEWID() instead of NEWSEQUENTIALID(). If tried to use in QUERY it will thrown an error. NEWID() will work perfectly fine when used in queries.

Example:

--This will thrown an error
SELECT NEWSEQUENTIALID()
GO

ResultSet:
Msg 302, Level 16, State 0, Line 1
The newsequentialid() built-in function can only be used in a DEFAULT expression for a column of type ‘uniqueidentifier’ in a CREATE TABLE or ALTER TABLE statement. It cannot be combined with other operators to form a complex scalar expression.

--This will return GUID
SELECT NEWID()
GO

ResultSet:
————————————
26CE6817-B138-413A-92AD-A2F2BBF7E0B8

------------------------------------------------------------------------------------------

HAVING specifies a search condition for a group or an aggregate function used in SELECT statement.
HAVING can be used only with the SELECT statement. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.

A HAVING clause is like a WHERE clause, but applies only to groups as a whole, whereas the WHERE clause applies to individual rows. A query can contain both a WHERE clause and a HAVING clause. The WHERE clause is applied first to the individual rows in the tables . Only the rows that meet the conditions in the WHERE clause are grouped. The HAVING clause is then applied to the rows in the result set. Only the groups that meet the HAVING conditions appear in the query output. You can apply a HAVING clause only to columns that also appear in the GROUP BY clause or in an aggregate function. (Reference :BOL)

Example of HAVING and WHERE in one query:

SELECT titles.pub_id, AVG(titles.price)
FROM titles INNER JOIN publishers
ON titles.pub_id = publishers.pub_id
WHERE publishers.state = 'CA'
GROUP BY titles.pub_id
HAVING AVG(titles.price) > 10

Sometimes you can specify the same set of rows using either a WHERE clause or a HAVING clause. In such cases, one method is not more or less efficient than the other. The optimizer always automatically analyzes each statement you enter and selects an efficient means of executing it. It is best to use the syntax that most clearly describes the desired result. In general, that means eliminating undesired rows in earlier clauses.

------------------------------------------------------------------------------------------What is difference between DISTINCT and GROUP BY?

A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs. GROUP BY should be used to apply aggregate operators to each group. If all you need is to remove duplicates then use DISTINCT. If you are using sub-queries execution plan for that query varies so in that case you need to check the execution plan before making decision of which is faster.

Example of DISTINCT:

SELECT DISTINCT Employee, Rank
FROM Employees

Example of GROUP BY:

SELECT Employee, Rank
FROM Employees
GROUP BY Employee, Rank

Example of GROUP BY with aggregate function:

SELECT Employee, Rank, Count(*) EmployeeCount
FROM Employees
GROUP BY Employee, Rank

------------------------------------------------------------------------------------------------------------

UDFs (User Defined Function) vs. Stored Procedures :-

UDFs and stored procedures are both SQL Server objects that store one or more T-SQL statements in a single named, executable routine. Although you can often implement the same or similar functionality using either a UDF or a stored procedure, the code will look significantly different depending on which technique you choose. Here are the main differences between UDFs and stored procedures:

  • A UDF must return a value-a single result set. A stored procedure can return a value-or even multiple result sets-but doesn't have to.
  • You can use a UDF directly in a SELECT statement as well as in ORDER BY, WHERE, and FROM clauses, but you can't use a stored procedure in a SELECT statement.
  • A UDF can't use a nondeterministic function such as GETDATE(), NEWID(), or RAND(), whereas a stored procedure can use such functions. A nondeterministic function is one that can return a different result given the same input parameters.
  • A UDF can't change server environment variables; a stored procedure can.
  • A UDF always stops execution of T-SQL code when an error occurs, whereas a stored procedure continues to the next instruction if you've used proper error handling code.

Both UDFs and stored procedures can perform well, depending on how you write the code. To determine whether a UDF or a stored procedure would yield the best performance in a particular implementation, you should do performance testing. . . .

Google