Google

Feb 11, 2014

Dynamically generating "insert" SQL statments

There are times where you need to generate "INSERT" SQL statements from existing data. For example, you may have a production release ready "Delete" SQL statement as shown below to remove some existing records.


Delete from employee_table WHERE first_name like ('P%')

For this script, you need to have a rollback script. It can be dynamically generated as shown below.

SELECT 'INSERT INTO employee_table ( first_name, surname, birth_date, created_date, updated_by) VALUES (' 
|| '''' || first_name || ''' '
|| ',''' || surname || ''' '
|| ',''' || convert(CHAR(20),birth_date) || ''' '
|| ',''' || convert(CHAR(20),created_date) || ''' '
|| ',''' || updated_by || ''' '
FROM employee_table  WHERE first_name like ('P%')


Alternatively, there are database management tools like TOAD, DBArtisan, SQL Developer, etc that allows you to generate SQL statements from the selected records. At times those generated insert statements need to be massaged like removing the auto generated primary key ids, timestamp,  etc. You can make the changes in Notepad++ Find/Replace with regular expression feature. For example, the following generated insert statement


INSERT INTO employee_table (emp_id, first_name, surname, birth_date, created_date, updated_by, timestamp)
   VALUES ( 1365, 'Peter', 'Smith', '03 Dec 1950', '03 Jun 2013 11:15:59.456 AM', 'user1', 0x0000090209F4A6E6)

can be modified with the following regular expressions

For:   emp_id

Find: \(\s* emp_id,
Replace with: (

Find: VALUES\s*\(\s*\d+,
Replace with: VALUES (


For:  timestamp

Find: ,timestamp
Replace with:


Find:0x[0-9A-G]+,
Replace with:

Labels:

Feb 4, 2014

Transaction management in SQL -- Sybase example




You may also like:

Q. How do you perform transaction management in SQL stored procedures?
A. Handled with begin, commit, and rollback 'tran' commands in Sybase server.

Q. Why is it important?
A. It is important to leave the database in a consistent state.

Q. Can you give an example?
A. For example, as demonstrated below, if you are going to delete 21 records from a database table, you can use transaction management to ensure that a GIVEN COUNT say 21 records are either deleted or rolled back. Partially deleting records can leave the database  in inconsistent state.


print 'Before delete from employee_table'
print '-------------------------------'

DECLARE  @EMPLOYEE_IN_CLAUSE  varCHAR(10000)
Select @EMPLOYEE_IN_CLAUSE = "'John', 'Joseph'"

exec ('select * from semployee_table where first_name in (' +  @EMPLOYEE_IN_CLAUSE +  '))
exec ('select count(*) as deleted_transact_records_count from employee_table where first_name in (' +  @EMPLOYEE_IN_CLAUSE + ') )

declare @rowcount int
select  @rowcount = 0

--transaction starts
begin tran

--Deletion
print 'delete some records from employee_table'
exec ('delete from sd_wrap..employee_table where first_name in (' +  @EMPLOYEE_IN_CLAUSE + ') and bean_name = ''TransactDetail''')

--number of rows deleted
select @rowcount = @rowcount + @@rowcount

--commit or rollback
if @rowcount = 21
begin
   commit tran
   print 'success'
end
else
begin
   rollback tran
   print 'failed'
end


print 'After delete from employee_table'
print '-------------------------------'

exec ('select * from employee_table where first_name in (' +  @EMPLOYEE_IN_CLAUSE +  '))
exec ('select count(*) as deleted_transact_records_count from sd_wrap..employee_table where first_name in (' +  @EMPLOYEE_IN_CLAUSE + ') )

go


In Sybase, @@rowcount variable returns the number of rows affected by the query. This post also demonstrates a production ready script with proper print statements, transaction management, etc to perform a basic DELETE operation. Same thing is true for insert and update operations.

Q. Do you require any other script if the above deletion causes unexpected  issue?
A. Yes, you need a rollback script with 21 insert statements to revert the changes. In other words script to put the deleted data back.

Q. Is there a smarter way to generate the insert script or will you type them in one by one?
A. There is a smarter way using a select statement to generate all the 21 insert statements with the following query.

SELECT 'INSERT INTO employee_table ( first_name, surname, salary) VALUES (' 
|| '''' || fist_name || ''' '
|| ',''' || surname || ''' '
|| ',''' || salary 

|| ')'

FROM employee_table 
WHERE first_name in ('John', 'Joseph');

Another SQL generation tip with Excel spreadsheet.

You may also like:

Labels: ,

Nov 10, 2013

SQL Subquery interview questions and answers

Here are some beginner to intermediate level SQL interview questions and answers. This post compliments my other posts.

Q. What is a subquery?
A. Subquery or Inner query or Nested query is a query in a query. A subquery is usually added in the WHERE clause of the sql statement. A subquery can be nested inside a SELECT, INSERT, UPDATE, or DELETE statement or inside another subquery.  Subqueries are an alternate way of returning data from multiple tables.


Q. Can you create a subquery in a From clause?
A. Yes. Subqueries can be used in From, Where and Having clauses. For example, in Sybase


select * 
from 
(
    select 'A' as colVal
    union 
    select 'B' as colVal
) data

Returns:

colVal
------
A
B

Joining virtual tables is one of the most powerful feature of subqueries. Virtual in this context means the result set you are joining is built on the fly. Here is a more advanced example:

  declare @clientId varchar(30),
          @reportDate date, 
          
  set nocount on

  select reportId   from   
          Report_Status s,   
          ReportKey k,  
          ReportGroupKey gk,  
    
           --subquery in from clause
          (select  max(s.createddttm) as maxdate, k1.clientId from   
              Report_Status  s,   
              ReportKey k1,  
              ReportGroupKey gk      
           where   k1.InactiveFlag ='N'  
             and  gk.InactiveFlag ='N'  
             and  gk.KeyId = k1.Id  
             and  gk.Id = s.GroupKeyId  
           group by k1.clientId  
          )  maxdates    
            
      where k.InactiveFlag ='N'  
      and  gk.InactiveFlag ='N'  
      and  gk.KeyId = k.Id  
      and  gk.Id = s.GroupKeyId  
      and s.CreatedDtTm = maxdates.maxdate  
      and k.ClientId = @clientId 
      and maxdates.ClientId = k.ClientId
      and k.reportDate = @reportDate 
   


Q.What is a correlated subquery?
A. A query is called correlated subquery when both the inner query and the outer query are interdependent. For every row processed by the inner query, the outer query is processed as well. The inner query depends on the outer query before it can be processed.

 
SELECT outer.product_name FROM product outer 
  
 WHERE outer.product_id = (SELECT inner.product_id FROM order_items inner 
                            
                            WHERE outer.product_id = inner.product_id);


If a subquery is not dependent on the outer query it is called a non-correlated subquery.




Q. What are the advantages and disadvantages of using a subquery?
A.

Advantages:

  • Subqueries allow you to use the results of another query in the outer query.
  • Subqueries in some complex SQL queries can simplify coding and improve maintainability by breaking down the complex query into a series of logical steps.
  • In some cases, subqueries are easier to understand than complex joins and unions.

Disadvantages:

  •  When a subquery is used, the query optimizer of the database server may have to perform additional steps like sorting the results, etc. Hence, in some cases subqueries can be less efficient than using joins. So, favor joins to subqueries.

You may also like:

Labels: ,

Jun 24, 2013

Excel spreadsheet to generate SQL

When you have some data in tabular (e.g. Excel spreadsheet) format and would like to insert into a database table, you need to write an SQL insert query. Manually writing SQL query for multiple records can be cumbersome. This is where Excel spreadsheet comes in handy as demonstrated below. A single SQL query can be copied down where the formulas get copied with incrementing column numbers.

The  Excel concatenate character & is used to achieve this. The $ means fix. $a1 means fix excel column A. When you copy the formula, the row numbers will be incremented like 2,3,4, etc, but the column will remain fixed to A. In the example below
  • $A$1 = first_name
  • $B$1 = surname
  • $C$1 = age

Note: Both column and row are prefixed with $, which means both are fixed.


The Excel formula is

 

="insert into person ("&$A$1&", "&$B$1&", "&$C$1&") values ('"&$A2&"','"&$B2&"',"&$C2&")"

The above Excel expression is easier to understand if broken down as shown below where the concatenation character & plays a major role in combining static text within quotes with dynamic formulas like $A$1.

 
"insert into person ("

&

$A$1

&

", "

&

$B$1

&

", "

$C$1

&

") values ('"

&

$A2

&

"','"

&

$B2

&

"',"

&

$C2

&

")"


The generated SQL statement will be
 

insert into person (first_name, surname, age) values ('Peter','Smith',35)


This SQL can be copied down in Excel to get SQL for all rows

 

insert into person (first_name, surname, age) values ('Peter','Smith',35)    
insert into person (first_name, surname, age) values ('John','Smith ',12)    
insert into person (first_name, surname, age) values ('Eddy','Wayne',32)    




You can create other SQL statements using the above technique from a table of data for better productivity.


If you have date column, the use the following formula to convert it to a text.


TEXT($D2,"dd/mm/yyyy")


You may also like:

Labels: , ,

Jun 5, 2013

Sybase Interview Questions and Answers



In my experience, many investment banks use Sybase as their database management system.


Q. How would you go about creating a new stored procedure in Sybase?
A. The basic steps involved are

1. Check if the procedure already exists?
2. If it already exists drop the procedure
3. create a new procedure with input arguments
4. declare any additional variables to be used in the procedure
5. assign values to these variables
6. create temporary tables and load some data
7. join other tables with temp tables to process data
8. handle error  scenarios
9. Keep track of the row counts
10. add logging statements where required.


Here is a very simplified example, covering a number of key aspects of a stored procedure.

-- select the schema to use
use my_db_schema 
go

IF OBJECT_ID('dbo.my_stored_proc') IS NOT NULL
BEGIN
    DROP PROCEDURE dbo.my_stored_proc
    IF OBJECT_ID('dbo.my_stored_proc') IS NOT NULL
        PRINT '<<< FAILED DROPPING PROCEDURE dbo.my_stored_proc >>>'
    ELSE
        PRINT '<<< DROPPED PROCEDURE dbo.my_stored_proc >>>'
END
go

CREATE PROC dbo.my_stored_proc  
(   
    @ParentName      varchar(30) = null,
    @ParentBatchJobId   int = null
)   
AS   

--declare variables
DECLARE @error  int,   
    @rowcount   int,   
    @error_message  varchar(240),  
 @error_number   int,
 @NumberOfDays   int,
 @DeleteDate datetime,
    @TotalRows  int,
 @proc_name  varchar(30),   
 @now        datetime
 
 
--assign values to your variables
  SELECT  @proc_name  = object_name(@@procid),
          @now     = GETDATE(),
    @rowcount     = 0,
          @TotalRows = 0

  
   select  @NumberOfDays = NumberValue,
           @DeleteDate = DateValue
        from    ConfigTable
        where   type = 'basic' and name='purge'
       
 -- error handling from last select
  SELECT @Error = @@Error   
        IF @Error <> 0
        BEGIN   
            SELECT @error_number = 50010
            SELECT @error_message = "Failed: could not get number of days parameter.(" + CONVERT(varchar(10), @Error) + ") "
            GOTO Error_Exit   
        END
 
 
 --Create a temporary table select the parant ids that need to be deleted
 CREATE TABLE #tmp_parent_ids
    (   
       ParentId    numeric(12,0)  not null
    )
    
    insert  into #tmp_parent_ids 
    select distinct(parent_id) from dbo.parent_table where createddttm < @DeleteDate
    
 
 
 if @NumberOfDays > 0
 BEGIN
 
   -- execute another stored proc and get an output to be used in this stored proc
    EXEC another_stored_proc "BatchJob", @NewNum = @ChildBatchJobId Output  

    SELECT @Error = @@Error   
       IF @Error <> 0 OR @ChildBatchJobId IS NULL   
       BEGIN   
         SELECT @error_number = 50050
         SELECT @error_message = "Failed: could not get the next BatchJobId from the SequenceNumber table.(" + CONVERT(varchar(10), @Error) + ") "
         GOTO Error_Exit   
        END    
  
  
  //do som logic like purging rows
    BEGIN   
           
            SET ROWCOUNT @PurgeRows   
       
            BEGIN TRANSACTION   

            DELETE child_table
            FROM child_table ct, #tmp_parent_ids tmp
            WHERE ct.parent_id = tmp.ParentId
       
            SELECT  @Error = @@Error, @Rows = @@Rowcount
            IF @Error != 0   
            BEGIN   
                SELECT @error_number = 50100
                SELECT @error_message = "Failed: (" + CONVERT(varchar(10), @Error) + ") could delete rows"
                ROLLBACK TRANSACTION       
                GOTO Error_Exit   
            END   
 
 END
 

success_exit:
      RETURN (0)
 
Error_Exit:  

     SELECT @error_message = "Error processing the proc" + " " + + @error_message;
  RAISERROR @error_number @error_message   
     RETURN (1)
  
go

IF OBJECT_ID('dbo.my_stored_proc') IS NOT NULL
    EXEC sp_procxmode 'dbo.my_stored_proc','unchained'
go
IF OBJECT_ID('dbo.my_stored_proc') IS NOT NULL
    PRINT '<<< CREATED PROCEDURE dbo.my_stored_proc >>>'
ELSE
    PRINT '<<< FAILED CREATING PROCEDURE dbo.my_stored_proc >>>'
go
IF OBJECT_ID('dbo.my_stored_proc') IS NOT NULL
    GRANT EXECUTE ON dbo.my_stored_proc TO user_admin
go




@@variable_name like @@Error, @@procid are Sybase variables. @variable_name like @Error are user defined stored proc variables.

Labels: , ,

May 28, 2013

SQL Interview Questions and Answers on deleting records



The following is a very popular SQL job interview question.

Q. What is the difference between "Truncate" and "Delete" commands?
A
  • TRUNCATE TABLE_NAME always locks the table and page but not each row, whereas  DELETE statement is executed using a row lock, each row in the table is locked for deletion.
  • Truncate removes all the records in the table whereas delete can be used with WHERE clause to remove records conditionally. That is remove only a handful number of records.
  • Truncate performance is much faster than Delete, as its logging is minimal wheres the Delete command logs every record.
  • Truncate does not retain the identity, whereas DELETE command retains the identity. When you use Truncate, If the table contains an identity column, the counter for that column is reset to the seed value that is defined for the column.
  • Truncate cleans up the object statistics and clears the allocated space whereas Delete retains the object statistics and allocated space.
  • TRUNCATE is a DDL (Data Definition Language) and DELETE is a DML (Data Manipulation Language). 
  • Data removed by TRUNCATE command cannot be generally rolled back unless the database server specifically supports it. The DELETE command can rollback a transaction.
  • The TRUNCATE command does not fire any triggers, whereas the DELETE command fires any triggers defined on the table. For example, to keep an audit trail of records that have been deleted by inserting the deleted records into an audit table via the DELETE triggers.

Q. When will you use a truncate command?
A.TRUNCATE is useful for purging a table with huge amount of data. Alternatively, you can drop the table and recreate it that makes sense. Firing a delete command instead of a truncate command to empty a table with millions of records can result in locking the whole table and also can take longer time to complete, and at times cause the machine to hang.

The truncate command is executed as shown below.

TRUNCATE TABLE table_name



Q. Which command will you use to periodically purge data from your tables as part of a house keeping job?
A.  Use a DELETE command within a transaction with a WHERE clause to remove data that are older than 7 years. Remove  large amount of data in batches as opposed to in a single transaction.

Q. How will you delete a few records from single table
A.

DELETE FROM parent p WHERE p.parent_name = 'Peter'


Q
. How will you delete a few records from parent and child tables where the parent table with parent_name = 'Peter'?
A.

Firstly, you need to delete the child records because the integrity constraint won't let you delete the parent record when there are child records.

DELETE child

FROM  parent p, child c

WHERE p.parent_id = c.parent_id

  AND p.parent_name = 'Peter'


Now, the parent table can be deleted as shown below

DELETE FROM  parent p WHERE p.parent_name = 'Peter'


Note: Please note the difference in syntax when you make a join with the child. When there is only a single table involved, it is "DELETE FROM table_name", but when there is a join, it is "DELETE table_name" and then the "FROM" with the join clauses.


Q.  What do you do with the PURGE command?
A. The purge command is used to clear the recycle bin. It is generally used with the DROP command. For example,

drop table tablename purge;


the above  command will clear away the table from database as well as from the recycle bin. After executing  the purge command, you cannot retrieve the table using a flashback query. 



You may also like:

Labels:

Apr 5, 2013

SQL interview questions and answers - scenarios based

Q. How will you go about identifying duplicate records in a table
A. The following SQL query will do the trick

SELECT code, user_name, COUNT(user_name) AS NumOccurrences
  FROM tbl_user
GROUP BY code, user_name
HAVING ( COUNT(user_name) > 1 )


Q. How would you go about deleting the duplicate records?
A. You could do it in a number of steps as shown below.
  • Create a temporary table.
  • Insert the unique records into the temporary table.
  • Delete the records from the original table.
  • Insert the saved single records from the temporary table back to the original table.





Q. How will you go about searching for table and column names that you don't know where they really are? For example, search for a column name to find out in which tables they do exist.
A. You need to query the database system tables. For example, in Sybase, you can query it as shown below.

select a.name, b.name
from sysobjects a, syscolumns b
where a.id = b.id
and b.name like '%split_income%'

Q. How will you go about writing an SQL query for the following scenario?

Valuation table with the following columns portfolioid, accountid, balance, inactiveflag, valuationdttm, and typecd. The portfolio table has columns portfolioid, and portfoliocd.  The account table has columns accountid and accountcd.

Write an SQL query to extract out the accountcd and the corresponding balance for a given portfoliocd and valuationdttm. Please note that there will be multiple balance records for each account, and your query must only extract out a single balance record per account based on the rule 'extract the record with minimum value for typecd'.

A. As you can see in the sample answer below, inner joins are used to join with the relevant tables. A sub query is used  to calculate the min(typecd) to extract the record with minimum value for typecd.



select  acc.accountcd, val.balance
from valuation val
        inner join portfolio pf on pf.portfolioid = val.portfolioid
        inner join account acc on acc.accountid = val.accountid
where pf.portfoliocd = 'AR30'
and   val.valuationdttm = '28 Dec 2012'
and   val.inactiveflag = 'N'
and   acc.inactiveflag = 'N'
and   val.typecd = (select min(val2.typecd) from valuation val2 where val2.valuationdttm = val.valuationdttm and val2.inactiveflag = 'N' and val2.accountid = val.accountid group by accountid)
order by acc.accountcd




Q. If you need to map actual values retrieved from the database to some other value and then sort by these translated values as well, how will you go about accomplishing this in your SQL code?

For example, StatusCd is the column in the Portfolio table,  and it can have the values of New, and Processed. But the SQL query should return a status of 'Excluded' if the ExcludedFlag column is set yes, and 'Sent' if the SentDateTime is not null. iIf none of the above conditions are met, then return the StatusCd as in the database. The sorting needs to be carried out in the order of 'New', 'Processed', 'Sent', and then 'Excluded'.


A. This can be achieved with a switch/case statement. The syntax of switch/case statement can vary among databases. Here is a sample SQL based on Sybase database server.


SELECT PortfolioCd, SentDateTime, ExcludedFlag, StatusCd as ActualStatusCd, 
 case when p.ExcludedFlag = 'Y' then 'Excluded'
 
                    else case when p.SentDateTime is null then p.StatusCd 
                                 else 'Sent'
                   end
            
end as EvaluatedStatusCd
       
 FROM Portfolio p WHERE valuationdttm > '09 Jan 2013'  and InActiveFlag = 'N'
 
 ORDER BY case when p.ExcludedFlag = 'Y' then '4' 
                    else case  when p.SentDateTime is not null then '3'
                           else case when p.StatusCd = 'New' then '1'
                                        when p.StatusCd = 'Processed' then '2'
                                 end
                          end
              end,  

             PortfolioCd   
    


Q. How would you retrieve a date time column converted to string and formatted as dd/mm/yy hh:mm:ss
A. You can use specif functions provided by your database server. These functions are specific to the database server you are using, hence your code cannot be ported to other database servers. Here is an example in Sybase.

SELECT PortfolioCd, 
      convert(char(11), p.SentDateTime, 103) + convert(char(12), p.SentDateTime, 108) as SentDateTime
FROM Portfolio p 
WHERE valuationdttm > '09 Jan 2013'  and InActiveFlag = 'N'

In the above example, the convert function is used to convert the date time field to char. The 103 in Sybase means dd/mm/yy format and and 108 to convert to the time format hh:mm:ss.


Q. How will you go about tuning your SQL and stored procedures?
A. You can use tools like DB Artisan, TOAD, etc to analyze the query plan. The code (in Sybase) below gives you the elapsed time.

Q. How will you go about tuning your SQL and stored procedures?
A. You can use tools like DB Artisan, TOAD, etc to analyze the query plan. The code below gives you the elapsed time.
DECLARE @start datetime, @stop datetime
SET @start = GETDATE()

exec MY_PROC 'AC345', '02 Jan 2013', null, 'N'

SET @stop = GETDATE()
select datediff(ms, @start, @stop)


Proper indexing is key to get good performancee out of your SQL queries.

Q. What are all the different types of indexes?
A. There are three types of indexes

Unique Index: does not allow the field to have duplicate values if the column is unique indexed. Unique index can be applied automatically when primary key is defined.

Clustered Index: reorders the physical order of the table and search based on the key values. Each table can have only one clustered index.

NonClustered Index: does not alter the physical order of the table and maintains logical order of data. Each table can have 999 non-clustered indexes.


You may also like:

Labels:

Jul 24, 2012

SQL Interview Questions and Answers: storing a tree structure in a database table



Q. How will you represent a hierarchical structure shown below in a relational database? or How will you store a tree data structure into DB tables?



A.The hierarchical  data is an example of the composite design pattern. The entity relationship diagrams (aka ER diagram) are used to represent logical and physical relationships between the database tables. The diagram below shows how the table can be designed to store tree data by maintaining the adjacency information via superior_emp_id.



as you can see the "superior_emp_id" is a foreign key that points to the emp_id in the same table. So, Peter has null as he has no superiors. John and Amanda points to  Peter who is their manager or superior and so on.

The above table can be created using SQL DDL (Data Definition Language) as shown below.


CREATE TABLE employee (

emp_id          NUMBER (4) CONSTRAINT emp_pk PRIMARY KEY,
emp_name        VARCHAR2 (40) NOT NULL, 
title           VARCHAR2 (40),   
dept_id         NUMBER (2) NOT NULL,
superior_emp_id NUMBER (4) CONSTRAINT emp_fk REFERENCES employee(emp_id)

CONSTRAINT emp_pk
PRIMARY KEY NONCLUSTERED (emp_id)

)


This can be represented as an object model  to map relational data as shown below


public class Employee {

   private Long id;
   private String name;
   private String title;
   private Employee superior;
   private Set subordinates;

   //getters and setters are omitted

}


Q. How will you find out the superior for an emplyee?
A. You can use a self-join to find the manager of an employee

Select e.emp_id,e.emp_name, title from
employee e, employee s
where e.superior_emp_id = s.employee_id
  and e.emp_id = 3

This should return

1, Peter, cio 


Q. Is there any other way to to store tree structure in a relational database?
A. Yes, it can be done using the "modified preorder tree traversal" as described below.






As shown in the diagram above, each node is marked with a left and right numbers using a modified preorder traversal as shown above. This can be represented in a database table as shown below.



As you can see the numbers indicate the relationship between each node. All left values greater than 6 and right values less than 11 are descendants of  6-11 (i.e Id: 3 Amanda). Now if you want to extract out the 2-6 sub-tree for Amanda you can write the SQL as follows

SELECT * FROM employee WHERE left_val BETWEEN 6 and 11 ORDER BY left_val ASC;

Which will return Amanda, Ralph, and Jeanne.

If you want to get ancestors to a given node say 7-8 Ralph, you can write the SQL as follows

SELECT * FROM employee WHERE left_val < 7 and right_val > 8 WHERE ORDER BY left_val ASC;


Which will return: Peter and Amanda.

If you want to find out the number of descendants for a node, all you need is the left_val and right_val of the node for which you want to find the  descendants  count. The formula is

No. of descendants = (right_val - left_val -1) /2

So,  for 6 -11 Amanda, (11 - 6 - 1) /2 =  2 descendants
       for 1-12  Peter, (12 - 1 -1 ) / 2 = 5 descendants.
       for 3-4   Mary, (4 -3 - 1) / 2 =  0, means it is a child and has no descendants.

The modified preorder traversal is a little more complicated to understand, but is very useful.

You may also like:

Labels:

Oct 12, 2011

SQL Interview Questions and Answers



Relevant SQL interview Questions and Answers

Q: Explain inner and outer SQL joins?
A: Joins allow database users to combine data from one table with data from one or more other tables (or views, or synonyms). Tables are joined two at a time making a new table containing all possible combinations of rows from the original two tables.


Inner joins: Chooses the join criteria using any column names that happen to match between the two tables. The example below displays only the employees who are executives as well.

SELECT emp.firstname, exec.surname 
FROM employees emp, executives exec 
WHERE emp.id = exec.id;   


Left Outer joins: A problem with the inner join is that only rows that match between tables are returned. The example below will show all the employees and fill the null data for the executives.

SELECT emp.firstname, exec.surname 
FROM employees emp left join executives exec ON emp.id  = exec.id;  

Right Outer join: A problem with the inner join is that only rows that match between tables are returned. The example below will show all the executives and fill the null data for the employees.

SELECT emp.firstname, exec.surname 
FROM employees emp right join executives exec ON emp.id  = exec.id;   

Full outer join: To cause SQL to create both sides of the join

SELECT emp.firstname, exec.surname 
FROM employees emp full join executives exec ON emp.id  = exec.id;  

Self join: A self-join is a join of a table to itself. If you want to find out all the employees who live in the same city as employees whose first name starts with “Peter”, then one way is to use a sub-query as shown below:

SELECT emp.firstname, emp.surname 
FROM employees emp 
WHERE city IN (SELECT city FROM employees where firstname like ‘Peter’)


Q. Explain a sub-query? How does a sub-query impact on performance?
A. It is possible to embed a SQL statement within another. When this is done on the WHERE or the HAVING statements, we have a subquery construct.


Q. What is subquery useful for?
A. It is used to join tables and there are cases where the only way to correlate two tables is through a subquery.

SELECT emp.firstname, emp.surname 
FROM employees emp 
WHERE emp.id NOT IN (SELECT id FROM executives);

There are performance problems with sub-queries, which may return NULL values. The above sub-query can be re-written as shown below by invoking a correlated sub-query.

SELECT emp.firstname, emp.surname 
FROM employees emp 
WHERE emp.id NOT EXISTS (SELECT id FROM executives);

The above query can be re-written as an outer join for a faster performance as shown below:

SELECT emp.firstname, exec.surname 
FROM employees emp left join executives exec on emp.id  = exec.id AND exec.id IS NULL;  

The above execution plan will be faster by eliminating the need for a sub-query.


Q. Can you give SQL examples for the following scenarios?
A.

Scenario 1: Retrieve first name and sum of order qty for order sum greater than 25, and group the order sum by first name.?

SELECT FIRSTNAME,SUM(QTY) 
FROM orders 
GROUP BY FIRSTNAME 
HAVING SUM(QTY)>25;


Scenario 2: Retrieve all employees whose name has a String "au"?

SELECT * 
FROM employees emp 
WHERE emp.firstname LIKE ‘%au%’;

Scenario 3: select  account number and adviser code for a given adviser code, but restrict the returned values to supplied min and max limit. For example, record 1 to record 10, record 11 to record 20, etc.

The SQL for the above scenario needs to use some custom SQL parameters and functions. The example below uses the ROWNUM variable that keeps track of the row numbers in Oracle.  The nested query shown below can limit the returned results based on a lower and upper limit.

select * from 
    (select a.ACCOUNT_NO, a.ADVISER_CODE, ROWNUM rnum from 
       ( Select * from accounts where ADVISER_CODE=:advCode order by advCode)  a 
where ROWNUM <= :max_row) 
    where rnum  >=  :min_row
 

Q: In your experience, what are some of the common mistakes developers make?
A:

1. Cartesian joins

SQL Joins are used to relate information in different tables. A Join condition is a part of the sql query that retrieves rows from two or more tables. A SQL Join condition is used in the SQL WHERE Clause of select, update, delete statements.

The Syntax for joining two tables is:

SELECT col1, col2, col3 
FROM table_name1, table_name2
WHERE table_name1.col2 = table_name2.col1;

If a sql join condition is omitted as shown below

SELECT col1, col2, col3 
FROM table_name1, table_name2

or if the condition is invalid, then the join operation will result in a Cartesian product. The Cartesian product returns a number of rows equal to the product of all rows in all the tables being joined. For example, if the first table has 20 rows and the second table has 10 rows, the result will be 20 * 10, or 200 rows. This query will take a long time to execute.


2. Use of SELECT *

For example, a common misuse of SELECT * is to extract a set of all employees and to insert them into another table called Contractors with the same structure

INSERT INTO Contractors 
  SELECT * FROM Employees WHERE emp_type = 'C';

The above query does the job, however, one day business requirements change and two new columns are added to the Employees table:

ALTER TABLE Products
  ADD effective_start_date DATETIME, effective_end_date DATETIME;

All of sudden the query that extracts from the Employees table and insert records into the Contractor table results in error.

"Insert Error: Column name or number of supplied values does not match table definition."

The fix is to explicitly list the column names in the query:

INSERT INTO Contractors (emp_id, emp_name)
  SELECT emp_id, emp_name FROM Employees WHERE emp_type = 'C';


3. Embedding User Interface (UI) layer logic into Data layer via SQL. For example

SELECT '<a href="http://www.blogger.com/...">' + name ' </a>'


The above code is a bad practice because it tightly couples your UI Layer with the Data Layer.


4. Not using Prepared statements. Prepared statements are more secured and efficient than the ordinary statements. Prepared statements prevent SQL injection attacks.


5. Using the predicate "LIKE" in indexed columns. The "LIKE" predicate typically performs a search without the normal performance benefit of indexes. Using '=', '<>', etc instead of "LIKE" will increase performance. Also should be aware of that case sensitivity (e.g., 'A' versus 'a') may be different based upon database Server or configuration.


6. Over use of cursors in stored procedures. If possible, avoid using SQL stored proc cursors. They generally use a lot of  Server resources and reduce the performance and scalability of your applications. If you need to perform row-by-row operations, try to find another method to perform the task.

Here are some alternatives to using a cursor:

  •     Use WHILE LOOPS
  •     Use temp tables
  •     Use materialized views (allowing you to pre-join complex views and pre-compute summaries for super-fast response time. )
  •     Use derived tables
  •     Perform multiple queries
  •     Use correlated sub-queries
  •     Use the CASE statement 

Q. Can you give some database performance tuning tips based on your experience?
A.

1. Materialized views are one of the important SQL tuning tools in Oracle. Instead of the entire company accessing a single database server, user load can be distributed across multiple database servers with the help of materialized views in Oracle. Through the use of multi tier materialized views, you can create materialized views based on other materialized views, which enables you to distribute user load to an even greater extent because clients can access materialized view sites instead of master sites. To decrease the amount of data that is replicated, a materialized view can be a subset of a master table or master materialized view. 

  • Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data.
  • It allows you to pre-join complex views and pre-compute summaries for super-fast response times. Unlike an ordinary view, which does not take up any storage space or contain any data, a materialized view provides indirect access to table data by storing the results of a query in a separate schema object.
  • You can define a materialized view on a base table, partitioned table or view and you can define indexes on a materialized view.
  • A materialized view can be stored in the same database as its base tables (improves query performance through query rewrite) or in a different database.

It is also worth noting that this capability may not suit best for too frequent activities as in online transaction processing (i.e. OLTP)  environments. In other databases equivalent functionalities can be achieved through triggers on base tables that would update/insert aggregate or dimension tables and queries can be executed against these aggregated or dimension tables as oppose to the base tables.


2. As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that increases in one direction (i.e. monotonic) such as an identity column, or some other column where the value is increasing and is unique. In many cases, the primary key is the ideal column for a clustered index. Create an index on any column that is a foreign key. If you know it will be unique, set the flag to force the index to be unique.

3. Avoid temp tables as much as you can, but if you need a temp table, create it explicitly using Create Table #temp.

4. Only return the columns and the rows you need.

5. Avoid full table scan where possible. The full table scan can be caused by


  • No WHERE condition.
  • No index on any type of field in the WHERE clause.
  • NOT IN predicate that is easier to write (replace NOT IN with a left outer join).
  • WHERE clauses like column_name is not null, condition 1 or condition 2, column_name between ... and ..., not equality comparisons
  • Use of SQL “LIKE clause” operator to find a string within a large table column (e.g. VARCHAR(2000), CLOB, BLOB).
  • DISTINCT, ANY, and ALL.



You may also like:



    Labels: