Learn 10% SQL That Accounts for 90% Query
Here I am writing that 10% down in my blog so that next time I can refer to it and I am using SQLite database to illustrate all the steps. BTW, SQLite (SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine) is an extremely powerful database and definitely worth your time in learning it.
Here I am going to create the database table and populate it with data in my Cygwin environment:
$uname -a CYGWIN_NT-6.0 user-PC 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin $ls -l test.db ls: cannot access test.db: No such file or directory $sqlite3.exe test.db SQLite version 3.6.2 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite>CREATE TABLE Person ( ...>EmpId INTEGER NOT NULL PRIMARY KEY, ...>FirstName TEXT NOT NULL, ...>LastName TEXT NOT NULL, ...>Rate DECIMAL ...>); sqlite>CREATE TABLE Assigned ( ...>EmpId INTEGER NOT NULL, ...>ProjId INTEGER, ...>StartDate DATE, ...>EndDate DATE ...>); sqlite>CREATE TABLE Customer ( ...>CustId INTEGER NOT NULL PRIMARY KEY, ...>ContactInfo TEXT ...>); sqlite>CREATE TABLE Project ( ...>ProjId INTEGER NOT NULL PRIMARY KEY, ...>ProjName TEXT, ...>CustId INTEGER, ...>StartDate DATE, ...>EndDate DATE ...>); sqlite> sqlite>INSERT INTO Person VALUES (3001,'Dave','Thomas',400); sqlite>INSERT INTO Person VALUES (3002,'Andy','Hunt',400); sqlite>INSERT INTO Person VALUES (4001,'Greg','Wilson',320); sqlite>INSERT INTO Person VALUES (4002,'Grace','Hopper',500); sqlite>INSERT INTO Person VALUES (4003,'Alan','Turing',500); sqlite>INSERT INTO Person VALUES (4004,'Chunk','Babbage',125); sqlite> sqlite>INSERT INTO Project VALUES (904,'RubyMath',70043,'2004-05-01','2004-10-30'); sqlite>INSERT INTO Project VALUES (905,'DBBridge',70047,'2004-05-01','2004-10-30'); sqlite> sqlite>INSERT INTO Customer VALUES (70043,'MegaCorp Inc.'); sqlite>INSERT INTO Customer VALUES (70047,"Deadlines 'R' Us"); sqlite>INSERT INTO Customer VALUES (70101,'UNiversity of Euphoria'); sqlite> sqlite>INSERT INTO Assigned VALUES (3001,904,'2005-02-01','2005-02-28'); sqlite>INSERT INTO Assigned VALUES (3002,904,'2005-02-01','2005-03-15'); sqlite>INSERT INTO Assigned VALUES (4001,904,'2005-02-01','2005-03-21'); sqlite>INSERT INTO Assigned VALUES (4001,905,'2005-01-10','2005-02-22'); sqlite>INSERT INTO Assigned VALUES (4002,905,'2005-01-20','2005-04-01'); sqlite>INSERT INTO Assigned VALUES (4004,905,'2005-02-10','2005-03-31'); sqlite> sqlite>.q $ls -l test.db -rwxrwxrwx 1 user None 5120 Jan 27 20:18 test.db
Here are some of the SQL statements in doing join, nesting and negation. Basically they are trying to answer the following questions:
- Who is paying for the RubyMath project ?
- Get forenames and surnames of employees on RubyMath project
- Select people who are NOT assigned to the RubyMath project
- Select people who are assigned to exactly one project
- Find people in 904 or 905, but not both
- Find the most expensive contractors
$sqlite3.exe test.db SQLite version 3.6.2 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite>-- Who is paying for the RubyMath project ? sqlite>SELECT Customer.ContactInfo ...>FROM Customer, Project ...>WHERE (Customer.CustId = Project.CustId) ...>AND (Project.ProjName = 'RubyMath'); MegaCorp Inc. sqlite> sqlite> sqlite>SELECT Customer.ContactInfo ...>FROM Customer INNER JOIN Project ...>ON Customer.CustId = Project.CustId ...>WHERE Project.ProjName = 'RubyMath'; MegaCorp Inc. sqlite> sqlite> sqlite>-- Get forenames and surnames of employees on RubyMath project sqlite>SELECT Person.FirstName, Person.LastName ...>FROM Person, Project, Assigned ...>WHERE (Person.EmpId = Assigned.EmpId) ...>AND (Project.ProjName = 'RubyMath') ...>AND (Assigned.ProjId = Project.ProjId); Dave|Thomas Andy|Hunt Greg|Wilson sqlite> sqlite> sqlite>-- Select people who are NOT assigned to the RubyMath project sqlite>SELECT Person.FirstName, Person.LastName ...>FROM Person ...>WHERE Person.EmpId NOT IN ...>(SELECT Assigned.EmpId ...>FROM Assigned, Project ...>WHERE (Assigned.ProjId = Project.ProjId) ...>AND (Project.ProjName = 'RubyMath')); Grace|Hopper Alan|Turing Chunk|Babbage sqlite> sqlite> sqlite>-- Select people who are assigned to exactly one project sqlite>SELECT Person.FirstName, Person.LastName ...>FROM Person, Assigned ...>WHERE (Person.EmpId = Assigned.EmpId) ...>AND (Assigned.EmpId NOT IN ...>(SELECT A.EmpId ...>FROM Assigned A, Assigned B ...>WHERE (A.EmpId = B.EmpId) ...>AND (A.ProjId < B.ProjId))); Dave|Thomas Andy|Hunt Grace|Hopper Chunk|Babbage sqlite> sqlite> sqlite>-- Find people in 904 or 905, but not both sqlite>SELECT Person.FirstName, Person.LastName ...>FROM Person, Assigned ...>WHERE (Person.EmpId = Assigned.EmpId) ...>AND ((Assigned.ProjId = 904) OR (Assigned.ProjId = 905)) ...>AND (Assigned.ProjId NOT IN ...>(SELECT A.ProjId ...>FROM Assigned A, Assigned B ...>WHERE (A.ProjId = 904) AND (B.ProjId = 905))); Greg|Wilson Grace|Hopper Chunk|Babbage sqlite> sqlite> sqlite>-- Find the most expensive contractors sqlite>SELECT Person.FirstName, Person.LastName ...>FROM Person ...>WHERE (Person.Rate NOT IN ...>(SELECT A.Rate ...>FROM Person A, Person B ...>WHERE A.Rate < B.Rate)); Grace|Hopper Alan|Turing sqlite>.q $
Here are the SQL statements in aggregation and views answering the following questions:
- Get the total rate for all consultants
- Find consultant who rate is above average
$sqlite3.exe test.db SQLite version 3.6.2 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite>-- Get the total rate for all consultants sqlite>SELECT Project.ProjName, SUM(Person.Rate) ...>FROM Person, Assigned, Project ...>WHERE (Person.EmpId = Assigned.EmpId) ...>AND (Project.ProjId = Assigned.ProjId) ...>GROUP BY Assigned.ProjId; RubyMath|1120 DBBridge|945 sqlite> sqlite> sqlite>-- view sqlite>CREATE VIEW ProjAveRate AS ...>SELECT Project.ProjId AS ProjId, ...>AVG(Person.Rate) AS AveRate ...>FROM Person, Assigned, Project ...>WHERE (Person.EmpId = Assigned.EmpId) ...>AND (Project.ProjId = Assigned.ProjId) ...>GROUP BY Assigned.ProjId; sqlite> sqlite> sqlite>SELECT ProjAveRate.ProjId, Person.FirstName, Person.LastName, Person.Rate, ProjAveRate.AveRate ...>FROM Person, Assigned, ProjAveRate ...>WHERE (Person.EmpId = Assigned.EmpId) ...>AND (Assigned.ProjId = ProjAveRate.ProjId) ...>AND (Person.Rate > ProjAveRate.AveRate); 904|Dave|Thomas|400|373.333333333333 904|Andy|Hunt|400|373.333333333333 905|Greg|Wilson|320|315.0 905|Grace|Hopper|500|315.0 sqlite>.q $
If you cannot get hold of the book, you may want to read the author's article - Top Ten Data Crunching Tips and Tricks.
Labels: SQLite
1 Comments:
Hi, thanks for posting this. I wanted to learn SQL with Cygwin and this blog is great.
One thing I noticed while working through the examples, there seems to be a mistake in the example to find people in project 904 or 905 but not both.
sqlite> -- Find people in 904 or 905, but not both
sqlite> SELECT Person.FirstName, Person.LastName
...> FROM Person, Assigned
...> WHERE (Person.EmpId = Assigned.EmpId)
...> AND ((Assigned.ProjId = 904) OR (Assigned.ProjId = 905))
...> AND (Assigned.ProjId NOT IN
...> (SELECT A.ProjId
...> FROM Assigned A, Assigned B
...> WHERE (A.ProjId = 904) AND (B.ProjId = 905)));
Greg|Wilson
Grace|Hopper
Chunk|Babbage
Now Greg Wilson is the only guy in both 904 and 905. So the result should be everyone else.
Instead we're getting Greg, Grace and Chunk. Who are all working project 905.
The last section of this, assuming it's independent of the previous linkages and constraints (is it?) is
...> (SELECT A.ProjId
...> FROM Assigned A, Assigned B
...> WHERE (A.ProjId = 904) AND (B.ProjId = 905))
This returns 904, because we're returning a ProjId. And the selection is trivial, because there's no linkage between the tables A and B. So from here we're returning 904, and the main loop is going to select people not in that project. Thus we're going to select everyone on project 904 or 905 but not in project 904 – so everyone on project 905. Which we do.
It looks like, in the inner check, we're wanting to identify an Employee that is involved in both projects, but you'd want to return a Assigned.EmpId for that. Also, you'd want to link the two tables with an AND (A.EmpId = B.EmpId) to make it meaningful.
...> (SELECT A.EmpId
...> FROM Assigned A, Assigned B
...> WHERE (A.EmpId = B.EmpId)
...> AND (A.ProjId = 904) AND (B.ProjId = 905)
So the total check would be
sqlite> SELECT Person.FirstName, Person.LastName
...> FROM Person, Assigned
...> WHERE (Person.EmpId = Assigned.EmpId)
...> AND ((Assigned.ProjId = 904) OR (Assigned.ProjId = 905))
...> AND (Assigned.Empid NOT IN
...> (SELECT A.EmpId
...> FROM Assigned A, Assigned B
...> WHERE (A.EmpId = B.EmpId)
...> AND (A.ProjId = 904) AND (B.ProjId = 905)
...> ));
Does that make sense?
Post a Comment
<< Home