Introduction to SQL for Data Analyst

Initial SQL lessons a Data Analyst should know

Riza Purnaramadhan
Analytics Vidhya

--

Photo by Dominik Vanyi on Unsplash

Currently, if we observe, most of the job vacancies as a data analyst in companies who have implemented IT systems, require SQL as analysis tool expertise, therefore SQL skills are skills that must be mastered by someone who wants to work as a data analyst.

In this article, I intend to introduce an initial SQL lesson for someone who is looking for a career as a data analyst.

Table of Contents

  1. About SQL
  2. Types of SQL Commands
  3. Relational Database Management System (RDBMS)
  4. Query Example

About SQL

What is SQL?

SQL which stands for Structured Query Language is the standard computer language used for a database system — or rather a relational database management system.

Why do data analysts need to learn SQL?

As a data analyst, our job is not only to analyze the data that is already available but also to be able to retrieve, modify and access the data yourself from the data source, namely the database. The data analyst will use SQL to do query the data and gain insights from it.

Additionally, mastery of SQL will help companies in the following areas:

-Helps create information reports, for example, month-to-month sales trends, increase in customers, etc.

-Programmers who build application systems.

-Improve company performance.

-etc.

Where is SQL used?

SQL is usually used in companies that have implemented IT systems, for example:

E-commerce
- Retail
- Logisly
- etc.

Types of SQL Commands

The types of SQL commands that I find a lot are divided into 4 types, namely Data Definition Language(DDL), Data Manipulation Language(DML), Data Control Language(DCL), and Transaction Control Language(TCL), but a data analyst, actually enough to master DDL & DML, because in fact they often interact with type DDL & DML commands, maybe if you want to be a database administrator then will often interact with DCL & TCL command types. The following is a picture of the distribution of the types from the SQL commands:

Image Created by Author

Data Definition Language (DDL)

DDL (Data Definition Language) is a command that functions more to manipulate database structures, such as creating, modifying, and deleting. The following are the commands contained in the DDL and their functions:

  • CREATE: create a database or table.
  • ALTER: change the structure of the table, such as changing table names, adding columns, changing columns, deleting columns, and assigning attributes to columns.
  • DROP: delete database or table.

Data Manipulation Language (DML)

DML (Data Manipulation Language) is a command in the SQL programming language that is used to manipulate data in a database. The following are the commands contained in the DML and their functions:

  • INSERT: insert new data into a table.
  • SELECT: selects and displays columns from a table or even multiple columns from a table.
  • UPDATE: change/edit data in a table.
  • DELETE: delete data from a table.

Data Control Language (DCL)

Data Control Language (DCL) functions to control and manipulate the database permissions itself. Simply put, Data Control Language (DCL) functions to control access rights (Privileges) and manipulate the USER Database. The following are the commands contained in the DCL and their functions:

  • GRANT: this command is used to grant admin permissions to a user.
  • REVOKE: unlike GRANT which grants access rights, REVOKE has the right to revoke the access rights of a user.

Transaction Control Language (TCL)

TCL commands deal with transactions in the database. The following are the commands contained in the TCL and their functions:

  • COMMIT: used to permanently store transactions in the database.
  • ROLLBACK: The ROLLBACK command is used to return the database to the last COMMIT.
  • SAVEPOINT: SAVEPOINT is a special sign inside a transaction that allows all commands executed after they are set to be rolled back, restoring the transaction state to the state at the time of the savepoint.

Relational Database Management System (RDBMS)

What is RDBMS?

Relational Database Management System, commonly abbreviated as RDMBS, is a program that allows us to create, update, and manage a relational database. Relational database itself is a type of database where data is generally stored in a structured form in the form of tables (rows and columns) and each table/data contained in the database has a relationship (relational) to one another.

Simple Illustration of Relational Tables

Here is a simple example of a table that has a relationship with each other through its columns:

Image Created by Author

Popular RDBMS Products

  • PostgreSQL (Open Source)
  • MySQL, MariaDB (Open Source)
  • SQLite (Open Source)
  • Hadoop (Open Source)
  • Microsoft SQL Server (Not Open Source)
  • Oracle DB (Not Open Source)

Database Storage Structure

As a data store, a relational database system has a hierarchical structure of storage objects as follows:

  • Database
  • Table
  • Column

Table and Column

The following image is an example of a table in the database. For example, the database name is a student database

Image Created by Author

Query Example

Here I will show a little example of a query in SQL, hopefully, with this, you have an idea of ​​how to query in SQL.

For example, in a database called student, we have a table named Student_Identity, here is the table:

then we want to try manipulating the table with simple manipulations.

Retrieve The Entire Column in The Table

To retrieve all Student_Identity table columns, we can type the following command:

SELECT*FROM Student_Identity;

after running it will produce the following output:

syntax explanation:

Image Created by Author
  • The initial word, SELECT, is used to inform the system that we want to retrieve data.
  • An * (asterisk) means that all columns need to be retrieved from the referenced table. This sign is often referred to as a wildcard.
  • FROM [TABLE_NAME], it means the name of the table in which data will be taken.
  • Sign; (semicolon) is a sign that indicates the end of the SELECT command

Take One Column in The Table

To retrieve the Name column in the Student_Identity table, we can type the following command:

SELECT Name FROM Student_Identity;

after running it will produce the following output:

syntax explanation:

Take More Than One Column in The Table

To retrieve more than one column, we can type the following command:

SELECT Name, Age FROM Student_Identity;

after running it will produce the following output:

syntax explanation:

Image Created by Author

Conclusion

As a person who wants a career as a data analyst, it is very important to master SQL as an analysis tool, to master SQL it is necessary to know the basics of SQL lessons first so that the flow of thinking is built, hopefully the explanation above can help you!

--

--