Connecting to the database is easy, using tools that you may already use for data manipulation. Here are a few examples.
We illustrate with a .env
file for R and Python. This stores the credentials in a file on the local client.
In these examples, a plaintext .env
would contain the following values:
DATABASE_HOST=denali_database.url
DATABASE_PORT=5432
DATABASE_NAME=denali_database
DATABASE_USER=denali_user
DATABASE_PASS=denali_password
R
In R, we can connect to the database and fetch a dataframe, with minimal libraries required.
library(dotenv)
library(RPostgreSQL)
load_dot_env(file = ".env")
# Open connection to the database
con <- dbConnect(RPostgres::Postgres(),
dbname = Sys.getenv("DATABASE_NAME"),
host = Sys.getenv("DATABASE_HOST"),
port = Sys.getenv("DATABASE_PORT"),
user = Sys.getenv("DATABASE_USER"),
password = Sys.getenv("DATABASE_PASS"))
# query the database from a view
db_volcanic <- dbSendQuery(con, 'SELECT * FROM denali.volcanic_eruption')
volcanic <- data.frame(dbFetch(db_volcanic))
dbClearResult(db_volcanic)
# disconnect, keeping the resulting dataframe in 'volcanic'
dbDisconnect(con)
We can also take advantage of dbplyr to use the grammar of dataframes with the database connection. This allows for more complex operations to be performed with minimal network usage, and sometimes limited RAM usage on the client device.
library(dotenv)
library(RPostgreSQL)
library(dbplyr)
library(dplyr)
load_dot_env(file = ".env")
# Open connection to the database
con <- dbConnect(RPostgres::Postgres(),
dbname = Sys.getenv("DATABASE_NAME"),
host = Sys.getenv("DATABASE_HOST"),
port = Sys.getenv("DATABASE_PORT"),
user = Sys.getenv("DATABASE_USER"),
password = Sys.getenv("DATABASE_PASS"))
#query the database from a view. everything before collect() is translated into SQL and run in the database's container. This allows for the addition of joins, select statemens and other more complex operations.
volcanic <- con %>%
tbl(in_schema("denali", "volcanic_eruption")) %>%
collect()
# disconnect, keeping the resulting dataframe in 'volcanic'
dbDisconnect(con)
Python
Python operates similarly to R with the psycopg2 package, however a cursor is opened and closed within the connnection.
import psycopg2
from dotenv import dotenv_values
config = dotenv_values('dev.env')
## Connect to Database
conn = psycopg2.connect(host=config['DATABASE_HOST'], port=config['DATABASE_PORT'],
database=config['DATABASE_NAME'],
user=config['DATABASE_USER'], password=config['DATABASE_PASS'])
# create a cursor object
cur = conn.cursor()
# Exectue a query on the database
cur.execute('SELECT * FROM denali.volcanic_eruption')
volcanics = cur.fetchall()
## Database Disconnect
cur.close()
conn.close()
Alternatively, we can use sqlalchemy. This is especially convienance for uploads.
from sqlalchemy as db
from dotenv import dotenv_values
config = dotenv_values('dev.env')
# Build connection to Database
conn_string = 'postgresql://' + config['DATABASE_USER'] + ':' + config['DATABASE_PASS'] + '@' + \
config['DATABASE_HOST'] + ':' + config['DATABASE_PORT'] + '/' + config['DATABASE_NAME']
engine = db.create_engine(conn_string)
conn = enginge.connect()
metadata_db = db.MetaData(schema="denali")
volcanics_db = db.Table('volcanic_eruption', metadata_db, autoload=True, autoload_with=engine)
volcanics = volcanics_db.select()
Matlab
MatLab can support ‘.env’ files with a third-party dependency, but it is not in the base functionality. In this example, we hardcode the values from above.
MatLab’s database toolbox supports connection to PostgreSQL.
url = "denali_database.url:5432"
user = denali_user
database = denali_database
pw = denali_password
conn = postgresql(url, user, pw)
volcanics = sqlread(conn, "volcanic_eruption", 'Schema', "denali")
close(conn)
LibreOffice Calc
See LibreOffice’s documentation
Microsoft Office
See Microsoft’s documentation.
Google Cloud
See Google’s Documentation for Looker Studio, previously Data Studio.