3 Databases and Data APIs

3.1 Key Takeaways

Although it’s not explicitly described in this chapter, a database is, at minimum, two things:

a datastore (i.e., a way to store structured data) and
a compute engine (i.e., a way to perform operations on structured data).

In most programming languages, the way you begin an interaction with a database is to create a connection object. The methods and params used to create that object will vary from database type to type and from language to language, including how credentials are provided or set, but the successful creation of this object implies that you have an open connection to the database with a specific set of permissions (read/write/remove, etc.).

Another source of data may be an API, or Application Programming Interface. The term “API” is actually used in this chapter in two different contexts; the first is to describe the method (or interface) used in particular programming languages to connect to a database, but in the second usage the context is REST or REST-ful API’s. These are collections of URL’s that allow a server to listen for requests for data from clients. Once a client makes a request to a given URL with specified parameters, the server evaluates the request, computes a response and returns the results. Although this is very similar to how a database operates, a REST-ful API uses a specific protocol called the HyperText Transfer Protocol (HTTP). This protocol uses the parts of the URL (i.e., the path and the query string) and the verb of the client request (i.e., GET, POST, PUT, DELETE, etc.) as a way to specify how the server should evaluate the request. This protocol is widely available in many applications, including those that do no support any database drivers (like web browsers). Each programming languages has specific packages or libraries for making HTTP requests, as well as packages that provide “wrapper” functions; functions that hide the details of each API request from the developer and simply take arguments and return results in a synchronous manner. In addition to parameters, a REST-ful API endpoint may accept a body (block of code or text) as part of the request, usually in the form of a JSON object, which is often also the preferred format of the response body (again, block of code or text) chosen by API developers. There are various ways to include credentials with HTTP requests.

While I believe the topic should have been introduced earlier in this book, this chapter also discusses environment variables and how they can be used to provide configuration settings to the process that the code is running within, including database or API credentials.

At CORI/RISI we have our own “data connector” package called cori.db, which allows each of us to connect directly to the database and interact with it in a prescribed way. At the moment, we do not have an equivalent set of function to interact with the CORI Data API, although we have created a placeholder package called cori.data.api which may one day provide such functions in R.

3.2 Lab / Project

3.2.1 Updates

The lab work introduces the duckdb package, so I created a new markdown file called “00_data.qmd” in the project sample that demonstrates the most basic syntax for creating a new database (which is stored in a flat file on local disk) and loading the penguins data into a new table in that database. To make sure that this file is processed before the others, I prefixed the names of all the quarto files with sequences numbers (“00_…”, “01_…”, etc.). The model and api code has been updated to retrieve the penguin table from the local database store with duckdb. Olivier added an “02_model_R.qmd” file to demonstrate the R code that is equivalent to the modeling example used in the book. The same could be done for the vetiver code used to deploy the resulting model as an API endpoint.