Predicting Property Prices Using Oracle ADW and AutoML: Part 1 — Creating an ADW and Preparing the Data

Jared Bach
14 min read · Jul 6, 2022

This blog is part of a series. To learn more about this demo and to gain access to the other steps, refer to this blog. In this blog, we are going to create an Autonomous Data Warehouse (ADW) and a database schema. We are then going to create an object storage bucket where our raw CSV file with all of our data will be stored. We will then create a live table inside the database that automatically loads the CSV file from our bucket into an ADW table. After our data is inside our ADW, we can clean it using SQL.

Sign in to your OCI tenancy

Let’s start by logging into our OCI tenancy. To do this, navigate to this website. You should see a page that looks like this.

Where it says Cloud Account Name circled in red above, enter your tenancy name and then click Next.

Click Continue to sign in. At long last, you will arrive in the OCI Console.

Create a Compartment

I have always liked to think of a compartment like a cubby. A cubby stores your stuff. A compartment, like a cubby, also stores your stuff. But instead of holding your snow boots and backpack, it holds a collection of cloud assets, like compute instances, load balancers, databases, and so on. By default, a root compartment was created for you when you created your tenancy. It is possible to create everything in the root compartment, but it is recommended that you create sub-compartments to help manage your resources more efficiently.

Once you are in the OCI console, navigate to the hamburger menu in the left-hand corner of the screen. Under Identity & Security, click Compartments.

Click the blue Create Compartment button to create a sub-compartment.

Give the compartment a name and description. Be sure your root compartment appears as the parent compartment. Press the blue Create Compartment button.

We are now going to create the Autonomous Data Warehouse.

Create an Autonomous Data Warehouse & ADW User

Navigate to the hamburger menu in the left-hand corner of the screen. Under Oracle Database, click Autonomous Data Warehouse.

Click Create Autonomous Database to start the instance creation process.

This brings up the Create Autonomous Database screen where you will configure the instance. Configure the ADW with the following specifications:

  • Compartment: OML
  • Display Name: OML-Demo
  • Database Name: OMLDemo
  • Password and Confirm Password — Specify the password for the ADMIN user of the service instance. The password must meet the following requirements:
      - It must be between 12 and 30 characters long and include at least one uppercase letter, one lowercase letter, and one numeric character.
      - It cannot contain the username.
      - It cannot contain the double quote (") character.
      - It must be different from the last 4 passwords used.
      - It must not be the same as a password you set less than 24 hours ago.

Re-enter the password to confirm it, and make a note of this password.

Leave all of the other selections as their default and click the Create Autonomous Database blue button at the bottom of the screen. The database will now provision. This will take no longer than a few minutes. Once the database has been provisioned, it should look like this.

Click the Database Actions button, circled in red above. Your screen should now look like this.

Congratulations, you now have a database. Let’s now create a database user. Click the Database Users tile, circled in red above. Then, click the Create User button, circled in red below.

A box will pop up where you will need to specify some parameters. Define the user as BROOKLYN, set a strong password, set the quota on the DATA tablespace to UNLIMITED, and turn on the Graph, OML, Web Access, and Authorization required switches. Your screen should look like this.
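For reference, a rough SQL equivalent of what this dialog does behind the scenes, run as the ADMIN user, is sketched below. This is an approximation rather than the console’s exact script: the role names for the OML and Graph switches are assumptions, and the password is a placeholder.

CREATE USER BROOKLYN IDENTIFIED BY "ReplaceWithAStrongPassword1"; -- placeholder password
ALTER USER BROOKLYN QUOTA UNLIMITED ON DATA;

-- Roles that roughly correspond to the OML and Graph switches (assumed names):
GRANT DWROLE TO BROOKLYN;
GRANT OML_DEVELOPER TO BROOKLYN;
GRANT GRAPH_DEVELOPER TO BROOKLYN;

-- Web Access: REST-enable the schema so the user gets its own Database Actions URL.
BEGIN
  ORDS_ADMIN.ENABLE_SCHEMA(
    p_enabled             => TRUE,
    p_schema              => 'BROOKLYN',
    p_url_mapping_type    => 'BASE_PATH',
    p_url_mapping_pattern => 'brooklyn'
  );
END;
/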

Click Create User when you are done. Your screen should now look like this. Open the BROOKLYN user in a new tab by pressing the “Open in new tab” icon circled in red below.

Sign in to the database with the login credentials you just specified for the BROOKLYN user.

Congratulations — you are now logged into your database as the BROOKLYN user.

Let’s return to the OCI console. We are going to create an object storage bucket. We are almost ready to load our data into the database.

Creating an Object Storage Bucket & an ADW Live Table

Not a Chum Bucket — an Object Storage Bucket 🤪 The Oracle Cloud Infrastructure Object Storage service is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. The Object Storage service can store an unlimited amount of unstructured data of any content type, including analytic data and rich content, like images and videos. In layman’s terms, it’s Google Drive. We are going to store our Brooklyn Home Sales, 2003 to 2017 CSV in an object storage bucket. Before we upload our data, however, we need to make a small change to the CSV file. There is currently a bug that prevents Live Tables from working properly, because the first column header in our CSV does not have a name. To fix this, open up the CSV in a text editor. Do not open it in Excel, as this can sometimes lead to data loss. Your CSV should look like this.

As you can see, the first column header does not have any text; it is just two quotation marks. Let’s add column_1 inside the quotation marks, so that "" becomes "column_1". Your CSV should now look like this.

Save the CSV. Let’s now return to the OCI Console. Navigate to the hamburger menu in the left-hand corner of the screen. Under Storage, click Buckets.

Your screen should now look like this. Click the Create Bucket button.

Name your bucket OML-Bucket and then click Create.

After your bucket has been created, open the OML-Bucket. Click Upload under Objects. Upload the brooklyn_sales_map.csv that you downloaded at the beginning of the blog. Your screen should look like this. Click Upload. This should take no more than a minute.

Next, under Resources, click Pre-Authenticated Requests. Then click the Create Pre-Authenticated Request button.

Name the Pre-Authenticated Request OML-Demo, set the Access Type to Permit object reads and writes, check Enable Object Listing, and set the expiration anytime in the future. I set mine to the end of the current year. After you do this, click the Create Pre-Authenticated Request button at the bottom of the screen.

Save the URL. We will need this when creating the live table. Click Close when you are done.

Now, return to the OCI Console home page. Click the person icon in the top right-hand corner of the screen. Then, click where it says oracleidentitycloudservice/[your_email].

In the bottom left-hand corner of the screen, click Auth Tokens and then the blue Generate Token button.

Name your token OMLDemo and then click Generate Token.

Copy your generated token and save it. We will need this when creating the live table. Click Close when you are done.

Let’s now return to the database. Make sure you are signed in as the BROOKLYN user.

Click the Data Load tile.

Click the Cloud Locations tile.

Click the +Add Cloud Store Location button.

A window should pop up. Fill in the boxes with the following:

  • Name: OMLDemo
  • Description: Bucket for OML Demo

Select Create Credential and Oracle for Cloud Store. Fill in the boxes under Oracle with the following:

  • Credential Name: OMLDemo
  • Oracle Cloud Infrastructure User Name: oracleidentitycloudservice/[your_email]
  • Auth Token: your_token_from_previous_step

Under Bucket URI, paste the pre-authenticated bucket URL from the previous step. Your window should now look like this.

Click Next and then in the following window, click Create. Your window should now look like this.
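For what it’s worth, the credential this wizard creates can also be made directly in SQL with DBMS_CLOUD.CREATE_CREDENTIAL. A minimal sketch, using the same credential name, username, and auth token entered above:

BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OMLDemo',
    username        => 'oracleidentitycloudservice/[your_email]',
    password        => 'your_token_from_previous_step' -- the auth token, not your console password
  );
END;
/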

Congrats — you successfully connected your object storage bucket to ADW. Now all that is left to do is load the data in our object storage bucket into an ADW table. We are going to do this using a live table. Live tables are a cool and newer feature of ADW that essentially give you a live connection between an object storage bucket and an ADW table. For example, let’s say you were streaming Tweets from Twitter and storing the raw output in an object storage bucket. You could set up a live table that connects your ADW to object storage so that whenever a new file is uploaded to your bucket, it is automatically fed into an ADW table. To start your set-up, click the hamburger icon in the upper left-hand corner of your screen. Under Data Tools, click Data Load.

Click the Feed Data tile and then click Next.

Click +Create Live Table Feed.

A window will pop up. Fill in the boxes with the following:

  • Live Table Feed Name: BROOKLYN_SALES_MAP
  • Target Table Name: BROOKLYN_SALES_MAP
  • Object Filter: *.csv

Uncheck Enable for Scheduling. Your window should look like this. Click Create.

Click the three dots next to your live table feed and click Run Live Table Feed Immediately (Once).

This may take a few minutes to execute. Once this is successfully executed, you will see a confirmation pop-up in the upper right-hand corner of your screen that looks like this.

Let’s now navigate to SQL. In the upper left-hand corner of your screen, open the hamburger menu. Under Development, click SQL.

Your screen should now look like this. Note: it might take a few minutes for your screen to look like this. Because there are almost 400K records, it takes a while for it all to load, so be patient.

Uh-oh — it looks like we have a small problem.

Update: As of July 22, 2022, it appears this bug has been fixed, and live tables should work properly now using ADP Schema Version 22.3.1.0.6. If you are experiencing issues, however, upload the data locally using the method below. Additionally, live tables currently create case-sensitive column names. Please be aware of this, as it may cause issues during the data cleaning and model-building steps in the notebook. Until this case-sensitivity issue is fixed, I recommend loading the data locally.
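To see why the case sensitivity matters, consider a hypothetical lowercase column "sale_price". A case-sensitive column must be double-quoted with the exact same case in every query, while an ordinary unquoted column is case-insensitive:

-- Case-sensitive column: the quotes and exact case are mandatory.
SELECT "sale_price" FROM BROOKLYN_SALES_MAP FETCH FIRST 5 ROWS ONLY;

-- Ordinary column: any case works, no quotes needed.
SELECT SALE_PRICE FROM BROOKLYN_SALES_MAP FETCH FIRST 5 ROWS ONLY;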

As you can see, we have a table named COPY$1_BAD. This table contains omitted records that our database was not able to read. We can learn more about why it omitted records in the COPY$1_LOG table. To count the number of omitted records, execute the following SQL statement in SQL Developer:

SELECT COUNT(*) FROM COPY$1_BAD;

This should return 62,208. This means about 62K records were omitted from our live table, BROOKLYN_SALES_MAP. To count the number of records in BROOKLYN_SALES_MAP, execute the following SQL statement in SQL Developer:

SELECT COUNT(*) FROM BROOKLYN_SALES_MAP;

This should return 328,892. This means there are almost 329K records in our live table, BROOKLYN_SALES_MAP. Doing some basic math, we can also calculate the percentage of omitted records: 62,208 ÷ (62,208 + 328,892) ≈ 15.9%. While this bug is being worked on by Oracle, we are going to reload the data into the database locally. First, let’s drop our BROOKLYN_SALES_MAP table using the following SQL command.

DROP TABLE BROOKLYN_SALES_MAP;

Next, click the hamburger icon in the upper left-hand corner of your screen. Under Data Tools, click Data Load.

Now, select Load Data and Local File, and then click Next.

Your screen should now look like this — click Select Files and select the brooklyn_sales_map.csv file.

Your screen should now look like this. Before you execute the load by clicking the green play button, click the pencil icon.

Your screen should now look like this.

We are not going to touch anything in here, but I just wanted to point out that you can view the CSV data, the column names, etc. before you load it into your ADW. You can also make changes to the ADW table before you load the data, if you would like. When you are ready, click the close button in the bottom right-hand corner of the screen and then proceed to execute this data load by clicking the green play button, circled in red below.

Click Run.

This will take a few minutes to load. Once the data is done loading, your screen should look like this.

As you can see, it took 4 minutes and 11 seconds for my data to load. Let’s go check out the data in SQL. In the upper left-hand corner of your screen, open the hamburger menu. Under Development, click SQL.

Your screen should now look like this.

Let’s count the number of records in BROOKLYN_SALES_MAP. To do this, execute the following SQL statement in SQL Developer:

SELECT COUNT(*) FROM BROOKLYN_SALES_MAP;

This should return 390,883. This means there are 390,883 records in our database table. Congratulations — we have successfully uploaded our data to the database. As an aside, you do not have to use the UI wizard for loads like this; see the sketch below. After that, we will move on to prepping our data for machine learning.
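If you ever want to script a load instead of clicking through the UI, DBMS_CLOUD.COPY_DATA can copy the CSV from the bucket straight into an existing table. A minimal sketch, assuming the target table already exists with columns matching the CSV, reusing the OMLDemo credential, and with a placeholder object URL:

BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'BROOKLYN_SALES_MAP',
    credential_name => 'OMLDemo',
    file_uri_list   => 'https://your-object-url/brooklyn_sales_map.csv', -- placeholder URL
    format          => JSON_OBJECT('type' VALUE 'csv', 'skipheaders' VALUE '1')
  );
END;
/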

Prepping the Data

When I first completed this demo, I used SQL statements inside of the OML notebook to prep the data for machine learning. After going about cleaning the data this way, however, I concluded that it would be a lot more efficient to clean the data in SQL Developer. We can then visualize the data in an OML notebook after cleaning. First, we are going to create a duplicate table, called BROOKLYN_COPY. We are doing this, because we do not want to touch the original data. Execute the following SQL statement.

CREATE TABLE BROOKLYN_COPY AS (SELECT * FROM BROOKLYN_SALES_MAP);

Next, we are going to add YEAR, MONTH, DAY, and DAY_OF_WEEK columns to our table. We can do this with the following SQL.

-- Add the new date-part columns.
ALTER TABLE BROOKLYN_COPY
ADD (
  MONTH NUMBER(2) NULL,
  YEAR NUMBER(4) NULL,
  DAY NUMBER(2) NULL,
  DAY_OF_WEEK NUMBER(1) NULL
);

-- Populate them from SALE_DATE. Note that TO_CHAR(date, 'D') is
-- NLS_TERRITORY-dependent (in the US, 1 = Sunday).
UPDATE BROOKLYN_COPY
SET
  MONTH = EXTRACT(MONTH FROM SALE_DATE),
  YEAR = EXTRACT(YEAR FROM SALE_DATE),
  DAY = EXTRACT(DAY FROM SALE_DATE),
  DAY_OF_WEEK = TO_CHAR(SALE_DATE, 'D');
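If you want a quick sanity check, sample a few rows to confirm the new columns were populated:

SELECT SALE_DATE, YEAR, MONTH, DAY, DAY_OF_WEEK
FROM BROOKLYN_COPY
FETCH FIRST 5 ROWS ONLY;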

Next, we are going to change the column types of the ZIP_CODE, LOT, YEAR, MONTH, DAY, and DAY_OF_WEEK columns to VARCHAR2. Oracle will not let you change the datatype of a column that already contains data, so we add new VARCHAR2 columns, copy the values across, drop the originals, and then rename the new columns back to the original names.

-- 1. Add VARCHAR2 shadow columns.
ALTER TABLE BROOKLYN_COPY
ADD (
  ZIP_CODE_ VARCHAR2(26),
  LOT_ VARCHAR2(26),
  YEAR_ VARCHAR2(26),
  MONTH_ VARCHAR2(26),
  DAY_ VARCHAR2(26),
  DAY_OF_WEEK_ VARCHAR2(26)
);

-- 2. Copy the values over (the numbers are implicitly converted to text).
UPDATE BROOKLYN_COPY
SET
  ZIP_CODE_ = ZIP_CODE,
  LOT_ = LOT,
  YEAR_ = YEAR,
  MONTH_ = MONTH,
  DAY_ = DAY,
  DAY_OF_WEEK_ = DAY_OF_WEEK;

-- 3. Drop the original NUMBER columns.
ALTER TABLE BROOKLYN_COPY
DROP (
  ZIP_CODE,
  LOT,
  YEAR,
  MONTH,
  DAY,
  DAY_OF_WEEK
);

-- 4. Rename the new columns to the original names.
ALTER TABLE BROOKLYN_COPY RENAME COLUMN ZIP_CODE_ TO ZIP_CODE;
ALTER TABLE BROOKLYN_COPY RENAME COLUMN LOT_ TO LOT;
ALTER TABLE BROOKLYN_COPY RENAME COLUMN YEAR_ TO YEAR;
ALTER TABLE BROOKLYN_COPY RENAME COLUMN MONTH_ TO MONTH;
ALTER TABLE BROOKLYN_COPY RENAME COLUMN DAY_ TO DAY;
ALTER TABLE BROOKLYN_COPY RENAME COLUMN DAY_OF_WEEK_ TO DAY_OF_WEEK;
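If you want to confirm the conversion worked, the data dictionary should now report all six columns as VARCHAR2:

SELECT COLUMN_NAME, DATA_TYPE
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'BROOKLYN_COPY'
AND COLUMN_NAME IN ('ZIP_CODE', 'LOT', 'YEAR', 'MONTH', 'DAY', 'DAY_OF_WEEK');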

Next, we are going to create a view that filters the table so that SALE_PRICE is between $400K and $7M and keeps only the columns we will use going forward.

CREATE VIEW BROOKLYN_VIEW AS
SELECT
  NEIGHBORHOOD, BUILDING_CLASS_CATEGORY, BLOCK, BUILDING_CLASS, LAND_SQFT,
  GROSS_SQFT, YEAR_BUILT, SALE_PRICE, SANITBORO, SANITSUB, LOTAREA, BLDGAREA,
  COMAREA, RETAILAREA, GARAGEAREA, LOTFRONT, ASSESSLAND, ASSESSTOT, EXEMPTTOT,
  YEARBUILT, BUILTFAR, FACILFAR, SANBORN, MONTH, YEAR
FROM BROOKLYN_COPY
WHERE SALE_PRICE > 400000
AND SALE_PRICE < 7000000;
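As a quick sanity check, the minimum and maximum sale prices in the view should both fall strictly between $400K and $7M:

SELECT MIN(SALE_PRICE) AS MIN_PRICE, MAX(SALE_PRICE) AS MAX_PRICE
FROM BROOKLYN_VIEW;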

Lastly, we are going to filter the data again, so that we are only looking at one-, two-, and three-family homes.

CREATE VIEW BROOKLYN_VIEW_FINAL AS
SELECT *
FROM BROOKLYN_VIEW
WHERE BUILDING_CLASS_CATEGORY IN (
  '01 ONE FAMILY HOMES',
  '02 TWO FAMILY HOMES',
  '03 THREE FAMILY HOMES'
);

And that’s it! When we go to do our machine learning, we will tap into the data in BROOKLYN_VIEW_FINAL. If you want to run all of this SQL in one go, copy and paste the statements above into SQL Developer and run them as a script.

Before we move-on to OML, execute the following SQL statement to check the number of records in your BROOKLYN_VIEW_FINAL.

SELECT COUNT(*) FROM BROOKLYN_VIEW_FINAL;

If your query returns 82,778, then you should be ready to move on to the next step. Your screen should look like this.

Let’s move on to Part 2 of this demo, where we will build a machine learning model using an Oracle Machine Learning (OML) notebook and AutoML.
