MySQL Heatwave: ML Basic

oci_heatwave_ml

Machine Learning (ML) and Artificial Intelligence (AI) is all the new, hot, shiny thing in the industry today.  Everyone is looking for ways to make a business decision quickly and accurately.  With this goal in mind, software manufactures are attempting to give access to machine learning capabilities easier within their respective databases and cloud offerings.

One platform that has excelled at making the leap from Python driven approach to a more SQL drive approach has been Oracle’s Heatwave offering.  Using SQL, Oracle has given developers and DBAs the ability to build models that can be used quickly – additionally breathing new life into many roles within an organization, like DBA, that could potentially be minimized due to increased automation and growth of AI.  Through this post, you will get a sense of how easy it is to build an ML model with OCI Heatwave.

Five steps to ML:

  1. Load data
  2. Train model
  3. Loading and Running Model
  4. Prediction
  5. Scoring

Through these file steps, a Developer or DBA can quickly devise ML models and use models within their applications.

Load Data

Before you can develop any models, the data has to be in the database.  Loading data into the database can be done in many different ways.  With MySQL, you can use the MySQL Shell (mysqlsh) utility called util.loadDump().  Once the data is loaded to the database, they it needs to be loaded into Heatwave (sys.heatwave_load()).  Now, these utilities are specific to MySQL; however, you can use other tools like Oracle GoldenGate to enable real-time data loads that the model can continously use during training processes.

Another aspect of loading data is the structure that the model will use. This is very simple to understand in that you have to have three tables that can be used for the training, testing, and validating of the model.  In this example, these three tables are:

  1. bts_airport_delay_train
  2. bts_airport_delay_test
  3. bts_airport_delay_validate

All three tables are structured the same and get used in different steps of the machine learning process.  The core SQL for these tables is:

CREATE TABLE `bts_airport_delay_%` (
`OPER_CARRIER` varchar(255) DEFAULT NULL,
`MONTH` varchar(255) DEFAULT NULL,
`ORIGIN_AIRPORT` varchar(255) DEFAULT NULL,
`SCHEDULED_DEPT_TIME` int DEFAULT NULL,
`AVG_MINUTES_LATE` float DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Note: the table name has been modified to remove the train, test, validate part of the name.  

After the tables have been established, they need to be populated.  That data that populates the train, test, and validate tables comes from the base tables within the database.  Example, if you have two tables that make up all the data for the model, then both tables data needs to go into train, test, and validate tables resulting in a single table of data.

Training

After all the data is loaded, the next thing to do is to “train” the model(s).  In order to do this, you have to run the sys.ML_TRAIN procedure that is part of the Heatwave cluster.  Upon execution of the ML_TRAIN procedure, you can provide one of three models:

  1. Classification
  2. Regression
  3. Forecasting

These are common models that can be used against the model tables that have been established.  As the model trains, the model will be stored in the schema ML_SCHEMA_%user% in the table MODEL_CATALOG.  

To execute a model training session, a call similar to this example would be used:

CALL sys.ML_TRAIN('FLIGHTS_BTS_DELAY.bts_airport_delay_train','OPER_CARRIER',JSON_OBJECT('task','classification’), @airport_model);

In the call to perform training on the model, you pass the training table, the result set label, and the type of task that needs to be performed; then the name of the model. After which the model can be looked up by the model name and seen in the MODEL_CATALOG.

SELECT model_id, model_handle, train_table_name FROM ML_SCHEMA_admin.MODEL_CATALOG;

ml_train_airport

 Loading and running the model

After the model has been created, the model needs to be loaded.  In order to load the model, you first need to define a SQL variable that will retrieve the name of the model.  This is done as follows:

SET @airline_model = (SELECT model_handle FROM ML_SCHEMA_admin.MODEL_CATALOG   ORDER BY model_id DESC LIMIT 1);

With the SQL variable set, the resulting model can be loaded:

CALL sys.ML_MODEL_LOAD(@airline_model, NULL);

ml_load_airport

Predictions

With the model loaded, we can now run predictions on the model.  In order to do this, you need to pass the model a set of inputs that are needed for the prediction.  In this example we are looking for a flight that is not delayed very often and at what time of year.  Additionally, we are looking for a percentage of 75% or more.

In order to do this, you need to create another SQL variable that has all the input parameters you are looking for.  Notice that the model is using all columns in the train, test, and validate tables other than OPER_CARRIER.  This is due to the fact that you want to know the best airline.

SET @airline_input = JSON_OBJECT('MONTH', 'Oct', 'ORIGIN_AIRPORT', 'MIA', 'SCHEDULED_DEPT_TIME', 1800, 'AVG_MINUTES_LATE', 0);

Once the SQL variable is set, you can then run the “prediction” procedure to find the row that meets the requirements of the model.

SELECT sys.ML_PREDICT_ROW(@airline_input, @airline_model,null);

ml_prediction_airport

Scoring

After the prediction is done, you’ll want to score the model and ensure that it meets the accuracy that you were looking for.  In this example, we were looking for a score of 75% or higher.

To run the score function, call ML_SCORE:

CALL sys.ML_SCORE('FLIGHTS_BTS_DELAY.bts_airport_delay_train', 'OPER_CARRIER', @airline_model, 'accuracy', @score, null);

When you run the ML_SCORE procedure, you will will need to pass the training table, the expected column value, the model to use, what accuracy you want, a SQL variable name, and any other requirements needed to score the model.  After that, you can simply select the score from the SQL variable.

SELECT @score;

ml_score_airport

Scoring is important because is give you a confident level that the model works.

Summary

Oracle Cloud Infrastructure (OCI) Heatwave is Oracle’s MySQL Enterprise Edition answer for developers and database administrators to quickly embed machine learning into their application. Oracle did this by simplifying the machine learning process through SQL and enabling it on Heatwave clusters within OCI – enabling a single database solution for both OLTP and OLAP workloads.  

Please follow and like:

Enquire now

Give us a call or fill in the form below and we will contact you. We endeavor to answer all inquiries within 24 hours on business days.