Machine Learning

Overview

DLTK is a software platform for building custom machine learning models in the cloud, designed to make machine learning and AI more accessible and more performant. It includes the most popular machine learning frameworks and their dependencies, so enterprises can rapidly deploy optimized instances. Users can also learn machine learning on the platform itself while using it to build ML solutions.

Beginner's Guide

Artificial intelligence will fashion our tomorrow more effectively than any other innovation this era has ever seen. Anyone who has failed to keep up will find themselves lagging and wake up in a world full of technology that appears more like fantasy.

Machine Learning (ML) is the use of algorithms and computational statistics to learn from data without being explicitly programmed. It is a subfield of artificial intelligence within computer science. Although the field did not take off until recently, the term was coined in 1959 and much of the foundational research was done throughout the ’70s and ’80s. Machine learning’s rise to prominence today has been enabled by the abundance of data, more efficient data storage, and faster computers.

Terminologies
  • Training Data : The data you use for training your model. It contains all the information you have collected about the problem statement.
  • Test Data : The data you use for testing the model. You can make predictions on this data.
  • Features : The columns in your dataset that you use for training your model.
  • Class / Label : The column that holds the outcome for each record. This is the value you want to predict.
  • Accuracy : The percentage of records predicted correctly when you apply the model to your data.
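
To make these terms concrete, here is a minimal sketch in plain Python. The tiny dataset, the 60/40 split, and the majority-class "model" are invented for illustration only; they are not part of DLTK and say nothing about how DLTK trains models.

```python
# Toy records: two feature columns and one label column (all values are made up).
records = [
    {"stamina": 7, "challenges": 3, "player_activity": "Active"},
    {"stamina": 2, "challenges": 1, "player_activity": "Churned"},
    {"stamina": 8, "challenges": 5, "player_activity": "Active"},
    {"stamina": 6, "challenges": 4, "player_activity": "Active"},
    {"stamina": 1, "challenges": 0, "player_activity": "Churned"},
]

features = ["stamina", "challenges"]  # columns used for training
label = "player_activity"             # column to predict

# Split: the first 60% of the records is training data, the rest is test data.
split = len(records) * 60 // 100
training_data, test_data = records[:split], records[split:]

# Feature vectors a real algorithm would learn from (unused by the naive model below).
training_vectors = [[row[name] for name in features] for row in training_data]

# A deliberately naive "model": always predict the most common training label.
training_labels = [row[label] for row in training_data]
majority = max(set(training_labels), key=training_labels.count)

# Accuracy: percentage of test records whose label was predicted correctly.
correct = sum(1 for row in test_data if row[label] == majority)
accuracy = 100.0 * correct / len(test_data)
print(f"predicted={majority!r} accuracy={accuracy:.0f}%")
```

The accuracy is computed on the test data only, which is why a share of the dataset is held back from training.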

Quickstart

The following four services are provided by DLTK as part of Release 1.0.0:

  • Classification : Classify entities into binary or multiple classes.
  • Regression : Predict a value from a continuous range.
  • Feedback : Retrain an existing model with a new set of data.
  • Clustering : Cluster your dataset based on your business needs.

APIs

All URIs below are relative to https://prod-kong.dltk.ai

Train a Classification Model              POST  /machine/classification/train
Train a Regression Model                  POST  /machine/regression/train
Predicting from Classification Model      POST  /machine/classification/predict
Predicting from Regression Model          POST  /machine/regression/predict
Feedback from Regression Model            POST  /machine/regression/feedback
Feedback from Classification Model        POST  /machine/classification/feedback
Cluster Model                             POST  /machine/cluster
Get Job Status                            GET   /machine/job/status?id={id}
Get Job Output                            GET   /machine/output/findBy?jobId={id}
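
The endpoint list above can be captured in a small lookup table. This is only a sketch: the operation keys and the helper name `endpoint_url` are invented here for illustration and are not part of DLTK.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://prod-kong.dltk.ai"

# (HTTP method, relative path) for each operation in the list above.
# The dictionary keys are hypothetical names chosen for this sketch.
ENDPOINTS = {
    "classification_train": ("POST", "/machine/classification/train"),
    "regression_train": ("POST", "/machine/regression/train"),
    "classification_predict": ("POST", "/machine/classification/predict"),
    "regression_predict": ("POST", "/machine/regression/predict"),
    "regression_feedback": ("POST", "/machine/regression/feedback"),
    "classification_feedback": ("POST", "/machine/classification/feedback"),
    "cluster": ("POST", "/machine/cluster"),
    "job_status": ("GET", "/machine/job/status"),
    "job_output": ("GET", "/machine/output/findBy"),
}


def endpoint_url(operation, **query):
    """Return (method, absolute URL) for an operation, appending query parameters if given."""
    method, path = ENDPOINTS[operation]
    url = urljoin(BASE_URL, path)
    if query:
        url += "?" + urlencode(query)
    return method, url


print(endpoint_url("job_status", id=969))
print(endpoint_url("job_output", jobId=20))
```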

Train a Classification Model

Description

This API lets you train a classification model. Training takes some time, so check the job status after submitting the request. Once the job is completed, the job output API returns the model info.

URI
POST  /machine/classification/train

Headers
api-key Your App’s API Key
Attributes
library                 dltk_ai / weka
service                 Classification
task                    Train
config.name             Name of the model.
config.algorithm        Name of the algorithm. See the list of supported algorithms in the Release Notes.
config.datasetUrl       Path of the training data after uploading it to cloud storage.
config.label            Column name to be predicted.
config.features         List of column names for training.
config.trainPercentage  Percentage of data used for training; the rest is used for evaluating the model.
config.saveModel        True / False
config.params           Any configurations required for libraries.
Request Example:
{
  "library": "weka",
  "service": "classification",
  "task": "train",
  "config": {
    "name": "Player Churn Model",
    "algorithm": "NaiveBayesBinomial",
    "datasetUrl": "/dltk-ai/library/player_train.csv",
    "label": "player_activity",
    "trainPercentage": 80,
    "features": ["stamina","challenges","achievements"],
    "saveModel": "true",
    "params": {}
  }
}
Response:
{
    "code": 200,
    "data": {
        "jobId": 969,
        "appId": 1558586024244,
        "name": "weka_classification_train",
        "library": "weka",
        "service": "Classification",
        "task": "TRAIN",
        "state": "RUN",
        "startTime": "2019-06-21T04:30:54.283+0000",
        "endTime": null,
        "request": {
            "library": "weka",
            "config": {
                "name": "Player Churn Model",
                "algorithm": "NaiveBayesBinomial",
                "datasetUrl": "/dltk-ai/library/player_train.csv",
                "label": "player_activity.Grid",
                "trainPercentage": 80,
                "saveModel": "true",
                "params": {},
                "features":  ["stamina","challenges","achievements"]
            }
        },
        "isStreamJob": false,
        "isJobStopped": null
    }
}
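
As a rough sketch of how the request above could be assembled from Python's standard library, the snippet below builds the POST request without sending it. The helper name `build_train_request` and the `Content-Type: application/json` header are assumptions made for this example; only the URI, the `api-key` header, and the body fields come from this page.

```python
import json
import urllib.request

BASE_URL = "https://prod-kong.dltk.ai"


def build_train_request(api_key, config, library="weka"):
    """Build (but do not send) the POST request for /machine/classification/train."""
    payload = {
        "library": library,
        "service": "classification",
        "task": "train",
        "config": config,
    }
    return urllib.request.Request(
        BASE_URL + "/machine/classification/train",
        data=json.dumps(payload).encode("utf-8"),
        # Content-Type is an assumption; the page only documents the api-key header.
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


config = {
    "name": "Player Churn Model",
    "algorithm": "NaiveBayesBinomial",
    "datasetUrl": "/dltk-ai/library/player_train.csv",
    "label": "player_activity",
    "trainPercentage": 80,
    "features": ["stamina", "challenges", "achievements"],
    "saveModel": "true",
    "params": {},
}
request = build_train_request("YOUR_API_KEY", config)
# Sending is left to the caller, for example with urllib.request.urlopen(request).
print(request.get_method(), request.full_url)
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is submitted.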

Predicting from a Classification Model

Description

This API lets you get predictions from a trained classification model. Once the job is completed, the job output API returns the file info from which you can retrieve the predictions.

URI
POST /machine/classification/predict

Headers

api-key Your App’s API Key
Request Example:
{
  "library": "weka",
  "service": "classification",
  "config": {
    "datasetUrl": "/dltk-ai/library/player_test.csv",
    "modelUrl": "/dltk-ai/1/1550423221357-NaiveBayesMultinomial.mdl",
    "params": {}
  }
}
Response:
{
    "code": 200,
    "data": {
        "jobId": 970,
        "appId": 1560322200284,
        "name": "weka_classification_predict",
        "library": "weka",
        "service": "Classification",
        "task": "PREDICT",
        "state": "RUN",
        "startTime": "2019-06-21T04:33:30.418+0000",
        "endTime": null,
        "request": {
            "library": "weka",
            "config": {
                "modelUrl": "/dltk-ai/1/1550423221357-NaiveBayesMultinomial.mdl",
                "datasetUrl": "/dltk-ai/library/player_test.csv",
                "features":  ["stamina","challenges","achievements"]
            }
        },
        "isStreamJob": false,
        "isJobStopped": null
    }
}
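
Because the response only confirms that a job was created, the value to keep is `data.jobId`. The sketch below pulls it out of a trimmed copy of a submission response; the helper name `job_handle` is hypothetical and the error handling is only one possible choice.

```python
import json

# Trimmed copy of a job-submission response shaped like the samples on this page.
raw = '{"code": 200, "data": {"jobId": 970, "task": "PREDICT", "state": "RUN", "endTime": null}}'


def job_handle(response):
    """Return (job_id, state) from a submission response, or raise if no job was created."""
    data = response.get("data")
    if not data or "jobId" not in data:
        raise ValueError(f"unexpected response: {response!r}")
    return data["jobId"], data.get("state")


job_id, state = job_handle(json.loads(raw))
print(job_id, state)
```

The job ID is then passed to the job status and job output APIs.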

Train on a new dataset with already built Classification Model

Description

This API lets you retrain an already built classification model on a new dataset that has the same features and uses the same algorithm. Once the job is completed, the job output API returns the model info.

URI
POST /machine/classification/feedback

Headers

api-key Your App’s API Key
Request Example:
{
	"library":"weka",
	"service":"Classification",
	"task":"FEEDBACK",
	"config":{
		"name":"Player Churn Model",
		"algorithm":"NaiveBayesBinomial",
		"datasetUrl": "/dltk-ai/library/player_feedback.csv",
		"modelUrl": "/dltk-ai/library/1550423221357-NaiveBayesMultinomial.mdl",
		"feedbackDatasetUrl":"/dltk-ai/library/player_feedback.csv",
		"features":["stamina","challenges","achievements"],
		"trainPercentage": 80,
		"label": "player_activity",
		"saveModel":true,
		"params":{}
	}
}
Response:
{
    "code": 0,
    "data": {
        "jobId": 971,
        "appId": 1560322200284,
        "name": "weka_classification_feedback",
        "library": "weka",
        "service": "Classification",
        "task": "FEEDBACK",
        "state": "RUN",
        "startTime": "2019-06-21T04:37:47.999+0000",
        "endTime": null,
        "request": {
            "library": "weka",
            "config": {
                "name": "Player Churn Model",
                "label": "player_activity",
                "params": {},
                "features": ["stamina","challenges","achievements"],
                "algorithm": "NaiveBayesBinomial",
                "saveModel": "true",
                "datasetUrl": "/dltk-ai/library/player_feedback.csv",
                "trainPercentage": 80,
                "feedbackDatasetUrl": "/dltk-ai/library/player_feedback.csv",
                "modelUrl": "/dltk-ai/library/1550423221357-NaiveBayesMultinomial.mdl"
            }
        },
        "isStreamJob": false,
        "isJobStopped": null
    }
}

Train a Regression Model

Description

This API lets you train a regression model. Training takes some time, so check the job status after submitting the request. Once the job is completed, the job output API returns the model info.

URI
POST /machine/regression/train

Headers

api-key Your App’s API Key

Attributes

library                 dltk_ai / weka
service                 regression
task                    train
config.name             Name of the model.
config.algorithm        Name of the algorithm. See the list of supported algorithms in the Release Notes.
config.datasetUrl       Path of the training data after uploading it to cloud storage.
config.label            Column name to be predicted.
config.features         List of column names for training.
config.trainPercentage  Percentage of data used for training; the rest is used for evaluating the model.
config.saveModel        True / False
config.params           Any configurations required for libraries.
Request Example:
{
  "library": "weka",
  "service": "regression",
  "task": "train",
  "config": {
    "name": "Housing Price Model",
    "algorithm": "LinearRegression",
    "datasetUrl": "/dltk-ai/library/hp_train.csv",
    "label": "price",
    "trainPercentage": 80,
    "features": ["area","parking_area"],
    "saveModel": "true",
    "params": {}
  }
}
Response:
{
    "code": 200,
    "data": {
        "jobId": 971,
        "appId": 1558586024244,
        "name": "weka_regression_train",
        "library": "weka",
        "service": "Regression",
        "task": "TRAIN",
        "state": "RUN",
        "startTime": "2019-06-21T04:30:54.283+0000",
        "endTime": null,
        "request": {
            "library": "weka",
            "config": {
                "name": "Housing Price Model",
                "algorithm": "LinearRegression",
                "datasetUrl": "/dltk-ai/library/hp_train.csv",
                "label": "price",
                "trainPercentage": 80,
                "saveModel": "true",
                "params": {},
                "features":  ["area","parking_area"]
            }
        },
        "isStreamJob": false,
        "isJobStopped": null
    }
}

Predicting from Regression Model

Description

This API lets you get predictions from a trained regression model. Once the job is completed, the job output API returns the file info containing the predictions.

URI
POST /machine/regression/predict

Headers

api-key Your App’s API Key
Request Example:
{
  "library": "weka",
  "service": "regression",
  "config": {
    "datasetUrl": "/dltk-ai/library/hp_test.csv",
    "modelUrl": "/dltk-ai/1/1550423221357-LinearRegression.mdl",
    "params": {}
  }
}
Response:
{
    "code": 0,
    "data": {
        "jobId": 972,
        "appId": 1560322200284,
        "name": "weka_regression_predict",
        "library": "weka",
        "service": "Regression",
        "task": "PREDICT",
        "state": "RUN",
        "startTime": "2019-06-21T04:33:30.418+0000",
        "endTime": null,
        "request": {
            "library": "weka",
            "config": {
                "modelUrl": "/dltk-ai/1/1550423221357-LinearRegression.mdl",
                "datasetUrl": "/dltk-ai/library/hp_test.csv",
                "features":  ["area","parking_area"]
            }
        },
        "isStreamJob": false,
        "isJobStopped": null
    }
}

Train on a new dataset with already built Regression Model

Description

This API lets you retrain an already built regression model on a new dataset that has the same features and uses the same algorithm. Once the job is completed, the job output API returns the model info.

URI
POST /machine/regression/feedback

Headers

api-key Your App’s API Key
Request Example:
{
	"library":"weka",
	"service":"regression",
	"task":"FEEDBACK",
	"config":{
		"name":"Housing Price Model",
		"algorithm":"LinearRegression",
		"datasetUrl": "/dltk-ai/library/hp_train.csv",
		"modelUrl": "/dltk-ai/1/1550423221357-LinearRegression.mdl",
		"feedbackDatasetUrl":"/dltk-ai/library/hp_feedback.csv",
		"features":["area","parking_area"],
		"trainPercentage": 80,
		"label": "price",
		"saveModel":true,
		"params":{}
	}
}
Response:
{
    "code": 0,
    "data": {
        "jobId": 974,
        "appId": 1560322200284,
        "name": "weka_regression_feedback",
        "library": "weka",
        "service": "Regression",
        "task": "FEEDBACK",
        "state": "RUN",
        "startTime": "2019-06-21T04:37:47.999+0000",
        "endTime": null,
        "request": {
            "library": "weka",
            "config": {
                "name": "Housing Price Model",
                "label": "price",
                "params": {},
                "features": ["area","parking_area"],
                "algorithm": "LinearRegression",
                "saveModel": "true",
                "datasetUrl": "/dltk-ai/library/hp_train.csv",
                "trainPercentage": 80,
                "feedbackDatasetUrl": "/dltk-ai/library/hp_feedback.csv",
                "modelUrl": "/dltk-ai/1/1550423221357-LinearRegression.mdl"
            }
        },
        "isStreamJob": false,
        "isJobStopped": null
    }
}

Cluster Model

Description

This API lets you cluster a dataset. Clustering takes some time, so check the job status after submitting the request. Once the job is completed, the job output API returns the model info.

URI
POST /machine/cluster

Headers

api-key Your App’s API Key
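Attributes
library                 Library to use, for example weka.
service                 Clustering
task                    CLUSTER
config.name             Name of the model.
config.algorithm        Name of the clustering algorithm, for example KMeansClustering.
config.datasetUrl       Path of the dataset after uploading it to cloud storage.
config.numOfClusters    Number of clusters to form.
config.features         List of column names used for clustering.
config.saveModel        True / False
config.params           Any configurations required for libraries.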
Request Example:
{
"library":"weka",
"service":"Clustering",
"task":"CLUSTER",
"config":{
	"name":"Clustering",
	"algorithm":"KMeansClustering",
	"datasetUrl":"/dltk-ai/library/moon_data.csv",
	"numOfClusters": 2,
	"saveModel": "True",
	"params":{},
	"features":["X","Y"]
	}
}
Response:
{
    "code": 200,
    "data": {
        "jobId": 968,
        "appId": 1558586024244,
        "name": "weka_clustering_cluster",
        "library": "weka",
        "service": "Clustering",
        "task": "CLUSTER",
        "state": "RUN",
        "startTime": "2019-06-21T04:28:12.116+0000",
        "endTime": null,
        "request": {
            "library": "weka",
            "config": {
                "name": "Clustering",
                "algorithm": "KMeansClustering",
                "datasetUrl": "/dltk-ai/library/moon_data.csv",
                "numOfClusters": 2,
                "saveModel": "True",
                "params": {},
                "features": [
                    "X",
                    "Y"
                ]
            }
        },
        "isStreamJob": false,
        "isJobStopped": null
    }
}

Get Job Status

Description

Train and predict jobs take some time to complete. Use this API to check their status.

URI
GET /machine/job/status?id={id}

Headers

api-key Your App’s API Key
Attributes
None
Response:
{
    "id": 21,
    "name": "Player Churn Model",
    "library": "weka",
    "service": "Classification",
    "task": "PREDICT",
    "state": "FINISH",
    "startTime": "2019-02-17T18:25:19.587+0000",
    "endTime": "2019-02-17T18:25:24.583+0000",
    "msg": null,
    "request": {
        "library": "dltk_ai",
        "config": {
            "params": {},
            "modelUrl": "/dltk-ai/1/1550427728251-NaiveBayesMultinomial_5044073238607802124mdl",
            "datasetUrl": "/dltk-ai/library/rg_test.csv"
        }
    }
}
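
The `startTime` and `endTime` fields in this response can be used to see how long a job ran. The following sketch parses the timestamp format shown above; the function name `job_duration_seconds` is made up for this example, and it assumes `endTime` is null while the job is still running, as in the submission responses.

```python
from datetime import datetime

# Matches timestamps such as "2019-02-17T18:25:19.587+0000".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def job_duration_seconds(status):
    """Return the elapsed seconds of a finished job, or None if it has not ended yet."""
    if status.get("endTime") is None:
        return None
    start = datetime.strptime(status["startTime"], TIMESTAMP_FORMAT)
    end = datetime.strptime(status["endTime"], TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


status = {
    "state": "FINISH",
    "startTime": "2019-02-17T18:25:19.587+0000",
    "endTime": "2019-02-17T18:25:24.583+0000",
}
print(job_duration_seconds(status))
```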

Get Job Output

Description

Once the job state is FINISH, the job output can be retrieved from this API.

URI
GET /machine/output/findBy?jobId={id}

Headers

api-key Your App’s API Key
Attributes
None
Response:
{
    "id": 9,
    "jobId": 20,
    "state": null,
    "output": {
        "eval": {
            "kappa": -0.05913503971756384,
            "recall": {
                "Active": 0.5723684210526315,
                "Churned": 0.3541666666666667
            },
            "correct": 104,
            "accuracy": 52,
            "revision": "14755",
            "rocCurve": {
                "values": [
                    [
                        1,
                        1
                    ],
                    [
                        0.8958,
                        0.7368
                    ],

                ]
            },
            "errorRate": 0.48,
            "inCorrect": 96,
            "precision": {
                "Active": 0.7372881355932204,
                "Churned": 0.2073170731707317
            },
            "areaUnderPRC": {
                "Active": 0.7848942279681246,
                "Churned": 0.237283172269615
            },
            "areaUnderROC": {
                "Active": 0.49506578947368424,
                "Churned": 0.518297697368421
            },
            "priorEntropy": 0.7986194718732207,
            "confusionMatrix": [
                [
                    17,
                    31
                ],
                [
                    65,
                    87
                ]
            ],
            "numTrueNegatives": {
                "Active": 17,
                "Churned": 87
            },
            "numTruePositives": {
                "Active": 87,
                "Churned": 17
            },
            "trueNegativeRate": {
                "Active": 0.3541666666666667,
                "Churned": 0.5723684210526315
            },
            "truePositiveRate": {
                "Active": 0.5723684210526315,
                "Churned": 0.3541666666666667
            },
            "falseNegativeRate": {
                "Active": 0.4276315789473684,
                "Churned": 0.6458333333333334
            },
            "falsePositiveRate": {
                "Active": 0.6458333333333334,
                "Churned": 0.4276315789473684
            },
            "numFalseNegatives": {
                "Active": 65,
                "Churned": 31
            },
            "numFalsePositives": {
                "Active": 31,
                "Churned": 65
            },
            "pearsonCorrelation": {
                "challenges": 0.24937135217246517,
                "achievements": 0.18263960513415353,
                "stamina": 0.2493238592388467
            },
            "confusionMatrixHeaders": [
                "Churned",
                "Active"
            ],
            "correlationCoefficient": 0,
            "mathewsCorrelationCoefficient": {
                "Active": -0.06379320872133686,
                "Churned": -0.06379320872133686
            }
        },
        "modelUrl": "/dltk-ai/1/1550427728251-NaiveBayesMultinomial_5044073238607802124mdl"
    }
}
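
Several of the metrics in this output follow directly from `confusionMatrix` and `confusionMatrixHeaders`. The sketch below recomputes accuracy, recall, and precision from the matrix. It assumes rows are actual classes and columns are predicted classes; that orientation is inferred from the numbers, since it reproduces the reported values, and is not stated on this page.

```python
headers = ["Churned", "Active"]
matrix = [[17, 31], [65, 87]]  # assumed layout: rows = actual class, columns = predicted class


def per_class_metrics(headers, matrix):
    """Recompute accuracy (in percent) plus per-class recall and precision."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(headers)))
    metrics = {"accuracy": 100.0 * correct / total, "recall": {}, "precision": {}}
    for i, name in enumerate(headers):
        actual = sum(matrix[i])                    # row total: records that truly are this class
        predicted = sum(row[i] for row in matrix)  # column total: records predicted as this class
        metrics["recall"][name] = matrix[i][i] / actual
        metrics["precision"][name] = matrix[i][i] / predicted
    return metrics


metrics = per_class_metrics(headers, matrix)
print(metrics["accuracy"], metrics["recall"], metrics["precision"])
```

With 104 of 200 records on the diagonal, the accuracy is 52, matching `correct`, `inCorrect`, and `accuracy` in the response.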

SDK

Installation

The DLTK SDK requires a Python version greater than 3.5. Install it with the following command:
pip install dltk_ai

Creating Client

Create a DLTK client to perform the different tasks.

import dltk_ai

client = dltk_ai.DltkAiClient('Your API Key')

To use these services, register on the cloud.dltk.ai website and create a project. Then copy your API key to use the different APIs.

Uploading files

Upload your datasets to DLTK’s cloud storage:

# Assumption: Dataset is importable from dltk_ai.dataset_types; adjust to your SDK version.
from dltk_ai.dataset_types import Dataset

# Training dataset
train_file_store_response = client.store("Sample_Train.csv", Dataset.TRAIN_DATA)
train_data = train_file_store_response["fileUrl"]

# Testing dataset
test_file_store_response = client.store("Sample_Test.csv", Dataset.TEST_DATA)
test_data = test_file_store_response["fileUrl"]


Training a model

To train a model, pass the following parameters:

  • Type: ‘Classification’ or ‘Regression’.

  • Algorithm: The algorithm used to train the model, such as ‘LinearRegression’ or ‘RandomForest’.

  • Dataset: Location of the dataset file in DLTK storage.

  • Label: Label or target variable in the dataset file.

  • Features: List of column names to be used for model training.

  • Model name: The name you want to give the model.

  • Library: Library for training the model. DLTK currently offers weka, scikit-learn and H2O.

  • Train split percentage: Percentage of data to be used for training. The model is tested against the remaining data.

train = client.train("regression","LinearRegression", train_data,
                     "SalePrice", ["YearBuilt","YearRemodAdd","TotalBsmtSF",
                     "AboveGrLiveAr","TotalBathroom","TotalRooms","ParkingSpace"],
                     model_name="Housing_Price_Model",lib='weka', train_percentage=80,
                     save_model=True)

Check Job Status & Output

After you request model training, a job is created with the given details. Training a model takes a few seconds. To check the job status, use the following command:
train_job_status_response = client.job_status(train["data"]["jobId"])
Once the job state changes to ‘FINISH’, get the model evaluation metrics:
train_job_output_response = client.job_output(train["data"]["jobId"])
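
Since the job runs asynchronously, the status usually has to be checked more than once. Below is a minimal polling sketch. It assumes `client.job_status` returns a dictionary with a top-level `state` key, as in the Get Job Status response sample; the helper `wait_for_job`, its parameters, and the `FakeClient` stand-in are invented for this example. Only the `RUN` and `FINISH` states appear on this page, so any failure state would have to be handled separately.

```python
import time


def wait_for_job(client, job_id, poll_seconds=2.0, timeout_seconds=300.0, sleep=time.sleep):
    """Poll client.job_status(job_id) until the state is 'FINISH' or the timeout expires."""
    waited = 0.0
    while True:
        status = client.job_status(job_id)
        if status.get("state") == "FINISH":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} still in state {status.get('state')!r}")
        sleep(poll_seconds)
        waited += poll_seconds


class FakeClient:
    """Stand-in for the DLTK client: reports RUN twice, then FINISH."""

    def __init__(self):
        self.calls = 0

    def job_status(self, job_id):
        self.calls += 1
        return {"id": job_id, "state": "RUN" if self.calls < 3 else "FINISH"}


fake = FakeClient()
result = wait_for_job(fake, 969, sleep=lambda seconds: None)  # skip real sleeping in the demo
print(result["state"], fake.calls)
```

With a real client, the call would look like `wait_for_job(client, train["data"]["jobId"])`, followed by `client.job_output(...)`.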

Use model for prediction

After the model has been trained, get the model URL and use it for prediction:
model = train_job_output_response["output"]["modelUrl"]
predict_response = client.predict("regression", test_data, model)
A prediction job is created. Once the job state is ‘FINISH’, get the predictions for the test dataset:
predict_job_status_response = client.job_status(predict_response["data"]["jobId"])
predict_job_output_response = client.job_output(predict_response["data"]["jobId"])
pred_file = predict_job_output_response['output']['predFileUrl']

Release Notes

The following notes apply to Release 1.0.0:

  • Algorithms supported under Classification are Logistic, MultilayerPerceptron, NaiveBayesMultinomial, RandomForest, LibSVM, AdaBoostM1, AttributeSelectedClassifier, Bagging, CostSensitiveClassifier, DecisionTable, GaussianProcesses, IBk, RandomTree and SMO.

  • Algorithms supported under Regression are LinearRegression and AdditiveRegression.

  • Feedback dataset should contain the same features as the original dataset.

  • Max train file upload size is 100 MB.

  • Max test file upload size is 50 MB.

  • The maximum number of features selected for training is 20.
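
These limits can be checked on the client side before a training job is submitted. The sketch below is one way to do that; the function name `check_train_request` is hypothetical, the algorithm names are split out of the lists above, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
CLASSIFICATION_ALGORITHMS = {
    "Logistic", "MultilayerPerceptron", "NaiveBayesMultinomial", "RandomForest",
    "LibSVM", "AdaBoostM1", "AttributeSelectedClassifier", "Bagging",
    "CostSensitiveClassifier", "DecisionTable", "GaussianProcesses", "IBk",
    "RandomTree", "SMO",
}
REGRESSION_ALGORITHMS = {"LinearRegression", "AdditiveRegression"}
SUPPORTED = {"classification": CLASSIFICATION_ALGORITHMS, "regression": REGRESSION_ALGORITHMS}

MAX_TRAIN_BYTES = 100 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
MAX_FEATURES = 20


def check_train_request(service, algorithm, features, train_file_bytes):
    """Return a list of problems; an empty list means the request passes these checks."""
    problems = []
    algorithms = SUPPORTED.get(service.lower())
    if algorithms is None:
        problems.append(f"unknown service: {service}")
    elif algorithm not in algorithms:
        problems.append(f"{algorithm} is not listed for {service}")
    if len(features) > MAX_FEATURES:
        problems.append(f"{len(features)} features selected, maximum is {MAX_FEATURES}")
    if train_file_bytes > MAX_TRAIN_BYTES:
        problems.append("train file is larger than 100 MB")
    return problems


print(check_train_request("regression", "LinearRegression", ["area", "parking_area"], 1024))
```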
