Virmach’s Black Friday deals appear and disappear quickly. For hosting enthusiasts, constantly checking the current VPS offer and deciding whether to buy is tedious. This article collects the configurations and prices of the Black Friday machines, fits a regression model with sklearn to predict what a given configuration should cost, and compares that prediction with the actual price. It then tells the user whether the current offer is relatively cheap. The main steps are:
- Obtain detailed VPS configurations and prices
- Process data to generate structured data
- Use sklearn linear regression to build a prediction model
- Obtain the current plan’s configuration and send alerts
Obtain detailed VPS configurations and prices
It would have been easier to start scraping the JSON data as soon as Black Friday began. Since I didn’t, I took the data from a VPS alert group chat and saved it as a text file, virmach.txt, with one labeled field per line (Price, Location, Disk, CPU, IP, Memory, Bandwidth).
To reproduce the results, download the saved text file: virmach Black Friday text data.
The text data is irregular because some machines lack certain data fields.
Process data to generate structured data
Machine learning requires structured data, so converting the text format to structured data is necessary.
I used Linux shell commands to confirm that every feature has the same number of records. The features selected for the linear model are disk, bandwidth, memory, location, IP count, and CPU, with price as the output.
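The same consistency check can also be sketched in Python. The field labels below come from the grep patterns used in this article; the sample lines are hypothetical and only illustrate what a missing field does to the counts.

```python
# Field labels taken from the grep patterns used in this article.
FIELDS = ("Price", "Location", "Disk", "CPU", "IP", "Memory", "Bandwidth")

def count_fields(lines):
    """Count how many lines start with each field label."""
    counts = {field: 0 for field in FIELDS}
    for line in lines:
        for field in FIELDS:
            if line.startswith(field):
                counts[field] += 1
                break
    return counts

# Hypothetical sample: the second plan has no Bandwidth line.
sample = [
    "Price: $8.88/yr", "Location: Los Angeles, CA", "Disk: 10GB", "CPU: 1",
    "IP: 1", "Memory: 512MB", "Bandwidth: 500GB",
    "Price: $12.00/yr", "Location: Buffalo, NY", "Disk: 20GB", "CPU: 2",
    "IP: 1", "Memory: 1024MB",
]

counts = count_fields(sample)
# The features line up row by row only if every count is identical.
consistent = len(set(counts.values())) == 1
print(counts)
print("consistent:", consistent)
```

To run it on the real data, pass the open file instead of the sample list, for example `count_fields(open("virmach.txt"))`. If the counts differ, the per-feature lists will not line up by index, and the incomplete plans need to be removed first.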
grep extracts each feature into its own file. For example, to get the IP information:
import os
os.system('grep "^IP" virmach.txt >ip')
Python’s re module then pulls the configuration value out of each line:
import re # Import re module
locatep = r"Location: (.+)" # Define pattern to find location
locate = re.findall(locatep, open("locate").read()) # Find all VPS locations and store in list
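As a possible alternative to the grep step, a line-anchored pattern with `re.MULTILINE` can filter and extract in one pass. This is a minimal sketch: the sample text is hypothetical and only mimics the labeled lines the patterns in this article expect.

```python
import re

# Hypothetical sample text; the real virmach.txt may differ in detail.
sample = """Price: $8.88/yr
Location: Los Angeles, CA
Disk: 10GB
IP: 1
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line,
# so no separate grep pass is needed to select the lines first.
locations = re.findall(r"^Location: (.+)$", sample, flags=re.MULTILINE)
disks = re.findall(r"^Disk: (\d+)", sample, flags=re.MULTILINE)

print(locations)
print(disks)
```

Anchoring each pattern to its label also avoids picking up stray digits from unrelated lines.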
Finally, write to a CSV file. The code is:
import re
import os
os.system('grep "^IP" virmach.txt >ip')
os.system('grep "^Price" virmach.txt >price')
os.system('grep "^CPU" virmach.txt >cpu')
os.system('grep "^Disk" virmach.txt >disk')
os.system('grep "^Bandwidth" virmach.txt >band')
os.system('grep "^Memory" virmach.txt >mem')
os.system('grep "^Location" virmach.txt >locate')
# Regular expressions for each field (\d matches a digit)
pricep = r"(\d+\.\d+)"
locatep = r"Location: (.+)"
diskp = r"Disk: (\d+)"
cpup = r"\d+"
ipp = r"\d+"
memp = r"\d+"
bandp = r"\d+"
price = re.findall(pricep, open("price").read())
locate = re.findall(locatep, open("locate").read())
disk = re.findall(diskp, open("disk").read())
cpu = re.findall(cpup, open("cpu").read())
ip = re.findall(ipp, open("ip").read())
mem = re.findall(memp, open("mem").read())
band = re.findall(bandp, open("band").read())
# Write a tab-separated file: header first, then one row per plan
fw = open("hei.csv", 'w')
fw.write("price\tlocation\tdisk\tcpu\tip\tmemory\tbandwidth")
for i in range(len(price)):
    fw.write("\n" + "\t".join([price[i], locate[i], disk[i], cpu[i], ip[i], mem[i], band[i]]))
fw.close()
sklearn linear regression prediction model
The most important step is to predict the Virmach VPS price based on current configurations. Multiple models can be used such as multiple linear regression or neural networks. Due to limited data, we use multiple linear regression. sklearn in Python makes this very convenient. Here’s the code:
#### Import packages
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import re
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
### Read processed structured data
df = pd.read_csv("hei.csv", sep="\t")
df.location = df.location.str.split(",", n=1, expand=True)[1]
df.replace(r'^\s+', '', regex=True, inplace=True)
### Replace categorical location with numeric values
df.location = df.location.replace('CA', 1).replace('NY', 2).replace('GA', 3).replace('NJ', 4).replace('TX', 5).replace('WA', 6).replace('IL', 7)
### View data head and tail
pd.concat([df.head(), df.tail()])  # DataFrame.append was removed in pandas 2.0
### sklearn linear regression model
regressor = LinearRegression()
X = df.iloc[:,1:].values
y = df.iloc[:,0].values
### Train/test split 2:1 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
### Train model
regressor.fit(X_train, y_train)
### Predict
y_pred = regressor.predict(X_test)
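One caveat about the numeric location codes above: a linear model treats them as ordered quantities, so NY (2) counts as "twice" CA (1). A common alternative, not used in this article, is one-hot encoding. The sketch below uses a small hypothetical DataFrame rather than the real hei.csv.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the article's CSV.
df_demo = pd.DataFrame({
    "price": [10.0, 20.0, 15.0],
    "location": ["CA", "NY", "CA"],
    "disk": [10, 20, 15],
})

# One 0/1 indicator column per location instead of a single ordinal code.
encoded = pd.get_dummies(df_demo, columns=["location"], dtype=int)
print(encoded)
```

Each location then gets its own coefficient, at the cost of more parameters to fit from a small dataset.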
Model evaluation:
test_set_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
test_set_r2 = r2_score(y_test, y_pred)
print(test_set_rmse)
# 8.5790882402413
print(test_set_r2)
# 0.878882928666751
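For reference, both metrics can be computed by hand. The sketch below uses hypothetical values, not the article's data, and shows what `mean_squared_error` and `r2_score` measure: RMSE is the typical prediction error in price units, and R² is the share of price variance the model explains.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

# Hypothetical prices and predictions
y_true_demo = [10.0, 20.0, 30.0]
y_pred_demo = [12.0, 18.0, 33.0]
print(rmse(y_true_demo, y_pred_demo))
print(r2(y_true_demo, y_pred_demo))
```

Read this way, the RMSE of about 8.58 reported above means predictions are typically off by several price units, which matters when deciding how far below the prediction a price must be to count as cheap.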
Model parameters:
print(regressor.intercept_)
# 7.599458793118405
print(regressor.coef_)
# [ 2.44111567e-01 2.46369652e-01 5.93846195e+00 -6.07453703e+00 2.12034498e-03 6.35691925e-04]
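These parameters define the prediction as the intercept plus the dot product of the coefficients and the feature vector, in the CSV column order (location, disk, cpu, ip, memory, bandwidth). The sketch below applies that formula to a hypothetical configuration. The function name `predict_price` and the sample values are assumptions for illustration.

```python
import numpy as np

# Intercept and coefficients printed by the fitted model above
INTERCEPT = 7.599458793118405
COEF = np.array([2.44111567e-01, 2.46369652e-01, 5.93846195e+00,
                 -6.07453703e+00, 2.12034498e-03, 6.35691925e-04])

def predict_price(location, disk, cpu, ip, memory, bandwidth):
    """Linear prediction: intercept + sum(coef_i * feature_i)."""
    features = np.array([location, disk, cpu, ip, memory, bandwidth], dtype=float)
    return float(INTERCEPT + COEF @ features)

# Hypothetical plan: location code 1 (CA), disk 10, 1 CPU, 1 IP,
# memory 512, bandwidth 500
predicted = predict_price(location=1, disk=10, cpu=1, ip=1, memory=512, bandwidth=500)
print(round(predicted, 2))
```

Leaving out the intercept would shift every prediction down by about 7.6, so it has to be included wherever the coefficients are reused outside sklearn.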
Obtain the current plan’s configuration and send alerts
After training, use the model parameters to predict the price of the current plan and alert the user if the actual price is lower than predicted. First fetch the Black Friday JSON, then parse it to extract the plan’s parameters and feed them to the regression model above.
Here is runnable Python code to check current package status:
import requests
import json
import re
import time
import pandas as pd
url = 'https://billing.virmach.com/modules/addons/blackfriday/new_plan.json'
dic_ori = {}
def get_location(location):
    # Map the state abbreviation to the numeric code used in training
    if "CA" in location:
        return 1
    elif "NY" in location:
        return 2
    elif "GA" in location:
        return 3
    elif "NJ" in location:
        return 4
    elif "TX" in location:
        return 5
    elif "WA" in location:
        return 6
    elif "IL" in location:
        return 7
def format_print(j_pd, price):
    # Coefficients and intercept of the fitted regression model
    model = pd.Series([2.44111567e-01, 2.46369652e-01, 5.93846195e+00, -6.07453703e+00, 2.12034498e-03, 6.35691925e-04])
    intercept = 7.599458793118405
    # The prediction must include the intercept, not only the weighted features
    predicted_price = intercept + sum(model * j_pd)
    print(predicted_price)
    if predicted_price > float(price):
        print(f"Current price: {price}\nLower than predicted {predicted_price}")
    else:
        print(f"Current price: {price}\nHigher than predicted {predicted_price}")
json_text = requests.get(url).text
dic = json.loads(json_text)
pf = f"""
Current Package:
LOCATION: {dic["location"]}
CPU: {int(dic["cpu"])} vCORE
HDD: {int(dic["hdd"])}GB SSD (RAID 10)
BANDWIDTH: {int(dic["bw"])} GB/month
RAM: {int(dic["ram"])}MB RAM
IPs: {int(dic["ips"])} DEDICATED IPv4
"""
print(pf)
while True:
    json_text = requests.get(url).text
    time.sleep(3)
    dic = json.loads(json_text)
    pp = r"\d+\.\d+"
    # Feature order must match the training columns
    j_pd = [get_location(dic["location"]), int(dic["hdd"]), int(dic["cpu"]), int(dic["ips"]), int(dic["ram"]), int(dic["bw"])]
    pf = f"""
Current Package:
LOCATION: {dic["location"]}
CPU: {int(dic["cpu"])} vCORE
HDD: {int(dic["hdd"])}GB SSD (RAID 10)
BANDWIDTH: {int(dic["bw"])} GB/month
RAM: {int(dic["ram"])}MB RAM
IPs: {int(dic["ips"])} DEDICATED IPv4
"""
    price = re.findall(pp, str(dic["price"]))
    if price == []:
        continue
    price = price[0]
    # Only report when the offer has changed since the last poll
    if dic != dic_ori:
        print(pf)
        format_print(j_pd, price)
        dic_ori = dic
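The comparison in the script flags any price below the prediction, however small the gap. One possible refinement, sketched below, is to require a margin before calling an offer cheap. The function `classify_price` and its `margin` parameter are hypothetical and not part of the original script. Using the test-set RMSE reported earlier as the margin is only one option.

```python
def classify_price(actual, predicted, margin=0.0):
    """Label an offer relative to the model's prediction.

    margin is a hypothetical tolerance: prices within
    predicted +/- margin are treated as fair.
    """
    if actual < predicted - margin:
        return "cheap"
    if actual > predicted + margin:
        return "expensive"
    return "fair"

# Example with the test-set RMSE from the evaluation step as the margin
RMSE = 8.5790882402413
print(classify_price(5.0, 20.0, margin=RMSE))
print(classify_price(15.0, 20.0, margin=RMSE))
print(classify_price(30.0, 20.0, margin=RMSE))
```

With a margin, small deviations that fall within the model's typical error no longer trigger an alert.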
Summary
This article used Python’s sklearn package to fit a linear regression model on the Virmach Black Friday discounted VPS plans, then used it to predict the price of the currently available plan. Comparing the prediction with the actual price shows whether the current machine is cheaper than comparable Black Friday deals, which helps users decide whether to buy the flash sale.