RC Data Science / createAndParseSACCT
Commit 0453f064, authored Apr 21, 2020 by Ryan Randles Jones
added seaborn plots and consolidated code lines
parent 27c0dc19
slurm-2sql.ipynb
%% Cell type:code id: tags:
```
import sqlite3
import slurm2sql
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
```
%% Cell type:code id: tags:
```
upperRAMlimit = 50e+10 # 5e11; equals 500 GB if the memory columns are in bytes (5 GB would be 5e9)
```
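The limit above is easier to sanity-check when the unit is spelled out. A minimal sketch, assuming the memory columns hold plain bytes (an assumption about the database, not something this notebook confirms); `GB`, `GIB` and `to_gb` are hypothetical names used only for illustration:

```python
# Hypothetical helpers; assumes memory values are stored in bytes.
GB = 10**9    # decimal gigabyte
GIB = 2**30   # binary gibibyte

def to_gb(n_bytes):
    """Convert a byte count to decimal gigabytes."""
    return n_bytes / GB

upperRAMlimit = 50e+10
print(to_gb(upperRAMlimit))   # 500.0
print(5 * GB)                 # 5000000000, i.e. 5 GB written out in bytes
```

Writing limits as a multiple of `GB`, for example `5 * GB`, keeps the number and its comment from drifting apart.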
%% Cell type:code id: tags:
```
# connects to the existing SQLite database of Slurm job info since March 2020
db = sqlite3.connect('/data/rc/rc-team/slurm-since-March.sqlite3')
```
%% Cell type:code id: tags:
```
# connects to the SQLite database of allocation info since March 2020
# not using this right now, but is here as an option
#db_allocation = sqlite3.connect('/data/rc/rc-team/slurm-since-March-allocation.sqlite3')
```
%% Cell type:code id: tags:
```
# df_1 is the starting DataFrame: every row and column of the slurm table
df_1 = pd.read_sql('SELECT * FROM slurm', db)
```
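Reading the whole table with `SELECT *` is convenient for exploration; when only a few columns are needed, the selection and filtering can also be done in SQL. A minimal sketch under stated assumptions: it uses an in-memory database with a made-up `slurm` table and invented rows rather than the real file, and `demo_db` is a hypothetical name.

```python
import sqlite3

# In-memory stand-in for the real database; the rows are toy data.
demo_db = sqlite3.connect(':memory:')
demo_db.execute(
    'CREATE TABLE slurm '
    '(JobName TEXT, ReqMemCPU REAL, ReqMemNode REAL, ArrayTaskID INTEGER)'
)
demo_db.executemany(
    'INSERT INTO slurm VALUES (?, ?, ?, ?)',
    [('batch', 4e9, None, None),
     ('batch', None, 6e11, 3),
     ('interactive', 1e9, None, None)],
)

# Select only the needed columns and filter batch rows inside SQL.
# Rows with a NULL ReqMemCPU fail the comparison and are left out.
rows = demo_db.execute(
    "SELECT JobName, ReqMemCPU, ArrayTaskID FROM slurm "
    "WHERE JobName LIKE '%batch%' AND ReqMemCPU <= ?",
    (50e+10,),
).fetchall()
print(rows)
demo_db.close()
```

The same query string and parameter tuple can be handed to `pd.read_sql(query, db, params=...)` to get a DataFrame directly.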
%% Cell type:code id: tags:
```
# for displaying all available column options
pd.set_option('display.max_columns', None)
df_1.head(5)
```
%% Cell type:code id: tags:
```
# df_2 is a DataFrame with only JobName, ReqMemCPU, ReqMemNode, ArrayJobID, and ArrayTaskID
df_2 = df_1.loc[:,['JobName','ReqMemCPU', 'ReqMemNode', 'ArrayJobID','ArrayTaskID']]
df_2.head(5)
```
%% Cell type:code id: tags:
```
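# df_batch is a boolean mask that is True for batch jobs (JobName contains 'batch')
# na=False counts a missing JobName as "not batch" so the mask stays strictly boolean
df_batch = df_1.JobName.str.contains('batch', na=False)
df_2[df_batch]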
```
%% Cell type:code id: tags:
```
# keeps the batch jobs whose requested memory is <= upperRAMlimit
CPU_cutoff = df_2[df_batch][(df_2[df_batch].ReqMemCPU <= upperRAMlimit)]
Node_cutoff = df_2[df_batch][(df_2[df_batch].ReqMemNode <= upperRAMlimit)]
CPU_cutoff
```
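The cutoff cell indexes `df_2[df_batch]` twice per line. A possible alternative, sketched on a toy DataFrame with invented values, combines the batch mask and the limit into one boolean mask and applies it once with `.loc`; `toy`, `is_batch`, `limit` and `cpu_cutoff` are hypothetical names.

```python
import pandas as pd

toy = pd.DataFrame({
    'JobName': ['batch', 'batch', 'interactive', None],
    'ReqMemCPU': [4e9, 6e11, 1e9, 2e9],
})
limit = 50e+10

# na=False treats missing job names as non-matches instead of NaN.
is_batch = toy.JobName.str.contains('batch', na=False)

# One combined mask, applied once.
cpu_cutoff = toy.loc[is_batch & (toy.ReqMemCPU <= limit)]
print(cpu_cutoff)
```

Only the first toy row survives: the second exceeds the limit, and the last two are not batch jobs.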
%% Cell type:code id: tags:
```
# gives count, mean, std, min, max, and the 25th/50th/75th percentiles for the CPU cutoff data
# include/exclude can be changed to select which column dtypes are summarized
CPU_cutoff.describe(include=None, exclude=None)
```
%% Cell type:code id: tags:
```
# gives count, mean, std, min, max, and the 25th/50th/75th percentiles for the Node cutoff data
# include/exclude can be changed to select which column dtypes are summarized
Node_cutoff.describe(include=None, exclude=None)
```
%% Cell type:code id: tags:
```
# histogram of requested memory per CPU; each count is a batch job, not a user
CPU_fig = sns.distplot(CPU_cutoff['ReqMemCPU'], kde=False, label='CPU', color="green")
CPU_fig.set_yscale('log')
plt.legend(prop={'size': 12})
plt.title('Requested Memory per CPU')
plt.xlabel('Requested memory (ReqMemCPU)')
plt.ylabel('Number of batch jobs')
```
%% Cell type:code id: tags:
```
# histogram of requested memory per node; each count is a batch job, not a user
Node_fig = sns.distplot(Node_cutoff['ReqMemNode'], kde=False, label='Node')
Node_fig.set_yscale('log')
plt.legend(prop={'size': 12})
plt.title('Requested Memory per Node')
plt.xlabel('Requested memory (ReqMemNode)')
plt.ylabel('Number of batch jobs')
```
%% Cell type:code id: tags:
```
# overlays both histograms on the same axes
CPU_fig = sns.distplot(CPU_cutoff['ReqMemCPU'], kde=False, label='CPU', color="green")
CPU_fig.set_yscale('log')
Node_fig = sns.distplot(Node_cutoff['ReqMemNode'], kde=False, label='Node')
Node_fig.set_yscale('log')
plt.legend(prop={'size': 12})
plt.title('Requested Memory per CPU and per Node')
plt.xlabel('Requested memory')
plt.ylabel('Number of batch jobs')
```
%% Cell type:code id: tags:
```
# creates histogram of ReqMemCPU for the month of March 2020
# uses the rows kept in CPU_cutoff (ReqMemCPU <= upperRAMlimit)
# the marginal plot above the histogram can be a box or violin graph, showing min, max, median, and quartiles
# the mean is at just under half a gig of requested memory per CPU
CPU_fig = px.histogram(CPU_cutoff, x="ReqMemCPU",
                       title='Histogram of ReqMemCPU',
                       labels={'ReqMemCPU':'ReqMemCPU'}, # can specify one label per df column
                       opacity=0.8,
                       log_y=True, # represent bars with log scale
                       marginal="box", # can be `box`, `violin`
                       hover_data=CPU_cutoff.columns,
                       nbins=30,
                       color_discrete_sequence=['goldenrod'] # color of histogram bars
                       )
CPU_fig.show()
```
%% Cell type:code id: tags:
```
# keeps the batch jobs whose ReqMemNode is <= upperRAMlimit
Node_cutoff = df_2[df_batch][(df_2[df_batch].ReqMemNode <= upperRAMlimit)]
```
%% Cell type:code id: tags:
```
# creates histogram of ReqMemNode for the month of March 2020
# uses the rows kept in Node_cutoff (ReqMemNode <= upperRAMlimit)
# the marginal plot above the histogram can be a box or violin graph, showing min, max, median, and quartiles
# the mean is at just under half a gig of requested memory per node
Node_fig = px.histogram(Node_cutoff, x="ReqMemNode",
                        title='Histogram of ReqMemNode',
                        labels={'ReqMemNode':'ReqMemNode'}, # can specify one label per df column
                        opacity=0.8,
                        log_y=True, # represent bars with log scale
                        marginal="box", # can be `box`, `violin`
                        hover_data=Node_cutoff.columns,
                        nbins=30,
                        color_discrete_sequence=['darkblue'] # color of histogram bars
                        )
Node_fig.show()
```
%% Cell type:code id: tags:
```
# keeps only the rows of CPU_cutoff that belong to array jobs (ArrayTaskID is not missing)
CPU_arraytask = CPU_cutoff.dropna(subset=['ArrayTaskID'])
CPU_arraytask
```
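Assuming, as the `dropna` call implies, that rows outside an array job carry a missing value rather than the text `'NaN'`, a string comparison cannot pick them out. A small sketch on toy data (values invented; `toy`, `string_test` and `array_rows` are hypothetical names) contrasts the two tests:

```python
import pandas as pd

toy = pd.DataFrame({'ArrayTaskID': [0.0, None, 7.0]})

# Comparing with the string 'NaN' matches nothing, so every row passes.
string_test = toy[toy.ArrayTaskID != 'NaN']

# dropna() tests for real missing values and removes that row.
array_rows = toy.dropna(subset=['ArrayTaskID'])
print(len(string_test), len(array_rows))
```

`toy[toy.ArrayTaskID.notna()]` is an equivalent spelling of the `dropna` line when a mask is more convenient.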
%% Cell type:code id: tags:
```
```