@lauracodecreations
Last active February 27, 2018 07:35
Descriptive Statistics per Column in Azure ML

The Problem

You want to output the descriptive statistics of a column in a dataset, and you want the result in a specific format. For example, the '50%' index label should be renamed to 'median'.
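For context, here is a minimal sketch of the labels pandas produces by default. The sample values and the name 'score' are made up for illustration and do not come from the dataset used in this gist.

```python
import pandas as pd

# Hypothetical sample values, for illustration only
scores = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name='score')

# describe() labels the 50th percentile '50%', not 'median'
default_labels = scores.describe().index.tolist()
print(default_labels)
```

The sixth label (position 5) is the one this gist renames.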

You do not want to install Python on your computer to do this work.

Solution

Use Microsoft Azure Machine Learning Studio (https://studio.azureml.net/), a free platform for creating and running Python code.

The function defined below outputs the statistics in the format described above.

Data: The dataset must be uploaded to Azure ML Studio under the "Datasets" tab, and it must be in csv format.

Method: The function takes two parameters: the data frame to work on, and the name of the column whose descriptive statistics should be calculated.

Example

To use it:

  1. Change the dataset name in the "ds" variable to the name of your own dataset

  2. Identify the name of the column you are interested in; in this case, 'PTrustb'

  3. Run:

     describe(frame, 'PTrustb')
    
from azureml import Workspace
import pandas as pd

# Connect to the workspace and load the uploaded dataset
ws = Workspace()
ds = ws.datasets['BBBS_Data.csv']
frame = ds.to_dataframe()

# Print the dimensions of the data frame to check that everything loaded as expected
print('dimensions = ' + str(frame.shape))

def describe(df, col):
    # pd.to_numeric works on a single column, not on a whole data frame,
    # so convert the selected column here; values that cannot be parsed become NaN
    desc = pd.to_numeric(df[col], errors='coerce').describe()
    # Change the name of the 50% index (position 5) to median
    idx = desc.index.tolist()
    idx[5] = 'median'
    desc.index = idx
    return desc
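
The same idea can be tried without an Azure ML workspace. The following is a sketch, not the gist's original code: the function name describe_with_median, the column name 'score' and the sample values are all hypothetical, and the label is renamed by name with rename rather than by position, so the result does not depend on the order of the statistics.

```python
import pandas as pd

def describe_with_median(df, col):
    """Return describe() for one column with '50%' relabelled as 'median'."""
    # Coerce to numeric so describe() yields numeric statistics;
    # values that cannot be parsed become NaN and are ignored.
    values = pd.to_numeric(df[col], errors='coerce')
    return values.describe().rename(index={'50%': 'median'})

# Hypothetical data standing in for the uploaded csv
sample = pd.DataFrame({'score': ['1', '2', '3', '4', 'n/a']})
result = describe_with_median(sample, 'score')
print(result)
```

Because the column is read as text here, the 'n/a' entry is coerced to NaN and left out of the count and the other statistics.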