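np.random.choice(array, n, replace=True, p=None)
- returns n random elements drawn from array. replace=True samples with replacement; p is an optional list of per-element probabilities (uniform if omitted).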
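A minimal sketch of weighted sampling with np.random.choice. The array values, the weights, and the seed are made up for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is repeatable

colors = np.array(["red", "green", "blue"])  # hypothetical data
weights = [0.5, 0.3, 0.2]                    # probabilities must sum to 1

# Draw 10 values with replacement, so the same element can appear more than once
with_replacement = np.random.choice(colors, 10, replace=True, p=weights)

# Draw 2 distinct values; without replacement, n cannot exceed len(colors)
without_replacement = np.random.choice(colors, 2, replace=False, p=weights)

print(with_replacement)
print(without_replacement)
```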
df.info()
- prints a column-by-column summary of the df: each column's dtype and its number of non-null entries
df.sample(n)
- returns a random sample of n rows from the df. Good for spotting data quality problems. Default is n=1.
df.head(n)
- returns the first n rows of the dataframe. Default is n=5.
df.tail(n)
- returns the last n rows of the dataframe. Default is n=5.
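A quick sketch of the four inspection calls above on a toy dataframe. The column names and values are hypothetical, and random_state is only there to make the sample repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe with one missing value in "age"
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve", "Fay"],
    "age": [31, np.nan, 45, 28, 39, 52],
})

df.info()                                     # prints dtypes and non-null counts; returns None
first_two = df.head(2)                        # rows at index 0 and 1
last_two = df.tail(2)                         # rows at index 4 and 5
random_three = df.sample(3, random_state=0)   # 3 random rows, fixed by random_state

print(first_two)
print(last_two)
print(random_three)
```

In the info() output, the "age" column should report one fewer non-null entry than "name", which is the kind of quality problem these calls help surface.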
Structured data is organized to meet a schema. Think of tables, which organize data into rows and columns.
Unstructured data is unorganized and lacks a schema. Imagine a collection of HTML documents, including text and images, not organized in any consistent way.
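To make the contrast concrete, here is a small sketch: the first object has a schema (named, typed columns), while the second is a list of raw HTML strings with no shared layout. All names and values are made up.

```python
import pandas as pd

# Data with a schema: every row has the same named, typed columns
table = pd.DataFrame({
    "product": ["pen", "notebook"],
    "price": [1.50, 4.25],
})

# Data without a schema: made-up HTML pages that share no consistent layout
pages = [
    "<html><body><h1>Pens</h1><p>Our pens cost $1.50.</p></body></html>",
    "<div><img src='notebook.png'>Notebook, now 4.25!</div>",
]

print(table.dtypes)            # the schema can be read straight off the table
print(table["price"].mean())   # columns can be queried by name
```

Getting the prices out of pages would need custom parsing for each document, since nothing guarantees where, or whether, a price appears.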