Data wrangling is a crucial step in the data analysis process, and Fabric provides a robust environment for performing these tasks efficiently. In this article, we explore common data wrangling operations using the Data Wrangler tool in Fabric notebooks.
Importing Data
To begin, we need to load a dataset into a pandas DataFrame. For instance, we can use the publicly available Titanic dataset:
import pandas as pd
# URL of the publicly available dataset (example: Titanic dataset)
url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
# Read the dataset into a DataFrame
df = pd.read_csv(url)
# Display the first few rows of the DataFrame
df.head()
Data Wrangling Operations
Click Data Wrangler in the notebook and select the DataFrame you want to wrangle.
Dropping Columns
In Fabric, you can easily drop columns from your dataset. In Data Wrangler, select the columns you want to drop and click Apply. Notice that Fabric generates the corresponding code automatically.
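For reference, dropping columns corresponds to a pandas `drop` call. The sketch below is a minimal, hand-written equivalent, not the exact code Fabric generates; the small sample DataFrame and the choice of `Cabin` and `Ticket` as the columns to drop are assumptions for illustration.

```python
import pandas as pd

# Tiny stand-in for the Titanic data (illustrative values, not real rows)
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["A", "B", "C"],
    "Cabin": [None, "C85", None],
    "Ticket": ["T1", "T2", "T3"],
})

# Drop the selected columns; drop() returns a new DataFrame
# and leaves the original untouched
df_dropped = df.drop(columns=["Cabin", "Ticket"])

print(df_dropped.columns.tolist())
```

Because `drop` returns a new DataFrame, the original `df` still contains every column, which makes it easy to compare the data before and after.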
Dropping Duplicates
Under Target columns, select the columns to use for identifying duplicates, and click Apply.
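In pandas terms, the target columns map to the `subset` argument of `drop_duplicates`. The following is a minimal sketch under assumed data; the sample rows and the chosen columns (`Name`, `Ticket`) are hypothetical, and the code Fabric generates may look different.

```python
import pandas as pd

# Illustrative rows: the first two share the same Name and Ticket
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob"],
    "Ticket": ["T1", "T1", "T2"],
    "Fare": [7.25, 8.05, 7.25],
})

# Rows count as duplicates when they match on the subset columns only;
# by default the first occurrence is kept
df_unique = df.drop_duplicates(subset=["Name", "Ticket"])

print(df_unique)
```

Note that `Fare` differs between the two "Ann" rows, yet the second is still removed because only the subset columns are compared.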
Dropping Missing Values
Under Target columns, you can select the columns to which the transformation should apply.
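A rough pandas equivalent is `dropna` with a `subset`, which removes only the rows that have missing values in the chosen columns. The sample data and the choice of `Age` as the target column are assumptions made for this sketch.

```python
import pandas as pd

# Illustrative data with gaps in two different columns
df = pd.DataFrame({
    "Age": [22.0, None, 26.0],
    "Embarked": ["S", "C", None],
})

# Drop rows where Age is missing; missing values in other columns are ignored
df_no_missing_age = df.dropna(subset=["Age"])

print(df_no_missing_age)
```

The row with a missing `Embarked` value survives here, since only `Age` was selected as a target column.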
Handling Missing Values
You can handle missing values by either dropping them or filling them with appropriate values. When filling, you can either replace missing values with 0 or choose a fill method (such as mean) from the drop-down.
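Both filling options can be sketched with pandas `fillna`. This is a minimal example on assumed data; which column gets a constant and which gets the mean is an arbitrary choice for illustration.

```python
import pandas as pd

# Illustrative data with one missing value per column
df = pd.DataFrame({
    "Age": [20.0, None, 40.0],
    "Fare": [7.25, None, 8.05],
})

df_filled = df.copy()

# Option 1: replace missing values with a constant (0)
df_filled["Fare"] = df_filled["Fare"].fillna(0)

# Option 2: replace missing values with the column mean
# (mean() skips missing values by default)
df_filled["Age"] = df_filled["Age"].fillna(df_filled["Age"].mean())

print(df_filled)
```

Filling with 0 can distort numeric columns such as fares or ages, so a statistic like the mean is often the safer choice when the column is later used in calculations.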
Find and Replace
Fabric allows you to find and replace values in your dataset. Choose the columns in which you want to replace values, specify the old value and the new value, and click Apply once you are done.
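Find and replace on a column can be expressed with pandas `replace`. In this sketch, the sample data and the mapping from `"S"` to `"Southampton"` are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative data using single-letter port codes
df = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Embarked": ["S", "C", "S"],
})

df_replaced = df.copy()

# Replace exact matches of the old value with the new value
# in the selected column only
df_replaced["Embarked"] = df_replaced["Embarked"].replace("S", "Southampton")

print(df_replaced)
```

`replace` matches whole cell values here, so a cell such as `"SC"` would not be changed by this call.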
Other Operations
There are many other operations available in Fabric for formatting, one-hot encoding, sorting, filtering, renaming columns, min-max scaling, etc.
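A few of these operations can be sketched in plain pandas as follows. The sample data and column names are assumptions, and min-max scaling is written out by hand here rather than taken from any Fabric-generated code.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Fare": [10.0, 30.0, 20.0],
})

# One-hot encoding: one indicator column per category of Sex
encoded = pd.get_dummies(df, columns=["Sex"])

# Sorting by Fare, highest first
sorted_df = df.sort_values("Fare", ascending=False)

# Filtering rows with a boolean condition
expensive = df[df["Fare"] > 15]

# Renaming a column
renamed = df.rename(columns={"Fare": "TicketFare"})

# Min-max scaling: map Fare linearly onto the range [0, 1]
fare_min, fare_max = df["Fare"].min(), df["Fare"].max()
scaled_fare = (df["Fare"] - fare_min) / (fare_max - fare_min)

print(encoded.columns.tolist())
print(scaled_fare.tolist())
```

The scaling formula divides by `fare_max - fare_min`, so it assumes the column holds at least two distinct values.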
Adding Code to Notebook
Once you are done with the transformations, click “Add code to notebook”. The code for the transformations is added to the notebook automatically.
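The added code typically collects the applied steps into a single function that is then called on a copy of the DataFrame. The sketch below illustrates that pattern under assumptions: the function name `clean_data`, the sample data, and the specific steps are hypothetical, and the code Fabric actually produces depends on the operations you applied.

```python
import pandas as pd

# Illustrative data: one exact duplicate row and one missing Age
df = pd.DataFrame({
    "PassengerId": [1, 1, 2, 3],
    "Age": [22.0, 22.0, None, 30.0],
    "Cabin": [None, None, "C85", None],
})

def clean_data(df):
    """Apply the wrangling steps in order and return the result."""
    # Drop column: 'Cabin'
    df = df.drop(columns=["Cabin"])
    # Drop duplicate rows across all columns
    df = df.drop_duplicates()
    # Drop rows with missing data in column: 'Age'
    df = df.dropna(subset=["Age"])
    return df

# Work on a copy so the original DataFrame stays available
df_clean = clean_data(df.copy())

print(df_clean)
```

Keeping the steps in one function makes the cleaning logic easy to rerun on fresh data or to reuse in another notebook.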