PyGWalker: A Python Library for Exploratory Data Analysis with Visualization
PyGWalker can simplify your Jupyter Notebook data analysis and data visualization workflow by turning your pandas dataframe (or polars dataframe) into a Tableau-style user interface for visual exploration.
PyGWalker (pronounced like "Pig Walker", just for fun) is short for "Python binding of Graphic Walker". It integrates Jupyter Notebook (and other Jupyter-based notebooks) with Graphic Walker, a different kind of open-source alternative to Tableau. It lets data scientists analyze data and visualize patterns with simple drag-and-drop operations.
Visit Google Colab, Kaggle Code or the Graphic Walker Online Demo to test it out!
Getting Started
Run in Kaggle | Run in Colab
Set up pygwalker
Before using pygwalker, install the package from the command line using pip or conda.
pip
pip install pygwalker
Note
To keep your version up to date with the latest release, run
pip install pygwalker --upgrade
or, for an early trial of the latest features and bug fixes, install the pre-release:
pip install pygwalker --upgrade --pre
Conda-forge
conda install -c conda-forge pygwalker
or
mamba install -c conda-forge pygwalker
See the conda-forge feedstock for more help.
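To confirm which version pip or conda actually installed (for example, after upgrading or opting into a pre-release), you can query the package metadata from Python. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of pygwalker.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "pygwalker") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"pygwalker {v}" if v else "pygwalker is not installed")
```

Pre-release builds follow Python's versioning scheme and carry a suffix such as `a0`, as in the `0.1.4a0` releases mentioned under Tested Environments.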
Use pygwalker in Jupyter Notebook
Quick Start
Import pygwalker and pandas into your Jupyter Notebook to get started.
import pandas as pd
import pygwalker as pyg
You can use pygwalker without breaking your existing workflow. For example, you can call up Graphic Walker with the dataframe loaded in this way:
df = pd.read_csv('./bike_sharing_dc.csv')
walker = pyg.walk(df)
That's it. Now you have an interactive UI to analyze and visualize data with simple drag-and-drop operations.
Cool things you can do with PyGWalker:
- You can change the mark type to make different charts, for example, a line chart.
- To compare different measures, you can create a concat view by adding more than one measure into rows/columns.
- To make a facet view of several subviews divided by the values of a dimension, put dimensions into rows or columns. The rules are similar to Tableau's.
- You can view the data frame in a table and configure the analytic types and semantic types.
- You can save the data exploration result to a local file.
For more detailed instructions, visit the Graphic Walker GitHub page.
Better Practice
There are some important parameters you should know when using pygwalker:
- spec: saves/loads the chart configuration (a JSON string or a file path).
- use_kernel_calc: uses DuckDB as the computing engine, which lets you handle larger datasets faster on your local machine.
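Because `spec` accepts either a JSON string or a file path, it can be handy to load a saved chart config yourself, for example to check it before committing it to version control. The sketch below is a hypothetical helper (`load_spec` is not a pygwalker function, and this is not how pygwalker resolves the argument internally); it only assumes that a saved spec is plain JSON, and makes no assumptions about its internal structure.

```python
import json
from pathlib import Path
from typing import Any


def load_spec(spec: str) -> Any:
    """Parse a chart spec given either as a path to a JSON file or as a JSON string.

    Hypothetical helper for inspecting saved specs; not part of pygwalker's API.
    """
    try:
        is_file = Path(spec).is_file()
    except (OSError, ValueError):
        # A long JSON string can exceed the OS path-length limit; treat it as JSON.
        is_file = False
    if is_file:
        return json.loads(Path(spec).read_text(encoding="utf-8"))
    # Not an existing file, so assume the value is a JSON string.
    return json.loads(spec)
```

For example, `load_spec("./chart_meta_0.json")` returns the parsed config once you have saved a chart from the UI.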
import pandas as pd
import pygwalker as pyg
df = pd.read_csv('./bike_sharing_dc.csv')
walker = pyg.walk(
    df,
    spec="./chart_meta_0.json",  # this JSON file stores your chart state; click the save button in the UI when you finish a chart ('autosave' will be supported in the future)
    use_kernel_calc=True,  # with `use_kernel_calc=True`, pygwalker uses DuckDB as the computing engine, which lets you explore bigger datasets (<=100GB)
)
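As we understand it, with `use_kernel_calc=True` the aggregations behind each chart are computed in the Python kernel and only the results are sent to the UI, instead of shipping every raw row to the browser. The sketch below illustrates that general idea only: it uses the standard library's `sqlite3` as a stand-in for DuckDB (which pygwalker actually uses), with a tiny made-up table whose `season`/`rentals` columns are hypothetical. It is not pygwalker's query code.

```python
import sqlite3

# Toy rows standing in for a large dataframe (made-up values for illustration).
rows = [
    ("spring", 120), ("spring", 80),
    ("summer", 200), ("summer", 150), ("summer", 50),
    ("fall", 90),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bikes (season TEXT, rentals INTEGER)")
conn.executemany("INSERT INTO bikes VALUES (?, ?)", rows)

# The aggregation runs next to the data; only one row per season comes back.
summary = conn.execute(
    "SELECT season, SUM(rentals) AS total FROM bikes GROUP BY season ORDER BY season"
).fetchall()
conn.close()

for season, total in summary:
    print(season, total)
```

Six input rows collapse to three result rows here; on a multi-gigabyte dataset the same effect is what keeps the UI responsive.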
Example in local notebook
- Notebook Code: Click Here
- Preview Notebook HTML: Click Here
Example in cloud notebook
Use pygwalker in Streamlit
Streamlit allows you to host a web version of pygwalker without having to work out the details of how web applications work.
Here are some app examples built with pygwalker and Streamlit:
- PyGWalker + Streamlit for the bike sharing dataset
- Earthquake Dashboard
from pygwalker.api.streamlit import StreamlitRenderer
import pandas as pd
import streamlit as st

# Adjust the width of the Streamlit page
st.set_page_config(
    page_title="Use Pygwalker In Streamlit",
    layout="wide"
)
# Add title
st.title("Use Pygwalker In Streamlit")

# Cache the pygwalker renderer so it is not rebuilt (and memory does not keep growing) on every rerun
@st.cache_resource
def get_pyg_renderer() -> "StreamlitRenderer":
    df = pd.read_csv("./bike_sharing_dc.csv")
    # To use the chart-config saving feature, set `spec_io_mode="rw"`
    return StreamlitRenderer(df, spec="./gw_config.json", spec_io_mode="rw")

renderer = get_pyg_renderer()
renderer.render_explore()
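Streamlit reruns the whole script on every interaction, so without `@st.cache_resource` a new renderer (and a new copy of the dataframe) would be created each time. The same build-once, reuse-afterwards pattern can be shown in plain Python with `functools.lru_cache`. This is only an analogy (Streamlit's resource cache also shares the object across sessions), and `ExpensiveRenderer` is a hypothetical stand-in for `StreamlitRenderer`.

```python
from functools import lru_cache


class ExpensiveRenderer:
    """Hypothetical stand-in for an object that is costly to build (e.g. loads a CSV)."""

    instances_created = 0

    def __init__(self) -> None:
        ExpensiveRenderer.instances_created += 1


@lru_cache(maxsize=None)
def get_renderer() -> ExpensiveRenderer:
    # Runs only on the first call; later calls return the cached instance.
    return ExpensiveRenderer()


# Simulate three script reruns.
for _ in range(3):
    renderer = get_renderer()

print(ExpensiveRenderer.instances_created)  # 1
```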
Tested Environments
- Jupyter Notebook
- Google Colab
- Kaggle Code
- Jupyter Lab (WIP: there are still some minor CSS issues)
- Jupyter Lite
- Databricks Notebook (since version 0.1.4a0)
- Jupyter Extension for Visual Studio Code (since version 0.1.4a0)
- Hex Projects (since version 0.1.4a0)
- Most web applications compatible with IPython kernels (since version 0.1.4a0)
- Streamlit (since version 0.1.4.9), enabled with pyg.walk(df, env='Streamlit')
- DataCamp Workspace (since version 0.1.4a0)
- ...feel free to raise an issue for more environments.
Configuration
Starting with pygwalker 0.1.7a0, you can modify the user-wide configuration either through the command line interface:
$ pygwalker config
usage: pygwalker config [-h] [--set [key=value ...]] [--reset [key ...]] [--reset-all] [--list]
Modify configuration file.
optional arguments:
-h, --help show this help message and exit
--set [key=value ...]
Set configuration. e.g. "pygwalker config --set privacy=get-only"
--reset [key ...] Reset user configuration and use default values instead. e.g. "pygwalker config --reset privacy"
--reset-all Reset all user configuration and use default values instead. e.g. "pygwalker config --reset-all"
--list List current used configuration.
or through the Python API:
>>> import pygwalker as pyg, pygwalker_utils.config as pyg_conf
>>> help(pyg_conf.set_config)
Help on function set_config in module pygwalker_utils.config:
set_config(config: dict, save=False)
Set configuration.
Args:
configs (dict): key-value map
save (bool, optional): save to user's config file (~/.config/pygwalker/config.json). Defaults to False.
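Settings saved with `save=True` go to the user config file named in the docstring, `~/.config/pygwalker/config.json`. To inspect that file without going through pygwalker, a sketch like the following can read it. It assumes the file holds a flat JSON object of key-value pairs (matching the `config: dict` argument); that is our assumption rather than documented behavior, and `read_user_config` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Location taken from the set_config docstring; adjust if your install differs.
DEFAULT_CONFIG_PATH = Path.home() / ".config" / "pygwalker" / "config.json"


def read_user_config(path: Optional[Path] = None) -> Dict[str, Any]:
    """Return saved user settings, or an empty dict if nothing has been saved.

    Assumes the file is a JSON object (key-value map); hypothetical helper.
    """
    path = DEFAULT_CONFIG_PATH if path is None else path
    if not path.is_file():
        return {}
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object in {path}")
    return data
```

Calling `read_user_config()` with no argument reads the default location; passing a path is mainly useful for testing.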
Privacy Policy
$ pygwalker config --set
usage: pygwalker config [--set [key=value ...]] | [--reset [key ...]].
Available configurations:
- privacy ['offline', 'get-only', 'meta', 'any'] (default: meta).
"offline" : no data will be transfered other than the front-end and back-end of the notebook.
"get-only" : allow fetch latest pygwalker version to check update.
"meta" : only the desensitized data will be processed by external servers. Required for using LLM to generate charts.
"any" : the data can be processed by external services.
For example,
pygwalker config --set privacy=meta
on the command line and
import pygwalker as pyg, pygwalker_utils.config as pyg_conf
pyg_conf.set_config( { 'privacy': 'meta' }, save=True)
have the same effect.
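Read top to bottom, the four privacy levels form a ladder from most to least restrictive. The sketch below encodes that reading as an ordered enum so a script can check, for example, whether a level permits LLM-based chart generation (which the list says requires `meta`; we assume `any` permits it too). The `Privacy` enum and helper names are hypothetical, not part of pygwalker, and treating the levels as strictly ordered is our interpretation.

```python
from enum import IntEnum


class Privacy(IntEnum):
    """Hypothetical ordering of pygwalker's privacy levels, most to least restrictive."""

    OFFLINE = 0   # "offline": nothing leaves the notebook's front-end/back-end
    GET_ONLY = 1  # "get-only": may fetch the latest version to check for updates
    META = 2      # "meta": desensitized data may be processed by external servers
    ANY = 3       # "any": data may be processed by external services

    @classmethod
    def parse(cls, value: str) -> "Privacy":
        # Config values use hyphens ("get-only"); member names use underscores.
        return cls[value.strip().upper().replace("-", "_")]


def allows_update_check(level: str) -> bool:
    return Privacy.parse(level) >= Privacy.GET_ONLY


def allows_llm_charts(level: str) -> bool:
    return Privacy.parse(level) >= Privacy.META


print(allows_llm_charts("meta"), allows_llm_charts("get-only"))  # True False
```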
Resources
- Check out more resources about Graphic Walker on Graphic Walker GitHub
- We are also working on RATH: an open-source, automated exploratory data analysis tool that redefines the workflow of data wrangling, exploration and visualization with AI-powered automation. Check out the Kanaries website and RATH GitHub for more!
- Use pygwalker to build a visual analysis app in Streamlit
- If you encounter any issues and need support, join our Slack or Discord channels.
- Use pygwalker with Gradio