In this blog post, you will learn how to export a pandas DataFrame to Excel (that is, how to save DataFrame data to an Excel file). For this exercise you need two modules: pandas and openpyxl. You can install both with the pip install command. Program to export a pandas DataFrame to Excel. Firstly, […]
Data Analysis with Python – Weather Dataset
In this tutorial, you will learn how to build an end-to-end data analysis project using Pandas. If you don’t know Pandas yet, I suggest learning the basics of Pandas and Python before starting this project. You can get the dataset from my GitHub repository. Weather Dataset. Important concepts of Pandas. Questions to be solved […]
How to read and write CSV file in PySpark using Databricks
Geeks, in this tutorial you will learn how data stored in a CSV file is read in PySpark. You will also learn how multiple CSV files can be read and written to a location or table. Note: PySpark supports reading a CSV file with a pipe, comma, tab, space, or any other […]
DataFrame in PySpark?
Databricks explains the PySpark DataFrame very well, so below is the definition I took from Databricks. If you already know Pandas, learning PySpark will be easy because most of the syntax is similar. The most important difference between Pandas and PySpark is that a Pandas DataFrame gets […]
Table Batch Reads and Writes
In this tutorial, I will explain how data is read from and written to Delta Lake. I will also cover other table read and write operations, such as partitionBy. Create a table. Delta Lake supports creating two types of tables: tables defined in the metastore (managed tables) and […]
What is Delta Lake?
Delta Lake is an open-source project that enables building a Lakehouse architecture on top of data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing on top of existing data lakes such as S3, ADLS Gen1, ADLS Gen2, GCS, and HDFS. Features of Delta Lake. ACID transactions: serializable isolation levels ensure that readers never see inconsistent data. […]
Zomato SQL Data Analysis Project
In this tutorial, we will solve the Zomato SQL Data Analysis Project. Please read the full post if you are preparing for an interview. You can also check out the video tutorial for this project on my YouTube channel. Datasets. SQL Script. Questions to be solved in this project. Solution. What is the total amount […]
What is PySpark (Spark with Python)?
In this PySpark (Spark with Python) tutorial, you will discover what PySpark is, with examples: its features, benefits, modules, and packages, as well as how to use RDDs and DataFrames with sample Python code. For beginners who are eager to learn PySpark and advance their careers in big data and machine learning, all of the […]
How to avoid Multiple if-else conditions in Python
In Python, we use if-else blocks to branch our code. Sometimes we have to write multiple if-else conditions, which makes the code confusing and difficult to understand. Let’s take an example of an if-else program by writing a simple calculator. The output is 30. Now, we will try […]
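The excerpt above is cut off before it names the technique the post uses, so the following is only a sketch of one common way to replace a chain of if-else conditions: a dictionary that maps each operation name to a function. The names OPERATIONS and calculate, and the sample numbers, are hypothetical and chosen for illustration; they are not taken from the original post.

```python
import operator

# Hypothetical dispatch table: each operation name maps to the function that performs it.
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def calculate(operation, a, b):
    """Look up the operation by name instead of walking an if-elif chain."""
    try:
        func = OPERATIONS[operation]
    except KeyError:
        # Unknown names fail loudly instead of falling through silently.
        raise ValueError(f"Unsupported operation: {operation!r}") from None
    return func(a, b)


print(calculate("add", 12, 4))
print(calculate("divide", 12, 4))
```

With this layout, supporting a new operation means adding one entry to the dictionary, and the calculate function itself does not change.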