João Pedro – Medium

João Pedro

João Pedro in Towards Data Science

My First Billion (of Rows) in DuckDB

First Impressions of DuckDB handling 450 GB in a real project

May 1

João Pedro in Towards Data Science

Anatomy of Windows Functions

Theory and practice of an underappreciated SQL operation

Jun 11

João Pedro in Towards Data Science

Automatically Detecting Label Errors in Datasets with CleanLab

A Tale of AI and wrongly-classified Brazilian Federal Laws

Jul 22, 2023

João Pedro in Towards Data Science

Automatically Managing Data Pipeline Infrastructures With Terraform

I know the manual work you did last summer

May 2, 2023

João Pedro in Towards Data Science

Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue)

Learning a little about these tools and how to integrate them

Apr 6, 2023

João Pedro in Towards Data Science

Creating a Data Pipeline with Spark, Google Cloud Storage and Big Query

On-premise and cloud working together to deliver a data product

Mar 6, 2023

João Pedro in Towards Data Science

Hands-On Introduction to Delta Lake with (py)Spark

Concepts, theory, and functionalities of this modern data storage framework

Feb 16, 2023

João Pedro

Temporal and Geo-referenced Traffic Management with Python+Streamlit

Applying modern tools to visualize time and spatial data in a dashboard

Jan 29, 2023

João Pedro in Towards Data Science

First Steps in Machine Learning with Apache Spark

Basic concepts and topics of Spark MLlib package

Jan 4, 2023

João Pedro in Towards Data Science

A Fast Look at Spark Structured Streaming + Kafka

Learning the basics of how to use this powerful duo for stream-processing tasks

Nov 5, 2022

João Pedro

Bachelor of IT at UFRN. Graduate of BI at UFRN — IMD. Strongly interested in Machine Learning, Data Science and Data Engineering.
