Pinned | Published in TDS Archive | My First Billion (of Rows) in DuckDB | First Impressions of DuckDB handling 450Gb in a real project | May 1, 2024 | 11 responses
Published in TDS Archive | Anatomy of Windows Functions | Theory and practice of an underappreciated SQL operation | Jun 11, 2024 | 1 response
Published in TDS Archive | Automatically Detecting Label Errors in Datasets with CleanLab | A Tale of AI and wrongly-classified Brazilian Federal Laws | Jul 22, 2023
Published in TDS Archive | Automatically Managing Data Pipeline Infrastructures With Terraform | I know the manual work you did last summer | May 2, 2023
Published in TDS Archive | Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue) | Learning a little about these tools and how to integrate them | Apr 6, 2023 | 2 responses
Published in TDS Archive | Creating a Data Pipeline with Spark, Google Cloud Storage and Big Query | On-premise and cloud working together to deliver a data product | Mar 6, 2023 | 2 responses
Published in TDS Archive | Hands-On Introduction to Delta Lake with (py)Spark | Concepts, theory, and functionalities of this modern data storage framework | Feb 16, 2023 | 3 responses
Temporal and Geo-referenced Traffic Management with Python+Streamlit | Applying modern tools to visualize time and spatial data in a dashboard | Jan 29, 2023 | 1 response
Published in TDS Archive | First Steps in Machine Learning with Apache Spark | Basic concepts and topics of Spark MLlib package | Jan 4, 2023
Published in TDS Archive | A Fast Look at Spark Structured Streaming + Kafka | Learning the basics of how to use this powerful duo for stream-processing tasks | Nov 5, 2022 | 4 responses