Big Data Engineering — Declarative Data Flows. Categories: Uncategorized. This is part 3 of a series on data engineering in a big data environment.… KupferschmidtAdmin, 22. October 2020
Big Data Engineering — Apache Spark. Categories: Big Data, PySpark, Spark. This is part 2 of a series on data engineering in a big data environment.… KupferschmidtAdmin, 17. October 2020
Big Data Engineering — Best Practices. Categories: Big Data, Spark. This is part 1 of a series on data engineering in a big data environment.… KupferschmidtAdmin, 16. October 2020
Running Jupyter with Spark in Docker. Most attendees of dimajix Spark workshops seem to like the hands-on approach I am offering… KupferschmidtAdmin, 2. October 2017
Jupyter Notebooks with PySpark in AWS. Amazon Elastic MapReduce (EMR) is something wonderful if you need compute capacity on demand. I… KupferschmidtAdmin, 22. May 2017
Running Spark and Hadoop with S3. Traditionally HDFS was the primary storage for Hadoop (and therefore also for Apache Spark). Naturally… KupferschmidtAdmin, 5. May 2017
Running PySpark on Anaconda in PyCharm. Working with PySpark: Currently, Apache Spark with its bindings PySpark and SparkR is the processing… KupferschmidtAdmin, 15. April 2017
Building Druid for Cloudera 5.4.x. So the other day I wanted to look into using Druid as a reporting backend… dominik_adm1n, 23. March 2016