Unlocking Next-Generation Data Engineering: A Deep Dive into DSX 1.5.0
| Issue ID | Description | Workaround |
|----------|-------------|-------------|
| DSX-4521 | Git integration fails with self-signed SSL certificates | Manually import CA cert into JVM truststore |
| DSX-4788 | Data Refinery times out on files >5GB | Use Spark notebook instead; patch in 1.5.1 |
| DSX-4912 | Kernel fails to start when user has >500 HDFS files | Increase `kernel_proxy_timeout` in `config.yaml` |
| DSX-5023 | Automated testing for R kernels broken after upgrade | Reinstall R kernel spec: `jupyter kernelspec install R` |
However, for any internet-connected environment or team seeking modern MLOps (CI/CD for models), DSX 1.5.0 is obsolete.
- New public endpoints and client methods for data import/export, bulk operations, and streaming ingestion.
- Extended query/filter syntax: additional operators (e.g., regex, fuzzy match), richer aggregation primitives, and time-windowing functions.
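
To make the operator categories above concrete, here is a minimal sketch of what regex filtering, fuzzy matching, and time-windowed aggregation compute. It uses only the Python standard library and is not the DSX query syntax or client API; the record fields (`ts`, `user`, `value`) and all function names are hypothetical and chosen purely for illustration.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical sample records; field names are illustrative only.
records = [
    {"ts": 0, "user": "alice", "value": 10},
    {"ts": 30, "user": "alicia", "value": 5},
    {"ts": 65, "user": "bob", "value": 7},
    {"ts": 90, "user": "alice", "value": 3},
]


def regex_filter(rows, field, pattern):
    """Keep rows whose field matches the regular expression."""
    compiled = re.compile(pattern)
    return [r for r in rows if compiled.search(str(r[field]))]


def fuzzy_filter(rows, field, target, threshold=0.7):
    """Keep rows whose field is similar to target (ratio >= threshold)."""
    return [
        r for r in rows
        if SequenceMatcher(None, str(r[field]), target).ratio() >= threshold
    ]


def tumbling_window_sum(rows, width, ts_field="ts", value_field="value"):
    """Sum values per fixed-width, non-overlapping time window."""
    totals = defaultdict(int)
    for r in rows:
        # Window key is the start of the window the timestamp falls into.
        window_start = (r[ts_field] // width) * width
        totals[window_start] += r[value_field]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(regex_filter(records, "user", r"^ali"))
    print(fuzzy_filter(records, "user", "alice"))
    print(tumbling_window_sum(records, width=60))
```

The windowing function uses tumbling (non-overlapping) windows because they are the simplest case; sliding or session windows would need a different grouping rule.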
Cause: The default memory limit for sidecar containers was reduced.

Fix: Set `DSX_KERNEL_MEM_LIMIT=8Gi` in your project environment variables.
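
One way to confirm the fix took effect is to check the variable from inside a notebook. The sketch below assumes the value uses Kubernetes-style binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`), which the `8Gi` form suggests but the release notes do not state; the helper names `parse_mem_limit` and `kernel_mem_limit_ok` are hypothetical and not part of DSX.

```python
import os
import re

# Assumption: binary suffixes in the Kubernetes quantity style.
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_mem_limit(text):
    """Convert a quantity such as '8Gi' to bytes; raise ValueError if malformed."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized memory quantity: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def kernel_mem_limit_ok(env=None, minimum="8Gi"):
    """Return True if DSX_KERNEL_MEM_LIMIT is set and at least `minimum`."""
    env = os.environ if env is None else env
    raw = env.get("DSX_KERNEL_MEM_LIMIT")
    if raw is None:
        return False
    return parse_mem_limit(raw) >= parse_mem_limit(minimum)


if __name__ == "__main__":
    print("memory limit OK:", kernel_mem_limit_ok())
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real process environment.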

    from pyspark.sql import SparkSession

    # Reuse the active Spark session, or create one if none exists.
    spark = SparkSession.builder.getOrCreate()
    print(spark.version)  # Spark version available to this kernel