With version 2.2 of Apache Spark, a long-awaited feature for the multipurpose in-memory data processing framework is now available for production use.
Structured Streaming, as that feature is called, allows Spark to process streams of data in ways that are native to Spark’s batch-based data-handling metaphors. It’s part of Spark’s long-term push to become, if not all things to all people in data science, then at least the best thing for most of them.
Structured Streaming in 2.2 benefits from a number of other changes aside from losing its experimental designation. It can now read from and write to Apache Kafka, acting as either a source or a sink, with lower latency for Kafka connections than before.
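A minimal sketch of what that looks like in Scala, Spark's native language: a streaming query that consumes one Kafka topic and writes results to another. The broker address, topic names, and checkpoint path here are placeholders, not values from the Spark documentation.

```scala
import org.apache.spark.sql.SparkSession

object KafkaPassthrough {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KafkaPassthrough").getOrCreate()

    // Kafka as a source: subscribe to a topic on a (hypothetical) broker.
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder address
      .option("subscribe", "events")                    // placeholder topic
      .load()

    // Kafka delivers binary key/value columns; cast them to strings first.
    val lines = input.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING) AS value")

    // Kafka as a sink: write the stream back out to another topic.
    val query = lines.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "events-out")                    // placeholder topic
      .option("checkpointLocation", "/tmp/checkpoints") // required for fault tolerance
      .start()

    query.awaitTermination()
  }
}
```

The same query could previously have been wired up only through the older DStream-based Kafka connectors; the point of the Structured Streaming integration is that the stream is just a DataFrame, so the usual Spark SQL operations apply to it unchanged.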
Kafka, itself an Apache Software Foundation project, is a distributed messaging bus widely used in streaming applications. Kafka has typically been paired with another stream-processing framework, Apache Storm, but Storm is limited to stream processing only, and Spark presents less complex APIs to the developer.
Structured Streaming jobs can now use Spark’s triggering mechanism to run a streaming job once and quit. Databricks, the chief commercial outfit supporting Spark development, claims this is a more efficient execution model than running Spark batch jobs intermittently.
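In code, the run-once behavior is a one-line change to an existing streaming query: attach a `Trigger.Once()` trigger, and the query processes whatever data has accumulated since the last checkpoint, then stops. The output path and checkpoint location below are illustrative placeholders, and `lines` stands in for any streaming DataFrame.

```scala
import org.apache.spark.sql.streaming.Trigger

// Process all data available since the last run, then shut the query down.
// Spark uses the checkpoint to pick up where the previous run left off,
// so scheduling this job periodically behaves like an incremental batch job.
val query = lines.writeStream
  .format("parquet")
  .option("path", "/data/out")                      // placeholder output path
  .option("checkpointLocation", "/tmp/checkpoints") // placeholder checkpoint dir
  .trigger(Trigger.Once())
  .start()

query.awaitTermination() // returns once the single batch has completed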
The native collection of machine learning libraries in Spark, MLlib, has been outfitted with new algorithms for tasks like performing PageRank on datasets, or running multiclass logistic regression analysis (e.g., predicting which current hit movie a person in a given demographic category is most likely to enjoy). Machine learning is a common use case for Spark.
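A brief sketch of multiclass logistic regression with the MLlib pipeline API: setting the model family to "multinomial" requests a multiclass model rather than a binary one. The data path is a placeholder; the snippet assumes a labeled dataset in LIBSVM format with more than two classes.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

object MulticlassLR {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("MulticlassLR").getOrCreate()

    // Placeholder path: any LIBSVM-format file with integer class labels works.
    val data = spark.read.format("libsvm").load("data/multiclass_sample.txt")

    // "multinomial" selects multiclass (softmax) logistic regression.
    val lr = new LogisticRegression()
      .setFamily("multinomial")
      .setMaxIter(50)
      .setRegParam(0.01)

    val model = lr.fit(data)

    // One row of coefficients per class, rather than a single weight vector.
    println(s"Coefficients:\n${model.coefficientMatrix}")
    println(s"Intercepts: ${model.interceptVector}")

    spark.stop()
  }
}
```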
Machine learning in Spark also gets a major boost from improved support for the R language. Earlier versions of Spark had wider support for Java and Python than for R, but Spark 2.2 adds R support for 10 distributed algorithms. Structured Streaming and the Catalog API (used for accessing query metadata in Spark SQL) can now also be used from R.