Semi-Automated Design of Data-Intensive Architectures

📅 2025-03-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the difficulty of designing data-intensive architectures and selecting suitable technologies when data is large, fast-changing, and heterogeneous in format, this paper proposes a semi-automated development methodology that guides architects through both decisions. The methodology rests on three components: (1) a language for describing an application scenario in terms of data characteristics and stakeholder requirements; (2) an architecture description language (ADL) for data-intensive architectures; and (3) a classification of systems by the functionalities they offer and their performance trade-offs. The authors show that the description languages capture key aspects of architectures proposed by researchers and practitioners, and they validate the methodology by applying it to real-world case studies documented in the literature.

📝 Abstract

Today, data guides the decision-making process of most companies. Effectively analyzing and manipulating data at scale to extract and exploit relevant knowledge is a challenging task, due to data characteristics such as its size, the rate at which it changes, and the heterogeneity of formats. To address this challenge, software architects resort to building complex data-intensive architectures that integrate highly heterogeneous software systems, each offering vertically specialized functionalities. Designing a suitable architecture for the application at hand is crucial to enable high quality of service and efficient exploitation of resources. However, the design process entails a series of decisions that demand technical expertise and in-depth knowledge of individual systems and their synergies. To assist software architects in this task, this paper introduces a development methodology for data-intensive architectures, which guides architects in (i) designing a suitable architecture for their specific application scenario, and (ii) selecting an appropriate set of concrete systems to implement the application. To do so, the methodology builds on (1) a language to precisely define an application scenario in terms of characteristics of data and requirements of stakeholders; (2) an architecture description language for data-intensive architectures; and (3) a classification of systems based on the functionalities they offer and their performance trade-offs. We show that the description languages we adopt can capture the key aspects of data-intensive architectures proposed by researchers and practitioners, and we validate our methodology by applying it to real-world case studies documented in the literature.
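As a rough illustration of how a scenario description and a system classification might feed a selection step, the sketch below matches a scenario's required functions against a small catalog. It is not the paper's scenario language, ADL, or classification: the names `Scenario`, `SystemClass`, `select_systems`, the trait and function labels, and the greedy matching rule are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario: data characteristics plus required functions."""
    name: str
    data_traits: frozenset
    required_functions: frozenset


@dataclass(frozen=True)
class SystemClass:
    """Hypothetical catalog entry: functions offered and data traits handled well."""
    name: str
    functions: frozenset
    suited_traits: frozenset


def select_systems(scenario, catalog):
    """Greedily pick system classes until every required function is covered.

    Candidates are ranked by how many uncovered functions they offer, then by
    how many of the scenario's data traits they suit (an assumed heuristic).
    Returns (names of chosen classes, functions left uncovered).
    """
    uncovered = set(scenario.required_functions)
    chosen = []
    while uncovered and catalog:
        best = max(
            catalog,
            key=lambda s: (
                len(s.functions & uncovered),
                len(s.suited_traits & scenario.data_traits),
            ),
        )
        if not best.functions & uncovered:
            break  # nothing in the catalog offers the remaining functions
        chosen.append(best.name)
        uncovered -= best.functions
    return chosen, uncovered


# Illustrative catalog and scenario; every label here is made up.
CATALOG = [
    SystemClass("stream_processor",
                frozenset({"stream_processing"}),
                frozenset({"high_change_rate"})),
    SystemClass("batch_processor",
                frozenset({"batch_processing"}),
                frozenset({"large_volume"})),
    SystemClass("document_store",
                frozenset({"storage", "point_queries"}),
                frozenset({"heterogeneous_formats"})),
]

scenario = Scenario(
    name="clickstream_monitoring",
    data_traits=frozenset({"high_change_rate", "heterogeneous_formats"}),
    required_functions=frozenset({"stream_processing", "storage"}),
)

chosen, missing = select_systems(scenario, CATALOG)
print(chosen, missing)
```

A real methodology would presumably weigh performance trade-offs and interactions between systems rather than simple set coverage; the greedy rule is used here only because it is short and easy to follow.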

Problem

Research questions and friction points this paper is trying to address.

Designing scalable architectures for heterogeneous data systems

Addressing challenges in data size, change rate, and format diversity

Assisting architects in system selection and architecture design

Innovation

Methods, ideas, or system contributions that make the work stand out.

Methodology for designing data-intensive architectures

Language for defining application scenarios

Classification of systems by functionality and performance trade-offs
