QA - My ZHAW Notes

Here are comprehensive answers to all 44 questions extracted from the document:

Where have we seen reduction before?
- Reduction is used in array operations where multiple elements are combined using an operation (e.g., sum, min, max) to produce a single result.

What is the data scope of the variables in the following code?

#include <omp.h>

int i = 0; int n = 10; int a = 7; 

#pragma omp parallel for 

for (i = 0; i < n; i++) 
{ 
    int b = a + i; 
    ... 
}

i and b are private to each thread, while n and a are shared among threads.

What is the data scope of the variables in the following code?

#include <omp.h>

#pragma omp parallel for shared(n, a) 

for (int i = 0; i < n; i++) 
{ 
    int b = a + i; 
    ... 
}

#pragma omp parallel for shared(n, a) private(b) 

for (int i = 0; i < n; i++) 
{ 
    b = a + i; 
    ... 
}

In the first loop, n and a are shared, i and b are private. In the second loop, n and a are shared, i is private, and b is explicitly private.

What is the data scope of the variables in the following code?

#include <omp.h>

int a, b, c, n; 

#pragma omp parallel for default(shared) private(a, b) 

for (int i = 0; i < n; i++) 
{ 
    // a and b are ? variables 
    // c and n are ? variables 
}

a and b are private, while c and n are shared.

How do the machines boot in Asymmetric Multiprocessing (AMP) with different Instruction Set Architectures (ISA)?
- Machines in AMP boot either after reset or on-demand. They may boot from a bootloader or reserved memory, or boot code may be provided on demand.
How tightly coupled are the applications in AMP?
- The level of synchronization, data transfer, and organization of communications determine the coupling of applications in AMP.

How do you use signals in openAMP?

Signals in openAMP are used through handlers. For example:

void SignalHandler(int sig) {
    (void) printf("signal handler called by signal %d\n", sig);
    flag--;
}

int main(void) {
    struct sigaction sig;
    sig.sa_handler = SignalHandler;
    (void) sigemptyset(&sig.sa_mask);
    (void) sigaction(SIGUSR1, &sig, NULL);
    while (flag > 0);
    (void) printf("main terminates, flag = %d\n", flag);
}

Which is preferable: thread put in wait queue or a busy loop?
- Putting a thread in a wait queue is generally preferable as it avoids consuming CPU resources unnecessarily, unlike a busy loop.
What determines which thread goes first once released?
- The thread scheduling policy of the operating system determines which thread goes first once released.
How many actions are distinct?
- Actions are distinct based on their functionality and independence in the control flow graph.
How many tasks are independent?
- Tasks are independent if they do not rely on each other’s results and can be executed concurrently without dependencies.
Is there potential for parallelisation?
- Yes, if tasks are independent and can be executed concurrently, there is potential for parallelization.
Discuss which pattern seems most relevant for grouping tasks, ordering tasks, and data sharing.
- The patterns relevant for this are Task Decomposition, Dependency Analysis, and Data Sharing patterns.
Identify the data categories: read-only, effectively-local, read-write, accumulate, multiple-read single-write.
- Data categories help in understanding how data is accessed and modified by tasks:
  - Read-only: Data read by tasks without modification.
  - Effectively-local: Data modified by one task but not shared.
  - Read-write: Data read and written by tasks.
  - Accumulate: Data combined from multiple tasks.
  - Multiple-read single-write: Data read by multiple tasks but written by one.
Is data decomposition feasible for the edge detection application?
- Yes, as edge detection can be divided into smaller tasks that operate on different parts of the image independently.
What are the tasks in data decomposition?
- Tasks in data decomposition include breaking down the image into sections, processing each section for edge detection, and combining the results.
Explain the granularity knob and the impact of granularity.
- The granularity knob refers to adjusting the size of tasks. Fine granularity means smaller tasks with more parallelism, while coarse granularity means larger tasks with less parallelism but reduced overhead.
Explain the terms presented in task decomposition and dependency analysis.
- Task decomposition involves breaking down a problem into smaller tasks. Dependency analysis involves identifying dependencies among tasks to determine the execution order.
What is a fork-join?
- Fork-join is a parallel pattern where tasks are split (forked) into parallel subtasks and then combined (joined) after execution.
Explain the forces and solutions in the fork-join pattern.
- Forces include the need for parallelism and synchronization. Solutions involve using thread or process management to handle task execution and joining results.
How do you allocate components in design space exploration?
- Components are allocated based on performance, resource availability, and optimization goals.
What are the stages of design space exploration?
- Stages include requirement analysis, component selection, system modeling, simulation, and optimization.
How do you schedule operations in design space exploration?
- Operations are scheduled based on task dependencies, resource availability, and performance metrics.
What is the importance of measuring performance in design space exploration?
- Measuring performance helps in evaluating design choices, identifying bottlenecks, and optimizing the system.
How do you handle unseen parallelisms and serializations in hardware?
- By using techniques such as pipelining, out-of-order execution, and speculative execution.
What are functional units in hardware abstraction?
- Functional units are components in a processor that perform operations like arithmetic, logic, and memory access.
What is the role of processing elements in hardware abstraction?
- Processing elements execute tasks and operations in parallel computing architectures.
What are tasks in software abstraction?
- Tasks are units of work scheduled for execution by the operating system or runtime environment.
What is the difference between binding and pinning in design space exploration?
- Binding refers to associating tasks with resources, while pinning refers to fixing tasks to specific resources to avoid migration.
What is the significance of hardware abstraction layers?
- Hardware abstraction layers provide a consistent interface for software to interact with different hardware components.
How do you map software functions to hardware components?
- Mapping involves associating software functions with specific hardware units for execution, considering performance and resource constraints.
What is the difference between a function and a procedure in software implementation?
- A function returns a value and is used in expressions, while a procedure performs actions but does not return a value.
How do you ensure data sharing over defined interfaces in task scheduling?
- By using synchronization mechanisms, data structures, and communication protocols.
What are the levels of synchronization required in asymmetric multiprocessing?
- Synchronization levels include task-level, data-level, and system-level synchronization to ensure proper coordination and data integrity.
How are communications organized in asymmetric multiprocessing?
- Communications are organized using shared memory, message passing, and inter-process communication (IPC) mechanisms.
What is the role of the platform management unit in Zynq UltraScale+ MPSoC?
- The platform management unit manages system resources, power, and boot processes in Zynq UltraScale+ MPSoC.
How does the PMU bootloader work in Zynq UltraScale+ MPSoC?
- The PMU bootloader initializes the system, loads the boot image, and hands off control to the main application.
What is the significance of the ELF file in rproc_boot?
- The ELF (Executable and Linkable Format) file contains the executable code and data for remote processors in rproc_boot.
How does the rproc_boot function handle resource tables?
- The rproc_boot function parses resource tables to configure memory, peripherals, and communication channels for remote processors.
What is the role of virtio in inter-processor communication?
- Virtio provides a standardized interface for efficient inter-processor communication and data transfer.
How does rpmsg handle message passing in openAMP?
- Rpmsg (remote processor messaging) facilitates message passing between processors using a virtual communication channel.
**What are the components of an rpmsg endpoint structure

?** - Components include the destination address, source address, payload buffer, and communication channel configuration.

How does the scheduling of tasks occur in a GPU architecture?
- Task scheduling in GPU architecture involves distributing tasks among multiple GPU cores based on workload and resource availability.
What is the significance of a command queue in OpenCL?
- A command queue in OpenCL manages the execution order of commands, enabling asynchronous execution and resource management.

These answers provide a comprehensive understanding of the questions related to multicore and parallel processing, as discussed in the document.