FPGAs for embedded designers: what they are and why you should use them (iCE40UP5K)

Introduction

An FPGA (Field Programmable Gate Array) is a reprogrammable electronic chip, an integrated circuit that can be configured “in the field” by the user to perform a wide range of logic functions, unlike fixed chips (ASICs). It contains an array of programmable logic blocks and interconnects that allow the creation of custom digital circuits for applications requiring high speed, low latency, and flexibility, such as AI, telecommunications, and automotive.

It consists of:

Configurable Logic Blocks (CLBs): small logic units that can be configured to perform specific operations (such as AND or OR functions);
programmable interconnects: a “network” of configurable paths that connects CLBs to each other and to the I/O pins, allowing you to define the circuit structure.

The configuration itself is described using hardware description languages (HDLs) such as VHDL or Verilog, which define how the blocks should behave and how they should be connected.

The following image shows an example of a CLB:

Key differences with ASICs and CPUs:

flexibility: it can be reconfigured multiple times for different purposes, whereas an ASIC is fixed after production.
performance: it offers efficient, customised parallel processing, often outperforming CPUs in specific tasks such as AI.
cost/time: faster and cheaper to prototype and iterate on than ASICs, where every design change requires a new and costly fabrication run.

Main applications:

Artificial Intelligence (AI): acceleration of deep learning models, real-time data processing;
automotive sector: safety systems, battery management and sensors in electric vehicles;
telecommunications: network infrastructure and signal processing;
prototyping: rapid development of new hardware projects.

FPGAs are therefore reconfigurable logic devices: instead of a CPU executing firmware instruction by instruction, there is a matrix of hardware resources (logic, registers, memory, routing) that can be virtually rewired to become a circuit that performs specific functions. This point is the real paradigm shift: when you write a C program for a microcontroller, you are defining a sequence of operations that the CPU will execute over time; when you write Verilog for an FPGA, you are describing a hardware structure that, after synthesis and place & route, is “materialized” inside the chip and operates in parallel, clock-driven, with a temporal predictability that is often a luxury in the MCU world (interrupts, jitter, contested buses, RTOS, cache, shared peripherals). For an embedded designer, the FPGA is therefore not a “faster micro,” but a different way of solving problems: you can create custom peripherals, signal processing pipelines, hard real-time control logic, or high-speed interfaces without having to force the micro’s architecture to do things it wasn’t designed for. A very important strength is that, today, this doesn’t necessarily require proprietary tools, licenses, or workstations: there are families like the iCE40 (and in particular the iCE40UP5K) that lend themselves well to an educational and practical path with a completely open-source toolchain, ideal for building “marketable” skills and reproducible projects.

For an embedded designer, looking at FPGAs only makes sense if one fundamental point is clarified immediately: an FPGA is not a more powerful microcontroller, nor an alternative CPU. It is a different tool, designed to solve classes of problems that quickly become complex, fragile, or inefficient with an MCU. The value of an FPGA lies in its true parallelism, deterministic latency, and the ability to implement completely custom logic and peripherals. In an FPGA, multiple operations can occur in the same clock cycle, without preemption, interrupts, or resource contention, because there is no single “execution flow” to schedule. This opens up very concrete practical scenarios: generating an audio signal via PWM or sigma-delta modulation, where edge timing is critical; implementing a basic FIR filter for audio or sensor signals using hardware DSP blocks; building a digital acquisition system with precise triggers and circular buffers; or creating a non-standard SPI or parallel interface tailored precisely to the external device to be controlled. In all these cases, an FPGA allows you to shift complexity from software to hardware, resulting in systems that are more predictable, more responsive, and often easier to validate than solutions based solely on microcontrollers and firmware.

Another important aspect is timing control. In many embedded systems, the problem isn’t “doing more calculations,” but knowing exactly when an event occurs: clock edges, time windows, constant latencies, signal synchronization. On an FPGA, timing behavior is an integral part of the design, because the resulting circuit is synchronous by construction, and timing is analyzed and validated during the synthesis and place-and-route phases. This makes FPGAs particularly suited to hard real-time contexts, non-standard protocols, or streaming processing where each sample must be processed with a known and constant latency. Conversely, an FPGA isn’t the right choice for everything: when complex communication stacks, file systems, networking, or complex decision-making logic are required, a microcontroller or SoC remains more efficient. This is precisely why, in many modern systems, FPGAs and MCUs coexist: the microcontroller manages the “high-level control,” while the FPGA handles the timing-critical or highly parallelized parts. Understanding this complementarity is one of the main objectives of this introductory course.

MCU vs FPGA: the fundamental differences

The difference between a microcontroller and an FPGA is not a question of performance, but of computational model. In an MCU-based system, the software describes a sequence of instructions that a CPU executes over time: even when using interrupts, DMA, or an RTOS, there is always a central control flow that arbitrates the use of resources and inevitably introduces temporal variability. In an FPGA, however, there is no “program” that is executed: the HDL code (i.e., the hardware description language) describes a hardware structure composed of combinatorial logic and registers (typically implemented via LUTs and flip-flops) connected by a routing network, synchronized by one or more clocks. After synthesis and place & route, this description is transformed into a physical circuit that operates in parallel by construction, not by software simulation of parallelism. When we say that “everything happens in parallel,” it’s not a figure of speech, but a direct consequence of the architecture: two blocks described in Verilog are not scheduled, but coexist as independent parts of the same circuit. LUTs implement arbitrary logic functions, flip-flops store the synchronous state and determine the system’s evolution at each clock edge; time is no longer a shared resource to be managed, but an explicit design dimension, analyzed and verified through timing constraints. This explains why concepts like deterministic latency, constant throughput, and hardware pipelines are natural in the FPGA world and much more difficult to guarantee in a purely software system. Understanding this difference is the crucial step for those coming from traditional embedded systems: it’s not about “learning a new language,” but about adopting a different way of thinking about the design, in which the code describes the hardware and the timing behavior is an integral part of the specification.

The main difficulty for those coming from the world of microcontrollers is that HDL superficially resembles a programming language, but its meaning is radically different. In an MCU, time is implicit: the programmer writes instructions and the processor executes them one after the other, leaving the internal architecture (pipeline, cache, interrupts, DMA) to manage the “when.” Even when introducing concurrent constructs, such as RTOS tasks or nested interrupts, parallelism remains a form of temporal multiplexing on a single execution unit or a few shared resources. In an FPGA, time is explicit and structural: each register represents a state that evolves with each clock edge, and each combinational network represents a transformation that occurs between one synchronous event and the next. Writing Verilog isn’t about saying “do this first, then that,” but about defining which signals exist, how they are connected, and how they react to clock events. If two blocks are described in the same module, both exist simultaneously in the silicon and operate in parallel, regardless of whether the designer perceives them as “before” or “after” in the source code. This distinction has important practical consequences: there is no program counter, no call stack, no global control flow. Instead, there are data paths, latencies defined by the number of registers traversed, and timing constraints that must be physically respected. The behavior of the system is not determined by the order of instructions, but by the circuit topology and the synchronization of its elements. This is why concepts such as pipelines, throughput, and latency are fundamental in FPGAs from the very beginning, while in the MCU world they emerge only in advanced contexts. Clarifying this point is essential: an FPGA does not “execute code,” but rather creates hardware, and the HDL code is simply the means to describe it in an abstract and synthesizable form.
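The difference between ordered software statements and registers that update together can be illustrated with a small software model. The sketch below is Python, not HDL, and the function names are hypothetical; it only mimics the idea that every register computes its next value from the old state and that all of them commit on the same clock edge.

```python
def software_style(a, b):
    """Sequential semantics: the second statement sees the result of the first."""
    a = b
    b = a          # b is assigned the *new* a, so both end up equal
    return a, b


def clock_edge(a, b):
    """Register semantics: both next values are computed from the old state,
    then committed together, as two flip-flops would do on one clock edge."""
    next_a = b     # input of register a is wired to the output of register b
    next_b = a     # input of register b is wired to the output of register a
    return next_a, next_b


print(software_style(1, 0))   # (0, 0)
print(clock_edge(1, 0))       # (0, 1): the two registers swap values
```

In the second function the order of the two assignments could be reversed without changing the result, which is the property the paragraph above attributes to blocks described in the same module.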

How is an FPGA made internally?

To understand what makes an FPGA different from a microcontroller, it’s necessary to delve, at least conceptually, into its fundamental components. An FPGA is composed of a large number of elementary logic blocks, interconnected by a programmable routing network. The basic building block is the LUT (Look-Up Table), which can be viewed as a small memory capable of implementing any combinational logic function of a given number of inputs, that is, any function that could otherwise be built from basic logic gates. Coupled with the LUT is typically a flip-flop, which allows a synchronous state to be stored and sequential logic to be implemented.

The following image shows the main logic gates with their truth tables:

The following image shows the D-type flip-flop with its truth table and internal circuitry:

The distinction between combinational logic and sequential logic is central: the former produces an output that depends only on the current inputs (after a short propagation delay), while the latter introduces memory and therefore dependence on time and the clock. In an FPGA, the behavior of a digital system arises precisely from the combination of these two elements, organized into more or less complex networks. Alongside the basic logic, modern FPGAs integrate internal memories (Block RAM and, in the case of the iCE40UP5K, additional single-port SRAM blocks) that can be used as buffers, FIFOs, or temporary storage, and specialized DSP blocks for efficient arithmetic operations, such as multiplications and accumulations, which are essential for digital signal processing. These resources are not “peripherals” in the traditional sense, but structural parts of the circuit that the designer can freely combine to build calculation pipelines, filters, controllers, or custom interfaces.

The following image shows the block diagram of a generic sequential logic:

The transition from abstract description to actual hardware occurs through two key phases: synthesis and place & route. Synthesis analyzes the HDL code and transforms it into a logical netlist, a representation of the circuit in terms of LUTs, registers, RAM, and logical connections. Place & route takes this netlist and physically maps it onto the FPGA’s resources, deciding where to place each block and how to route signals while respecting temporal and physical constraints. It is at this stage that the design becomes “real”: propagation delays, critical paths, and timing are no longer theoretical concepts, but measurable parameters that determine whether the circuit will function correctly at the desired frequency. This flow highlights another substantial difference compared to software: temporal behavior is not a side effect of execution, but a direct result of architectural choices.

Hardware description languages, such as VHDL and Verilog, are the tools through which this structure is expressed. Although their syntax may resemble that of programming languages, their purpose is not to describe an algorithm, but a circuit. In VHDL and Verilog, for example, it is possible to describe a simple combinational network connecting inputs and outputs, or a sequential logic that updates registers on each clock edge. An instruction that in C would represent an operation to be executed sequentially, in HDL instead represents a permanent relationship between signals. This semantic difference is fundamental: the HDL code is not “executed,” but interpreted by the synthesis tool to build the corresponding hardware. Understanding this allows you to avoid one of the most common mistakes made by beginners: attempting to write HDL as if it were software, thus losing sight of the structural and temporal nature of the project.

LUT: what it is and what it implements

A LUT (Look-Up Table) implements a combinational logic function: given N inputs, it produces an output according to a defined Boolean relationship. Conceptually, a LUT can be seen as the equivalent of a network of elementary logic gates (AND, OR, NAND, NOT, XOR, etc.) represented by a single configurable block. Instead of physically building a network of gates for each function, the FPGA uses a small internal memory that, for each possible combination of inputs, returns the value of the output: an N-input LUT therefore stores 2^N configuration bits, one per row of the truth table. From a functional point of view, there is no difference between a LUT configured to implement, for example, an XOR function or one bit of a binary sum, and a hand-designed network of logic gates: what changes is the way this function is physically implemented within the chip. Functions with more inputs than a single LUT provides are split by the synthesis tool across several LUTs. This abstraction allows the synthesis tool to automatically map even complex logic expressions to the available resources, while the designer keeps a conceptual model close to classical digital logic.
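The “small memory addressed by the inputs” idea can be sketched in a few lines. This is a Python software model, not HDL; the class name `Lut` and the bit ordering (first input is the least significant address bit) are assumptions made for the illustration and do not describe any specific device.

```python
class Lut:
    """Software model of an N-input look-up table with a single output."""

    def __init__(self, num_inputs, init):
        self.num_inputs = num_inputs
        self.init = init  # 2**num_inputs configuration bits packed in an int

    def __call__(self, *inputs):
        assert len(inputs) == self.num_inputs
        # The inputs form an address; inputs[0] is the least significant bit.
        address = sum(bit << i for i, bit in enumerate(inputs))
        return (self.init >> address) & 1


# Same "hardware", two different configurations.
xor2 = Lut(2, 0b0110)        # output is 1 for addresses 1 and 2
sum3 = Lut(3, 0b10010110)    # 3-input XOR: the sum bit of a full adder

print([xor2(a, b) for b in (0, 1) for a in (0, 1)])   # [0, 1, 1, 0]
print(sum3(1, 1, 1))                                   # 1
```

Changing only the `init` value turns the same structure into a different logic function, which is what configuring a LUT amounts to.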

The following image shows a simple example of a logical function represented using a LUT:

Flip-flop: why it is a memory element

The flip-flop is the element that introduces the notion of state to a digital circuit. Unlike combinational logic, whose output follows its inputs after only a propagation delay, a flip-flop stores a value and holds it stable until a synchronization event, typically the edge of a clock. In an FPGA, flip-flops are coupled to LUTs and are the standard way to implement registers, counters, finite-state machines, and pipelines. Conceptually, a flip-flop can be viewed as a one-bit memory cell, whose contents are updated only at well-defined instants. This is why sequential logic is intrinsically time-bound: the circuit’s behavior depends not only on the current inputs, but also on the values stored in the flip-flops during the previous clock cycle. Without flip-flops, concepts like “state,” “memory,” or “time evolution” would not exist within an FPGA.
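As a rough illustration of “updated only at well-defined instants”, the following Python model (not HDL; the class and method names are hypothetical) separates the value present at the input from the value that is stored and visible at the output.

```python
class DFlipFlop:
    """One-bit storage element: q changes only when clock_edge() is called."""

    def __init__(self, initial=0):
        self.q = initial   # stored state, visible at the output
        self.d = initial   # value currently present at the input

    def clock_edge(self):
        # On the active clock edge the input is sampled and becomes the state.
        self.q = self.d


ff = DFlipFlop()
ff.d = 1
print(ff.q)        # 0: the input changed, but no clock edge has occurred yet
ff.clock_edge()
print(ff.q)        # 1: the value was captured on the edge
ff.d = 0
print(ff.q)        # 1: the stored value is held until the next edge
```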

Combinational logic vs. sequential logic: the fundamental distinction

The distinction between combinational and sequential logic is a cornerstone of digital design. Combinational logic consists exclusively of functions that connect inputs and outputs without any memory: the output is always a direct function of the current inputs. A circuit constructed only with elementary logic gates, with no feedback, falls into this category. Sequential logic, on the other hand, combines combinational logic and memory elements (flip-flops), allowing the circuit to “remember” the past. In this case, the output depends on both the inputs and the internal state of the system. In an FPGA, both types coexist: combinational logic performs signal transformations between one clock edge and the next, while sequential logic marks the passage of time and preserves information. This conceptual separation is essential to understanding how pipelines, counters, digital filters, and state machines work, and represents one of the greatest differences from traditional programming, where state and time are often implicit and managed by the software’s execution flow.
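A counter is a convenient example of the two kinds of logic working together. In the Python sketch below (a software model, not HDL; `next_count` and the modulo of 10 are arbitrary choices for the illustration), the function plays the role of the combinational part and the variable `count` plays the role of the register.

```python
def next_count(count, enable, modulo=10):
    """Combinational part: depends only on the current inputs and state."""
    if not enable:
        return count
    return (count + 1) % modulo


count = 0                     # sequential part: the register holding the state
trace = []
for enable in [1, 1, 0, 1, 1]:
    count = next_count(count, enable)   # register update on each clock edge
    trace.append(count)

print(trace)   # [1, 2, 2, 3, 4]
```

Each loop iteration stands for one clock cycle: the next value is a pure function of the current state and inputs, and the state changes only once per cycle.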

Internal organization of the FPGA

The elements described so far (LUTs, flip-flops, memories, and DSP blocks) do not exist in isolation within the FPGA, but are organized in a regular matrix structure, composed of a large number of interconnected elementary logic blocks. Each logic block typically contains one or more LUTs, associated registers, and auxiliary resources, and is designed to be independently configurable. Programming an FPGA therefore consists not only of “setting” the internal behavior of each block (for example, defining which logic function a LUT should implement or when a flip-flop should sample a signal), but also of defining how these blocks should be connected to each other. The programmable interconnection network allows signals to be routed from one block to another, creating processing chains, hierarchical structures, and arbitrary data paths. It is precisely this combination of local configuration and global routing that allows complex functions to be built from simple elements: a counter, a state machine, a processing pipeline, or a digital filter are not primitive entities, but the result of the coordinated aggregation of many logic blocks distributed across the matrix. In this sense, an FPGA can be viewed as a large sheet of reconfigurable digital logic, where the designer (assisted by synthesis tools) decides both the behavior of the individual building blocks and how they are connected to create a higher-order circuit. This architecture explains why concepts such as circuit topology, signal path length, and register organization have a direct impact on the behavior and performance of the final system.

The following image shows the internal architecture of an FPGA:

Place & route: physical location and real-world connections

Once it’s clear that an FPGA is a matrix of interconnected elementary logic blocks, it’s natural to wonder how an abstract description in HDL is transformed into a functioning physical circuit. This step occurs through a sequence of processes commonly referred to as synthesis, mapping, place and route, which together represent much more than a simple “compilation.” After logic synthesis, which translates the HDL code into an abstract netlist of logic functions, registers, and memories, the design must be mapped to the resources actually available in the FPGA. Mapping consists of adapting the described logic functions to a finite set of real-world elements: LUTs of a certain size, physically present flip-flops, RAM blocks, and DSPs with specific characteristics. Important architectural decisions are made at this stage, such as decomposing complex functions into multiple LUTs or using dedicated resources instead of generic logic.

The next step is place, or the physical positioning of the mapped blocks within the FPGA matrix. Each LUT, flip-flop, or RAM block is assigned to a specific position on the chip. This choice is not arbitrary: the physical distance between blocks affects signal propagation delays and therefore the respect of timing constraints. Poor placement can make it impossible to reach a certain clock frequency, even if the logic itself is correct. For this reason, placement is a complex optimization problem, in which the tool tries to balance compactness, parallelism, and timing constraints.
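To give a feeling for why placement is treated as an optimization problem, the sketch below compares two placements of the same tiny netlist using total Manhattan wire length as a cost. This is a deliberately simplified Python model: the block names, grid coordinates, and the cost function itself are assumptions for the illustration, and real placers use far more detailed, timing-driven cost functions.

```python
def total_wire_length(placement, nets):
    """Sum of Manhattan distances between the two endpoints of each net."""
    total = 0
    for source, sink in nets:
        (x0, y0), (x1, y1) = placement[source], placement[sink]
        total += abs(x0 - x1) + abs(y0 - y1)
    return total


# Hypothetical three-block chain: lut_a -> lut_b -> ff_c
nets = [("lut_a", "lut_b"), ("lut_b", "ff_c")]
compact = {"lut_a": (0, 0), "lut_b": (1, 0), "ff_c": (1, 1)}
spread = {"lut_a": (0, 0), "lut_b": (7, 5), "ff_c": (0, 1)}

print(total_wire_length(compact, nets))   # 2
print(total_wire_length(spread, nets))    # 23
```

The logic is identical in both cases; only the positions differ, yet the second placement forces signals to travel much further and would therefore tend to be slower.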

After placement, routing comes into play: the phase in which physical connections between the blocks are established. Inside the FPGA, there is a dense network of programmable interconnections. Routing consists of selecting, for each signal, a path through this network that connects the output of one block to the inputs of the blocks it drives. Conceptually, these connections can be thought of as reconfigurable switches: each routing node can be enabled or disabled via configuration bits stored in the FPGA. In modern architectures, such as the iCE40, these are not physical fuses that blow once and for all, but memory cells (typically SRAM) that control internal connectivity and are rewritten each time a bitstream is loaded. This is why an SRAM-based FPGA can be reconfigured a practically unlimited number of times, unlike anti-fuse devices, which can be programmed only once.

The final result of the place and route is a completely concrete description of the circuit: each logical function is assigned to a specific LUT, each register to a physical flip-flop, and each signal travels through a defined set of interconnections. At this point, the design has effectively become a real digital circuit, with well-defined delays, critical paths, and physical constraints. The bitstream generated by the tools is nothing more than the set of data needed to configure these behaviors: LUT contents, initial flip-flop states, RAM and DSP block configurations, and, most importantly, internal routing node settings. Understanding this flow helps explain why, in the FPGA world, concepts like timing closure, critical paths, and clock constraints are not low-level details, but central aspects of the design. The system’s temporal behavior doesn’t emerge “at runtime,” but is the direct result of mapping, placement, and connection choices made before the circuit is even powered up.

Synthesis constraints for timing or area

Once a design has been mapped, positioned, and routed, one of the most delicate and characteristic aspects of FPGA design comes into play: timing. Unlike software, where execution speed is often an indirect consequence of the architecture and system load, in an FPGA, timing behavior is a structural property of the resulting circuit. Each signal traversing the combinatorial logic and internal interconnections takes a certain propagation time, determined both by the logical complexity (number and type of LUTs traversed) and the physical distance traveled within the chip. When multiple blocks are chained between two synchronous memory elements, the total propagation time along that path becomes a critical parameter for the correct operation of the system.

In a classic synchronous design, the minimum clock period (and therefore the maximum clock frequency) is constrained by the slowest path between two consecutive flip-flops, known as the critical path. The delay of this path includes the clock-to-output delay of the source register, the intermediate combinational logic, the physical routing, and the setup time of the destination register; hold time is a separate requirement, checked against the fastest paths rather than the slowest. If the signal has not settled before the setup window of the next clock edge, the circuit ceases to function correctly, even though it is logically correct. This is why the maximum frequency achievable on an FPGA is not an abstract property of the chip, but directly depends on the design structure and how it is mapped and routed. In other words, the system speed is determined by the slowest path, not the average or typical path.
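The relationship between path delay and clock frequency can be made concrete with a little arithmetic: for one register-to-register path, the shortest usable period is the sum of the source register's clock-to-output delay, the logic delay, the routing delay, and the destination register's setup time, and the slowest path sets the limit for the whole clock domain. The Python sketch below applies this to three paths; all delay values are hypothetical and are not taken from any datasheet.

```python
def min_clock_period_ns(t_clk_to_q, t_logic, t_routing, t_setup):
    """Shortest usable clock period for one register-to-register path."""
    return t_clk_to_q + t_logic + t_routing + t_setup


def fmax_mhz(paths):
    """The slowest path sets the limit for the whole clock domain."""
    worst = max(min_clock_period_ns(*p) for p in paths)
    return 1000.0 / worst


# Hypothetical delays in nanoseconds: (clk-to-Q, logic, routing, setup)
paths = [
    (1.0, 3.0, 2.0, 0.5),    # 6.5 ns
    (1.0, 9.0, 9.5, 0.5),    # 20.0 ns: the critical path
    (1.0, 1.0, 1.0, 0.5),    # 3.5 ns
]
print(fmax_mhz(paths))   # 50.0
```

Two of the three paths could run several times faster, but the design as a whole is still limited to the frequency allowed by the 20 ns path.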

Historically, synthesis tools allowed design optimization by prioritizing either area occupied or propagation time. This concept is still valid, although today it is expressed more formally through timing constraints. The designer can specify the desired clock frequency or the timing requirements between signals, and the tools attempt to satisfy them by choosing solutions that balance area, performance, and power consumption. Optimizing for area means reducing the number of resources used, often accepting longer logic paths; optimizing for speed, on the other hand, involves duplicating logic, breaking paths with intermediate registers (pipelining), or using dedicated resources, increasing area but shortening the critical path. These choices are not fully automatic, but depend on the project objectives and the stated constraints.

Modern place and route tools perform a static timing analysis (STA), which evaluates all relevant paths in the circuit without needing to simulate operation over time. At the end of the process, the designer receives detailed reports indicating whether the constraints are met, which are the critical paths, and how much slack they allow. These reports are not an afterthought, but a fundamental design tool: they highlight where the circuit is “slow” and guide any necessary architectural changes, such as introducing pipelines, reorganizing logic, or modifying the connection topology. Understanding timing and critical paths therefore means understanding why an FPGA can operate perfectly at a certain frequency and fail completely a few megahertz higher, even without changing a single line of HDL code.
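The notion of slack in a timing report can be sketched as follows. This Python model is only an illustration: the path names and delays are invented, and a real static timing analysis also accounts for clock skew, hold checks, and other effects ignored here. Slack is computed as the target clock period minus the path delay, so a negative value means the constraint is violated.

```python
def slack_report(path_delays_ns, target_mhz):
    """Return (name, slack) pairs, worst first; negative slack is a violation."""
    period_ns = 1000.0 / target_mhz
    report = [(name, period_ns - delay) for name, delay in path_delays_ns.items()]
    return sorted(report, key=lambda item: item[1])


# Hypothetical post-route path delays in nanoseconds.
delays = {"fir_mac": 19.0, "uart_rx": 7.5, "pwm_cmp": 4.0}

for mhz in (50, 55):
    worst_name, worst_slack = slack_report(delays, mhz)[0]
    status = "met" if worst_slack >= 0 else "VIOLATED"
    print(f"{mhz} MHz: worst path {worst_name}, slack {worst_slack:.2f} ns, {status}")
```

With these numbers the same design meets timing at 50 MHz and fails at 55 MHz, without any change to its logic.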

HDL (VHDL/Verilog) as a tool for describing synchronous structures and pipelines

Hardware Description Languages (HDLs) are tools used to describe the structure and temporal behavior of a digital circuit, not to express a sequence of instructions to be executed. This distinction is crucial: an HDL is not a programming language in the classical sense of the term. There is no execution flow, no program counter, and no implicit temporal order tied to the position of instructions in the code. An HDL is used to specify which hardware components exist, how they are connected to each other, and how they react to clock events. Synthesis tools interpret the HDL code as a structural and temporal description, from which a physical circuit made up of combinatorial logic, registers, memories, and interconnections is derived. Thinking of an HDL as a “hardware programming language” is therefore misleading: it is more accurate to think of it as a specification language for synchronous circuits.

The two most widely used HDLs are VHDL and Verilog, both standardized by the IEEE (as IEEE 1076 and IEEE 1364 respectively) and widely supported. VHDL originated in the military and aerospace field (it was commissioned by the US Department of Defense) and is strongly typed, rather verbose, and geared toward rigorous behavior descriptions. This rigidity makes it suitable for large industrial projects and contexts where formal clarity and verifiability are priorities. Verilog, which originated as a commercial simulation language in the mid-1980s, has a more compact syntax and semantics that, while still hardware-centric, are often more accessible to those coming from the C-like world. The differences between the two lie not so much in expressive capabilities (both can describe the same class of circuits) as in their style and the way they guide the designer in thinking about hardware. In both cases, the fundamental concept remains unchanged: the code describes structures that exist simultaneously, not operations that occur one after the other.

A simple example helps clarify the concept. Consider a basic circuit composed of an input, a register, and an output: at each clock edge, the value of the input is stored and made available at the output. In hardware terms, this is simply a flip-flop. In VHDL, this behavior is described as a clock-sensitive process, which assigns the input value to the register only at a well-defined time event. In Verilog, the same circuit is expressed using a clock-sensitive always block. In both cases, we’re not saying “first read the input, then write the output,” but rather declaring the existence of a synchronous register that updates its state at each clock edge. The syntax may vary, but the semantics are identical: the result is not an algorithm, but rather a hardware component that operates continuously, in parallel with the rest of the circuit.

This approach makes concepts like pipelines and structural parallelism natural. Adding a pipeline stage doesn’t mean adding an extra instruction, but rather introducing a new register that breaks up a combinational path, reducing propagation time and increasing the maximum achievable frequency at the cost of one additional clock cycle of latency. Similarly, describing two blocks in parallel doesn’t require special constructs: it’s enough to define both, so that they exist and operate simultaneously in the FPGA. Understanding the role of HDLs as structural description tools is therefore essential to approaching FPGA design with the right mental model. Only then does it become clear why writing HDL “as if it were C” leads to incorrect or inefficient results, and why true expertise lies not in the language’s syntax, but in the ability to design synchronous digital architectures consistent with the device’s resources and physical constraints.
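The effect of an extra pipeline register on latency and throughput can be shown with a small software model. The Python sketch below is not HDL and the function name is hypothetical; it models only the registers of a pipeline (the logic between them is omitted) to show how each added stage delays the output by one clock cycle while the circuit still accepts one new sample per cycle.

```python
def run_pipeline(samples, stages):
    """Push samples through `stages` registers; one new sample per clock."""
    regs = [None] * stages          # pipeline registers, initially empty
    outputs = []
    for sample in samples:
        outputs.append(regs[-1])    # value visible at the output this cycle
        regs = [sample] + regs[:-1] # clock edge: every register shifts at once
    return outputs


stream = [10, 20, 30, 40, 50]
print(run_pipeline(stream, 1))   # [None, 10, 20, 30, 40]
print(run_pipeline(stream, 2))   # [None, None, 10, 20, 30]
```

Once the pipeline is full, a result leaves on every cycle in both cases; what changes is how many cycles each sample spends inside the circuit.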

Hardware description levels in HDL

In HDL, the same circuit can be described at different levels of abstraction. Historically (and conceptually), three levels can be distinguished, although in everyday practice, two are predominantly used.

Behavioral (algorithmic) description

It’s the highest level.

This describes what the circuit is supposed to do, not how it is built internally.
The designer specifies the functional behavior, leaving the synthesizer to decide what hardware to use to implement it.

Conceptual examples:

“The output is the AND of two inputs.”
“Add two numbers and record the result at each clock cycle.”
“If a condition is true, update a register.”

At this level:

there’s no mention of LUTs, flip-flops, or gates.
the physical structure isn’t specified.
we work with logical and temporal relationships.

The synthesizer:

analyzes behavior
translates it into combinatorial logic + registers
decides how to map everything to the available resources

This is the closest level to “I describe the functionality and the tool decides how to implement it”.

RTL description (Register Transfer Level)

It is the central level and is the most important in practice.

Here we do not go down to “elementary gates”, but we describe:

how data moves between registers
what combinational logic exists between one register and another
when (on which clock edge) updates occur

RTL literally means:

data transfer between registers under clock control

At this level:

we think in terms of registers, pipelines, and FSMs.
the combinatorial/sequential distinction is explicit.
the timing behavior is clear and controllable.

It’s the balance point:

abstract enough to be productive
concrete enough to control timing, area, and architecture
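The RTL way of thinking (registers, plus the combinational logic between them, plus a clock) can be mimicked in software. The following Python sketch models a rising-edge detector; it is an illustration only, with a hypothetical function name, in which each loop iteration stands for one clock cycle.

```python
def rising_edge_detector(samples):
    """RTL-style model: one register plus combinational logic around it."""
    previous = 0                 # register holding last cycle's input
    pulses = []
    for current in samples:      # one loop iteration = one clock cycle
        # Combinational logic between the register and the output.
        pulses.append(int(current == 1 and previous == 0))
        previous = current       # register transfer on the clock edge
    return pulses


print(rising_edge_detector([0, 1, 1, 0, 1, 0]))   # [0, 1, 0, 0, 1, 0]
```

The description names exactly one register and one piece of combinational logic, which is the level of detail RTL asks the designer to make explicit.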

Structural description / gate-level

It is the lowest level.

Here the hardware is described explicitly as a network of elementary components:

logic gates
multiplexers
individually instantiated flip-flops
explicit connections

This is the level at which you describe exactly how a function, such as an AND gate, is built, primitive by primitive.

Historically it was important:

before modern synthesizers
for ASIC standard-cell design
for post-synthesis simulation models

Nowadays:

it is rarely written by hand.
it is automatically generated by tools (netlist).
it is used primarily for verification or debugging.

The important point, especially from a conceptual point of view, is that behavioral → RTL → structural are not different languages, but different levels of interpretation of the same HDL:

by writing behavioral HDL, you ask the tool to derive RTL and structure;
by writing RTL, you directly control the resulting architecture;
the structural description is the final result of the synthesis process.

In the case of the AND gate:

at the behavioral level: “output = input 1 AND input 2”
at the RTL level: a combinatorial network between registers (if present)
at the structural level: a LUT configured as AND + internal routing

All three descriptions lead to the same hardware, but with different degrees of control.
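The step from a behavioral description to a configured LUT can be imitated with a toy “synthesis” function. This Python sketch is an assumption-laden illustration, not how a real synthesizer works: it simply evaluates a behavioral function on every input combination and packs the resulting truth table into a LUT configuration word, with the first input as the least significant address bit.

```python
from itertools import product


def lut_init_from_function(func, num_inputs):
    """Evaluate `func` on every input combination and pack the truth table."""
    init = 0
    for address, bits in enumerate(product((0, 1), repeat=num_inputs)):
        # product() varies the last element fastest, so reverse the tuple to
        # make inputs[0] the least significant address bit.
        inputs = bits[::-1]
        init |= func(*inputs) << address
    return init


def lut_lookup(init, *inputs):
    """Structural view: read the configured table at the address given by the inputs."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> address) & 1


and_init = lut_init_from_function(lambda a, b: a & b, 2)
print(bin(and_init))   # 0b1000
```

The behavioral lambda and the configured table give the same output for every input combination; they are two descriptions of the same function at different levels.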

For an embedded designer coming from software, the important message is this: in HDL you don’t choose whether to describe behavior or structure, you choose how much control you want to exert on the final hardware:

the higher the level → the more freedom the tool offers.
the lower the level → the more control, but the more complexity.

And true skill is not “knowing the syntax”, but knowing:

when to stay at the behavioral level
when to go down to RTL
and when there’s no need to go any lower

This choice of level ties in directly with:

timing
area vs. performance
pipelining
DSP on FPGA

In essence, an HDL allows you to describe the same circuit at different levels of abstraction.

Why iCE40UP5K + open toolchain is a smart choice

To put the concepts introduced so far into practice, it’s useful to refer to a real, accessible, and technically significant platform. This series will use the iCESugar v1.5 board, based on the Lattice iCE40UP5K FPGA, a device that offers a good compromise between simplicity, available resources, and an open ecosystem. The iCE40UP5K belongs to the UltraPlus family and integrates roughly 5,000 LUTs and flip-flops, internal memories that can be used as buffers or FIFOs, and a set of DSP blocks dedicated to arithmetic operations, making it possible to implement processing pipelines, digital filters, and synchronous control logic without resorting to external solutions. All the theoretical elements described in the previous paragraphs (combinational logic, sequential logic, registers, programmable routing, timing, and critical paths) are directly reflected in this FPGA: the LUTs implement the Boolean functions, the flip-flops store the state and update it on each clock edge, the DSP blocks accelerate the mathematical operations, and the internal interconnection network allows these elements to be connected into more complex architectures.

Architecturally, the iCE40UP5K is small enough to be understandable and large enough not to be limiting for realistic educational projects. It is a device designed for low-power embedded applications, where timing control, parallelization, and custom logic integration are more important than general-purpose computing power. This makes it particularly suitable as a first serious introduction to FPGA design: the resources are limited enough to force architectural and optimization thinking, but rich enough to allow for meaningful examples such as audio PWM generators, basic FIR filters, state machines, and custom communication interfaces.

The iCESugar board completes the picture by providing a practical, immediately usable framework. In addition to the FPGA, it integrates a USB programming and debugging system (iCELink), onboard LEDs and switches useful for initial experiments, and expansion connectors that allow for external signaling and the connection of additional hardware. This allows you to focus on the conceptual aspects of the design—circuit structure, timing, logic organization—without having to immediately worry about supporting circuitry or external programmers. In other words, the iCESugar allows you to directly observe how an HDL description is synthesized, mapped, and transformed into a functioning physical circuit, closing the loop between digital design theory and practice on real hardware.

Alongside the FPGA itself, the iCESugar v1.5 board integrates a series of components designed to make project experimentation and validation immediately accessible. The first significant element is the USB programming and communication system, based on a dedicated microcontroller (iCELink), which exposes a CMSIS-DAP interface, a virtual serial port, and a mass storage device to the host system. This allows you to program the FPGA, monitor it, and exchange data with it without the need for external programmers or complex wiring. Conceptually, this component plays a role similar to that of a debugger or bootloader in the world of microcontrollers, but it is completely separate from the logic implemented in the FPGA, which remains entirely under the designer’s control.

The following image shows the iCESugar v1.5 board:

The board also provides onboard input and output devices, such as LEDs and switches, which allow you to quickly verify a circuit’s behavior without additional hardware. These seemingly trivial elements are extremely useful in the initial stages: they allow you to directly observe the effect of an HDL description, test the correct functioning of the clock and sequential logic, and validate the initial interactions between combinatorial logic and internal state. In this sense, the board becomes a learning tool as well as a development tool, because it makes otherwise abstract concepts visible, such as the change of state on each clock edge or the immediate response of combinatorial logic to inputs.
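
The difference between these two behaviors can be sketched in plain Python (a software model, not HDL). The class name `Blinker`, the 3-bit counter width, and the choice of driving the LED from the top counter bit gated by an `enable` input are all hypothetical, picked only to keep the example small.

```python
class Blinker:
    """Toy model: a counter register plus a combinational LED output."""

    def __init__(self, width: int = 3) -> None:
        self.width = width
        self.count = 0  # register contents; changes only in clock_edge()

    def led(self, enable: int) -> int:
        # Combinational: the result follows the current inputs and state
        # immediately, with no clock involved.
        msb = (self.count >> (self.width - 1)) & 1
        return msb & enable

    def clock_edge(self) -> None:
        # Sequential: the state is updated once per clock edge,
        # wrapping around at 2**width.
        self.count = (self.count + 1) % (1 << self.width)


b = Blinker()
trace = []
for _ in range(8):
    trace.append(b.led(enable=1))  # sample the LED, then advance the clock
    b.clock_edge()
print(trace)
```

Calling `led()` repeatedly without `clock_edge()` always returns the same value for the same `enable`, while changing `enable` changes the output at once: this is the same separation between stored state and combinational response that the LEDs and switches make visible on the board.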

For external expansion, the iCESugar exposes numerous I/O pins through PMOD-compatible connectors and generic headers. These signals are connected directly to the FPGA pins and allow interfacing with external devices such as sensors, converters, displays, or communication modules. From a design perspective, this means you’re not limited to “closed” examples on the board, but can progressively extend the system while maintaining full control over the architecture and interface timing. The presence of power supplies already available on the connectors further simplifies initial experimentation, reducing the need for auxiliary circuitry.

Overall, the iCESugar can be viewed as a compact reference platform: simple enough to not obscure the internal workings of the FPGA, yet comprehensive enough to allow for realistic designs. All the elements described (programming, basic I/O, expansion) serve a specific purpose: to allow observation and understanding of the path from an HDL description to verifiable hardware behavior, without introducing unnecessary complexity in the early learning stages.

The following image shows the pinout and ports of the iCESugar v1.5 board:

The toolchain

Transforming an HDL description into a functioning circuit on an FPGA requires a set of software tools commonly referred to as a toolchain. Unlike the world of microcontrollers, where a monolithic integrated environment often exists, in the case of FPGAs the toolchain more explicitly reflects the different phases of the design flow: code analysis, logic synthesis, mapping to physical resources, placement, routing, and generation of the final bitstream. In this series, we will use a completely open-source toolchain, composed of separate but well-integrated tools, each responsible for a specific part of the process. This choice makes the intermediate steps visible and allows us to understand what happens between writing the HDL code and configuring the hardware.

The first element of the flow is Yosys, which handles logic synthesis. Yosys analyzes the HDL code and translates it into an internal representation of the circuit, eliminating non-synthesizable constructs and transforming the behavioral or RTL descriptions into a netlist of combinatorial logic and registers. It is at this stage that the HDL language definitively loses all “software” ambiguity and is traced back to concrete hardware elements. Yosys does not decide where these elements will be physically located, but rather establishes what exists in the circuit: how many logic functions, how many registers, which memories, and what abstract connections between them.

The next step is handled by nextpnr, which performs the place and route phases. Starting from the netlist generated by synthesis, nextpnr maps the circuit to the FPGA’s actual resources, decides where to place each logic block, and determines how to route signals through the internal interconnection network. This is where timing, critical paths, and clock constraints come into play: nextpnr verifies whether the resulting implementation can meet the timing requirements and reports the results of its timing analysis. This tool makes explicit an aspect often hidden in proprietary tools: the direct link between design structure and achievable timing performance.
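
At its core, the timing check is simple arithmetic: the slowest register-to-register path must fit within one clock period. The sketch below is a simplified model (it ignores setup time, clock skew, and I/O delays), the function name `timing_report` is hypothetical, and the 12 MHz clock and 35 ns critical path are made-up numbers, not measurements from the iCE40UP5K.

```python
def timing_report(clock_mhz: float, critical_path_ns: float) -> dict:
    """Compare a target clock against the delay of the slowest path."""
    period_ns = 1000.0 / clock_mhz          # time available per clock cycle
    slack_ns = period_ns - critical_path_ns  # positive slack: the path fits
    fmax_mhz = 1000.0 / critical_path_ns     # highest clock this path allows
    return {
        "period_ns": period_ns,
        "slack_ns": slack_ns,
        "fmax_mhz": fmax_mhz,
        "met": slack_ns >= 0,
    }


report = timing_report(clock_mhz=12.0, critical_path_ns=35.0)
for key, value in report.items():
    print(key, value)
```

With the same critical path, raising the target clock shrinks the period until the slack becomes negative; at that point the usual remedy is architectural, for example splitting the path with a pipeline register.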

Once place and route is complete, the circuit must be transformed into a form that the FPGA can load. This task is entrusted to the tools of the IceStorm suite, which generate the bitstream: the file containing all the information needed to configure the FPGA, including LUT contents, flip-flop configuration, RAM and DSP block settings, and the state of the programmable interconnects. The bitstream is not an executable program, but a complete snapshot of the desired hardware configuration. Loading it into the FPGA essentially means “rewiring” the chip according to the described design.
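
The three stages described so far can be summarized as a command sequence. The sketch below only builds and prints the commands, it does not run them. The function name `build_commands`, the file names (`top.v`, `top.pcf`, and the generated files), the top module name `top`, and the `sg48` package are assumptions for illustration; check them against your own project and board documentation.

```python
import shlex
from typing import List


def build_commands(top: str = "top", package: str = "sg48") -> List[List[str]]:
    """Return the synthesis, place and route, and bitstream commands in order."""
    return [
        # 1. Yosys: synthesize the Verilog source into a JSON netlist for iCE40.
        ["yosys", "-p", f"synth_ice40 -top {top} -json {top}.json", f"{top}.v"],
        # 2. nextpnr: place and route for the UP5K, using pin constraints from a .pcf file.
        ["nextpnr-ice40", "--up5k", "--package", package,
         "--json", f"{top}.json", "--pcf", f"{top}.pcf", "--asc", f"{top}.asc"],
        # 3. IceStorm: pack the textual configuration into a binary bitstream.
        ["icepack", f"{top}.asc", f"{top}.bin"],
    ]


for cmd in build_commands():
    print(shlex.join(cmd))
```

To actually execute the flow, each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed; keeping the commands as data makes it easy to see that every stage consumes the file produced by the previous one.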

Alongside the synthesis and implementation workflow, a modern toolchain also includes simulation tools, which are essential for verifying the circuit’s behavior before transferring it to real hardware. Simulation allows you to observe internal signals, verify logical correctness, and analyze timing behavior without relying on the physical hardware. This step is particularly important in the FPGA world because it allows you to isolate conceptual or architectural errors before addressing timing or integration issues.

The most interesting aspect of this open-source toolchain is its consistency with the conceptual model introduced so far. Each tool corresponds to a specific phase of the design flow, and no step is hidden or opaquely automated. This makes the process more transparent and, paradoxically, easier to understand: instead of relying on a closed environment that produces a final result, the designer can observe how an HDL description is progressively transformed into real hardware. It is precisely this transparency that makes the combination of the iCE40UP5K and the open-source toolchain particularly suitable for a serious introductory path, aimed not just at “making something work,” but at understanding why and how it works.

Prerequisites and approach

What has been described so far makes it clear that FPGA design requires a different approach than traditional embedded development. Advanced digital electronics skills aren’t required, but a hardware-oriented mindset is essential: concepts like clock, reset, synchronization, latency, and parallelism must be considered an integral part of the project, not implementation details. Similarly, some familiarity with the Linux environment and the terminal is required, as the FPGA development workflow relies primarily on command-line tools that produce text output and technical reports requiring interpretation. This series isn’t intended to replicate a classic Arduino-style “blink,” but to guide advanced makers and embedded designers toward a concrete and usable understanding of FPGAs, starting with simple but conceptually sound examples.

Another fundamental prerequisite is the availability of a consistent and reproducible Linux development environment. The choice of Ubuntu (or Kubuntu) as the reference platform is not dictated by personal preference, but by practical considerations: the open-source FPGA toolchain is natively supported on Linux and allows for stable and predictable work. Attempting to replicate the same flow on other operating systems introduces unnecessary complexity and compatibility issues that distract from the truly important aspects of the design. For this reason, in the following articles we will refer to an Ubuntu 24.04 LTS environment, usable both natively and within a virtual machine. This solution provides a uniform working context and makes it possible to follow the series without depending on the specific configuration of the host system. If you don’t have a native Linux environment, you can install Ubuntu/Kubuntu on a VirtualBox virtual machine by following the instructions in the article “How to Install Ubuntu in a Virtual Machine with VirtualBox on Windows and Linux”.

In the next article, we’ll move from theory to practice: we’ll show you how to set up the development environment on Ubuntu, install the open-source toolchain, and verify the correct functioning of the iCESugar board with a first minimal project. The goal won’t be just to “blink an LED,” but to begin to concretely observe the path from an HDL description to real hardware behavior, closing the loop between the concepts introduced and their practical application.
