C-Class Core Generator beta-2.0.0 documentation

C-Class Core Generator¶

This repository contains the open-source C-Class core generator. C-class belongs to the SHAKTI family of processors.

Introduction¶

What is C-Class¶

C-Class is a member of the SHAKTI family of processors. It is an extremely configurable, commercial-grade, 5-stage in-order core supporting the standard RV64GCSUN ISA extensions. The core generator in this repository can configure the core to generate a wide variety of design instances from the same high-level source code. These design instances can serve domains ranging from embedded systems, motor control, IoT, storage and industrial applications all the way to low-cost, high-performance Linux-based applications such as networking and gateways.

There have been multiple successful silicon prototypes of different instances of the C-Class, which demonstrates its versatility. The extreme parameterization of the design, combined with the use of an HLS like Bluespec, makes it easy to add new features and design points on a continual basis.

Why Bluespec¶

The entire core is implemented in Bluespec System Verilog (BSV), an open-source high-level hardware description language. Apart from guaranteeing synthesizable circuits, BSV also gives you a high-level abstraction, like going from assembly [level programming] to C. You don’t do the dirty work; the compiler does it for you. This lets users work at a much higher level, thereby increasing throughput.

The language is now supported by an open-source Bluespec compiler, which can generate synthesizable Verilog compatible with FPGA and ASIC targets.

License¶

All of the source code available in this repository is under the BSD license. Please refer to LICENSE.* files for more details.

Commercial Adoption¶

The following industrial partners have adopted SHAKTI for commercialization purposes and provide continuous support in maintaining this repository.

  1. InCore Semiconductors Pvt. Ltd.
  2. Silint Consulting Pvt. Ltd.

Quick Start¶

For this quick-start you will need the following tools. The user is requested to install these from the respective repositories/sources:

  • Bluespec Compiler: make sure you are using a version released after April 26, 2020
  • Verilator
  • RISC-V GNU ToolChain
  • Modified RISC-V ISA Sim
  • RISC-V OpenOCD
  • DTC 1.4.7: see Install DTC (device tree compiler)
  • Python 3.7.0: see Install Python Dependencies

Warning

The following few sections are a quick copy-paste of the steps to install the above tools. However, these steps may be outdated, as a repository may have moved or its master branch may have moved forward with new dependencies or installation procedures. We therefore suggest referring to the original repositories of the above tools to install them.

If you already have the above tools installed, you can jump directly to Building the Core.
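
To see quickly which of the tools listed above are already available, a small script along the following lines can help. This is only a sketch: the executable names in REQUIRED_TOOLS are assumptions (they are not given in this document) and may differ on your system, for example if your toolchain uses a different prefix.

```python
import shutil
import sys

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["bsc", "verilator", "dtc", "spike", "openocd",
                  "riscv64-unknown-elf-gcc"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that `which` cannot find on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=sys.version_info, minimum=(3, 7)):
    """Return True if the interpreter version is at least `minimum`."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    print("python >= 3.7:", python_ok())
    for tool in missing_tools():
        print("not found on PATH:", tool)
```

The script only checks that an executable exists on PATH; it does not verify tool versions such as DTC 1.4.7.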

Install Python Dependencies¶

The core generator requires pip and python (>=3.7) to be available on your system. If you have issues installing either of these directly on your system, we suggest using a virtual environment manager like pyenv to make things easy.

First, install the required libraries/dependencies:

$ sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev libffi-dev liblzma-dev python-openssl git

Next, install pyenv:

$ curl -L https://raw.githubusercontent.com/yyuu/pyenv-installer/master/bin/pyenv-installer | bash

Add the following to your .bashrc with appropriate changes to username:

export PATH="/home/<username>/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

Open a new terminal and create a new python virtual environment:

$ pyenv install 3.7.0
$ pyenv virtualenv 3.7.0 myenv

Now you can activate this environment in any other terminal:

pyenv activate myenv
python --version

Install DTC (device tree compiler)¶

We use DTC 1.4.7 to generate the device tree string in the boot files. To install DTC, follow the commands below:

sudo wget https://git.kernel.org/pub/scm/utils/dtc/dtc.git/snapshot/dtc-1.4.7.tar.gz
sudo tar -xvzf dtc-1.4.7.tar.gz
cd dtc-1.4.7/
sudo make NO_PYTHON=1 PREFIX=/usr/
sudo make install NO_PYTHON=1 PREFIX=/usr/

Building the Core¶

The code is hosted on GitLab and can be checked out using the following command:

$ git clone https://gitlab.com/shaktiproject/cores/c-class.git

If you are cloning the c-class repo for the first time it would be best to install the dependencies first:

$ cd c-class/
$ pyenv activate myenv # ignore this if you are not using pyenv
$ pip install -U -r c-class/requirements.txt

The C-class core generator takes a specific YAML format as input. It performs checks to validate that the user has entered valid data and that none of the parameters conflict with each other. For example, mentioning the ‘D’ extension without the ‘F’ will be flagged by the generator as an invalid spec. More information on the exact parameters and the constraints on each field is discussed here.
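
The ‘D’ without ‘F’ conflict mentioned above can be sketched as a standalone check. This is an illustration only, not the generator's actual validation code: the function names are hypothetical, and the parsing is simplified to the single-letter extensions that directly follow the RV32/RV64 prefix.

```python
import re


def single_letter_extensions(isa):
    """Return the single-letter extensions of an ISA string such as RV64IMAFDC.

    Simplification: parsing stops at the first Z, X or underscore, so
    multi-letter extensions are ignored.
    """
    match = re.match(r"RV(32|64)([A-WY]*)", isa.upper())
    if match is None:
        raise ValueError(f"ISA string must start with RV32 or RV64: {isa}")
    return set(match.group(2))


def check_isa(isa):
    """Return a list of conflicts found in `isa`; an empty list means none."""
    extensions = single_letter_extensions(isa)
    errors = []
    if "D" in extensions and "F" not in extensions:
        errors.append("'D' requires 'F'")
    return errors


if __name__ == "__main__":
    for isa in ("RV64IMAFDC", "RV64IMADC"):
        print(isa, check_isa(isa) or "ok")
```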

Once the input YAML has been validated, the generator then clones all the dependent repositories which enable building a test-soc, simulating it and performing verification of the core. This is an alternative to maintaining the repositories as submodules, which typically pollutes the commit history with bump commits.

At the end, the generator outputs a single makefile.inc in the folder from which it was run. This file contains the paths where the relevant bluespec files are present, the bsc command with macro definitions, the verilator simulation commands, etc.

A sample input YAML (default.yaml) is available in the sample_config directory of the repository.

To build the core with a sample test-soc using the default config, do the following:

$ python -m configure.main -ispec sample_config/default.yaml

The above step generates a makefile.inc file in the same folder and also clones the other dependent repositories needed to build a test-soc and carry out verification. This should generate a log similar to:

[INFO]    : ************ C-Class Core Generator ************
[INFO]    :            Available under BSD License


[INFO]    : [update] Cloning caches_mmu ...
              ...
              ...
              ...
[INFO]    : Loading input file: ..../sample_config/default.yaml
[INFO]    : Load Schema configure/schema.yaml
[INFO]    : Initiating Validation
[INFO]    : No Syntax errors in Input Yaml.
[INFO]    : Performing Specific Checks
[INFO]    : Generating BSC compile options
[INFO]    : makefile.inc generated

To compile the bluespec source and generate verilog:

$ make

This should generate the following folders:

  1. verilog: contains the verilog files generated by bsc
  2. bsv_build: contains all the intermediate and information files generated by bsc
  3. bin: contains the final verilated executable (out), which is used for simulation, along with some boot and application hex files.

Note

To leverage parallel builds you can do the following:

make -j<jobs> generate_verilog; make generate_boot_files link_verilator

Run Smoke Tests¶

You can run individual riscv-tests on the generated verilog of the test-soc using the following:

$ make test opts='--test=add --suite=rv64ui ' CONFIG_ISA=RV64IMAFDC

You can run the entire riscv-tests suite as a regression using the following:

$ make regress opts='--filter=rv64 --parallel=20 --sub' CONFIG_ISA=RV64IMAFDC
$ make regress opts='--filter=rv64 --final'

The last command, after some delay, should present the following output:

  recoding                                   rv64uf     v    PASSED
       slt                                   rv64ui     p    PASSED
      fadd                                   rv64uf     v    PASSED
       and                                   rv64ui     p    PASSED
    fcvt_w                                   rv64uf     v    PASSED
  amoadd_d                                   rv64ua     p    PASSED
     fmadd                                   rv64ud     p    PASSED
      ldst                                   rv64uf     v    PASSED
  amoand_d                                   rv64ua     p    PASSED
      fmin                                   rv64ud     p    PASSED
        lh                                   rv64ui     v    PASSED
 amomaxu_w                                   rv64ua     v    PASSED
  amoand_w                                   rv64ua     p    PASSED
  amoxor_d                                   rv64ua     v    PASSED
   fence_i                                   rv64ui     v    PASSED
       bne                                   rv64ui     p    PASSED
  amomin_d                                   rv64ua     v    PASSED
    fcvt_w                                   rv64uf     p    PASSED
      srli                                   rv64ui     p    PASSED
        sw                                   rv64ui     v    PASSED
 amomaxu_d                                   rv64ua     v    PASSED
      lrsc                                   rv64ua     v    PASSED
     fmadd                                   rv64ud     v    PASSED
       blt                                   rv64ui     v    PASSED
      fadd                                   rv64ud     p    PASSED
  recoding                                   rv64uf     p    PASSED
        sh                                   rv64ui     v    PASSED
       ori                                   rv64ui     p    PASSED
      fdiv                                   rv64uf     v    PASSED
   ma_addr                                   rv64mi     p    PASSED
  recoding                                   rv64ud     p    PASSED
       add                                   rv64ui     p    PASSED
       blt                                   rv64ui     p    PASSED
    fcvt_w                                   rv64ud     p    PASSED
      bltu                                   rv64ui     v    PASSED
       sll                                   rv64ui     v    PASSED
  ma_fetch                                   rv64mi     p    PASSED
       jal                                   rv64ui     p    PASSED
       lwu                                   rv64ui     p    PASSED
        sd                                   rv64ui     v    PASSED
       ori                                   rv64ui     v    PASSED
    access                                   rv64mi     p    PASSED
        sw                                   rv64ui     p    PASSED
       srl                                   rv64ui     p    PASSED
      fcvt                                   rv64ud     v    PASSED
     fmadd                                   rv64uf     v    PASSED
  amoxor_w                                   rv64ua     v    PASSED
        sb                                   rv64ui     v    PASSED
     slliw                                   rv64ui     p    PASSED
  amoadd_d                                   rv64ua     v    PASSED
      fdiv                                   rv64ud     p    PASSED
        lw                                   rv64ui     v    PASSED
      slti                                   rv64ui     p    PASSED
       add                                   rv64ui     v    PASSED
  amomax_d                                   rv64ua     v    PASSED
      move                                   rv64ud     v    PASSED
       lhu                                   rv64ui     v    PASSED
      andi                                   rv64ui     p    PASSED
     addiw                                   rv64ui     v    PASSED
 amoswap_d                                   rv64ua     v    PASSED
      fdiv                                   rv64ud     v    PASSED
       lui                                   rv64ui     p    PASSED
      ldst                                   rv64uf     p    PASSED
      fmin                                   rv64uf     v    PASSED
  amoxor_w                                   rv64ua     p    PASSED
      srai                                   rv64ui     p    PASSED
      addi                                   rv64ui     p    PASSED
      subw                                   rv64ui     p    PASSED
        sd                                   rv64ui     p    PASSED
  amoand_d                                   rv64ua     v    PASSED
       sra                                   rv64ui     p    PASSED
       rvc                                   rv64uc     v    PASSED
     scall                                   rv64mi     p    PASSED
       beq                                   rv64ui     p    PASSED
       rvc                                   rv64uc     p    PASSED
      fmin                                   rv64ud     v    PASSED
  amoadd_w                                   rv64ua     p    PASSED
     scall                                   rv64si     p    PASSED
      fcmp                                   rv64uf     p    PASSED
     srliw                                   rv64ui     p    PASSED
     addiw                                   rv64ui     p    PASSED
  amomax_w                                   rv64ua     p    PASSED
      andi                                   rv64ui     v    PASSED
      addi                                   rv64ui     v    PASSED
       lhu                                   rv64ui     p    PASSED
       xor                                   rv64ui     p    PASSED
   amoor_w                                   rv64ua     p    PASSED
       and                                   rv64ui     v    PASSED
       lbu                                   rv64ui     v    PASSED
     dirty                                   rv64si     p    PASSED
      ldst                                   rv64ud     v    PASSED
       bge                                   rv64ui     p    PASSED
   amoor_w                                   rv64ua     v    PASSED
        sh                                   rv64ui     p    PASSED
 amoswap_w                                   rv64ua     p    PASSED
  amoxor_d                                   rv64ua     p    PASSED
      fadd                                   rv64uf     p    PASSED
       sll                                   rv64ui     p    PASSED
  amoand_w                                   rv64ua     v    PASSED
  ma_fetch                                   rv64si     p    PASSED
     sraiw                                   rv64ui     p    PASSED
       csr                                   rv64si     p    PASSED
      ldst                                   rv64ud     p    PASSED
 amoswap_w                                   rv64ua     v    PASSED
      bltu                                   rv64ui     p    PASSED
        ld                                   rv64ui     v    PASSED
      fmin                                   rv64uf     p    PASSED
      slli                                   rv64ui     v    PASSED
      fadd                                   rv64ud     v    PASSED
      addw                                   rv64ui     v    PASSED
        lb                                   rv64ui     p    PASSED
 amominu_d                                   rv64ua     p    PASSED
    fcvt_w                                   rv64ud     v    PASSED
      move                                   rv64uf     p    PASSED
       bge                                   rv64ui     v    PASSED
        or                                   rv64ui     p    PASSED
      srlw                                   rv64ui     p    PASSED
      xori                                   rv64ui     p    PASSED
structural                                   rv64ud     v    PASSED
      sllw                                   rv64ui     p    PASSED
  amomax_d                                   rv64ua     p    PASSED
      fcvt                                   rv64uf     p    PASSED
   amoor_d                                   rv64ua     p    PASSED
 amomaxu_d                                   rv64ua     p    PASSED
      fdiv                                   rv64uf     p    PASSED
        sb                                   rv64ui     p    PASSED
       jal                                   rv64ui     v    PASSED
      addw                                   rv64ui     p    PASSED
 amomaxu_w                                   rv64ua     p    PASSED
     auipc                                   rv64ui     p    PASSED
       bne                                   rv64ui     v    PASSED
 amoswap_d                                   rv64ua     p    PASSED
        lw                                   rv64ui     p    PASSED
      bgeu                                   rv64ui     v    PASSED
  recoding                                   rv64ud     v    PASSED
    simple                                   rv64ui     p    PASSED
        or                                   rv64ui     v    PASSED
       lbu                                   rv64ui     p    PASSED
  amomax_w                                   rv64ua     v    PASSED
      move                                   rv64ud     p    PASSED
    fclass                                   rv64uf     p    PASSED
      jalr                                   rv64ui     p    PASSED
    fclass                                   rv64ud     v    PASSED
     sltiu                                   rv64ui     p    PASSED
      fcmp                                   rv64ud     p    PASSED
      sltu                                   rv64ui     p    PASSED
structural                                   rv64ud     p    PASSED
        lb                                   rv64ui     v    PASSED
      fcvt                                   rv64uf     v    PASSED
  amomin_d                                   rv64ua     p    PASSED
       sub                                   rv64ui     p    PASSED
       wfi                                   rv64si     p    PASSED
        ld                                   rv64ui     p    PASSED
   amoor_d                                   rv64ua     v    PASSED
      fcvt                                   rv64ud     p    PASSED
      lrsc                                   rv64ua     p    PASSED
    fclass                                   rv64uf     v    PASSED
    fclass                                   rv64ud     p    PASSED
      sraw                                   rv64ui     p    PASSED
  amomin_w                                   rv64ua     v    PASSED
      bgeu                                   rv64ui     p    PASSED
      move                                   rv64uf     v    PASSED
  amoadd_w                                   rv64ua     v    PASSED
   fence_i                                   rv64ui     p    PASSED
        lh                                   rv64ui     p    PASSED
       csr                                   rv64mi     p    PASSED
    simple                                   rv64ui     v    PASSED
       lui                                   rv64ui     v    PASSED
       lwu                                   rv64ui     v    PASSED
      fcmp                                   rv64ud     v    PASSED
       beq                                   rv64ui     v    PASSED
     auipc                                   rv64ui     v    PASSED
 amominu_w                                   rv64ua     p    PASSED
     fmadd                                   rv64uf     p    PASSED
 amominu_w                                   rv64ua     v    PASSED
  amomin_w                                   rv64ua     p    PASSED
      fcmp                                   rv64uf     v    PASSED
      jalr                                   rv64ui     v    PASSED
      slli                                   rv64ui     p    PASSED
 amominu_d                                   rv64ua     v    PASSED
       div                                   rv64um     p    PASSED
       mul                                   rv64um     p    PASSED
     remuw                                   rv64um     p    PASSED
      divw                                   rv64um     p    PASSED
      remw                                   rv64um     p    PASSED
     mulhu                                   rv64um     p    PASSED
      mulw                                   rv64um     p    PASSED
       rem                                   rv64um     p    PASSED
      remu                                   rv64um     p    PASSED
      mulh                                   rv64um     p    PASSED
     divuw                                   rv64um     p    PASSED
    mulhsu                                   rv64um     p    PASSED
      divu                                   rv64um     p    PASSED
      divu                                   rv64um     v    PASSED
     sltiu                                   rv64ui     v    PASSED
       xor                                   rv64ui     v    PASSED
      subw                                   rv64ui     v    PASSED
      mulw                                   rv64um     v    PASSED
      srli                                   rv64ui     v    PASSED
     slliw                                   rv64ui     v    PASSED
       div                                   rv64um     v    PASSED
       sub                                   rv64ui     v    PASSED
      srlw                                   rv64ui     v    PASSED
      sltu                                   rv64ui     v    PASSED
      xori                                   rv64ui     v    PASSED
      remw                                   rv64um     v    PASSED
       mul                                   rv64um     v    PASSED
       slt                                   rv64ui     v    PASSED
       sra                                   rv64ui     v    PASSED
      divw                                   rv64um     v    PASSED
      srai                                   rv64ui     v    PASSED
     mulhu                                   rv64um     v    PASSED
     remuw                                   rv64um     v    PASSED
       srl                                   rv64ui     v    PASSED
       rem                                   rv64um     v    PASSED
    mulhsu                                   rv64um     v    PASSED
      slti                                   rv64ui     v    PASSED
     srliw                                   rv64ui     v    PASSED
      remu                                   rv64um     v    PASSED
     divuw                                   rv64um     v    PASSED
      sllw                                   rv64ui     v    PASSED
      sraw                                   rv64ui     v    PASSED
      mulh                                   rv64um     v    PASSED
     sraiw                                   rv64ui     v    PASSED
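
With a report this long, it can be easier to summarise the results than to read them line by line. The following is a minimal sketch that assumes each result row has the four whitespace-separated columns shown above (test, suite, mode, status) and treats any status other than PASSED as a non-passing test. The status strings used for non-passing tests are not shown in this document, so none are assumed.

```python
import sys
from collections import Counter


def summarize(log_text):
    """Count result rows per status and collect the tests that did not pass."""
    counts = Counter()
    not_passed = []
    for line in log_text.splitlines():
        fields = line.split()
        # A result row has exactly four columns and a suite such as rv64ui.
        if len(fields) != 4 or not fields[1].startswith("rv"):
            continue
        test, suite, mode, status = fields
        counts[status] += 1
        if status != "PASSED":
            not_passed.append((suite, test, mode))
    return counts, not_passed


if __name__ == "__main__":
    counts, not_passed = summarize(sys.stdin.read())
    print(dict(counts))
    for suite, test, mode in not_passed:
        print(f"did not pass: {suite} {test} ({mode})")
```

Saved under a name of your choice (for example summarize.py, a hypothetical file name), it can read the report from standard input.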

Congratulations - You have built your very first C-Class core !! :)

Configure the Core¶

The C-class core is highly parameterized and configurable. By changing a single configuration, the user can generate core instances ranging in size from embedded micro-controllers to Linux-capable high-performance cores.

ISA Level Configurations¶

In RISC-V, both the Unprivileged and the Privileged specs offer a large number of choices for configuring an implementation. The Unprivileged spec offers various extensions and sub-extensions such as Multiply-Divide, Floating Point, Atomic and Compressed, which a user can choose to implement or not.

The Privileged spec, on the other hand, provides a much larger space of configurability. Apart from choosing which privilege modes to implement (Machine, Hypervisor, Supervisor or User), the spec also provides a huge number of Control and Status Registers (CSRs) which impact various aspects of the RISC-V system. For example, the misa CSR can be used to dynamically enable or disable execution of certain extensions. Similarly, the valid and legal values of the satp.mode field indicate which paging schemes are supported by the underlying implementation.
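
As an illustration of the misa example, the RISC-V Privileged spec assigns bits 0 to 25 of misa to the extension letters ‘A’ to ‘Z’. The sketch below converts between a set of letters and that bit field. The function names are hypothetical and the MXL field in the upper bits of misa is ignored.

```python
def misa_value(letters):
    """Return the misa extension bits for a string of extension letters."""
    mask = 0
    for letter in letters.upper():
        mask |= 1 << (ord(letter) - ord("A"))
    return mask


def misa_extensions(misa):
    """Return the letters whose bits are set in the low 26 bits of misa."""
    return "".join(
        chr(ord("A") + bit) for bit in range(26) if (misa >> bit) & 1
    )


if __name__ == "__main__":
    value = misa_value("IMAFDC")
    print(hex(value), misa_extensions(value))
    # Clearing the 'F' bit models software disabling that extension.
    print(misa_extensions(value & ~misa_value("F")))
```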

To capture all such possible choices of the RISC-V ISA in a single standard format, InCore has proposed the RISCV-CONFIG YAML format, which has also been adopted by the riscv-community, primarily for the ISA compatibility framework. The core generator uses the same YAML inputs to control various ISA level features of the core.

Generating CSRs¶

For implementing the CSR module, C-Class uses the CSR-BOX utility to automatically create a bsv module which implements all the necessary CSRs as per the input YAML specification provided in riscv-config format. An example of the isa YAML is provided in the sample_config directory. CSR-BOX ensures the warl functions specified in the YAML are faithfully replicated in bsv. Along with the CSRs, CSR-BOX also provides methods and logic to handle traps and xRET instructions based on the privileged modes (U, S, H) defined in the ISA node of the input yaml.

Note that CSR-BOX allows one to split the CSRs in a daisy-chain fashion to reduce the impact on timing when instantiating a large number of CSRs. Thus, apart from the isa yaml, CSR-BOX also requires a grouping yaml file which indicates which daisy-chain unit should contain which set of CSRs.

CSR-BOX also takes in an optional debug spec yaml (as defined by riscv-config) to capture basic debug-related information, such as where the parking loop code of the debugger is placed in the memory map. Providing the debug spec also tells CSR-BOX to implement the necessary logic for handling custom debug interrupts like halt, resume and step. The debug CSRs must be defined in the debug spec.

CSR-BOX also allows the user to define custom CSRs that may be required by the implementation. C-Class uses a custom CSR to control the enabling/disabling of caches and branch predictors. The details of this CSR are provided here. An example YAML containing the definition of these CSRs, which can be fed into CSR-BOX, is available in the sample_config directory.

Other Derived Configuration Settings¶

Other than the CSRs, C-Class derives the following parameters from the input isa yaml:

  • The ISA string indicates which extensions are to be enabled in hardware and its associated collaterals.
  • The max value in the supported_xlen node sets the xlen variable in C-Class. This is used to define the width of the integer register file, alu operations, bypass width, virtual address size, etc.
  • The flen variable in C-Class is set based on the presence of the ‘F’ or ‘D’ characters in the ISA string.
  • If the ‘S’ extension is present in the ISA string, C-Class determines the supervisor page translation mode to be implemented from the max legal value of the satp.mode csr field present in the input yaml.
  • The asid length to be used in the implementation is also derived by checking the legal values of the satp.asid csr field.
  • The size of the physical address to be implemented is derived from the physical_addr_sz node of the isa yaml.
  • The number of mhpmcounters (and therefore mhpmevents) and their behavior are also captured from the csrs defined in the input isa yaml.
  • The number of pmp entries and their granularity are also captured from the input isa yaml.
  • Custom interrupts/exceptions and their cause values are also captured from the input isa yaml. The implementation creates an entry in the defines file for each name and cause value. The usage of these custom causes needs to be implemented separately in the bsv code.
  • The max size of the cause field in the mcause csr is also derived by checking for the max cause value in use after accounting for the custom interrupts and exceptions.
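
A few of the derivations listed above can be sketched as follows. This is a hedged illustration, not the generator's code: the function and argument names are hypothetical, the inputs are passed as plain values rather than read from a riscv-config file, the flen mapping (‘D’ gives 64, ‘F’ gives 32) and the cause-field width (bits needed to hold the max cause value) are assumptions, and the satp.mode table covers only the RV64 encodings from the RISC-V Privileged spec.

```python
import re

# satp.mode encodings for RV64 as defined in the RISC-V Privileged spec.
SATP_MODES_RV64 = {0: "bare", 8: "sv39", 9: "sv48", 10: "sv57"}


def derive_settings(isa, supported_xlen, satp_mode_legal, max_cause):
    """Derive xlen, flen, paging mode and cause width from ISA-level inputs."""
    match = re.match(r"RV(32|64)([A-WY]*)", isa.upper())
    if match is None:
        raise ValueError(f"unrecognised ISA string: {isa}")
    letters = set(match.group(2))

    # Assumed mapping: double precision implies a 64-bit FP register width.
    if "D" in letters:
        flen = 64
    elif "F" in letters:
        flen = 32
    else:
        flen = 0

    # The paging mode is only relevant when supervisor mode is present.
    paging = None
    if "S" in letters:
        paging = SATP_MODES_RV64[max(satp_mode_legal)]

    return {
        "xlen": max(supported_xlen),
        "flen": flen,
        "paging": paging,
        # Assumed: enough bits to represent the largest cause value.
        "cause_bits": max(1, max_cause.bit_length()),
    }


if __name__ == "__main__":
    print(derive_settings("RV64IMAFDCSU", [64], [0, 8], 15))
```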

Micro-Architectural Configuration hooks¶

The C-Class core has also defined a custom schema to control various micro-architectural features of the core. A sample configuration file is available in the sample_config directory.

The following provides a list and description of the configuration hooks available at the micro-architectural level. Note, there are also hooks in this configuration which control the bluespec compilation commands and the verilator commands as well.

num_harts¶

Description: Total number of harts to be instantiated in the dummy test-soc. Note that these will be non-coherent cores simply acting as masters on the fast-bus.

Examples:

num_harts: 2

isb_sizes¶

Description: A dictionary controlling the sizes of the inter-stage buffers (isb) of the pipeline. The variable isb_s0s1 controls the size of the isb between stage0 and stage1. Similarly, isb_s1s2 dictates the size of the isb between stage1 and stage2, and so on. By increasing isb_s0s1 and isb_s1s2, one can shadow the stalls or latencies in the backend stages of the pipeline by fetching more instructions into the front-end stages of the pipeline.

There is a restriction, however, that isb_s2s3 should always be 1. This is because the outputs of the register file accessed in stage2 are not buffered, and neither is the bypass scheme implemented to handle this scenario.

One can, however, increase the number of in-flight instructions by increasing the sizes of isb_s3s4 and isb_s4s5 (increasing isb_s3s4 has a larger impact).

Also note that if write-after-write stalls are disabled, the size of the wawid is defined by the sum of isb_s3s4 and isb_s4s5. Therefore, increasing the number of in-flight instructions causes a logarithmic increase in the size of the wawid used for maintaining bypass of operands.

Examples:

isb_sizes :
  isb_s0s1: 2
  isb_s1s2: 2
  isb_s2s3: 1
  isb_s3s4: 2
  isb_s4s5: 2
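
The constraints above can be checked mechanically before a build. The following is a minimal sketch, not part of the C-Class configure scripts: the function name check_isb_sizes is hypothetical, and the exact wawid width formula (ceil(log2) of the in-flight count) is an assumption drawn from the "logarithmic increase" statement above.

```python
def check_isb_sizes(isb_sizes: dict, waw_stalls: bool = False) -> int:
    """Validate an isb_sizes node and return the assumed wawid width in bits."""
    if isb_sizes["isb_s2s3"] != 1:
        raise ValueError("isb_s2s3 must always be 1")
    if waw_stalls:
        # Assumption: no wawid is needed when WAW hazards simply stall.
        return 0
    in_flight = isb_sizes["isb_s3s4"] + isb_sizes["isb_s4s5"]
    # Assumption: the id must be wide enough to number every in-flight slot,
    # i.e. ceil(log2(in_flight)), with a floor of one bit.
    return max(1, (in_flight - 1).bit_length())


sample = {"isb_s0s1": 2, "isb_s1s2": 2, "isb_s2s3": 1, "isb_s3s4": 2, "isb_s4s5": 2}
print(check_isb_sizes(sample))  # 4 in-flight slots need a 2-bit id
```

Under this assumption, doubling isb_s3s4 and isb_s4s5 to 4 each adds only one bit to the id.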
merged_rf¶

Description: Boolean field to indicate if the architectural register files for floating-point and integer should be implemented as a single extended regfile in hw or as separate ones. This field only makes sense when ‘F’ support is enabled in the ISA string of the input isa yaml. Under certain targets like FPGA, or certain technologies, maintaining a single register file might lead to better area and timing.

Examples:

merged_rf: True
total_events¶

Description: This field indicates the total number of events that can be used to program the mhpm counters. This field is used to capture the size of the event signals that drive the counters.

Examples:

total_events: 28
waw_stalls¶

Description: Indicates if stalls must occur on a WAW hazard. If you are looking for higher performance set this to False. Setting this to True would lead to instructions stalling in stage3 due to a WAW hazard.

Setting this to False also means the scoreboard will now allocate a unique id to the destination register of every instruction that is offloaded for execution. The size of this id depends on the number of in-flight instructions after the execution stage, which in turn depends on the sizes of isb_s3s4 and isb_s4s5 as defined above.

Examples:

waw_stalls: False
iepoch_size¶

Description: integer value indicating the size of the epochs for the instruction memory subsystem. Allowed value is 2 only

Examples:

iepoch_size: 2
depoch_size¶

Description: integer value indicating the size of the epochs for the data memory subsystem. Allowed value is 1 only

Examples:

depoch_size: 1
s_extension¶

Description: Describes various supervisor and MMU related parameters. These parameters only take effect when “S” is present in the ISA field.

  • itlb_size: integer indicating the number of entries in the fully-associative Instruction TLB
  • dtlb_size: integer indicating the number of entries in the fully-associative Data TLB

Examples:

s_extension:
  itlb_size: 4
  dtlb_size: 4
a_extension¶

Description: Describes various A-extension related parameters. These params take effect only when the “A” extension is enabled in the riscv_config ISA

  • reservation_size: integer indicating the size of the reservation in bytes. Minimum value is 4 and must be a power of 2. For an RV64 system the minimum should be 8 bytes.

Examples:

a_extension:
  reservation_size: 8
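
As an illustration of the reservation_size rules above, here is a minimal sketch of a validity check. The function name check_reservation_size and its xlen argument are hypothetical and not taken from the C-Class configure scripts.

```python
def check_reservation_size(size: int, xlen: int) -> None:
    """Raise ValueError if reservation_size breaks the documented rules."""
    minimum = 8 if xlen == 64 else 4  # RV64 needs at least 8 bytes
    if size < minimum:
        raise ValueError(f"reservation_size must be at least {minimum} for RV{xlen}")
    if size & (size - 1):
        # A power of two has exactly one bit set, so size & (size - 1) is 0.
        raise ValueError("reservation_size must be a power of 2")


check_reservation_size(8, xlen=64)   # accepted
check_reservation_size(4, xlen=32)   # accepted
```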
m_extension¶

Description: Describes various M-extension related parameters. These parameters take effect only if “M” is present in the ISA field. The multiplier used in the core is a retimed one. The parameters below indicate the number of input and output registers around the combinational block to enable retiming.

  • mul_stages_out: Number of stages to be inserted after the multiplier combinational block. Minimum value is 1.
  • mul_stages_in: Number of stages to be inserted before the multiplier combinational block. Minimum value is 0
  • div_stages: an integer indicating the number of cycles for a single division operation. Max value is limited to the XLEN defined in the ISA.

Examples:

m_extension:
  mul_stages_in  : 2
  mul_stages_out : 2
  div_stages: 32
branch_predictor¶

Description: Describes various branch predictor related parameters.

  • instantiate: boolean value indicating if the predictor needs to be instantiated
  • predictor: string indicating the type of predictor to be implemented. Valid values are: ‘gshare’
  • btb_depth: integer indicating the size of the branch target buffer
  • bht_depth: integer indicating the size of the branch history buffer
  • history_len: integer indicating the size of the global history register
  • history_bits: integer indicating the number of bits used for indexing bht/btb.
  • ras_depth: integer indicating the size of the return address stack.

Examples:

branch_predictor:
  instantiate: True
  predictor: gshare
  btb_depth: 32
  bht_depth: 512
  history_len: 8
  history_bits: 5
  ras_depth: 8
icache_configuration¶

Description: Describes the various instruction cache related features.

  • instantiate: boolean value indicating if the instruction cache needs to be instantiated or not.
  • sets: integer indicating the number of sets in the cache
  • word_size: integer indicating the number of bytes in a word. Fixed to 4.
  • block_size: integer indicating the number of words in a cache-block.
  • ways: integer indicating the number of the ways in the cache
  • fb_size: integer indicating the number of fill-buffer entries in the cache
  • replacement: strings indicating the replacement policy. Valid values are: [“PLRU”, “RR”, “Random”]
  • ecc_enable: boolean field indicating if ECC should be enabled on the cache.
  • one_hot_select: boolean value indicating if the bsv one-hot selection function should be used instead of conventional for-loops to choose amongst lines/fb-lines. This choice has no effect on the functionality

If supervisor is enabled then the max size of a single way should not exceed 4 kilobytes

Examples:

icache_configuration:
  instantiate: True
  sets: 4
  word_size: 4
  block_size: 16
  ways: 4
  fb_size: 4
  replacement: "PLRU"
  ecc_enable: false
  one_hot_select: false
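
To see how the fields above relate to the 4 kilobyte per-way limit, the following minimal sketch computes the way size from sets, block_size and word_size. The helper names are hypothetical and not part of the C-Class configure scripts; the same arithmetic applies to the dcache_configuration node.

```python
def way_size_bytes(cache: dict) -> int:
    """Bytes held by one way: sets x words-per-block x bytes-per-word."""
    return cache["sets"] * cache["block_size"] * cache["word_size"]


def check_cache(cache: dict, supervisor: bool) -> None:
    """Raise ValueError when a way exceeds 4096 bytes on an S-enabled core."""
    size = way_size_bytes(cache)
    if supervisor and size > 4096:
        raise ValueError(f"way size {size} B exceeds 4096 B with supervisor enabled")


icache = {"sets": 4, "word_size": 4, "block_size": 16, "ways": 4}
check_cache(icache, supervisor=True)
# Way size and total capacity of the sample configuration: prints "256 1024"
print(way_size_bytes(icache), way_size_bytes(icache) * icache["ways"])
```

With word_size fixed to 4 and block_size at 16, the limit is reached at 64 sets per way.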
dcache_configuration¶

Description: Describes the various data cache related features.

  • instantiate: boolean value indicating if the data cache needs to be instantiated or not.
  • sets: integer indicating the number of sets in the cache
  • word_size: integer indicating the number of bytes in a word. Fixed to 4.
  • block_size: integer indicating the number of words in a cache-block.
  • ways: integer indicating the number of the ways in the cache
  • fb_size: integer indicating the number of fill-buffer entries in the cache
  • sb_size: integer indicating the number of store-buffer entries in the cache. Fixed to 2
  • lb_size: integer indicating the number of lines to be stored in the store buffer. Applicable only when rwports == 1r1w
  • ib_size: integer indicating the number of io-buffer entries in the cache. Defaults to 2
  • replacement: strings indicating the replacement policy. Valid values are: [“PLRU”, “RR”, “Random”]
  • ecc_enable: boolean field indicating if ECC should be enabled on the cache.
  • one_hot_select: boolean value indicating if the bsv one-hot selection function should be used instead of conventional for-loops to choose amongst lines/fb-lines. This choice has no effect on the functionality
  • rwports: number of read-write ports available on the brams. Allowed values are 1rw, 1r1w and 2rw

If supervisor is enabled then the max size of a single way should not exceed 4 kilobytes

Examples:

dcache_configuration:
  instantiate: True
  sets: 4
  word_size: 4
  block_size: 16
  ways: 4
  fb_size: 4
  sb_size: 2
  lb_size: 2
  ib_size: 2
  replacement: "PLRU"
  ecc_enable: false
  one_hot_select: false
  rwports: 1r1w
reset_pc¶

Description: Integer value indicating the reset value of program counter

Example:

bus_protocol¶

Description: bus protocol for the master interfaces of the core. Fixed to “AXI4”

Examples:

bus_protocol: AXI4
fpu_trap¶

Description: Boolean value indicating if the core should trap on floating point exception and integer divide-by-zero conditions.

Examples:

fpu_trap: False
verilator_configuration¶

Description: describes the various configurations for verilator compilation.

  • coverage: indicates the type of coverage that the user would like to track. Valid values are: [“none”, “line”, “toggle”, “all”]
  • trace: boolean value indicating if vcd dumping should be enabled.
  • threads: an integer field indicating the number of threads to be used during simulation
  • verbosity: a boolean field indicating if the verbose/display statements in the generated verilog should be compiled or not.
  • out_dir: name of the directory where the final executable will be dumped.
  • sim_speed: indicates if the user would prefer a fast simulation or slow simulation. Valid values are: [“fast”, “slow”]. Please note that selecting “fast” will speed up simulation but slow down compilation, while selecting “slow” does the opposite.

Examples:

verilator_configuration:
  coverage: "none"
  trace: False
  threads: 1
  verbosity: True
  open_ocd: False
  sim_speed: fast
bsc_compile_options¶

Description: Describes the various bluespec compile options

  • test_memory_size: size of the BRAM memory in the test-SoC in bytes. Default is 32MB
  • assertions: boolean value indicating if assertions used in the design should be compiled or not
  • trace_dump: boolean value indicating if the logic to generate a simple trace should be implemented or not. Note this is only for simulation and not a real trace
  • compile_target: a string indicating if the bsv files are being compiled for simulation or for asic/fpga synthesis. The valid values are: [ ‘sim’, ‘asic’, ‘fpga’ ]
  • suppress_warnings: List of warnings which can be suppressed during bluespec compilation. Valid values are: [“none”, “all”, “G0010”, “T0054”, “G0020”, “G0024”, “G0023”, “G0096”, “G0036”, “G0117”, “G0015”]
  • ovl_assertions: boolean value indicating if OVL based assertions must be turned on/off
  • ovl_path: string indicating the path where the OVL library is installed.
  • sva_assertions: boolean value indicating if SVA based assertions must be turned on/off
  • verilog_dir: the directory name of where the generated verilog will be dumped
  • open_ocd: a boolean field indicating if the test-bench should have an open-ocd vpi enabled.
  • build_dir: the directory name where the bsv build files will be dumped
  • top_module: name of the top-level bluespec module to be compiled.
  • top_file: file containing the top-level module.
  • top_dir: directory containing the top_file.
  • cocotb_sim: boolean variable. When set, the terminating conditions in the test-bench environment are disabled, as the cocotb environment is meant to handle them. When set to false, the bluespec test-bench holds the terminating conditions.

Examples:

bsc_compile_options:
  assertions: True
  trace_dump: True
  suppress_warnings: "none"
  top_module: mkTbSoc
  top_file: TbSoc
  top_dir: base_sim
  out_dir: bin
noinline_modules¶

Description: This node contains multiple module names which each take a boolean value. Setting a module to True generates a separate verilog file for that module during bluespec compilation. If set to False, that particular module will be inlined into the module above it in the hierarchy in the generated verilog.

Examples:

noinline_modules:
  stage0: False
  stage1: True
  stage2: False
  stage3: False

Test SoC¶

The C-class repository also contains a simple test-soc for the purpose of simulating applications and verifying the core. More enhanced and open-source SoCs can be found here.

Structure of SoC¶

The Test-SoC has the following structure (defined to a max of 4 levels of depth):

mkTbSoC
├── mkSoC
│   ├── mkccore_axi4
│   │   ├── mkriscv
│   │   ├── mkdmem
│   │   └── mkimem
│   ├── mkuart
│   ├── mkclint
│   └── mksignature_dump
├── mkbram
└── mkbootrom

Description of the above modules:

Module-Name Description
mkriscv Contains the 5 stages of the core pipeline, including execution, but only the interface to the memory subsystem
mkdmem The Data memory subsystem. Includes the data-cache and data-tlbs
mkimem The instruction memory subsystem. Includes the instruction-cache and the instruction-tlbs
mkccore_axi4 Contains the above modules and the integrations across them. Also provides 3 AXI-4 interfaces to be connected to the Cross-bar fabric
mkuart UART module
mkclint Core-Local Interruptor (CLINT)
mksignature_dump Signature dump module (for simulation only)
mkSoc Contains all the above modules and instantiates the AXI-4 crossbar fabric as well. The fabric has 2 additional slaves, which are brought out through the interface to connect to the boot-rom and bram-main-memory present in the test-bench
mkbram BRAM based memory acting as main-memory
mkbootrom Bootrom slave
mkTbSoC Testbench that instantiates the Soc, and integrates it with the bootrom and a bram memory

The details of the devices can be found in devices

Address Map of SoC¶

Module             Address Range
BRAM-Memory        0x80000000 - 0x8FFFFFFF
BootROM            0x00001000 - 0x00010FFF
UART               0x00011300 - 0x00011340
CLINT              0x02000000 - 0x020BFFFF
Debug-Halt Loop    0x00000000 - 0x0000000F
Signature Dump     0x00002000 - 0x0000200c

Please note that the bram-based memory in the test-bench can only hold up to 256MB of code. Thus the elf2hex arguments will need to be applied accordingly.
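
For quick experiments with linker scripts or bare-metal drivers, the address map above can be mirrored in a small lookup. This is a minimal sketch and not code from the repository; the names ADDRESS_MAP and decode are hypothetical. Note that, as listed, the Signature Dump range falls inside the BootROM range. How the fabric resolves this is not stated here, so the sketch simply returns every matching region.

```python
# (name, first address, last address), copied from the table above
ADDRESS_MAP = [
    ("BRAM-Memory", 0x80000000, 0x8FFFFFFF),
    ("BootROM", 0x00001000, 0x00010FFF),
    ("UART", 0x00011300, 0x00011340),
    ("CLINT", 0x02000000, 0x020BFFFF),
    ("Debug-Halt Loop", 0x00000000, 0x0000000F),
    ("Signature Dump", 0x00002000, 0x0000200C),
]


def decode(addr: int) -> list:
    """Return the names of all regions whose inclusive range contains addr."""
    return [name for name, lo, hi in ADDRESS_MAP if lo <= addr <= hi]


def region_size(name: str) -> int:
    """Size in bytes of the named region (ranges are inclusive)."""
    lo, hi = next((lo, hi) for n, lo, hi in ADDRESS_MAP if n == name)
    return hi - lo + 1


print(decode(0x80000000))                   # ['BRAM-Memory']
print(region_size("BRAM-Memory") // 2**20)  # 256 (MB)
```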

BootRom Content¶

By default, on system-reset the core will always jump to 0x1000, which is mapped to the bootrom. The bootrom is initialized using the files boot.MSB and boot.LSB. The bootrom immediately causes a re-direction to address 0x80000000, where the main program is expected to lie. It is thus required that all programs are linked with the text-section beginning at 0x80000000. The rest of the boot-rom holds dummy device-tree-string information.

Synthesis of Core¶

When synthesizing for an FPGA/ASIC, the top module should be mkccore_axi4 (mkccore_axi4.v).

The mkimem and mkdmem modules include SRAM instances which implement the respective data and tag arrays. These are implemented as BRAMs and thus require no changes for FPGAs. However, for an ASIC flow it is strongly advised to replace the BRAMs with the respective SRAMs. The user should refer to RAM Structures for correctly performing the replacement.

Simulating the Core¶

Generate Verilated Executable¶

$ cd c-class
$ python -m configure.main -ispec sample_config/default.yaml
$ make

The above should result in following files in the bin folder:

  • out
  • boot.LSB
  • boot.MSB

Executing Programs¶

Let’s assume the software program that you would like to simulate on the core is called prog.elf (compiled using standard riscv-gcc). This elf needs to be converted to a hex file which can be provided to the verilated executable: out. This hex can be generated using the following command:

For 64-bit:

$ elf2hex 8 33554432 prog.elf 2147483648 > code.mem

For 32-bit:

$ elf2hex 4 67108864 prog.elf 2147483648 > code.mem

Place the code.mem file in the bin folder and execute the out binary to initiate simulation.

Please note, since the boot code in the bootrom implicitly jumps to 0x80000000, programs should also be linked to start at 0x80000000. Also note that the bram main memory is 256MB large.
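
The numeric arguments in the elf2hex commands above follow from the memory size and base address. The sketch below reproduces them, assuming (from the two commands shown) that the arguments are the word width in bytes, the depth in words, the elf file and the base address in decimal. The helper elf2hex_args is hypothetical and only builds the argument list; it does not run elf2hex.

```python
def elf2hex_args(xlen: int, elf: str = "prog.elf",
                 mem_bytes: int = 256 * 1024 * 1024,
                 base: int = 0x80000000) -> list:
    """Build the elf2hex argument list for a memory of mem_bytes at base."""
    width = xlen // 8            # bytes per memory word: 8 for RV64, 4 for RV32
    depth = mem_bytes // width   # number of words needed to cover the memory
    return ["elf2hex", str(width), str(depth), elf, str(base)]


print(" ".join(elf2hex_args(64)))  # elf2hex 8 33554432 prog.elf 2147483648
print(" ".join(elf2hex_args(32)))  # elf2hex 4 67108864 prog.elf 2147483648
```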

Support for PutChar¶

The test-soc for simulation contains a simple uart. The putchar function for the same is available HERE. This has to be used in the printf functions. The output of the putchar is captured in a separate file app_log during simulation.

Simulation Arguments (Logger Utility)¶

  1. ./out +rtldump: if the core has been configured with trace_dump: true, then an rtl.dump file is created which shows the trace of instruction execution. Each line in the file has the following format:

    <privilege-mode> <program-counter> <instruction> <register-updated><register value>

  2. To enable printing of debug statements from the bluespec code, one can pass custom logger arguments to the simulation binary as follows

    • ./out +fullverbose: prints all the logger statements across all modules and all levels of verbosity
    • ./out +mstage1 +l0: prints all the logger statements within module stage1 which are at verbosity level 0.
    • ./out +mstage2 +mstage4 +l0 +l3: prints all the logger statements within modules stage2 and stage4 which are at verbosity levels 0 and 3 only.
  3. An app_log file is also created which captures the output of the uart, typically used in the putchar function in C/C++ codes as mentioned above.
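
When scripting regressions it can be handy to assemble these plus-arguments programmatically. The following is a minimal sketch that only builds the argument list described above; the helper name logger_args is hypothetical and not part of the repository.

```python
def logger_args(modules=(), levels=(), full=False, rtldump=False) -> list:
    """Build the plus-arguments for the verilated 'out' binary."""
    args = ["+rtldump"] if rtldump else []
    if full:
        # +fullverbose already covers every module and verbosity level.
        return args + ["+fullverbose"]
    args += [f"+m{name}" for name in modules]   # e.g. +mstage2
    args += [f"+l{level}" for level in levels]  # e.g. +l0
    return args


cmd = ["./out"] + logger_args(modules=["stage2", "stage4"], levels=[0, 3])
print(" ".join(cmd))  # ./out +mstage2 +mstage4 +l0 +l3
```

The resulting list can be passed to subprocess.run with cwd set to the bin folder.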

Connect to GDB in Simulation¶

A debugger implementation following the riscv-debug-draft-014 has been integrated with the core. This can be instantiated in the design by configuring with: debugger_support: true

Perform the following steps to connect to the core executable with a gdb terminal. This assumes you have installed openocd and that it is available in your $PATH variable.

Modify the sample_config/default.yaml to enable: debugger_support and open_ocd. Generate a new executable with this config to support jtag remote-bitbang in the test-bench

$ python -m configure.main -ispec sample_config/default.yaml
$ make gdb # generate executable with open-ocd vpi enabled in the test-bench
  1. Simulate the RTL. In a new terminal do the following:

    $ cd c-class/bin/
    $ ./out > /dev/null
    
  2. Connect to OpenOCD. Open a new terminal and type the following:

    $ cd c-class/test_soc/gdb_setup/
    $ openocd -f shakti_ocd.cfg
    
  3. Connect to GDB. Open yet another terminal and type the following:

    $ cd c-class/test_soc/gdb_setup
    $ riscv64-unknown-elf-gdb -x gdb.script
    

In this window you can now perform gdb commands like : set $pc, i r, etc

To reset the SoC via the debugger you can execute the following within the gdb shell:

$ monitor reset halt
$ monitor gdb_sync
$ stepi
$ i r

Note

The above will not reset memories like caches, brams, etc

Dhrystone¶

The max DMIPS of the core is 1.72DMIPs/MHz.

$ git clone https://gitlab.com/shaktiproject/cores/benchmarks.git
$ cd benchmarks
$ make dhrystone ITERATIONS=100000

The output directory will contain a code.mem file, which needs to be copied to the bin folder before executing the C-class verilated binary:

$ cp benchmarks/output/code.mem c-class/bin # change paths accordingly
$ cd c-class/bin
$ ./out
$ cat app_log

   Microseconds for one run through Dhrystone:     10.0
   Dhrystones per Second:                       95746.0

Linux on C-Class¶

  1. Generate RTL using the default.yaml config as provided in the repo

    $ python -m configure.main -ispec sample_config/default.yaml
    $ make # generate executable
    
  2. Download the shakti-linux repository and generate the kernel image:

    $ git clone https://gitlab.com/shaktiproject/software/shakti-linux
    $ cd shakti-linux
    $ export SHAKTI_LINUX=$(pwd)
    $ git submodule update --init --recursive
    $ cd $SHAKTI_LINUX
    $ make -j16 ISA=rv64imafd
    
  3. Come back to the folder c-class/ to simulate the kernel on the C-class executable:

    $ cd c-class/
    $ cp $SHAKTI_LINUX/work/riscv-pk/bbl ./bin/
    $ cd bin
    $ elf2hex 8 33554432 bbl 2147483648 > code.mem
    $ ./out
    

    Track the app_log file to see the kernel messages being printed

FreeRTOS on C-class¶

  1. Generate a 32-bit RTL with the following command:

    $ python -m configure.main -ispec sample_config/freertos.yaml
    $ make # generate executable
    
  2. Download the FreeRTOS repository for C-class

    $ git clone https://gitlab.com/shaktiproject/software/FreeRTOS
    $ cd FreeRTOS/FreeRTOS-RISCV/Demo/shakti/
    $ make
    
  3. Come back to the c-class folder and do the following:

    $ cd c-class/
    $ cp FreeRTOS/FreeRTOS-RISCV/Demo/shakti/frtos-shakti.elf ./bin
    $ cd bin
    $ elf2hex 8 4194304 frtos-shakti.elf 2147483648 > code.mem
    $ ./out
    

    Track the app_log file to see the kernel messages being printed

Benchmarking the Core¶

The max DMIPS of the C-class core is 1.72DMIPs/MHz.

The max CoreMarks of the C-class core is 2.9CoreMarks/MHz

The C-class core is highly configurable and thus requires a specific kind of tuning to achieve the maximum performance. This document will highlight some of the settings and their respective benchmark numbers. For the following benchmarks the C-class core has been configured using the default.yaml available in the sample_config/ folder.

Note

Make sure you are using gcc 9.2.0 or above to replicate the following results.

Benchmarking Dhrystone¶

The following numbers have been obtained via simulation where the number of ITERATIONS was fixed at 5000

Flags used for compilation:

-mcmodel=medany -static -std=gnu99 -O2 -ffast-math \
-fno-common -fno-builtin-printf -march=rv64$(march) -mabi=lp64d \
-w -static -nostartfiles -lgcc

When $march is rv64imac the DMIPs/MHz is 1.68:

Microseconds for one run through Dhrystone:     10.0
Dhrystones per Second:                       94652.0

When $march is rv64ima the DMIPs/MHz is 1.72:

Microseconds for one run through Dhrystone:     10.0
Dhrystones per Second:                       96216.0

Benchmarking CoreMarks¶

The following numbers have been obtained via simulation where the number of ITERATIONS was fixed at 100

Flags used for compilation are available in the logs below:

When $march is rv64imac the CoreMarks/MHz is 2.84:

2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 35205197
Total time (secs): 35
Iterations/Sec   : 2
Iterations       : 100
Compiler version : riscv64-unknown-elf-9.2.0
Compiler flags   : -mcmodel=medany -DCUSTOM -DPERFORMANCE_RUN=1 -DMAIN_HAS_NOARGC=1 \
                   -DHAS_STDIO -DHAS_PRINTF -DHAS_TIME_H -DUSE_CLOCK -DHAS_FLOAT=0 \
                   -DITERATIONS=10 -O3 -fno-common -funroll-loops -finline-functions \
                   -fselective-scheduling -falign-functions=16 -falign-jumps=4 \
                   -falign-loops=4 -finline-limit=1000 -nostartfiles -nostdlib -ffast-math \
                   -fno-builtin-printf -march=rv64imac -mexplicit-relocs
Memory location  : STACK
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0x988c
Correct operation validated. See README.md for run and reporting rules.

When $march is rv64ima the CoreMarks/MHz is 2.897:

2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 34516277
Total time (secs): 34
Iterations/Sec   : 2
Iterations       : 100
Compiler version : riscv64-unknown-elf-9.2.0
Compiler flags   : -mcmodel=medany -DCUSTOM -DPERFORMANCE_RUN=1 -DMAIN_HAS_NOARGC=1 \
                   -DHAS_STDIO -DHAS_PRINTF -DHAS_TIME_H -DUSE_CLOCK -DHAS_FLOAT=0 \
                   -DITERATIONS=100 -O3 -fno-common -funroll-loops -finline-functions \
                   -fselective-scheduling -falign-functions=16 -falign-jumps=4 \
                   -falign-loops=4 -finline-limit=1000 -nostartfiles -nostdlib -ffast-math \
                   -fno-builtin-printf -march=rv64ima -mexplicit-relocs
Memory location  : STACK
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0x988c
Correct operation validated. See README.md for run and reporting rules.
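
The CoreMarks/MHz figures quoted above can be reproduced from the Iterations and Total ticks fields of each log. The sketch below assumes that one tick corresponds to one core clock cycle, which is consistent with both logs; the function name coremark_per_mhz is hypothetical.

```python
def coremark_per_mhz(iterations: int, total_ticks: int) -> float:
    """CoreMark iterations per million clock cycles (assumes 1 tick = 1 cycle)."""
    return iterations * 1_000_000 / total_ticks


print(round(coremark_per_mhz(100, 35205197), 3))  # rv64imac log
print(round(coremark_per_mhz(100, 34516277), 3))  # rv64ima log
```

The first call evaluates to roughly 2.84 and the second to roughly 2.897, matching the numbers stated for the two -march settings.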

Why Compressed Binaries perform bad on C-class?¶

If you have observed the numbers above, it is evident that, for the same configuration of the branch-predictor, enabling compressed support causes a slight reduction in DMIPs. This is because of how the fetch-stage (stage1) has been designed.

The fetch stage always expects the I$ to respond with a 32-bit word which is 4-byte aligned. Since the 32-bit word can hold up to 2 16-bit compressed instructions, the predictor also always presents 2 predictions: one for pc and one for pc+2. While analysing the 32-bit word from the I$ the following scenarios can occur:

  • Case-1: entire word is a 32-bit instruction. In this case the entire word and the prediction for pc is sent to the decode stage.
  • Case-2: word contains 2 16-bit instructions. In this case, in the first cycle the lower 16-bits of the word and the prediction of pc are sent to the decode stage. In the next cycle the upper 16-bits and the prediction of pc+2 are sent to the decode stage.
  • Case-3: lower 16-bits need to be concatenated with the upper 16-bits of the previous I$ response. In this case a new 32-bit instruction is formed and the prediction of the previous response is sent to the decode stage.
  • Case-4: only the upper 16-bits of the I$ response need to be analysed. If the upper 16-bits are a compressed instruction, then they are sent to the decode stage along with the prediction of pc+2. If, however, the upper 16-bits are the lower part of a 32-bit instruction, then we need to wait for the next I$ response and use the Case-3 scheme then. One can land in this case when there is a jump to a 32-bit instruction placed at a 2-byte boundary.

Now that we understand how the fetch-stage works, assume that all the dhrystone code fits within the I$ (i.e. no misses) and predictor is also well trained to provide all correct-predictions. Consider the following sequence from dhrystone:

...
8000106e: 0x00001797            auipc a5,0x1
...
...
...
800010d8: 0xf97ff0ef            jal ra,8000106e
...

Now each time the jal instruction is executed the fetch-stage enters Case-4, where the upper 16-bits of the 32-bit word at 0x8000106c are the lower part of a 32-bit instruction starting at 0x8000106e. This leads to a single-cycle stall in sending the auipc instruction into the decode stage.

In dhrystone the above kind of sequence occurs in 3 places in each iteration, and each occurrence incurs a single-cycle delay, hence the reduced performance with compressed support.
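
The four fetch cases and the stall on the jal target can be modelled in a few lines. This is a simplified sketch for illustration, not the bsv implementation: the function names are hypothetical, the lower half of the word at 0x8000106c is a placeholder (it is not shown in the listing), and a compressed lower half followed by the start of a 32-bit instruction is reported as "unlisted" because the text does not enumerate it. The only ISA fact relied on is that 32-bit RISC-V encodings have their two lowest bits set to 0b11.

```python
def is_compressed(half: int) -> bool:
    """A 16-bit parcel is a compressed instruction unless its low bits are 0b11."""
    return (half & 0b11) != 0b11


def fetch_case(word: int, pc: int, pending: bool = False) -> tuple:
    """Classify one 32-bit I$ response; returns (case, wait_for_next_word)."""
    lower, upper = word & 0xFFFF, (word >> 16) & 0xFFFF
    if pending:
        return "case-3", False   # lower half completes the previous parcel
    if pc & 0x2:
        # Entered at a 2-byte boundary: only the upper half is analysed.
        return "case-4", not is_compressed(upper)
    if not is_compressed(lower):
        return "case-1", False   # whole word is one 32-bit instruction
    if is_compressed(upper):
        return "case-2", False   # two 16-bit instructions
    return "unlisted", False     # combination not enumerated in the text


# jal target 0x8000106e: the upper half of the word at 0x8000106c holds the
# low 16 bits of auipc (0x00001797); 0x0001 is a placeholder lower half.
print(fetch_case((0x1797 << 16) | 0x0001, 0x8000106E))  # ('case-4', True)
```

The True in the result is the extra cycle spent waiting for the next I$ response before the auipc can be formed through Case-3.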

Micro-Arch Notes¶

Custom CSRs Available in C-Class¶

The C-class includes the following custom csrs implemented in the non-standard space for extra control and special features.

custom control csr (0x800)¶

The csr is used to enable or disable the caches, branch predictor and arithmetic exceptions at run-time.

Bit Position Reset Value Description
0 from config Enable or disable the data-cache.
1 from config Enable or disable the instruction-cache.
2 from config Enable or disable the branch_predictor.
3 Disabled Enable or disable arithmetic exceptions.
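
As an illustration of the bit layout above, the sketch below composes a value for the custom control csr. The constant and function names are hypothetical, and the polarity (1 meaning enabled) is an assumption, since the table only lists the bit positions.

```python
# Bit positions from the table above (names are illustrative only)
DCACHE_EN = 1 << 0
ICACHE_EN = 1 << 1
BPU_EN = 1 << 2
ARITH_TRAP_EN = 1 << 3


def control_value(dcache: bool, icache: bool, bpu: bool, arith_trap: bool) -> int:
    """Compose the value to write to csr 0x800, assuming 1 = enabled."""
    value = 0
    if dcache:
        value |= DCACHE_EN
    if icache:
        value |= ICACHE_EN
    if bpu:
        value |= BPU_EN
    if arith_trap:
        value |= ARITH_TRAP_EN
    return value


# Caches and predictor on, arithmetic exceptions off
print(hex(control_value(True, True, True, False)))  # 0x7
```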
dtvec csr (0x7c0)¶

XLEN register which indicates the address of the debug loop when the debugger halts the core.

denable csr (0x7c1)¶

1-bit csr indicating if the debugger can halt the core

mhpminterrupten csr (0x7c2)¶

XLEN bit register following the same encoding as mcounteren/mcountinhibit. A bit set to 1 indicates that an interrupt will be generated when the corresponding counter reaches the value 0. More details on using this register are available in the Interrupts from Counters section.

dtim base address csr (0x7c3)¶

An XLEN bit register holding the base address of the data tightly integrated scratch memory. This should correspond to the physical address space and not the virtual

dtim bound address csr (0x7c4)¶

An XLEN bit register holding the bound address of the data tightly integrated scratch memory. This should correspond to the physical address space and not the virtual

itim base address csr (0x7c5)¶

An XLEN bit register holding the base address of the instruction tightly integrated scratch memory. This should correspond to the physical address space and not the virtual

itim bound address csr (0x7c6)¶

An XLEN bit register holding the bound address of the instruction tightly integrated scratch memory. This should correspond to the physical address space and not the virtual

cause values for arith traps¶

When configured with fpu_trap: True, six arithmetic exceptions are added as an extension to the 15 exceptions mentioned in the RISC-V spec. Five of these are the floating point exceptions specified by the IEEE 754 floating point format.

Description Cause Value
Integer divide by zero 17
Floating point Invalid operation 18
Floating point Zero divide 19
Floating point Overflow 20
Floating point Underflow 21
Floating point Inexact 22

Performance Monitors¶

Introduction¶

Currently the RISC-V privileged spec (v1.12) describes a basic hardware performance facility at the hart (core) level.

3 counters for dedicated functions have been defined:

Address Name Description
0xB00 mcycle counts the number of cycles executed by the hart starting from an arbitrary point of time.
0xB02 minstret counts the number of instructions retired by the hart starting from an arbitrary point of time.
0xC01 time this is a read-only csr which reads the memory mapped value of the platform's real-time counter (mtime).

Each of the above are 64-bit counters. Shadow csrs of the above also exist in the user-space.

Apart from the above, RISC-V also makes provision for an additional 29 64-bit event counters: mhpmcounter3 - mhpmcounter31. The event selectors for these counters are also defined: mhpmevent3 - mhpmevent31. The meaning of these events is defined by the platform and can be customized for each platform.

In addition, RISC-V also defines a single 32-bit counter-enable register : mcounteren. Each bit in this register corresponds to each of the 32 event-counters described above. This register controls only the accessibility of the counter registers and has no effect on the underlying counters, which can continue to increment irrespective of the settings of the mcounteren fields.

Clearing a bit in the mcounteren only indicates that the event-counters cannot be accessed by lower level privilege modes. Similar functionality is implemented by the scounteren register when S-mode is supported.

Overhead Analysis¶
  1. Each event-counter is mapped to a CSR address and additionally all counters are read-write CSRs. Thus each 64-bit counter will have an additional 12-bit decoder to select that counter in case of a read/write CSR op.
  2. Since all CSRs are accessed in the write-back stage of the C-Class core, the 12-bit address from this stage, fans-out to all CSRs. Since the event-counters are implemented as 64-bit adders, the fan-out load is further increased as they become part of the CSR read/write op.
  3. Furthermore, suppose there are 30 events defined by the core/platform and each event-counter is configurable to choose any of the 30 events to track. This leads to an additional 30:1 mux on each event-counter.

All the three factors defined above can cause the event-counters to become critical in terms of area and frequency closure.

Possible solutions¶
  1. To address the issues 1 and 2 listed above, it is possible to implement the CSRs as a daisy-chain as shown below:

[Figure: daisy-chain arrangement of CSR groups]

Here the CSRs are grouped based on their functionality, and accesses to CSRs can thus take a variable number of cycles. For example, less frequently accessed CSRs like fcsr, *scratch or debug registers can be placed in GRP-2 or GRP-3. Performance counters and status registers can be placed in GRP-1 to enable fast access. Such daisy chaining will reduce the comparator fan-out while performing CSR read/write ops.

  2. To address the 3rd issue from the above list, it is proposed to split the events into groups and have each counter track only the events within a specific group. This strategy is further elaborated in the next section.
List of Events for C-class¶

The C-Class core will support capturing the following 30 events:

Event number Description
1 Number of misprediction
2 Number of exceptions
3 Number of interrupts
4 Number of csrops
5 Number of jumps
6 Number of branches
7 Number of floats
8 Number of muldiv
9 Number of rawstalls
10 Number of exetalls
11 Number of icache_access
12 Number of icache_miss
13 Number of icache_fbhit
14 Number of icache_ncaccess
15 Number of icache_fbrelease
16 Number of dcache_read_access
17 Number of dcache_write_access
18 Number of dcache_atomic_access
19 Number of dcache_nc_read_access
20 Number of dcache_nc_write_access
21 Number of dcache_read_miss
22 Number of dcache_write_miss
23 Number of dcache_atomic_miss
24 Number of dcache_read_fb_hits
25 Number of dcache_write_fb_hits
26 Number of dcache_atomic_fb_hits
27 Number of dcache_fb_releases
28 Number of dcache_line_evictions
29 Number of itlb_misses
30 Number of dtlb_misses
Interrupts from Counters¶

There is a need to raise an interrupt when a particular counter has observed delta number of counts. This feature is, however, not part of the current RISC-V ISA, since it mandates neither how the counters are interpreted nor in which direction they should move (up or down).

Thus, to achieve the above said functionality, we propose a new custom CSR:

mhpminterrupten: The encoding of this CSR is the same as that of mcounteren/mcountinhibit. When a particular bit is set, the corresponding counter generates an interrupt when its value reaches 0 and the counter is enabled (mhpmevent != 0). The interrupt can be disabled by writing 0 to the corresponding mhpmevent register (equivalent to disabling the counter).

Following is an example of how such a framework can be used:

> csrw mhpminterrupten, 0x8         # set bit 3 (as in mcounteren) to enable the interrupt for mhpmcounter3
> addi x31, x0, -delta              # note the negative delta
> csrw mhpmcounter3, x31
> csrw mhpmevent3, 0x9              # enable mhpmcounter3 to track event-code-9
> ...
> # an interrupt is generated and control jumps to the ISR
> ...
>
ISR routine:
> csrw mhpmevent3, x0               # disabling mhpmcounter3 also disables the interrupt
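
The counting scheme above (load the counter with -delta, count upwards, interrupt at 0) can be checked with a small Python model. This is a sketch under stated assumptions, not the core's implementation: the 64-bit counter width, the class name and the helper function are all hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumption: counters are 64 bits wide


class HpmCounter:
    """Toy model of one mhpmcounter with the proposed interrupt enable bit."""

    def __init__(self):
        self.value = 0             # models mhpmcounterN
        self.event = 0             # models mhpmeventN (0 means disabled)
        self.interrupt_en = False  # models the matching bit of mhpminterrupten

    def write_counter(self, value):
        # Writing -delta stores its two's-complement encoding.
        self.value = value & MASK64

    def interrupt_pending(self):
        return self.interrupt_en and self.event != 0 and self.value == 0

    def observe(self, event_code):
        """Count one occurrence of event_code; return True if an interrupt is pending."""
        if self.event != 0 and self.event == event_code:
            self.value = (self.value + 1) & MASK64
        return self.interrupt_pending()


def events_until_interrupt(delta, event_code=9):
    """Replay the assembly sequence above and count events until the interrupt (delta >= 1)."""
    counter = HpmCounter()
    counter.interrupt_en = True      # csrw mhpminterrupten, ...
    counter.write_counter(-delta)    # csrw mhpmcounter3, x31
    counter.event = event_code       # csrw mhpmevent3, 0x9
    seen = 1
    while not counter.observe(event_code):
        seen += 1
    return seen
```

For any delta >= 1, events_until_interrupt(delta) returns delta, which is the reason the counter is loaded with a negative value.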

RAMS used in the C-Class¶

This document describes in detail how the various RAM-based structures are used within the shakti-designs (specifically the C-class processor). It also highlights the differences when porting the same structures to ASICs or FPGAs.

Overview¶

The caches used in the C-class core (instruction and data both), use a single-ported RAM instance (1RW), i.e. one port to perform either a read or a write.

The branch predictors, however, may or may not use RAMs depending on the choice made at compile time. Where RAMs are used, they are dual-ported (1R + 1W), i.e. one dedicated port to read and another dedicated port to write.

Functionality¶
Single-Ported RAMs (1RW)¶
  • Module Name: bram_1rw

  • Verilog source: bram_1rw.v

  • Port Descriptions:

    Port Name Direction Description
    clka input Clock signal. Positive edge of clock is used.
    ena input When high indicates the port is being used
    wea input When high indicates a write operation is being performed.
    addr input Indicates the address for read/write
    dina input Indicates the data for write operations
    douta output Holds the data for a read operation
  • Instantiation Parameters:

    Parameter Name Description
    DATA_WIDTH Width of dina and douta ports.
    ADDR_WIDTH Width of addra port.
    MEMSIZE Depth of the RAM.

    The size of the instantiated RAM will be MEMSIZE x DATA_WIDTH bits where the number of indices is equal to MEMSIZE and the number of bits at each index is equal to DATA_WIDTH.

  • Read Operation: The address is driven onto the addr port, and the ena signal is driven high. At the next positive clock edge, the douta port will hold the data. Therefore, read operations have a one-cycle latency. A new address can be given every cycle (its output will be obtained in the subsequent cycle).

  • Write Operation: The address is driven onto the addr port, the data to be written is driven on the dina port, and the ena and wea signals are asserted. At the next positive clock edge, the value at dina is written to the address addr. A new write operation can be initiated at every clock edge.

Note

  1. The single-ported rams follow a no-change model, where the output douta remains unchanged on write-operations and will always hold the data of the previous read operation.
  2. The single-ported rams assume the outputs are registered for reads.
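
The read, write and no-change behaviour described above can be sketched as a cycle-level Python model. This is an illustrative model only, not the bram_1rw.v source: the class name, the posedge method and the zero-initialised contents are assumptions.

```python
class Bram1RW:
    """Cycle-level sketch of a 1RW RAM with a registered read output."""

    def __init__(self, data_width, memsize):
        self.mask = (1 << data_width) - 1
        self.mem = [0] * memsize  # assumption: contents start at zero
        self.douta = 0            # registered output, holds the last read data

    def posedge(self, ena, wea, addr, dina=0):
        """Apply the inputs sampled at one positive clock edge; return douta after it."""
        if ena:
            if wea:
                # Write: memory is updated, douta is left unchanged (no-change model).
                self.mem[addr] = dina & self.mask
            else:
                # Read: the data becomes visible on douta after this edge.
                self.douta = self.mem[addr]
        return self.douta
```

A write followed by a read of the same address returns the written value one edge later, and a further write to that address leaves douta holding the previously read value.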
Dual-Ported RAMs (1R + 1W)¶
  • Module Name: bram_1r1w

  • Verilog source: bram_1r1w.v

  • Ports:

    Port Name Direction Description
    clka Input Clock signal for port A. Operations are performed at the positive edge of the clock.
    ena Input Enable signal for port A. When high, indicates that the port is being used for write.
    wea Input Write enable for port A. When high, indicates that a write operation is being performed.
    addra Input Index address for port A that indicates the address for write
    dina Input Indicates the data for write operations
    clkb Input Clock signal for port B. Operations are performed at the positive edge of the clock.
    enb Input Enable signal for port B. When high, indicates that the port is being used for read.
    addrb Input Index address for port B that indicates the address for read
    doutb Output Holds the data for a read operation
  • Instantiation Parameters:

    Parameter Name Description
    DATA_WIDTH Width of dina and doutb ports.
    ADDR_WIDTH Width of addra and addrb ports.
    MEMSIZE Depth of the RAM.

    The size of the instantiated BRAM will be MEMSIZE x DATA_WIDTH bits where the number of indices is equal to MEMSIZE and the number of bits at each index is equal to DATA_WIDTH.

  • Read Operation: Port-B is used for performing reads. The address is written onto the addrb port, and the enb signal is driven high. In the next cycle, doutb port will hold the data. Therefore, the read operations have a one cycle latency. Also, a new address can be given at every cycle (whose output will be obtained in the subsequent cycle).

  • Write Operation: Port-A is used for writes. The address is driven onto the addra port, the data to be written is driven on the dina port, and the ena and wea signals are asserted. At the next positive clock edge, the value at dina is written to the address addra. A new write operation can be initiated at every clock edge.

  • Read Write Conflicts: In case of a read and write occurring to the same address at the same time, the writes are guaranteed while the reads need not be.

Note

  1. Here port A is used for write, and port B is used for read operations. Also, the various enable and write enable signals are active high signals.
  2. The dual-ported rams assume the outputs are registered for reads.
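
The dual-ported behaviour, including the read/write conflict rule, can be sketched in the same style. This is a hypothetical Python model and not the bram_1r1w.v source; returning None for a conflicting read is simply a way of marking data that is not guaranteed.

```python
class Bram1R1W:
    """Cycle-level sketch of a 1R + 1W RAM: port A writes, port B reads."""

    def __init__(self, data_width, memsize):
        self.mask = (1 << data_width) - 1
        self.mem = [0] * memsize  # assumption: contents start at zero
        self.doutb = 0            # registered read output of port B

    def posedge(self, ena=0, wea=0, addra=0, dina=0, enb=0, addrb=0):
        """Apply both ports' inputs at one positive clock edge; return doutb after it."""
        writing = bool(ena and wea)
        if enb:
            if writing and addra == addrb:
                self.doutb = None  # conflict: the read data is not guaranteed
            else:
                self.doutb = self.mem[addrb]
        if writing:
            self.mem[addra] = dina & self.mask  # the write always takes effect
        return self.doutb
```

A read and a write to different addresses in the same cycle both succeed. When the addresses match, the write still lands, and a read issued in a later cycle returns the new value.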
Synthesis¶
Mapping to FPGAs¶

The single-ported RAMs (1RW) used in the caches are directly mapped to the true single-ported BRAMs provided by Xilinx.

The dual-ported RAMs (1R + 1W) used in the branch predictors are directly mapped to the true dual-ported RAMs provided by Xilinx. Since these provide a (1RW + 1RW) configuration, our dual-ported instances ensure that port A is used only for writes and port B is used only for reads (by ensuring that the write enable of port B is always held low).

The (* RAM_STYLE = "BLOCK" *) attribute in the Verilog source makes it easy for Vivado to infer these as BRAMs, so no edits are required in the source file.

Mapping to ASICs¶

For mapping to ASICs, the user has to replace the files bram_1rw and bram_1r1w with the respective SRAM module instances that provide the same functionality as described above.

Where SRAM cells of the same size as the instantiations are not available, it is the user's responsibility to bank/combine the available SRAM cells into a top module which has the same functionality as bram_1r1w or bram_1rw.

If an SRAM cell has more ports than the ones required in this document, the user must ensure they are driven so as to maintain the same functionality as described in this document.

Additionally, if a parameterized instance of the SRAMs can be developed by the user, it is the user's responsibility to manually replace each instance of the RAMs in the design. For the C-class, the instances are defined below:

C-Class Specific instances of RAMs.¶

The size and configuration of the RAMs instantiated in the design can be controlled at the BSV level at compile time using the YAML configuration files. For a quick reference of all 1RW/1R1W instances do the following in the verilog release:

$ grep "bram_1rw " mk*cache.v -A2
$ grep "bram_1r1w " mkbpu.v -A2
Instruction Cache¶

The variables below refer to the fields within the icache_configuration node in the YAML spec. VADDR refers to the XLEN and PADDR refers to the physical_addr_size in the YAML spec.

  • For Data Array

    • instance path: mkicache/data_arr_*
    • Total number of 1RW instances : dbanks x ways
    • DATA_WIDTH per instance: (word_size x 8 x block_size)/ dbanks
    • MEM_SIZE per instance: sets
    • ADDR_WIDTH per instance: Log(sets)
  • For Tag Array

    • instance path: mkicache/tag_arr_*
    • Total number of 1RW instances : tbanks x ways
    • DATA_WIDTH per instance: (PADDR - (Log(word_size) + Log(block_size) + Log(sets)))/tbanks
    • MEM_SIZE per instance: sets
    • ADDR_WIDTH per instance: Log(sets)
Data Cache¶

The variables below refer to the fields within the dcache_configuration node in the YAML spec. VADDR refers to the XLEN and PADDR refers to the physical_addr_size in the YAML spec.

  • For Data Array

    • instance path: mkdcache/data_arr_*
    • Total number of 1RW instances : dbanks x ways
    • DATA_WIDTH per instance: (word_size x 8 x block_size)/ dbanks
    • MEM_SIZE per instance: sets
    • ADDR_WIDTH per instance: Log(sets)
  • For Tag Array

    • instance path: mkdcache/tag_arr_*
    • Total number of 1RW instances : tbanks x ways
    • DATA_WIDTH per instance: (PADDR - (Log(word_size) + Log(block_size) + Log(sets)))/tbanks
    • MEM_SIZE per instance: sets
    • ADDR_WIDTH per instance: Log(sets)
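
The data-array and tag-array formulas above are the same for the instruction and data caches, so they can be evaluated with one helper. The Python sketch below is not part of the configurator: the function names are hypothetical, Log is taken to be base 2, and all sizes are assumed to be powers of two that divide evenly across the banks.

```python
def log2_exact(value):
    """Log base 2 of a power of two (assumption: all sizes are powers of two)."""
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{value} is not a power of two")
    return value.bit_length() - 1


def cache_ram_instances(paddr, word_size, block_size, sets, ways, dbanks, tbanks):
    """Apply the data-array and tag-array formulas listed above to one cache."""
    index_bits = log2_exact(sets)
    tag_bits = paddr - (log2_exact(word_size) + log2_exact(block_size) + index_bits)
    return {
        "data": {
            "instances": dbanks * ways,
            "DATA_WIDTH": (word_size * 8 * block_size) // dbanks,
            "MEM_SIZE": sets,
            "ADDR_WIDTH": index_bits,
        },
        "tag": {
            "instances": tbanks * ways,
            "DATA_WIDTH": tag_bits // tbanks,
            "MEM_SIZE": sets,
            "ADDR_WIDTH": index_bits,
        },
    }
```

As an example with made-up values (PADDR = 32, word_size = 4, block_size = 16, sets = 64, ways = 4, dbanks = tbanks = 1), the helper reports four data RAMs of 64 x 512 bits and four tag RAMs of 64 x 20 bits, each with a 6-bit address.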
Branch Predictors¶

RAMs will not be instantiated if the predictor option in YAML config is set to gshare_fa. RAM instances for other values are described below. The variables below refer to the fields within the branch_predictor node in the YAML spec. VADDR refers to the XLEN and PADDR refers to the physical_addr_size in the YAML spec.

  • With compressed support:

    • Total number of 1R+1W instances : 2
    • DATA_WIDTH per instance: (VADDR - Log(btb_depth)) + VADDR + 4
    • MEM_SIZE per instance: btb_depth/2
    • ADDR_WIDTH per instance: Log(btb_depth/2)
    • NOTE: One instance will have DATA_WIDTH + 1 bits.
  • Without compressed support:

    • Total number of 1R+1W instances : 1
    • DATA_WIDTH per instance: (VADDR - Log(btb_depth)) + VADDR + 3
    • MEM_SIZE per instance: btb_depth
    • ADDR_WIDTH per instance: Log(btb_depth)
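
The two branch-predictor cases can be evaluated in the same way. The helper below is a hypothetical Python sketch of the formulas above, with Log taken as base 2 and btb_depth assumed to be a power of two. The text does not say which of the two instances is one bit wider, so placing it second is an arbitrary choice.

```python
def btb_ram_instances(vaddr, btb_depth, compressed):
    """Return the list of 1R+1W RAM shapes for the given BTB configuration."""
    if btb_depth <= 0 or btb_depth & (btb_depth - 1):
        raise ValueError("btb_depth is assumed to be a power of two")
    log_depth = btb_depth.bit_length() - 1
    if compressed:
        width = (vaddr - log_depth) + vaddr + 4
        depth = btb_depth // 2
        addr_width = depth.bit_length() - 1
        return [
            {"DATA_WIDTH": width, "MEM_SIZE": depth, "ADDR_WIDTH": addr_width},
            # One instance is one bit wider; which one is an assumption here.
            {"DATA_WIDTH": width + 1, "MEM_SIZE": depth, "ADDR_WIDTH": addr_width},
        ]
    width = (vaddr - log_depth) + vaddr + 3
    return [{"DATA_WIDTH": width, "MEM_SIZE": btb_depth, "ADDR_WIDTH": log_depth}]
```

For example, with VADDR = 64 and btb_depth = 32, compressed support gives two 16-entry RAMs that are 127 and 128 bits wide, while the non-compressed build gives a single 32-entry RAM that is 126 bits wide.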

Physical Memory Protection (PMP)¶

The physical memory protection unit is integrated with the caches (data and instruction). The PMP module implements region-wise permission checks as described in the RISC-V privileged spec. See the PMP configuration parameters for the options available for PMP support.

When PMP is disabled (PMPEnable is zero), the PMP module is not instantiated and all PMP CSRs read as zero, regardless of the value of PMPNumRegions.

PMP Granularity¶

The PMP granularity parameter is used to reduce the size of the address matching comparators by increasing the minimum region size. For a 32-bit core the minimum granularity is 4 bytes and for a 64-bit core the minimum granularity is 8 bytes. This choice has been made to reduce the overheads of checking homogeneity of the access. Thus, for a 64-bit core NA4 is no longer available.
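
The consequence for NA4 can be seen from a short Python sketch. The NAPOT decoding follows the RISC-V privileged spec (k trailing one bits in pmpaddr encode a region of 2^(k+3) bytes); the function names are hypothetical, and the minimum granularity is modelled as XLEN/8 bytes to match the values stated above.

```python
def min_pmp_granularity_bytes(xlen):
    """Minimum region size stated above: 4 bytes for XLEN=32, 8 bytes for XLEN=64."""
    return xlen // 8


def napot_region_size(pmpaddr):
    """Size in bytes of the NAPOT region encoded by a non-negative pmpaddr value."""
    trailing_ones = 0
    while pmpaddr & 1:
        trailing_ones += 1
        pmpaddr >>= 1
    return 1 << (trailing_ones + 3)


def region_size_is_supported(size_bytes, xlen):
    """A region is usable only if it is at least as large as the granularity."""
    return size_bytes >= min_pmp_granularity_bytes(xlen)
```

An NA4 region is 4 bytes, so region_size_is_supported(4, 32) holds while region_size_is_supported(4, 64) does not. The smallest NAPOT region, napot_region_size(0b0), is 8 bytes and is usable on both.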

For Developers¶

This section describes the directory structure and other details for folks interested in hacking/modifying the core/generator scripts.

Directory Structure¶

c-class
 ├── bsvpath               # file listing all the directories containing relevant bsv files
 ├── CHANGELOG.rst         # contains the CHANGELOG of versions
 ├── configure             # contains the python configuration scripts
 ├── CONTRIBUTING.md       # guideline for making contributions
 ├── docs                  # all the documentation sources
 ├── LICENSE.*             # License files
 ├── Makefile              # makefile for compiling bsv files and linking using verilator
 ├── micro-arch-tests      # contains a variety of directed tests
 ├── README.md             # main doc readme
 ├── rename_translate.sh   # bash script for manipulating verilog files
 ├── requirements.txt      # list of all python packages required for configuring the core
 ├── sample_config         # sample yaml configuration files
 ├── src                   # contains bsv source code of the C-class core
 └── test_soc              # contains a sample test-bench for simulation purposes

Upgrading dependencies¶

The core and test-soc use modules which are available in different repositories. This list of repositories is maintained in configure/constants.py under the variable dependency_yaml. The configurator uses the repo-manager package to clone and patch all relevant dependencies.

Changing Compile arguments¶

The bsc and verilator commands, along with their arguments, are stored in the configure/constants.py file under the variables bsc_cmd and verilator_cmd respectively. These are directly used by the configurator to generate the makefile.inc file.

Adding Checks on YAML¶

The configurator also checks the legality of the input YAML, since not all configurations are legal. These checks are performed by the function specific_checks in the configure/configure.py file. New checks should be added only to this function.

CHANGELOG¶

This project adheres to Semantic Versioning.

[2.0.0] - 2022-12-08¶

  • Pipeline upgraded
  • Rtldump changed to match with newer spike
  • Updates made to use newer caches_mmu
  • FPU support added
  • Updates made to use newer devices
  • Configure scripts changes made to build csrbox and riscv-config and use them
  • Changes made to Debugger 1.00
  • Verification updates and ci fixes

[1.10.0] - 2022-10-19¶

  • Added UARTv2 changes
  • Modified requirements.txt to use recent aapg
  • Updated decoder to check for non-zero fs bits in mstatus for floating point instruction
  • Updated decoder to check for valid rounding mode
  • Fixed BPU to not give prediction at the start of fence operation.
  • Upstreamed verification and updated timeout in ci

[1.9.9] - 2020-11-03¶

  • Added c64, c32 design config yamls
  • Removed obsolete csrs for MTIME and MTIMEH

[1.9.8] - 2020-09-23¶

  • removed bram-2-bram paths from caches
  • fixed rg_fs implementation for mstatus csr.

[1.9.7] - 2020-07-03¶

  • license clean-ups

[1.9.6] - 2020-06-05¶

  • put pmp related logic under ifdef pmp in ccore.bsv
  • make the Addr_space configurable through YAML
  • update schema_file comments for better readability
  • reset value of mstatus.mie is 0 even if openocd is enabled.
  • minimal comments updated in stage0

[1.9.5] - 2020-05-13¶

  • removed the concept of extra history bits from gshare_fa
  • added historybits as a new parameter to indicate the size of bits used from the ghr for indexing.
  • reduced tick resolution in test_soc
  • updated the 2 bit counter increment scheme to account for hysteresis bit separately
  • updated the gshare hash function for improved collisions
  • updated repomanager to 1.2.0

[1.9.4] - 2020-04-30¶

  • parallel build using bluetcl is enabled
  • remove re-alignment of bytes in ccore for I$ and D$ reads. This now is handled within the caches
  • bumped version of the caches
  • gitignore updated
  • fixed and cleaned up the interrupt and delegation logic
  • adding pre-requisite checks in configure
  • default.yaml is picked up as default if no argument given to -ispec
  • split interface of seip and meip. Both can now be driven by plic independently. Also led to removal of unwanted attributes.

[1.9.3] - 2020-04-30¶

  • fixed reset logic handling in ccore.bsv to support reset by debugger.
  • updated SoC to decouple debug related logic into a separate module. This now allows for easy reset control.
  • the debug module in the test-soc is now always enabled irrespective of the debug being enabled or not
  • Fixed minor bug in Makefile when compiling for GDB sim.
  • moved debug loop and dtvec_base to 0x100

[1.9.2] - 2020-04-26¶

Fixed¶
  • [docs] move pip install requirements to building core section
  • [docs] fixed typos in simulation section and added dhrystone benchmarking method
  • updating verification repo version to avoid dirname error
Changed¶
  • renamed cclass to ccore at all instances

[1.9.1] - 2020-04-07¶

Fixed¶
  • when pmps are not implemented then return 0 instead
  • bug fixed in csr trap handler logic when only usertraps enabled without supervisor
  • enable openocd macros in configure and clean up performance counter macro generation
  • link verilator target for gdb compile fixed
  • exit ci for patch updates
  • adding missing supervisor and user macros in decoder to enable correct debug functionality
  • 32-bit default config updated to new schema
Changed¶
  • updated method and rule attributes related to csrs for cleaner compile
  • using SizedFIFO instead of LFIFO to avoid unwanted scheduling
Removed¶
  • removing old msb lsb files and replacing with a single file
  • adding sections in ci file

[1.9.0] - 2020-04-03¶

Added¶
  • pmp support fixed
  • pmp support enabled in config
  • adding iitm copyright in configure log
  • adding pmp support documentation
  • adding pipeline image in introduction
Changed¶
  • changed schema of warnings to be a list
  • defaulting to suppress all warnings
  • removing old storebuffer module
  • moving micro arch related chapters under a single micro-arch-notes chapter
Fixed¶
  • adding dummy arprot field to remove warning
  • rg_stall available only under multicycle macro
  • corrected conditions under which pmpcfg and pmpaddr can be written
  • fixed logic for pmp access permissions in decoder

[1.8.0] - 2020-04-01¶

Added¶
  • integration with optimized 1rw dcache and icache
  • support for ecc on both caches
  • support for dual-ported rams in dcache

[1.7.3] - 2020-03-24¶

Added¶
  • note to install and follow steps available on the original repositories for all external tools

[1.7.2] - 2020-03-23¶

Fixed¶
  • fixed steps for bsc install in quickstart

[1.7.1] - 2020-03-10¶

Fixed¶
  • Doc updates
  • Use v7.0.1 of the caches with new bram interfaces
  • Store being dropped in the commit stage should wait for the cache to be ready.

[1.7.0] - 2020-03-02¶

Changed¶
  • config file is now yaml based
  • docs moved to read-the-docs
  • restructured directories. base-sim is no longer present. All tests have been moved to micro-arch-tests.
  • LICENSE files have been upgraded
  • common_types.bsv renamed to cclass_types.bsv
  • common_params.bsv renamed to cclass_params.defines
  • removed unwanted ifdef simulate macros
  • Makefile has been updated to use the new configuration setup and use the open-bsc tool henceforth.
  • moved CHANGELOG to rst syntax
  • modifications to use the new 1rw dcache with better freq closure.
  • more comment updates in some modules
Added¶
  • Added a new python based configuration setup

[1.6.1] - 2019-11-21¶

Fixed¶
  • The indication of whether an instruction-page-fault was due to the lower-16 bits or the upper-16 bits has been fixed.

[1.6.0] - 2019-11-21¶

Fixed¶
  • upstream verification with virtual mode runs
  • updated ci

[1.5.0] - 2019-11-21¶

Added¶
  • added support for ITIM and DTIM
  • new csrs to define the address map of the ITIM and DTIM
  • directed tests for performance counters and Tightly-integrated memories
  • doc update for custom csrs of c-class done.
Fixed¶
  • interrupt mask when debugger is enabled has been fixed.

[1.4.2] - 2019-11-08¶

Added¶
  • macro for reset value of dtvec csr
  • updated doc and template with the macro

[1.4.1] - 2019-10-29¶

Fixed¶
  • Makefile to detect tools directory for artifacts release.

[1.4.0] - 2019-10-28¶

Added¶
  • support for WFI
  • support for illegal trapping when tvm, tw and tsr registers are set in supervisor mode
  • verilog artifacts now have rtldump support and logger support.
  • 256MBytes of BRAM for verilog artifact simulation
Fixed¶
  • made ADDR_SPACE as a variable in config file
  • fixed parameters for linux template
  • bumped verification version to 3.2.4
  • access to csr 0x321 and 0x322 now generates trap
  • bumping devices to 5.0.0 with new uart features.
  • fixed verilator setup for gdb as well
  • added suppresswarnings as part of the gitlab ci/cd

[1.3.6] - 2019-10-22¶

Added¶
  • Micro Arch ppt of the core pipeline.

[1.3.5] - 2019-10-16¶

Fixed¶
  • verification update for csmith path fix. Close #152

[1.3.4] - 2019-10-16¶

Fixed¶
  • Illegal instruction generation script. Close #151

[1.3.3] - 2019-10-08¶

Fixed¶
  • Illegal encodings were being treated as FCVT.D.S and FCVT.S.D. This has been fixed. Close #149

[1.3.2] - 2019-10-04¶

Fixed¶
  • Passing arith_en to FPU which enables arith_traps Close #147

[1.3.1] - 2019-10-04¶

Fixed¶
  • Traps for floating point ops with ARITH_TRAP enabled but disabled through csr no longer generates traps. Close #147

[1.3.0] - 2019-10-03¶

Added¶
  • bumped to caches with ECC support. Added corresponding hooks and details in readme as well.
Fixed¶
  • typos in readme fixed #138
  • improved verilator build speed.

[1.2.5] - 2019-10-01¶

Fixed¶
  • compile issues with arith_trap enabled fixed
  • decoding for WFI fixed.

[1.2.4] - 2019-09-28¶

Added¶
  • scripts and edits to collect coverage from verilator sim

[1.2.3] - 2019-09-27¶

Fixed¶
  • mie and mip widths fixed when compiling with debug mode enabled. refer to issue #144.

[1.2.2] - 2019-09-26¶

Changed¶
  • tracking cache misses instead of hits. refer to issue #143 for more info.
  • updated performance tests with encodings.

[1.2.1] - 2019-09-26¶

Fixed¶
  • fixed mm benchmark to print stats at end of program

[1.2.0] - 2019-09-26¶

Fixed¶
  • performance counter increment conditions and interrupt generation scheme. A counter will not increment if the respective interrupt has been set.
  • the last daisy-module instantiated should respond with true and data=0
  • fixed op-fwding bug mentioned in issue #140
  • decoding performance counters is fixed now. refer issue #141
Added¶
  • added tests and benchmarks for performance counters.
Removed¶
  • removed redundant epoch register and method from stage4

[1.1.1] - 2019-09-16¶

Fixed¶
  • ci-cd script fixed to delete all generated files

[1.1.0] - 2019-09-16¶

Added¶
  • CSRs are now daisy chained.
  • Performance counters and their event encodings added.
  • Interrupts for counters have also been added.
  • Increased default bram size in TB to be 32MB. This has increased regression time but now the same executable can be used for linux sim as well
Fixed¶
  • BRAM now uses only a single file: code.mem for read-only. MSB and LSB files no longer required.
  • Updated docs to reflect new additions and fixes made above.
  • renamed a few methods based on the coding guidelines.

[1.0.3] - 2019-09-10¶

Added¶
  • makefile now uses bsvpath to identify directories for bsv source. This makes using vim-bsv easier.

[1.0.2] - 2019-09-10¶

Fixed¶
  • rg_delayed_redirect register in stage0 should only be used when bpu and compressed both enabled.

[1.0.1] - 2019-09-09¶

Fixed¶
  • links to verilog artifacts in readme fixed.

[1.0.0] - 2019-09-09¶

Fixed¶
  • data types of ISBs has been split to keep logic minimal and optimize frequency closure
  • Logger is used in all submodules.
  • macros and configurable options have been fixed to be more precise and granular
  • stage0 or pc-fetch stage with fully-associative gshare has been fixed and tuned for higher frequency closure
  • ALU has been further optimized for better frequency closure
  • ISB types and operand forwarding tuned for better frequency closure.
  • overall changes to remove trailing white-spaces from all files.
  • version extraction based on CHANGELOG will be followed hence forth.
  • fpu convert from dp to sp roundup conditions fixed.
Added¶
  • decompressor function added in stage1
  • reset-pc can now be controlled by the SoC as an input without having to compromise on synthesis boundaries
  • retimed multiplier with configurable stages is used always.
  • different multiplier modules for evaluation have also been added.
  • fully-associative TLB support has also been added.
  • configuration support to suppress all warnings during bsv compile
  • CHANGELOG will be maintained from this release onwards.
Removed¶
  • bimodal bpu support has been removed for now since it needs to be re-structured based on new interfaces and also requires new verilog-bram models
  • gshare index model has also been removed along the same arguments as above.
  • support for variable cycle multiplier has also been removed as part of this release.


© Copyright IIT Madras Revision 50d3d293.
