
RegFile but with more write ports


 

Does anyone know of a library or approach that might fill this need? I'd like to be able to write to a register file multiple times in a cycle.

For clarity, I'm trying to do something like this:

import RegFile::*;
module regfile_write_ports (Empty);
    RegFile#(UInt#(1), Bool) rf <- mkRegFileFull();
    (* fire_when_enabled *)
    rule wr0;
        rf.upd(0, True);
    endrule
    (* fire_when_enabled *)
    rule wr1;
        rf.upd(1, True);
    endrule
endmodule

or even just this:

import RegFile::*;
module regfile_write_ports (Empty);
    RegFile#(UInt#(1), Bool) rf <- mkRegFileFull();
    rule wr;
        rf.upd(0, True);
        rf.upd(1, True);
    endrule
endmodule


 

The standard RegFile has only one write port (and 5 read ports), so it can support only one write per cycle.

If you want multiple writes in the same cycle to a register file (to disjoint register numbers), you'll have to code it up yourself, using a Vector of registers.

See the attached tar file for a flavor of how this can be done. There is just one source file, src_BSV/mkTop. 'make b_all' will compile, link and run (Bluesim).

The compiler has to be assured that we are not trying to write to the same register in the same clock. Either the register indexes should be statically distinguishable by the compiler, or the writes have to be in different rules so that only one will be scheduled. The code contains some commented-out parts illustrating this.
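For readers without the attachment, the statically-distinct-index case can be sketched roughly as follows. This is not the contents of the tar file; the module name mkVecRegDemo, the register count, and the cycle counter are made up for illustration.

```bsv
import Vector::*;

(* synthesize *)
module mkVecRegDemo (Empty);
    // A 4-entry "register file" built from individual registers.
    Vector#(4, Reg#(Bit#(32))) rf <- replicateM(mkReg(0));
    Reg#(Bit#(32)) cycle <- mkReg(0);

    // Two writes in one rule. The indices 0 and 1 are compile-time
    // constants, so the compiler can see they target different registers.
    rule wr_both;
        rf[0] <= cycle;
        rf[1] <= cycle + 100;
    endrule

    rule count;
        cycle <= cycle + 1;
    endrule

    // Reads see the values the registers held at the start of the clock.
    rule show;
        $display("%0d: rf[0]=%0d rf[1]=%0d", cycle, rf[0], rf[1]);
        if (cycle == 5) $finish(0);
    endrule
endmodule
```

If the two indices were run-time values instead, the compiler could no longer prove the writes are disjoint, which is where the "different rules" option described above comes in.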


 

Thank you very much for the reply! It's very helpful for a beginner like myself and it's good to know I've been thinking along the right tracks.

Because writes must be to distinct register indices, does this mean each register has just one write port still?


I tried the following and to my surprise it worked, outputting a stream of "2":

(* synthesize *)
module mkTop (Empty);
    MyRegFile#(4, Bit#(32)) my_rf <- mkMyRegFile_4_32;

    rule display;
        $display(my_rf.read_ports[0]);
    endrule

    (* fire_when_enabled *)
    rule wr1;
        my_rf.write_ports[0] <= 1;
    endrule

    (* fire_when_enabled *)
    rule wr2;
        my_rf.write_ports[0] <= 2;
    endrule
endmodule

I was expecting that I'd have to switch to CReg or ConfigReg or Ehr as the underlying register type for this to work.


I'm now considering making a module that can take a vector of (key, value) and writing it to its registers how it sees fit. :)


 

> Because writes must be to distinct register indices, does this mean
> each register has just one write port still?

Yes. Each register has only one write port. You'd have to use a CReg
if you need multiple write ports (with an ordering) on an individual register.

> I tried the following and to my surprise it worked, outputting a stream of "2":

Yes.

(1) The two writes conflict, which is ok if they are in different
rules (it just means that both rules cannot fire in the same
clock). The compiler can just choose rule 'wr2' over rule 'wr1'.

(2) In this particular case, the compiler did something further, which
would be evident if you added a $display to 'wr1' and 'wr2';
you'll find that both rules fire.

This is because the compiler knows an additional property of Reg,
which is that if, in the rule semantics, it is "written twice" in
the same clock, without an intervening observation (read), the
final state is equivalent to just doing the latter write, and it
can ignore the former write.

A similar situation might be: 'enq' a FIFO in one rule, and
'clear' in another rule without any intervening observation: this
is equivalent to just doing the 'clear', and ignoring the 'enq'.

But note that these are special extra properties of certain
primitives beyond the basic vocabulary of ordinary orderings and
conflicts, where we know that the compilation produces the correct
output w.r.t. the sequential rule semantics.

In this case, the compiler arbitrarily chose 'wr1' as the "former"
rule and 'wr2' as the latter rule. When compiling, you should have
seen a compiler message like this:

    Warning: "src_BSV/Top.bsv", line 50, column 8: (G0036)
      Rule "wr1" will appear to fire before "wr2" when both fire in the same clock
      cycle, affecting:
        calls to my_rf.write_ports_0__write vs. my_rf.write_ports_0__write

You could force the choice to go the other way by adding the attribute:

    (* execution_order = "wr2, wr1" *)

in front of the second rule, and you'll see a stream of '1's printed, instead of '2's.
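As a rough illustration of the CReg alternative mentioned at the top of this reply, here is a minimal sketch in which the ordering of the two writes is fixed by the port numbers rather than chosen by the compiler. The module name mkCRegDemo and the cycle counter are invented for the example, and it assumes mkCReg is available from the Prelude, as in recent bsc releases.

```bsv
(* synthesize *)
module mkCRegDemo (Empty);
    // A concurrent register with two ports. Within a clock, port 0 is
    // logically ordered before port 1.
    Reg#(Bit#(32)) r[2] <- mkCReg(2, 0);
    Reg#(Bit#(4)) cycle <- mkReg(0);

    rule wr1;
        r[0] <= 1;
    endrule

    rule wr2;
        r[1] <= 2;   // logically after wr1, so this is the value kept
    endrule

    rule display;
        // A port 0 read returns the value held at the start of the clock.
        $display("r = %0d", r[0]);
        cycle <= cycle + 1;
        if (cycle == 5) $finish(0);
    endrule
endmodule
```

Swapping the port indices used by wr1 and wr2 would swap which write wins, without needing an execution_order attribute.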


 

Ah brilliant, thank you so much.

I thought I was being clever by using (* fire_when_enabled *) but it seems the compiler is even smarter.

My solution has ended up as a vector of registers with a vector of RWires for any writes to them. Then I have a single rule that reads all the registers to a "local" copy, modifies them based on the RWires, and writes them back.
Unfortunately it won't compile up to the size I am using. It takes about an hour on my machine before the Linux kernel terminates the process for using too much memory (20GiB).
So far it has worked for 2^8 entries (each of 2 bits) but not 2^12 entries (which RegFile had no problem with).


 

> Unfortunately it won't compile up to the size I am using.

Yes, I think the compiler may be building a very deep if-then-else
tree and might have a hard time analysing it.

There may be another way to do it: instead of one if-then-else the
size of the regfile (NRgs), you could transform it into multiple
if-then-elses, each the size of the number of simultaneous writes
(NRws) you're trying to support.

Generate NRgs rules, one per register. In each rule, analyse the NRws
rwires to decide how to update just that register.


An alternative solution is to mimic a 'store buffer' from CPUs,
assuming there is some 'slack' in the use of the register file, i.e.,
some idle cycles where no writes are attempted.

(A) Use an ordinary RegFile, plus a vector (a queue of pending writes).
If there is just one write in a clock, write it to the RegFile.

If there is >1 write in a clock, write one of them to the RegFile and
enqueue the others as 'pending'.

In "idle cycles" (no writes), dequeue and perform the pending writes
into the RegFile, one by one.

Note: for every read to reg J, you'd have to check the pending queue
for the most recent write to reg J, if any, and read the RegFile if none.


(B) At the expense of doubling the register storage, use two RegFiles
RF1 and RF2, where RF2 is used as the 'pending queue' described above,
and copied to RF1 on idle cycles. This avoids 'searching' the pending
queue for more recent writes; it's a direct access to RF2 (you'll need a 'valid'
bit on each RF2 register to know whether it has a pending write or not).


 

Note that vectors of a large number of registers can result in large
multiplexers which can be very expensive in hardware.

If that is a problem, you may want to move to using a BRAM (Block RAM).
The multiplexers still exist, but now they're carefully engineered,
built-in, inside the BRAM logic, rather than being built out of random
logic as in vectors of registers. (On FPGAs, RegFiles also use a kind
of RAM called a LUTRAM.)

The bsc library has BRAMs with two ports.

Unlike RegFile, BRAMs don't give you a combinational (0-cycle) read;
there's a minimum of 1-tick latency for a read.

If you need more write-ports, perhaps the 'pending writes' buffer
ideas sketched in earlier messages may be useful.
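
To give a flavor of the two-port BRAM from the bsc library, here is a small test-bench sketch. The names mkBramDemo and makeReq are invented for this example, and the 12-bit address and 2-bit data widths are only borrowed from the sizes mentioned earlier in the thread. It assumes the default BRAM_Configure settings.

```bsv
import BRAM::*;
import GetPut::*;
import ClientServer::*;
import DefaultValue::*;
import StmtFSM::*;

// Hypothetical helper: builds a request for a 2^12 x 2-bit memory.
function BRAMRequest#(Bit#(12), Bit#(2)) makeReq(Bool wr, Bit#(12) addr, Bit#(2) data);
    return BRAMRequest {
        write: wr,
        responseOnWrite: False,   // writes produce no response
        address: addr,
        datain: data
    };
endfunction

(* synthesize *)
module mkBramDemo (Empty);
    BRAM_Configure cfg = defaultValue;
    BRAM2Port#(Bit#(12), Bit#(2)) bram <- mkBRAM2Server(cfg);

    Stmt test = seq
        // Two writes issued in the same clock, one per port,
        // to different addresses.
        action
            bram.portA.request.put(makeReq(True, 5, 2'b01));
            bram.portB.request.put(makeReq(True, 9, 2'b10));
        endaction
        // Issue two reads. The data is not available in this clock.
        action
            bram.portA.request.put(makeReq(False, 5, 0));
            bram.portB.request.put(makeReq(False, 9, 0));
        endaction
        // Collect the read responses at least one clock later.
        action
            let a <- bram.portA.response.get;
            let b <- bram.portB.response.get;
            $display("addr 5 = %b, addr 9 = %b", a, b);
        endaction
    endseq;

    mkAutoFSM(test);
endmodule
```

The request/response split is what makes the read latency visible: the read is requested in one action and its result is picked up in a later one, unlike the same-cycle rf.sub(...) of a RegFile.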


 

The first solution is exactly what I needed and it works now! Thank you very much. I'm curious whether the generated hardware with the single-rule approach would have been any good or not, i.e. is this just an issue during compilation?

In case it's of use to anyone reading this in the future, this is the method I used. I'm sure it could be turned into a module with a proper constructor.

Vector#(TExp#(SizeOf#(Index)), Reg#(MyType)) myRegFile <- replicateM(mkRegU);
Vector#(NumWriters, RWire#(Tuple2#(Index, MyType))) myRegFileWriters <- replicateM(mkRWire);

Instead of writing to myRegFile, you instead do myRegFileWriters[writer].wset(tuple2(index, value)). These are then written in these rules:

    // One rule per register. An Integer loop variable covers every index,
    // including the last one (a loop over Index with `i < maxBound` would skip it).
    for (Integer i = 0; i < valueOf(TExp#(SizeOf#(Index))); i = i + 1)
        (* fire_when_enabled *)
        rule writeMyRegFile_singleReg;
            Bool done = False;
            // The lowest-numbered writer that targets register i wins.
            for (Integer w = 0; w < valueOf(NumWriters) && !done; w = w + 1)
                if (myRegFileWriters[w].wget() matches tagged Valid {.writeIndex, .value})
                    // I don't think I can pattern match an Index.
                    // It would be nice to have `matches tagged Valid {i, .value}`.
                    if (writeIndex == fromInteger(i)) begin
                        done = True;
                        writeReg(myRegFile[i], value);
                    end
        endrule


    // This commented-out version does not work for large structures.
    // (* fire_when_enabled *)
    // rule writeMyRegFile;
    //     // For some reason type information must be explicit here (can't use `let`).
    //     Vector#(TExp#(SizeOf#(Index)), MyType) localMyRegFile = readVReg(myRegFile);
    //     for (Integer i = 0; i < valueOf(NumWriters); i = i + 1)
    //         if (myRegFileWriters[i].wget() matches tagged Valid .write) begin
    //             let index = tpl_1(write);
    //             let value = tpl_2(write);
    //             localMyRegFile[index] = value;
    //         end
    //     writeVReg(myRegFile, localMyRegFile);
    // endrule



P.S. I've just seen your most recent reply while writing this. My project is only using simulation but it's interesting to see how it properly maps to different hardware.