combinatorial explosion resulting in 20 minute compile time and 500MB verilog file.
Okay, so clearly I'm not doing things "the intended way"; otherwise I wouldn't be getting a Verilog file with 1.9 million $write statements for what was supposed to be just a simple unit test for this helper function I wrote.

The backstory: I set out to implement a hobby RISC-V core/SoC because, well, who wouldn't? Got a simple clock-per-stage starting point with no pipelining and just a couple of instructions going, and had it printing the PC and current instruction in hex. But looking at the instruction in hex isn't very enlightening, so I wrote a simple disassemble helper function so I could compare what it was reporting with the assembly I was feeding into it. At first it worked okay, but as I added more and more instruction coverage to the disassembler, my compile times started climbing.

In trying to figure out why, I had the idea of compiling to Verilog (I was just using Bluesim) and looking at the output. 1.9 million $write statements in a 500MB file! And that is after I replaced my function to turn CSR indices into names with just a hex number, which dropped the compile time from 20 minutes to 5-ish minutes. Looking at that file, it is pretty obvious that I've gone well outside the expected usage patterns. (Probably because I'm mostly a software guy, even though I spent 5-ish years doing FPGA work in my previous job.)

Here is what appears to be going on:

I've got a helper function for turning register indices into ABI names:

function String reg_name(RegIdx r);
   Vector#(32, String) names = vec(
      "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
      "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
      "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
      "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6");
   return names[r];
endfunction

In my disassemble function, I call that 3 times with each of the rd, rs1, and rs2 instruction fields:

   let rd = reg_name(inst_rd(inst));
   let rs1 = reg_name(inst_rs1(inst));
   let rs2 = reg_name(inst_rs2(inst));

And then I have a big case-matches statement leveraging don't-care bits to pick off each instruction. Excerpt:

      'b0000000_?????_?????_000_?????_0110011: $format("add       ", rd, c, rs1, c, rs2);
      'b0000001_?????_?????_000_?????_0110011: $format("mul       ", rd, c, rs1, c, rs2);
      'b0100000_?????_00000_000_?????_0110011: $format("neg       ", rd, c, rs2);
      'b0100000_?????_?????_000_?????_0110011: $format("sub       ", rd, c, rs1, c, rs2);
      'b0000000_?????_?????_001_?????_0110011: $format("sll       ", rd, c, rs1, c, rs2);
      'b0000001_?????_?????_001_?????_0110011: $format("mulh      ", rd, c, rs1, c, rs2);
      'b0000000_00000_?????_010_?????_0110011: $format("sltz      ", rd, c, rs1);
      'b0000000_?????_00000_010_?????_0110011: $format("sgtz      ", rd, c, rs2);
      'b0000000_?????_?????_010_?????_0110011: $format("slt       ", rd, c, rs1, c, rs2);
      'b0000001_?????_?????_010_?????_0110011: $format("mulhsu    ", rd, c, rs1, c, rs2);
      'b0000000_?????_00000_011_?????_0110011: $format("snez      ", rd, c, rs2);

(I've also got let c = ", "; in scope to cut down on the number of quotes in each $format call.)

The resultant Verilog has this case statement represented as a large if-then-else tree where it works its way through the various fields inherent in matching against the don't-cares. But then it keeps going and adds a sequence of 32 if-then-elses for each possible value of rd and inlines the expansion of reg_name into the final string. And for each of those 32 possibilities, it then runs through the 32 possible values for rs1. And for each of those rd/rs1 combinations, it (you guessed it!) fills in all 32 possible rs2 cases. All told, 32768 $write statements just to cover the "add" instruction, and 1.9 million for what I have covered (rv32ima if you are curious).
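
The arithmetic behind those numbers can be checked with a few lines of C. This is only a sketch of the counting argument, assuming each register-name field that reaches a $format call multiplies the number of leaf statements by 32; `leaf_writes` is a name made up for illustration and nothing here is taken from bsc itself.

```c
#include <stdint.h>

/* Number of leaf $write statements for one case arm, ASSUMING every
 * register-name field inlined into that arm multiplies the count by 32
 * (one branch per possible 5-bit register index). */
uint64_t leaf_writes(unsigned reg_fields)
{
    uint64_t n = 1;
    for (unsigned i = 0; i < reg_fields; i++)
        n *= 32;
    return n;
}
```

`leaf_writes(3)` is 32768, matching the count for "add", so 1.9 million statements is the equivalent of roughly 58 such three-register arms.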

I tried adding some (* noinline *) attributes, but it won't let me because String and Fmt don't implement Bits and it will only noinline functions that can have their arguments/results represented as wires.

The two possible alternatives I've thought of are to either explore the call-out-to-C possibilities (haven't tried anything in that area yet) or to implement a family of string utilities built using fixed-size arrays of characters (something like Vector#(maxlen, Bit#(8)) with trailing nuls when the buffer isn't full, or Tuple2#(Bit#(TLog#(TAdd#(maxlen,1))), Vector#(maxlen, Bit#(8)))) that could be marked noinline.

So my question for the list: how should I be approaching a disassembly function that is only going to be used in debug/log output? Or is there a way to convince bsc to not be so aggressive with the inlining even though the functions in question aren't Verilog port friendly?

-William

Sorry you ran into that. I suspect some of it might be due to a bug in the handling of $format inside BSC. It might be good to have a small example to debug with.

I can suggest a few work-arounds. Although it might help to know specifically how the result of the case-matches statement with $format is being used -- is it an argument that's passed to $display, for example? An executable example would help.

One thing you can try is to not use $format. Instead of constructing one large Fmt and passing it to $display, you could instead have several individual calls to $write. (If your code has a function that returns a Fmt, write that instead as a function that returns an Action.)

(On the other hand, I would be curious whether changing "reg_name" to return Fmt instead of String might help. But that's my brain trying to debug what might be wrong in the handling of $format, and may not lead to a workaround.)

BSC does support an attribute "noinline" which can be placed on a function. This synthesizes a module implementing that function through ports, and each call to the function becomes a separate instantiation of that module. That's not exactly what you need here, though -- you'd want BSC to create a Verilog function, which can be called in multiple places. Unfortunately, that's not a feature that's available at the moment. (FYI, a "noinline" function is limited to types that are allowed for synthesized interfaces, and I don't think that String is an allowed type.)

Since it's only for simulation, the most efficient workaround is probably to write C code that prints the message. This can be imported using the import-BDPI syntax, as a function with Action return type (so that it executes in proper order relative to other $display and $write actions). This will guarantee that the function doesn't inline into the BSC-generated code. It will work in Bluesim and in Verilog simulation -- for Verilog, BSC's default will be to use VPI, but you can provide the "-use-dpi" flag to use the more efficient DPI, if your simulator supports that.

Hope that helps.

Julie

On Mon, Mar 3, 2025 at 8:15 PM William Howe-Lott wrote:

I didn't put a lot of work into making it minimal, but the code at the end of this message produces a file with 35,938 $write statements when compiled to Verilog.

I independently came up with a work-around that overlaps with this idea. I wrote a helper function that returns a 9-element bit-mask with one bit per possible field. I mark that function as noinline, which now works because I'm returning a type that is in Bits instead of String or Fmt. My main disassembly entry point then calls that function and has a sequence of "if bit set, print that field" statements that no longer cascade. It does expand the 32-way case in reg_name 3 times, once for each reg field. But 3*32 is a lot less than 32^3.
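
The shape of that work-around can be sketched in C. This is a software analogy rather than the BSV from the thread: the mask bits, `field_mask`, and `format_fields` are hypothetical names, only three of the nine fields are modelled, the decode is a stand-in that recognises a single opcode, and registers are printed as plain `xN` numbers to keep it short.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mask bits: one per printable field (three shown). */
enum { FIELD_RD = 1u << 0, FIELD_RS1 = 1u << 1, FIELD_RS2 = 1u << 2 };

/* Step 1: decide only WHICH fields get printed. The result is plain
 * bits, which is what lets the BSV counterpart be marked noinline.
 * Stand-in decode: every R-type opcode (0110011) prints all three. */
unsigned field_mask(uint32_t inst)
{
    if ((inst & 0x7f) == 0x33)
        return FIELD_RD | FIELD_RS1 | FIELD_RS2;
    return 0;
}

/* Append one register as "xN", comma-separated after the first. */
static size_t append_reg(char *buf, size_t len, size_t used, unsigned r)
{
    if (used >= len)
        return used;
    int n = snprintf(buf + used, len - used, "%sx%u", used ? ", " : "", r);
    return n < 0 ? used : used + (size_t)n;
}

/* Step 2: one independent "if bit set, print that field" per field,
 * so the register lookups sit side by side instead of nesting. */
void format_fields(uint32_t inst, char *buf, size_t len)
{
    unsigned mask = field_mask(inst);
    size_t used = 0;

    if (len > 0)
        buf[0] = '\0';
    if (mask & FIELD_RD)
        used = append_reg(buf, len, used, (inst >> 7) & 0x1f);
    if (mask & FIELD_RS1)
        used = append_reg(buf, len, used, (inst >> 15) & 0x1f);
    if (mask & FIELD_RS2)
        used = append_reg(buf, len, used, (inst >> 20) & 0x1f);
}
```

For the encoding of add x3, x1, x2 (0x002081B3), `format_fields` fills the buffer with "x3, x1, x2". The three `if` statements are sequential, which is where the 3*32 instead of 32^3 comes from.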

I tried both before my original post and didn't notice a difference.

But even though I figured out a work-around, I'm really thinking that I just need to bite the bullet and figure out the VPI/DPI stuff. I've never used either and I'm limited to open source tools so there will be a bit of a learning curve.

Oh, and thank you for your prompt response!

-William

--- cut ---

package CombinatorialExplosion;

import Vector::*;
import BuildVector::*;

typedef Bit#(5) RegIdx;

function String reg_name(RegIdx r);
   Vector#(32, String) names = vec(
      "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
      "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
      "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
      "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6");
   return names[r];
endfunction

function Fmt disassem_proxy(Bit#(32) inst);
   let rd = reg_name(inst[4:0]);
   let rs1 = reg_name(inst[9:5]);
   let rs2 = reg_name(inst[14:10]);
   return case (inst) matches
             32'b???????????????000_?????_?????_?????: $format("aaa");
             32'b???????????????001_?????_?????_?????: $format("bbb ", rd);
             32'b???????????????010_?????_?????_?????: $format("ccc ", rs1);
             32'b???????????????011_?????_?????_?????: $format("ddd ", rd, " ", rs1);
             32'b???????????????100_?????_?????_?????: $format("eee ", rs2);
             32'b???????????????101_?????_?????_?????: $format("fff ", rd, " ", rs2);
             32'b???????????????110_?????_?????_?????: $format("ggg ", rs1, " ", rs2);
             32'b???????????????111_?????_?????_?????: $format("hhh ", rd, " ", rs1, " ", rs2);
          endcase;
endfunction

(* synthesize *)
module mkCombinatorialExplosion();
   Reg#(Bit#(32)) inst <- mkReg(1);

   rule print if (inst != 0);
      $display(disassem_proxy(inst));
      inst <= inst << 1;
   endrule

   rule finish if (inst == 0);
      $finish();
   endrule
endmodule

endpackage


Cool, thanks!

You shouldn't need to know anything about VPI or DPI -- that should be invisible to you, particularly if you're able to use BSC to do the linking step. You just declare a C function in BSV -- for example:
And then write a C function with the associated interface:
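
The example code from this message is not preserved here. As a stand-in, this is a minimal sketch of what such a pair could look like, not the code from the original post. The name `disasm_print` and its BSV declaration (shown in the comment) are assumptions, and the mapping of `Bit#(32)` to `unsigned int` and of an `Action` result to a `void` return should be checked against the BSC documentation. Only "add" is decoded; everything else prints as a raw word.

```c
#include <stdio.h>

/* BSV side (assumed declaration, not taken from the thread):
 *   import "BDPI" function Action disasm_print(Bit#(32) inst);
 */

static const char *const abi_names[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

/* Plain table lookup on the low 5 bits; because it lives in C it stays
 * a run-time lookup and is never expanded into the generated code. */
const char *abi_reg_name(unsigned idx)
{
    return abi_names[idx & 0x1f];
}

/* Hypothetical entry point called from BSV as an Action. */
void disasm_print(unsigned int inst)
{
    const char *rd  = abi_reg_name(inst >> 7);
    const char *rs1 = abi_reg_name(inst >> 15);
    const char *rs2 = abi_reg_name(inst >> 20);

    /* funct7 = 0000000, funct3 = 000, opcode = 0110011 */
    if ((inst & 0xfe00707fu) == 0x00000033u)
        printf("add       %s, %s, %s\n", rd, rs1, rs2);
    else
        printf(".word     0x%08x\n", inst);

    /* Flush so the text is not held back in a C-side buffer; how it
     * interleaves with simulator output is simulator-dependent. */
    fflush(stdout);
}
```

On the BSV side, a rule would then call `disasm_print(inst);` where it previously passed a Fmt to $display.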
And then provide the C file (or object file) to BSC, and it will invoke the simulator in the right way. For example, for Verilator (open source simulator):
The "bsc -e" command invokes a script in "bsc/src/exec/" called "bsc_build_vsim_verilator" which you can look in to see how it invokes the tool, if you need to run the tool yourself. If you're using a different Verilog simulator, it would be similar. If your simulator supports DPI, that will be simplest, since you just need to provide the C code -- it's a direct interface with the simulator. (Verilog's VPI has a more complicated interface, so BSC has to generate additional wrapper files and lookup table files, depending on the simulator, and then you need to provide those files to the simulator, but the "bsc -e" command can hide all that from you.)

J