B-Lang-Discuss
Re: Can't get my provisos to discharge.
Thank you for the detailed response! It has greatly enhanced my understanding of how BSC actually leverages provisos. The key bit was this:

I had been assuming something much more magic: that it would collect all the provisos and the constraints imposed by type checking and solve them simultaneously. (At some time in the past, I tried implementing such an SMT-enhanced type checker but never figured out how I could get useful error messages out of its "unsat" result.) If I'm interpreting what you said correctly, BSC just works incrementally, one proviso at a time, to solve for unknown type variables, and uses the SMT engine only to verify that polymorphic instantiations are possible with the provisos in the context; it doesn't use the SMT engine to solve round-about inter-dependencies between multiple provisos.
My bus is dbytes wide. An individual transaction can utilize 1..dbytes (inclusive) bytes. That gives us the first TLog, and the +1 to cover the inclusive upper bound. But I only allow power-of-two transactions, so size holds the log2 of the number of bytes--hence the second TLog. The real design from which I extracted this example has a 128-bit memory bus and a 32-bit peripheral bus. For the 128-bit bus, size can be log2(1)..log2(16), i.e. 0..4 (inclusive), and needs log2(4+1)=3 bits. Similarly, the 32-bit bus can have sizes from log2(1)..log2(4), i.e. 0..2, and needs 2 bits. (Same as AXI's size field.)
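Spelled out in types, the width computation above might be sketched like this (an illustrative typedef, not from the original code):

```
// A transaction uses 1..dbytes bytes (powers of two only), so
// `size` ranges over 0..log2(dbytes): that's TLog#(dbytes)+1
// distinct values, needing TLog#(TAdd#(TLog#(dbytes),1)) bits.
// E.g. dbytes=16: TLog#(16)=4, five values 0..4, so Bit#(3).
typedef Bit #(TLog #(TAdd #(TLog #(dbytes), 1))) Size #(numeric type dbytes);
```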
It is indeed exactly as you conjectured. I was using log2_wide to determine the number of low bits in the address to use when selecting the relevant byte lanes. I was surprised that BSC was mixing address constraints with data constraints, but when I looked at the code that BSC referenced (something like addr[log2_wide-1:0]), it made sense.
This was a direct result of assigning truncate(wide_size) to narrow_size. If the transaction on the wide bus fits on the narrow bus, the size gets forwarded unchanged. If the transaction doesn't fit, it gets split into size_in_bytes/dbytes_narrow transactions. The second and third terms of that Add#() are exactly the argument and result types of that truncate. I did try some provisos along the lines of Add#(dbytes_narrow, ___, dbytes_wide)--which I'm pretty sure implies the more complicated proviso--but BSC wasn't able to leverage them to discharge the truncate requirements. (And indeed, I never blindly copy provisos from BSC output, because I've had way too many cases where my actual problem was a typo (e.g. bytes vs dbytes) in a type variable, and BSC's suggestions were based on the unwarranted assumption that the preceding provisos were sound.)
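The constraint that truncate imposes can be sketched as follows (function and type-variable names are illustrative, not from the original code):

```
// Truncating a wide size field to a narrow one requires an Add#()
// proviso relating the widths: Add#(k, n, m) says k + n == m,
// i.e. the narrow result width n cannot exceed the argument width m.
function Bit #(n) forwardSize (Bit #(m) wide_size)
   provisos (Add #(k, n, m));
   return truncate (wide_size);
endfunction
```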
So that is where it was hiding! I knew I had seen it somewhere, but since it was easy enough to replicate I didn't look very hard. And yeah, I wasn't happy with the sloppiness/hackiness of just disabling the static type safety of the original.
Thank you again for your response.
-William
|
Re: Coding style: Get/Put vs. Client/Server for pipelines?
Thank you for the comprehensive reply! And for the pointer to the RISC-V design course; I'd been meaning to read it in more detail but forgot about it.

Your reasoning makes a lot of sense to me. In particular, at the current stage of design the pipeline is still linear, but further down the road it's going to have at least splits and joins, and maybe also some backwards flow. I hadn't considered that with just the code in front of me right now, but that would be very awkward to materialize as clients and servers.

Taking the Fife pipeline as a guide, I can already see how this structure is going to improve the readability of my tiling engine. Its current Server-based architecture forces me to mix the equivalent of the fetch and decode stages together, which creates quite a tangle and effectively forces the module to have a little state machine to keep track of things. This code should really be broken into 2 or 3 smaller pipeline stages, but I couldn't see that when trying to shoehorn everything into a Server interface.

Thank you!

- David

On Sun, Apr 6, 2025, at 13:51, Rishiyur Nikhil via groups.io wrote:
|
Re: Coding style: Get/Put vs. Client/Server for pipelines?
You may want to look at the Fife 5-stage RISC-V pipeline in:
    https://github.com/rsnikhil/Learn_Bluespec_and_RISCV_Design
In 'Code/src_Fife/' you will find the 5 CPU pipeline stages
S1_Fetch.bsv through S5_Retire.bsv, each with its own stage interface,
and a top-level CPU.bsv that instantiates the stages and connects
them.
* In a linear pipeline I prefer to keep separate 'Get' and 'Put'
    sub-interfaces instead of combining them into a Client/Server,
    because I don't think Client/Server terminology is appropriate
    here. I think of
        Client    as sending requests and receiving responses
        Server    as receiving requests and sending responses
    neither of which is the case in a linear pipeline, where one is
    receiving something (not a 'request') from upstream and sending
    something (not a 'response') to downstream.
    Further, we never 'mkConnection' a Get/Put of one module with a
    Put/Get of another, as in the usual Client/Server connection: we
    connect a Put of the module with an upstream Get, and a Get of the
    module with a downstream Put.
* Instead of 'Get' and 'Put' I use 'Semi_FIFOF_O' and 'Semi_FIFOF_I',
    respectively, which are essentially the 'first/deq/notEmpty' and
    'enq/notFull' sides of a FIFOF interface. I prefer these over
    'Get/Put' because (a) I can test for notEmpty/notFull and (b) I
    can non-destructively examine 'first' in a rule condition.
    These interfaces, and their 'mkConnection', can be found in:

        https://github.com/B-Lang-org/bsc-contrib

    at 'Libraries/Misc/Semi_FIFOF.bsv'
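The shape of these half-interfaces is roughly the following (a sketch using the names from this message; see Semi_FIFOF.bsv for the exact identifiers and the accompanying Connectable instances):

```
// Output side of a FIFOF: what a downstream consumer sees
interface Semi_FIFOF_O #(type t);
   method t      first;
   method Action deq;
   method Bool   notEmpty;
endinterface

// Input side of a FIFOF: what an upstream producer sees
interface Semi_FIFOF_I #(type t);
   method Action enq (t x);
   method Bool   notFull;
endinterface
```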
* Each stage interface is more than just one Semi_FIFOF_I and one Semi_FIFOF_O:
    * Each stage interface may also have an 'init' method (invoked from the top-level CPU.bsv)
    * Some stages have more than one-in and/or more than one-out. E.g.,

       * S1_Fetch has one-in in the reverse direction from S5_Retire
         for redirections (traps and correcting branch mispredictions),
         and two-outs, one to S2_Decode and one to the ICache.

       * S2_Decode has one-in from S1_Fetch, one-in from the
         ICache, and only one-out to S3_RR_RW.

       * S3_RR_RW has one-in from S2_Decode, many outs to
         different execution pipelines, and one-in in the reverse
         direction from S5_Retire to update the register file.

       * S5_Retire has many ins from the different execution pipes,
         one out in the reverse direction to S1_Fetch (traps and redirections),
         and one out in the reverse direction to S3 (update register file).

    Thus, the overall pipeline is a more complex graph than a single
    straight-line one-in, one-out pipe.
    The 'mkConnections' in CPU.bsv establish this graph.
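Connecting two such stages at the top level might be sketched as follows (illustrative stage, module, and payload names, not the actual Fife code; assumes the Semi_FIFOF package from bsc-contrib):

```
import Connectable :: *;
import Semi_FIFOF  :: *;

typedef Bit #(32) Fetch_Out;   // placeholder payload type

// Hypothetical linear-pipeline stage interfaces
interface Stage1_IFC;
   method Action init;                          // invoked from CPU.bsv
   interface Semi_FIFOF_O #(Fetch_Out) f_out;   // to downstream
endinterface

interface Stage2_IFC;
   method Action init;
   interface Semi_FIFOF_I #(Fetch_Out) f_in;    // from upstream
endinterface

module mkCPU (Empty);
   Stage1_IFC s1 <- mkStage1;   // mkStage1/mkStage2 assumed defined
   Stage2_IFC s2 <- mkStage2;

   // Semi_FIFOF provides a Connectable instance for the O-side/I-side
   // pair, so mkConnection links the stages directly:
   mkConnection (s1.f_out, s2.f_in);
endmodule
```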
|
Coding style: Get/Put vs. Client/Server for pipelines?
(I appreciate that this is a subjective question and the general answer is "it depends", but gut wisdom and opinions from experienced Bluespec programmers would be very welcome!)

I'm writing a modest graphics core in BSV, with various moving parts: tile renderers, a sprite engine, a compositor, pixel drivers for various output protocols, and so on. I have a bunch of components that take some data in and produce data out: obvious ones like memory ports, and some domain-specific things like a palette engine (takes in a stream of palette indices, does lookups in palette RAM, outputs R/G/B color signals).

Most of these components have both Get and Put interfaces. Some are obviously server-shaped (e.g. memory ports), whereas others represent part of a pipeline in which an upstream module connects to Put, and a downstream module connects to Get. How would you structure the APIs for modules that are expected to be pipelined in a chain? So far in my code, I've hesitated between two approaches:
I believe both of these approaches can be made to work, and can produce equivalent RTL when synthesized. Should I prefer one approach over the other? Or is there another, better structure that I'm missing?

I'm still relatively inexperienced in Bluespec, so I'm particularly interested in hard lessons learned, where it turns out a particular structure leads to headaches later, or conversely a particular structure results in more readable and maintainable code down the road.

Thanks,

- Dave
|
Re: Inquiry into Bluespec Scheduler Granularity Capabilities: Rule vs. Case-Arm Level
There may be other state that the rules are using, which enforces an order. But if the rules can equally execute in either order, then BSC will arbitrarily pick an order, and will report a warning about that choice:
You can enforce a particular order by specifying an "execution_order" attribute. Alternatively, you can instantiate 'mkCReg' instead of 'mkReg', assign one rule port 0 and the other rule port 1, and enforce an order that way. If you have a rule whose writes are shadowed by other rules, BSC will also warn about that:
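The two techniques just mentioned might be sketched like this (a minimal illustrative module, not from the original design):

```
(* synthesize *)
module mkOrderDemo (Empty);
   // Technique 1: an attribute that pins the execution order
   Reg #(Bit #(8)) r <- mkReg (0);

   (* execution_order = "rule_first, rule_second" *)
   rule rule_first;
      r <= 1;
   endrule

   rule rule_second;
      r <= 2;   // executes second; its write shadows rule_first's
   endrule

   // Technique 2: a CReg, where port 0 is ordered before port 1
   Reg #(Bit #(8)) cr [2] <- mkCReg (2, 0);

   rule rule_a;
      cr[0] <= cr[0] + 1;   // sees the start-of-cycle value
   endrule

   rule rule_b;
      cr[1] <= cr[1] + 1;   // sees rule_a's update, if it fired
   endrule
endmodule
```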
J

On Sat, Apr 5, 2025 at 3:18 AM Yehowshua Immanuel via <programmed4jesus=[email protected]> wrote:
|
Re: Behavior of `mkPipelineFIFOF`
As Nikhil said, there is a conceptual sequence of rules within a clock cycle, with a rule calling 'deq' coming first, followed by a rule calling 'enq' (for 'mkPipelineFIFOF'). That means that there are three possible points at which the 'notFull' and 'notEmpty' values can be examined: before either rule, between the rules, or after both rules. In theory, we could define a FIFOF interface that contains all three versions (call them, say, 'notFull0', 'notFull1', 'notFull2'). But the FIFOF interface has only one method 'notFull', so we have to pick one of the three meanings. If your design needs the value at a different location in the execution schedule, then you'd have to write your own version.

The reason that 'enq' can happen after 'deq' for 'mkPipelineFIFOF' is that the RDY signal for 'enq' has a combinational path coming from the EN signal for 'deq'. That means that the RDY for 'enq' can't be consulted (and thus the decision to execute the second rule can't be made) until after execution of the first rule containing the 'deq'. An obvious choice for the 'notEmpty' method is to define it the same as the RDY signal for 'enq' -- that would force any rule using 'notEmpty' to schedule or execute after the rule with 'deq' executes first. Of course, 'notEmpty' could be defined differently from the RDY, if a design needed to know the value before either method or after both methods.

Some of this is discussed in GitHub Issue #446.

J |
Re: Behavior of `mkPipelineFIFOF`
It's better to think first in terms of 'before' and 'after' each rule r1 containing an 'enq' or 'deq', and only then in terms of clocks, because the changed state from r1 can be visible to another rule r2 scheduled later in the same clock (if it is possible to schedule such a rule r2).
The attached figure shows two examples: a mkPipelineFIFOF and a mkBypassFIFOF in which the 'enq' and 'deq' are scheduled in two rules in the same clock. In the rule semantics (conceptual) you can see that notEmpty and notFull change multiple times in the clock.
For mkPipelineFIFOF, 'deq' < 'enq', so it is possible to schedule two rules as shown in its diagram.
For mkBypassFIFOF, 'enq' < 'deq', so it is possible to schedule two rules as shown in its diagram.
Of course, if it is not possible (due to method scheduling constraints) that an r2 can be scheduled in the same clock, the change will only be visible at the next clock.
|
Behavior of `mkPipelineFIFOF`
It seems that enqueueing into an empty mkPipelineFIFOF in cycle `n` won't be reflected in its `notEmpty` signal until the beginning of cycle `n+1`? Likewise, if you dequeue a value from a PipelineFIFOF currently holding a single element, the `notEmpty` signal won't change from True to False until the beginning of the next cycle?
I suppose the same sort of idea applies to `notFull`? |
Re: Inquiry into Bluespec Scheduler Granularity Capabilities: Rule vs. Case-Arm Level
Ok... This is quite fascinating and makes sense... I suppose if we evaluate a model action by action, two actions that write to the same resource one after the other can be reduced to the last write that occurred. Given the current code, the retire_tags rule and the requestTag method will never have write conflicts in practice... But suppose for a moment they did: how does the Bluespec compiler decide which one gets the final write? |
Re: Inquiry into Bluespec Scheduler Granularity Capabilities: Rule vs. Case-Arm Level
If you give the -show-rule-rel flag, or query the schedule information from within the BDW GUI or using Bluetcl commands, the tools can show you which method calls contribute to the inferred rule relationship. For example:
This also shows that there are a couple of kinds of conflict. One kind is that two rules just can't execute in the same clock cycle. But register writes are not all-out conflicts like that. If you have one rule that writes True to a register and another that writes False, BSC can make logic that implements the firing of both rules, one after another, by only writing the final value into the register (say, False). I believe that's what's happening in your code. This treatment of registers can be unexpected, and I know that some people prefer to use their own register definitions that force writes to conflict.

In the above schedule info, which I generated for a small example, BSC is reporting that the rules do touch common state, and therefore their execution order will be visible, but that there's nothing to prevent them from executing in sequence in the same clock cycle. The "<" conflict would mean that one rule has to execute before the other -- for example, if one rule was also reading the value, that would have to come first. If both directions have an ordering conflict ("<"), then there's an all-out conflict, because the rules can't be executed in either order.

To answer your scheduling questions: BSC will not generate conditional conflicts. However, BSC will take the conditions of the actions into account. So if one rule calls a method when "x > 10" and the other rule calls when "x < 10", BSC will notice that the methods are never called at the same time and will exempt them from consideration. But if one rule calls when "x > 10" and one rule always calls, BSC won't generate a dynamic scheduler that allows the rules to both fire when "x <= 10"; it will just make them conflict. There is the "-aggressive-conditions" flag (which we should make on by default), which lifts method conditions to the rule's condition, but that's only about a rule's own firing condition, not about scheduling between rules.
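The register behavior described here -- two rules writing different values, both firing in one clock, with the later write shadowing the earlier -- might be sketched as a minimal illustrative module:

```
(* synthesize *)
module mkShadowDemo (Empty);
   Reg #(Bool) r <- mkReg (False);

   // These rules touch common state, so their order is visible,
   // but nothing prevents both from firing in the same clock: BSC
   // orders them and only the final write reaches the register.
   rule set_true;
      r <= True;
   endrule

   rule set_false;
      r <= False;
   endrule
endmodule
```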
There are some features for splitting rules, if you want. I don't tend to use them, but I could imagine there being situations. In BSV, you can put an attribute like "split" on an if-statement or a case-statement, or possibly on a rule or method or module. That directs BSC to split a rule in half for every conditional inside it. For BH, I don't think the parser accepts a pragma, but there's a function "splitIf" that can be used in place of an if-expression. BSC also supports a global "-split-if" flag, although that would be an extremely large hammer, so it's probably not useful. But if you did go that route, there are "nosplit" attributes and "noSplitIf" functions that can be used in the code to disable splitting in specific locations.

Anyway, the benefit of splitting is that the two rules can be placed at different points in the schedule. For example, if rule A conflicts with rule B because it can't execute before or after, rule splitting might allow A-True to schedule before and A-False to schedule after. If that's the case, you can always just write the two rules explicitly, instead of using a split-if. But the option's there.

J |
Inquiry into Bluespec Scheduler Granularity Capabilities: Rule vs. Case-Arm Level
I’ve been digging into the scheduling behavior of my TagEngine implementation and could use some insight from the community on how Bluespec resolves conflicts between methods and rules, particularly with respect to granularity.
In my TagEngine (FIFO-based tag management), I have:
* A requestTag method that issues tags, initially from a countdown (initialTagDistributor) and later from a tagFIFO:
```
requestTag :: ActionValue UIntLog2N(numTags)
requestTag =
    do
        case initialTagDistributor of
            Just 0 -> do
                initialTagDistributor := Nothing
                (select tagUsageTracker 0) := True
                return 0
            Just tag -> do
                initialTagDistributor := Just (tag - 1)
                (select tagUsageTracker tag) := True
                return tag
            Nothing -> do
                let tag = tagFIFO.first
                tagFIFO.deq
                return tag
```
* A retire_tags rule that recycles tags into the tagFIFO:

```
"retire_tags": when True ==>
    do
        $display "firing retire_tags" (fshow retireTag)
        tagFIFO.enq retireTag
        (select tagUsageTracker retireTag) := False
```

Triggered via a `retireTag` method writing to a wire (`retireTag :: Wire UIntLog2N(numTags)`).
My question is about scheduler granularity:
In the `Nothing` case of `requestTag`, there’s no write to `tagUsageTracker`, unlike the `Just` cases. `retire_tags`, however, always writes to `tagUsageTracker`. I expected the scheduler to conservatively block `requestTag` and `retire_tags` from firing in the same cycle due to the potential conflict on `tagUsageTracker`, even when `initialTagDistributor` is `Nothing`.
However, my Bluesim output suggests they do fire together:
```
count : 7
retiring tag : 4
got tag : 2
firing retire_tags4
count : 8
retiring tag : 4
got tag : 4
```
This comes from a tester using an ActionSeq:
```
|> do
    retireTagAction 4
    requestTagAction
```
where `retireTagAction` calls `retireTag 4` and `requestTagAction` calls `requestTag`.
The dumped schedule for retire_tags shows:
```
Rule: tagEngine_retire_tags
Predicate: tagEngine_retireTag.whas && tagEngine_tagFIFO.i_notFull
Blocking rules: (none)
```

but it doesn't mention `requestTag` (a method, not a rule). My understanding was that the scheduler operates at the rule/method level, not at the case-arm level, so it wouldn't distinguish the Nothing case's lack of tagUsageTracker writes. Yet the simulation suggests otherwise.
Here are my final questions:
 * At what granularity does the scheduler resolve conflicts? Does it consider runtime paths (e.g., Nothing vs. Just) or just the union of all possible resource accesses?
 * Why can requestTag (in the Nothing case) and retire_tags fire in the same cycle here? Is it because they're sequenced within one ActionSeq rule, and tagFIFO (a FIFOF) allows enq/deq together? |
Re: Best Practices for Simulation in Bluespec
Can you define specific instances to synthesize? For example:
and then synthesize that, and call "mkTagEngine5" where you need it? If you need nested levels of polymorphism, then another option is to use a typeclass:
and then call "mkTagEngineSynth" anywhere you want a synth boundary, and the context will specify which instance to use. You'll need to populate the typeclass with the instances you need:
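The definitions being described might look roughly like this (a BSV sketch, not the original snippets; `TagEngine` and the 5-tag instance are assumptions based on this thread):

```
// A monomorphic wrapper that can carry a synthesize pragma
// (mkTagEngine itself stays polymorphic):
(* synthesize *)
module mkTagEngine5 (TagEngine #(5));
   let m <- mkTagEngine;
   return m;
endmodule

// A typeclass that routes a polymorphic call site to a
// per-instance synthesized module:
typeclass TagEngineSynth #(numeric type n);
   module mkTagEngineSynth (TagEngine #(n));
endtypeclass

instance TagEngineSynth #(5);
   module mkTagEngineSynth (TagEngine #(5));
      let m <- mkTagEngine5;
      return m;
   endmodule
endinstance
```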
There's even a script to automate this process, "InstSynth.tcl" (for "instance-specific synthesis"), in bsc/util/bluetcl-scripts/. Of course, it would be nice if the script weren't needed and BSC could support this natively, for example by putting a synthesize pragma on the submodule instantiation. These workarounds (and potentially others) are discussed on GitHub issue #543.

J |
Re: Best Practices for Simulation in Bluespec
There's a GUI environment called BDW, available in the repo, which automates the connection to the GTKWave VCD viewer and provides the scripts for displaying enums and tags as their source-code names, and for separating struct fields and vector elements into separate lines. That code could probably be extracted into a command-line tool, if one would prefer not to use a GUI. The BDW GUI also supports viewing the source-code module hierarchy (including intermediate modules that have been inlined), and users can click on interfaces, state elements, and rules to automatically add the related signals to the attached GTKWave viewer.

As Mieszko said, I only use VCDs for those things -- method ports, state elements, and rule firings -- which have clear naming in the generated code. I don't debug combinational logic that way, though you can use Probes as Mieszko said. I tend to use display statements etc.

The reason why the word 'request' doesn't show up is that you haven't specified that module for separate synthesis -- BSC inlines submodules. If the logic for 'request' gets merged with logic from the calling location in the parent module, then there won't necessarily be any signal in the generated Verilog that corresponds to 'request'. If there's a signal in the generated code that appears to have the same definition as 'request', then it's possible that there's a correctable bug in BSC that we can try to fix -- see if you can create a small example and open an issue. BSC does generally try to keep source names where it can.

But during elaboration, sometimes BSC has to create new signal names, for subexpressions or newly created expressions, and those start with names that are just unique numbers, like "__d1", "__d2", "__d3", etc. That's entirely unhelpful, so BSC attempts to give more meaningful names to these, based on the expression that they are assigned to.
This is defined in "src/comp/SignalNaming.hs", I believe, and I think I've written an explanation of it before -- but for example, if you had: __d1 = x + y then BSC would name it "x_PLUS_y__d1". And this: __d2 = x[2:0] would be named "x_BITS_2_0__d2".

During elaboration, case expressions become nested if-then-else and everything is optimized down to bits, so examining tags is just equality on extracted bits; your expression was probably something like this: __d3 = if (foo[3:3] == 1) then ... which gets named "IF_foo_BITS_3_3_EQ_1_THEN_...". But to avoid generating really long names, they are truncated at some point and just end with "_ETC__d3". I agree that these longer names are not so helpful, but still better than just "__d3". Suggestions on how to format these descriptive names are welcome, if you have better ideas.

My suggestion, though, would be to insert a synthesis boundary on the "mkTagEngine" module etc. (It's worth noting that Mieszko's suggestion of using "interface" is only going to be helpful on a separately-synthesized module. You can also try out PR #729, which implements a feature in BSC to split method input ports into separate ports for each field or sub-element.)

If you don't want to insert synthesis boundaries, you might try the BDW GUI's module browser. As I said, it allows for viewing the inlined intermediate module hierarchy, and I believe that you can click on those modules to send any associated signals (that have remained in the Verilog) to the GTKWave viewer.

Note that BSC elaborates all the way to bits for Bluesim just as it does for Verilog. Bluesim is able to linearize the rule execution, but isn't currently doing anything more high-level for the logic execution within each rule.
In theory, it may be possible to implement an evaluator that doesn't expand as far, so that simulation can be kept at a higher level (preserving types and perhaps some loops etc.) -- I once experimented with this by trying to tweak the existing evaluator, but ran into some snags; it's probably a significant project. Also note that the Verilog backend does some optimization of the signals, inlining ones that are simple or only used once, etc., whereas Bluesim doesn't do much optimization. So you might see more signal names in the Bluesim VCD -- if you're debugging from VCD, I would tend to use the Verilog, which might be less cluttered. (Also, you might even see if Verilator on the Verilog simulates faster than Bluesim anyway, for a given design.)

As for Tcl and Python, embedding the current Bluetcl in Python would be straightforward and would be great to see, if someone wants to give it a go. I described a little about this in a previous email thread. But basically, the functions are implemented in Haskell and that's exported to C and compiled in with a Tcl kernel. I haven't embedded Haskell functions in Python to know if there's a more direct route, but certainly one could export to C and then import them into Python. The only bit of work, really, is that the current Haskell functions are a little specific to Tcl in how they expect to receive and return data, and so one might want to decouple the underlying behavior from the specifics of the interface a little more, so that it can be shared by a Tcl wrapper and a Python wrapper.

J |
Re: Can't get my provisos to discharge.
Thanks, Ed! It's worth noting that making a built-in typeclass would still have some benefits. The SMT solver won't know anything about your user-defined typeclass, so it won't be able to use that knowledge to resolve provisos and might require you to include them explicitly. And error messages might be reported better.

On Thu, Apr 3, 2025 at 3:59 AM Ed Czeck via <ed.czeck=[email protected]> wrote:
|
Re: Best Practices for Simulation in Bluespec
I already added `-keep-fires`. The issue I'm having is that the generated VCD literally doesn't even have the word `request` in it, even though `TagEngine` has the `requestTag` method... Is this typical? I would imagine that in an ideal world methods should at least have their respective can/will-fire signals show up in the VCD.

To reproduce, you can do:

```
git clone
cd riscv-bluespec-classic
TOPMODULE=mkSim make b_compile b_link b_sim_vcd
```

I've attached the VCD for convenience. It seems very strange that a whole method would not show up...

Yehowshua |