Research Monger: August 2022

Saturday, August 6, 2022

Recursion and Rust lifetimes

While working on my HTN compiler, I came upon an interesting caveat of Rust lifetimes. The caveat comes from multiple different concepts merging together to make this somewhat of an edge case. But before we dive into this edge case, let's review some of the logic that lead me on this path.

The HTN compiler follows a python-like syntax, which means at the end of parsing an Abstract Syntax Tree (AST) is output and passed to the compiler to further reduce it to bytecode. If you are interested in learning more about this process there's a great book by Robert Nystrom - Crafting Interpreters, but I will overview this process here too and map Rust concepts to it.

String to tokens (Lexing)

As with any parsing task, we start at a text level. We take in a string and output a sequence of tokens that represent the interesting parts of that string. The tokens commonly omit comments and extra whitespace. In our case the task is a little-bit harder because I want to implement a python-like syntax where new lines and tab offsets matter. I decided to implement the logic of following tab-depths in the lexer part of the parser.

I started working on this compiler before I had a solid grasp on a few Rust concepts. One of the reasons I started this project was to learn Rust more than I have previously. My original thought was the the Lexer would take ownership of the source code variable and then output tokens containing references to source code where needed (e.g. a token to the string literal). I quickly ran into problems of passing the tokens further down the chain because the Lexer wouldn't live long enough to drop any refernces from tokens to the source code. My initial solution was to use a reference counting pointers - Rc<String> . Even then I ran into a few problems of creating self-referential structures. After a few cycles of refactoring, however, I ended up with a much cleaner design that uses primitive references and does not require self-referential structures:

pub struct Lexer<'a> { 
    text:&'a str,
    it: Peekable<CharIndices<'a>>,
    line: usize, // current source line, used for error reporting by Tokens
    col: usize, // current source column, used for error reporting by Tokens
    stash: Vec<Token<'a>>, // stash of tokens to return in leu of it.peek_two()
    is_newline: bool,
    tab_depth: usize, // needed to insert BLOCK_END tokens properly
    depth_separator: Option<DepthSeparator>,
}

impl<'a> Lexer<'a> {
    pub fn new(text:&'a str) -> Self {
        Self{
            text,
            it:text.char_indices().peekable(), 
            line:1,
            col:1,
            stash:Vec::new(),
            is_newline: true,
            tab_depth: 0,
            depth_separator: None,
        }
    }
}

The lexer takes a reference to a slice of a string that is the source code and outputs tokens which contain references to the same slice of the string - both following the 'a lifetime. The interesting bits here is that the lexer doesn't own anything important - it only takes references to a structure (String, source code) that outlives it, and outputs structures (Tokens) that live as long as the source code. Lexer also implements Iterator and that's how the tokens are actually output. The source code characters are wrapped into a Peekable because there are cases where we need to check for the next character to decide on a token type, e.g. in cases like "=>" . Except since we're also doing processing of tab depth and the source code might be ill-formatted so we want to output errors too. So overall our iterator outputs type Item = Result<Token<'a>, Error>;. Now lifetime 'a can be visualized in the following code:

    fn test_type() {
        let code = "type Cell:\n\tc1\n\tc2"; // Lifetime 'a starts here
        let mut l = Lexer::new(code);
        assert_eq!(l.next(), Some(Ok(Token{line:1, col:1, len:4, t:Type})));
        assert_eq!(l.next(), Some(Ok(Token{line:1, col:6, len:4, t:Identifier("Cell")})));
        assert_eq!(l.next(), Some(Ok(Token{line:1, col:10, len:1, t:Colon})));
        assert_eq!(l.next(), Some(Ok(Token{line:1, col:11, len:0, t:StatementEnd})));
        assert_eq!(l.next(), Some(Ok(Token{line:2, col:1, len:1, t:BlockStart})));
        assert_eq!(l.next(), Some(Ok(Token{line:2, col:2, len:2, t:Identifier("c1")})));
        assert_eq!(l.next(), Some(Ok(Token{line:2, col:4, len:0, t:StatementEnd})));
        assert_eq!(l.next(), Some(Ok(Token{line:3, col:2, len:2, t:Identifier("c2")})));
        assert_eq!(l.next(), Some(Ok(Token{line:3, col:4, len:0, t:StatementEnd})));
        assert_eq!(l.next(), Some(Ok(Token{line:3, col:4, len:0, t:BlockEnd})));
        assert_eq!(l.next(), None);
        // Lifetime 'a ends here
    }

Token to AST (Parsing)

From here we have an iterator that outputs tokens. While parsing we also will need to look ahead sometimes to operate on two tokens at once, or sometimes we only want to consume a token if it matches what we expect (gradient decent parser does that often). The interesting parts here is how our AST is defined:

pub enum Stmt<'a> {
    TaskDeclaration{
        name:Vec<Token<'a>>,
        binding:Option<(&'a str, &'a str)>,
    },
    Task{
        name:Vec<Token<'a>>, 
        preconditions:Option<Expr<'a>>, 
        cost: Option<Expr<'a>>,
        binding:Option<(&'a str, &'a str)>,
        body:Box<Stmt<'a>>, 
        effects:Option<Box<Stmt<'a>>>,
        planning:Option<Box<Stmt<'a>>>, 
    }, 
    Type{
        name: Token<'a>,
        body:Box<Stmt<'a>>,
    },
    Block(Vec<Stmt<'a>>),
    Expression(Expr<'a>),
    Include(Token<'a>),
}

Our AST nodes live as long as the original source code - the same lifetime 'a . Parsing is done in a single pass over the tokens, just using a Peekable<Lexer<'a>>. Parser itself implelents Iterator that outputs either AST nodes (statements) or errors. Part of the reason why AST nodes own Tokens is so that I can easily generate errors that refer to exact location of the source code.

AST to bytecode (compiling)

My current HTN compiler outputs bytecode - a flattend representation of AST that lets a virtual CPU to process instructions linearly instead of jumping between AST nodes (which takes more time). Eventually I hope to replace this bytecode with native machine code making this a true compiler, but the only difference that would make in our approach is that the instruction set would map to different values and I'd also have to mark memory pages executable. However, it is here where I ran into the recursion and lifetime issues. AST compiler follows a "Visitor" pattern - it visits each node of the AST tree, maintaining some sort of state that tells it where it came from and what is is doing right now. The compiler changes itself according to the current AST node, and then recursively passes itself to the children of current node. By the time we reach compilation, we now have two lifetimes we need to keep track of - one is still that of the source code. Another is of mapping of names to IDs assigned in bytecode. If bytecode contained human-readable names we would have to look up that string in some sort of a map, find the offset of that name in virtual memory, and then resolve it. That's an extra search we don't need. So our bytecode just contains offsets to virtual memory. However that means that we need to maintain this mapping during compilation instead of execution. For particular HTN approach I also found that it is usefull if these maps outlive the compiler - and so the compiler only uses mutable references to these maps and they live for a specific lifetime. I'll refer to these lifetimes as 'code, 'maps respectively.

One feature that I wanted to have in my compiler is ability to include other files. What sounds like a simple want ended up having me writing this blog post. At some point the compiler visits the "include" node of the AST. The node contains filepath we want to load up. No problem, we load the source code of that file, pass it through the lexer and the parser and now we have a new AST tree that represent the included file. We can now pass our compiler to the new AST tree root node and...

`source_code` does not live long enough borrowed value does not live long enough
domain.rs(278, 21):`source_code` dropped here while still borrowed
domain.rs(146, 10):lifetime `'code` defined here
domain.rs(254, 47):argument requires that `source_code` is borrowed for `'code`

Ok why..? We just loaded up the source code in this function, create the AST in this function and jump to it, once we parse the AST and come out back to this function we don't use the AST anymore and we can free it, as well as the source code... right? This would be the case if we were not already parsing an "older" AST node. Rust does not attempt to solve lifetimes in any way shape or form. Once you create a variable that maps to a lifetime Rust assumes everything else in that lifetime needs to live as long as the variable that mapped to it. Except what happens during recursion?

Whenever we load the main file to be parsed, we tell Rust that all AST nodes need to live as long as the main file. Whenever we start compiling AST nodes, we also tell Rust that the compiler must live as long as the main file. Whenever the compiler finds "include" AST node and parses that node, we tell Rust that the new AST nodes must live as long as the new file source. But when we pass the compiler to the new AST nodes there's a problem - Rust already knows that the AST nodes must live for as long as the main file and we just passed it AST nodes that live only for then the "include" statement is being processed. That's what the error is about. To solve it, we need either tell Rust that the parser can outlive AST nodes, or that "include"d AST nodes don't need to live for as long as the main file's AST nodes.

To allow the compiler outlive the AST nodes we need to remove any 'code references from our compiler state. In my case I use those references to pass typing information for tasks that rely on typing. This information is in the form of references to string slices. I could potentially just create new strings from them and drop the references, but I wasn't sure if I'll need more references for other structures (my expressivity is still growing) so I decided to go with the second approach - tell Rust that the newly created AST nodes don't have to live as long as the main file's nodes. The only way to do it while also compiling the new AST nodes is to create a new compiler that lives as long as these new nodes - that's what I did. As I encounter the include statement, I load up the new file's source code, parse it, compile it as it if was the main file, then just rip the compiled bytecode from it and append it to current parser's bytecode:

    fn visit_include(&mut self, filepath:&Token) -> Result<(), Error> {
        if self.pass == 0 {
            if let Token{t:TokenData::Literal(Literal::S(filepath)),..} = filepath {
                match std::fs::read_to_string(filepath) {
                    Ok(code) => {
                        let mut errors = Vec::new();
                        let mut inc_compiler = DomainCompiler::new(self.compiler.state_mapping, 
                            self.task_mapping, self.operator_mapping, self.type_mapping, 
                            self.state_mapping);
                        while inc_compiler.pass < DomainCompiler::MAX_PASSES {
                            for result in parser::Parser::new(code.as_str()) {
                                match result {
                                    Ok(stmt) =>  {
                                        match stmt.accept(&mut inc_compiler) {
                                            Ok(()) => (),
                                            Err(e) => errors.push(e),
                                        };
                                    },
                                    Err(e) => errors.push(Error::Parser(Some(String::from(
                                        *filepath
                                    )), e)),
                                }
                            }
                            if errors.len() > 0 {
                                break;
                            }
                            inc_compiler.pass += 1;
                        }
                        if errors.len() > 0 {
                            Err(Error::FromFile(String::from(*filepath), errors))
                        } else {
                            let fqdns = std::mem::take(&mut self.fqdns);
                            inc_compiler.fqdns.extend(fqdns);
                            let t = std::mem::take(&mut self.tasks);
                            inc_compiler.tasks.extend(t);
                            self.fqdns = inc_compiler.fqdns;
                            self.tasks = inc_compiler.tasks;
                            Ok(())
                        }
                    },
                    Err(e) => Err(Error::Basic(String::from(*filepath), e.to_string())),
                }
            } else {
                Err(filepath.to_err("Expected a string literal.").into())
            }
        } else {
            Ok(())
        }
    }

Friday, August 5, 2022

Inkscape for Laser cutting (Part 1)

Recently I was asked to run an Inkscape/GIMP workshop for a local makerspace. Local makers were interested in using free tools available to make cuts and designs on the makerspace's laser cutters - glowforges. I want this entry to show what we did during the workshop and be a reminder for those who attended and maybe a tutorial for those who are reading this for the first time. Let's start with Inkscape.

Figure 1: Reference shadowbox to practice many skills.

New versions of Inkscape ask you to set some defaults to your liking. They can all be changed in the editor preferences located at Edit/Preferences top menubar. You likely already have a design idea in mind and some materials you want to use. For the purposes of the workshop I settled on one of the designs I found online show in Figure 1. The design image pointed to an Etsy store suggesting the store provided files needed to cut the box yourself, however the page was gone from Etsy by the time I tried to access it. It is also worth noting that most layers are simply cut, but the moon in the background is also etched - it has the moon image on it, not just a cut circle.In this regard the reference box presents a good challenge to dive further into the differences between cutting and etching as well as the differences in image formats.

The makerspace also provided a 6x12" board as a target material. Since the box has 4 layers I decided to concentrate on layering the images and ignore the rounded box (rounded cuts used in this box are called Living Hinges). Now that we have a design reference and material we're ready to start. Shadow boxes utilize layers of cut material so instead of making one board, we are making four. You can potentially create a 6x12" image in Inkscape, but then you won't be able to quite tell relative position of items on different layers. So instead we will create a 3x6" image, and at the end separate it to 4 tiles.

Document Properties

Figure 2: Setting document size and units

Figure 3: Creating a new grid

First thing first is to tell Inkscape what document are we creating. The document properties dialog can be found by following File/Document Properties... on the top menubar. Change the Display Units to the unit of your preference. Change the custom size values to reflect your board size - 3x6". This dialogue box has multiple tabs too - we will work with Grid tab next. It's worth noting that if you change Background color here as well as the Border color - the changes will only affect the appearance of Inkscape and are not reflective of anything in the actual image. Fields to change are highlighted in Figure 2. If you are working in Inkscape to create art you probably will want black or white background but we have a lot of freedom in here. I like to keep my background dark or to set it to a shade of wood I'm dealing with.

Back to the document properties window, find the Grid tab and click it. There you can create a New grid - Inkscape allows you to define multiple grids here. In the attached image I already have all of the values filled in. Once again changing grid units to the ones relevant to you. We will be working with inches, so that's what I set mine to be. Now it's time to set the spacing of the grid lines. Please not that Inkscape sets spacing of the minor grid lines and then you have an option to specify how many minor grid lines should a major grid line be drawn. So if you set your spacing to 1/16 and major grid line to 16, that means a major grid line will be drawn for every 1 unit. In our case that's 1 inch. Again, fields to change are highlighted in Figure 3. Pay attention to the grid alignment selection mechanism located to the bottom-left from the center of Figure 3. Our document is composed of whole parts of a unit (3x6"), the grid spaced at 1/16 of an inch, with a major grid line placed every 16 minor grid lines. That means there will be a major grid line going through each corner of the document (Figure 4) no-matter the alignment setting in Figure 3.

Figure 4: Major grid lines

If, however, you set to be grid spacing to not divide an inch into whole parts, or if you set major grid lines to be further apart, the grid will be out-of alignment and alignment setting on Figure 3 lets you choose a corner where the grid will align to.

Rectangle tool

Figure 5: Rectangle tool

Finally, we are ready to start working on our box. First thing first - notice the box has rounded edges covering the outside of the scene. Let's add a rounded rectangle to do the same! All of the Inkscape's drawing tools are located on the left menu panel. Select the one that looks like a rectangle (Figure 5). We want to have snapping on to allow us freehand drawing while also keeping all of the lines straight. Snaping tool moves around version-to-version but in general looks like a magnet icon in the top-right corner of the application. Another option is to press Shift+5 (otherwise, type % sign) to toggle snapping on/off. For now we want snapping on to draw our rounded rectangle and we want to drag it from the top-left corner of the document to the bottom-right corner. The newly drawn rectangle will appear with 3 control nodes - two square ones in top-left and bottom-right corner - those set the dimensions of the rectangle. And one circle-node in the top-right corner - that's a rounded corner control. Drag it to round the corners of the drawn rectangle.

Figure 6: Rounded rect

Select tool

On the left toolbar, above where you found the rounded rectangle tool, the top most item is the seclection tool. As the name suggests it lets you select and change varouis objects. If you select the rectangle we just created you'll be able to edit its location and size using the tool controls bar that changes dynamically depending on the selected tool. In the tool controls bar you will find both the position and size of the created rectangle. For me it says X: -0.012 Y: -0.012 W: 6.024 H:3.024. Your size might not be the same but it's likely not a whole number either. So what did we snap to? It turns out the snapping tool takes the width of the outline into account and snaps such that the snapping point (grid line in this case) goes through the middle of the outline of the shape.

Fill, Outline, Vector, and Raster

If you've dealt with images previously you'll notice that zooming in on the Inkscape graphics does not introduce the common pixelated view you are used to. That's because Inkscape does not deal with images on a pixel level. Instead it defines everything in terms of mathematical constructs that it is able to operate on. In photos, everything is a grid of pixels (colored dots) and the photo is stored as such. There is no representation of any hierarchy or properties of objects. If you take a picture of a tree, your camera doesn't know that it's a tree, it doesn't know where the tree starts and end - all it knows is that there are brown pixels here and there are green pixels here, and if the background is a sky - there are blue pixels in between the green ones. This approach is great for being able to represent any sort of image we want, but it's tough to allow editing it - the computer doesn't have a concept of a tree, so you can't resize "just" the tree, or "just" its leaves, for example.

Inkscape takes a different approach - you define your scene in terms of primitives - rectangles, circles, beizer curves - this is how a leaf looks, this is how a tree looks. You define a relationship between these primitives - leaves are attached to branches, branches are attached to a trunk. And this lets the computer easily process new complex changes you may potentially do to the scene like making the leaves on your tree bigger, without changing anything else. This is the difference between raster and vector graphics - raster graphics are a collection if pixels (color dots), vector graphics are a collection of vectors (coordinates, sizes, properties) of variaous primitives.

All vector primitives can be defined as curves (paths in SVG jargon). Shapes are just closed curves. A curve can have a color and a thickness. How a curve is drawn is dictated by its Stroke properties. In addition, if a curve is closed you can Fill it with color or a texture. Going back to our rectangle - it's a closed curve with outline being the curve's stroke, and the inside of the rect is the fill of that curve.

Laser Kerf and Stroke width

Wikipedia has a good description for why it's happening, but in the essense - there's a focusing lense on your laser cutter that's aiming at a certain distance from the laser tip. At the focal point the circle that accepts the light from the laser is the smallest. For Glowforge that's 0.2mm. However, the further away you go from the focal distance the wider that circle becomes - for the Glowforge that's 0.6mm. This value changes each time you change the height between the laser and your target material, so the only way to get the actual value is to measure - Here is a good example for how to measure your laser kerf, but for the purposes of this writeup we'll use the worst case value - 0.6mm. Back in Inkscape we can right click on our rectangle and choose Fill and Stroke... Or go to the top menubar and choose Object/Fill and Stroke... In the newly opened dialog we can select Stroke style tab and set the width of our stroke - 0.6mm. We also want to set Join and Cap to rounded. The order should use anything where stroke is done after fill. So options Fill, Stroke, Markers, Fill, Markers, Stroke, or Markers, Fill, Stroke will do. I like the first one. Back to snapping the rectangle - remember that the snapping poing is the midpoint of the stroke line - this might compilcate your exact sizing.