Monday, December 21, 2015

Rust is proving very difficult to learn

Once upon a time, I wrote and shipped C code used by millions of users.

I now write Java code professionally and have shipped code to even more than that number of users. (In fact, in the almost ancient past, I worked on Blogger itself but I've gotten way better at Java since then.)

However, Rust is proving kind of difficult for me to learn. See my previous post, where I wrote a pretty simple Rust program that prints only the unique lines from stdin. This "unique" program is my "hello world", as it is a useful real world program.

Since I have lots of experience writing compilers (back in my C code days and at "university"), I thought it would be interesting to create an interpreted language using Rust. The problem itself is very well known to me, so writing an interpreter in Rust gives me a chance to write a real Rust program where I can concentrate on learning the Rust language and not so much on learning a new problem domain.

So, I've spent maybe 3-4 hours writing a "tokenizer" and now I'm trying to write a parser (a recursive descent parser) in Rust. These additional 2 hours have not gotten me very far - basically nothing to show for it except questions!

Whereas the tokenizer went ok (mostly because I found this: https://github.com/servo/rust-cssparser/blob/master/src/tokenizer.rs), the parser is proving much more difficult!

What I'm going to do now is some rubber duck debugging and let you guys see how this works out. (See https://en.wikipedia.org/wiki/Rubber_duck_debugging ).


Question 1

Why do I need to use Token::WhiteSpace instead of just WhiteSpace in a match statement (for an enum) like the code I was looking at?

I've now answered this question thanks to the rubber duck: I was simply missing this critical construct:

use self::Token::*;

What's kind of funny is that in Java, when you switch on an enum, you are forced to use the raw enum name and can't fully qualify it even if you wanted to, and here I was expecting the language with better "inference-ing" to do this on my behalf (because an example or two looked like it did). This isn't a major criticism, though certainly a minor one. (Rust seems to use scope in lots of ways to make stuff less ambiguous, and this probably fits well into how the language designers thought of it - it's just more unusual coming from Java.)

By the way, that use statement at first was a disaster! I got a whole bunch of errors I didn't have before when using it.

Hmm, why:

Before:

#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    WhiteSpace(Box<String>),
    Comment(Box<String>),
    Delimiter(Box<String>),
    Symbol(Box<String>),
    String(Box<String>),
    Word(Box<String>),
    Error(Box<String>),
}

After:

use self::Token::*;

#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    WhiteSpace(Box<String>),
    Comment(Box<String>),
    Delimiter(Box<String>),
    Symbol(Box<String>),
    String(Box<String>), // Doh! Look at the name of the token!
    Word(Box<String>),
    Error(Box<String>),
}

Obviously I should rename the enum value String to say StringLiteral so that the standard String class doesn't get confused with my enum value.

It seems like "technically" I didn't need to make this change.

Fixed!
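For my own notes, here is a minimal sketch of where this ends up (with a trimmed-down, hypothetical version of my enum): once the variant is renamed and the glob import is in place, match arms can use bare variant names.

```rust
// Trimmed-down sketch: with String renamed to StringLiteral, the glob
// import no longer collides with std's String, and match arms can use
// bare variant names.
#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    WhiteSpace(Box<String>),
    StringLiteral(Box<String>),
    Word(Box<String>),
}

use self::Token::*;

fn describe(t: &Token) -> &'static str {
    match *t {
        WhiteSpace(_) => "whitespace",
        StringLiteral(_) => "string literal",
        Word(_) => "word",
    }
}

fn main() {
    println!("{}", describe(&Word(Box::new("if".to_string())))); // prints "word"
}
```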


Question 2

I was actually hoping that fixing that would make a world of difference, but it really didn't.

My current compiler error is:

main.rs:19:15: 19:20 error: cannot move out of borrowed content
main.rs:19         match *self {
                         ^~~~~
main.rs:26:25: 26:26 note: attempting to move value to here
main.rs:26             Token::Word(x) => *x == keyword
                                   ^
main.rs:26:25: 26:26 help: to prevent the move, use `ref x` or `ref mut x` to capture value by reference
error: aborting due to previous error
Could not compile `xyz`.

I'm sort of wondering if this isn't part of the issue:

https://doc.rust-lang.org/book/box-syntax-and-patterns.html

Specifically, maybe using #![feature(box_syntax, box_patterns)] would help?

Here's what I am trying to get working:

impl Token {
    pub fn is_keyword(&self, keyword: String) -> bool {
        match *self {
            WhiteSpace(_) => false,
            Comment(_) => false,
            Delimiter(_) => false,
            Symbol(_) => false,
            StringLiteral(_) => false,
            Error(_) => false,
            Word(x) => *x == keyword
        }
    }
}

Let me put this in "high level terms" -- I want a convenient way to see if a particular token is a word that happens to match the passed in String.
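Here's a minimal sketch of the borrow-based fix the compiler's help message is pointing at: take `&self`, bind the boxed String by reference with `ref`, and compare without moving anything. (I've also switched the parameter to `&str` so callers don't have to allocate - that's my own API preference, not something the compiler requires.)

```rust
// Self-contained sketch of the `ref x` fix (variant list trimmed down).
#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    Word(Box<String>),
    Symbol(Box<String>),
}

impl Token {
    // Borrows self; `ref x` captures the Box<String> by reference,
    // so nothing is moved out of the borrowed enum.
    pub fn is_keyword(&self, keyword: &str) -> bool {
        match *self {
            Token::Word(ref x) => **x == *keyword,
            _ => false,
        }
    }
}

fn main() {
    let t = Token::Word(Box::new("export".to_string()));
    println!("{}", t.is_keyword("export")); // prints "true"
}
```

The `_ => false` arm also collapses all the non-Word arms into one, which is shorter but does give up exhaustiveness checking if new variants are added later.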


Question 2 Interlude

BTW, there is another more radical option: I could actually add lots of enum values to Token - one for each keyword I expect to see:

#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    WhiteSpace(Box<String>),
    Comment(Box<String>),
    Delimiter(Box<String>),
    Symbol(Box<String>),
    String(Box<String>),
    KeywordIf(),             // At tokenization time, I could recognize  
    KeywordElse(),           // each keyword and add its value here
    KeywordSet(),            //
    Word(Box<String>),
    Error(Box<String>),
}

This would potentially make them far easier to match against later, may make them smaller to store, etc. Maybe even this is the ideal representation - I obviously don't know yet - I tend to favor the least number of "things" at first. In modeling this hypothetical language, I actually only want a keyword to matter if it is the first token in a "command", so "if" should be left as a simple Word sometimes, hence my original model seems right.
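Just to capture the idea before I set it aside, here's a sketch of what that tokenizer-side classification might look like (classify is a hypothetical helper, not code I've actually written):

```rust
// Sketch of the keyword-variant idea: the tokenizer maps recognized
// words to dedicated variants, so later matching needs no string compare.
#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    KeywordIf,
    KeywordElse,
    Word(Box<String>),
}

fn classify(word: String) -> Token {
    match word.as_str() {
        "if" => Token::KeywordIf,
        "else" => Token::KeywordElse,
        _ => Token::Word(Box::new(word)),
    }
}

fn main() {
    println!("{:?}", classify("if".to_string()));  // prints "KeywordIf"
    println!("{:?}", classify("foo".to_string())); // prints "Word(\"foo\")"
}
```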


Question 2 continued...

In Java, what I really want is something like this:

if (x instanceof Word && ((Word) x).getString().equals("if")) {
  // blah blah blah
}

This is obviously "runtime" Java since I'm using "instanceof".

In Rust, I'm expecting the enum Token to act something like a "tagged union". 

Oh, looks like this compiles:

impl Token {
    pub fn is_keyword(self, keyword: String) -> bool {
        match self {
            WhiteSpace(_) => false,
            Comment(_) => false,
            Delimiter(_) => false,
            Symbol(_) => false,
            StringLiteral(_) => false,
            Error(_) => false,
            Word(x) => x == Box::new(keyword)
        }
    }
}

In this case, maybe I'm forcing the keyword string onto the heap even though that is kind of the opposite of what I would expect to have to do...

BTW, is == even the right thing to use here? I don't know. Let's find out.

I basically did this:

    for token in tokens {
        println!("Token {:?}", token);
        println!("Function is export {:?}",
                 token.is_keyword("export".to_string()));
    }

So this works great "functionally" - my program compiles and gives the correct result. I'm not really all that concerned about the performance impact just yet; I'm more worried about how badly I know Rust right now.


Where should I go next?

(Rubber ducky), where should I go next?
  1. keep with it! - let's create some parse trees and learn more lessons along the way -- the possible performance penalty of the extra boxing in the kludge of a solution I finally found probably won't matter for what I'm really after anyways - using my interpreted language! - and is therefore second order.
  2. switch to another language - I can re-evaluate the reasons I was using Rust in the first place - like stand-alone executables - and choose another language.
    1. Swift is open source and available on Linux now... It's suddenly a new choice. 
    2. C was rejected because I don't understand its unicode story (among other stories), but I like C a lot in many ways
    3. C++ was rejected because it was too complex, even if I could get a better unicode story by using, say, an external library
    4. Go and other garbage-collected languages were rejected because I don't understand their "fork" story, though maybe with modern POSIX I won't need to actually fork
    5. Go doesn't have sexy enums like Rust (nor traits, nor generic collections), which maybe I don't really need anyways
    6. Most other languages were rejected because they don't have good stories about stand-alone executables and utf8. There are some Scheme systems that will compile to C but they don't come with great utf8 stories (yet?)
  3. BTW, I would probably convert to Javascript if there was a stand-alone executable generator from the Node JS guys - and that supported all the node process stuff - even if the default executable were a bit on the heavy side.
So, I guess let's keep up the spirit of pushing me to learn Rust more. I'll just pretend I'm playing an advanced version of Sudoku. At the very least, this will help me better understand the good and bad parts of Rust and help others make similar decisions.

BTW, I think I need to learn how to write unit tests in Rust that don't compile into the main binary - I see "annotations" on functions that say they are tests, so this is already thought out - probably something "cargo" can do once I use the Google or Bing search engines some more.
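(From skimming the docs, the shape appears to be: put test functions in a #[cfg(test)] module, mark them with #[test], and run `cargo test` - that configuration means the tests are compiled only into the test binary, never the main one. A sketch with a made-up function, not verified against my project yet:)

```rust
// Hypothetical function to test; in a real project this would be
// whatever logic lives in the crate.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Compiled only when running `cargo test`, so it never bloats
// the main binary.
#[cfg(test)]
mod tests {
    use super::add;

    #[test]
    fn adds() {
        assert_eq!(add(2, 2), 4);
    }
}
```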







Saturday, December 5, 2015

Wrote my first Rust Program

Thought I would share my first Rust program, along with some thoughts about the experience.

This is actually a version of a Python script that I use about once a month. The goal of the program is to read and print out all of the lines from stdin, but only print out the first occurrence of each line so that you have only the unique lines.

For example, given a file that looks like this:

line with some stuff
line 0
line 1
line 1
line 0
line 2

The program should just print:

line with some stuff
line 0
line 1
line 2

Note, there is a standard Unix utility called "uniq" that does something partially similar, but it only filters out duplicate lines when they are adjacent. uniq is typically used in conjunction with sort, which destroys the original order of the lines, so with this example, "line with some stuff" would end up at the bottom instead of at the top.

Here is the "finished" program:

use std::io;
use std::io::prelude::*;
use std::collections::HashSet;

fn main() {
    let mut seen = HashSet::new();
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line.unwrap();
        // For "fun", try removing the & to see what the compiler error message is
        if !seen.contains(&line) {
            // For "fun", try swapping these two statements to get a compile error
            println!("{}", line);
            seen.insert(line);
        }
    }
}

The resulting binary size for a "release" build (x86-64) is about 526K. Presumably this binary has no external dependencies so it would be easy to move to other Linux boxes with the same architecture.

Here are some thoughts on this experience:
  1. my most important "high level" observation is that type inference is bad for understanding what you "cargo culted" (this is a different use of the word cargo from the cargo tool that comes with Rust!). I didn't realize that stdin was actually protected by a "mutex", which now explains what the lock() does. For the statement "let line = line.unwrap();", I still don't understand! What was the type before and after the unwrap() call?
  2. installation of the rust compiler on Linux Mint 17.2 was trivial using the provided instructions from rust-lang.org
  3. the cargo build command seems to follow the route of other new languages like Go in being "Java"-like and not requiring you to write a Makefile (even though compiling a single rust file seems pretty trivial too)
  4. I got at least one error message while trying to write this which wasn't "GNU" enough for Emacs's compilation mode to parse correctly (some kind of macro error looked like a file:line:column but file was <asdf>.x.y or whatever)
  5. Rust is fairly new but has been in use for a while by the kind of people that use Stack Overflow so there are some "non current" answers out there which slowed me down a bit
  6. Compared to trying to use Vala, I would say this was a positive experience tooling wise
  7. Compared to trying to use Go, I would say this experience was pretty similar to writing the same program in Go
  8. Compared to C, I would still be writing my own HashSet! (more likely I would have written some Bloom-filter-like code with dynamic chaining to larger Bloom filters as the load increased, just to be manly)
  9. It would be interesting to write the same program in Swift now that Swift is open source
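Coming back to the type-inference gripe in point 1: writing the types out by hand answers my own question - lines() yields io::Result&lt;String&gt; (i.e. Result&lt;String, io::Error&gt;), and unwrap() turns that into a plain String, panicking on an I/O error. A sketch using an in-memory reader (any &amp;[u8] implements BufRead) so the same iterator shape is visible without needing stdin:

```rust
use std::io::BufRead;

fn main() {
    let data = "line 0\nline 1\n";
    // &[u8] implements BufRead, so lines() behaves just like
    // stdin.lock().lines() in the original program.
    for line in data.as_bytes().lines() {
        let line: Result<String, std::io::Error> = line; // type before unwrap()
        let line: String = line.unwrap();                // type after unwrap()
        println!("{}", line);
    }
}
```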

Why are you learning Rust?

This is a great question. An obvious short answer is that I am a programming language enthusiast, and learning new languages can help shape how I go about my day job and how I think about programming. Honestly though, I usually spend more time reading about alternative languages than actually sitting down to write code in them, so Rust has me intrigued enough to go the extra step.

The real answer is that I would like to write my own "shell" (shells are programs like bash, zsh, etc.) and using a non-VM oriented language seems paramount so I can have a minimal binary. Of course, I definitely want strong support for unicode (especially utf-8) in any language I touch these days which Rust seems to have.

Suggested exercises to help you learn Rust

If you are writing your first Rust program that isn't "hello world", I put some comments into the sample so you can see two real error messages I got while trying to "cargo cult" my way towards this program (no pun intended).

Disclaimer

These are my opinions and not the opinions of my employer, etc.




Sunday, September 14, 2014

Played around a little with the Vala programming language

Vala is a C#-like programming language that compiles to simple C code using GObject, which then obviously gets compiled to assembly language by a C compiler. It's actually several years old, but I must have passed over it because I am a GC snob and Vala uses reference counting (like Objective-C and now Swift).

My first program was a tool to read from stdin and write to stdout removing duplicate lines.  Here's the code:

using Gee; // a collections library (written in Vala!)
int main() {
    var seen = new HashSet<string>();
    while (true) {
        string line = stdin.read_line();
        if (line == null) {
            break;
        }
        if (!seen.contains(line)) {
            seen.add(line);
            stdout.printf("%s\n", line);
        }
    }
    return 0;
}
(I'm probably missing a question mark in the declaration of line but it still built my binary, though I got some other warning from the underlying C compiler that can apparently be ignored.) I'm formatting things a little more Java-like than most of the Vala samples I've seen (they put a space between the method name and the parens around the arguments).

What's pretty cool is this compiles into a very small binary: 13,472 BYTES! I haven't done a speed test except for pretty small test files, but there were virtually no "start up" costs, a complaint often made of Java (and possibly other VM systems like Mono).

Another advantage Vala should have is that although you still need a "vapi" it looks like Vala can call C code without a big impedance mismatch (which should make it high performance for certain tasks).

I wrote another utility to find numbers in each line and append a comment to each line that would print out the corresponding date if those numbers are millis since the epoch. This was also small and fairly simple, though I did run into a reference counting related quirk.

This won't work:

var builder = new StringBuilder().append("xyz");

Because of reference counting and owned-versus-unowned type system stuff, this needed to be:

var builder = new StringBuilder();
builder.append("xyz");

(But then method chaining should work fine.)

Working my way through more little utilities, a place I suspect Vala could really shine, I hit what is probably a documentation versus version issue trying to create threads. I'm using Linux Mint 17 and don't appear to have the latest glib or something. I figured maybe I could get stuff working with an IDE to tell me what methods are really there, not what the Vala documentation says so I grabbed monodevelop which again, versioning is not helping me here. I got version 4.0 in Linux Mint 17 but that doesn't support Vala yet. So, then I tried to download monodevelop from git and build it myself. I bet you can't guess what happened when I tried to run config? You guessed it, another versioning problem, this time glib-sharp needs to be at least 2.12 but I guess I only have 2.0 (though I see a 3.0 version available and I installed that but autoconf doesn't seem to be picking it up).

OK, I really don't do much Linux development (my day job is hacking Java code). Maybe this is kind of normal for linux development?

Vala isn't at version 1.0 yet (just .25 released on Sept 1, 2014) but since apparently Elementary OS uses it, I was hoping to give it a more serious look. Some non trivial apps have been written in it.

Summary

Naturally, Vala needs a stable 1.0 release that is included in popular distros, the documentation is not terrific right now and the website is ugly. There is some tutorial type documentation but they don't have many advanced examples to learn from yet. And of course, they need a great language specification and those are pretty difficult to create even if the compiler itself and the concepts are sound. These are real criticisms but they are definitely ones that can be solved especially if Vala's popularity grows and more people become involved.

I really like where Vala seems to be headed and look forward to getting an environment set up so I can play with it more. I wouldn't bother writing this blog post if I thought Vala is a waste of time or just another stupid language. It honestly seems like Vala could be a great practical niche language that could grow in popularity. My brain has been warped by Java but the ability to write some "C code" in a Java-ish way is appealing to me.

I'll probably look a little more at mono and C# under linux too. If you're writing a sufficiently complex program, run time environments like the JVM or Mono aren't really that big a deal.



Saturday, August 30, 2014

Complex prompts in bash

I've been using a bash script to generate my prompts for quite a while, though I didn't realize that it wasn't working in normal shells (anything besides Emacs).

The flawed setup is as follows:

.bashrc
export PROMPT_COMMAND="my_custom_bash_prompt_script"
export PS1=""

This setup works fine from within an Emacs shell but from a "normal" shell, when I did an up arrow, the prompt itself would disappear!

The corrected setup is very similar:

.bashrc
function bash_command_function {
  PS1=`my_custom_bash_prompt_script`
}
export PROMPT_COMMAND="bash_command_function"

Once I realized that I needed a bash function anyways (because you can't export something from a subshell to the parent shell), I just inlined my rather simple script.

Keep in mind, if my_custom_bash_prompt_script ever returns anything with a % character in it, bash may interpret it, so you will need to escape the %s.

Friday, December 30, 2011

Sous-vide gyros

I tried to make gyros today - 1 lb ground lamb and .75 lb ground pork, a small onion, plus basic spices in the mortar (marjoram powder, rosemary, garlic powder, a tiny piece of clove, a hint of paprika, oregano, greek seasoning (oregano, msg, etc.), juice from one lemon - I forgot the ground black pepper...). Processed quite a bit in my food processor to change the texture.

The first use was just to fry some of the mixture in the frying pan - it came out OK. I didn't have a wrap to put it in so I just ate it with store made tzatziki (which was actually pretty good). The texture was OK but lacking in flavor (not enough fat in the meat is at least part of the problem).

Since I was already sous-vide'ing a Jamaican skirt steak at 131F (trying for a much longer cooking time to see if it will help break down the tasty skirt steak), I came up with a plan to sous-vide some of the mixture for my second course. This turned out pretty interesting. I put a scoop of the mixture in a plastic bag and pressed it into a flat .25" "patty" about the thickness of gyros meat, filling up the entire bag. After cooking for 45 minutes, it retained its shape and I cut it into 2x5" strips. I finished those strips on a really hot grill for a minute a side.

Flavor wise, no improvement but it was pretty interesting that I was able to get the shape to be like a sub-shop gyro.

I put a couple more scoops of the mixture into bags as before and threw them into the freezer - I figure I'll have gyros later in the week (they just need a little more time in the water bath to defrost).

Next time I will try a 80/20 beef plus lamb mixture and more spices (especially garlic and maybe allspice and other ingredients I saw in some of the recipes).

Sous-vide eggs

I tried my first sous-vide eggs yesterday - 144F for 45 minutes - crumbled seasoned seaweed and a little soy sauce. The whites were a little under-cooked, but the yolks seemed about right. I think next time I'll try 146 for 35 minutes and then cheese and a buttered English muffin.

Wednesday, October 19, 2011

How Lisp Programmers View the World




I couldn't help but improve the graph from someone's Haskell post. http://neugierig.org/software/blog/2011/10/why-not-haskell.html