Show HN: I'm making a dynamic language in Rust

ridiculous_fish · on April 25, 2022

Very interesting. I'm curious how you solved the following interaction, which touches on both Rust's borrow checker and a classic footgun for GC'd languages:

You have a GCPtr to some cell on the heap, and you want to dereference the pointer to modify the cell. But the GC also needs to dereference the cell, e.g. to update object references after moving. So while a GCPtr is dereferenced, you must never trigger a collection, which means no allocation. If you do, you violate Rust's "only one &mut" rule, and also risk a dangling pointer. How do you enforce this?

One way you could enforce this is to require a shared reference to the GC to dereference any GCPtr. Instead of `cell.count += 1` you would write `cell.deref(gc).count += 1`. This works but is verbose.
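A minimal sketch of that first approach, assuming a toy heap where a GC pointer is just an index. `Heap`, `GcPtr`, `Counter`, and `bump` are hypothetical names for illustration, not Sphinx's API. Allocation takes `&mut Heap` while dereferencing takes `&Heap`, so the borrow checker rejects any allocation made while a dereferenced cell is still in use.

```rust
use std::cell::Cell;

/// A heap cell; `Cell` lets us bump the counter through a shared reference.
struct Counter {
    count: Cell<u32>,
}

/// Hypothetical GC heap: a plain Vec stands in for the managed heap.
struct Heap {
    cells: Vec<Counter>,
}

/// Hypothetical GC pointer: an index into the heap rather than a raw address.
#[derive(Clone, Copy)]
struct GcPtr(usize);

impl Heap {
    fn new() -> Self {
        Heap { cells: Vec::new() }
    }

    /// Allocation (and therefore collection) needs exclusive access to the heap.
    fn alloc(&mut self, initial: u32) -> GcPtr {
        self.cells.push(Counter { count: Cell::new(initial) });
        GcPtr(self.cells.len() - 1)
    }
}

impl GcPtr {
    /// Dereferencing borrows the heap. The returned reference keeps that shared
    /// borrow alive, so `alloc` cannot be called until the reference is dropped.
    fn deref<'h>(self, heap: &'h Heap) -> &'h Counter {
        &heap.cells[self.0]
    }
}

fn bump(heap: &mut Heap) -> u32 {
    let ptr = heap.alloc(41);
    let cell = ptr.deref(heap);
    cell.count.set(cell.count.get() + 1);
    // heap.alloc(0); // uncommenting this is a compile error: `cell` still borrows `heap`
    cell.count.get()
}
```

The cost is the verbosity mentioned above: every access has to thread the heap reference through.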

Another way you could enforce it is dynamically, by setting a value in a cell whenever it is dereferenced; but this incurs a runtime cost.
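As a rough sketch of the dynamic variant, each cell can carry a flag that an RAII guard sets on dereference and clears on drop. `GcCell`, `DerefGuard`, and `can_collect` are hypothetical names, and the flag check is exactly the runtime cost the comment refers to.

```rust
use std::cell::Cell;

/// Hypothetical heap cell carrying a "currently dereferenced" flag.
struct GcCell {
    borrowed: Cell<bool>,
    count: Cell<u32>,
}

/// RAII guard: the flag is set when the guard is created and cleared on drop.
struct DerefGuard<'a> {
    cell: &'a GcCell,
}

impl GcCell {
    fn new(count: u32) -> Self {
        GcCell { borrowed: Cell::new(false), count: Cell::new(count) }
    }

    /// Dereference the cell, panicking if it is already dereferenced.
    fn borrow(&self) -> DerefGuard<'_> {
        assert!(!self.borrowed.get(), "cell is already dereferenced");
        self.borrowed.set(true);
        DerefGuard { cell: self }
    }
}

impl DerefGuard<'_> {
    fn increment(&self) {
        self.cell.count.set(self.cell.count.get() + 1);
    }
}

impl Drop for DerefGuard<'_> {
    fn drop(&mut self) {
        self.cell.borrowed.set(false);
    }
}

/// Stand-in for the collector's precondition: no cell may be dereferenced.
fn can_collect(cells: &[GcCell]) -> bool {
    cells.iter().all(|cell| !cell.borrowed.get())
}
```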

How does Sphinx solve this?

celeritascelery · on April 25, 2022

Not OP, but I recently wrote about solving this problem in my interpreter[1]. Essentially, it uses a combination of LCell (compile-time interior mutability) and taking a shared reference to the GC, as you suggest.

[1] https://coredumped.dev/2022/04/11/implementing-a-safe-garbag...

harpiaharpyja · on April 26, 2022

It works for Sphinx because the scope of the GC is restricted to Sphinx code. The Sphinx language runtime itself is typical Rust code - no GC used. So it is fairly straightforward to ensure that GCPtrs are never dereferenced during a collection cycle. Essentially, collection only ever happens between instructions.

If we were writing a GC for general Rust code then this and other issues around ensuring values are rooted would become much bigger problems. For example, the rust-gc crate has to deal with these. Just to be safe I took a page out of rust-gc's book and implemented a guard to ensure that the runtime will at least panic if I make a mistake and that happens.
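A guard of that kind might look roughly like the sketch below. This is not Sphinx's or rust-gc's actual code: `COLLECTING`, `CollectingGuard`, `deref_checked`, and `collect` are hypothetical names, and a plain reference stands in for a GC pointer.

```rust
use std::cell::Cell;

thread_local! {
    /// True while a (hypothetical) collection cycle is running on this thread.
    static COLLECTING: Cell<bool> = Cell::new(false);
}

/// Sets the flag for the duration of a collection and clears it on drop,
/// even if the collection unwinds.
struct CollectingGuard;

impl CollectingGuard {
    fn enter() -> Self {
        COLLECTING.with(|flag| flag.set(true));
        CollectingGuard
    }
}

impl Drop for CollectingGuard {
    fn drop(&mut self) {
        COLLECTING.with(|flag| flag.set(false));
    }
}

/// Stand-in for dereferencing a GC pointer: panics during a collection.
fn deref_checked<T>(value: &T) -> &T {
    COLLECTING.with(|flag| {
        assert!(!flag.get(), "GC pointer dereferenced during collection");
    });
    value
}

/// Stand-in for a collection cycle; `work` represents tracing and sweeping.
fn collect(work: impl FnOnce()) {
    let _guard = CollectingGuard::enter();
    work();
}
```

The check does not prevent the mistake, it only turns silent memory corruption into a loud panic.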

nu11ptr · on April 25, 2022

> This was inspired by the flexstr crate

I'm honored my string crate was able to inspire. Kinda surprised to see it mentioned as it isn't all that popular (yet? I hope...)

rockwotj · on April 25, 2022

Another programming language that is implemented in Rust is Gleam[1]. I think it's a really slick language, but it doesn't implement GC or its own VM; it is more of a source-to-source transpiler.

[1]: https://github.com/gleam-lang/gleam

alilleybrinker · on April 25, 2022

For a longer list of languages written in Rust, see https://github.com/alilleybrinker/langs-in-rust

IshKebab · on April 25, 2022

Rhai and Gluon are the other two big ones. I've used Rhai for a configuration system and it was pretty good. Very very easy to integrate. Unfortunately it is also dynamically typed and doesn't support type annotations yet, but I reckon they'll add that eventually.

Gluon is statically typed but it's also functional and a bit weird.

cercatrova · on April 25, 2022

So Gleam moved to Rust, interesting. I remember it was supposed to be a type safe language for Erlang, because while Elixir exists, it's not type safe, one of the main reasons I didn't start using Elixir, much as I like the Erlang philosophy.

mkishi · on April 25, 2022

It still targets Erlang—but the compiler's written in Rust (from the first v0.1 release in 2019).

the_duke · on April 25, 2022

It also targets JavaScript now, and maybe eventually WebAssembly.

rockwotj · on April 25, 2022

Tracking issue for native code is here: https://github.com/gleam-lang/gleam/issues/109. It's unlikely they will compile directly to WASM.

A similar language to Gleam built on WASM is Grain: https://github.com/grain-lang/grain

SaulJLH · on April 25, 2022

Is there a good guide somewhere, that explains all the different "types" of languages? When I read dynamic/static etc, I often have NFI what that means, something better than Wikipedia, perhaps.

shepherdjerred · on April 25, 2022

There are two main attributes of languages as far as typing goes:

Static vs dynamic: when a variable is declared, does the compiler know and enforce the type? E.g. can I declare a variable as a string and then assign an int to it? If you can, then you have a dynamic language. If you can’t, then you have a static language. Examples of static are Java, Rust, and C++. Examples of dynamic are Python, JavaScript, and Ruby.

The other attribute is weak vs strong typing. Weakly typed languages will coerce types where appropriate. For example, if I try to compare “1” with 1, a strongly typed language will raise an error or simply treat the two as unequal, while a weakly typed language will coerce “1” -> 1 and return “true”. Examples of strongly typed languages are Ruby and Java. An example of a weakly typed language is JavaScript.

Most statically typed languages are strongly typed.

My definitions probably aren’t rigorous, but I think they’re good enough.
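To make the weak vs strong distinction concrete, here is a small sketch of the two comparison styles as an interpreter might implement them. `Value`, `strict_eq`, and `loose_eq` are hypothetical names; the coercion rule (parse the string as an integer) is a simplification and not any particular language's exact semantics.

```rust
/// A dynamically typed value: the type tag travels with the value at run time.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// "Strong" comparison: values of different types are never equal.
fn strict_eq(a: &Value, b: &Value) -> bool {
    a == b
}

/// "Weak" comparison: a string is coerced to a number when compared with one.
fn loose_eq(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::Int(n), Value::Str(s)) | (Value::Str(s), Value::Int(n)) => {
            s.trim().parse::<i64>().map_or(false, |parsed| parsed == *n)
        }
        _ => a == b,
    }
}
```

With these definitions, comparing `Value::Int(1)` with `Value::Str("1")` is false under `strict_eq` and true under `loose_eq`.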

alexobenauer · on April 25, 2022

One way to get at this: making your own language will show you not only what each thing means, but also what the underlying mechanics are!

Crafting Interpreters is a great guide on this: http://www.craftinginterpreters.com

capableweb · on April 25, 2022

> One way to get at this: making your own language will show you not only what each thing means, but also what the underlying mechanics are!

True, but if everyone who asked any question about something had to implement those things to even understand the basics of them, we'd never get anything done :)

eatonphil · on April 25, 2022

For what it's worth, most CS degrees do involve building a programming language.

You don't need a degree to be a developer (I don't have one) but I think it's a good indication that it is actually reasonable to think that every developer could implement a programming language. (And I mean developer as in someone who writes code professionally as their main job, not necessarily sysadmins or designers or whatnot).

To reiterate, I think many more people can develop their own language than believe they can.

harpiaharpyja · on April 26, 2022

It's a big topic. Not only static/dynamic, but also things like structural vs nominal typing.

In this case, it comes down to whether the compiler tracks the "type" of every value when compiling (static typing) or whether types are stored somewhere in memory that is read from when your program is actually running.
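A small sketch of that difference, written in Rust: `add_static` has its types checked by the compiler, while `add_dynamic` reads a type tag stored alongside each value while the program runs. The `Value` enum and both function names are made up for illustration and are not Sphinx's representation.

```rust
/// A value whose type is stored in memory as an enum tag.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

impl Value {
    fn type_name(&self) -> &'static str {
        match self {
            Value::Int(_) => "int",
            Value::Text(_) => "text",
        }
    }
}

/// Static typing: a call like `add_static(1, "x")` is rejected at compile time.
fn add_static(a: i64, b: i64) -> i64 {
    a + b
}

/// Dynamic typing: the tags are inspected at run time, so a type mismatch
/// can only be reported as a runtime error.
fn add_dynamic(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
        (Value::Text(x), Value::Text(y)) => Ok(Value::Text(format!("{}{}", x, y))),
        _ => Err(format!("cannot add {} and {}", a.type_name(), b.type_name())),
    }
}
```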

hcta · on April 25, 2022

> something better than Wikipedia, perhaps

You can probably ask a large language model like GPT-N to generate a summary.

asgeir · on April 25, 2022

You might be interested in these articles if you want to experiment with alternative approaches to the intrusive linked-list and are OK with using object pools. Very handy if you want to have cheap alloc/free and want good memory locality for updating the objects.

https://slideplayer.com/slide/4470263/

https://gamesfromwithin.com/managing-data-relationships
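For readers who have not seen the pattern, a minimal object pool could look like the sketch below. It is an illustration under simple assumptions rather than the design from the linked articles: `Pool` and its methods are hypothetical names, objects are addressed by index, and freed slots are recycled through a free list.

```rust
/// Hypothetical pool: slots live contiguously in a Vec, and freed slots are
/// recycled through a list of indices instead of linked pointers.
struct Pool<T> {
    slots: Vec<Option<T>>,
    free_list: Vec<usize>,
}

impl<T> Pool<T> {
    fn new() -> Self {
        Pool { slots: Vec::new(), free_list: Vec::new() }
    }

    /// Cheap allocation: reuse a freed slot if there is one, else grow.
    fn alloc(&mut self, value: T) -> usize {
        match self.free_list.pop() {
            Some(index) => {
                self.slots[index] = Some(value);
                index
            }
            None => {
                self.slots.push(Some(value));
                self.slots.len() - 1
            }
        }
    }

    /// Cheap free: take the value out and remember the slot for reuse.
    fn free(&mut self, index: usize) -> Option<T> {
        let value = self.slots.get_mut(index)?.take();
        if value.is_some() {
            self.free_list.push(index);
        }
        value
    }

    fn get(&self, index: usize) -> Option<&T> {
        self.slots.get(index)?.as_ref()
    }

    /// Iterating live objects walks one contiguous allocation.
    fn iter(&self) -> impl Iterator<Item = &T> {
        self.slots.iter().flatten()
    }
}
```

A production pool would likely pair each index with a generation counter so that a stale index cannot silently refer to a reused slot.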

harpiaharpyja · on April 26, 2022

Thanks! I started with an intrusive linked list because I just wanted to get something working but replacing it with something else has been on my mind for the same reasons you mention.

saghm · on April 25, 2022

> (Yes, the name might not be the best, being also used by a well-known ReST docs generator, I'll take suggestions. I do like the name though, both as a reference to the mythological creature and the cat :D)

Maybe another cat and mythology related name? If you want to stick with the Egyptian theme, maybe something to do with Bastet: https://en.wikipedia.org/wiki/Bastet

harpiaharpyja · on April 26, 2022

Thanks!

harpiaharpyja · on April 24, 2022

Actual clickable link to Github: https://github.com/mwerezak/sphinx-lang

heavyset_go · on April 25, 2022

Did you encounter any difficulties writing this with safe Rust? When or where did you find it necessary to use unsafe code in your implementation?

speed_spread · on April 25, 2022

The only situations that _require_ unsafe are FFI (talking with C libs) and bare-metal access (assembler, embedded, drivers, etc.). An interpreter is totally in the clear on these subjects and should not have anyone reaching for unsafe.

Of course, unsafe _can_ also be used to implement code that takes shortcuts beyond what the borrow checker may be able to handle. But if you're building an interpreter, speed is probably not a primary concern anyway.

celeritascelery · on April 25, 2022

> An interpreter is totally in the clear on these subjects and should not have anyone reaching for unsafe.

This is true for a toy interpreter. You don’t “need” unsafe. But anything other than a toy interpreter has many reasons to reach for unsafe: garbage collection, flexstr, object dispatch, tagged pointers, etc. Not to mention the performance gains from using unchecked functions in hot sections of the interpreter loop.
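As an illustration of the "unchecked functions" point, here is a toy bytecode loop in two versions. The opcodes and function names are invented for this sketch. `run_checked` pays a bounds check on every fetch, while `run_unchecked` validates the program once and then uses the standard `slice::get_unchecked`. Whether this is measurably faster depends on the workload and on what the optimizer already removes.

```rust
const HALT: u8 = 0;
const INC: u8 = 1;
const DOUBLE: u8 = 2;

/// Safe version: `code[pc]` is bounds-checked on every instruction fetch
/// and panics if the program runs off the end.
fn run_checked(code: &[u8]) -> i64 {
    let mut acc = 0i64;
    let mut pc = 0;
    loop {
        match code[pc] {
            INC => acc += 1,
            DOUBLE => acc *= 2,
            _ => return acc, // HALT or an unknown opcode stops execution
        }
        pc += 1;
    }
}

/// Unchecked version: validate once up front, then skip per-fetch checks.
fn run_unchecked(code: &[u8]) -> Option<i64> {
    if !code.contains(&HALT) {
        return None; // no terminator, so the loop could run out of bounds
    }
    let mut acc = 0i64;
    let mut pc = 0;
    loop {
        // SAFETY: every opcode advances `pc` by exactly one, and the check
        // above proved a HALT exists, so the loop returns at or before it.
        let op = unsafe { *code.get_unchecked(pc) };
        match op {
            INC => acc += 1,
            DOUBLE => acc *= 2,
            _ => return Some(acc),
        }
        pc += 1;
    }
}
```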

speed_spread · on April 25, 2022

The arbitrary difference between a "toy" and a "tool" is the completeness of the proposed solution. Correctness, ease of use and documentation concerns could come well before execution speed.

I agree that writing an interpreter for an existing language (e.g. Python) one would want to match the performance of existing interpreters and thus would need to use the techniques you mention.

heavyset_go · on April 25, 2022

I ask because many implementations of interpreters in Rust that I've seen will reach for unsafe for aspects of the systems that might be inefficient or awkward to implement in safe Rust. I'm curious if or where the author felt the need to do the same or not.

celeritascelery · on April 25, 2022

> Statically dispatched type object model using a newtype wrapper and Rust's declarative macros. Ok, what that means is that I have a MetaObject trait that I can use to easily add new data types and define the behavior for specific types.

If I understand correctly, this approach tries to address the expression problem[1]. It makes it easier to define new types for `Variant` (everything is defined in one impl block) at the cost of making it harder to add new `MetaObject` functions (you need to update every applicable impl block).

Also static dispatch seems like the wrong word here, because you are still doing runtime dispatch on `Variant`, even if not using `dyn` Objects. Static dispatch typically refers to monomorphization.

[1]https://craftinginterpreters.com/representing-code.html#the-...
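The three kinds of dispatch being contrasted can be sketched side by side. The `Variant` name comes from the thread, but everything else here (`Describe`, `Int`, `Text`, and the function names) is hypothetical and is not Sphinx's actual code.

```rust
trait Describe {
    fn describe(&self) -> String;
}

struct Int(i64);
struct Text(String);

impl Describe for Int {
    fn describe(&self) -> String {
        format!("int {}", self.0)
    }
}

impl Describe for Text {
    fn describe(&self) -> String {
        format!("text {:?}", self.0)
    }
}

/// Static dispatch: the function is monomorphized per concrete `T`, so the
/// call target is fixed at compile time.
fn describe_static<T: Describe>(value: &T) -> String {
    value.describe()
}

/// Dynamic dispatch: the call goes through a vtable at run time.
fn describe_dyn(value: &dyn Describe) -> String {
    value.describe()
}

/// Enum dispatch: no `dyn` and no vtable, but the branch is still chosen at
/// run time by matching on the tag.
enum Variant {
    Int(Int),
    Text(Text),
}

fn describe_variant(value: &Variant) -> String {
    match value {
        Variant::Int(inner) => inner.describe(),
        Variant::Text(inner) => inner.describe(),
    }
}
```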

harpiaharpyja · on April 26, 2022

My bad about the incorrect terminology. I figured if it's not dynamic dispatch it must be static but I probably should have done my homework before writing up a description.

chaosprint · on April 25, 2022

Interesting project. Have you considered making a REPL using WASM in browsers as a playground?

I am developing this music live coding language and audio library with Rust and it runs in browsers:

https://glicol.org

I am now using Rhai.rs as the embedded language for writing `meta` nodes in the audio graph. But for audio programming, the running time for each block should not exceed 3ms. In some cases, I found Rhai struggling with that. Perhaps dynamic languages have an inherent limitation there? I'm wondering how you see this issue, and whether performance will be part of the future considerations for sphinx-lang.

harpiaharpyja · on April 26, 2022

That would be really cool. I don't know very much about WASM but this seems like a great excuse to learn.

jiyinyiyong · on April 27, 2022

Rust really has great support for WASM. I made a tiny language as well, and managed to build it targeting WASM in a few steps https://github.com/calcit-lang/wasi-calcit/blob/main/.github... .

One thing to note is that WASM runs inside a sandbox, which means threads, random number generators (not sure about WASI), FFI, etc. have to be moved out of the core so they are not compiled to WASM, where they would probably fail. The major part of the code can be compiled to WASM and WASI (WebAssembly System Interface) very easily.
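One common way to keep such platform-dependent pieces out of a WASM build is conditional compilation. The sketch below uses Rust's standard `cfg` attribute on `target_arch`; the `seed` and `backend` functions and the placeholder value are made up for illustration and are not taken from the linked project.

```rust
/// Native targets can lean on OS facilities such as the system clock.
#[cfg(not(target_arch = "wasm32"))]
fn seed() -> u64 {
    use std::time::{SystemTime, UNIX_EPOCH};
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_nanos() as u64)
        .unwrap_or(0)
}

/// On wasm32 this sketch assumes the host supplies entropy some other way,
/// so the core only carries a fixed placeholder.
#[cfg(target_arch = "wasm32")]
fn seed() -> u64 {
    0x5EED
}

/// Reports which version of the platform-specific code was compiled in.
fn backend() -> &'static str {
    if cfg!(target_arch = "wasm32") {
        "wasm32"
    } else {
        "native"
    }
}
```

Only one `seed` definition is compiled for any given target, so the rest of the code can call it without caring where it runs.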

smitty1e · on April 25, 2022

> I'll take suggestions.

Phixns (fixins), "the dyslexic sphinx".

pwdisswordfish9 · on April 25, 2022

> nonlocal

Oh please, no. This is probably the worst feature of Python.

harpiaharpyja · on April 26, 2022

I'd love to improve this, but without knowing what makes it a misfeature in Python I don't have any guidance.

Maybe the choice of keyword is bad because of the association but it's also not quite the same as in Python. You don't have to "nonlocal x" up front in your function to access a nonlocal variable, which is a pretty huge difference IMO.

My purpose for "nonlocal" was to have a visual highlight of whenever a nonlocal variable is getting modified, because that's the exact point where assignment becomes a side-effect. So you only ever use it when assigning.

AlchemistCamp · on April 25, 2022

Do you have any kind of end goal with the project or are you just winging it and seeing where the language ends up?

harpiaharpyja · on April 26, 2022

Winging it :D

AlchemistCamp · on April 29, 2022

respect!

ahmed_ds · on April 25, 2022

It is your project but naming conflicts are super frustrating. Please explore the suggestions for something that is a cat and a mythological creature -

- Bastet (Bast) - The beautiful goddess of cats, women's secrets, childbirth, fertility, and protector of the hearth and home from evil or misfortune.

- Mau - The divine cat who, in some stories, is present at the dawn of creation as an aspect of Ra.

https://www.worldhistory.org/article/885/egyptian-gods---the...

https://en.wikipedia.org/wiki/Cultural_depictions_of_cats

https://moderncat.com/articles/cats-mythology/#:~:text=What%....

webmaven · on April 25, 2022

+1 vote for Mau, since many cats make that sound. (=^･ｪ･^=)

asicsp · on April 25, 2022

Mau reminds me of this markdown-like implementation for ebooks: https://www.thedigitalcatonline.com/blog/2021/02/22/mau-a-li...

Cristan · on April 25, 2022

What about Harpy? It's another mythological creature and also part of your handle.

harpiaharpyja · on April 26, 2022

Thanks, I'll make it more of a priority to explore an alternate name. Someone else also suggested Bastet so it's on my mind.

Mau is cool too but for a lot of people I'd suspect it would be a meaningless word.

adontz · on April 25, 2022

The goals are unclear. LuaJIT already fits the definition of the current goals. Or do you just want to [ab]use Rust?

P.S. I know both Lua and Python and you are making syntax intentionally confusing. Do either "fun name() {}" or "def name():". Make it familiar.