Submitted by throwaway957280 t3_1102t34 in MachineLearning

I'm not arguing against Python's speed when it's asynchronously launching C++ optimized kernels. I just think it's kind of wild how 50% of practical machine learning is making sure your tensor shapes are compatible and there's no static shape checking. It kind of blows my mind given the number of Python comments I've seen of the form # [B, Z-1, Log(Q), 45] -> [B, Z, 1024] or something like that.
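For example (a hypothetical snippet of my own, nothing framework-specific): nothing flags the mismatch below statically, it only blows up when the line actually runs.

    import torch

    x = torch.randn(32, 128)   # [B, D_in]
    w = torch.randn(64, 128)   # [D_out, D_in] -- should have been transposed before the matmul
    y = x @ w                  # fails only at runtime: "mat1 and mat2 shapes cannot be multiplied"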

Plus you have the fact that the two major machine learning frameworks have both had to implement, like, meta-compilers for Python to support outputting optimized graphs. At that point it seems kinda crazy that people are still trying to retrofit Python with all these features it just wasn't meant to support.

Feel free to let me know I have no idea what I'm talking about, because I have no idea what I'm talking about.

84

Comments


OptimizedGarbage t1_j86zehe wrote

People talking about "can't you just do this in X?" are seriously underestimating how difficult a problem this actually is. There's a good reason why it's hard to do, which is that this kind of static type checking requires refinement types (where typechecking is NP-hard). Basically, by including values in the types, typechecking becomes way harder -- the types contain a limited program that involves arithmetic, and sound typing requires you to prove that the dimensions add up correctly. So your type system needs to include a notion of arithmetic, except that, because of Gödel's incompleteness theorem, any logical system that includes integer arithmetic is undecidable. So now this is basically stepping from traditional static typechecking to something like an automated theorem prover, unless you get very careful and clever with how you set up the problems. That can mean one of two things -- either you write a ton of type annotations (like more code than the actual program) to prove to the type system that your program is valid, or you hook up an automated theorem prover to prove the soundness of your program automatically, at the cost of type-checking being NP-hard or worse.

This can be worth it, and there are potentially ways of making this tractable, but it's very non-trivial -- you basically need a type system that's dedicated to this problem specifically, not something that you can stick onto an existing language's type system.

That said, there are some things that try to do this. Haskell has a port of Torch called Hasktorch that includes this kind of typed tensor shapes, and calls the Z3 theorem prover on the backend to solve type inference. It can get away with this because of the LiquidHaskell compiler extension, which has refinement types capable of solving this kind of typing problem, and is already pretty mature. Dex is a research language from Google that's based on Haskell and built to explore this kind of typechecking. Really you'd want to do this in Rust, because that's where the tradeoff of speed and safety for convenience makes the most sense, but Rust is just barely on the edge of having a type system capable of this. You have to get really clever with the type system to make it work at all, and there's been no sustained push from any company to develop this into a mature solution. Hopefully something better comes along soon.

118

throwaway957280 OP t1_j89eiz0 wrote

Thank you for the detailed answer! This is really interesting.

6

DoxxThis1 t1_j87h9gx wrote

I wonder if GPT could be leveraged to create an NLP-based type system. The programmer annotates the types in plain English, and the AI hallucinates the appropriate theorem-proving axioms! It would be an interesting "dog-fooding" of AI/ML for easier AI/ML development.

EDIT Holy cow what did I say to deserve so many downvotes? The one response below makes me think it's not such a wild idea.

−23

OptimizedGarbage t1_j87jazn wrote

So, yes and no. You really do not want to make the type annotations be in plain English, because in the Curry-Howard correspondence, the types correspond to theorems, and the code corresponds to a proof of those theorems. It's one thing if you know the theorem, but don't see the proof. You can often trust that the proof is right without seeing it. But you really need to know what the theorem is. If you start with English and generate some type behind the scenes, you don't see what the actual theorem is, but just know that the system has proved 'some theorem' about your code. As the programmer you have no idea what this actually tells you, so it kinda defeats the point of using static typing in the first place.
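To make that concrete, here's a toy Lean 4 sketch (my own illustration, not from any of the work mentioned here; appendSized is a made-up name, and the proof only relies on the core library's List.length_append lemma). The signature is the theorem, and the body has to supply the proof:

    -- A sized "vector" is a list bundled with a proof of its length.
    -- The return type states the theorem "appending a length-m list to a
    -- length-n list yields a length-(m + n) list"; the `by` block proves it.
    def appendSized {α : Type} {m n : Nat}
        (a : { xs : List α // xs.length = m })
        (b : { xs : List α // xs.length = n }) :
        { xs : List α // xs.length = m + n } :=
      ⟨a.val ++ b.val, by rw [List.length_append, a.property, b.property]⟩

A plain-English annotation might read just as nicely, but it's the precise statement in the signature that the checker actually verifies -- which is the point above.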

That said, you *can* write down a desired type and have a system write down a ton of type annotations or generate a bunch of code to prove that the type you wrote down is satisfied by your program. There's been recent work on this in deep learning for theorem proving, such as this work which uses GPT for proving theorems in Lean, a dependently typed programming language and theorem prover. A better approach though would be to combine this with an actual tree search algorithm to allow a more structured search over the space of proofs, instead of trying to generate full correct proofs in one shot. HyperTree Proof Search does this, using a variant of AlphaZero to search and fine-tune the neural net. Unfortunately it hasn't been open-sourced though, and it's pretty compute-intensive, so we can't use this for actual type inference yet. But yeah, there's active interest in doing this kind of thing, both as a proving ground for using RL for reasoning tasks and from mathematicians for theorem-proving.

15

WokeAssBaller t1_j86xz9r wrote

Dex is the closest that comes to mind.

With how deep the Python ecosystem is, and how fast LLMs are moving, the next language for ML will likely be English.

32

squareOfTwo t1_j89e6u8 wrote

So I get softmax as the activation function when I write 'sigmoid', together with funky errors? No thanks, I'd rather type "torch.sigmoid()" manually.

1

met0xff t1_j87vd58 wrote

Think the problem is that we don't want a language that can only do X. Imho that's one of the big issues with the adoption of Julia, because it just doesn't offer as big an ecosystem as Python otherwise. That's why Torch went from Lua to Python, and why Swift for TensorFlow didn't become more popular.

Because the deep learning code generally doesn't stand on its own. I once ported our inference engine to Rust to some degree by using tch-rs to call TorchScript-exported models. But that's only half of the game: before, between, and after the networks there are lots of processing steps that were a pain without all the Python libs. Just finding something solid like spaCy is pretty tough in almost any other language.

I think Swift 4 TF looked awesome. But if nobody builds good platform support, tooling, plotting libraries, integration of SDKs (like AWS), experiment tracking, configuration management blah blah around it, it doesn't help much. If I look at my current work project, there's PyTorch as the one dependency that's directly related to DL and then some 100 others that are not ;).

Ok so the other option is to use an embedded language like we had with Lua. Suddenly you have to deal with the main language, this embedded language, and probably with the lower C++ and CUDA layers. Also, what exactly does the embedded language cover? Just differentiable programming code? And then you've got to interface with data loader code that might have to load specific point cloud data formats, extract Mel spectrograms or pitch contours, run complex text analysis pipelines, or get stuff from HDF5 or read from some exotic DB or Azure or whatever.

As Jeremy Howard has been mentioned - yeah, he had high hopes for Swift and then Julia, but now it's back to "well, seems we're stuck with Python after all" (check the Julia vs Python part here: https://wandb.ai/wandb_fc/gradient-dissent/reports/Jeremy-Howard-The-Simple-but-Profound-Insight-Behind-Diffusion--VmlldzozMjMxODEw)

26

mil24havoc t1_j86qv1s wrote

Well, sort of. There's CUDA and Julia and Scala, right?

12

suflaj t1_j886fei wrote

None of those is strictly for ML. CUDA is a GPGPU platform that uses C to compile to some GPU high-level assembly, Julia is a parallel-computing maths language, and Scala is your general-purpose OOP/FP hybrid.

2

mil24havoc t1_j890jwb wrote

I know what they are. What are they missing natively that would make them "ML languages"? A "maths language" as you call it sounds an awful lot like an ML language to me...

5

suflaj t1_j890obw wrote

It's not that they are missing something, it's that they're too general purpose to be considered "specifically for Machine Learning", i.e. DSLs.

They're about as specifically for ML as Python, only Python is better at it because there's a bigger community and better support, meaning wider coverage of ML.

1

swappybizz t1_j881ena wrote

Ahh, you guys fight it over, just send me when it's production ready

12

SwayStar123 t1_j87sb2z wrote

Rust has a crate called dfdx which does static checking of tensor shapes, I think.

11

rust_dfdx t1_j8bd9bu wrote

Hah this was a nice surprise - yes the whole point of dfdx is to do just this!

2

SatoshiNotMe t1_j88fz8v wrote

I agree, some of the things that make ML code inscrutable are that (a) every tensor has a shape that you have to guess, and keep track of as it goes through the various layers, plus (b) layers or operations whose effect on tensor shapes you constantly have to look up.

I’ve settled on two best practices to mitigate these:

  1. Always include the tensor dimensions in the variable name: e.g. x_b_t_e is a tensor of shape (b,t,e), a trick I learned at a Berkeley DRL workshop many years ago.
  2. Einops all the things! https://einops.rocks/

With einops you can express ops and layers in a transparent way by how the tensor dims change. And now suddenly your code is refreshingly clear.

The Einops page gives many nice examples but here’s a quick preview. Contrast these two lines:

    y = x.view(x.shape[0], -1)    # x: (batch, 256, 19, 19)

    y_b_chw = rearrange(x_b_c_h_w, 'b c h w -> b (c h w)')

Yes, a little verbose, but I find this helps hugely with the two issues mentioned above. YMMV :)
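One more hedged addition (the shapes here are just illustrative): the einops pattern also doubles as a runtime shape assertion, especially if you pin the axis sizes you already know.

    import torch
    from einops import rearrange

    x_b_c_h_w = torch.randn(8, 256, 19, 19)

    # The pattern requires a 4-D input, and the keyword sizes are checked too;
    # a 3-D tensor or a wrong channel count raises an EinopsError right here.
    y_b_chw = rearrange(x_b_c_h_w, 'b c h w -> b (c h w)', c=256, h=19, w=19)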

5

harharveryfunny t1_j88kg27 wrote

>some of the things that make ML code inscrutable are that (a) every tensor has a shape that you have to guess, and keep track of as it goes through the various layers

That's not inherent to ML though - that's a library design choice to have tensor shapes be defined at runtime vs compile time. A while back I wrote my own framework in C++ and chose to go with compile-time shapes, which as well as preventing shape errors is more in keeping with C++'s typing. For a dynamically typed language like Python maybe runtime-defined types/shapes seem a more natural choice, but it's still a choice nonetheless.

2

__lawless t1_j86qlmc wrote

Swift for TensorFlow. Didn't work out, though.

4

AsIAm t1_j88rlv3 wrote

While S4TF died, the idea (autodiff in the lang) still lives and is slowly and quietly being worked on: https://github.com/apple/swift/issues?q=is%3Aissue+is%3Aopen+autodiff

5

__lawless t1_j88tevq wrote

Very true. It's just a shame that it did not make it. I was really looking forward to it.

1

AsIAm t1_j88u9rb wrote

Me too, but it was a Google project, and we all know what Google does to its projects..?

I think relying on TF would have been a mistake. This deep integration approach will be more fruitful in the long run. Also, if anybody wants to do ML in Swift on Apple platforms, there is awesome MPS Graph.

3

jloverich t1_j86oj73 wrote

Fortran with better syntax I think would do it. They'd probably have to go the way of Carbon and support legacy Fortran, but change many other things quite a bit. Still, it has matrix operations similar to numpy, whereas Carbon still has matrices as second-class citizens... Agreed that there should be a better language for this than Python.

3

tyranids t1_j86xbn7 wrote

Imo Fortran is very capable for this and I'm surprised there isn't more out there than Neural Fortran, really. Nvfortran will even compile your code to offload to NVIDIA GPUs.

6

Maximum-Geologist-98 t1_j87o5t0 wrote

I’ll help make a language. Take my axe.

3

Disastrous_Elk_6375 t1_j87rwu5 wrote

As a large language model I have to caution against using sharp objects in programming languages, as it would pose a great risk to the developers unknowingly hurting themselves with it. Furthermore, it can be said that axes are typically not very sharp, and as we know blunt objects are objects that are not very sharp and also might not be extremely sharp. Sharp is a now defunct company that used to produce TV sets. A TV set is like a modern TV but it used to also be old. /s?

7

digikar t1_j887j8f wrote

I've been wishing for this ever since I ran into "errors" involving numpy broadcasting along incorrect dimensions. Errors of the kind: you wanted to add a length-10 vector to a (10,10) matrix by treating the vector as a (10,1) matrix. But no one told you anything was wrong, and you spent hours debugging only to learn that a length-10 vector is treated as a (1,10) matrix in this case.
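For concreteness, a short NumPy sketch of that silent broadcast (the values are illustrative):

    import numpy as np

    m = np.ones((10, 10))
    v = np.arange(10, dtype=float)   # shape (10,)

    wrong = m + v                    # v broadcasts as (1, 10): added row-wise, no error raised
    right = m + v[:, np.newaxis]     # explicit (10, 1): added column-wise, as intended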

But yup, computing on dimension information to provide static autocompletion as well as dimension checks themselves seems like a huge plus.

Besides compile time checks, another feature I have wished for is to index the array dimensions by the name of the dimension rather than its axis-number.


For me, Common Lisp coupled with CLTL2 and de facto libraries is the closest language that comes to making this a possibility. The built-in Common Lisp array types are already fairly extensive in that they actually allow specifying the per-axis dimensions, which the compiler can then use to type check. Common Lisp's SBCL compiler does this in a fair number of cases. For example:

(declaim (inline add-vectors))
(defun add-vectors (a b out)
  (declare (type (simple-array single-float (10)) a b out))
  (loop for i below 10
        do (setf (aref out i)
                 (+ (aref a i)
                    (aref b i))))
  out)

Consider the add-vectors function defined above, which takes three arrays, each with element type single-float and a single axis of length 10, adds the first two arguments a and b element-wise, and stores the result into out.

Then, if you try to compile the following function:

(defun add-to-first (x y)
  (declare (type (simple-array single-float (10)) x)
           (type (simple-array single-float (05)) y))
  (add-vectors x y x))

The compiler SBCL actually signals a warning during compilation itself:

; processing (DEFUN ADD-TO-FIRST ...)

; file: /tmp/slimeBh9nFb
; in: DEFUN ADD-TO-FIRST
;     (ADD-VECTORS X Y X)
;
; note: deleting unreachable code
;
; caught WARNING:
;   Derived type of COMMON-LISP-USER::Y is
;     (VALUES (SIMPLE-ARRAY SINGLE-FLOAT (5)) &OPTIONAL),
;   conflicting with its asserted type
;     (SIMPLE-ARRAY SINGLE-FLOAT (10)).
;   See also:
;     The SBCL Manual, Node "Handling of Types"
;
; compilation unit finished
;   caught 1 WARNING condition
;   printed 1 note

But that said, Common Lisp leaves a good many things wanting. Not only are there no parametric types, its type system also has no formal structure like the Hindley-Milner type system. There's an attempt, Coalton, to bridge this and bring HM-based type checking to Common Lisp.

However, even with HM, the notion of per-axis dimensions is hard, although doable. With a fair bit of macrology over the past two years, some of us* have been able to come up with something that allows for the following:

(in-package :polymorphic-functions)

(push (lambda (x)
        (member x '(<m> <len> <type>)))
      *parametric-type-symbol-predicates*)

(defpolymorph pf-add-vectors ((a   (simple-array <type> (<len>)))
                              (b   (simple-array <type> (<len>)))
                              (out (simple-array <type> (<len>))))
    (simple-array <type> (<len>))
  (loop :for i :below <len>
        :do (setf (aref out i)
                  (+ (aref a i)
                     (aref b i))))
  out)

And then, if one tries to compile an add-to-first analogous to the one above, now calling pf-add-vectors:

(defun add-to-first (x y)
  (declare (type (simple-array single-float (10)) x)
           (type (simple-array single-float (05)) y)
           (optimize safety))
  (pf-add-vectors x y x))

One gets the following compiler note:

; processing (DEFUN ADD-TO-FIRST ...)
; While compiling
;     (PF-ADD-VECTORS X Y X)
;   Following notes were encountered:
;
;     No applicable POLYMORPH discovered for polymorphic-function
;       PF-ADD-VECTORS
;     and ARGS:
;
;       (X Y X)
;
;     derived to be of TYPES:
;
;       ((SIMPLE-ARRAY SINGLE-FLOAT (10)) (SIMPLE-ARRAY SINGLE-FLOAT (5))
;        (SIMPLE-ARRAY SINGLE-FLOAT (10)))
;
;     Available Effective-Type-Lists include:
;
;       ((SIMPLE-ARRAY <TYPE> (<LEN>)) (SIMPLE-ARRAY <TYPE> (<LEN>))
;        (SIMPLE-ARRAY <TYPE> (<LEN>)))

And the following compiles successfully:

(defun add-to-first (x y)
  (declare (type (simple-array single-float (10)) x)
           (type (simple-array single-float (10)) y)
           (optimize speed))
  (pf-add-vectors x y x))

And it compiles to fairly optimal code when declared so:

; disassembly for ADD-TO-FIRST
; Size: 149 bytes. Origin: #x53BD456C                         ; ADD-TO-FIRST
.
.
.
; 5D0: L0:   F30F104C4E01     MOVSS XMM1, [RSI+RCX*2+1]
; 5D6:       F30F10544F01     MOVSS XMM2, [RDI+RCX*2+1]
; 5DC:       F30F58D1         ADDSS XMM2, XMM1
; 5E0:       F30F11544E01     MOVSS [RSI+RCX*2+1], XMM2
.
.
.

Note that there are no parametric types in the sense of HM types; rather, a symbol substitution and declaration-propagation strategy is being employed here. Regardless, this is very much rudimentary, has no formal semantics, and at the current rate I expect it to take several years to reach maturity. But yeah, someone with a background in (dependent) type theory and the time to implement and debug it is certainly welcome to experiment with Common Lisp and SBCL to see what might be possible :D.

3

0x00A0C0 t1_j88td3a wrote

Not really an answer to your question, but there are Python packages that try to solve the problem of tensor shapes that you mentioned, e.g. https://github.com/patrick-kidger/torchtyping or https://github.com/deepmind/tensor_annotations

3

patrickkidger t1_j8fdtx2 wrote

Heads-up that my newer jaxtyping project now exists.

Despite the name, it supports both PyTorch and JAX; it is also substantially less hackish than TorchTyping! As such, I recommend jaxtyping over TorchTyping regardless of your framework.

(jaxtyping is now widely used internally.)

3

wittfm t1_j86o0ks wrote

I haven't heard of other attempts, but I remember seeing in Jeremy Howard's classes his ideas on designing one, more towards a descriptive paradigm.

1

__lawless t1_j86xhtr wrote

That was Swift for TensorFlow. It did not pan out.

4

[deleted] t1_j888739 wrote

Not a programming language, but a database solution, called MindsDB.

>MindsDB brings machine learning into databases by employing the concept of AI Tables.
>
>AI Tables are machine learning models stored as virtual tables inside a database. They facilitate making predictions based on your data. You can perform the time series, regression, and classification predictions within your database and get the output almost instantly by querying an AI Table with simple SQL statements.

Edit: yes, I've been watching FireShip))

1

cajmorgans t1_j88d068 wrote

That’s actually pretty cool! I first read ”MongoDB” and thought, not this discussion again…

1

jerha202 t1_j88xka2 wrote

I absolutely agree with the OP. Out of the same frustration I actually ended up designing my own language and writing a compiler for it, and now I use it for all my ML modelling. It probably only solves my particular problems and I don't expect it to be very useful for anyone else, but here goes, in case anyone is curious: https://github.com/waveworks-ai/fl

1

MrEloi t1_j89q28f wrote

PROLOG, LISP, Lua may be candidates.

1

solidavocadorock t1_j8eeikl wrote

Julia is a perfect match for scientific computations.

1

patrickkidger t1_j8fde35 wrote

On static shape checking: have a look at jaxtyping, which offers compile-time shape checks for JAX/PyTorch/etc.

(Why "JAX"typing? Because it originally only supported JAX. But it now supports other frameworks too! In particular I now recommend jaxtyping over my older "TorchTyping" project, which is pretty undesirably hacky.)

In terms of fitting this kind of stuff into a proper language: that'd be lovely. I completely agree that the extent to which we have retrofitted Python is pretty crazy!

1

sumguysr t1_j8h1p7g wrote

Yeah, that's differentiable programming. Julia is probably the biggest one, but there are others.

1

boadie t1_j8hhtwi wrote

In the opposite direction from your question is a very interesting project, tiny-cuda-nn, all implemented as close to the metal as possible and very fast: https://github.com/NVlabs/tiny-cuda-nn

Also in the vague neighbourhood of your question is the Triton compiler: while on the surface it's a Python JIT compiler, its language coverage is much smaller than Python's and you can view it as a small DSL; all the interesting bits are way below that level: https://openai.com/blog/triton/

1

Calm_Motor4162 t1_j86v280 wrote

JavaScript may work; there's even a complete book on TensorFlow.js.

−4

AsIAm t1_j88rd2i wrote

While TF.js is performant and a godsend, it's ugly because JS lacks operator overloading and a native tensor type, so you have to write tf.add(tf.tensor1d([1,2,3]), tf.tensor1d([10,20,30])).

2

jloverich t1_j87aynr wrote

Meta is also working on Shumai, which is JavaScript/TypeScript and looks like PyTorch.

1