WELL, I killed my laptop doing …
Killed it doing something I believed myself wise enough to avoid. Don't jump to conclusions; I know what you're thinking: it must have been crypto or some menacing, unspeakable, nefarious act. With a little more pushback to my advisor, I might not have killed my personal equipment running a series of sequential experiments. Hold on, stay on topic; that's beside the point for now, and it will make sense as I get into the details. I'd like to direct our attention to Python, particularly a dive into the world of machine learning and the various tools within that domain.
“I underestimated Python Libraries.”
– A man from Pakistan
As I started writing this, I got a message over Discord from a friend that read "I underestimated Python Libraries," alongside the follow-up "Py has great libraries for web-dev, animation tools". Libraries. Python libraries are an absolute nightmare: they are complicated to install, complicated to use, and it's complicated even to figure out which ones you need.
I was in a conversation with another friend about the minimum I have to do just to get my code running: ensuring compatibility across various tools and producing reliable results. Often the results felt sketchy, mainly because of the plethora of libraries and their diverse functions. A simple update could change the behavior of major functions or methods, complicating the situation further.
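A small habit that can at least make that drift visible is worth sketching here. This is a minimal sketch, assuming you keep a record of the versions your code was last tested against; the packages and pins below are placeholders, not recommendations:

# Fail fast, or at least loudly, when installed versions drift from
# whatever the code was last tested against.
from importlib.metadata import PackageNotFoundError, version

TESTED_WITH = {"numpy": "1.24.4", "torch": "2.0.1"}  # illustrative pins

for pkg, expected in TESTED_WITH.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        raise SystemExit(f"{pkg} is not installed at all")
    if installed != expected:
        print(f"warning: {pkg} {installed} != tested {expected}; "
              "behavior may have changed")

It won't stop an update from changing behavior, but it turns a silent change into a visible one.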
These are the dreams of coders and programmers alike: that we can just run data into a machine and get a result reflecting our intended actions.
– Me [but probably someone else first].
When I started my Ph.D., I wanted my projects to have good working code: code that would run without requiring a series of crazy, odd tricks to actually compile and execute. These are the dreams of coders and programmers alike: that we can just run data and code on any machine and get a result reflecting our intended actions. Because at the end of the day, having code that is usable on other machines will carry my work forward as it grows, and maybe, just maybe, these practices might even help others with their projects in the future.
Now, what does this have to do with Python? Mind you, Python as a language is, well... I made a post on that a while back. The tools, however, can have a bit of an oddity to them: you're left hoping everything in the requirements text is loadable and that you can get the right versions. To quickly rebase the conversation: how did running some machine learning on a computer cause you to kill it? Good question, and it actually isn't set in stone that this was the final nail in the coffin for a two-year-old machine, at least as of this writing. But that isn't the point of the writing here. The point is software, and why I couldn't just do things with what I had.
So, I hear you like to compute things?
“If your software is so great, it should work on everything”.
– Various [but usually attributed to Jonathan Blow]
There is an adage that goes along the lines of "if your software is so great, it should work on everything". It's a bit of a refrain among many outspoken critics of the software domain, as we live in a world where software cannot just exist and work. We kind of had this during the early days of 'C' on everything: printers, routers, the home computer. That changed over time as things changed, but more on that another time.
Take versioning, for instance. Oh, if only things just worked the first time, arrived at peak optimization, and never needed to be updated. There is another adage: "you are only x number of people removed from y". Though that one is about social networks, I like to extend the picture to versioning: you are only one executive decision away from an entire software suite going away in the middle of you using it. In the world of open-source tools, or even cheap software, this is a reality many of us live with: at any moment, a tool holding up your entire project might just vanish. Python libraries are this way. We spend so much time making our code work and produce great results, but at some point the various software tools and their support will scale back or fail altogether.
These unexpected, world-changing events hit you like a truck and send the entire (in some cases solo) production into a rumble. Then you find yourself going down rabbit holes, desperately looking for a solution someone else has already come up with, switching frantically between libraries and trying anything that might be considered a working alternative. It's almost as if these tools were all developed on exotic machines out on the ethereal plane, constructed within the depths of some sketchy laboratory. Then comes the most tedious and absurd measure we have to take in this situation: containers.
Running software on bare metal, no no no.
I have (or had) a fairly powerful machine on my hands. Granted, it was a laptop and not a monstrous workstation or compute cluster, but it was also my laptop. Jokes aside, this machine was more than fine for my general tasks: gaming, browsing, and yes, even coding. It should have stood on its own merit that I could just run whatever engineered solution I came up with and work it out further, limited only by whichever libraries I hadn't written myself sitting at the core backbone of the target goal.
Here you go about adding up the critical thinking needed…
Well, here is where things get fun. Say you want to work on a machine learning project; you'll find yourself needing sturdy, reliable hardware to compute the complex math behind the operations. Halt! You have already made an error: you also have to worry about the compatibility of your hardware, software, and platform. Then you go about adding up the critical thinking needed to install the tools, learn how things fit together, and figure out what it takes to actually have running code.
The ridiculous nature of installing compatible tools
The ridiculous nature of installing compatible tools, with a focus on ML libraries, only gets more complex, as you might need to test different versions to make sure you can actually reach your target goal. To do this you will want virtual environments ready ahead of time, so you can try updates or beta versions as they come around. After finishing the grueling hours of downloading and setting up virtual environments for each of the various projects you keep telling yourself you are going to get to in your own time (having now become a site reliability engineer on the side), you can finally start to entertain the thought of running your first line of code, debugging and testing lines exempted.
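For the environment step itself, the standard library is enough. A minimal sketch; the project names and the envs directory are illustrative, not any tool's convention:

# One virtual environment per experiment, created up front so beta and
# legacy versions can live side by side.
from pathlib import Path
import venv

EXPERIMENTS = ["baseline", "torch-nightly", "legacy-tf"]  # illustrative names

for name in EXPERIMENTS:
    env_dir = Path("envs") / name
    if not env_dir.exists():
        venv.create(env_dir, with_pip=True)  # include pip so installs work
        print(f"created {env_dir}")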
I got us off track again, didn't I? What does it take to kill a personal machine just by running machine learning code, and what does that have to do with crazy Python libraries and virtual environments? Well, virtual environments are, unfortunately, not containers. That being said, if you want access to the full range of your hardware and you don't have the right platform, you might as well be dead in the water before you begin.
Containerized, deployed, … failed to initialize.
Docker is a fine tool for what it is. However, the same reasons software isn't guaranteed to work on any machine in any configuration (the mysterious laboratory in the middle of nowhere existed in a vacuum) are the same reasons containerization exists for us to complain about. Because if the software were so great, it should just work. But Docker exists anyway; yes, Docker does more than make janky code work on any machine, but having this capability built into every operating system just isn't allowed, for historical raisins. Then, given a situation where, say, you're on a Windows machine and didn't have time to set up a container because results were needed yesterday, you'll be pressured to consider drastic options.
tools you’ve never heard of … are all around you.
Here is a quick aside I felt was relevant. I was peeking at a video about a few audio tools. It had got me with its clickbait title, something along the lines of "tools you've never heard of", but the gist of the video wasn't tools you should use; it was software tools made on esoteric legacy systems so far removed from the modern zeitgeist that the video maker had to really dig through the past to find them to show at all. It brought up feelings about what we lose when not everything makes its way forward in the software space; we might never see some of these tools working again. That doesn't mean they were good, or that we could learn from them, but it could be the difference between a future someone getting inspired, or a one-off question getting answered in a late-night working session, and neither ever happening.
But don't fear, traveler, for containerization is here: the symptomatic representation of what happens to software when it's still needed but won't just run on any machine. The epitome of "it works on my machine", where some unexplained situation arose in which everyone ended up on different versions, or even sub-versions, of a particular piece of software. Where the sole reason we have to keep that one guy's machine going until deployment is that the mass deployment testing hasn't started just yet.
With this said, you load up Docker. Because why not? It isn't sexy, and it takes quite a bit of time to figure out. If you don't use it every day, you'll be asking yourself: why is there a Dockerfile and a compose file (one builds the image, the other orchestrates the containers), this is getting confusing and stressful, why don't I just leave? But then it works, or turns on at least. You go into the container, you dive into the virtual world within a virtual world, and everything is great but still scary. What does it take to navigate such a place? Quite a bit, actually, and it only gets more complicated the moment you get another container running.
What does this have to do with the compatibility issues of all those libraries you keep going on about, and with the computer breaking?
– You just now
Turns out containers are a mistake: not the technology itself, but the contingent reasons we need them. We'll just have to live with it for now. Oh, but why, you ask: what does this have to do with the compatibility issues of all those libraries you keep going on about, and with the computer breaking? I'm getting there; it's taking some time, and I'm building up a dialog.
GPUs are for the cool kids.
All I wanted to do was run my code through Windows, on my GPU. But no, I wasn’t cool enough to get it running, I guess—at least not with my Nvidia/AMD laptop.
I never did find out if I did a bad install, but after countless attempts and many moons, I concluded that machine learning libraries like TensorFlow and PyTorch just don’t have reliable GPU support on Windows. I installed all the CUDA tools, put everything in place, and ensured my libraries were on compatible versions, but nothing worked. So, I had to return to square one.
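For anyone retracing these steps, the quickest sanity check is just asking each library what it can actually see. A minimal sketch, assuming both libraries are installed in the active environment. (For what it's worth, TensorFlow dropped native Windows GPU support after version 2.10 and points users to WSL2 instead, which lines up with the wall I hit.)

# Does the installed stack actually see a GPU?
import torch
print("torch:", torch.__version__)
print("  CUDA available:", torch.cuda.is_available())
print("  built against CUDA:", torch.version.cuda)  # None on CPU-only builds

import tensorflow as tf
print("tensorflow:", tf.__version__)
print("  visible GPUs:", tf.config.list_physical_devices("GPU"))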
I made my own GPU support for TensorFlow and PyTorch.
– No one, not even me.
If you want to run your own code on your own hardware and you work on a Windows-based machine, you will need to build your own GPU support. Why? [beep] you, that's why. For some reason, it's as complicated as software development can get. I wish I could say, "I made my own GPU support for TensorFlow and PyTorch," but alas, I don't understand the underlying source code or what it takes to get it working on a GPU within the Windows environment; perhaps I'm not cool enough for that just yet. It could be a Microsoft issue [just like how they exist in general], or perhaps the developers don't see a market for it, as most production-level tools run in Linux-based containers.
And so my journey came to a rough and sudden stop. I had spent so much time setting up the virtual environments, solving the dependencies, and downloading all the tools, just to let myself believe, for a few moments, that when I hit run in a dedicated terminal within VS Code, I'd get an impactful traceback pointing to some line of code that was missing some bit of formatting. It always felt inevitable that I would have to containerize the experiment and treat it as some kind of ready-for-deployment software.
Wrap it up team.
I will lay this out now, as it is relevant; this note is for those who might not know the machine learning side of Python: these tools will use up whatever resources you allow them to have (in the context of CPU/GPU support). Also, a second note: I was limiting myself to TensorFlow and PyTorch. I know tools like JAX exist, but I don't know the details well enough to say whether it would have made a difference.
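Since I brought up resource appetite: both libraries do expose knobs for capping what they grab on the CPU side. A minimal sketch; the thread counts are illustrative, not tuned recommendations:

# Cap CPU parallelism so a long run doesn't pin every core for hours.
import torch
torch.set_num_threads(4)  # intra-op threads for PyTorch CPU kernels

import tensorflow as tf  # threading must be configured before TF runs ops
tf.config.threading.set_intra_op_parallelism_threads(4)
tf.config.threading.set_inter_op_parallelism_threads(2)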
As I discovered, getting code to run in a GPU environment and work within a container takes time, and that solution was out of the question for me because I didn't have the time to make it work. Testing meant my CPU on Windows had to start producing results. That led to less-than-stellar performance followed by a warming computer climate, as the tasks I was running seemed quick only because they were simplified to the point where they might not have actually been doing much. My code wasn't optimized; it was just simple, and perhaps not accurate.
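For context, the corner I was boxed into is the standard device-fallback pattern: use the GPU if the stack can see one, otherwise drop to the CPU and eat the slowdown. A minimal PyTorch sketch; the model and tensor shapes are made up for illustration:

import torch

# Fall back to CPU when no usable GPU is visible (my situation on Windows).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(128, 10).to(device)   # toy model
batch = torch.randn(32, 128, device=device)   # toy batch
print(model(batch).shape, "ran on", device)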
As a result, I had rough but working code that wasn't properly containerized and didn't run on the GPU. Despite setting up Docker and installing all the necessary tools, I didn't have the time to convert the code and get the pipeline working correctly. The focus was on producing results first and working code second.
Initially, I ran a fairly light experiment that wouldn’t have harmed even my older laptop with a tenth of the performance of my more powerful machine. However, as I continued, I kept it on the CPU, leading to an unexpected death spiral. This resulted in what I believe was heat damage to the power delivery system, causing the laptop to fail to power up. When I left the battery in and tried to power it on, the machine would heat up but never fully power on. The final nail in the coffin came when the machine powered on but showed a blank screen with idle fans. I watched it for a few minutes before pressing the power button again, hoping to force a reset. Nothing happened. I was very upset.
Why did I kill it?
Eventually, I packaged the machine in its original box and sent it to the manufacturer, like boxing up a dead pet to send to a taxidermist. In this case, I’m expecting a hefty bill to get back my hardware, which I killed. If it gets repaired, I’ll have a zombie machine. Why did I kill it? For personal gain, and it was necessary.
I believe the power delivery system failure was the main issue, as the machine entered a powered-on state and debug phase. This is likely due to potential battery issues and possible overheating. My second theory is that overheating killed the BIOS, which would explain the power-on issue.
Why do the good die young?
I didn't push hard enough for a proper working system due to time constraints. Imagine if goals were that clear from the start; we would all just know how to spend our time, optimizing every moment to the fullest.
“that’s some good software”
– Me, one day.
To put everything together: if we just had software tools that would work on whatever and wherever (within some reasonable hardware specifications), regardless of platform, then we could say "that's some good software". Till then we will live in a world of frustrating version control, dependencies, esoteric ways to load up the various tools, and finding out that if we want to run the code, we'll have to do a series of tribal dances and chant Latin prayers in some vain hope that we've made the moons align.
Till then, I'm going to get Docker working properly on an appropriate compute cluster, and pay the dues as they come at me. So take care of your machines, and if you are expected to run some brutal code, tell whoever needs to hear it that you can't, that you lack the computational power needed. If they can't oblige, then take it from me: don't kill your hardware just to make it work.
TLDR:
This got a bit ranty at times. Essentially, because of the rough GPU support in the ML libraries, I was pressured into running the code on my own machine when all I needed was about an extra week of setup time, or even less. I didn't push for the setup time because I was aware of the constraints and wasn't getting access to any of the external machines. The result was me pushing forward, aiming sharply at an output that wasn't very fruitful (at least not yet), leading to a repair request with the original manufacturer. I plan, or would like, to write a shorter, more technically focused part 2 as time permits.