Friday, June 26, 2009

And still more 9800GT ...

As a last-ditch attempt to get some credit out of my much-disliked Galaxy 9800GT card, I have taken it out of the AM2 machine running 64-bit Linux and put it into a 775 machine with a single-core 3.06GHz Celeron running 32-bit Windoze. This machine was previously home to a heat-pipe cooled 9400GT running on the Nvidia 185.85 driver. On start-up, the display defaulted to the VGA driver, but after the hard drive had churned away for a minute or so, a restart occurred, the screen resolution came right and off it went straight into a GPUGrid work unit.
I left it running overnight but can't see any returned work units this morning. Hopefully it just hasn't done an update yet ... I'll see when I get to the office!

Update: Nope, it's trashing work units even quicker than normal ... GRRRR!

Update 2: Upgraded the Nvidia drivers from 185.85 to 186.18 and BOINC from 6.6.31 to 6.6.36. So far it has completed 3 SETI work units without a failure. Holding thumbs again!

Update 3: No, still trashing work units in SETI and GPUGrid ...

Monday, June 22, 2009

Admitting defeat...

Well, the 9800GT just goes from bad to worse! It hung for 19 hours this weekend on a SETI work-unit at 0% complete. Re-installing the drivers and upgrading the kernel had no effect. I am going to give up on GPUGrid as I haven't got any credit from this GPU for a while now. I am just going to run SETI when I am actually in the office as about 60% of those actually complete. To do my bit for the planet, I'll turn this machine off at night.
Guess I am out of the credit race :-(

Still having problems with QMC work-units but so are most other Linux users! I'm happy to see that their new Orca application is available as a 64-bit application ...

Friday, June 19, 2009

The 9800GT saga continues...

The number of work-units failing on my Nvidia 9800GT card has been steadily increasing, to the point earlier this week where it trashed about 80 SETI work units and 8 GPUGrid units in a row and wouldn't complete anything new. Yesterday afternoon I noticed that my display was missing some text and all the screen effects (fading windows etc) were taking ages. I turned off all the effects and the problem went away. It then occurred to me that the Nvidia drivers would be used for any animation effects etc. I have now re-installed the 180.44 driver available for Ubuntu and I have already seen a couple of SETI work-units being returned as complete.

I'm hoping this improves my situation and allows me to get some credit for a change. Here's holding thumbs!!

Saturday, June 13, 2009

Future Plans ... "Junk Mountain"

I was looking for a serial port back-plane in my store-room during the week and it struck me that there are at least 3 PCs lying in there. Nothing really wrong with them, just old single-core machines (Celerons/Athlons) that have been "decommissioned". There is a link on the BOINC website to a Linux distro (based on Ubuntu) from Dotsch_UX that allows one to set up a very basic OS with the BOINC client that runs diskless, from a USB stick or from a hard drive.
My plan is to investigate the most power-efficient way of putting three motherboards in close proximity (no cases etc) with a single drive on a "server" to run the whole show. Running Milkyway's optimized science app, this should churn out a useful bit of credit for relatively few kWh.
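
A rough way to compare set-ups is credit earned per kilowatt-hour. The sketch below is only an illustration in Python: the function names are made up for this example, and the wattage and credit figures are placeholders, not measurements from any of these machines.

```python
def energy_kwh(watts: float, hours: float) -> float:
    """Energy used, in kilowatt-hours, for a constant power draw."""
    return watts * hours / 1000.0


def credit_per_kwh(credit_per_day: float, watts: float) -> float:
    """Credit earned per kWh, assuming the machine runs 24 hours a day."""
    return credit_per_day / energy_kwh(watts, 24.0)


if __name__ == "__main__":
    # Hypothetical figures for illustration only: three bare boards,
    # each assumed to draw 60 W and earn 500 credits per day.
    boards = 3
    watts_each = 60.0
    credit_each = 500.0

    total_watts = boards * watts_each
    total_credit = boards * credit_each

    print(f"{energy_kwh(total_watts, 24.0):.2f} kWh per day")
    print(f"{credit_per_kwh(total_credit, total_watts):.1f} credits per kWh")
```

Plugging in readings from a plug-in power meter and the daily credit shown on each project's stats page would give a like-for-like figure for each board.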

On the 9800GT front, the GPU seems to be behaving as long as I reboot the PC on a regular basis. Every second day seems to keep it happy.

I'm still getting a lot of client errors on QMC work units. It's not just from one machine; they are coming off 64-bit Linux, 32-bit Linux and 32-bit Windows. There is still nothing much being said on the QMC forums ... it can't just be me this time??

Monday, June 8, 2009

9800GT strikes again!

It has been a long weekend in NSW (Queen's Birthday) and my Ubuntu 9.04 AMD X2 with the 9800GT has been on since Friday morning. Sometime on Sunday the GPU card went mindless and returned 240 SETI work-units with 60-something seconds of work and "The number of results detected exceeds the storage space allocated". It also returned 11 GPUGrid work units with a multitude of "unspecified launch failure" errors.
My first thought was a power glitch, since this machine is no longer on a UPS because of the GPU's power requirements, and I have seen a fair number of entries in the UPS logs over weekends. There is, however, a Windows machine with a 9400GT on the same supply and it seems to be unaffected, so I think that ruins my theory.
The 240 trashed units actually moved me to go into the office on a public holiday and restart the machine. There have been no work units returned to either SETI or GPUGrid since. I'm hoping this just means that there hasn't been a need to report completed units yet ... either that or some serious PC surgery awaits me tomorrow morning!
I'm running out of ideas to get some reliability out of this machine! It isn't overclocked, the GPU seems to run at around 70°C and the CPU work units seem to have no problems (which in my mind rules out HDD/RAM/motherboard issues).

Saturday, June 6, 2009

QMC work units failing on Linux?

My new server is humming along beautifully. Apart from an error in a webmin (remote web-based admin) configuration module for clamav (anti-virus), all has been well this last week. I'm happy with the decision to go for Ubuntu instead of MS Small Business Server and have actually considered offering to build servers for Sydney clients as I believe there is a market out there.
The one issue that hasn't been resolved is that QMC work units crash most of the time on this machine. I have installed the libc++ ver 5 library as well now and the problem still exists. The weird aspect is that this machine derives most of its credit from Einstein, which is also a 32-bit application (and that runs fine!). There are no QMC failures on my Windows machines. The work units that fail on the Linux machines also fail on other computers that receive them (both Windows and Linux) with "exit code 24 (0x18)" but are eventually completed after a couple of attempts. There does not seem to be much on the QMC forums that suggests other people are noticing this.
The one thing I have noticed is that the problem is worse on the machines that are throttled via the BOINC setting. Unfortunately this is not the cause, as the problem also occurs on an AMD X2 that runs at 100%. My current course of action is to install all the GLUT (OpenGL Utility Toolkit) libraries and see if that makes a difference, as the QMC application does call "freeglut" according to the logs.