We’re Back!…Forum is Upgraded and Online Now!
[update17 — Mon 1600 Central time] I’m pleased to announce forum services have been restored following a major infrastructure upgrade and move. We are now hosted in the cloud, in a premium, enterprise facility with all new equipment. I hope you will find forum performance to be much better and will once again join us on the CSW Forum!
I will continue to monitor performance and be doing some necessary configuration work, but all in all, I hope everyone will note the improved performance of our site.
Thanks again for all your patience. It’s a privilege to serve this hobby, and to support all of you, as you have supported CSW all these years.
[update16 — Sun 2030 Central time] We have the CSW Forum moved to its new enterprise-level hosting facility, running on new hardware, etc. I’m doing initial tests on a private sandbox environment and things are looking very positive. Speeds are good, but keep in mind it’s only me navigating the site so it’s not a fair performance test without all our members hitting the forum. However, I’m very hopeful this decision was the right one to solve the performance issues.
The only other item I am considering is ditching our buttons for navigation, etc. and going with a text-based toolbar which would help speed up the site further. I’m going to try different configurations to see how they look, and make a determination at that point if I want to give them a try. After tomorrow, and if all things remain looking this good, I’ll begin the process of restoring forum services after we update our DNS profile with the new IP address. We are looking at Wednesday AM or Tuesday PM to have our services restored.
We are getting closer!
[update15 — Fri 0953 Central time] Search indexing has just completed but we see we have hardware issues persisting. Given that is the case, we are shutting down the Forum today for the move and upgrade that will be completed early next week. No exact ETA yet, but we hope we may be back online by Wednesday AM. I will post an update here when we are back online and hopefully everyone will find that our performance issues have been resolved.
[update14 — Tues 1255 Central time] To ensure we have all resources on deck, we had to reschedule our Forum move for Monday, Aug. 7. I’ll have to ask everyone for their continued patience as the server will be running s-l-o-w until we move it since it’s still rebuilding the search index. I’m looking forward to next week!!!
[update13 — Tues 2030 Central time]
I appreciate everyone’s efforts and patience with the long-standing issues with the Forum. They came to a head one week ago, and we are still suffering from the results (lost subscriptions due to dbase corruption, slow forum performance as search index rebuilds, etc.).
Everyone has been so supportive of me all these years, it’s time for me to put up or shut up.
So starting tomorrow, we are taking the first step to migrate our forum service to Elliptics, who are the developers of this forum solution. They have much larger communities, and they have a much better hosting environment, so my goals are that we will have not only much better, enterprise-level support, but they will actively help resolve the issues we’ve had with the forum all these years.
I’m basically putting the forum in their lap to finally have them solve it. Otherwise, I have no choice but to move quickly to find a new solution, which I understand would be extremely disruptive to everyone here and involve an entirely new interface and not contain all the discussions we have here.
SO HERE IS THE PLAN
Starting tomorrow (sometime), we are shutting down our server.
We are then going to be working to get all files to the new hosting facility. This alone could take a day, perhaps two at the most.
I’ll be working on some testing and updating our DNS IP address settings to make sure we are good to go in the new facility, and we will most likely get rid of our button toolbars we have now and go with Text-based toolbars instead to rapidly speed up the performance and loading of the site. We are going to do all we can to get this forum performing MUCH better. And again, if that is not the end result, then it’s clearly a problem with the forum software and it will be time to move on to a new solution.
So please bear with me as this is going to be a major move, along with a serious investment from my side to get the best hosting and resources around this forum community. If all goes well, I will be investing further, such as getting email subscription services added to this forum as I understand that is a popular feature.
My apologies again for all the problems we’ve experienced.
We will have another outage starting tomorrow, but my hope is that when we come out on the other end, you will notice a big difference in performance and hopefully we will be good to go from there!
[update12 — Mon 2035 Central time] There are a few issues to be aware of regarding the reboot of the CSW Forum.
- Member subscriptions were cleared as a result of the database rebuild, due to a data integrity issue. Members will need to rebuild their subscription list. You can subscribe by individual topic, or as a time-saver, you can subscribe to a folder which subscribes you to all topics contained in the folder. Then when you click on “Check Messages” — you can unsubscribe from topics individually that you no longer want to be subscribed to. Hopefully this will be a good tip for some of you.
- All postings appear as unread. In order to mark postings as read, we recommend from the top-level/home page of the Forum that you click on the “Mark as Read” link which appears at the end of each topic/folder name so that topics will no longer be marked as unread.
- The forum performance is very SLOW right now. Yes, this is expected. The issue is that the search index is rebuilding from scratch, going through millions of words to index. This will take 2 more days at least and the server will be performing slowly during this time.
- Upgrading our hardware. Given that the search index is taking so long to rebuild, this indicates that we need to upgrade our entire server environment — hardware and system components. We are investigating the best investment option to upgrade our environment at this time and we hope to have this resolved next week.
Thanks to everyone for their continued patience as we work to not only recover, but improve the CSW Forum.
[update11 — Sun 1940 Central time] I’m pleased/relieved to state that the CSW Forum is back online. We’ve done some configuration checking and made some changes, and now we are going to keep an eye on things. We hope things will run well, and we will be sure to respond if any hiccups occur as we are just getting back online.
[update11 — Sat 1250 Central time] We have a new ETA on when we should have the CSW Forum back online. We expect to have services restored tomorrow, Sunday evening. Please understand we may be doing ongoing troubleshooting and configuration work when the forum services get restored. I wanted to give you this update so you don’t have to bother checking if the Forum is back online before 6pm Central Sunday (tomorrow). Thank you again for your continued patience!
[update10 — Fri 1345 Central time] Initial testing completed on the test site that was created for the Forum in the developer’s own sandbox/testing environment. The results seem favorable. We are now discussing the process of importing the rebuilt Forum database back on our server hosting environment where I will need to quickly run through a series of tests again to see if everything is configured and performing properly. I don’t have a set ETA yet, but my assumption is we may be back online this weekend/late Saturday evening. Assuming all goes as hoped and there are no issues with our own server environment, such as server connections dropping as we experienced, then we may be nearing the finish line.
[update9 — Thur 2225 Central time] The database import is now complete and restored. One of the lead engineers is going to spend a full day tomorrow running the database through a battery of scripts and tests to check the integrity of the database. It is a positive sign where we are at this time, but tomorrow will be equally telling.
What did the database rebuild do that we know already?
Apparently, a lot. For one thing, part of the database rebuild is to strip deleted postings and to optimize/compact the overall data storage. The software developers confirmed the database rebuild includes the same content as where we stood when the outage occurred. The original database size was 12GB. It is now 8.35GB. That alone has reduced the size of the database by roughly 30%.
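For anyone who wants to check the arithmetic on that size reduction, here is a quick sketch in Python. It only uses the two sizes quoted above; the function name `percent_reduction` is just something made up for illustration and has nothing to do with the forum software.

```python
def percent_reduction(before_gb: float, after_gb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (before_gb - after_gb) / before_gb * 100


original_gb = 12.0  # database size before the rebuild (from the update above)
rebuilt_gb = 8.35   # database size after the rebuild (from the update above)

reduction = percent_reduction(original_gb, rebuilt_gb)
print(f"Database shrank by {reduction:.1f}%")  # prints "Database shrank by 30.4%"
```

So the rebuild trimmed a little over 30% of the original size, or about 3.65GB.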
We are not out of the woods yet. Tomorrow the developers will be heads-down running tests and letting us know where we stand and if any problems have been identified. If all goes well tomorrow, we can then move the database back to our environment and test it under the specific settings and configurations we have and run more tests to make sure that performance has not degraded.
I ask for your continued patience as we enter our next phase of troubleshooting and testing for the CSW Forum!
[update8 — Thur 0914 Central time] Oops…to clarify the statement below since some have been asking. Once we get the Forum up and running again, it will be back to business as usual. Everyone can post away and there will be no changes. What I meant by the Forum going to read-only is that this will occur once we have a new Forum platform available as our new home. Only then will we take the ‘old’ Forum and make it read-only so everyone will be posting on the new platform. I hope that clears things up as I didn’t want to suggest Forum services were going away!
In related news, the database rebuild continues. The database is 8.2GB in size and is proceeding without incident. It is presently rebuilding the Marketplace folder. Once that is complete, we have one more folder to rebuild — Member Services. So that means we are starting to get close to the database rebuild being complete. My guess is it will take another 24-48 hours to complete at this rate.
Thanks again for your continued patience during this outage!
[update7 — Wed 1535 Central time] On a related topic, what impact does this outage have on CSW’s future plans when it comes to the Forum?
Well, as I’m sure you can guess, this major incident has been an immediate trigger for us to realize it’s time to consider a new forum platform. This effort is underway. The plan is to select a new platform we will move to and then work with our publishers to help them get their forum support services ready for the forum launch. Please understand that the current CSW Forum, once resurrected as we hope it will be when we complete troubleshooting, will remain online in read-only status when we move to a new platform, but we will NOT be able to migrate our current forum to the new service. You will be able to freely navigate and read any postings on our current site once we retire it, but it will be read-only and our new forum site will become our new official home. We will make sure that we provide an easy way to visit the CSW Forum to review all the content we have collected there so you can visit past conversations, but at the end of the day it will be time to move on to a new, much more powerful and easier-to-use platform.
This change will not be overnight. It will take several weeks to finalize a vendor selection, and then time to work with a third party to develop the new forum site and work with all publishers before we officially launch the new site, but it’s become readily apparent that it’s High Time ™ that we move to a newer forum technology. If all goes well after we fix our current Forum site, we should be able to launch a new forum service around end of year or perhaps after the New Year.
This will be a big change for many of our members, but the time has come to make sure we are on the latest technologies with solid performance and support available. We won’t be picking a minor player in the forum software industry, but a well-known and respected platform with many clients. This type of outage — and hopefully it’s not a fatal one and we will successfully recover the forum site as we are working around-the-clock now — this type of outage cannot be tolerated or allowed to happen again. Hence, changes are ‘comin….
[update6 — Wed 1205 Central time] The database rebuild is up to 7.7 GB now and import is proceeding without incident. At this time, it’s processing the Literary Folder and processing all file attachment links.
[update5 — Tues 1012 Central time] The developer was not kidding when they estimated the database rebuild could take days and not hours. It’s still rebuilding and at 7.13GB in size…and it is still churning away. The wait continues. Below is a screenshot of the rebuild in progress on the QA server (no functional testing started yet as the database rebuild is not yet complete).
[update4 — Mon 1436 Central time] Here is an update from the vendor as we continue to work on rebuilding the forum….
The reconstruction of the latest database from scratch, using the export from the database and re-import into a new, empty database, continues to progress normally.
So far about 6.4 GB have been imported. We are monitoring to see when it’s completed. There is no way to know how much longer this will take. It could finish any moment; it could still be hours from now. That’s because in the export/import process a lot of old garbage gets removed. So the new database is likely to be considerably smaller than 12 GB. But we don’t know how much smaller.
When it finishes, we’d like to monitor it for a while to make sure connectivity is good from various browsers.
Of course since we don’t have your hundreds of thousands of attachments on our QA server, you won’t be able to confirm attachments until the database is moved back to your server. But they should all be reconnected automatically because the unique ids of the nodes remain the same.
[update3 — Mon 0850 Central time] We are in the process of a rebuild of the entire database from export-and-import into a fresh database. This process is still proceeding normally. The export/import procedure removes a lot of garbage that accumulated over time. So far we are up to 4.65 GB imported, and we are of course hoping the entire rebuild process is successful, and if so, that it may have a positive impact on the network related issues we have encountered. I hope to have an update again this evening (Monday).
[update2 — Sun 1800 Central time] Troubleshooting continues. The issue is that the IP connections for the site are dropping or being blocked, and we are not yet able to identify the source of this problem. It does not appear to be hardware related, however, since we have been testing the site on the developer’s own QA environment and they are experiencing the same issues on their devices as well. Attention has turned to the database, including the multiple backups of the database that we have. We would expect backups to work fine (as our site worked fine with them), but the same issue with connections dropping is also occurring with the database backups we have. Connections are dropping or being blocked until the system only allows 1 connection.
The developer is trying to run repairs on our database as they feel that is the root cause of connections dropping. I am still pressing for them to determine what network settings could be impacting our site as well. We continue to troubleshoot and I expect this to continue through tomorrow at a minimum as we work through the results.
My apologies again for this outage. We are all working to find the root cause of the problem so we can restore services.
[update1] We have been working all day with the software vendor and our hosting provider on the major outage we experienced overnight (four separate individuals have been on our case). Unfortunately, what we have encountered is still undergoing investigation. While the good news is that all data is safe and backed up, we are still not sure if it’s a hardware issue or if it is code-related/possible corruption issue.
Testing continues as we work to diagnose the issue. I am committed to investing in whatever the best solution is to resolve this, even if it means a full server replacement should it come down to a hardware problem. I will do whatever it takes to solve this problem, and my goal is to not only have the issue resolved but find a way to improve performance further (so that it’s worth the collective wait).
This is all going to take some time, so I ask for your continued patience, especially if new hardware is required and needs to be ordered. I can’t project how soon we will be able to restore forum services other than stating our goal is to have services restored as soon as we can. Most likely we will require a few days to get this all resolved.