Dump all your data in a data lake, throw a thousand GPUs in there too, and you’ll have your answer in 2 hours max.
Thanks, I’ll tell the client it’ll be ready in 2h.
Mr. Manager calling from a group
Hi everyone, I know we’re all busy, but I just wanted to align on this 2 hr estimate. Can we put our heads together and do this faster somehow?
No, but since we wasted an hour talking about it, the estimate is now 3 hours. Do you want to have more meetings about it?
Two thousand GPUs
And the data they want is the entire FY, which is 3,000,000 records, and they need every single data attribute, making the file like 250 MB. Then you put it in their SharePoint and they get mad they can’t just view it in the browser despite the giant “This file is too large to view online, download it” message.
“Just email it to me!”
Newspaper: Hackers are announcing a trove of personal data leaked from [company] after a forwarded spreadsheet inadvertently contained more data than the sender realised.
Hey! I just started looking at SQL and this is the first SQL joke I’ve ever seen or at least ever gotten!
So, congratulations me!
Welcome! Please complete your setup by placing this on your wall: https://xkcd.com/327
Lolz got that one too
Same feel as “how long is this going to take to pull?” Well, I don’t know if part of what you’re asking for exists, how clean it is, and if I can join the data you’re talking about, so anywhere from 5 minutes to never?
Especially sick of the users who ask for the same data over and over again.
Use something like Apache Airflow to automate it :)
If it’s regular, I recommend cron + mailutils. Have the cron job call a script with a variable sleep in it if you want to make it look more manual.
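For anyone who wants to try the cron idea, here is a rough sketch of what that script could look like. The function names, the 15-minute cap and the crontab line are made up for illustration; the only real external piece is the `mail -s` command that mailutils provides, which reads the message body from stdin.

```python
import random
import subprocess
import time

# Hypothetical crontab entry (every Monday at 08:00):
#   0 8 * * 1 /usr/bin/python3 /path/to/report.py


def pick_delay(max_seconds=900, rng=random):
    """Pick a random wait so the email does not arrive at exactly 08:00:00."""
    return rng.uniform(0, max_seconds)


def send_report(recipient, subject, body, delay_seconds):
    """Wait for the chosen delay, then hand the report body to `mail` on stdin."""
    time.sleep(delay_seconds)
    subprocess.run(["mail", "-s", subject, recipient], input=body, text=True, check=True)
```

The real script would end with something like `send_report("someone@example.com", "Weekly pull", body, pick_delay())`, where `body` is whatever your query produced.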
Man I don’t regret leaving this behind at my last job. You start out by doing someone a one-off like “sure I can pull the top 5 promotional GICs broken down by region for your blog article - I love supporting my co-workers!”
Then requests become increasingly esoteric and arcane, and insistent.
You try to build a simple FE to expose the data for them, but you can’t get the time approved so you either have to do it with OT or good ol’ time theft, and even then there’s no replacement for just writing SQL, so you’ll always be their silver bullet.
At that point you teach them how to do it themselves. Isn’t there a way to give them an account that only has read access so they can’t inadvertently screw up the database?
I like that idea, and it actually did work for our Marketing guy (Salesforce has a kind of SQL). Near the end there, I just had to debug a few of his harder errors, or double check a script that was going to be running on production.
Never thought of it for Postgres or MySQL, etc., but I suppose there’s got to be an easy enough way to get someone access.
phpmysqladmin 😆
In Oracle you’d just set up a user that has limited access and give them those credentials. Creating a few views that pull in the data they want is a bonus.
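On Postgres, MySQL or Oracle the usual route is a separate user that has only been granted SELECT. As a small runnable stand-in for the same idea, here is a sketch using SQLite, which has no users but can open a database file read-only. The table and file names are invented for the example.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Throwaway database file for the demo (hypothetical name).
db_path = Path(tempfile.mkdtemp()) / "reports.db"

# The "admin" connection can write.
admin = sqlite3.connect(db_path)
admin.execute("CREATE TABLE promos (region TEXT, product TEXT)")
admin.execute("INSERT INTO promos VALUES ('East', 'GIC-5Y')")
admin.commit()
admin.close()

# The "analyst" connection is opened with mode=ro, so it can only read.
reader = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
rows = reader.execute("SELECT region, product FROM promos").fetchall()

try:
    reader.execute("DELETE FROM promos")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses writes on a read-only connection.
    write_blocked = True

print(rows, write_blocked)
```

The SELECT works and the DELETE is rejected, which is the behaviour you’d want from a read-only account on a bigger database too.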
At work, I am currently dealing with a table that has no primary key, no foreign key, duplicate (almost) serial numbers, booleans stored as strings, and so on. It’s a nightmare of a table.
Entity framework is acting like I’m on meth for using such a table.
My all-time favorite database table was a table named STATE, meant to store all US states. It had 531 rows.
Relatable. 🤣
well, there’s confusion, paranoia, agitation and so many others…
Confusion … Your medical condition Disorganized
I have been trying to get people in my area to give their new table a generic name, since it’s going to be the only table that can map a date range to a different date range. But I’m on holidays now, and they can’t imagine anything other than their little project needing this table, so it’s going to be named for this one project, and its columns will be named for the specific data they’ll hold :(
I’ve been there and you know what’s worse about it? When you fix it only you or a handful of people notice the astronomical labor you did.
“It worked before why did you change it? You are just doing busywork”
Yeah. Luckily the work I am doing is to fix some really bad work that the entire company has been complaining about. So once it’s fixed there will hopefully be a little bit more recognition than that. Plus my boss is pretty level-headed.
But who fucking knows? There is always the likelihood that people will say things along those lines. And it ain’t my job to fight them on that.
How about a date stored as an integer?
Edit: and I’m not talking about a timestamp
No, we have worse. Dates sometimes stored as strings, sometimes as datetimes, and sometimes as integers. There is no consistency, logic, or forethought to the schema.
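If you’re stuck reading a column like that, one way to cope is to funnel everything through a single normalising function. This is only a sketch: it assumes the integers are YYYYMMDD (the thread doesn’t say what they actually are) and that the strings come in one of a few guessed formats, so adjust both to whatever your data really contains.

```python
from datetime import date, datetime

# Assumed string formats; extend this tuple for your own data.
STRING_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def to_date(value):
    """Turn a datetime, date, YYYYMMDD integer, or date string into a date."""
    if isinstance(value, datetime):  # check datetime first: it is a subclass of date
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, int) and not isinstance(value, bool):
        # Assumption: integers are calendar dates written as YYYYMMDD.
        return datetime.strptime(str(value), "%Y%m%d").date()
    if isinstance(value, str):
        text = value.strip()
        for fmt in STRING_FORMATS:
            try:
                return datetime.strptime(text, fmt).date()
            except ValueError:
                continue
    raise ValueError(f"unrecognised date value: {value!r}")
```

Anything it can’t parse raises instead of guessing, so the bad rows surface early rather than silently turning into wrong dates.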
It’s rough.
Worked on an enterprise medical database, had thousands of tables, and some of the most corrupt data possible. This triggers me :(
Only 3h? What kind of SQL magician are you?!
Somebody tell this dude about views.
If they existed for tons of random usecases. When was the last time you created views for “just in case someone asks” situations?
So my work is archaic and doesn’t even use SQL. What are views?
Basically scripts you can run on the fly to pull calculated data. You can (mostly) treat them like tables themselves if you create them on the server.
So if you have repeat requests, you can save the view with maybe some broader parameters and then just SELECT * FROM [View_Schema].[My_View] WHERE [Year] = 2023 or whatever.
It can really slow things down if your views start calling other views in since they’re not actually tables. If you’ve got a view that you find you want to be calling in a lot of other views, you can try to extract as much of it as you can that isn’t updated live into a calculated table that’s updated by a stored procedure. Then set the stored procedure to run at a frequency that best captures the changes (usually daily). It can make a huge difference in runtime at the cost of storage space.
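A rough sketch of that “calculated table” approach, using SQLite so it runs anywhere. SQLite has no stored procedures, so the Python function `refresh_summary` stands in for one, and all table and view names are made up for the example.

```python
import sqlite3


def refresh_summary(conn):
    """Rebuild the precomputed table from the view (stand-in for a stored procedure)."""
    with conn:  # commit on success, roll back if anything fails
        conn.execute("DELETE FROM sales_summary")
        conn.execute(
            "INSERT INTO sales_summary SELECT year, total FROM yearly_sales_view"
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (year INTEGER, amount REAL);
    INSERT INTO sales VALUES (2022, 10.0), (2023, 20.0), (2023, 5.0);

    -- The view recomputes the aggregate every time it is queried.
    CREATE VIEW yearly_sales_view AS
        SELECT year, SUM(amount) AS total FROM sales GROUP BY year;

    -- The calculated table holds a stored copy of the view's output.
    CREATE TABLE sales_summary (year INTEGER PRIMARY KEY, total REAL);
""")

refresh_summary(conn)
conn.execute("INSERT INTO sales VALUES (2023, 100.0)")

# Until the next refresh, the calculated table still has the old total.
stale = conn.execute("SELECT total FROM sales_summary WHERE year = 2023").fetchone()[0]
refresh_summary(conn)
fresh = conn.execute("SELECT total FROM sales_summary WHERE year = 2023").fetchone()[0]
print(stale, fresh)
```

That gap between `stale` and `fresh` is the trade-off: reads get cheap, but the data is only as current as the last refresh.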
It can really slow things down if your views start calling other views in since they’re not actually tables
They can be in some cases! There’s a type of view called an “indexed” or “materialized” view where the view data is stored on disk like a regular table. In SQL Server, an indexed view is automatically kept up to date whenever the source tables change; in some other databases, such as PostgreSQL, a materialized view has to be refreshed explicitly. Doesn’t work well for tables that are very frequently updated, though.
Having said that, if you’re doing a lot of data aggregation (especially if it’s a sproc that runs daily), you’d probably want to set up a separate OLAP database so that large analytical queries don’t slow down transactional queries. With open-source technologies, this is usually using Hive and Presto or Spark combined with Apache Airflow.
Also, if you have data that’s usually aggregated by column, then a column-based database like ClickHouse is usually way faster than a regular row-based database. These store data per-column rather than per-row, so aggregating one column across millions or even billions of rows (e.g. average page load time for all hits ever recorded) is fast.
A view is a saved query that pretends it’s a table. It doesn’t actually store any data. So if you need to query 10 different tables, joining them together and filtering the results specific ways, a view would just be that saved query, so instead of “SELECT * FROM (a big mess of tables)” you can do “SELECT * FROM HandyView”
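To make that concrete, here is a tiny runnable sketch using SQLite. The `customers` and `orders` tables and their contents are invented; `HandyView` is just the name from the comment above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer_id INTEGER, year INTEGER, total REAL
    );
    INSERT INTO customers VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
    INSERT INTO orders VALUES
        (1, 1, 2022, 100.0), (2, 1, 2023, 250.0), (3, 2, 2023, 75.0);

    -- The view saves the join as a named query; it stores no rows itself.
    CREATE VIEW HandyView AS
        SELECT c.name, c.region, o.year, o.total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id;
""")

# Query the view as if it were a table.
rows_2023 = conn.execute(
    "SELECT name, total FROM HandyView WHERE year = ? ORDER BY name", (2023,)
).fetchall()
print(rows_2023)
```

The person asking only ever has to know about `HandyView`; the join lives in one place.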
Predefined queries that you can interact with like another table more or less
Me this morning: I’m gonna take a look at why this Jenkins pipeline is failing. This one job starts a dozen others. Half are failing. For different reasons. Start rewriting a job that someone half-assed. Realize the original error was caused by missing input but some are still valid. Still can’t figure out why my rewritten program is erroring. Get pulled away because another program did something weird… I completed nothing today but worked a ton.
My day…
It’s OKAY to say no.
I’ve gotta get better at this…
Because Jen in accounting doesn’t believe in it, and Tom the CIO likes his data stored raw in TXT Amphibious Delineated. Then our biggest client prefers data as Jason so we swapped half of our database to that to speed things up.
But the real problem is high turnover because we don’t pay anyone enough to work on things they are proud of. After 2 years we stop doing even 3% COL raises so they go elsewhere. So every 2-4 years each position gets a new opinionated asshole.
our biggest client prefers data as Jason so we swapped half of our database to that
the app I work with currently stores json as the only column in a sql table and it hurts me so very much. like watching someone pick up a screwdriver and try to bash a nail in with the handle.
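For contrast, a small sketch of the difference, using SQLite and Python’s `json` module. The `blobs` and `logins` tables and the field names are hypothetical. With the single JSON column every lookup means parsing documents; once the fields are real columns, plain SQL can filter on them.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# The painful layout: one table, one column, a JSON document per row.
conn.execute("CREATE TABLE blobs (doc TEXT)")
conn.executemany(
    "INSERT INTO blobs VALUES (?)",
    [
        (json.dumps({"username": "ann", "ip_address": "10.0.0.1"}),),
        (json.dumps({"username": "bob", "ip_address": "10.0.0.2"}),),
    ],
)

# Unpack each document once into proper columns.
conn.execute("CREATE TABLE logins (username TEXT, ip_address TEXT)")
for (doc,) in conn.execute("SELECT doc FROM blobs").fetchall():
    record = json.loads(doc)
    conn.execute(
        "INSERT INTO logins VALUES (?, ?)",
        (record["username"], record["ip_address"]),
    )

# Now an ordinary WHERE clause does the work.
bob_ip = conn.execute(
    "SELECT ip_address FROM logins WHERE username = ?", ("bob",)
).fetchone()[0]
print(bob_ip)
```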
Garcia From Criminal Minds: consider it done 😂
Enhance
Hang on, what’s that? Click on the ip_address column!