If you're modeling relational data, it doesn't seem like you can get around using a DB that uses SQL, which to me is the worst: most programmers aren't DB experts and the SQL they output is quite often terrible.
Not to dunk on the lemmy devs, they do a good job, but they themselves know that their SQL is bad. Luckily there are community members who stepped up and are doing a great job at fixing the numerous performance issues and tuning the DB settings, but not everybody has that kind of support, nor time.
Also, the translation step from binary (program) -> text (SQL) -> binary (server), just feels quite wrong. For HTML and CSS, it's fine, but for SQL, where injection is still in the top 10 security risks, is there something better?
Yes, there are ORMs, but some languages don't have them (rust has diesel for example, which still requires you to write SQL) and it would be great to "just" have a DB with a binary protocol that makes it unnecessary to write an ORM.
Does such a thing exist? Is there something better than SQL out there?
No, SQL is great for relational data. There's like decades of research into it. It's the best.
relational databases have years of reseach into them, not the query language itself.
sql was built so people other than devs can use it, but we got stuck with it.
sql was built so people other than devs can use it, but we got stuck with it.
Not really. Being designed with UX in mind, so that it sacrifices conciseness for readability, does not make it something for "people other than devs".
Likewise, BASIC was also developed with UX in mind, and no one in their right mind would ever claim that it's not a programming language.
SQL, where injection is still in the top 10 security risks
This is absolutely true, but it's not what it looks like on the surface, and if you dig into the OWASP entry for this, you'll see they talk about mitigation.
You can completely eliminate the possibility of injection attacks using well-understood technologies such as bind variables, which an ORM will usually use under the covers but which you can also use with your own queries. There are many, many database applications that have never once had a SQL injection vulnerability and never will.
The reason SQL injection is a widespread security risk, to be blunt, is that there are astonishingly large numbers of inexperienced and/or low-skill developers out there who haven't learned how to use the tools at their disposal. The techniques for avoiding injection vulnerability are simple and have been well-documented for literally decades but they can't help if a lousy dev decides to ignore them.
Now, a case could be made that it'd be better if instead, we were using a query language (maybe even a variant of SQL) that made injection attacks impossible. I agree in principle, but (a) I think this ends up being a lot harder than it looks if you want to maintain the same expressive power and flexibility SQL has, (b) given that SQL exists, "get bad devs to stop using SQL" doesn't seem any more likely to succeed than "get bad devs to use bind variables," and (c) I have too much faith in the ability of devs to introduce security vulnerabilities against all odds.
It's also worth noting that the current entry on OWASP is injection generically. It includes SQL, but it also covers things like HTTP links where you concatenate in unchecked user input. SQL injection by itself may no longer be prominent enough make the list.
I'm also going to put the blame squarely on PHP for SQL injection attacks hanging on for so long, particularly when combined with MySQL. That DB didn't support bind variables for a long time (maybe still doesn't?). Other languages may have used a library that simulated bind variables for you. Barring that, they tended to always always always show how to use a sanitation function even if it was the first mention in the first tutorial for the top Google result for "sql [language]". That creates a culture in the language of writing safe SQL code. Not PHP, though; the sanitation functions were there, but they never gave them the prominence that they so badly needed.
I’m absolutely biased as a data engineer who loves SQL, but there are some good reasons why SQL has been the de facto standard for interacting with databases since the 80s.
One of its draws is that it’s easy to understand. I can show a stakeholder that I’m selecting “sum(sale_amount) from transactions where date=yesterday” and they understand it. Many analysts are even able to write complicated queries when they don’t know anything else about programming.
Since it’s declarative, you rarely have to think about all the underlying fuckery that lets you query something like terabytes of data in redshift in minutes.
Debugging is often pretty nice too. I can take some query that didn’t do what it was supposed to and run it over and over in a console until the output is right.
I’m absolutely biased as a data engineer who loves SQL, but there are some good reasons why SQL has been the de facto standard for interacting with databases since the 80s.
I find it funny how the people who actually have to wrangle data swear by SQL as awesome, but there are always random hacks coming out of the woodwork, who don't even look at SQL at all, with sweeping statements claiming SQL sucks because reasons.
It's like the most opinionated people against SQL are the ones who don't use SQL.
it's just a drag to veer from. it's not particularly good, but good enough to stick around.
I have found that usually the problem isn’t SQL itself, it’s the design of the database or the application or both that makes it slow or hard to use because databases are tricky and it’s very easy to make them slow and difficult to optimize. This goes for any database, SQL or otherwise.
I think the reason that you see so many people use relational databases for relational data is that they were designed for it and have been optimized to do that job for almost 50 years at this point, the right tool for the job if you will. That’s also why I don’t think any language builds it in, there isn’t one type of database that is good at every scenario so it would be difficult to pick a default as a language. Also keep in mind, most large applications are deployed across many servers so anything that uses a local file is out of the question.
As far as performance directly though, I’m definitely not a DBA but I have spent a lot of time helping people troubleshoot slow databases and it seems many people write apps and design databases based only on how they want to store the data and not how the actual data is consumed. The other thing I think most folks don’t seem to realize (including myself for a long time) know the need for creating good indexes based on how you are accessing the data. It’s not an easy problem to solve for sure though.
it would be great to “just” have a DB with a binary protocol that makes it unnecessary to write an ORM.
Other people have talked about other parts of the post so I want to focus on this one.
The problem an ORM solves is not a problem of SQL being textual. Just switching to a binary representation will have little or no impact on the need for an ORM. The ORM is solving the problem that's in its name: bridging the conceptual gap between an object-oriented data model and a relational data model. "A relational data model" isn't about how queries are represented in a wire protocol; instead, it is about how data, and relationships between pieces of data, are organized.
So, okay, what if you get rid of the relational data model and make your database store objects directly? You can! NoSQL databases had a surge in popularity not too long ago, and before that, there have been lots of object databases.
What you're likely to discover in an application of any real complexity, though, and the reason the industry has cooled somewhat on NoSQL databases after the initial hype cycle, is that the relational model turns out to be popular for a reason: it is extremely useful, and some of its useful properties are awkward to express in terms of operations on objects. True, you can ditch the ORM, but often you end up introducing complex queries to do things that are simple in SQL and the net result is more complex and harder to maintain than when you started. (Note "often" here; sometimes non-relational databases are the best tool for the job.)
And even in an object database, you still have to know what you're doing! Storing objects instead of relational tuples won't magically cause all your previously-slow queries to become lightning-fast. You will still need to think about data access patterns and indexes and caching and the rest. If the problem you're trying to solve is "my queries are inefficient," fixing the queries is a much better first step than ditching the entire database and starting over.
The difficulty is in database design being tricky not the language. There's no getting around that by using a different language
I disagree. Database design is tricky, yes, but with SQL, you now have two problems.
I used to be full on the ORM train. Now I’m a little less enthusiastic. What I actually think people need most of the time is something closer to ActiveRecord. Something that can easily map a result set into a collection of typed objects. You still generally write parameterized SQL, but the work of translating a db decimal into the correct target type on a record object in your language is handled for you (for example). In .net, Dapper is a good example.
I also think most people overemphasize or talk about how other programmers “suck at SQL” waaayy too much.
IMO, for most situations, these are the few high-level things that devs should be vigilant about:
- parameterize all sql.
- consider the big-o of the app-side lookup/write methods (sometimes an app join or pulling a larger set and filtering in memory is better than crafting very complex projections in sql). This is a little harder to analyze with an ORM, but not by much if you keep the mappings simple and understand the loading semantics of the ORM.
- understand the index coverage of queries and model table keys properly to maintain insert performance (monotonically increasing keys).
- stop fixating on optimizing queries that run in a few seconds, a few times a day. Optimize the stuff that you run on every transaction - if you need to.
On most of those points, if you don’t have aggregate query counts/metrics on query performance on your clusters, starting to get cute with complex queries is flying blind, and there’s no way to prioritize what to optimize.
For the vast majority of cases, simple, obvious selects that don’t involve special db features are going to do the job for most applications. When the database becomes a bottleneck, there are usually much more effective ways to handle them than to try to hand optimize all the queries.
Lastly, I have a little bit of a theory that part of the reason people do/do not like looking at SQL in code is because it’s a hard context switch from one language to another, often requiring the programmer to switch to “stringly-typed” mode, something we all learn causes huge numbers of headaches in our first few months of programming. Some developers accept that there’s going to be different languages/contexts and not all of them are going to be as fluent or familiar, but accept that this is par for the job. Others recoil from the unfamiliar and want to burn it down. IMO, the former attitude is a lot more productive.
the translation step from binary (program) -> text (SQL) -> binary (server)
Your concern about this is misguided. Inter-process communication always has to cross a barrier, by definition.
I take it http also feels wrong to you?
Binary protocols do exist, e.g. gRPC, but they incur costs of their own.
the translation step from binary (program) -> text (SQL) -> binary (server)
Also, the translation step from binary (program) -> text (SQL) -> binary (server), just feels quite wrong. For HTML and CSS, it’s fine, but for SQL, where injection is still in the top 10 security risks, is there something better?
Does this not answer your question?
I agree that SQL is hard, and optimising it is harder, but is a bespoke binary protocol per DB really easier?
You can usually connect to the DB directly via an ODBC driver, but that will involve SQL anyway.
A protocol that forces the use of a query builder or ORM would be easier and better, I think. The DB developer would have to provide a library, which means one would have to learn the library, not a new language.
Making raw sql access harder for applications is probably a good idea. Perhaps with something like rusts "unsafe", so you can still do it, but you have to deliberately make the decision to do so.
The other users for raw SQL are DB administrators, and I dont think youll be able to take SQL away from them as easily. I dont actually know what or how DB administrators work, but I dont think they are developers a lot of the time, so requiring them to write code to do their job might be a non-starter?
Datalog. Basically relational, but have much simpler syntax and semantics,and is able to convey more complicated queries in a composable manner. Mainly used in Clojure-based databases where actual programs instead of strings are used.
🤔 interesting. First order logic for querying. I found CozoDB which might be useful. Still not sure about how to map objects to the DB. The tutorial is not very relatable. Lots of numbers and single letters.
Thanks for the hint though. I'll check it out.