• blobjim [he/him]
    3 years ago

    My gripe with text-based formats is that they only benefit developers who want to read them in debug tools (and in the browser at that, since the packets are encrypted on the wire). For everything else, text is just a nuisance: it takes more computation to process and more space to store and send.

    Text also makes parsing more complex, because text has a character set and there are lots of weird little edge cases in text parsing. It leads to under-specified specifications and makes security vulnerabilities easier to introduce. Binary protocols are usually much better defined, because they have to be. Look at how long DNS, SSL/TLS, and almost every binary file format (PNG, JPEG, etc.) have been working. Then look at how many text-based formats have lots of incompatible implementations and features: XML, HTML, JavaScript, HTTP headers, Markdown, GLSL, etc. have all had parsing and feature incompatibilities. You can't really have that kind of incompatibility with a binary protocol, because you can't just shove more text in between other stuff. Extensibility has to be explicitly provided for in a binary protocol.

    Binary protocols are also just easier to implement, even though people think the opposite. It's so much easier to read a series of integer values from a byte array than to parse text in a specific format. I was able to write a basic WAV file parser really easily, but writing a parser for anything text-based requires a lot more thought.
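
    To give a rough idea of what "reading integers from a byte array" looks like, here is a minimal sketch in Python using the standard `struct` module. It is not the parser I wrote; the function name `parse_wav_header` is made up for illustration, and it assumes the simplest layout, where the `fmt ` chunk comes right after the 12-byte RIFF header.

```python
import struct


def parse_wav_header(data: bytes) -> dict:
    """Read the RIFF header and the 'fmt ' chunk from the start of a WAV file.

    Assumption: the 'fmt ' chunk immediately follows the RIFF header
    (true for simple PCM files, not guaranteed in general).
    """
    # RIFF header: 4-byte tag, little-endian uint32 size, 4-byte form type.
    riff_tag, _riff_size, form = struct.unpack_from("<4sI4s", data, 0)
    if riff_tag != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    # Chunk header: 4-byte chunk ID plus little-endian uint32 body size.
    chunk_id, chunk_size = struct.unpack_from("<4sI", data, 12)
    if chunk_id != b"fmt " or chunk_size < 16:
        raise ValueError("expected a 'fmt ' chunk at offset 12")

    # The first 16 bytes of the 'fmt ' body are six fixed-width integers.
    (audio_format, channels, sample_rate,
     byte_rate, block_align, bits_per_sample) = struct.unpack_from("<HHIIHH", data, 20)

    return {
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
    }
```

    Every field sits at a known offset with a known width, so there is no tokenizing, escaping, or character-set handling to get wrong.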

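    Text formats also have the problem that you can't always efficiently include binary data inside them. The usual workaround is base64, which is really inefficient: every 3 bytes of input become 4 characters of output, so the data grows by about a third, on top of the extra encode and decode passes.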
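
    A quick way to see the base64 overhead for yourself, as a small sketch with Python's standard `base64` module (the helper name `base64_growth` is just something I made up for the example):

```python
import base64


def base64_growth(payload: bytes) -> float:
    """Return encoded size divided by raw size for a non-empty payload."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload)


raw = bytes(3000)                       # 3000 zero bytes stand in for binary data
print(len(base64.b64encode(raw)))       # 4000
print(round(base64_growth(raw), 3))     # 1.333
```

    The ratio gets slightly worse for small payloads whose length is not a multiple of 3, because the output is padded up to a multiple of 4 characters.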

      • blobjim [he/him]
        3 years ago

        Well, even JPEG files are intelligible to humans. You just need to use an image viewer, one that can show metadata and so on. The file formats themselves have specifications that lay out their exact structure, often in a pretty understandable way. Text basically just gets replaced with integer enums, counts, and so on. It's having the right tools that's important. Most web content would be impossible to make sense of if web browsers didn't come with fancy developer tools that show everything nicely formatted, with an element picker that shows bounds.

        Java is a good example of a binary system with tooling that makes it intelligible. It's a big complicated VM with thousands of objects, but if you use a debugger like the ones in IntelliJ or Eclipse, or a monitoring tool like VisualVM, you can basically inspect the entire execution of the program, even though the class file format is binary (with many string names inside it). You can pause execution and view the stack trace, you can dump all the objects in the heap and look at their fields and see the objects they point to, etc.

        The systems themselves are always going to be more complex than is immediately comprehensible. It's pretty commonly pointed out that it's nearly impossible to implement a web browser from scratch today, at least not without an immense amount of funding, people, and determination. That's because all these text formats are extremely overcomplicated: it's really easy to just add another string value to something as a "feature" and call it good, even though that feature takes 1000 lines of code to implement.

        Also, the electronics in every computer already use binary communication protocols, and those protocols are just about as important to understand. A lot of hardware uses standard protocols like PCIe or I2C, and anyone can read the specs for those (sometimes for a fee).

        Here's the spec for the WAVE file format (which is admittedly much simpler than other formats, since it uses no compression): http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html

        You basically just read it top-to-bottom and implement it as you go.
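
        As a rough illustration of the top-to-bottom approach, here is a minimal sketch in Python that walks the chunks of a WAVE file in order. The function name `iter_riff_chunks` is made up for this example, and the sketch only handles a single well-formed RIFF container held entirely in memory.

```python
import struct
from typing import Iterator, Tuple


def iter_riff_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (chunk_id, body) for each chunk inside a RIFF/WAVE container."""
    # File header: "RIFF", little-endian uint32 size, then the form type "WAVE".
    tag, _size, form = struct.unpack_from("<4sI4s", data, 0)
    if tag != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    offset = 12
    while offset + 8 <= len(data):
        # Each chunk starts with a 4-byte ID and a little-endian uint32 body size.
        chunk_id, size = struct.unpack_from("<4sI", data, offset)
        body = data[offset + 8 : offset + 8 + size]
        yield chunk_id, body
        # Chunks are word-aligned: an odd-sized body is followed by one pad byte.
        offset += 8 + size + (size & 1)
```

        Because every chunk carries its own length, a reader can step over chunk IDs it doesn't recognize. That is what explicitly provided extensibility looks like in a binary format.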

        edit: I just remembered another amazing tool. Wireshark! You may have used it already. Probably the most awesome tool I've ever used. You can inspect Internet/Ethernet packets, and it parses them using its understanding of tons of different protocols. You can look at the decoded value of every field, and it shows you the exact bytes each field corresponds to.