Secure Coding: Basic Concepts
Today’s organizations and individuals rely heavily on information technology for day-to-day operations and activities. Software development meets that demand, which also makes it a frequent target for cyber attacks.
Cybersecurity for application development encompasses organizational and technical solutions. For clarity, this post segments the discussion between those two domains. However, this doesn’t imply that organizational concepts lack technical components, and vice versa.
On the organizational side, the discussion covers the software development process in general, provisioning, de-provisioning, change management, integrity measures, automation, and secure DevOps. On the technical side, secure coding solutions include input normalization and validation, memory management, data exposure, code reuse, dead code, third-party libraries and SDKs, code signing, code encryption, obfuscation, and camouflage, and, finally, error handling.
Software Development: General Definitions
Conceptually, software development is defined as a project with established objectives and stages. These stages happen in different environments, which is necessary for optimization and to avoid coding errors that could affect production, including its security. Such environments are development, testing, staging, and production.
It’s important to note the role of cybersecurity during this process. As a good practice, security demands must be addressed holistically, i.e., from the program’s parameters definition through all the development stages. In other words, security must be viewed not as an obstacle to good software development but as a fundamental parameter of code development, like performance and reliability.
The development environment is mainly for writing code. Coding is often a team effort and is iterative, i.e., code is written and re-written due to testing and changing objective parameters. With such demands, the development environment is frequently set as sandboxes, giving flexibility for code changes and avoiding overwriting by different team members. Version control is a suitable mechanism for keeping track of code changes. This concept is presented in more detail below.
The testing environment is a clean copy of the development environment. It aims at integration testing, i.e., verifying how the code interacts with a typical environment. Testing may incorporate a more systematic approach called Quality Assurance (QA), a set of processes that establish code quality reference standards, constantly measure code effectiveness and efficiency (including security), and propose development process improvements if necessary. One may argue that QA services are costly, but that cost is frequently offset by the delivery of high-quality, secure code.
From a security standpoint, the development and testing environments typically have less restrictive security controls since developers need deep system access for proper writing and testing. Because of this, segmenting those environments from the rest of the network is a good practice, which prevents an attacker from establishing a foothold for privilege escalation. This separation may happen through VLANs and security groups/firewalls that will control traffic to and from development.
Staging is primarily used for unit testing. This process individually scrutinizes an application’s smallest testable parts (aka units), verifying proper operation. It also checks how the software runs under defined security settings. This environment’s settings are similar to production’s, and it’s the last opportunity to validate code and correct flaws before the application is deployed.
Production is the development endpoint where the program is expected to meet its requirements. This environment must have all the necessary security settings and contain only properly tested applications. Additionally, only senior and experienced developers must have access to production.
Provisioning and De-provisioning
Provisioning means creating a resource (e.g., an application) or making it available. In contrast, de-provisioning is removing a resource or making it unavailable.
Both processes are usually automated. For example, it’s common practice to deploy software packages and their subsequent updates automatically, offering them through self-service portals. They can also be integrated with other automated technologies, primarily through the cloud’s software-defined data centers.
Change Management
Applications, like projects in general, are constantly being modified and updated to meet the scope and the ever-changing consumer requirements. So, controls must exist to monitor for code changes in development and testing and provide compatibility during deployment. Software change management and its IT operation counterpart are very similar.
Logically, change management is closely related to version control. As discussed briefly above, version control’s primary objective is to monitor for code changes during development. Other roles and functionalities arise from this basic premise. Version control tools provide detailed code branching information, merging capabilities, and performance measurement. They must also perform constant security checks, including scanning for and addressing vulnerabilities that can become attack vectors, while providing source code integrity checks to prevent tampering.
Today, like provisioning and de-provisioning, change management and version control are primarily automated processes. This aspect is even more relevant in software development companies aligned with DevOps: change management must be responsive to requests for updates and implement them as swiftly as possible. The role of automation for software development and security, in general, is presented in a dedicated topic below.
Integrity Measurement
Integrity measurement consists of collecting measurements of systems, applications, hardware, and configuration settings to detect undue system modifications caused by an application. Those measurements are computed hash values used in attestation challenges, i.e., queries about the integrity of a computing element. Such control is necessary since malware may be installed and modify a system at a very low level.
The Trusted Computing Group (TCG) is a group aimed at proposing and developing concepts and standards for system trust, which, at its core, encompasses integrity measurement. TCG proposed two basic mechanisms for implementing trust:
- The Trusted Platform Module (TPM) is a cryptoprocessor installed in the motherboard that, at load time, collects and stores system and application integrity measurements. While the TPM isn’t able to act on those measurements, it’s a fundamental mechanism for effective and efficient integrity checks;
- The Core Root of Trust for Measurement (CRTM) is the starting point of trust for every system implementing it. The CRTM is code that runs when a system boots, providing the first integrity measurements, which the TPM collects and stores.
The CRTM integrity check process happens as follows:
- CRTM collects integrity measurements of the bootloader, forwarding them to the TPM’s Platform Configuration Register (PCR), a memory space for storing hash values;
- After the bootloader executes and before the operating system (OS) runs, its integrity measurements are also collected and transferred to the TPM’s PCR;
- All of the following applications go through the same process, with their integrity measurements being collected and stored in the TPM;
- If an integrity attestation challenge is issued, the TPM provides the necessary integrity data signed with its Attestation Identity Key (AIK). The challenger entity then compares this data with known-good values and decides whether to trust the system.
One important idea is that all measurements taken after the CRTM depend on its integrity. If the CRTM is compromised or invalid, they are too.
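The measurement chain described above can be sketched with the TPM’s hash-extend operation, where each new PCR value is the hash of the previous value concatenated with the new measurement. The snippet below is a simplified software simulation, assuming SHA-256 PCRs; the component names are illustrative, and a real TPM performs the extend in hardware.

```python
import hashlib

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """Simulate a TPM PCR extend: new PCR = SHA-256(old PCR || measurement)."""
    return hashlib.sha256(pcr + measurement).digest()

# A PCR starts as all zeros at platform reset.
pcr = bytes(32)

# Measure each boot component in order: bootloader, OS, application.
for component in [b"bootloader-image", b"os-kernel-image", b"app-binary"]:
    measurement = hashlib.sha256(component).digest()
    pcr = pcr_extend(pcr, measurement)

# A challenger recomputes the same chain from known-good measurements
# and compares; any difference means some component was modified.
print(pcr.hex())
```

Because each value folds in the previous one, tampering with any early component (such as the CRTM itself) changes every subsequent PCR value, which is exactly why the whole chain depends on the first measurement.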
Automation, Software Development, and Cybersecurity
Automation can be defined as any procedure, such as testing, deployment, and configuration, executed without direct human intervention. Its mechanisms and technologies vary from simple scripts to entire collections of systems. Technically, they are often a combination of machine learning detection algorithms and automated responses.
The relationship between automation and software development is twofold, with both directions improving management and security:
- Automation software performs significant cybersecurity functions, like IDS and IPS, SOAR, and vulnerability scans. So, automation here is understood as a product of software development used for cybersecurity practices. It’s often an essential tool for compliance, facilitating dealing with all the demands necessary when aligning practices with standards and laws;
- Automation technologies apply to software development. Here, they are still referred to as programs but designed to help improve application development, including its security aspects.
Regardless of its role, automation significantly contributes to software development optimization in all stages, including version control, testing, provisioning, de-provisioning, and all the necessary security concerns. It’s also the basis for consolidating DevOps (discussed below), given that automation enables a more agile approach to application development.
Significant cybersecurity concerns regarding automation implementation exist. Organizations should view automation solutions as any other software, including controlling potential vulnerabilities and adequately integrating them with different technologies (i.e., interoperability and extensibility). Automation is often implemented to detect possible violations and errors, so it must sufficiently deal with false positive and negative results. Since it accesses all kinds of data, it must have built-in mechanisms that ensure privacy. Without such care, automation may add more problems than solve the original ones.
Another central aspect of proper automation management and implementation is following industry and international standards. One example is the National Institute of Standards and Technology (NIST) proposal for the Security Content Automation Protocol (SCAP). SCAP is an adequate illustration of applying automation to organizational security. It is a reference for automating system security checks, such as vulnerability scanning and validating configuration settings, while facilitating policy compliance.
DevOps
DevOps is a set of thought processes and tools derived from the Agile methodology. It’s an abbreviation for development and operations. So, as implied, it couples development and IT operations into a faster and more reliable Software Development Life Cycle (SDLC), including all the necessary development stages and IT operations feedback for further application improvements. For that, DevOps heavily relies on automated configuration and deployment processes.
The core of DevOps SDLC is the Continuous Integration and Continuous Delivery (CI/CD), as illustrated in the image below.
CI consists of constantly integrating code during development and testing. After writing each piece of code, developers commit it to a repository server. A CI server (aka build server) pulls the stored code, compiles it, tests it, and integrates it into a new application build. All of this happens automatically, which enables fast and reliable testing, better developer productivity, and lower costs.
After a successful build, the next step in the SDLC is CD, which prepares the code for production. Again, automated processes provide quick and reliable application deployment, although manual code release may be allowed if necessary. Application monitoring by IT operations provides feedback for further application improvements, re-initiating the CI/CD cycle.
It’s worth noting that security concerns weren’t initially considered inside DevOps and were viewed as an obstacle to the aimed agility. However, automated and scripted security testing is designed into all SDLC stages today, with continuous checking and versioning, including security parameters. Such practice follows the holistic approach to software development security mentioned above. The DevSecOps concept shows such integration, aiming to develop secure and resilient software.
Secure Coding Practices, Methodologies, and Technologies
Developing secure code requires that organizational processes address security concerns. It also demands the implementation of specific controls when writing code, establishing how the program will deal with the input, process and store data, handle exceptions and errors, protect the source code’s confidentiality and integrity, and validate the developer’s identity.
Normalization and Input Validation
An application accepts input and processes it to execute its roles. If not correctly configured, this process may become an attack vector. Attackers can provide malformed or malicious input, which may cause anomalous application behavior (e.g., information disclosure or crashes through buffer overflow), provide unauthorized data access (e.g., database access through SQL injection), or compromise users’ and machines’ operations (e.g., cross-site scripting [XSS]).
Normalization is the first step in securing input acceptance. It means converting data into its simplest, canonical form. The same user input may have different Unicode representations, which can be exploited. Normalization converts all of them into a single canonical form, increasing computational predictability.
Next, the application performs input validation by limiting string size and restricting which characters are authorized. This last procedure may be done through an allowlist or a denylist:
- Allowlisting means defining previously which inputs are accepted. It’s a very secure method but, if not properly configured, can be excessively restrictive;
- Denylisting consists of establishing which inputs cannot be used. While this method is more flexible, the developer has to be thorough when elaborating the list. Forgotten invalid inputs may provide an attack vector.
Normalization must happen before input validation; otherwise, different (and potentially malicious) Unicode counterparts of filtered characters may still pass.
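A minimal Python sketch of this ordering, assuming a field that accepts only alphanumerics, dots, dashes, and underscores (an illustrative allowlist), with length limits applied after normalization:

```python
import re
import unicodedata

def normalize_and_validate(raw: str, max_len: int = 64) -> str:
    """Normalize input first, then validate it against an allowlist."""
    # Step 1: normalization -- collapse Unicode look-alikes into a
    # single canonical form (NFKC) before any filtering happens.
    text = unicodedata.normalize("NFKC", raw)

    # Step 2: validation -- limit length and allow only expected characters.
    if len(text) > max_len:
        raise ValueError("input too long")
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", text):
        raise ValueError("input contains disallowed characters")
    return text

# The fullwidth letter U+FF41 normalizes to a plain ASCII 'a' before the
# allowlist check, so look-alike characters cannot sneak past the filter.
print(normalize_and_validate("\uFF41dmin"))  # prints "admin"
```

If the steps were reversed, the fullwidth variant would be evaluated before normalization, and a filter written for ASCII characters would miss it.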
An important aspect to consider is where the input validation will occur: client-side or server-side.
- Client-side validation checks data entered into a form in the user’s interface, with instantaneous feedback when information breaks the validation rules. While convenient, it’s insecure since (1) the application must trust a third party, i.e., the web browser, (2) attackers may easily bypass the mechanism, and (3) the user may disable it.
- Server-side validation occurs on the server hosting the application. It’s more secure since malicious inputs are rejected before they reach the application’s logic, and the mechanism is much harder to bypass.
A practical answer to which method is more adequate is to adopt both: the client performs a preliminary check and forwards the data to the server, which validates it a second time.
Stored Procedures
Stored procedures are pre-compiled SQL statements stored on the database server, so different user queries may use the same statement. It increases reliability and prevents SQL injections since the stored procedure already has the necessary execute rights, dictating how the client will interact with the database. From a performance standpoint, it also provides faster results and reduces network traffic.
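Lightweight engines such as SQLite lack stored procedures, but a parameterized query illustrates the same underlying principle: the SQL statement is fixed in advance, and user input is bound as data rather than concatenated into the statement text. A minimal sketch, using an illustrative in-memory table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user(name: str):
    # The SQL statement is fixed; user input is bound as a parameter,
    # never concatenated into the statement text.
    cur = conn.execute("SELECT name, role FROM users WHERE name = ?", (name,))
    return cur.fetchall()

# A classic injection payload is treated as a literal string, not as SQL.
print(find_user("alice"))        # [('alice', 'admin')]
print(find_user("' OR '1'='1"))  # []
```

A stored procedure goes one step further: the statement lives on the database server with its own execute rights, so the client never submits SQL text at all.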
Memory Management
For a program to work, it must have a space in memory to execute its functions. Developers should assign sufficient memory so the application doesn’t crash or behave unexpectedly. Additionally, mechanisms should be designed into the program to adequately manage memory.
Attackers may exploit memory assignment and management flaws to cause a buffer overflow, which, in turn, may disrupt application functioning (denial of service) or expose sensitive data.
Memory assignment has some best practices:
- The memory buffer needs to be large enough for the application;
- There must be proper input and output control;
- Memory blocks must be freed after a function completes;
- Developers must avoid using vulnerable functions, such as C’s strcpy and gets (see table below);
- Sensitive data must be cleared from memory to avoid information leaks.
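The last point can be sketched in Python, though only as a best-effort measure: a high-level runtime may keep hidden copies of the data, which is one reason languages with manual memory management demand even more care. The buffer name and secret value below are illustrative.

```python
def wipe(buf: bytearray) -> None:
    """Overwrite a mutable buffer in place so the secret no longer
    lingers in that allocation (best effort in a managed runtime)."""
    for i in range(len(buf)):
        buf[i] = 0

# Keep secrets in a mutable bytearray: an immutable str/bytes object
# cannot be overwritten and may persist in memory until collected.
secret = bytearray(b"s3cr3t-api-key")
# ... use the secret ...
wipe(secret)
print(secret)  # every byte is now zero
```

The same discipline applies elsewhere: zero buffers in C with `memset`-style calls before freeing them, and avoid logging or serializing secret values.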
Data Exposure
Applications process, store, and send data to other applications. It’s good practice to encrypt data at rest and in transit to avoid data exposure. Logically, developers should use strong, peer-reviewed encryption algorithms.
As a general rule, secure software development should address data protection measures since the beginning of the SDLC, incorporating mechanisms such as data validation, authentication, authorization, and encryption.
Code Reuse and Dead Code
Code reuse means implementing code previously developed for other applications in new programs. While this practice saves time and money, the main problem with code reuse is proper version control: old code not adequately patched for vulnerabilities may introduce them into the new application. Organizations should maintain proper vulnerability control and monitor for good sources of code, which brings up the relation between code reuse and third-party libraries. Both are conceptually similar, including their associated security best practices. Third-party libraries are discussed below.
On the other hand, dead code encompasses instructions that can be executed but contribute nothing to the rest of the program’s functions. Typically, this scenario results from developers not taking the time to review and clean the code after changes and updates.
Dead code may excessively consume computing time and resources, hampering application performance and potentially causing additional costs to a company. Additionally, it may cause or trigger unforeseen vulnerabilities.
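Both problems are easy to picture in a short, hypothetical function: a value that is computed but never used, and a branch that can never execute. Linters and static analyzers typically flag both patterns, and removing them keeps behavior identical while shrinking the code.

```python
def discount(price: float) -> float:
    """Apply a 50% discount."""
    tax = price * 0.07   # dead code: computed but never used
    if False:            # dead branch: can never execute
        return price * 2
    return price * 0.5

print(discount(100.0))  # 50.0
```

Running a linter or a dead-code detector as part of CI (as discussed in the automation section above) catches these leftovers before they reach production.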
Third-party Libraries and SDKs
It’s common practice to incorporate code developed by third parties into an application. Libraries store and distribute pre-made code for developers. While practical, such an approach may expose the organization to library vulnerabilities. So, it’s heavily recommended that companies carefully choose third-party libraries, monitor their security issues, and update them as necessary. Common Vulnerabilities and Exposures (CVE) databases list library vulnerabilities, making them a valuable resource for this monitoring.
As said above, third-party libraries and code reuse are very similar since both work to make previously developed code available for future projects. Note that both methods may have a widespread impact when plagued by vulnerabilities since multiple applications often use the same libraries. This was the case with the OpenSSL Heartbleed and GNU Shellshock vulnerabilities.
Software Development Kits (SDKs) are a set of tools used to develop applications for specific environments and software packages. Android SDKs are a good example. Like libraries, organizations must be mindful of potential vulnerabilities introduced through specific SDKs. CVE databases and constant monitoring and updating are also sound security measures for dealing with SDK use.
Code Signing
Essentially, code signing relies on public-key infrastructure (PKI) digital certificates to sign code and prove that it came from the actual developer. Code signing can be individual (for independent developers) or organizational, i.e., for tech companies that develop software.
Since code signing relies on digital certificates, organizations must protect against possible stolen certificates, which can be used to make a malicious program pass as legitimate.
Code Encryption, Obfuscation, and Camouflage
Encryption, obfuscation, and camouflage are measures to prevent code reverse engineering and, consequently, to protect trade secrets and intellectual property.
Code encryption, as the name implies, means transforming code into ciphertext, which makes it unintelligible to a potential attacker. The code is decrypted when the application runs.
Obfuscation makes source code difficult to read and understand. It may involve using abbreviated or meaningless names, excessively long names, and removing whitespace. Developers working with interpreted languages (e.g., Python) have long used obfuscation.
Finally, camouflage consists of creating fake code from a piece of legitimate code. An attacker inspecting the source sees only the fake version while the original code runs.
Additional Security Controls
Aside from all the processes and techniques addressed above, adding some final procedures for secure coding is important. In general, security teams working with developers must:
- Anticipate possible attack vectors;
- Couple manual with automated reviews;
- Implement best practices regarding code handling, tracking change management closely, and ensuring confidentiality, i.e., only authorized personnel can access the source code.
A future post will discuss specific software security testing techniques, like fuzzing and black, white, and grey box testing.