├── CODE_OF_CONDUCT.md
├── FAQ.md
├── README.md
├── java.md
├── javascript.md
└── satellite-2020-workshops-codeql.pdf
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 |
2 | # Code of Conduct
3 |
4 | The GitHub Satellite 2020 Discussion Forum is intended to be a place for questions, feedback and chat related to sessions at the virtual GitHub Satellite 2020 event. This is a civilized place for connecting with other attendees, and Hubbers from across the world taking part in the event. By participating in this community, you are agreeing to the same [Terms of Service](https://help.github.com/articles/github-terms-of-service) that apply to GitHub.com, as well as the GitHub Satellite 2020 Discussion Forum specific Code of Conduct.
5 |
6 | With this Code of Conduct, we hope to help you understand how best to collaborate in Discussions, what you can expect from moderators, and what type of actions or content may result in temporary or permanent suspension from this project. We will investigate any abuse reports and may moderate public content within the discussion that we determine to be in violation of either the GitHub Terms of Service or this Code of Conduct.
7 |
8 | GitHub users worldwide bring wildly different perspectives, ideas, and experiences, and range from people who created their first "Hello World" project last week to the most well-known software developers in the world. We are committed to making GitHub Satellite a welcoming environment for all the different voices and perspectives here, while maintaining a space where people are free to express themselves.
9 |
10 |
11 | ### Pledge
12 |
13 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in the GitHub Satellite Discussions a harassment-free experience for everyone, regardless of age, body size, ability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
14 |
15 |
16 | ### Standards
17 |
18 | Treat the GitHub Satellite Discussions with respect. The following are not hard and fast rules, merely aids to the human judgment of our Community. Use these guidelines to keep this a clean, well-lighted place for civilized public discourse.
19 |
20 |
21 | #### _Best Practices for Building a Strong Community_
22 |
23 |
24 |
25 | * Be respectful and considerate.
26 | * Be welcoming and open-minded. Other GitHub members may not have the same experience level or background as you, but that doesn't mean they don't have good ideas to contribute. We encourage you to be welcoming to new members and those just getting started.
27 | * Respect each other. Nothing sabotages healthy conversation like rudeness. Be civil and professional, and don’t post anything that a reasonable person would consider offensive, abusive, or hate speech. Don’t harass or grief anyone. Treat each other with dignity and consideration in all interactions. \
28 | You may wish to respond to something by disagreeing with it. That’s fine. But remember to criticize ideas, not people. Avoid name-calling, ad hominem attacks, responding to a post’s tone instead of its actual content, and knee-jerk contradiction. Instead, provide reasoned counter-arguments that improve the conversation.
29 | * Communicate with empathy. Disagreements or differences of opinion are a fact of life. Being part of a community means interacting with people from a variety of backgrounds and perspectives, many of which may not be your own. If you disagree with someone, try to understand and share their feelings before you address them. This will promote a respectful and friendly atmosphere where people feel comfortable asking questions, participating in discussions, and making contributions.
30 | * Contribute in a positive and constructive way.
31 | * Improve the discussion. Help us make this a great place for discussion by always working to improve the discussion in some way, however small. If you are not sure your post adds to the conversation, think over what you want to say and try again later. \
32 | The topics discussed here matter to us, and we want you to act as if they matter to you, too. Be respectful of the topics and the people discussing them, even if you disagree with some of what is being said.
33 | * Be clear and stay on topic. Communicating with strangers on the Internet can be awkward. It's hard to convey or read tone, and sarcasm is frequently misunderstood. Try to use clear language, and think about how it will be received by the other person. \
34 | This applies to sharing links, as well. Any links shared in the discussions should be shared with the intent of providing relevant and appropriate information. Links should not be posted to simply drive traffic or attention to a site. Links should always be accompanied by a full explanation of the content and purpose of the link. Posting links, especially unsolicited ones, without relevant and valuable context can come across as advertising or serving even more malicious purposes.
35 | * Share mindfully. Don't share sensitive information. This includes your own email address. We don't allow the sharing of such information in this discussion forum, as it can create security and privacy risks for the poster, as well as other users.
36 | * Keep it tidy. Make the effort to put things in the right place, so that we can spend more time discussing and less time cleaning up. So:
37 | * Don’t cross-post the same thing in multiple topics.
38 | * Don’t post no-content replies.
39 | * Don’t divert a topic by changing it midstream.
40 | * Rather than posting “+1” or “Agreed”, use the Reaction emoji button.
41 | * Be trustworthy.
42 | * Always be honest. Don’t knowingly share incorrect information or intentionally mislead other GitHub members. If you don’t know the answer to someone’s question but still want to help, you can try helping them research or find resources instead. GitHub staff will also be active in the discussions, so if you’re unsure of an answer, it’s likely a moderator will be able to help.
43 |
44 | #### _What is not Allowed_
45 |
46 | * Threats of violence. You may not threaten violence towards others or use the site to organize, promote, or incite acts of real-world violence or terrorism. Think carefully about the words you use, the images you post, and even the software you write, and how they may be interpreted by others. Even if you mean something as a joke, it might not be received that way. If you think that someone else might interpret the content you post as a threat, or as promoting violence or terrorism, stop. Don't post it. In extraordinary cases, we may report threats of violence to law enforcement if we think there may be a genuine risk of physical harm or a threat to public safety.
47 | * Hate speech and discrimination. While it is not forbidden to broach topics such as age, body size, ability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation, we do not tolerate speech that attacks a person or group of people on the basis of who they are. Just realize that talking about these or other sensitive topics can make others feel unwelcome, or perhaps even unsafe, if approached in an aggressive or insulting manner. We expect our community members to be respectful when discussing sensitive topics.
48 | * Bullying and harassment. We do not tolerate bullying or harassment. This means any habitual badgering or intimidation targeted at a specific person or group of people. In general, if your actions are unwanted and you continue to engage in them, there's a good chance you are headed into bullying or harassment territory.
49 | * Impersonation. You may not impersonate another person by copying their avatar, posting content under their email address, intentionally using a deceptively similar username or otherwise posing as someone else. Impersonation is a form of harassment.
50 | * Doxxing and invasion of privacy. Don't post other people's personal information, such as phone numbers, private email addresses, physical addresses, credit card numbers, Social Security/National Identity numbers, or passwords. Depending on the context, such as in the case of intimidation or harassment, we may consider other information, such as photos or videos that were taken or distributed without the subject's consent, to be an invasion of privacy, especially when such material presents a safety risk to the subject.
51 | * Prurient/Sexually explicit content. Basically, don't post pornography. This does not mean that all nudity or sexual content is prohibited. We recognize that sexuality is a part of life and non-pornographic sexual content may be a part of your project, or may be presented for educational or artistic purposes. If you have any questions or concerns about something you want to post, [feel free to reach out and ask](https://support.github.com/contact) beforehand.
52 | * Spam. Respect the GitHub Satellite Discussions. Don’t post advertisements, link to spammy websites, or otherwise vandalize the community. This community is meant for GitHub Satellite participants to talk about the sessions, to provide feedback, ask questions, learn, and share ideas with one another - not for advertising or other spam-like content. Content that we deem spammy will be removed.
53 | * Copyrighted or illegal content. Only post your own stuff. You are responsible for what you post. If you post something you didn’t create yourself, you must have the right to post it. You may not post illegal content, including content illegal under copyright and trademark laws, links to illegal content, or methods for circumventing the law.
54 | * Active malware or exploits. Being part of this community includes not taking advantage of other members of the community. We do not allow anyone to use our platform for exploit delivery (e.g. Using the community as a means to deliver malicious executables) or as attack infrastructure (e.g. Organizing denial of service attacks or managing command and control servers). Note, however, that we do not prohibit the posting of source code which could be used to develop malware or exploits, as the publication and distribution of such source code has educational value and provides a net benefit to the security community.
55 | * Anyone under the age of 13. If you're a child under the age of 13, you may not have an account on GitHub. GitHub does not knowingly collect information from or direct any of our content specifically to children under 13. If we learn or have reason to suspect that you are a user who is under the age of 13, we will unfortunately have to close your GitHub.com account. We don't want to discourage you from learning to code, but those are the rules. Please see our [Terms of Service](https://help.github.com/articles/github-terms-of-service) for information about account termination.
56 | * Other conduct which could reasonably be considered inappropriate in a professional setting. The GitHub Satellite Discussions is a professional space and should be treated as such.
57 |
58 | ### Enforcement
59 |
60 |
61 | #### _What GitHub Satellite Discussions members Can Do_
62 |
63 | * If you see a problem, report it. Moderators have special authority; they are responsible for this community. But so are you. With your help, moderators can be community facilitators, not just janitors or police. \
64 | When you see bad behavior, don’t reply. It encourages the bad behavior by acknowledging it, consumes your energy, and wastes everyone’s time. Just report it by copying a direct link to the reply in question and emailing it to events@github.com
65 |
66 | #### Our Responsibilities
67 |
68 |
69 | There are a variety of actions that we may take in response to inappropriate behavior or content. It usually depends on the exact circumstances of a particular case. We recognize that sometimes people may say or do inappropriate things for any number of reasons. Perhaps they did not realize how their words would be perceived. Or maybe they just let their emotions get the best of them. Of course, sometimes, there are folks who just want to spam or cause trouble.
70 |
71 | Each case requires a different approach, and we try to tailor our response to meet the needs of the situation. We'll review each situation on a case-by-case basis. In each case, we will have a diverse team investigate the content and surrounding facts and respond as appropriate, using this Code of Conduct to guide our decision.
72 |
73 | Actions we may take in response to a flag or abuse report include, but are not limited to:
74 |
75 |
76 |
77 | * Content Removal
78 | * Content Blocking
79 | * GitHub Account Suspension
80 | * GitHub Account Termination
81 |
82 | ### Contacting GitHub Staff
83 |
84 |
85 | If, for any reason, you want to contact GitHub Staff, the Community Managers, Administrators, or Moderators of this forum privately, you can send an email to events@github.com.
86 |
87 | Let's work together to keep the discussion a place where people feel safe to participate by being respectful of them and their time.
88 |
89 |
90 | ### Legal Notices
91 |
92 | Yes, legalese is boring, but we must protect ourselves – and by extension, you and your data – against unfriendly folks. We have a [Terms of Service](https://help.github.com/articles/github-terms-of-service/) and [Privacy Statement](https://help.github.com/articles/github-privacy-statement/) describing your (and our) behavior and rights related to content, privacy, and laws. To use this service, you must agree to abide by our [Terms of Service](https://help.github.com/articles/github-terms-of-service/) and the [Privacy Statement](https://help.github.com/articles/github-privacy-statement/).
93 |
94 | This Code of Conduct does not modify our [Terms of Service](https://help.github.com/articles/github-terms-of-service/) and is not intended to be a complete list. GitHub retains full discretion under the [Terms of Service](https://help.github.com/articles/github-terms-of-service/) to remove any content or terminate any accounts for activity that is "unlawful, offensive, threatening, libelous, defamatory, pornographic, obscene or otherwise objectionable or violates any party's intellectual property or these Terms of Service." This Code of Conduct describes when we will exercise that discretion.
95 |
--------------------------------------------------------------------------------
/FAQ.md:
--------------------------------------------------------------------------------
1 | # Frequently Asked Questions - CodeQL :artificial_satellite: workshops
2 |
3 | ## General
4 | - **Will the slides be available?**
5 | - Yes, [here](https://github.com/githubsatelliteworkshops/codeql/blob/master/satellite-2020-workshops-codeql.pdf)
6 |
7 | ## CodeQL setup
8 | - **I’m getting `could not resolve module java` and queries don’t seem to be running… did I miss something obvious in setting this up?**
9 | - Make sure to get all sub-modules: `git clone --recursive https://github.com/github/vscode-codeql-starter/`
10 | - You might have an old version of the CodeQL CLI installed on your path. Delete it and let the VS Code extension install the CLI for you.
11 |
12 | ## CodeQL
13 |
14 | - **Is it possible to create custom CodeQL queries that run as part of CI/CD?**
15 | - Yes. For open-source projects you can configure the CodeQL GitHub Action to include custom queries you have added to your repository. For closed-source/enterprise code, you can do something similar once you have a license. The enterprise deployment of GitHub Advanced Security allows custom queries to be added and can be integrated into developer workflows.
16 | - **Can CodeQL queries be run on the output of binary tools, such as LLVM or IDA, rather than on source code?**
17 | - Usually no. CodeQL databases are produced by extracting the source code during the build process - the CLI listens to the compiler and processes all source code that is compiled and built.
18 | - **Is there human-readable documentation outside VS Code where one can browse the available API (methods, class hierarchy, etc.)?**
19 | - The queries and standard libraries are open-sourced at https://github.com/github/codeql, and the documentation is available at https://help.semmle.com/QL/learn-ql/ and https://help.semmle.com/QL/ql-libraries.html.
20 | - **Is there a repo/report/archive somewhere of you all running CodeQL and various vulnerabilities against a large number of open source repos already, or have you just run it on a few?**
21 | - https://securitylab.github.com has comprehensive information on vulnerabilities discovered on OSS with CodeQL.
22 | - https://LGTM.com runs CodeQL analysis for free on over 130k open-source repos.
23 | - This scanning will now be enabled directly on the GitHub.com platform for our users, via the Code Scanning Action.
24 | - **Is it possible to analyze the dependencies as dependencies? Namely to identify all code that has a specific dependency?**
25 | - In general it is possible to identify dependencies. The exact mechanism depends on the language being analysed. For example, we have the content of Maven POM files in Java databases, and package.json in JavaScript databases, and you can query those to find out what your code depends on. Identifying specifically which code uses those dependencies is more involved, though I think mostly possible. From a security scanning point of view, there are some other complementary GitHub tools (dependency graph, dependency insights) that give you an overview of this information on your repositories.
26 | - For the GitHub security feature: https://help.github.com/en/github/visualizing-repository-data-with-graphs/listing-the-packages-that-a-repository-depends-on
27 | - For the CodeQL Java library that lets you examine POM files: https://help.semmle.com/qldoc/java/semmle/code/xml/MavenPom.qll/type.MavenPom$Dependency.html or https://github.com/github/codeql/blob/master/java/ql/src/semmle/code/xml/MavenPom.qll.
28 | - **Preference between `getName() = string`/`hasName(string)` ?**
29 | - Both are available for convenience. `hasName` with a specific string is shorter, but `getName` allows you to easily continue the condition, say, if you want to restrict the name with a regex like so: `.getName().regexpMatch(...)`.
30 | - **What's the best way to extend a backend library to identify a new source of untrusted user input (in order to hopefully benefit from all the existing codeql queries)?**
31 | - The main out-of-the-box definition of untrusted input in the Java QL libraries is the class `RemoteFlowSource`, defined in `FlowSources.qll`. You can extend this class with more cases by following the pattern in that file.
32 | - Custom extensions can be conveniently put in the file `Customizations.qll`, where they'll be visible to all queries.
33 | - **CodeQL has a few "common" concepts, but they are all named differently, which makes the learning curve steeper (for example Java `IfStmt/Block/getNumStmts` vs JavaScript `IfStmt/BasicBlock/getNumLines`). I wish there was a higher level of abstraction so that queries were a bit more portable.**
34 | - Actually, in this case JavaScript does have the same classes as Java (both `Block` and `getNumStmts()`).
35 | - One challenge we have is where different languages have standard names for things that are different. For example, the Java language spec defines a "call" as a "method access", which is why the class name is `MethodAccess`.
36 | - The other problem is that the concept may not be identical. In particular, a JavaScript `CallExpr` is quite different from a Java method call, because the target of the call can be defined dynamically.
37 |
38 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Finding security vulnerabilities with CodeQL
2 | @adityasharad and @lcartey
3 |
4 |
7 |
8 |
9 | > CodeQL is GitHub's expressive language and engine for code analysis, which allows you to explore source code to find bugs and security vulnerabilities. During these beginner-friendly workshops, you will learn to write queries in CodeQL and find known security vulnerabilities in open-source Java and JavaScript projects.
10 |
11 | > There are two workshops on this topic. Both will cover the basics of writing queries in CodeQL. The first will focus on Java, and the second will focus on JavaScript.
12 |
13 | ## Workshop materials
14 |
15 | Please complete the **Prerequisites** section (below) before the workshop.
16 | The following links contain the content that will be covered during the workshop:
17 | 1. Thursday May 7 / 7:00am PDT: [Finding security vulnerabilities in Java with CodeQL](/java.md)
18 | 1. Thursday May 7 / 9:30am PDT: [Finding security vulnerabilities in JavaScript with CodeQL](/javascript.md)
19 |
20 | ## :mega: Prerequisites
21 | - Install [Visual Studio Code](https://code.visualstudio.com/).
22 | - Install the [CodeQL extension for Visual Studio Code](https://docs.github.com/en/code-security/codeql-for-vs-code/getting-started-with-codeql-for-vs-code/installing-codeql-for-vs-code).
23 | - You do _not_ need to install the CodeQL CLI: the extension will handle this for you.
24 | - Set up the [CodeQL starter workspace](https://github.com/github/vscode-codeql-starter).
25 | - **Important:** Don't forget to use `git clone --recursive` or `git submodule update --init --remote` to update the submodules when you clone this repository. This allows you to obtain the standard CodeQL query libraries.
26 | - Open the starter workspace in Visual Studio Code: **File** > **Open Workspace** > Browse to `vscode-codeql-starter/vscode-codeql-starter.code-workspace` in your checkout of the starter workspace.
27 | - Download and add the CodeQL database to be used in the workshop:
28 | - If you are attending **Finding security vulnerabilities in Java with CodeQL**, please download [this CodeQL database](https://github.com/githubsatelliteworkshops/codeql/releases/download/v1.0/apache_struts_cve_2017_9805.zip).
29 | - If you are attending **Finding security vulnerabilities in JavaScript with CodeQL**, please download [this CodeQL database](https://github.com/githubsatelliteworkshops/codeql/releases/download/v1.0/esbena_bootstrap-pre-27047_javascript.zip).
30 | - Unzip the database.
31 | - Import the unzipped database into Visual Studio Code:
32 | - Click the CodeQL icon in the left sidebar.
33 | - Place your mouse over **Databases**, and click the `+` sign that appears on the right.
34 | - Choose the unzipped database directory on your filesystem.
35 |
36 | ## :books: Resources
37 | - [CodeQL docs](https://codeql.github.com/docs/)
38 | - [CodeQL for Java](https://codeql.github.com/docs/codeql-language-guides/codeql-for-java/)
39 | - [CodeQL for JavaScript](https://codeql.github.com/docs/codeql-language-guides/codeql-for-javascript/)
40 | - [CodeQL for Visual Studio Code](https://codeql.github.com/docs/codeql-for-visual-studio-code/)
41 | - More about CodeQL on [GitHub Security Lab](https://securitylab.github.com/get-involved/)
42 | - CodeQL on [GitHub Learning Lab](https://lab.github.com/githubtraining/codeql-u-boot-challenge-(cc++))
43 |
--------------------------------------------------------------------------------
/java.md:
--------------------------------------------------------------------------------
1 | # CodeQL workshop for Java: Unsafe deserialization in Apache Struts
2 |
3 | - Analyzed language: Java
4 | - Difficulty level: 200
5 |
6 | ## Overview
7 |
8 | - [Problem statement](#problemstatement)
9 | - [Setup instructions](#setupinstructions)
10 | - [Documentation links](#documentationlinks)
11 | - [Workshop](#workshop)
12 | - [Section 1: Finding XML deserialization](#section1)
13 | - [Section 2: Find the implementations of the `toObject` method from ContentTypeHandler](#section2)
14 | - [Section 3: Unsafe XML deserialization](#section3)
15 |
16 | ## Problem statement
17 |
18 | _Serialization_ is the process of converting in-memory objects to text or binary output formats, usually for the purpose of sharing or saving program state. This serialized data can then be loaded back into memory at a later point through the process of _deserialization_.
19 |
20 | In languages such as Java, Python and Ruby, deserialization provides the ability to restore not only primitive data, but also complex types such as library and user-defined classes. This provides great power and flexibility, but introduces a significant attack vector if the deserialization happens on untrusted user data without restriction.
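
As a rough illustration of the round trip described above, the following sketch uses Java's built-in object serialization rather than the XML-based mechanism examined later in this workshop. The `SavedGame` and `SerializationRoundTrip` names are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical user-defined type, used only for this illustration.
class SavedGame implements Serializable {
    private static final long serialVersionUID = 1L;
    final String player;
    final int level;

    SavedGame(String player, int level) {
        this.player = player;
        this.level = level;
    }
}

class SerializationRoundTrip {
    // Serialization: convert an in-memory object to a binary format.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Deserialization: rebuild an object from the bytes. The data itself names
    // the class to instantiate, which is what makes untrusted input risky.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new SavedGame("alice", 3));
        SavedGame restored = (SavedGame) deserialize(data);
        System.out.println(restored.player + " is on level " + restored.level);
    }
}
```

Here the bytes come from the program itself, so the round trip is harmless. The risk arises when `deserialize` is handed bytes that an attacker controls.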
21 |
22 | [Apache Struts](https://struts.apache.org/) is a popular open-source MVC framework for creating web applications in Java. In 2017, a researcher from the predecessor of the [GitHub Security Lab](https://securitylab.github.com/) found [CVE-2017-9805](http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9805), an XML deserialization vulnerability in Apache Struts that would allow remote code execution.
23 |
24 | The vulnerability arose in the part of Apache Struts that accepts requests in multiple formats, or _content types_. Struts provides a pluggable system for supporting these content types through the [`ContentTypeHandler`](https://struts.apache.org/maven/struts2-plugins/struts2-rest-plugin/apidocs/org/apache/struts2/rest/handler/ContentTypeHandler.html) interface, which declares the following method:
25 | ```java
26 | /**
27 | * Populates an object using data from the input stream
28 | * @param in The input stream, usually the body of the request
29 | * @param target The target, usually the action class
30 | * @throws IOException If unable to write to the output stream
31 | */
32 | void toObject(Reader in, Object target) throws IOException;
33 | ```
34 | New content type handlers are defined by implementing the interface and defining a `toObject` method which takes data in the specified content type (in the form of a `Reader`) and uses it to populate the Java object `target`, often via a deserialization routine. However, the `in` parameter is typically populated from the body of a request without sanitization or safety checks. This means it should be treated as "untrusted" user data, and only deserialized under certain safe conditions.
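
To make the shape of such a handler concrete, here is a minimal sketch under stated assumptions: the `ContentTypeHandler` interface below is a local stand-in that reproduces only the `toObject` signature shown above, and `KeyValueHandler` is a hypothetical handler for a simple `key=value` format, not part of Struts. It parses the body rather than deserializing arbitrary objects, so it is not itself vulnerable; it only shows where the untrusted `in` data enters a handler.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Local stand-in for the Struts interface so the sketch compiles on its own.
// Only the toObject signature is taken from the text above.
interface ContentTypeHandler {
    void toObject(Reader in, Object target) throws IOException;
}

// Hypothetical handler for a "key=value" content type.
class KeyValueHandler implements ContentTypeHandler {
    @Override
    @SuppressWarnings("unchecked")
    public void toObject(Reader in, Object target) throws IOException {
        Properties parsed = new Properties();
        // `in` is the raw request body: treat everything read from it as untrusted.
        parsed.load(in);
        // For this sketch the target is assumed to be a Map<String, String>.
        Map<String, String> fields = (Map<String, String>) target;
        for (String name : parsed.stringPropertyNames()) {
            fields.put(name, parsed.getProperty(name));
        }
    }
}

class KeyValueHandlerDemo {
    public static void main(String[] args) throws IOException {
        Map<String, String> target = new HashMap<>();
        new KeyValueHandler().toObject(new StringReader("name=widget\nquantity=2\n"), target);
        System.out.println(target.get("name") + " x" + target.get("quantity"));
    }
}
```

A handler that instead passes `in` straight to a general-purpose deserializer follows the same structure, which is the pattern the queries in this workshop look for.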
35 |
36 | In this workshop, we will write a query to find CVE-2017-9805 in a database built from the known vulnerable version of Apache Struts.
37 |
38 | ## Setup instructions for Visual Studio Code
39 |
40 | To take part in the workshop, follow these steps to set up the CodeQL development environment:
41 |
42 | 1. Install the Visual Studio Code IDE.
43 | 2. Download and install the [CodeQL extension for Visual Studio Code](https://help.semmle.com/codeql/codeql-for-vscode.html). Full setup instructions are [here](https://help.semmle.com/codeql/codeql-for-vscode/procedures/setting-up.html).
44 | 3. [Set up the starter workspace](https://help.semmle.com/codeql/codeql-for-vscode/procedures/setting-up.html#using-the-starter-workspace).
45 | - **Important**: Don't forget to `git clone --recursive` or `git submodule update --init --remote`, so that you obtain the standard query libraries.
46 | 4. Open the starter workspace: File > Open Workspace > Browse to `vscode-codeql-starter/vscode-codeql-starter.code-workspace`.
47 | 5. Download and unzip the [apache_struts_cve_2017_9805.zip database](https://github.com/githubsatelliteworkshops/codeql/releases/download/v1.0/apache_struts_cve_2017_9805.zip).
48 | 6. Choose this database in CodeQL (using `Ctrl + Shift + P` to open the command palette, then selecting "CodeQL: Choose Database").
49 | 7. Create a new file in the `codeql-custom-queries-java` directory called `UnsafeDeserialization.ql`.
50 |
51 | ## Documentation links
52 | If you get stuck, try searching our documentation and blog posts for help and ideas. Below are a few links to help you get started:
53 | - [Learning CodeQL](https://help.semmle.com/QL/learn-ql)
54 | - [Learning CodeQL for Java](https://help.semmle.com/QL/learn-ql/java/ql-for-java.html)
55 | - [Using the CodeQL extension for VS Code](https://help.semmle.com/codeql/codeql-for-vscode.html)
56 |
57 | ## Workshop
58 |
59 | The workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step. Each step has a **hint** that describes useful classes and predicates in the CodeQL standard libraries for Java. You can explore these in your IDE using the autocomplete suggestions (`Ctrl + Space`) and the jump-to-definition command (`F12`).
60 |
61 | ### Section 1: Finding XML deserialization
62 |
63 | [XStream](https://x-stream.github.io/index.html) is a Java framework, used by Apache Struts, for serializing Java objects to XML. It provides a method `XStream.fromXML` for deserializing XML to a Java object. By default, the input is not validated in any way, and is vulnerable to remote code execution exploits. In this section, we will identify calls to `fromXML` in the codebase.
64 |
65 | 1. Find all method calls in the program.
66 |
67 | Hint
68 |
69 | - A method call is represented by the `MethodAccess` type in the CodeQL Java library.
70 |
71 |
72 |
73 | Solution
74 |
75 | ```ql
76 | import java
77 |
78 | from MethodAccess call
79 | select call
80 | ```
81 |
82 |
83 | 1. Update your query to report the method being called by each method call.
84 |
85 | Hints
86 |
87 | - Add a CodeQL variable called `method` with type `Method`.
88 | - `MethodAccess` has a predicate called `getMethod()` for returning the method.
89 | - Add a `where` clause.
90 |
91 |
92 |
93 | Solution
94 |
95 | ```ql
96 | import java
97 |
98 | from MethodAccess call, Method method
99 | where call.getMethod() = method
100 | select call, method
101 | ```
102 |
103 |
104 | 1. Find all calls in the program to methods called `fromXML`.
105 |
106 |
107 | Hint
108 |
109 | - `Method.getName()` returns a string representing the name of the method.
110 |
111 |
112 |
113 | Solution
114 |
115 | ```ql
116 | import java
117 |
118 | from MethodAccess fromXML, Method method
119 | where
120 | fromXML.getMethod() = method and
121 | method.getName() = "fromXML"
122 | select fromXML
123 | ```
124 | However, as we now want to report only the call itself, we can inline the temporary `method` variable like so:
125 | ```ql
126 | import java
127 |
128 | from MethodAccess fromXML
129 | where fromXML.getMethod().getName() = "fromXML"
130 | select fromXML
131 | ```
132 |
133 |
134 | 1. The `XStream.fromXML` method deserializes the first argument (i.e. the argument at index `0`). Update your query to report the deserialized argument.
135 |
136 |
137 | Hint
138 |
139 | - `MethodAccess.getArgument(int i)` returns the argument at the i-th index.
140 | - The arguments are _expressions_ in the program, represented by the CodeQL class `Expr`. Introduce a new variable to hold the argument expression.
141 |
142 |
143 |
144 | Solution
145 |
146 | ```ql
147 | import java
148 |
149 | from MethodAccess fromXML, Expr arg
150 | where
151 | fromXML.getMethod().getName() = "fromXML" and
152 | arg = fromXML.getArgument(0)
153 | select fromXML, arg
154 | ```
155 |
156 |
157 | 1. Recall that _predicates_ allow you to encapsulate logical conditions in a reusable format. Convert your previous query to a predicate which identifies the set of expressions in the program which are deserialized directly by `fromXML`. You can use the following template:
158 | ```ql
159 | predicate isXMLDeserialized(Expr arg) {
160 | exists(MethodAccess fromXML |
161 | // TODO fill me in
162 | )
163 | }
164 | ```
165 | [`exists`](https://help.semmle.com/QL/ql-handbook/formulas.html#exists) is a mechanism for introducing temporary variables with a restricted scope. You can think of them as their own `from`-`where`-`select`. In this case, we use it to introduce the `fromXML` temporary variable, with type `MethodAccess`.
166 |
167 |
168 | Hint
169 |
170 | - Copy the `where` clause of the previous query.
171 |
172 |
173 | Solution
174 |
175 | ```ql
176 | import java
177 |
178 | predicate isXMLDeserialized(Expr arg) {
179 | exists(MethodAccess fromXML |
180 | fromXML.getMethod().getName() = "fromXML" and
181 | arg = fromXML.getArgument(0)
182 | )
183 | }
184 |
185 | from Expr arg
186 | where isXMLDeserialized(arg)
187 | select arg
188 | ```
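
Matching on the method name alone also reports calls to any unrelated method that happens to be named `fromXML`. A minimal sketch of a stricter variant follows. It assumes the XStream class is `com.thoughtworks.xstream.XStream` (the package name is not given in this workshop), and the predicate name `isXStreamDeserialized` is hypothetical:

```ql
import java

/** Holds if `arg` is the first argument of a call to a `fromXML` method declared on XStream. */
predicate isXStreamDeserialized(Expr arg) {
  exists(MethodAccess fromXML, Method m |
    m = fromXML.getMethod() and
    m.getName() = "fromXML" and
    // Assumption: the declaring class is com.thoughtworks.xstream.XStream.
    m.getDeclaringType().hasQualifiedName("com.thoughtworks.xstream", "XStream") and
    arg = fromXML.getArgument(0)
  )
}

from Expr arg
where isXStreamDeserialized(arg)
select arg
```

The rest of the workshop keeps the simpler name-based `isXMLDeserialized` predicate; the stricter check is optional.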
189 |
190 | ### Section 2: Find the implementations of the toObject method from ContentTypeHandler
191 |
192 | Like predicates, _classes_ in CodeQL can be used to encapsulate reusable portions of logic. Classes represent single sets of values, and they can also include operations (known as _member predicates_) specific to that set of values. You have already seen numerous instances of CodeQL classes (`MethodAccess`, `Method` etc.) and associated member predicates (`MethodAccess.getMethod()`, `Method.getName()`, etc.).
193 |
194 | 1. Create a CodeQL class called `ContentTypeHandler` to find the interface `org.apache.struts2.rest.handler.ContentTypeHandler`. You can use this template:
195 | ```ql
196 | class ContentTypeHandler extends RefType {
197 | ContentTypeHandler() {
198 | // TODO Fill me in
199 | }
200 | }
201 | ```
202 |
203 |
204 | Hint
205 |
206 | - Use `RefType.hasQualifiedName(string packageName, string className)` to identify classes with the given package name and class name. For example:
207 | ```ql
208 | from RefType r
209 | where r.hasQualifiedName("java.lang", "String")
210 | select r
211 | ```
212 | - Within the characteristic predicate you can use the magic variable `this` to refer to the `RefType`.
213 |
214 |
215 |
216 | Solution
217 |
218 | ```ql
219 | import java
220 |
221 | /** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */
222 | class ContentTypeHandler extends RefType {
223 | ContentTypeHandler() {
224 | this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
225 | }
226 | }
227 | ```
228 |
229 |
230 | 2. Create a CodeQL class called `ContentTypeHandlerToObject` for identifying `Method`s called `toObject` on classes whose direct super-types include `ContentTypeHandler`.
231 |
232 |
233 | Hint
234 |
235 | - Use `Method.getName()` to identify the name of the method.
236 | - To identify whether the method is declared on a class whose direct super-type includes `ContentTypeHandler`, you will need to:
237 | - Identify the declaring type of the method using `Method.getDeclaringType()`.
238 | - Identify the super-types of that type using `RefType.getASupertype()`
239 | - Use `instanceof` to assert that one of the super-types is a `ContentTypeHandler`
240 |
241 |
242 |
243 | Solution
244 |
245 | ```ql
246 | /** A `toObject` method on a subtype of `org.apache.struts2.rest.handler.ContentTypeHandler`. */
247 | class ContentTypeHandlerToObject extends Method {
248 | ContentTypeHandlerToObject() {
249 | this.getDeclaringType().getASupertype() instanceof ContentTypeHandler and
250 | this.hasName("toObject")
251 | }
252 | }
253 | ```
254 |
255 |
256 | 3. `toObject` methods should consider the first parameter as untrusted user input. Write a query to find the first (i.e. index 0) parameter for `toObject` methods.
257 |
258 | Hint
259 |
260 | - Use `Method.getParameter(int index)` to get the parameter at the given index.
261 | - Create a query with a single CodeQL variable of type `ContentTypeHandlerToObject`.
262 |
263 |
264 |
265 | Solution
266 |
267 | ```ql
268 | from ContentTypeHandlerToObject toObjectMethod
269 | select toObjectMethod.getParameter(0)
270 | ```
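
The `ContentTypeHandlerToObject` class above only matches types that list `ContentTypeHandler` as a direct supertype. If indirect implementations should be covered too, one option is QL's reflexive transitive closure operator `*`. The sketch below is a variant, not part of the workshop solution; the class name `ContentTypeHandlerToObjectTransitive` is hypothetical, and the `ContentTypeHandler` class is repeated so the query stands alone:

```ql
import java

/** The interface `org.apache.struts2.rest.handler.ContentTypeHandler` (as defined in step 1). */
class ContentTypeHandler extends RefType {
  ContentTypeHandler() {
    this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
  }
}

/** A `toObject` method on `ContentTypeHandler` or on a direct or indirect subtype of it. */
class ContentTypeHandlerToObjectTransitive extends Method {
  ContentTypeHandlerToObjectTransitive() {
    // `getASupertype*()` follows zero or more supertype steps.
    this.getDeclaringType().getASupertype*() instanceof ContentTypeHandler and
    this.hasName("toObject")
  }
}

from ContentTypeHandlerToObjectTransitive toObjectMethod
select toObjectMethod, toObjectMethod.getParameter(0)
```

Because the closure is reflexive (zero steps are allowed), the `toObject` declaration on the interface itself is matched as well.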
271 |
272 |
273 | ### Section 3: Unsafe XML deserialization
274 |
275 | We have now identified (a) places in the program which receive untrusted data and (b) places in the program which potentially perform unsafe XML deserialization. We now want to tie these two together to ask: does the untrusted data ever _flow_ to the potentially unsafe XML deserialization call?
276 |
277 | In program analysis we call this a _data flow_ problem. Data flow helps us answer questions like: does this expression ever hold a value that originates from a particular other place in the program?
278 |
279 | We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in the program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two nodes.
280 |
281 | Consider this example Java method:
282 |
283 | ```java
284 | int func(int tainted) {
285 | int x = tainted;
286 | if (someCondition) {
287 | int y = x;
288 | callFoo(y);
289 | } else {
290 | return x;
291 | }
292 | return -1;
293 | }
294 | ```
295 | The data flow graph for this method will look something like this:
296 |
297 |
298 |
299 | This graph represents the flow of data from the tainted parameter. The nodes of the graph represent program elements that have a value, such as function parameters and expressions. The edges of this graph represent flow through these nodes.
300 |
301 | CodeQL for Java provides data flow analysis as part of the standard library. You can import it using `semmle.code.java.dataflow.DataFlow`. The library models nodes using the `DataFlow::Node` CodeQL class. These nodes are separate and distinct from the AST (Abstract Syntax Tree, which represents the basic structure of the program) nodes, to allow for flexibility in how data flow is modeled.
302 |
303 | There are a small number of data flow node types – expression nodes and parameter nodes are most common.
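
To get a feel for these node types before writing the full query, local flow (flow within a single method) can be queried directly. The following is a minimal sketch, assuming the standard library predicates `DataFlow::localFlow`, `DataFlow::parameterNode` and `DataFlow::exprNode`. On the example method above it should relate the `tainted` parameter to the argument `y` of `callFoo(y)`:

```ql
import java
import semmle.code.java.dataflow.DataFlow

from Parameter p, MethodAccess call, Expr arg
where
  // `arg` is any argument of the call.
  arg = call.getAnArgument() and
  // The value of parameter `p` reaches `arg` through local data flow steps only.
  DataFlow::localFlow(DataFlow::parameterNode(p), DataFlow::exprNode(arg))
select p, arg
```

This only follows flow inside one method body. The configuration class used in the template below is what extends the analysis across method boundaries.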
304 |
305 | In this section we will create a data flow query by populating this template:
306 |
307 | ```ql
308 | /**
309 | * @name Unsafe XML deserialization
310 | * @kind problem
311 | * @id java/unsafe-deserialization
312 | */
313 | import java
314 | import semmle.code.java.dataflow.DataFlow
315 |
316 | // TODO add previous class and predicate definitions here
317 |
318 | class StrutsUnsafeDeserializationConfig extends DataFlow::Configuration {
319 | StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }
320 | override predicate isSource(DataFlow::Node source) {
321 | exists(/** TODO fill me in **/ |
322 | source.asParameter() = /** TODO fill me in **/
323 | )
324 | }
325 | override predicate isSink(DataFlow::Node sink) {
326 | exists(/** TODO fill me in **/ |
327 | /** TODO fill me in **/
328 | sink.asExpr() = /** TODO fill me in **/
329 | )
330 | }
331 | }
332 |
333 | from StrutsUnsafeDeserializationConfig config, DataFlow::Node source, DataFlow::Node sink
334 | where config.hasFlow(source, sink)
335 | select sink, "Unsafe XML deserialization"
336 | ```
337 |
338 | 1. Complete the `isSource` predicate using the query you wrote for [Section 2](#section2).
339 |
340 |
341 | Hint
342 |
343 | - You can translate from a query clause to a predicate by:
344 | - Converting the variable declarations in the `from` part to the variable declarations of an `exists`
345 | - Placing the `where` clause conditions (if any) in the body of the exists
346 | - Adding a condition which equates the `select` to one of the parameters of the predicate.
347 | - Remember to include the `ContentTypeHandlerToObject` class you defined earlier.
348 |
349 |
350 |
351 | Solution
352 |
353 | ```ql
354 | override predicate isSource(DataFlow::Node source) {
355 | exists(ContentTypeHandlerToObject toObjectMethod |
356 | source.asParameter() = toObjectMethod.getParameter(0)
357 | )
358 | }
359 | ```
360 |
361 |
362 | 1. Complete the `isSink` predicate by using the final query you wrote for [Section 1](#section1). Remember to use the `isXMLDeserialized` predicate!
363 |
364 | Hint
365 |
366 | - Complete the same process as above.
367 |
368 |
369 |
370 | Solution
371 |
372 | ```ql
373 | override predicate isSink(DataFlow::Node sink) {
374 | exists(Expr arg |
375 | isXMLDeserialized(arg) and
376 | sink.asExpr() = arg
377 | )
378 | }
379 | ```
380 |
381 |
382 | You can now run the completed query. You should find exactly one result, which is the CVE reported by our security researchers in 2017!
383 |
384 | For this result, it is easy to verify that it is correct, because both the source and sink are in the same method. However, for many data flow problems this is not the case.
385 |
386 | We can update the query so that it reports not only the sink, but also the source and the path between them. To do this, we convert the query to a _path problem_ query. There are five parts we will need to change:
388 | - Convert the `@kind` from `problem` to `path-problem`. This tells the CodeQL toolchain to interpret the results of this query as path results.
389 | - Add a new import `DataFlow::PathGraph`, which will report the path data alongside the query results.
390 | - Change `source` and `sink` variables from `DataFlow::Node` to `DataFlow::PathNode`, to ensure that the nodes retain path information.
391 | - Use `hasFlowPath` instead of `hasFlow`.
392 | - Change the select to report the `source` and `sink` as the second and third columns. The toolchain combines this data with the path information from `PathGraph` to build the paths.
393 |
394 | 3. Convert your previous query to a path-problem query.
395 |
396 | Solution
397 |
398 | ```ql
399 | /**
400 | * @name Unsafe XML deserialization
401 | * @kind path-problem
402 | * @id java/unsafe-deserialization
403 | */
404 | import java
405 | import semmle.code.java.dataflow.DataFlow
406 | import DataFlow::PathGraph
407 |
408 | predicate isXMLDeserialized(Expr arg) {
409 | exists(MethodAccess fromXML |
410 | fromXML.getMethod().getName() = "fromXML" and
411 | arg = fromXML.getArgument(0)
412 | )
413 | }
414 |
415 | /** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */
416 | class ContentTypeHandler extends RefType {
417 | ContentTypeHandler() {
418 | this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
419 | }
420 | }
421 |
422 | /** A `toObject` method on a subtype of `org.apache.struts2.rest.handler.ContentTypeHandler`. */
423 | class ContentTypeHandlerToObject extends Method {
424 | ContentTypeHandlerToObject() {
425 | this.getDeclaringType().getASupertype() instanceof ContentTypeHandler and
426 | this.hasName("toObject")
427 | }
428 | }
429 |
430 | class StrutsUnsafeDeserializationConfig extends DataFlow::Configuration {
431 | StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }
432 | override predicate isSource(DataFlow::Node source) {
433 | exists(ContentTypeHandlerToObject toObjectMethod |
434 | source.asParameter() = toObjectMethod.getParameter(0)
435 | )
436 | }
437 | override predicate isSink(DataFlow::Node sink) {
438 | exists(Expr arg |
439 | isXMLDeserialized(arg) and
440 | sink.asExpr() = arg
441 | )
442 | }
443 | }
444 |
445 | from StrutsUnsafeDeserializationConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
446 | where config.hasFlowPath(source, sink)
447 | select sink, source, sink, "Unsafe XML deserialization"
448 | ```
449 |
450 |
451 | For more information on how the vulnerability was identified, you can read the [blog disclosing the original problem](https://securitylab.github.com/research/apache-struts-vulnerability-cve-2017-9805).
452 |
453 | Although we have created a query from scratch to find this problem, it can also be found with one of our default security queries, [UnsafeDeserialization.ql](https://github.com/github/codeql/blob/master/java/ql/src/Security/CWE/CWE-502/UnsafeDeserialization.ql). You can see this on a [vulnerable copy of Apache Struts](https://github.com/m-y-mo/struts_9805) that has been [analyzed on LGTM.com](https://lgtm.com/projects/g/m-y-mo/struts_9805/snapshot/31a8d6be58033679a83402b022bb89dad6c6e330/files/plugins/rest/src/main/java/org/apache/struts2/rest/handler/XStreamHandler.java?sort=name&dir=ASC&mode=heatmap#x121788d71061ed86:1), our free open source analysis platform.
454 |
455 | ## What's next?
456 | - Read the [tutorial on analyzing data flow in Java](https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-java/#analyzing-data-flow-in-java).
457 | - Go through more [CodeQL training materials for Java](https://codeql.github.com/docs/codeql-language-guides/codeql-for-java/).
458 | - Try out the latest CodeQL Java Capture-the-Flag challenge on the [GitHub Security Lab website](https://securitylab.github.com/ctf) for a chance to win a prize! Or try one of the older Capture-the-Flag challenges to improve your CodeQL skills.
459 | - Try out a CodeQL course on [GitHub Learning Lab](https://lab.github.com/githubtraining/codeql-u-boot-challenge-(cc++)).
460 | - Read about more vulnerabilities found using CodeQL on the [GitHub Security Lab research blog](https://securitylab.github.com/research).
461 | - Explore the [open-source CodeQL queries and libraries](https://github.com/github/codeql), and [learn how to contribute a new query](https://github.com/github/codeql/blob/master/CONTRIBUTING.md).
462 |
--------------------------------------------------------------------------------
/javascript.md:
--------------------------------------------------------------------------------
1 | # CodeQL workshop for JavaScript: Finding unsafe calls to the jQuery `$` function
2 |
3 | - Analyzed language: JavaScript
4 | - Difficulty level: 200
5 |
6 | ## Overview
7 |
8 | - [Problem statement](#problemstatement)
9 | - [Setup instructions](#setupinstructions)
10 | - [Documentation links](#documentationlinks)
11 | - [Workshop](#workshop)
12 | - [Section 1: Finding calls to the jQuery `$` function](#section1)
13 | - [Section 2: Finding jQuery plugin options](#section2)
14 | - [Section 3: Finding XSS vulnerabilities](#section3)
15 |
16 | ## Problem statement
17 |
18 | jQuery is an extremely popular, but old, open source JavaScript library designed to simplify things like HTML document traversal and manipulation, event handling, animation, and Ajax. The jQuery library supports modular plugins to extend its capabilities. Bootstrap is another popular JavaScript library, which has used jQuery's plugin mechanism extensively. However, the jQuery plugins inside Bootstrap used to be implemented in an unsafe way that could make the users of Bootstrap vulnerable to cross-site scripting (XSS) attacks. This is when an attacker uses a web application to send malicious code, generally in the form of a browser side script, to a different end user.
19 |
20 | Four such vulnerabilities in Bootstrap jQuery plugins were fixed in [this pull request](https://github.com/twbs/bootstrap/pull/27047), and each was assigned a CVE.
21 |
22 | The core mistake in these plugins was the use of the omnipotent jQuery `$` function to process the options that were passed to the plugin. For example, consider the following snippet from a simple jQuery plugin:
23 |
24 | ```javascript
25 | let text = $(options.textSrcSelector).text();
26 | ```
27 |
28 | This plugin decides which HTML element to read text from by evaluating `options.textSrcSelector` as a CSS selector, or that is the intention at least. The problem in this example is that `$(options.textSrcSelector)` will execute JavaScript code instead if the value of `options.textSrcSelector` is a string of HTML, such as an `<img>` tag with an `onerror` handler. The values in `options` cannot always be trusted.
29 |
30 | In security terminology, jQuery plugin options are a **source** of user input, and the argument of `$` is an XSS **sink**.
31 |
32 | The pull request linked above shows one approach to making such plugins safer: use a more specialized, safer function like `$(document).find` instead of `$`.
33 | ```javascript
34 | let text = $(document).find(options.textSrcSelector).text();
35 | ```
36 |
37 | In this challenge, we will use CodeQL to analyze the source code of Bootstrap, taken from before these vulnerabilities were patched, and identify the vulnerabilities.
38 |
39 | ## Setup instructions for Visual Studio Code
40 |
41 | To take part in the workshop you will need to follow these steps to get the CodeQL development environment set up:
42 |
43 | 1. Install the Visual Studio Code IDE.
44 | 1. Download and install the [CodeQL extension for Visual Studio Code](https://help.semmle.com/codeql/codeql-for-vscode.html). Full setup instructions are [here](https://help.semmle.com/codeql/codeql-for-vscode/procedures/setting-up.html).
45 | 1. [Set up the starter workspace](https://help.semmle.com/codeql/codeql-for-vscode/procedures/setting-up.html#using-the-starter-workspace).
46 | - **Important**: Don't forget to `git clone --recursive` or `git submodule update --init --remote`, so that you obtain the standard query libraries.
47 | 1. Open the starter workspace: File > Open Workspace > Browse to `vscode-codeql-starter/vscode-codeql-starter.code-workspace`.
48 | 1. Download the [esbena_bootstrap-pre-27047_javascript CodeQL database](https://github.com/githubsatelliteworkshops/codeql/releases/download/v1.0/esbena_bootstrap-pre-27047_javascript.zip).
49 | 1. Unzip the database.
50 | 1. Import the unzipped database into Visual Studio Code:
51 | - Click the **CodeQL** icon in the left sidebar.
52 | - Place your mouse over **Databases**, and click the + sign that appears on the right.
53 | - Choose the unzipped database directory on your filesystem.
54 | 1. Create a new file, name it `UnsafeDollarCall.ql`, save it under `codeql-custom-queries-javascript`.
55 |
56 | ## Documentation links
57 | If you get stuck, try searching our documentation and blog posts for help and ideas. Below are a few links to help you get started:
58 | - [Learning CodeQL](https://help.semmle.com/QL/learn-ql)
59 | - [Learning CodeQL for JavaScript](https://help.semmle.com/QL/learn-ql/javascript/ql-for-javascript.html)
60 | - [Using the CodeQL extension for VS Code](https://help.semmle.com/codeql/codeql-for-vscode)
61 |
62 | ## Workshop
63 |
64 | The workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step.
65 |
66 | Each step has a **Hint** that describes useful classes and predicates in the CodeQL standard libraries for JavaScript and keywords in CodeQL. You can explore these in your IDE using the autocomplete suggestions (`Ctrl+Space`) and jump-to-definition command (`F12`).
67 |
68 | Each step has a **Solution** that indicates one possible answer. Note that all queries will need to begin with `import javascript`, but for simplicity this may be omitted below.
69 |
70 | ### Finding calls to the jQuery `$` function
71 |
72 | 1. Find all function call expressions, such as `alert("hello world")` and `speaker.sayHello("world")`.
73 |
74 | Hint
75 |
76 | A function call is called a `CallExpr` in the CodeQL JavaScript library.
77 |
78 |
79 | Solution
80 |
81 | ```ql
82 | from CallExpr dollarCall
83 | select dollarCall
84 | ```
85 |
86 |
87 | 1. Identify the expression that is used as the first argument for each call, such as `"hello world"` in `alert("hello world")` and `"world"` in `speaker.sayHello("world")`.
88 |
89 |
90 | Hint
91 |
92 | - Add another variable to your `from` clause. This can be named `dollarArg` and have type `Expr`.
93 | - Add a `where` clause.
94 | - `CallExpr` has a predicate `getArgument(int)` to find the argument at a 0-based index.
95 |
96 |
97 | Solution
98 |
99 | ```ql
100 | from CallExpr dollarCall, Expr dollarArg
101 | where dollarArg = dollarCall.getArgument(0)
102 | select dollarArg
103 | ```
104 |
105 |
106 | 1. Filter your results to only those calls to a function named `$`, such as `$("hello world")` and `speaker.$("world")`.
107 |
108 |
109 | Hint
110 |
111 | - `CallExpr` has a predicate `getCalleeName()` to find the name of the function being called.
112 | - Use the `and` keyword to add conditions to your query.
113 | - Use the `=` operator to assert that two values are equal.
114 |
115 | Solution
116 |
117 | ```ql
118 | from CallExpr dollarCall, Expr dollarArg
119 | where
120 | dollarArg = dollarCall.getArgument(0) and
121 | dollarCall.getCalleeName() = "$"
122 | select dollarArg
123 | ```
124 |
125 |
126 | 1. So far we have looked for the function name `$`. Are there other ways of calling the jQuery `$` function? Perhaps the CodeQL library can handle these for us?
127 |
128 | The CodeQL standard library for JavaScript has a built-in predicate `jquery()` to describe references to `$`. Expand the hint for details, and modify your query to use it.
129 |
130 | Hint
131 |
132 | - Calling the predicate `jquery()` returns all values that refer to the `$` function.
133 | - To find all calls to this function, use the predicate `getACall()`.
134 | - Notice that when you call `jquery()`, `getACall()`, and `getAnArgument()` in succession, you get return values of type `DataFlow::Node`, not `Expr`. These are **data flow nodes**. They describe a part of the source program that may have a value, and let us do more complex reasoning about this value. We'll learn more about these in the next section.
135 | - You can change your `dollarArg` variable to have type `DataFlow::Node`, or convert the data flow node back into an `Expr` using the predicate `asExpr()`.
136 |
137 | Solution
138 |
139 | ```ql
140 | from Expr dollarArg
141 | where
142 | dollarArg = jquery().getACall().getArgument(0).asExpr()
143 | select dollarArg
144 | ```
145 |
146 | OR
147 |
148 | ```ql
149 | from DataFlow::Node dollarArg
150 | where
151 | dollarArg = jquery().getACall().getArgument(0)
152 | select dollarArg
153 | ```
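
When inspecting results it can help to see each `$` call next to its argument. A small sketch of that variant, assuming the standard library class `DataFlow::CallNode` (the type we expect `getACall()` to return):

```ql
import javascript

from DataFlow::CallNode dollarCall, DataFlow::Node dollarArg
where
  // Any call to a value that refers to the jQuery `$` function.
  dollarCall = jquery().getACall() and
  // The first (index 0) argument of that call.
  dollarArg = dollarCall.getArgument(0)
select dollarCall, dollarArg
```

Keeping the call in its own variable makes it easy to add further conditions on the call later.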
154 |
155 |
156 |
157 | ### Finding jQuery plugin options
158 | jQuery plugins are usually defined by assigning a value to a property of the `$.fn` object:
159 |
160 | ```javascript
161 | $.fn.copyText = function() { ... } // this function is a jQuery plugin
162 | ```
163 |
164 | In this step, we will find such plugins, and their options.
165 |
166 | Consider creating a new query for these next few steps, or commenting out your earlier solutions and using the same file. We will use the earlier solutions again in the next section.
167 |
168 | 1. You have already seen how to find references to the jQuery `$` function. Now find all places in the code that read the property `$.fn`.
169 |
170 | Hint
171 | - Declare a new variable of type `DataFlow::Node` to hold the results.
172 | - Notice that `jquery()` returns a value of type `DataFlow::SourceNode`. Source nodes are places in the program that introduce a new value, from which the flow of data may be tracked.
173 | - `DataFlow::SourceNode` has a predicate named `getAPropertyRead(string)`, which finds all reads of a particular property on the same object. The string argument is the name of the property.
174 |
175 |
176 | Solution
177 |
178 | ```ql
179 | from DataFlow::Node n
180 | where n = jquery().getAPropertyRead("fn")
181 | select n
182 | ```
183 |
184 |
185 | 1. Find the functions that are assigned to a property of `$.fn`. These are jQuery plugins.
186 |
187 | Remember the previous example:
188 | ```javascript
189 | $.fn.copyText = function() { ... } // this function is a jQuery plugin
190 | ```
191 |
192 | There might be some variation in how this code is written. For example, we might see intermediate assignments to local variables:
193 |
194 | ```javascript
195 | let fn = $.fn
196 | let f = function() { ... } // this function is a jQuery plugin
197 | fn.copyText = f
198 | ```
199 |
200 | Intermediate variables and nested expressions are typical source code patterns that require **local data flow analysis** to detect.
201 |
202 | Data flow analysis helps us answer questions like: does this expression ever hold a value that originates from a particular other place in the program?
203 |
204 | We have already encountered **data flow nodes**, described by the `DataFlow::Node` CodeQL class. They are places in the program that have a value. They are returned by useful predicates like `jquery()` in the library.
205 |
206 | These nodes are separate and distinct from the AST (Abstract Syntax Tree, which represents the basic structure of the program) nodes, to allow for flexibility in how data flow is modeled.
207 |
208 | We can visualize the data flow analysis problem as one of finding paths through a directed graph, where the nodes of the graph are data flow nodes, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two nodes.
209 |
210 | The CodeQL JavaScript data flow library is very expressive.
211 | It has several classes that describe different places in the program that can have a value. We have seen `SourceNode`s; there are many other forms such as `ValueNode`s, `FunctionNode`s, `ParameterNode`s, and `CallNode`s. You can find out more in the [documentation](https://help.semmle.com/QL/learn-ql/javascript/dataflow.html).
212 |
213 | When we are looking for the flow of information to or from these nodes within a single function or scope, this is called **local data flow analysis**. The CodeQL library has several predicates available on different types of data flow node that reason about local data flow.
214 |
215 | You have already seen one such predicate: `SourceNode.getAPropertyRead()`. To complete this step of the workshop, look at the hint for another useful predicate.
216 |
217 |
218 | Hint
219 |
220 | - `DataFlow::SourceNode` has a predicate named `getAPropertySource()`, which finds a source node whose value is stored in a property of this node.
221 | - In the previous step, we used `getAPropertyRead(string)` to identify the source node `$.fn`. Now try to find a value stored in a property of this source node `$.fn`.
222 |
223 |
224 |
225 | Solution
226 |
227 | ```ql
228 | from DataFlow::Node plugin
229 | where plugin = jquery().getAPropertyRead("fn").getAPropertySource()
230 | select plugin
231 | ```
232 |
233 |
234 | 1. Find the last parameter of the jQuery plugin functions that you identified in the previous step. These parameters are the plugin options.
235 |
236 |
237 | Hint
238 |
239 | - Modify your `from` clause so that the variable that describes that jQuery plugin is of type `DataFlow::FunctionNode`. As the name suggests, this is a data flow node that refers to a function definition.
240 | - `DataFlow::FunctionNode` has a predicate named `getLastParameter()`.
241 | - If you want to add a new variable to describe the parameter, it can be of type `DataFlow::ParameterNode`.
242 |
243 |
244 |
245 | Solution
246 |
247 | ```ql
248 | from DataFlow::FunctionNode plugin, DataFlow::ParameterNode optionsParam
249 | where
250 | plugin = jquery().getAPropertyRead("fn").getAPropertySource() and
251 | optionsParam = plugin.getLastParameter()
252 | select plugin, optionsParam
253 | ```
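
To see which plugin name each function is registered under, the property write itself can be queried. This is a sketch of a related query rather than a workshop step. It assumes the library predicates `getAPropertyWrite()`, `getRhs()` and `getPropertyName()` on the JavaScript data flow classes:

```ql
import javascript

from DataFlow::PropWrite write, DataFlow::Node plugin
where
  // A write to some property of `$.fn`, e.g. `$.fn.copyText = ...`.
  write = jquery().getAPropertyRead("fn").getAPropertyWrite() and
  // The value being assigned, i.e. the right-hand side of the write.
  plugin = write.getRhs()
select write.getPropertyName(), plugin
```

Unlike `getAPropertySource()`, `getRhs()` reports the right-hand side node itself, so in the intermediate-variable example it would report the reference `f` rather than the function definition.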
254 |
255 |
256 | ### Putting it all together
257 |
258 | We have now identified (a) places in the program which receive jQuery plugin options (which may be untrusted data) and (b) places in the program which are passed to the jQuery `$` function and may be interpreted as HTML. We now want to tie these two together to ask: does the untrusted data from a jQuery plugin option ever _flow_ to the potentially unsafe `$` call?
259 |
260 | This is also a data flow problem. However, it is larger in scope than the problems we have tackled so far, because the plugin options and the `$` call may be in different functions. We call this a **global data flow** problem.
261 |
262 | In this section we will create a _path problem_ query capable of looking for global data flow, by populating this template:
263 |
264 | ```ql
265 | /**
266 | * @name Cross-site scripting vulnerable plugin
267 | * @kind path-problem
268 | * @id js/xss-unsafe-plugin
269 | */
270 | import javascript
271 | import DataFlow::PathGraph
272 |
273 | class Config extends TaintTracking::Configuration {
274 | Config() { this = "Config" }
275 | override predicate isSource(DataFlow::Node source) {
276 | exists(/** TODO fill me in from Section 2 **/ |
277 | source = /** TODO fill me in from Section 2 **/
278 | )
279 | }
280 | override predicate isSink(DataFlow::Node sink) {
281 | sink = /** TODO fill me in from Section 1 **/
282 | }
283 | }
284 |
285 | from Config config, DataFlow::PathNode source, DataFlow::PathNode sink
286 | where config.hasFlowPath(source, sink)
287 | select sink, source, sink, "Potential XSS vulnerability in plugin."
288 | ```
289 |
290 | 1. Complete the `isSource` predicate using the query you wrote for [Section 2](#section2).
291 |
292 |
293 | Hint
294 |
295 | - You can translate from a query clause to a predicate by:
296 | - Converting the variable declarations in the `from` part to the variable declarations of an `exists`
297 | - Placing the `where` clause conditions (if any) in the body of the exists
298 | - Adding a condition which equates the `select` to one of the parameters of the predicate.
299 | - Remember that the source of untrusted data is the jQuery plugin options parameter.
300 |
301 |
302 |
303 | Solution
304 |
305 | ```ql
306 | override predicate isSource(DataFlow::Node source) {
307 | exists(DataFlow::FunctionNode plugin |
308 | plugin = jquery().getAPropertyRead("fn").getAPropertySource() and
309 | source = plugin.getLastParameter()
310 | )
311 | }
312 | ```
313 |
314 |
315 | 1. Complete the `isSink` predicate by using the query you wrote for [Section 1](#section1).
316 |
317 | Hint
318 |
319 | - Complete the same process as above.
320 | - We already found a `DataFlow::Node` in Section 1 as the result of calling `jquery()` and predicates on it.
321 | - Remember that the first argument of a call to `$` is a sink for XSS vulnerabilities.
322 |
323 |
324 |
325 | Solution
326 |
327 | ```ql
328 | override predicate isSink(DataFlow::Node sink) {
329 | sink = jquery().getACall().getArgument(0)
330 | }
331 | ```
332 |
333 |
334 | 1. You can now run the completed query. This should find 5 results on the unpatched Bootstrap codebase.
335 |
336 |
337 | Completed query
338 |
339 | ```ql
340 | /**
341 | * @name Cross-site scripting vulnerable plugin
342 | * @kind path-problem
343 | * @id js/xss-unsafe-plugin
344 | */
345 |
346 | import javascript
347 | import DataFlow::PathGraph
348 |
349 | class Configuration extends TaintTracking::Configuration {
350 | Configuration() { this = "XssUnsafeJQueryPlugin" }
351 |
352 | override predicate isSource(DataFlow::Node source) {
353 | exists(DataFlow::FunctionNode plugin |
354 | plugin = jquery().getAPropertyRead("fn").getAPropertySource() and
355 | source = plugin.getLastParameter()
356 | )
357 | }
358 |
359 | override predicate isSink(DataFlow::Node sink) {
360 | sink = jquery().getACall().getArgument(0)
361 | }
362 | }
363 |
364 | from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
365 | where cfg.hasFlowPath(source, sink)
366 | select sink, source, sink, "Potential XSS vulnerability in plugin."
367 | ```
368 |
369 |
370 | We have created a query from scratch to find this problem. A production version of this query can be found as part of the default set of CodeQL security queries: [UnsafeJQueryPlugin.ql](https://github.com/github/codeql/blob/master/javascript/ql/src/Security/CWE-079/UnsafeJQueryPlugin.ql). You can [see the results on a vulnerable copy of Bootstrap](https://lgtm.com/projects/g/esbena/bootstrap-pre-27047?mode=tree&ruleFocus=1511421786841) that has been analyzed on LGTM.com, our free open source analysis platform.
371 |
372 | ## What's next?
373 | - Read the [tutorial on analyzing data flow in JavaScript and TypeScript](https://help.semmle.com/QL/learn-ql/javascript/dataflow.html).
374 | - Try out the latest CodeQL Capture-the-Flag challenge on the [GitHub Security Lab website](https://securitylab.github.com/ctf) for a chance to win a prize! Or try one of the older Capture-the-Flag challenges to improve your CodeQL skills.
375 | - Try out a CodeQL course on [GitHub Learning Lab](https://lab.github.com/githubtraining/codeql-u-boot-challenge-(cc++)).
376 | - Read about more vulnerabilities found using CodeQL on the [GitHub Security Lab research blog](https://securitylab.github.com/research).
377 | - Explore the [open-source CodeQL queries and libraries](https://github.com/github/codeql), and [learn how to contribute a new query](https://github.com/github/codeql/blob/master/CONTRIBUTING.md).
378 |
379 | ## Acknowledgements
380 |
381 | This is a reduced version of a Capture-the-Flag challenge devised by @esbena, available at https://securitylab.github.com/ctf/jquery. Try out the full version! Thanks to our moderators for valuable feedback on the workshop.
382 |
--------------------------------------------------------------------------------
/satellite-2020-workshops-codeql.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/githubsatelliteworkshops/codeql/70615707c0f8ed06000a89d1d0915d866027f347/satellite-2020-workshops-codeql.pdf
--------------------------------------------------------------------------------